Representative group. Representativeness of the sample

Representative sample

A representative sample is a sample that has the same distribution of relative characteristics as the population.

See also: Sample Populations

Finam Financial Dictionary.


See what “Representative sampling” is in other dictionaries:

    Representative sample- A group of participants that more or less accurately represents the composition of the population being studied. The sample can reflect the distribution by age and gender, as well as any other characteristics that influence the result of the experiment in terms of... ...

    representative sample- - [English-Russian glossary of basic terms on vaccinology and immunization. World Health Organization, 2009] Topics vaccinology, immunization EN representative sampling ... Technical Translator's Guide

    REPRESENTATIVE SAMPLE- (representative sample) a sample that is (or is considered to be) a true reflection of the parent population, that is, has the same profile of characteristics, for example, age structure, class structure, level of education. Representative... ... Large explanatory sociological dictionary

    REPRESENTATIVE SAMPLE- See sample, representative... Dictionary in psychology

    REPRESENTATIVE SAMPLE- a sample in which all the main features of the population from which it is drawn are represented in approximately the same proportion or with the same frequency as they appear in that general population... Encyclopedic Dictionary in psychology and pedagogy

    Representative sample- this is a sample in which all the main characteristics of the general population from which this sample is extracted are presented in approximately the same proportion or with the same frequency with which this characteristic appears in this general population... ... Sociological Dictionary Socium

    Representative sample- (representative sample). A sample that accurately reflects the condition and properties of the entire population... Developmental psychology. Dictionary by book

    representative sample- (representative sample) a sample made according to the rules, that is, in such a way that it reflects the specifics of the general population both in composition and in the individual characteristics of the included subjects. Dictionary practical psychologist. M.: AST,... ... Great psychological encyclopedia

    English sampling, representative; German Stichprobe, reprasentative. A sample that has essentially the same distribution of relative characteristics as the population. Antinazi. Encyclopedia of Sociology, 2009 ... Encyclopedia of Sociology

    Representative sample A sample that has the same distribution of relative characteristics as the population Dictionary of business terms. Akademik.ru. 2001 ... Dictionary of business terms

This means that if you interviewed, say, 400 people in a district town with an adult solvent population of 100 thousand and found that 33% of the buyers surveyed prefer the products of a local meat processing plant, then with 95% probability you can say that 33±5% of the town's residents (i.e. from 28 to 38%) are regular buyers of these products.
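The ±5% figure can be checked with the usual normal-approximation formula for the margin of error of a proportion. The sketch below is only an illustration: the function name `margin_of_error` and the default `z = 1.96` (the 95% level) are assumptions, not something given in the text.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a share p
    observed in a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


# Figures from the example: 400 respondents, 33% prefer the local producer.
observed = margin_of_error(0.33, 400)
# Worst case (p = 0.5) gives the widest interval for a given sample size.
worst_case = margin_of_error(0.5, 400)

print(f"observed share: +/-{observed:.1%}, worst case: +/-{worst_case:.1%}")
```

Under these assumptions the worst-case margin for 400 respondents is just under 5 percentage points, which matches the rounded ±5% used in the example.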

You can also use Gallup's calculations to estimate the relationship between sample size and sampling error (see above).

Today many laborious calculations are done by computer, and statistical programs are available on the Internet. The same convenience now extends to sample size calculation: the website of the Analytical Center “Business and Marketing” (http://www.bma.ru/enter.htm) offers a calculator where the user only needs to enter the necessary data and click the “Calculate” button.

Sample control and repair

The quality of sociological information can be reduced by many factors: incorrectly formulated questionnaire questions, an inappropriately chosen research method, missing answers in questionnaires, poorly planned sampling, etc.

The practice of empirical research, foreign and domestic, fundamental and applied, shows that errors, including sampling errors, occur in almost every study. The real question is whether the resulting biases are significant. Since errors, overlaps and biases always occur, there will always be work for specialists who control and repair samples, and this area of methodological science is in no danger of premature old age. Not only in science but in any field, the profession of inspector and repairman has always been profitable and prestigious.

Sampling control is the process of scientifically comparing the general and sample populations, determining how far they diverge, finding the causes of the deviation, and developing possible ways of eliminating the errors. In the narrow sense, it is the alignment of the sample and general distributions of respondents' socio-demographic characteristics.

Sample repair is the process of actually eliminating the errors, i.e. the discrepancies between the two populations, using the techniques, methods and tools that methodological science offers.

Thus, the second technique is the practical implementation of the first, analytical one, and both of them constitute two mandatory stages of conducting sociological research.

Sampling control is often used in an expanded sense that includes sample repair. In this case one speaks of sample repair broadly understood as primary statistical data processing, including correction of: a) the sample population; b) the distributions of respondents' socio-demographic characteristics; c) outliers and missing answers, as well as weighting of the initial data. These kinds of correction are intended to repair the most important thing, the research sample, and to increase its representativeness. Why is this important? The questionnaire can be extremely interesting, deep

The main purpose of sample repair is to improve the quality of already collected information. The sample repair procedure includes several operations 40 .

Correction of the sample population. For a variety of reasons, the selected respondents are not always able or willing to answer questions. One has fallen ill or left on an urgent business trip; another refuses for ideological reasons or is unable to respond because of mental incapacity. A third is hard to find at home, although the interviewer has called more than once.

The problem of replacing respondents arises. It can be solved in several ways: selecting the next respondent on the list (for example, the next number in the telephone directory), using a larger initial sample, or forming a re-sample. In the latter case, if the response rate is much lower than expected, the sampling frame is expanded to include additional names found, for example, at random. Searching for an equivalent replacement is considered the most effective method. If, for example, your sample includes a working pensioner of a certain nationality who is a widower, it is advisable to replace him with another pensioner of similar age and nationality who is also widowed and working. This method often turns into a labor- and time-consuming undertaking. If the population list is small and a replacement cannot be found, you should abandon the equivalent method and move on to another.

Correction of the distributions of respondents' demographic characteristics. If, at the end of the study, the research passport shows that you have, for example, too many women, people with higher education or older people compared with their percentage shares in the general population, three methods can be applied: 1) remove respondents from the groups that are overrepresented; 2) additionally survey the groups that are underrepresented; 3) mathematically increase the weight of answers from underrepresented groups, or reduce the weight of those from overrepresented ones. But first it is advisable to find out whether these characteristics influence the content of the answers at all. Maybe everything can be left as is.

Weighting of input data is a mathematical method of increasing or decreasing the value of the responses of a specific group of respondents (for example, unmarried rural women aged 30 to 45). Weighting means assigning each respondent a certain weight (a coefficient by which all the answers of one respondent or a group of respondents must be multiplied in order to restore representativeness). According to A. Balabanov 41, weighting is the only way of restoring representativeness in panel studies without loss of accuracy. Since there are many weighting methods, the sociologist faces quite difficult methodological problems that cannot be solved without appropriate training and knowledge. Weighting coefficients can be determined in different ways, and the process of assigning them is almost impossible for other researchers to check from the outside. The simplest way is to divide the size of a specific socio-demographic group in the general population, for example teenagers aged 13 to 17 (N), by the number of respondents representing that group in the sample (n), on the assumption that one respondent represents the opinion of N/n people in the general population.

Employees of the Institute of Sociology of the USSR Academy of Sciences A.A. Davydov and A.O. Kryshtanovsky once established some interesting facts 42. It turns out that respondents' demographic characteristics have almost no connection with answers about satisfaction with work and life, assessment of the pace of perestroika, approval of the activities of political leaders, assessment of foreign policy events, etc. In other words, men and women respond similarly to questions about life satisfaction or political events. For these indicators, reweighting is not necessary. If one characteristic, for example gender, is closely related to all substantive questions, or different questions are associated with different characteristics, then the correction has to be made according to the scheme described in the manual.
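The simplest weighting rule, group size in the population divided by the number of respondents from that group, can be sketched as follows. All group sizes below are hypothetical, and the helper names `expansion_weights` and `weighted_share` are invented for illustration; this is one possible reading of the rule, not the procedure of any particular research center.

```python
def expansion_weights(population_counts, sample_counts):
    """Weight of one respondent = N_group / n_group, i.e. the number of people
    in the general population that a single respondent stands for."""
    return {
        group: population_counts[group] / sample_counts[group]
        for group in sample_counts
    }


def weighted_share(answers, weights):
    """answers is a list of (group, answered_yes) pairs.
    Returns the weighted share of 'yes' answers."""
    total = sum(weights[group] for group, _ in answers)
    yes = sum(weights[group] for group, answered_yes in answers if answered_yes)
    return yes / total


# Hypothetical figures: teenagers are underrepresented in the sample.
population = {"13-17": 6000, "18+": 54000}
sample = {"13-17": 30, "18+": 570}

weights = expansion_weights(population, sample)
print(weights["13-17"])  # each surveyed teenager represents this many people
```

Multiplying each answer by its group weight before counting, as `weighted_share` does, restores the population proportions of the groups in the final estimate.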

VTsIOM specialists ensure careful repair of the sample during data analysis in order to minimize deviations that arose during the field work stage. Particularly strong biases are observed in terms of gender and age.

Correction of sharply deviating responses. In surveys one sometimes comes across answers that stand out sharply from the general background. The reasons can be very different: the respondent misunderstood the survey question, has original views on the world, or simply decided to make fun of the scientists. There may be other reasons. But you cannot go back and ask again. In this case, especially if there are many questionnaires, it is better to remove the defective one from the data set.

Correction of missing answers. Gaps most often occur in open-ended and tabular questions. The easiest way to correct them is to exclude the affected questions, or the entire questionnaire, from the analysis. When the gap is not in a substantive question but in the passport (socio-demographic) section, proceed as follows. If socio-demographic characteristics are not associated with the substantive answers, the questionnaire with missing values should be assigned the socio-demographic characteristics that occur most frequently in the sample, or they should be assigned randomly or proportionally (if there are many such questionnaires). If there is a connection, determine which group's answers (for example, men's or women's) are closer to the answers in the questionnaire where the “gender” field is blank, and assign that attribute 44.

If a lot of data has been collected, the sample can be repaired by reducing the sample population. According to A.A. Davydov and A.O. Kryshtanovsky, this is the most rational approach to sample repair, since it does not rely on any additional assumptions. If the sample is small, repairing it requires a number of additional assumptions that do not follow from the collected material and whose truth is difficult to verify.

Resampling is carried out when verification has shown that the sample does not represent the population as a whole. In this case, new respondents are selected and added to the previously used sample until a satisfactory level of representativeness is achieved.

Not all sociologists organizing empirical research include data on sample control and repair in their “passport.” Thus, among the 300 studies contained in the Data Bank of the IS USSR Academy of Sciences for 1988, only ten carried out sample repairs 45 . For comparison, we note: abroad, sample repair has long been a common method of improving the quality of sociological information.

Previously, the reasons for this gap lay in the lack of computer equipment, specialized software and methodological manuals, and in the insufficient qualifications of researchers. Today both the equipment and the necessary programs are available, but the problem has not been solved. Apparently, it cannot be reduced to technical aspects alone.

In practice, sampling error is determined by comparing known population characteristics with sample means. In sociology, when surveying the adult population, the most commonly used sources are population censuses, current statistical records, and previous surveys at the same site. The characteristics compared are usually socio-demographic (gender, age, nationality, marital status). Since one's own data can be compared with external data only after the study is completed, this method of control is called a posteriori, i.e. carried out after the fact.
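A posteriori control of this kind amounts to comparing the shares of each group in the sample with the shares known from official statistics. A minimal sketch, with hypothetical survey counts and hypothetical census shares; the function name `share_deviations` is an assumption made for this example.

```python
def share_deviations(sample_counts, population_shares):
    """Difference between the sample share and the known population share
    for every category (positive means the category is overrepresented)."""
    n = sum(sample_counts.values())
    return {
        category: sample_counts[category] / n - population_shares[category]
        for category in population_shares
    }


# Hypothetical survey of 1,600 people and hypothetical census shares.
sample_counts = {"women": 1000, "men": 600}
census_shares = {"women": 0.54, "men": 0.46}

deviations = share_deviations(sample_counts, census_shares)
largest = max(abs(d) for d in deviations.values())
print(f"largest deviation: {largest:.1%}")
```

If the largest deviation exceeds the error the researcher is prepared to accept, the sample is a candidate for repair by one of the methods described above.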

For example, the Gallup Institute, using samples of 1,500 people, controls representativeness with national census data on the distribution of the population by gender, age, education, income, profession, race (white or non-white), place of residence, and size of settlement 46. In studies conducted by VTsIOM, the reliability of sample data is determined by a posteriori control. The monitoring questionnaire must include several questions on which reliable information is available from the State Statistics Committee of the Russian Federation. These usually include the respondent's gender, age, education, type of settlement, marital status, sector of employment, and job status. Four indicators (gender, age, education and place of residence) are used to define control groups when determining respondents' weights: they must correspond to the same groups in the general population 47. Since official statistics tell us how many men and women there are in Russia, it is easy to compare the monitoring data with these figures and determine the error.

In surveys by the Socio-Express Center of the Institute of Sociology of the Russian Academy of Sciences, the representativeness of the all-Russian sample (design size 2 thousand people) is controlled by the regional proportions of the population, the proportions between urban and rural population, and the proportions between the populations of different types of settlement. The survey is conducted as a formalized interview at the respondent's place of residence. The sample is distributed across ten economic-geographical zones, each of which includes large cities (over 500 thousand population), medium-sized cities (50-500 thousand), small cities (up to 50 thousand) or urban-type settlements, and rural settlements. The authors believe that the marginal error of their sample does not exceed 3% 48.

An effective means of controlling the sample and, more generally, the quality of data in a study is to publish its key characteristics, above all its methodological tools. If the author of a study withholds this information, citing trade secrecy, suspicion of dishonesty is bound to arise. As A. Balabanov rightly notes, all measurement methods, even in marketing research and mass media, have long been known; they are completely open and cannot be the subject of a trade secret. Moreover, failing to disclose the measurement methodology violates all existing international agreements, in particular those on media measurement 49.

Sample passport

When a scientific report is written or an article is published in an academic journal, the authors are always required to give clear explanations of the study itself and of the sample population: who conducted the study and when, what research methods were used, the type, size and nature of the sample, the representativeness error, the composition of the sample population by the main parameters (for example, gender, age, nationality, education), data control, etc. If this information is missing, the article is usually not accepted by the journal, and if it is only partially present, serious researchers do not trust it. Thus the research passport and the sample passport are no less necessary for authors than for editors and readers.

The sample passport appears twice in a sociologist's work. The first time, the sociologist has to describe the type of sample and briefly justify its use in light of the objectives of the study, the requirements of representativeness and the organizational capabilities of the study. This is done in the methodological section of the research program. The section on sampling answers the following questions:

♦ What is the empirical object of study?

♦ Is the study a complete enumeration or a sample survey?

♦ If it is a sample survey, does it claim to be representative?

♦ If it claims to be representative, what is the population?

♦ How many stages of selection are used in the sample?

♦ What is the unit of selection at each stage?

♦ What selection strategy is used at each stage (random, quota)?

♦ What specific type of random sampling is used?

♦ What parameters are used in quota sampling?

♦ What is the sampling frame (list, card index, map)?

♦ What is the unit of observation at the last stage of selection?

Sampling principles are described not only for the survey method, but also for each method used in the study: document analysis, observation, etc.

A hypothetical example of a sample description. In a study of the effectiveness of brigade (team) forms of labor organization, the following strategy is possible. 1. Workers organized in brigades are taken as the empirical object. 2. The study is sample-based. 3. The general population is all workers organized in brigades. 4. Three stages of selection are applied. 5. At the first stage, brigades engaged in main production and in auxiliary production are distinguished. For the latter, a complete survey is used (because of their small number), and for the former, a sample survey. 6. The second stage is the selection of brigades engaged in main production. According to indicators characterizing their final results, the brigades are divided into three groups: a) advanced; b) average; c) lagging. A list is compiled for each group, depending on the number of brigades in it, and a random disproportionate selection is made from it (for example, three brigades from each group) using a certain “sampling step”. 7. At the third stage, a complete survey is carried out in the selected brigades. The unit of observation is the individual employee 50.

The second time the sociologist deals with a description of the sample is after the research has been carried out, when writing a scientific report or a journal article.

Incomplete description of the passport data of a study is, unfortunately, the most common ailment of Russian scientists. Some do not know exactly how to compile such data; others consider the information unnecessary or unimportant. There is also a category of researchers who simply have nothing to report, because describing the sample in full would expose their illiteracy. A common case: a sociologist somehow conducted a study, somehow built a sample and got something out of it, but cannot formulate a passport or express these actions in scientific language.

A chronic ailment of domestic sociologists is a lacking or insufficient methodological culture. It concerns not only the organization and conduct of field research, but also the publication of its results in the open press. This fact is known to everyone and has been discussed periodically from the 1960s to the 2000s. Sometimes our sociologists and psychologists are caught red-handed, as they say.

According to V.V. Solodnikov, who conducted a secondary analysis of publications in three academic journals (“Sociological Research”, “Psychological Issues” and “Psychological Journal”) for 1986-1992, neither sociologists nor psychologists trouble themselves with putting forward, justifying and testing hypotheses. Most scientists (from 61% among psychologists to 92% among sociologists) do without this cognitive tool, violating all the canons of the scientific method. Only 8% of sociological publications formulate hypotheses explicitly. Sociologists and psychologists also describe the object of research poorly: few indicate the number of respondents or their gender and age, level of education, place of residence, length of family life (for married people), income and professional status. The problem of representativeness, i.e. comparison of the sample and general populations by these characteristics, is hardly discussed at all. In addition, sociologists rarely mention piloting their instruments or using previously tested techniques. Although the most common method of collecting empirical information is the survey, authors rarely describe what type of survey was used in terms of the place, time, or method of filling out the questionnaire.

2.12. Representativeness

Representativeness (from French représentatif, indicative) is the property of a sample population to represent the characteristics of the general population. Representativeness of the sample means that, with some predetermined or empirically calculated error, what is established for the sample population can be extended to the general population or, in the language of statistics, that estimates of the parameters of the general population can be found. Several conditions must be met for this. First, each unit in the population must have an equal probability of being included in the sample. Second, to avoid directed selection, units must be selected from the general population independently of the characteristic being studied. Third, the selection should be made from homogeneous populations whenever possible. Fourth, the number of population units selected for the survey must be large enough.

The direct determination of sample representativeness consists of the following stages: comparison of the averages of the distributions of the sample and general populations, and comparison of the shapes of these distributions. The average of a distribution is usually taken to be its arithmetic mean or weighted arithmetic mean.

In the case of populations with alternative (binary) characteristics, the arithmetic mean is replaced by the proportion of units possessing the characteristic under consideration relative to the entire population. If we denote the size of the population by N and the number of units with the characteristic by M, then p, the proportion of units with the characteristic, is given by

$$p = \frac{M}{N}, \qquad q = 1 - p,$$

where q is the proportion of units with the alternative characteristic.

Conclusions obtained from the study of a sample population can be used if the difference between the arithmetic means (or average shares) of the characteristics of the sample and general populations tends to zero. This requirement is assumed to be satisfied when the four conditions mentioned above are met. However, knowing only the sample averages, it is impossible to estimate this difference exactly, since the averages of the general population are unknown. In addition, the sample averages themselves may fluctuate depending on which units of the general population fall into the sample. Therefore, assessing the representativeness of a sample population by the averages of its distribution comes down to finding the representativeness error.

Comparing the sample and general populations by their means does not give a complete picture of the general population. In two populations with the same mean, the spread between the maximum and minimum values of a characteristic, which determines the shape of its distribution, may differ. If the distribution of such a characteristic is represented graphically, it typically forms a symmetrical bell-shaped (normal) curve, reflecting the fact that the sum of many independent random variables is approximately normally distributed. The ordinate y, which gives the height of the curve at each point x, is the probability density for the value $x_i$.

The probability density reaches its maximum, $1/(\sigma\sqrt{2\pi})$, at the mean value of the variable, where $\sigma$ is the standard deviation of the distribution. This means that the less a value of the random variable differs from the mean, the more likely it is to occur; conversely, the greater the deviation of a value from the mean, the less likely it is to appear. Thus the deviations from the mean, i.e. values of the form $x_i - \bar{x}$, carry information about the variation of the variables being studied. If all the values of a characteristic were the same and coincided with its mean, the population would be perfectly homogeneous with respect to this characteristic.

Positive and negative deviations from the arithmetic mean of a population cancel each other out, so if all the deviations of a characteristic were simply added together, the sum would always be zero:

$$\sum_{i=1}^{N} (x_i - \bar{x}) = 0.$$

To avoid this, each deviation is squared and the squares are averaged; the result is the variance:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2.$$
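A small numerical check makes the point concrete. The five values below are hypothetical; the sketch only shows that raw deviations from the mean cancel out, while squared deviations do not.

```python
# Hypothetical values of some characteristic in a small population.
values = [2, 4, 4, 5, 10]

mean = sum(values) / len(values)

# Deviations of each value from the arithmetic mean.
deviations = [x - mean for x in values]

# Positive and negative deviations cancel, so their sum carries no information.
deviation_sum = sum(deviations)

# Squaring removes the sign; averaging the squares gives the variance.
variance = sum(d ** 2 for d in deviations) / len(values)

print(mean, deviation_sum, variance)
```

The square root of the variance, the standard deviation, is expressed in the same units as the characteristic itself, which is why it is the measure usually reported.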

A normal distribution is fully characterized by two parameters: $\bar{x}$, the mean value of the characteristic, and $\sigma$, the root-mean-square (standard) deviation. The mean $\bar{x}$ determines the position of the distribution along the x axis; the standard deviation determines the shape of the curve: the larger $\sigma$, the wider the curve and the lower its maximum.

The area under the normal curve is distributed so that about 68% of the entire distribution of the characteristic lies within $\bar{x} \pm \sigma$, about 95.4% within $\bar{x} \pm 2\sigma$, and 99.7% within $\bar{x} \pm 3\sigma$. The probability that a random variable distributed approximately normally differs from its mean by more than $3\sigma$ in absolute value is less than 0.3%. It follows that with almost one hundred percent certainty we can say:

$$|x_i - \bar{x}| < 3\sigma.$$
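These shares follow from the normal distribution function and can be reproduced with the standard library alone. In the sketch below the helper name `share_within` is an assumption; it relies on the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable Z.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within k standard deviations
    of the mean: P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")

# Probability of falling outside three standard deviations.
outside_three_sigma = 1 - share_within(3)
```

The value of `outside_three_sigma` is a little under 0.003, which is the "less than 0.3%" quoted in the text.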

Assessing the representativeness of a sample population by the shape of the distribution means comparing the measures of variation of the indicators in the sample and general populations. The variance of the general population is not always known, but mathematical statistics has proven that the general and sample variances are related as follows:

$$\sigma^2_{\text{gen}} = \sigma^2_{\text{sample}} \cdot \frac{n}{n-1},$$

where n is the sample size.
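One relationship of this kind that is standard in mathematical statistics is the n/(n-1) correction, which turns the variance computed on the sample (dividing by n) into an unbiased estimate of the population variance. The sketch below assumes that this is the relationship meant here; the sample values are hypothetical.

```python
import statistics

# Hypothetical sample of five measurements.
sample = [2, 4, 4, 5, 10]
n = len(sample)

# Variance computed on the sample itself (sum of squares divided by n).
sample_variance = statistics.pvariance(sample)

# Estimate of the population variance using the n / (n - 1) correction.
population_estimate = sample_variance * n / (n - 1)

# statistics.variance applies the same correction (divides by n - 1).
print(population_estimate, statistics.variance(sample))
```

For large samples the factor n/(n-1) is close to 1, so the correction matters mainly when the sample is small.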

The problem of sample representativeness is important as a problem of the legitimacy of extrapolation of conclusions obtained from the analysis of the sample population to the entire population 52.


A sample is a set of data taken from a population by certain procedures for exploratory analysis. Representativeness is the property of reproducing an idea of the whole from its part. In other words, it is the possibility of extending what is known about a part to the whole that includes this part.

Representativeness of a sample is an indicator that the sample must fully and reliably reflect the characteristics of the population of which it is part. It can also be defined as the property of a sample to most fully represent the characteristics of the population that are significant from the point of view of the purpose of the study.

Let us assume that the general population is all the students of a school (900 people in 30 classes, 30 people per class). The object of the study is the attitude of schoolchildren towards smoking. A sample of 90 students drawn from only some of the classes (for example, only the senior grades) will represent the whole population much worse than a sample of the same 90 students that includes 3 students from each class. The main reason is the unequal age distribution. Thus, in the first case the representativeness of the sample will be low, and in the second case high.
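The second design, three students from every class, is easy to express in code. The sketch below builds a hypothetical school of 30 classes with 30 students each and draws the sample class by class; the function name `sample_per_class` and the fixed `seed` are assumptions made so the example is reproducible.

```python
import random


def sample_per_class(classes, per_class, seed=0):
    """Draw the same number of students at random from every class,
    so that each class (and hence each age group) is represented."""
    rng = random.Random(seed)
    chosen = []
    for students in classes.values():
        chosen.extend(rng.sample(students, per_class))
    return chosen


# Hypothetical school: classes 1..30, each student identified by (class, number).
classes = {c: [(c, s) for s in range(1, 31)] for c in range(1, 31)}

sample = sample_per_class(classes, per_class=3)
print(len(sample))  # 30 classes x 3 students
```

Drawing all 90 students from a few classes instead would leave whole age groups out of the sample, which is exactly the defect described in the example.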

In sociology they say that there is representativeness of a sample and its non-representativeness.

An example of an unrepresentative sample is a classic case that occurred in 1936 in the United States during the presidential election.

The magazine Literary Digest, which had been very successful in predicting the results of previous elections, was wrong in its forecast this time, although it mailed several million ballots to its subscribers and to respondents selected from telephone directories and car registration lists. Of the roughly one quarter of the ballots that were returned completed, 57% favored the Republican candidate, Alf Landon, and 41% the incumbent President, Democrat Franklin Roosevelt.

In fact, F. Roosevelt won the election with about 61% of the vote. The Literary Digest's mistake was as follows. The editors wanted to increase the representativeness of the sample. Since they knew that most of their subscribers identified as Republicans, they decided to expand the sample to include respondents selected from telephone directories and car registration lists. But they failed to take the realities of the time into account and in fact selected even more Republican supporters, because at that time it was mainly the middle and upper classes who could afford cars and telephones, and these were mostly Republicans, not Democrats.

There are different types of sampling: simple random, mechanical (systematic), typical (stratified), serial (cluster) and combined.

Simple random sampling consists of selecting units from the entire population at random, without any system.

Mechanical sampling is used when the general population is ordered in some way, for example when there is a definite sequence of units (employee personnel numbers, electoral lists, respondents' telephone numbers, apartment and house numbers, etc.).
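Mechanical selection is usually implemented as taking every k-th unit from the ordered list, starting from a random position within the first k units. The sketch below is one common variant under that assumption; the function name `systematic_sample` and the numbered list of 900 units are hypothetical.

```python
import random


def systematic_sample(units, n, seed=None):
    """Select n units from an ordered list by taking every k-th unit,
    where k = len(units) // n, starting from a random offset below k."""
    step = len(units) // n
    if step < 1:
        raise ValueError("sample size exceeds the size of the list")
    start = random.Random(seed).randrange(step)
    return units[start::step][:n]


# Hypothetical ordered list, e.g. personnel numbers 1..900.
units = list(range(1, 901))
chosen = systematic_sample(units, 90, seed=1)
print(chosen[:5])
```

If the ordering of the list is itself periodic (for example, every tenth apartment is a corner apartment), a fixed step can introduce bias, so the list should be checked for such patterns before this method is used.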

Typical selection is used when the entire population can be divided into groups by type. When working with the population, these can be, for example, educational, age, social groups; when studying enterprises - an industry or a separate organization, etc.

Serial selection is convenient when units are combined into small series or groups. Such a series can be batches of finished products, school classes, and other groups.

Combined sampling involves the use of all previous types of sampling in one or another combination.

One of the main components of a well-designed study is defining the sample and understanding what a representative sample is. Think of a cake: you don't have to eat the whole dessert to understand how it tastes. A small piece is enough.

So, the cake is the general population (that is, all respondents who are eligible for the survey). It can be defined geographically, for example, only residents of the Moscow region; by gender, for example, women only; or by age, for example, Russians over 65.

Calculating the size of the general population is difficult: you need data from a population census or from preliminary assessment surveys. Therefore the general population is usually estimated, and the sample population, or sample, is calculated from the resulting number.

What is a representative sample?

A sample is a clearly defined number of respondents. Its structure should match the structure of the general population as closely as possible in terms of the main selection characteristics.

For example, if potential respondents are the entire population of Russia, where 54% are women and 46% are men, then the sample should contain exactly the same percentage. If the parameters coincide, then the sample can be called representative. This means that inaccuracies and errors in the study are reduced to a minimum.

The sample size is determined by the requirements of accuracy and economy. These requirements pull in opposite directions: the larger the sample, the more accurate the result, but the higher the accuracy, the greater the cost of conducting the study. Conversely, the smaller the sample, the lower the cost, and the less accurately and more randomly the properties of the general population are reproduced.

Therefore, sociologists have developed a formula for calculating the sample size and built special calculators based on it.

Confidence probability and confidence error

What do the terms “confidence probability” and “confidence error” mean? The confidence probability (confidence level) is an indicator of the reliability of the measurement, and the confidence error (margin of error) is the possible error in the research results. For example, with a general population of more than 500,000 people (say, the residents of Novokuznetsk), the sample will be 384 people for a confidence probability of 95% and an error of 5% (that is, a result of ±5% at the 95% confidence level).

What follows from this? If 100 studies are conducted with such a sample (384 people), then in about 95 of them, according to the laws of statistics, the answers obtained will be within ±5% of the true value in the general population. We thus obtain a representative sample with a minimal probability of statistical error.
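The figure of 384 respondents can be reproduced with a widely used sample size formula for proportions, with a finite population correction. The sketch below assumes that this is the formula behind the example; the function name `sample_size`, the worst-case share `p = 0.5` and `z = 1.96` for the 95% level are all assumptions.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a share with the given margin of error.
    Uses n0 = z^2 * p * (1 - p) / margin^2, then corrects for a finite
    population: n = n0 / (1 + (n0 - 1) / population)."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(500_000))               # population from the example
print(sample_size(500_000, margin=0.03))  # a tighter margin needs a larger sample
```

Because the margin enters the formula squared, halving the error roughly quadruples the required sample, which is the trade-off between accuracy and cost described earlier.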
