How a representative sample is constructed in psychology. Causes of systematic errors

One of the main components of a well-designed study is defining the sample and understanding what makes it representative. A cake is a useful analogy: you do not have to eat the whole dessert to know how it tastes. A small piece is enough.

In this analogy the cake is the general population, that is, all respondents who are eligible for the survey. The population can be defined geographically (for example, only residents of the Moscow region), by gender (women only), or by age (Russians over 65 years old).

Determining the size of the general population is difficult: it requires census data or preliminary assessment surveys. For this reason the size of the general population is usually estimated, and the sample is calculated from that estimate.

What is a representative sample?

A sample is a clearly defined number of respondents. Its structure should match the structure of the general population as closely as possible on the main selection characteristics.

For example, if potential respondents are the entire population of Russia, where 54% are women and 46% are men, then the sample should contain exactly the same percentage. If the parameters coincide, then the sample can be called representative. This means that inaccuracies and errors in the study are reduced to a minimum.

The sample size is determined by balancing the requirements of accuracy and economy. These requirements pull in opposite directions: the larger the sample, the more accurate the result, but the higher the cost of the study. Conversely, the smaller the sample, the lower the cost, but the less accurately and the more randomly the properties of the general population are reproduced.

Therefore, to calculate the sample size, sociologists developed a formula and created special calculators.

Confidence level and margin of error

What do the terms "confidence level" (confidence probability) and "margin of error" (confidence error) mean? The confidence level indicates how reliable the measurement is: it is the share of repeated studies in which the result is expected to fall within the stated margin of error. The margin of error is the possible deviation of the research results from the true value. For example, for a population of more than 500,000 people (say, the residents of Novokuznetsk), the sample will be 384 people at a confidence level of 95% and a margin of error of 5% (a result reported as ±5% with 95% confidence).

What follows from this? If 100 studies were conducted with samples of this size (384 people), then in about 95 of them the answers obtained would, according to the laws of statistics, fall within ±5% of the true value for the general population. In this sense we get a representative sample with a minimal probability of statistical error.
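The figure of 384 respondents is consistent with a commonly used sample-size formula for proportions combined with a finite-population correction. The text does not say which formula its calculator relies on, so the sketch below is an assumption: the function name `sample_size`, its parameters, and the conservative choice p = 0.5 are illustrative only.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Required sample size for estimating a proportion.

    Assumed formula (not given in the text):
    n0 = z^2 * p * (1 - p) / margin^2, followed by a finite-population correction.
    """
    # Two-sided critical value of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Correction for a finite population of the given size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(500_000))  # 384
```

For smaller populations the correction matters more: under the same assumptions, a population of 1,000 people requires 278 respondents rather than 384.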


Representativeness of the sample

Sampling requirements

A number of mandatory requirements apply to the sample; they are determined, first of all, by the goals and objectives of the study. Planning an experiment should take into account both the sample size and a number of its features. In psychological research, for instance, the requirement of sample homogeneity is important. It means that a psychologist studying, for example, teenagers cannot include adults in the same sample. By contrast, a study that uses the cross-sectional (age-section) method fundamentally assumes subjects of different ages. Even then, the homogeneity of the sample must be observed, but according to other criteria, primarily such as age and gender. Depending on the objectives of the study, a homogeneous sample can be formed on the basis of various characteristics, such as level of intelligence, nationality, or absence of certain diseases.

General statistics distinguishes between repeated and non-repeated samples or, in other words, sampling with and without replacement. The usual illustration is drawing a ball from a container. In sampling with replacement, each selected ball is returned to the container and can therefore be selected again. In sampling without replacement, a ball that has been selected is set aside and can no longer take part in the selection. Analogues of these ways of organizing a sample study can be found in psychological research, since a psychologist often has to test the same subjects several times using the same technique. Strictly speaking, however, what is repeated in this case is the testing procedure. Even when its composition is completely identical, the sample of subjects in repeated studies will always differ somewhat because of the functional and age-related variability inherent in all people. By the nature of the procedure such a sample is a repeated one, although the meaning of the term here is obviously different from the case of the balls.
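The ball example can be reproduced with Python's standard `random` module. This is only an illustrative sketch: the ten numbered balls, the five draws, and the fixed seed are assumptions made for the demonstration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible
balls = list(range(1, 11))  # ten numbered balls in the container (hypothetical)

# With replacement: each ball goes back, so the same ball can be drawn again.
with_replacement = rng.choices(balls, k=5)

# Without replacement: a drawn ball is set aside, so all draws are distinct.
without_replacement = rng.sample(balls, k=5)

print(with_replacement)
print(without_replacement)
```

Repeats are possible only in the first list; the second list always contains five different balls.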

It is important to emphasize that all the requirements for any sample boil down to the fact that on its basis the psychologist must obtain the most complete, undistorted information about the characteristics of the general population from which this sample was taken. In other words, the sample should reflect the characteristics of the population being studied as fully as possible.

The composition of the experimental sample should represent (model) the general population, since the conclusions obtained in the experiment are expected to be subsequently transferred to the entire population. For this reason, the sample must have a special quality - representativeness, allowing the conclusions obtained from it to be extended to the entire population.

Representativeness of the sample is very important, but for objective reasons it is extremely difficult to maintain. It is well known that 70% to 90% of all psychological studies of human behavior conducted in the United States in the 1960s used college students as subjects, most of them psychology students. In laboratory research on animals, the most common subject is the rat. It is no coincidence that psychology used to be called "the science of sophomores and white rats." College psychology students make up only 3% of the total US population, so a sample of students clearly cannot serve as a model of the entire population of the country.

A representative sample is one in which all the main characteristics of the general population are present in approximately the same proportion and with the same frequency as in that population. In other words, a representative sample is a smaller but accurate model of the population it is intended to reflect. To the extent that the sample is representative, conclusions based on studying it can reasonably be assumed to apply to the entire population. This extension of results is usually called generalizability.

Ideally, a representative sample should be such that each of the basic characteristics, traits, personality features, and so on studied by the psychologist is represented in it in proportion to the same features in the general population. To meet this requirement, the sampling procedure must have an internal logic that can convince the researcher that the sample will indeed be representative when compared with the general population.

In his specific activities, the psychologist acts as follows: establishes a subgroup (sample) within the general population, studies this sample in detail (conducts experimental work with it), and then, if the results of statistical analysis allow it, extends the findings to the entire general population. These are the main stages of a psychologist’s work with a sample.

The aspiring psychologist must keep in mind a frequently repeated mistake: whenever data are collected by any method and from any source, there is always a temptation to generalize the conclusions to the entire population. Avoiding this mistake takes not only common sense but, above all, a good command of the basic concepts of mathematical statistics.

The property of sampling, due to which the results of a sample study allow one to draw conclusions about the general population and the empirical object as a whole, is called representativeness.

Representativeness of a sample is its ability to reproduce certain characteristics of the population within acceptable errors. A sample is called representative if the result of measuring a certain parameter in the sample coincides, within the permissible error, with the known value of that parameter in the general population. If a sample measurement deviates from a known population parameter by more than the chosen level of error, the sample is considered unrepresentative.

This definition first of all establishes the relationship between the sample and the general population of the study. It is the general population that the sample represents, and the trends identified in the sample study can be extended only to that population. It should now be clear why such attention was previously paid to the problems of correctly defining the population and describing it in research documentation and publications. The sample cannot represent a population other than the one from which the units for measurement were actually selected. If the researcher is mistaken about the actual boundaries of the population, then his conclusions will be incorrect. If he mistakenly or intentionally expands or distorts the boundaries of the population in reporting materials, publications, or presentations based on the results of the study, then this misleads users and can be regarded as falsification of results.

Representativeness is tested by comparing individual parameters of the sample and the general population. A common misconception is that samples can be representative "in general."

The representativeness or non-representativeness of a sample can be determined solely in relation to individual variables. Moreover, the same sample can be representative in some respects and unrepresentative in others.
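The point that representativeness is judged variable by variable can be sketched in a few lines. The helper `check_representativeness` is hypothetical, the gender shares for the population (54% and 46%) come from the earlier example, and the sample shares, the age figures, and the tolerance of 5 percentage points are invented for illustration.

```python
def check_representativeness(sample_shares, population_shares, tolerance):
    """Compare sample and population shares category by category.

    Returns a dict mapping each category to (deviation, within_tolerance).
    """
    report = {}
    for category, pop_share in population_shares.items():
        deviation = abs(sample_shares.get(category, 0.0) - pop_share)
        report[category] = (round(deviation, 4), deviation <= tolerance)
    return report


# Gender: population shares from the example in the text, sample shares hypothetical.
gender_report = check_representativeness(
    {"women": 0.56, "men": 0.44}, {"women": 0.54, "men": 0.46}, tolerance=0.05
)
# Age: all figures hypothetical.
age_report = check_representativeness(
    {"under 40": 0.60, "40 and over": 0.40},
    {"under 40": 0.45, "40 and over": 0.55},
    tolerance=0.05,
)
print(gender_report)
print(age_report)
```

In this made-up case the same sample passes the check on gender and fails it on age, which is exactly the situation described above.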

As a rule, in the professional discourse of sociologists, representativeness is presented as a dichotomous property: a sample is either representative or not. But this is not a completely correct approach. In reality, a sample may reproduce some parameters of the population more accurately and others less accurately. Therefore it is more correct (although less convenient from a practical point of view) to speak of the degree of representativeness of a specific sample with respect to specific parameters.

As with the sample as a whole, the key to determining the representativeness of a sample is to justify the margin of error within which the sample is considered representative for the purposes of the study. The opposite approach is also possible: recording the size of the actual errors and stating that the sample represents the general population with those errors. Again, the way the research findings will be used plays a key role here. Consequently, the same sample may be considered sufficiently representative for some purposes (for example, to predict voter turnout in upcoming elections), but not sufficiently representative for others (for example, to determine candidate ratings and predict voting results).

What parameters should be used to check the representativeness of the sample? First, in most research situations few such parameters exist. After all, the results of a sample measurement can be compared with data on the general population only if the latter are available, and research is carried out precisely because such data are lacking. Therefore, even at the stage of modeling the object and developing the instruments, it is advisable to provide for the measurement of one or more control parameters for which data on the general population are available. This will provide the necessary empirical basis for testing representativeness.

Secondly, one should strive to check the representativeness of the sample on parameters that are significant for the subject area of the study. In modern practice, control of representativeness by basic demographic parameters (gender, age, education, etc.) has become widespread. These data, as a rule, are available for any territorial unit, since they are recorded during population censuses and subsequently recalculated by statistical institutions using well-founded mathematical models. For this reason, the mandatory inclusion of several demographic variables in the questionnaire has become a generally accepted professional norm. However, this practice can be called naive and is open to justified criticism. The fact is that basic demographic parameters that are publicly available for comparison do not always act as structuring factors in relation to the subjects of sociological research. Their nature in itself is not social, and their influence on the objects of research is often quite indirect. Therefore, demographically representative samples may actually hide significant problems in the form of systematic errors and uncontrolled biases. Conversely, samples that are effective from the point of view of the goals and objectives of the study may turn out to have low demographic representativeness.

Here is an interesting example from practice. In 2009, one of the research companies working in the Urals carried out a survey in the city of Kizel, Perm Territory. During the fieldwork, the researchers encountered serious obstacles to recruiting the sample envisaged by the research plan: too few available respondents and worsening weather conditions. Apparently, the research company was not fully prepared for such a large-scale project. It worked at maximum capacity to survey 6,000 respondents over a fairly large area within a week. As a result, the actual sample at many survey sites was, by the researchers' own admission, filled with everyone who could be recruited to participate in the study. The demographic quotas set by the terms of reference were violated in most areas of the survey. In some areas, the share of certain population categories in the sample deviated from the quota target by a factor of up to 2.5, which cast doubt on whether quota sampling had been used at all. It seemed that the customer of the study had every reason to make claims against the researchers.

However, an expert examination commissioned by the arbitration court found that such significant distortions of quotas and, accordingly, the obvious unrepresentativeness of the resulting sample in terms of basic demographic parameters led to practically no distortion of the research data. By reweighting the data set, the experts reproduced the effect of a sample that was representative on the controlled parameters. Almost all frequency distributions tested by the experts showed statistically insignificant differences between the results of processing the actual and reweighted data sets. De facto, this means that, despite gross violations of survey technology and practical disregard for quota assignments, the researchers provided the customer with the same data that he could have counted on if the sampling procedures had been fully followed and demographic representativeness had been ensured.

How could this happen? The answer is simple - the demographic parameters used to control representativeness had virtually no (and this was confirmed by correlation analysis) influence on the subject variables of the study - the population’s assessment of the socio-economic situation and the parameters of its socio-political activity. In addition, the sample size was very large relative to the general population (in fact, the study covered a quarter of the adult population of the municipal district), which, as a result of the law of large numbers, led to the stabilization of the observed distributions long before the required number of respondents was interviewed.

The practical implication from this cautionary tale is that effort and resources should be directed toward ensuring and controlling representativeness with respect to those sampling parameters that the researcher expects to have a significant impact on the subject of the study. This means that parameters to control representativeness must be selected specifically for each research project, according to its subject specifics. For example, assessments of socio-economic status are always strongly related to the real well-being of the respondent’s family, his position in the labor market and in the business sphere. Accordingly, it is advisable to use these parameters to control representativeness. Another thing is that it can be difficult to obtain objective data characterizing the general population. This requires creativity and perhaps compromise. For example, the level of well-being can be monitored by the presence of a car in the respondent’s family, because statistics of registered cars in the region may be available.

Interestingly, research reports and publications almost always refer to representative samples. Are unrepresentative samples really that rare? Of course not. Research practice contains quite a few samples whose representativeness on certain parameters is problematic. If anything, there are more of them than there are samples whose representativeness can be assessed in substance rather than formally (by demographic parameters). However, mentioning them publicly in professional sociological circles is, unfortunately, taboo. No researcher is ready to admit that the representativeness of his sample on parameters essential for the subject area of measurement is problematic or unverifiable.

In fact, discovering signs of non-representative sampling is not a disaster. Firstly, existing technologies for "repairing" (reweighting) the sample in many cases make it possible to completely eliminate the effect of unrepresentativeness on the parameter that concerns the sociologist or his client. The essence of the reweighting method is to assign weighting coefficients to certain categories of observations (in the case of a survey, respondents) so as to compensate for the insufficient or excessive actual representation of these categories in the sample. These weights are then taken into account in all calculations on the data set, which makes it possible to obtain distributions that fully correspond to a balanced data set (one that matches the calculated quotas). Modern statistical programs, such as BRvv, allow calculations to be made taking into account weighting coefficients in automatic mode, which makes this procedure quite easy to perform.
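A minimal sketch of reweighting on a single parameter is shown below. It assumes the simplest scheme, where each respondent's weight is the population share of his or her category divided by that category's share in the sample; the respondent data are invented, and real projects may weight on several parameters at once.

```python
from collections import Counter

# Hypothetical respondents: (gender, answered_yes). Women are under-represented.
respondents = (
    [("woman", True)] * 20
    + [("woman", False)] * 20
    + [("man", True)] * 15
    + [("man", False)] * 45
)
population_shares = {"woman": 0.54, "man": 0.46}  # shares from the earlier example

counts = Counter(gender for gender, _ in respondents)
n = len(respondents)

# Weight = population share / sample share of the respondent's category.
weights = {g: population_shares[g] / (counts[g] / n) for g in counts}

raw_yes = sum(yes for _, yes in respondents) / n
weighted_yes = sum(weights[g] for g, yes in respondents if yes) / sum(
    weights[g] for g, _ in respondents
)
print(f"raw: {raw_yes:.3f}, weighted: {weighted_yes:.3f}")  # raw: 0.350, weighted: 0.385
```

After weighting, women effectively count for 54% of the data set instead of 40%, and the estimated share of "yes" answers shifts accordingly.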

Secondly, even if it is not possible to obtain a “good” representative sample, “moderate” representativeness may be sufficient to solve many research problems. Recall that representativeness is a measure of fit rather than a dichotomous marker. And only certain research tasks - mainly related to accurate forecasting of certain events - require truly high (statistically proven) representativeness from samples.

For example, in order to predict the market share of a new product in marketing research, a sample is required that covers and represents potential customers. However, marketers most often do not have sufficient data about who actually makes up their circle of clients, especially potential ones. In this situation it is generally impossible to check the representativeness of the sample, because it is not known what parameters it should reproduce. Nevertheless, many marketing tasks are solved successfully, since statistically representative samples are not needed to identify customer preferences, gauge reactions to advertising materials, or analyze reviews of a new product. It is enough to ensure coverage of the typical clientele, which is easy to find right in stores. Non-representative samples are quite suitable for exploratory tasks, identifying strong trends, analyzing the specifics of individual categories (represented by small independent subsamples), comparing such categories with each other (bivariate analysis), analyzing relationships between variables, and other tasks in which the accuracy of the obtained statistical distributions is of secondary importance.

4.1 What the standard says

Section 8 of ISO 9001:2000 covers "measurement, analysis and improvement". Although sampling is not covered by this standard, clause 8.1, which is a general introduction to the entire measurement section, states that measurement, analysis and improvement activities should include the identification of applicable methods, including statistical methods, and the extent of their application. Accurate measurement of customer satisfaction can only be achieved when it is based on a good sample of customers. This chapter provides an overview of the sampling methods used to achieve this goal.

4.2 Sampling theory

The sampling principle is simple. Most organizations have a large number of customers, but to obtain accurate customer satisfaction measurement results it is not necessary to survey every one of them. It is enough to survey a small sample, provided that this sample represents the larger group. There are several different types of sampling, which are shown in Figure 4.1.

Fig. 4.1 Possible samples

4.2.1 Probability and non-probability sampling

The fundamental difference between samples is whether they are probability or non-probability samples. Probability sampling is also often called random sampling, and only with a random (probability) sample can you be sure that it is free from bias. By definition, all members of the population have an equal chance of being included in a random sample. The most obvious example of a random sample is an ordinary lottery: all balls or numbers remaining in the draw retain an equal chance of being drawn next. It is clear that no bias influences the choice of numbers in the lottery.

4.2.2 Non-probability samples

4.2.2.1 Non-representative samples

The simplest form of sampling is non-representative sampling (often called convenience sampling), in which you take whoever is easiest to reach. Imagine that you are conducting a public opinion poll. You could go out on the street and ask the first 50 people you meet how satisfied they are with the government's actions. It will be fast, simple and cheap, but it will not be very representative. This may sound trivial, but in more complex cases, as we will see later, it is very easy to slip into an unrepresentative sample.

4.2.2.2 Purposive sampling

Another form of non-probability sampling is purposive sampling. This is the form we proposed for exploratory research. Purposive sampling works well for qualitative research that does not aim at statistical precision, but it is not suitable for the main survey or for any other research that aims to obtain a statistically reliable result.

4.2.2.3 Sampling based on quotas

The third type of non-probability sampling is quota sampling, which is often used to study large populations. Imagine that a municipal council wants to measure how satisfied the population is with the services and facilities the council provides. Suppose you decide to interview a quota sample of 500 city residents on the street. You could assign five interviewers, each tasked with interviewing 100 people in a main shopping area. However, the interviewers are not allowed to use unrepresentative sampling, i.e. to interview the first 100 people they meet. Quota sampling requires each interviewer to adhere to a set of carefully defined quotas to ensure that the sample is representative of the local population. The quotas may be based on statistics available to the municipal council showing the groups into which the population is divided. For example, these data may indicate that 15% of the population is aged 21 to 30, 18% is aged 31 to 40, and so on. The division can also be based on other characteristics, such as gender, income level, or ethnic origin. If the council wants the sample to be representative, it must include all of these groups in the same proportion as they are represented in the entire population. To achieve this, groups and quotas for them must be defined for the interviewers. In the example given, 15 out of every 100 people interviewed should be between 21 and 30 years of age, 18 should be between 31 and 40, and this should be combined with quotas for other groups defined by gender, income, etc.

Let's assume that the interviewers worked all week, from Monday to Friday, from 9 a.m. to 5 p.m. every day, interviewing in a shopping arcade, so that by the end of the week each of them had completed 100 interviews while meeting all the quotas. The resulting sample of 500 will be fully representative of the city's population in terms of the quota characteristics, but it will not be selected at random, so it will not be free from bias. According to the definition of random sampling, all residents of a city should have an equal chance of being included in the sample. In the example given, only those people who visited the shopping arcade on these days of the week between 9 a.m. and 5 p.m. had such a chance. Thus, the sample will inevitably be biased, perhaps towards older people, the unemployed, and people working nearby. In reality, of course, researchers try to minimize the biases inherent in quota sampling by interviewing in different places and at different times, but they can never get rid of them completely, since the sample can only represent those people who happened to be in a given place at a given time. Theoretically, therefore, such a sample will never be random and completely free from bias.
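The quota arithmetic for one interviewer can be sketched as follows. Only the first two age groups (15% and 18%) come from the example; the remaining groups and their percentages are hypothetical and are included so that the shares add up to 100%. The function name `quotas` is also illustrative.

```python
def quotas(percent_by_group, interviews):
    """Split interviews across groups in proportion to population percentages.

    Uses the largest-remainder method so that the quotas add up to `interviews`.
    """
    exact = {g: pct * interviews / 100 for g, pct in percent_by_group.items()}
    result = {g: int(x) for g, x in exact.items()}
    leftover = interviews - sum(result.values())
    # Hand out any remaining interviews to the groups with the largest fractions.
    by_fraction = sorted(exact, key=lambda g: exact[g] - result[g], reverse=True)
    for g in by_fraction[:leftover]:
        result[g] += 1
    return result


age_percent = {
    "21-30": 15,  # from the example in the text
    "31-40": 18,  # from the example in the text
    "41-50": 20,  # hypothetical
    "51-60": 22,  # hypothetical
    "61+": 25,    # hypothetical
}
per_interviewer = quotas(age_percent, 100)
print(per_interviewer)
```

In practice the age quotas would be crossed with quotas for gender, income, and other characteristics, as the text notes; this sketch covers a single characteristic only.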

This does not mean that quota sampling should never be used. If you don't know the people who are your customers, you can't draw a random sample because there is no way to list the entire population from which to draw it. For example, many retailers do not know who their customers are. In such situations, organizations resort to quota sampling.

4.2.3 Probability samples

If you have a database of your customers, you can and should draw a random sample, and the first step is to define the sampling frame. The sampling frame is the list of consumers from which you intend to draw the sample, and defining this list is a strategic decision. Organizations typically measure customer satisfaction once a year, and the sampling frame consists of those customers who have dealt with the organization in the last twelve months. However, this may not suit everyone. For example, when studying customer satisfaction with an information technology help desk, it is not very effective to ask about an experience of using it that took place as long as 11 months ago. In this case, it is better to use a shorter time frame, for example, counting all consumers who used the help desk in the last month. This may require ongoing monitoring, in which a consumer survey is conducted every month and the results are accumulated to produce a periodic report, such as quarterly or even annually if the number of consumers during the quarter is small.

Thus, the "customers" you study may be defined differently in different organizations. Their definition is a strategic decision, and you must define them clearly, because these are the consumers who will form the basis of the study, that is, the sampling frame.

4.2.3.1 Simple random sampling

A probability, or random, sample is free from bias because all members of the population have an equal chance of being included in it. As stated earlier, the lottery provides a good example of simple random sampling: each time a new number is selected, it is selected at random from all those remaining in the "population". However, this is a fairly lengthy process if you need a large sample from a large population, so in the days before computers were used to draw complex samples, market researchers devised a less labor-intensive way of obtaining a random sample, known as systematic random sampling.

4.2.3.2 Systematic random sampling

To obtain a systematic random sample for a customer satisfaction survey, you first print out a list of your consumers. Let's say there are 1000 consumers and you want a sample of 100, which is 1 in 10 of the population. First, use a random number generator to get a number from 1 to 10. If you get 7, you include in the sample the 7th name on the list, the 17th, the 27th, and so on, which gives a systematic random sample of 100 consumers. Before the random number is drawn, all consumers have an equal chance of being included. Thus it will be a random sample, but it may not be representative, especially in a business market. In that case it is better to use stratified random sampling.

Fig. 4.2 Example of stratified random sampling
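The procedure just described (a random start between 1 and 10, then every 10th name) can be sketched as follows. The function name `systematic_sample` and the generated consumer names are hypothetical.

```python
import random


def systematic_sample(items, sample_size, rng=random):
    """Pick every k-th item after a random start, where k = len(items) // sample_size."""
    step = len(items) // sample_size
    start = rng.randint(1, step)  # random position between 1 and k, inclusive
    # Positions in the text are 1-based, so subtract 1 for list indexing.
    return items[start - 1::step][:sample_size]


consumers = [f"consumer_{i}" for i in range(1, 1001)]  # hypothetical list of 1000 names
sample = systematic_sample(consumers, 100, random.Random(7))
print(len(sample), sample[:3])
```

Only the starting position is random; once it is drawn, the rest of the sample is fully determined by the order of the list, which is why the list should not be sorted in a way that repeats with the same period as the step.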

4.3 Consumer sampling

We will show with an example how sampling could be done for a typical case of a business-to-business market. The first step for this business market is to build a customer database and sort it by customer value, starting with the highest and working down to the lowest. You then typically divide the resulting list into three parts—high, medium, and low customer value segments, respectively. Finally, determine the sample size in each segment. The results of this process are summarized in Fig. 4.2.

4.2.3.3 Stratified random sampling

Often in business markets, some customers are much more valuable than others. Sometimes a very large portion of a company's activities, such as 40 or 50%, is associated with the first five or six customers. If simple or systematic random sampling is used, it is likely that none of these five or six consumers will be included in the sample. It is clear that there is no point in conducting a survey measuring customer satisfaction if 40 or 50% of the company's overall activities are completely ignored. In a business market where most companies have a small number of high-value customers and a larger number of low-value customers, a simple or systematic random sample will inevitably be dominated by low-value customers. Stratified random sampling is used to obtain a sample that is both representative and free from trend. Obtaining a stratified random sample involves first dividing consumers into segments, or types, and then selecting a random sample within each segment. The sample shown in Figure 4.2 will be representative of the consumer base according to the business contribution each consumer segment makes. In consumer markets, the segmentation may be different, such as by age or gender.

4.3.1 Sample sample

In the example shown, the company derives 40% of its turnover from high-value customers. The fundamental principle of sampling in a business market is that if a high-value customer segment makes up 40% of the turnover (or profit), they should make up 40% of the sample. If a company decides to study a sample of 200 respondents, 40% of the sample, i.e. 80 respondents, should be from high value customers. Since there are 40 high-value consumers, the sampling ratio will be 2:1, which means that 2 respondents in the high-value segment are selected from each consumer. In business markets, it is common practice to select more than one respondent from large consumers when conducting research.

Medium-value customers also account for 40% of turnover, so they should also make up 40% of the sample. This means that the company must select 80 respondents from its medium-value customers. Since there are 160 such customers, the sampling ratio will be 1:2, i.e. one respondent for every two medium-value customers. This requires randomly selecting one customer out of every two, which can easily be done using the systematic random sampling procedure described earlier. First, a random starting number is generated: 1 or 2. Suppose it is 2. In this case, you select the 2nd, 4th, 6th, etc. medium-value customer on the list.
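The systematic selection described above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and the customer list are hypothetical and do not come from the source.

```python
import random

def systematic_sample(items, step, rng=None):
    """Select every `step`-th item, starting at a random position within the first `step` items."""
    rng = rng or random.Random()
    start = rng.randint(1, step)      # 1-based random start, e.g. 1 or 2 when step == 2
    return items[start - 1::step]     # convert to a 0-based index, then take every step-th item

# Hypothetical list of the 160 medium-value customers from the example
medium_customers = [f"customer_{i}" for i in range(1, 161)]
selected = systematic_sample(medium_customers, step=2, rng=random.Random(0))
print(len(selected))  # 80
```

Whichever starting number is drawn, a list of 160 customers sampled with a step of 2 yields 80 customers, matching the required sample for this segment.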

Finally, 20% of the company's turnover comes from low-value customers, so they should make up 20% of the sample, i.e. 40 respondents in this example. There are 400 low-value customers in total, which gives a sampling ratio of 1:10. The selection can be made using the same systematic random sampling procedure. At the end of the process, the company will have a stratified random sample of customers that is representative of its business and, because of the random selection, free from bias.
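The allocation arithmetic for all three segments can be checked with a short Python sketch. The function name `allocate` and the dictionary layout are illustrative assumptions; the figures are those of the example.

```python
from fractions import Fraction

def allocate(total_sample, segments):
    """segments maps a segment name to (number_of_customers, turnover_share_in_percent).

    Returns name -> (respondents, respondents per customer as a fraction).
    Assumes the percentages divide the total sample evenly, as in the example.
    """
    plan = {}
    for name, (customers, share_pct) in segments.items():
        respondents = total_sample * share_pct // 100
        plan[name] = (respondents, Fraction(respondents, customers))
    return plan

# Figures from the example: 40, 160 and 400 customers; 40%, 40% and 20% of turnover
segments = {"high": (40, 40), "medium": (160, 40), "low": (400, 20)}
for name, (respondents, ratio) in allocate(200, segments).items():
    print(name, respondents, ratio)
# high 80 2
# medium 80 1/2
# low 40 1/10
```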

4.3.2 Sampling of contact persons

Although the above procedure produces a random and representative sample of customers, research is ultimately conducted on individuals, not companies. So if you work in a business-to-business market, you must sample individual contacts as well as customers. In practice, organizations often select individuals based on convenience: people with whom they have the most contact or whose names they have on hand. If individuals are selected this way, then no matter how carefully the stratified sample of companies was drawn, the result is reduced to an unrepresentative sample of people whom someone happens to know. To avoid this bias, you should select individuals at random. One way to do this is to create, for each customer, a list of the people involved with your product or service, and then randomly select people from that list. If you want a more elaborate and more precise procedure, you can divide the list of all individuals into groups, which avoids including too many minor contacts. For example, suppose you review how customers make decisions and conclude that, to reflect the decision-making process more accurately, your sample should contain 40% purchasing contacts, 40% technical contacts, and 20% all other contacts. In this case, you must draw a random sample of individuals in these proportions.
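A minimal Python sketch of drawing contacts at random in fixed proportions is shown below. The contact lists, role names, and the function `sample_contacts` are hypothetical; only the 40/40/20 split comes from the text.

```python
import random

def sample_contacts(contacts_by_role, total, shares_pct, rng=None):
    """Randomly draw contacts from each role so that roles appear in the given percentages.

    Assumes each role's list is at least as long as its quota;
    otherwise random.sample raises ValueError.
    """
    rng = rng or random.Random()
    chosen = []
    for role, pct in shares_pct.items():
        quota = total * pct // 100
        chosen.extend(rng.sample(contacts_by_role[role], quota))
    return chosen

# Hypothetical contact lists for one customer
contacts_by_role = {
    "purchasing": [f"purchasing_{i}" for i in range(20)],
    "technical": [f"technical_{i}" for i in range(20)],
    "other": [f"other_{i}" for i in range(20)],
}
shares_pct = {"purchasing": 40, "technical": 40, "other": 20}
chosen = sample_contacts(contacts_by_role, total=10, shares_pct=shares_pct, rng=random.Random(42))
print(len(chosen))  # 10
```

With a total of 10, the sketch draws 4 purchasing, 4 technical, and 2 other contacts, each chosen at random rather than by convenience.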

4.4 Sample size

Another issue to decide is how many customers you need in your sample. Some companies, primarily in business-to-business markets, have a very small number of valuable customers. Other companies have more than a million customers. In business markets, the size of the population is the number of individuals at each customer who influence that customer's satisfaction judgment, which is not necessarily equal to the number of individuals with whom you have regular contact. Typically, the higher the customer value, the more individuals should be included. For a computer software provider, a single customer may have several hundred users. Even so, some organizations will have a much larger population than others, but this does not affect the number of respondents needed to provide a reliable sample.

4.4.1 Reliability of sample in relation to sample size

The statistical precision of a sample is related to its absolute size, regardless of how many people there are in the entire population. The question of what proportion of consumers should be surveyed is a misleading question. A larger sample is always more reliable than a smaller sample, no matter the size of the population. This is best illustrated by the bell curve (see Figure 4.3), from which we can conclude that when we examine a set of data, it tends to follow a normal distribution. This doesn't just apply to research data.

Fig. 4.3 Bell curve (regions, left to right: extreme data, normal data, extreme data)

For example, if you record June rainfall in Manchester over a period of five years in which three years had normal June rainfall but two had an extremely wet June, the estimated average rainfall will be heavily skewed by these two unseasonably wet months. If the data were collected over 100 years, two exceptionally wet or dry months would have little effect on the average June rainfall in Manchester. The same applies to research. If you survey only 10 people and two of them have extreme views, they will greatly skew the end result. They will have much less impact with a sample size of 50 and virtually no impact with a sample size of 500, so the larger the sample, the lower the risk of getting incorrect results. Figure 4.4 shows that as sample size increases, so does sample reliability. At very small sizes, reliability increases very quickly, but as the sample grows, each additional respondent adds less to reliability. You can see that the curve starts to flatten out between 30 and 50 respondents, which is generally considered the threshold between qualitative and quantitative research. Once the sample size reaches 200, the gain in reliability from additional respondents is extremely small. Accordingly, 200 respondents is considered the minimum sample size for a reliable customer satisfaction measurement. Companies with a very small customer base (around 200 contacts or fewer) should simply survey all of their contacts.

Some years there may have been no rain in June (even in Manchester), some years the intensity of the rain has been incredibly high, but in most years the rainfall falls somewhere between these two limits, in the "normal" zone. Whether we are looking at research data or rainfall in Manchester, the key question is: “What is the risk of getting abnormal data that skews the result?” The smaller the sample, the higher the risk.
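The diminishing return from larger samples can be illustrated with the standard margin-of-error formula for a proportion. This is a sketch under assumptions that the source does not state: simple random sampling, a large population, a 95% confidence level (z = 1.96), and the worst case p = 0.5. It is not a reconstruction of Figure 4.4.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from n respondents.

    Population size does not appear in the formula: for large populations,
    precision depends on the absolute sample size only.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (10, 50, 200, 500, 2000):
    print(n, round(100 * margin_of_error(n), 1))
# 10 31.0
# 50 13.9
# 200 6.9
# 500 4.4
# 2000 2.2
```

Going from 10 to 200 respondents cuts the margin of error from about 31 to about 7 percentage points, while a tenfold increase from 200 to 2,000 gains less than 5 more points.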

4.4.2 In-depth analysis

As noted earlier, in business research it is generally assumed that a sample of 200 respondents provides the necessary reliability for an overall measure of customer satisfaction, regardless of whether the population is 500,000 or 600,000. There is one important exception, however: when you have different segments and want to analyze the results in depth by comparing satisfaction across them. If you split a sample of 200 respondents into many segments, you will face the problem of a small and therefore unreliable sample in each segment. Therefore, it is generally accepted that the minimum total sample size is 200 and the minimum per segment is 50.

Because of all this, the size of the total sample is often determined by how many segments you want to analyze. If you want to divide your result into six segments, you will need a sample size of at least 300 members, so that each segment has at least 50 members. This can be important for companies with many divisions or markets. Based on a figure of 50 respondents per segment, a retailer with 100 stores would need a sample of at least 5,000 members if customer satisfaction was to be measured at the store level. In our opinion, however, if comparisons are to be made between stores and management decisions are made based on the results of the study, then the absolute minimum should be 100 consumers per store, or better yet 200. For a retailer with 100 stores, this would result in requiring a sample size of 20,000 consumers to obtain very reliable results at the store level.
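The rule of thumb in this section reduces to a one-line calculation, sketched here in Python. The function name `required_sample` is illustrative; the thresholds of 50 per segment and 200 overall are those given in the text.

```python
def required_sample(segments, per_segment=50, overall_min=200):
    """Minimum total sample: enough respondents per segment, and never below the overall minimum."""
    return max(overall_min, segments * per_segment)

print(required_sample(6))                      # 300
print(required_sample(100))                    # 5000
print(required_sample(100, per_segment=200))   # 20000
print(required_sample(2))                      # 200
```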

4.4.3 Sample size and response rate

One more factor needs to be noted. The recommended sample size of 200 respondents refers to responses received, not to the number of customers selected and invited. Moreover, for statistical reliability, it means 200 customers selected and the same 200 taking part in the interview or returning the questionnaire. If your response rate is low, it is statistically unsound to compensate by simply sending out more questionnaires until you get 200 responses. Non-response bias can be a very significant problem in customer satisfaction studies and will be discussed in more detail in the next chapter.

4.5 Conclusions

(a) ISO 9000:2000 states that recognized statistical methods must be used to obtain a reliable sample for consumer-related measurements.

(b) Non-probability sampling increases the risk of bias influencing the result and should only be used by organizations that do not have a customer database.

(c) For most organizations, the best way to obtain a representative and bias-free sample is stratified random sampling.

(d) The sampling frame should consist of the individuals who matter. In business markets, it may be necessary to include several respondents (sometimes many) from large customers.

(e) 200 respondents constitute the minimum number of respondents required to reliably measure customer satisfaction across an organization. This number is independent of the number of consumers you have.

(f) Organizations with fewer than 200 customers or contacts should survey all of them.

(g) If results are to be obtained by segment, the minimum sample size per segment is 50 respondents. In these cases, the required minimum size of the entire sample will be equal to the number of segments multiplied by 50.

In fact, we start with not one but three questions: What is a sample? When is it representative? What is it representative of?
A population is any group of people, organizations, or events that interests us and about which we want to draw conclusions, and a case or object is any element of such a population. A sample is any subgroup of a population of cases (objects) selected for analysis. If we wanted to study decision making by state legislators, we could examine it in the legislatures of Virginia, North Carolina, and South Carolina rather than in all fifty states, and then generalize the findings to the population from which these three states were chosen. If we wanted to examine the preferences of Pennsylvania voters, we could survey 50 U.S. Steel employees in Pittsburgh and extend the survey results to all voters in the state. Likewise, if we wanted to measure the intelligence of college students, we could test all the defensive players enrolled at Ohio State in a given football season and then generalize the results to the population of which they are a part. In each example, we proceed as follows: we identify a subgroup within the population, study this subgroup, or sample, in some detail, and generalize our results to the entire population. These are the main stages of sampling.
However, it seems quite clear that each of these samples has significant shortcomings. For example, although the legislatures of Virginia, North Carolina, and South Carolina are part of the population of state legislatures, they are likely, for historical, geographic, and political reasons, to operate in very similar ways and very differently from the legislatures of states as different as New York, Nebraska, and Alaska. While fifty steel workers in Pittsburgh may indeed be voters in the state of Pennsylvania, they may well, by virtue of socioeconomic status, education, and life experience, hold views that differ from those of many other voters. Likewise, while Ohio State football players are college students, they may well differ from other college students for a variety of reasons. That is, although each of these subgroups is indeed a sample, the members of each are systematically different from most other members of the population from which they are selected. As a group, none of them is typical of its population in terms of the distribution of attributes, opinions, motives, and characteristics. Accordingly, political scientists would say that none of these samples is representative.
A representative sample is a sample in which all the main features of the population from which it is drawn are represented in approximately the same proportion, or with the same frequency, as they appear in that population. Thus, if 50% of all state legislatures meet only once every two years, approximately half of a representative sample of state legislatures should be of this type. If 30% of Pennsylvania voters are blue-collar workers, about 30% of a representative sample of those voters (not 100%, as in the example above) should be blue-collar workers. And if 2% of all college students are athletes, approximately the same proportion of a representative sample of college students should be athletes. In other words, a representative sample is a microcosm, a smaller but accurate model of the population it is intended to reflect. To the extent that the sample is representative, conclusions based on the study of that sample can be safely assumed to apply to the original population. This extension of results from the sample to the population is what we call generalizability.
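One simple way to check this definition against data is to compare each category's share in the sample with its share in the population. The Python sketch below does this for the Pennsylvania voter example; the function name `max_share_gap` and the two-category breakdown are illustrative assumptions.

```python
def max_share_gap(population_shares, sample_counts):
    """Largest absolute difference between a category's share in the sample and in the population."""
    total = sum(sample_counts.values())
    return max(
        abs(sample_counts.get(category, 0) / total - share)
        for category, share in population_shares.items()
    )

# Shares from the text: 30% of Pennsylvania voters are blue-collar workers
population_shares = {"blue_collar": 0.30, "other": 0.70}

steel_workers_only = {"blue_collar": 50, "other": 0}   # the 50 steel workers
proportional = {"blue_collar": 15, "other": 35}        # 30% and 70% of 50 respondents

print(max_share_gap(population_shares, steel_workers_only))  # about 0.7
print(max_share_gap(population_shares, proportional))        # about 0.0
```

A gap close to zero on the characteristics that matter is what the text means by a sample being a microcosm of the population. A gap of 0.7 means the sample overstates the share of blue-collar voters by 70 percentage points.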
Perhaps a graphic illustration will help explain this. Suppose we want to study patterns of political group membership among US adults.

Fig. 5.1. Formation of a sample from the general population
Figure 5.1 shows three circles divided into six equal sectors. Figure 5.1a represents the entire population under consideration. Population members are classified according to the number of political groups (such as parties and interest groups) to which they belong. In this example, each adult belongs to at least one and no more than six political groups, and these six levels of membership are equally distributed in the population (hence the equal sectors). Suppose we want to study people's motives for joining a group, their choice of group, and their patterns of participation, but because of limited resources we are able to study only one out of every six members of the population. Who should be selected for analysis?
One possible sample of this size is illustrated by the shaded area in Fig. 5.1b, but it clearly does not reflect the structure of the population. If we were to generalize from this sample, we would conclude (1) that all American adults belong to five political groups and (2) that the group behavior of all Americans matches the behavior of those who belong to exactly five groups. However, we know that the first conclusion is not true, and this should make us doubt the validity of the second. Thus, the sample depicted in Figure 5.1b is unrepresentative because it does not reflect the actual distribution of a given population property (often called a parameter). Such a sample is said to be biased toward members of five groups, or biased away from all other patterns of group membership. Based on such a biased sample, we usually come to erroneous conclusions about the population.
This can be most clearly demonstrated by the disaster that befell the Literary Digest magazine in the 1930s, which organized a public opinion poll on the outcome of the election. Literary Digest was a periodical that reprinted newspaper editorials and other materials reflecting public opinion; the magazine was very popular at the beginning of the century. Beginning in 1920, the magazine conducted a large-scale national poll in which ballots were mailed to more than a million people, asking them to indicate their preferred candidate in the upcoming presidential election. For a number of years, the magazine's polling results were so accurate that a September poll seemed to make the November election irrelevant. And how could an error occur with such a large sample? However, in 1936, this is exactly what happened: the poll predicted victory by a large margin (60:40) for the Republican candidate, Alf Landon. In the election, Landon lost to Franklin D. Roosevelt by almost the same margin by which he had been expected to win. The Literary Digest's credibility was so badly damaged that the magazine ceased publication shortly thereafter. What happened? It is very simple: the Digest poll used a biased sample. Ballots were sent to people whose names were drawn from two sources: telephone directories and car registration lists. Although this method of selection had previously produced results not much different from other methods, things were very different in 1936, during the Great Depression, when less wealthy voters, Roosevelt's most likely supporters, could not afford to own a telephone, let alone a car. Thus, in fact, the sample used in the Digest poll was skewed toward those most likely to be Republican, yet it is still surprising that Roosevelt did so well.
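The mechanism behind the Digest failure can be illustrated with a small simulation. All numbers below are hypothetical and chosen only for illustration: a population in which 30% of voters own a telephone, and support for one candidate (called A here) is 35% among owners and 70% among everyone else. They are not historical data.

```python
import random

rng = random.Random(1936)

# Hypothetical population of (owns_phone, supports_candidate_a) pairs
population = []
for _ in range(100_000):
    owns_phone = rng.random() < 0.30
    supports_a = rng.random() < (0.35 if owns_phone else 0.70)
    population.append((owns_phone, supports_a))

def share_for_a(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(supports for _, supports in voters) / len(voters)

true_share = share_for_a(population)

# Large sample drawn from a biased frame: telephone owners only
phone_owners = [voter for voter in population if voter[0]]
biased_estimate = share_for_a(rng.sample(phone_owners, 10_000))

# Much smaller sample drawn at random from the whole population
random_estimate = share_for_a(rng.sample(population, 1_000))

print(f"true share:              {true_share:.3f}")
print(f"biased sample (10,000):  {biased_estimate:.3f}")
print(f"random sample (1,000):   {random_estimate:.3f}")
```

In this sketch, the large sample drawn only from telephone owners misses the true share by a wide margin, while the random sample one tenth its size lands close to it. Sample size cannot compensate for a biased sampling frame.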
How can this problem be solved? Returning to our example, let's compare the sample in Fig. 5.1b with the sample in Fig. 5.1c. In the latter case, a sixth of the population is also selected for analysis, but each of the main types in the population is represented in the sample in the same proportion as in the entire population. Such a sample shows that one out of every six American adults belongs to one political group, one out of six belongs to two, and so on. Such a sample would also reveal other differences among members that might be correlated with participation in different numbers of groups. Thus, the sample presented in Fig. 5.1c is a representative sample of the population under consideration.
Of course, this example is simplified in at least two extremely important ways. First, most populations of interest to political scientists are more diverse than the one illustrated. People, documents, governments, organizations, decisions, and so on differ from each other not in one characteristic but in many. Thus, a representative sample should be such that each major distinguishing characteristic is represented in proportion to its share in the population. Second, it is far more common not to know in advance the actual distribution of the variables or attributes we want to measure than to know it; it may not have been measured in a previous census. Thus, a representative sample must be designed so that it can accurately reflect the existing distribution even when we are unable to check its validity directly. The sampling procedure must have an internal logic that can convince us that, if we were able to compare the sample with a census, it would indeed be representative.
To reflect accurately the complex organization of a given population, and to have some degree of confidence that the proposed procedures can do so, researchers turn to statistical methods. They work in two directions. First, using certain rules (the internal logic), researchers decide which specific objects to study and what exactly to include in a specific sample. Second, using very different rules, they decide how many objects to select. We will not study these numerous rules in detail; we will only consider their role in political science research. Let's begin with strategies for selecting objects that form a representative sample.


