When analyzing a variation series, it is important to know how closely the empirical distribution of the characteristic corresponds to the normal one. To do this, the frequencies of the actual distribution must be compared with the theoretical frequencies characteristic of a normal distribution. This means that, based on the actual data, it is necessary to calculate the theoretical frequencies of the normal curve, which are a function of the normalized deviations.
In other words, the empirical distribution curve needs to be aligned with the normal distribution curve.
Objective characteristics of the agreement between theoretical and empirical frequencies can be obtained using special statistical indicators called goodness-of-fit criteria.
A goodness-of-fit criterion is a criterion that allows one to determine whether the discrepancy between the empirical and theoretical distributions is random or significant, i.e. whether the observational data agree with the proposed statistical hypothesis or not. The distribution that the population has according to the proposed hypothesis is called theoretical.
There is therefore a need for a criterion (rule) that would allow one to judge whether the discrepancy between the empirical and theoretical distributions is random or significant. If the discrepancy turns out to be random, the observational data (the sample) are considered consistent with the proposed hypothesis about the distribution law of the general population, and the hypothesis is accepted; if the discrepancy turns out to be significant, the observational data do not agree with the hypothesis and it is rejected.
Empirical and theoretical frequencies usually differ. This can happen because:
- the discrepancy is random and due to the limited number of observations;
- the discrepancy is not accidental and is explained by the fact that the statistical hypothesis that the population is normally distributed is erroneous.
Thus, goodness-of-fit criteria make it possible to reject or confirm the hypothesis put forward, when fitting the series, about the nature of the distribution in the empirical series.
Empirical frequencies are obtained as a result of observation; theoretical frequencies are calculated using formulas.
For the normal distribution law they can be found as follows:

f_T = (Σƒ_i · h / σ) · φ(t),

where
- Σƒ_i – the sum of the empirical frequencies (the total number of observations, i.e. the last accumulated frequency)
- h – the difference between two neighboring variants (the class width)
- σ – the sample standard deviation
- t – the normalized (standardized) deviation, t = (x_i − x̄)/σ
- φ(t) – the probability density function of the standard normal distribution (found for the corresponding value of t)
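As a sketch of this calculation for grouped data (the function name is illustrative; class midpoints and a common class width h are assumed):

```python
import math

def normal_theoretical_frequencies(midpoints, freqs, h):
    """Theoretical frequencies of the normal curve for a grouped series:
    f_T = (N * h / sigma) * phi(t), with t = (x - mean) / sigma and
    phi the standard normal density."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    sigma = math.sqrt(var)
    result = []
    for x in midpoints:
        t = (x - mean) / sigma
        phi = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
        result.append(n * h / sigma * phi)
    return result

# hypothetical symmetric grouped series with class width 1
f_theoretical = normal_theoretical_frequencies(
    [1, 2, 3, 4, 5], [5, 20, 50, 20, 5], 1)
```

For a symmetric empirical series the theoretical frequencies come out symmetric as well, with the maximum at the central class.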
There are several goodness-of-fit tests, the most common being the chi-square (Pearson) test, the Kolmogorov test, and the Romanovsky test.
Pearson's goodness-of-fit test χ² is one of the main ones. It can be represented as the sum of the ratios of the squared differences between the theoretical (f_T) and empirical (f) frequencies to the theoretical frequencies:

χ² = Σ (f_i − f_T)² / f_T, i = 1, …, k,

where
- k – the number of groups into which the empirical distribution is divided,
- f_i – the observed frequency of the characteristic in the i-th group,
- f_T – the theoretical frequency.
For the χ² distribution, tables have been compiled that give the critical value of the χ² goodness-of-fit criterion for a selected significance level α and number of degrees of freedom df (or ν).
The significance level α is the probability of erroneously rejecting the proposed hypothesis, i.e. the probability that a correct hypothesis will be rejected; P = 1 − α is the probability of accepting a correct hypothesis. In statistics, three significance levels are most often used:
- α = 0.10, then P = 0.90 (in 10 cases out of 100 the correct hypothesis may be rejected)
- α = 0.05, then P = 0.95 (in 5 cases out of 100)
- α = 0.01, then P = 0.99 (in 1 case out of 100)
The number of degrees of freedom df is defined as the number of groups in the distribution series minus the number of constraints: df = k − z. A constraint is an indicator of the empirical series used in calculating the theoretical frequencies, i.e. an indicator connecting the empirical and theoretical frequencies. For example, when fitting a normal curve there are three such constraints, so the number of degrees of freedom is df = k − 3. To assess significance, the calculated value χ²_calc is compared with the table value χ²_tab.
With complete coincidence of the theoretical and empirical distributions χ² = 0; otherwise χ² > 0. If χ²_calc > χ²_tab, then for the given significance level and number of degrees of freedom we reject the hypothesis that the discrepancies are insignificant (random). If χ²_calc < χ²_tab, we accept the hypothesis, and with probability P = 1 − α it can be argued that the discrepancy between the theoretical and empirical frequencies is random. There is then reason to assert that the empirical distribution obeys the normal law. Pearson's goodness-of-fit test is used when the population size is large enough (N > 50) and the frequency of each group is at least 5.
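A minimal sketch of this decision rule, with hypothetical empirical and theoretical frequencies and the table value χ²(0.05; df = 4) hard-coded:

```python
# hypothetical frequencies for k = 7 groups (equal totals of 116)
observed = [6, 12, 25, 30, 25, 12, 6]    # empirical frequencies f_i
expected = [5, 13, 26, 28, 26, 13, 5]    # theoretical frequencies f_T

# Pearson statistic: chi2 = sum((f_i - f_T)^2 / f_T)
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 3   # k - 3 constraints when fitting a normal curve
crit = 9.488             # table value chi2(alpha = 0.05, df = 4)

accept_normality = chi2 < crit   # True -> discrepancies look random
```

Here every group frequency is at least 5 and the total exceeds 50, so the conditions for applying the test are met.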
The Kolmogorov criterion is based on determining the maximum discrepancy between the accumulated empirical and theoretical frequencies:

λ = D / √N (or λ = d · √N),

where D and d are, respectively, the maximum absolute difference between the cumulative frequencies and between the cumulative relative frequencies of the empirical and theoretical distributions, and N is the number of observations.
Using the distribution table of the Kolmogorov statistic, the probability P(λ) is determined; it can vary from 0 to 1. At P(λ) = 1 there is complete coincidence of the frequencies; at P(λ) = 0, complete divergence. If the probability P corresponding to the found value of λ is large, then the discrepancies between the theoretical and empirical distributions can be considered insignificant, that is, random.
The main condition for using the Kolmogorov criterion is a large number of observations.
Kolmogorov goodness-of-fit test
Let us consider how the Kolmogorov criterion (λ) is applied when testing the hypothesis that the general population is normally distributed. Fitting the actual distribution to the normal curve consists of several steps:
- Based on the actual data, the theoretical frequencies of the normal curve, which is a function of the normalized deviation, are determined.
- The actual and theoretical frequencies are compared.
- The degree to which the distribution of the characteristic corresponds to the normal one is checked.
For column IV of the table:
In MS Excel, the normalized deviation t is calculated with the STANDARDIZE function. Select a range of free cells equal in number to the variants (rows of the spreadsheet); without removing the selection, call the STANDARDIZE function, and in the dialog box that appears indicate the cells that contain, respectively, the observed values (X_i), the mean (X̄) and the standard deviation σ. The operation must be completed by pressing Ctrl+Shift+Enter simultaneously (entering it as an array formula).
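Outside Excel the same step is a one-line calculation; a sketch in Python (STANDARDIZE simply computes t = (x − mean)/σ; the values below are illustrative):

```python
def standardize(x, mean, sigma):
    """Python equivalent of Excel's STANDARDIZE: t = (x - mean) / sigma."""
    return (x - mean) / sigma

# hypothetical observed values with mean 30 and standard deviation 5
t_values = [standardize(x, 30.0, 5.0) for x in [20, 25, 30, 35, 40]]
```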
For column V of the table:
The probability density φ(t) of the normal distribution is found from the table of values of the local Laplace function for the corresponding value of the normalized deviation t.
For column VI of the table:
The Kolmogorov goodness-of-fit statistic λ is determined by dividing the modulus of the maximum difference between the empirical and theoretical cumulative frequencies by the square root of the number of observations:

λ = D / √N, where D is the maximum absolute difference.
Using a special probability table for the goodness-of-fit criterion λ, we find that the value λ = 0.59 corresponds to a probability P(λ) ≈ 0.88.
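The table probability can be checked against the series expansion of the Kolmogorov distribution; a sketch (the function name and the truncation at 100 terms are choices of this example):

```python
import math

def kolmogorov_p(lam, terms=100):
    """Probability of exceeding lam under the Kolmogorov distribution:
    P(lambda) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lambda^2)."""
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                   for k in range(1, terms + 1))

p = kolmogorov_p(0.59)   # close to the table value 0.88 quoted in the text
```

Large λ gives a probability near 0, i.e. an essentially certain divergence between the distributions.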
(Figure: distribution of empirical and theoretical frequencies; probability density of the theoretical distribution)
When applying goodness-of-fit tests to check whether the observed (empirical) distribution corresponds to the theoretical one, one should distinguish between testing simple and composite hypotheses.
The one-sample Kolmogorov-Smirnov normality test is based on the maximum difference between the cumulative empirical distribution of the sample and the assumed (theoretical) cumulative distribution. If the Kolmogorov-Smirnov statistic D is significant, the hypothesis that the distribution is normal should be rejected.
See also
- Criteria for testing randomness and assessing outlier observations
Using Goodness-of-Fit Criteria

Introduction
In the practice of statistical analysis of experimental data, the main interest is usually not the calculation of particular statistics itself, but the answers to questions of the following type: Is the population mean really equal to a certain number? Is the correlation coefficient significantly different from zero? Are the variances of two samples equal? Many such questions may arise, depending on the specific research problem. Accordingly, many criteria have been developed to test the proposed statistical hypotheses. We will consider some of the most common of them, relating mainly to means, variances, correlation coefficients and frequency distributions.
All criteria for testing statistical hypotheses are divided into two large groups: parametric and non-parametric. Parametric tests are based on the assumption that the sample data are drawn from a population with a known distribution, and the main task is to estimate the parameters of this distribution. Nonparametric tests do not require any assumptions about the nature of the distribution, other than the assumption that it is continuous.
Let us first look at parametric criteria. The testing sequence will include: formulating the null and alternative hypotheses; stating the assumptions to be made; determining the sample statistic used in the test and the sampling distribution of that statistic; determining the critical regions for the selected criterion; and constructing a confidence interval for the sample statistic.
1 Goodness-of-fit criteria for means
Let the hypothesis being tested be that the population mean μ equals a given value a. The need for such a check may arise, for example, in the following situation. Suppose that, based on extensive research, the mean diameter of the shell of a fossil mollusk in sediments from some fixed location has been established. Let us also have at our disposal a certain number of shells found in another place, and we make the assumption that the specific place does not affect the diameter of the shell, i.e. that the mean value of the shell diameter for the entire population of mollusks that once lived in the new place is equal to the known value obtained earlier when studying this type of mollusk in the first habitat.
If this known value is a, then the null and alternative hypotheses are written as follows: H0: μ = a, H1: μ ≠ a. Let us assume that the variable x in the population under consideration has a normal distribution and that the population variance is unknown.
We will test the hypothesis using the statistic

t = (x̄ − a) / (s / √n), (1)

where x̄ is the sample mean and s is the sample standard deviation.
It has been shown that if H0 is true, then t in expression (1) has a Student t-distribution with n − 1 degrees of freedom. If we choose the significance level (the probability of rejecting a correct hypothesis) equal to α, then, in accordance with what was discussed in the previous chapter, we can define the critical values for testing H0: μ = a.
In this case, since the Student distribution is symmetric, the fraction 1 − α of the area under the curve of this distribution with n − 1 degrees of freedom is contained between the points −t_{α/2} and t_{α/2}, which are equal in absolute value. Therefore, all values less than −t_{α/2} and greater than t_{α/2} for the t-distribution with the given number of degrees of freedom at the chosen significance level constitute the critical region. If the sample t value falls within this region, the alternative hypothesis is accepted.
The confidence interval for μ is constructed according to the previously described method and is determined from the expression

x̄ − t_{α/2} · s/√n < μ < x̄ + t_{α/2} · s/√n. (2)
So, suppose we know that in the first habitat the mean diameter of the shell of the fossil mollusk is 18.2 mm. We have at our disposal a sample of 50 newly found shells, for which x̄ = 18.9 mm and s = 2.18 mm. Let us test H0: μ = 18.2 against H1: μ ≠ 18.2. We have t = (18.9 − 18.2)/(2.18/√50) ≈ 2.27.
If the significance level is chosen as α = 0.05, then the critical value is t = 2.01. Since 2.27 > 2.01, H0 can be rejected in favor of H1 at the significance level α = 0.05. Thus, for our hypothetical example it can be stated (with some probability, of course) that the diameter of the shell of fossil mollusks of this type depends on the place in which they lived.
Because the t-distribution is symmetric, only positive values of t are tabulated for the selected significance levels and numbers of degrees of freedom; moreover, both the share of the area under the distribution curve to the right of t and that to the left of −t are taken into account. This is because in most cases, when testing hypotheses, we are interested in the significance of the deviations themselves, regardless of whether they are positive or negative, i.e. we test against H1: μ ≠ a, not against H1: μ > a or H1: μ < a. Let us return now to our example. The 100(1 − α)% confidence interval for μ is 18.9 ± 2.01 · 2.18/√50, i.e. approximately 18.28 mm < μ < 19.52 mm.
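The whole shell example can be sketched numerically, assuming the sample mean 18.9 mm and the table value t = 2.01 implied by the confidence interval quoted in the text:

```python
import math

# one-sample t-test with the shell example's figures
n, xbar, s, a = 50, 18.9, 2.18, 18.2

t = (xbar - a) / (s / math.sqrt(n))   # formula (1)
t_crit = 2.01                          # two-sided table value t(0.05; 49 df)
reject = abs(t) > t_crit               # True -> H0: mu = 18.2 is rejected

# confidence interval (2) for the population mean
half_width = t_crit * s / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)
```

Note that the hypothesized value 18.2 lies outside the resulting interval, which is consistent with rejecting H0.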
Let us now consider the case when it is necessary to compare the means of two general populations. The hypothesis being tested looks like this: H0: μ1 − μ2 = 0, H1: μ1 − μ2 ≠ 0. It is assumed that x has a normal distribution with mean μ1 and variance σ², and y a normal distribution with mean μ2 and the same variance σ². In addition, we assume that the samples from which the general populations are estimated are drawn independently of each other and have sizes n1 and n2, respectively. From the independence of the samples it follows that if we take a large number of them and calculate the mean values for each pair, then the set of these pairs of means will be completely uncorrelated. The null hypothesis is tested using the statistic

t = (x̄ − ȳ) / √( ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) · (1/n1 + 1/n2) ), (3)
where s1² and s2² are the variance estimates for the first and second samples, respectively. It is easy to see that (3) is a generalization of (1). It has been shown that statistic (3) has a Student t-distribution with n1 + n2 − 2 degrees of freedom. If n1 and n2 are equal, i.e. n1 = n2 = n, formula (3) simplifies to

t = (x̄ − ȳ) / √((s1² + s2²)/n). (4)
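A sketch of the pooled statistic (3), assuming equal population variances (the function name is illustrative):

```python
import math

def pooled_t(x1bar, x2bar, s1sq, s2sq, n1, n2):
    """Two-sample Student's t with a pooled variance estimate (formula (3)).
    Assumes both populations are normal with a common variance and the
    samples are independent."""
    sp2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
    return (x1bar - x2bar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
```

With the lake-frog figures used later in the text (means 2.34 and 2.08, variances 0.21 and 0.35, sample sizes 49 and 27) this gives t ≈ 2.13, which exceeds the tabulated 1.995 quoted there.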
Let us look at an example. Suppose that when measuring the stem leaves of the same plant population over two seasons, the results shown in the table were obtained. We assume that the conditions for using Student's t-test are satisfied: the populations from which the samples are drawn are normal, they have an unknown but common variance, and the samples are independent. Let us test at the significance level α = 0.01. The table value is t = 2.58; the calculated value exceeds it, so the hypothesis of equality of the mean stem-leaf lengths for the plant population over the two seasons should be rejected at the chosen level of significance.

Attention! The null hypothesis in mathematical statistics is the hypothesis that there are no significant differences between the compared indicators, regardless of whether we are talking about means, variances or other statistics. In all these cases, if the empirical (calculated) value of the criterion is greater than the theoretical (table) value, the null hypothesis is rejected; if the empirical value is less than the table value, it is accepted.

In order to construct a confidence interval for the difference between the means of the two populations, note that Student's test, as can be seen from formula (3), evaluates the significance of the difference between the means relative to the standard error of this difference; using the previously discussed relationships and the assumptions made, it is easy to verify that the denominator in (3) is exactly this standard error. Indeed, we know that in the general case, if x and y are independent, the variance of their difference is the sum of their variances. Taking the sample means x̄ and ȳ instead of x and y, and recalling the assumption that both populations have the same variance σ², we obtain

σ²_{x̄−ȳ} = σ²(1/n1 + 1/n2). (5)
An estimate of the common variance can be obtained from the relation

s² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2). (6)
(We divide by n1 + n2 − 2 because two quantities are estimated from the samples and, therefore, the number of degrees of freedom must be reduced by two.) If we now substitute (6) into (5) and take the square root, we obtain the denominator in expression (3). After this digression, let us return to constructing a confidence interval for μ1 − μ2.

Let us make some comments on the assumptions used in constructing the t-test. First of all, it has been shown that violations of the assumption of normality have an insignificant effect on the significance level and power of the test for samples of about 30 or more. Violations of the assumption of homogeneity of the variances of the two populations from which the samples are drawn are also insignificant, but only when the sample sizes are equal. If the variances of the two populations differ from each other, then the probabilities of errors of the first and second kind will differ significantly from those expected. In this case, the statistic to be used is

t = (x̄ − ȳ) / √(s1²/n1 + s2²/n2), (7)
with the number of degrees of freedom

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]. (8)
As a rule, df turns out to be a fractional number; therefore, when using t-distribution tables, one takes the table values for the nearest integer values and interpolates to find the t corresponding to the obtained df.

Let us look at an example. When studying two subspecies of the lake frog, the ratio of body length to tibia length was calculated. Two samples were taken, of sizes n1 = 49 and n2 = 27. The means and variances of the ratio of interest turned out to be, respectively, x̄1 = 2.34, x̄2 = 2.08, s1² = 0.21, s2² = 0.35. If we test the hypothesis using formula (3), we find that at a significance level of α = 0.05 we must reject the null hypothesis (the table value is t = 1.995) and conclude that there are statistically significant differences, at the selected significance level, between the mean values of the measured parameter for the two subspecies of frogs. When using formulas (7) and (8), however, for the same significance level α = 0.05 the table value is t = 2.015, and the null hypothesis is accepted.

This example clearly shows that neglecting the conditions adopted when deriving a particular criterion can lead to results directly opposite to those that actually hold. Of course, in this case, having samples of different sizes and no previously established fact that the variances of the measured indicator in the two populations are statistically equal, it was necessary to use formulas (7) and (8), which showed the absence of statistically significant differences. It is worth repeating that checking compliance with all the assumptions made when deriving a particular criterion is an absolutely necessary condition for its correct use.

A constant requirement in both of the above modifications of the t-test was that the samples be independent of each other. However, in practice there are often situations when this requirement cannot be met for objective reasons.
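Formulas (7) and (8) can be sketched as follows; with the frog figures this reproduces the conclusion above (t ≈ 1.98 with df ≈ 43, below the tabulated 2.015):

```python
import math

def welch_t(x1bar, x2bar, s1sq, s2sq, n1, n2):
    """Statistic (7) for unequal population variances, together with the
    approximate (usually fractional) degrees of freedom (8)."""
    se2 = s1sq / n1 + s2sq / n2
    t = (x1bar - x2bar) / math.sqrt(se2)
    df = se2 ** 2 / ((s1sq / n1) ** 2 / (n1 - 1)
                     + (s2sq / n2) ** 2 / (n2 - 1))
    return t, df

t, df = welch_t(2.34, 2.08, 0.21, 0.35, 49, 27)
```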
For example, some indicator is measured on the same animal or plot of territory before and after the action of an external factor, etc. In these cases we may be interested in testing the hypothesis H0: μ1 = μ2 against H1: μ1 ≠ μ2. We continue to assume that both samples are drawn from normal populations with the same variance. In this case, we can take advantage of the fact that differences between normally distributed quantities are also normally distributed, and therefore use Student's t-test in form (1). Thus, we test the hypothesis that the n differences are a sample from a normally distributed population with mean zero. Denoting the i-th difference by d_i, we have

t = d̄ / (s_d / √n), (9)

where d̄ is the mean of the differences and s_d their sample standard deviation.

Let us look at an example. Suppose we have data on the number of impulses of an individual nerve cell during a certain time interval before and after the action of a stimulus. Keeping in mind that (9) has a t-distribution, and choosing a significance level of α = 0.01, from the corresponding table in the Appendix we find that the critical value of t for n − 1 = 10 − 1 = 9 degrees of freedom is 3.25. A comparison of the theoretical and empirical t values shows that the null hypothesis of no statistically significant difference between the firing rates before and after the stimulus should be rejected: the stimulus statistically significantly changes the frequency of impulses.

In experimental studies, as mentioned above, dependent samples appear quite often. Nevertheless, this fact is sometimes ignored and the t-test is incorrectly used in form (3). The inappropriateness of this can be seen by considering the standard errors of the difference between uncorrelated and correlated means.
In the first case σ²_{x̄−ȳ} = σ²_x̄ + σ²_ȳ, while in the second the standard error of the difference takes the correlation into account: s²_d̄ = s²_x̄ + s²_ȳ − 2r·s_x̄·s_ȳ, and this is what appears in the denominator of (9). Now note that the numerators of expressions (4) and (9) coincide, since d̄ = x̄ − ȳ; therefore the difference in the value of t depends on the denominators. Thus, if formula (3) is used in a problem with dependent samples, and the samples are positively correlated, then the resulting t values will be smaller than they should be when using formula (9), and a situation may arise in which the null hypothesis is accepted when it is false. The opposite situation may arise when there is a negative correlation between the samples: in that case, differences will be recognized as significant that in fact are not.

Let us return to the example with impulse activity and calculate the t value for the given data using formula (3), ignoring the fact that the samples are related. For the number of degrees of freedom equal to 18 and the significance level α = 0.01, the table value is t = 2.88, and at first glance it seems that nothing bad happened even when using a formula unsuitable for the given conditions: the calculated t value again leads to rejection of the null hypothesis, i.e. to the same conclusion that was made using formula (9), which is correct in this situation.

However, let us rearrange the existing data and present them in a different pairing: these are the same values, and they could well have been obtained in one of the experiments. Since all the values in both samples are preserved, using Student's t-test in formula (3) gives the previously obtained value t = 3.32 and leads to the same conclusion as before. Now let us calculate the value of t using formula (9), which should be used in this case. The critical value of t at the selected significance level and nine degrees of freedom is 3.25.
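A sketch of the paired statistic (9), with illustrative before/after counts (the real impulse data are in the text's table and are not reproduced here):

```python
import math

def paired_t(before, after):
    """Paired Student's t (formula (9)): t = dbar / (s_d / sqrt(n)),
    computed on the per-object differences d_i = before_i - after_i."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    dbar = sum(d) / n
    s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
    return dbar / (s_d / math.sqrt(n))

# hypothetical impulse counts before and after a stimulus
t = paired_t([10, 12, 11, 13, 9], [12, 15, 13, 16, 11])
```

Swapping the two samples only flips the sign of t, so the two-sided decision is unchanged.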
Consequently, we have no reason to reject the null hypothesis; we accept it, and this conclusion is directly opposite to the one made when using formula (3). This example shows once again how important it is, in order to obtain correct conclusions when analyzing experimental data, to comply strictly with all the requirements underlying the derivation of a particular criterion.

The considered modifications of Student's test are intended for testing hypotheses about the means of two samples. However, situations arise when it is necessary to draw conclusions about the equality of k means simultaneously. For this case a statistical procedure has also been developed; it will be discussed later, in connection with analysis of variance.

2 Goodness-of-fit tests for variances

Testing statistical hypotheses about population variances is carried out in the same sequence as for means. Let us briefly recall this sequence.
1. A null hypothesis is formulated (about the absence of statistically significant differences between the compared variances).
2. Assumptions are made about the sampling distribution of the statistic with which the parameter in the hypothesis is to be estimated.
3. The significance level for testing the hypothesis is selected.
4. The value of the statistic of interest is calculated, and a decision is made about the truth of the null hypothesis.

Let us begin by testing the hypothesis that the population variance equals a given value a, i.e. H0: σ² = a against H1: σ² ≠ a. If we assume that the variable x has a normal distribution and that a sample of size n is drawn randomly from the population, then the null hypothesis is tested with the statistic

χ² = (n − 1)s² / a. (10)
Recalling the formula for calculating the variance, we rewrite (10) as follows:

χ² = Σ(x_i − x̄)² / a. (11)
From this expression it is clear that the numerator is the sum of squared deviations of normally distributed values from their mean, and each of these deviations is itself normally distributed. Therefore, in accordance with the distribution known to us for sums of squares of normally distributed values, statistics (10) and (11) have a χ²-distribution with n − 1 degrees of freedom. By analogy with the use of the t-distribution, when testing at the selected significance level α, critical points corresponding to the probabilities α/2 and 1 − α/2 of accepting the null hypothesis are found from the χ²-distribution table. The confidence interval for σ² at the selected α is constructed as follows:

(n − 1)s² / χ²_{α/2} < σ² < (n − 1)s² / χ²_{1−α/2}. (12)
Let us look at an example. Suppose it is known, on the basis of extensive experimental research, that the variance of the alkaloid content of one plant species from a certain area is 4.37 conventional units. A specialist has at his disposal a sample of n = 28 such plants, presumably from the same area. The analysis showed that for this sample s² = 5.01, and it is necessary to make sure that this variance and the previously known one are statistically indistinguishable at the significance level α = 0.1. According to formula (10) we have χ² = 27 · 5.01 / 4.37 ≈ 30.95. This value must be compared with the critical values for α/2 = 0.05 and 1 − α/2 = 0.95. From the Appendix table for 27 degrees of freedom these are 40.1 and 16.2, respectively; since 16.2 < 30.95 < 40.1, the null hypothesis can be accepted. The corresponding confidence interval for σ² is 3.37 < σ² < 8.35.
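The alkaloid example can be reproduced directly from formula (10) and interval (12), with the two table χ² values hard-coded:

```python
# chi-square test of a variance against a fixed value (formula (10)):
# n = 28, s^2 = 5.01, hypothesized variance 4.37
n, s2, a = 28, 5.01, 4.37

chi2 = (n - 1) * s2 / a

lo, hi = 16.2, 40.1        # table chi2 values for 27 df at 0.95 and 0.05
accept = lo < chi2 < hi    # True -> null hypothesis accepted

# confidence interval (12) for sigma^2
ci = ((n - 1) * s2 / hi, (n - 1) * s2 / lo)
```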
In contrast to testing hypotheses about sample means with Student's test, where the probabilities of errors of the first and second kind did not change significantly when the assumption of normally distributed populations was violated, in the case of hypotheses about variances the errors change significantly when the normality conditions are not met. The problem considered above, of the equality of a variance to some fixed value, is of limited interest, since situations in which the population variance is known are quite rare. Of much greater interest is the case when it is necessary to check whether the variances of two populations are equal, i.e. to test H0: σ1² = σ2² against the alternative H1: σ1² ≠ σ2². It is assumed that samples of sizes n1 and n2 are randomly drawn from general populations with variances σ1² and σ2². To test the null hypothesis, Fisher's variance-ratio test is used:

F = s1² / s2². (13)
Since the sums of squared deviations of normally distributed random variables from their mean values have a χ²-distribution, both the numerator and the denominator of (13) are χ²-distributed values divided by n1 − 1 and n2 − 1, respectively, and therefore their ratio has an F-distribution with n1 − 1 and n2 − 1 degrees of freedom. It is generally accepted, and F-distribution tables are constructed accordingly, that the larger of the variances is taken as the numerator in (13), so only one critical point is determined, corresponding to the selected significance level.

Suppose we have two samples of sizes n1 = 11 and n2 = 28 from populations of common and oval pond snails, for which the height-to-width ratios have variances s1² = 0.59 and s2² = 0.38. It is necessary to test the hypothesis of the equality of the variances of these indicators for the populations under study at a significance level of α = 0.05. We have F = 0.59/0.38 ≈ 1.55.

In the literature one can sometimes find the statement that testing the hypothesis of equality of means using Student's test should be preceded by testing the hypothesis of equality of variances. This is a poor recommendation; moreover, it can lead to mistakes that can be avoided if it is not followed. Indeed, the results of testing the hypothesis of equality of variances using Fisher's test depend strongly on the assumption that the samples are drawn from normally distributed populations. At the same time, Student's test is insensitive to violations of normality, and if samples of equal size can be obtained, the assumption of equality of variances is also not essential. In the case of unequal n, formulas (7) and (8) should be used for the check.

When testing hypotheses about the equality of variances, some peculiarities arise in calculations associated with dependent samples. In this case, the hypothesis H0: σ1² = σ2² is tested against the alternative H1: σ1² ≠ σ2² using the statistic

t = (F − 1)√(n − 2) / (2√(F(1 − r²))), (14)

where F is the ratio of the larger sample variance to the smaller one and r is the correlation coefficient between the paired measurements.
If the null hypothesis is true, statistic (14) has a Student t-distribution with n − 2 degrees of freedom.

When measuring the gloss of 35 coating samples, a variance of 134.5 was obtained. Repeated measurements two weeks later gave 199.1, and the correlation coefficient between the paired measurements turned out to be r = 0.876. If we ignore the fact that the samples are dependent and use Fisher's test, we get F = 1.48. At the significance level α = 0.05 the null hypothesis would then be accepted, since the critical value of the F-distribution for 34 and 34 degrees of freedom is 1.79. If instead we use formula (14), which is suitable for this case, we obtain t = 2.35, while the critical value of t for 33 degrees of freedom and the selected significance level α = 0.05 is 2.03. Therefore, the null hypothesis of equal variances in the two samples should be rejected. Thus, this example shows that, as with testing the hypothesis of equality of means, using a criterion that does not take into account the specifics of the experimental data leads to error.

In the recommended literature one can find Bartlett's test, which is used to test the hypothesis of the simultaneous equality of k variances. Apart from the fact that calculating the statistic of this criterion is quite laborious, its main disadvantage is that it is unusually sensitive to deviations from the assumption that the sampled populations are normally distributed. Thus, when using it, one can never be sure that the null hypothesis was rejected because the variances really differ statistically significantly, and not because the samples are non-normal. Therefore, if the problem of comparing several variances arises, one should look for a formulation in which Fisher's criterion or its modifications can be used.
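A sketch of the dependent-samples variance test, assuming the common Pitman-Morgan form of statistic (14); with the gloss data it reproduces the t = 2.35 quoted above:

```python
import math

def pitman_morgan_t(s1sq, s2sq, r, n):
    """Dependent-samples variance comparison (a common form of (14)):
    t = (F - 1) * sqrt(n - 2) / (2 * sqrt(F * (1 - r^2))),
    where F is the larger-to-smaller sample variance ratio and r is the
    correlation between the paired measurements; df = n - 2."""
    f = max(s1sq, s2sq) / min(s1sq, s2sq)
    return (f - 1) * math.sqrt(n - 2) / (2 * math.sqrt(f * (1 - r * r)))

t = pitman_morgan_t(134.5, 199.1, 0.876, 35)   # gloss example from the text
```

Note how the statistic grows as the paired correlation r approaches 1: for strongly correlated measurements even a modest variance ratio becomes significant, which is exactly why the plain F-test misses the difference here.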
3 Goodness-of-fit criteria for proportions

Quite often it is necessary to analyze populations in which objects can be classified into one of two categories: for example, by sex in a certain population, by the presence of a certain trace element in the soil, by the dark or light color of eggs in some species of birds, etc. We denote the proportion of elements possessing the quality of interest by P, where P is the ratio of the number of objects with that quality to the total number of objects in the population.
Where