Analysis of variance

What is analysis of variance used for? The purpose of analysis of variance is to establish whether a qualitative or quantitative factor has a significant influence on the resultant characteristic being studied. To do this, the factor that is believed to have (or not to have) a significant effect is divided into gradation classes (in other words, groups), and the significance of the differences between the means of the data sets corresponding to the gradations of the factor is examined. Examples: the dependence of an enterprise's profit on the type of raw materials used is studied (the gradation classes are then the types of raw materials), or the dependence of the cost of producing a unit of output on the size of the enterprise's division (the gradation classes are then the size categories of the division: large, medium, small).

The minimum number of gradation classes (groups) is two. Gradation classes can be qualitative or quantitative.

Why is analysis of variance called variance analysis? Analysis of variance examines the relationship between two variances. Variance, as we know, is a characteristic of the dispersion of data around the average value. The first variance is the one explained by the influence of the factor: it characterizes the dispersion of the gradation (group) means around the average of all the data. The second is the unexplained variance, which characterizes the dispersion of the data within the gradations (groups) around the means of the groups themselves. The first variance can be called between-group, and the second within-group. The ratio of these variances is called the actual Fisher ratio and is compared with the critical value of the Fisher ratio. If the actual Fisher ratio is greater than the critical one, then the means of the gradation classes differ from each other and the factor under study significantly influences the change in the data. If it is less, then the means of the gradation classes do not differ from each other and the factor does not have a significant influence.

How are hypotheses formulated, accepted, and rejected in ANOVA? In analysis of variance, the share of the total variability attributable to one or more factors is assessed. The significance of the influence of a factor is determined by testing hypotheses:

  • H0: μ1 = μ2 = ... = μa, where a is the number of gradation classes - all gradation classes have the same average value;
  • H1: not all μi are equal - not all gradation classes have the same average value.

If the influence of a factor is not significant, then the difference between the gradation classes of this factor is also insignificant, and in the course of the analysis of variance the null hypothesis H0 is not rejected. If the influence of the factor is significant, then the null hypothesis H0 is rejected: not all gradation classes have the same mean value, that is, among the possible differences between gradation classes, one or more are significant.

Some more concepts of variance analysis. A statistical complex in variance analysis is a table of empirical data. If all gradation classes have the same number of observations, the statistical complex is called homogeneous; if the numbers of observations differ, it is called heterogeneous.

Depending on the number of factors being assessed, one-factor, two-factor and multifactor analysis of variance are distinguished.

One-factor analysis of variance: the essence of the method, formulas, examples

The essence of the method, formula

One-way analysis of variance is based on the fact that the total sum of squared deviations of a statistical complex can be decomposed into components:

SS = SSa + SSe,

where

SS is the total sum of squared deviations,

SSa is the sum of squared deviations explained by the influence of the factor (between groups),

SSe is the unexplained sum of squared deviations, or sum of squared error deviations (within groups).

If ni denotes the number of observations in each gradation class (group) and a is the total number of gradations of the factor (groups), then n = n1 + n2 + ... + na is the total number of observations and the following formulas can be obtained:

total sum of squared deviations: SS = Σi Σj (xij − x̄)²,

sum of squared deviations explained by the influence of the factor: SSa = Σi ni (x̄i − x̄)²,

unexplained sum of squared deviations, or sum of squared error deviations: SSe = Σi Σj (xij − x̄i)²,

where

x̄ is the general average of the observations,

x̄i is the average of the observations in the i-th gradation class (group).

Besides,

SSe = Σi (ni − 1) si²,

where si² is the variance within the i-th gradation class (group).

To conduct a one-way analysis of variance of the data of a statistical complex, you need to find the actual Fisher ratio - the ratio of the variance explained by the influence of the factor (between-group) to the unexplained variance (within-group):

F = σa² / σe²,

and compare it with the critical value of the Fisher ratio.

Variances are calculated as follows:

explained variance: σa² = SSa / va,

unexplained variance: σe² = SSe / ve,

where

va = a − 1 is the number of degrees of freedom of the explained variance,

ve = n − a is the number of degrees of freedom of the unexplained variance,

v = n − 1 is the total number of degrees of freedom.

The critical value of the Fisher ratio for given values of the significance level and the degrees of freedom can be found in statistical tables or calculated using the MS Excel function F.OBR.


The function requires you to enter the following data:

Probability - level of significance α ,

Degrees_freedom1 - number of degrees of freedom of explained variance va,

Degrees_freedom2 - number of degrees of freedom of unexplained variance ve.

If the actual value of the Fisher ratio is greater than the critical value (F > Fcrit), then the null hypothesis is rejected at the significance level α. This means that the factor significantly influences the change in the data, and the data depend on the factor with probability P = 1 − α.

If the actual value of the Fisher ratio is less than the critical value (F < Fcrit), then the null hypothesis cannot be rejected at the significance level α. This means that the factor does not significantly influence the data with probability P = 1 − α.
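The procedure just described can be sketched in Python (NumPy and SciPy are not mentioned in the original article and are used here only as an assumption for illustration; the sample data are arbitrary placeholder numbers):

```python
import numpy as np
from scipy.stats import f

def one_way_anova(groups, alpha=0.05):
    """Compute the actual and critical Fisher ratios from a list of 1-D samples (gradation classes)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    a = len(groups)                       # number of gradation classes
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # explained (between-group) and unexplained (within-group) sums of squared deviations
    ss_a = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_e = sum(((g - g.mean()) ** 2).sum() for g in groups)

    va, ve = a - 1, n - a                 # degrees of freedom
    ms_a, ms_e = ss_a / va, ss_e / ve     # explained and unexplained variances
    f_actual = ms_a / ms_e                # actual Fisher ratio
    f_crit = f.ppf(1 - alpha, va, ve)     # critical Fisher ratio at significance level alpha
    return f_actual, f_crit, f_actual > f_crit

# arbitrary illustrative data: three gradation classes of a factor
print(one_way_anova([[7.2, 7.5, 7.3], [7.9, 8.3, 7.4], [7.7, 8.3, 8.6]]))
```

The last returned value is True when the factor should be considered significant at the chosen level.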

One-Way ANOVA: Examples

Example 1. It is necessary to find out whether the type of raw materials used affects the profit of the enterprise. In six gradation classes (groups) of the factor (1st type, 2nd type, etc.) data on profits from the production of 1000 units of product in millions of rubles over 4 years are collected.

Raw material type   2014   2015   2016   2017   Average   Dispersion
1st                 7.21   7.55   7.29   7.60   7.413     0.0367
2nd                 7.89   8.27   7.39   8.18   7.933     0.1571
3rd                 7.25   7.01   7.37   7.53   7.290     0.0480
4th                 7.75   7.41   7.27   7.42   7.463     0.0414
5th                 7.70   8.28   8.55   8.60   8.283     0.1706
6th                 7.56   8.05   8.07   7.84   7.880     0.0563

The number of gradation classes is a = 6, and each class (group) contains ni = 4 observations. The total number of observations is n = 24.

Number of degrees of freedom:

va = a − 1 = 6 − 1 = 5 ,

ve = n − a = 24 − 6 = 18 ,

v = n − 1 = 24 − 1 = 23 .

Let's calculate the sums of squared deviations and the variances. The general average of the observations is x̄ = 185.04/24 = 7.71.

SSa = 4·[(7.413 − 7.71)² + (7.933 − 7.71)² + (7.290 − 7.71)² + (7.463 − 7.71)² + (8.283 − 7.71)² + (7.880 − 7.71)²] ≈ 2.93,

SSe = 3·(0.0367 + 0.1571 + 0.0480 + 0.0414 + 0.1706 + 0.0563) ≈ 1.53.

σa² = SSa/va = 2.93/5 ≈ 0.586,

σe² = SSe/ve = 1.53/18 ≈ 0.085.

The actual Fisher ratio is F = σa²/σe² ≈ 0.586/0.085 ≈ 6.89; the critical value at α = 0.05 with va = 5 and ve = 18 degrees of freedom is Fcrit ≈ 2.77.

Since the actual Fisher ratio is greater than the critical one (6.89 > 2.77), with significance level α = 0.05 we conclude that the profit of the enterprise differs significantly depending on the type of raw materials used in production.

Or, which is the same thing, we reject the null hypothesis about the equality of means in all gradation classes (groups) of the factor.
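The same conclusion can be checked with SciPy's ready-made routine (a sketch added here for illustration, not part of the original article; scipy.stats.f_oneway returns the actual Fisher ratio and the p-level):

```python
from scipy.stats import f_oneway

# profit data from Example 1, one list per type of raw material
raw1 = [7.21, 7.55, 7.29, 7.60]
raw2 = [7.89, 8.27, 7.39, 8.18]
raw3 = [7.25, 7.01, 7.37, 7.53]
raw4 = [7.75, 7.41, 7.27, 7.42]
raw5 = [7.70, 8.28, 8.55, 8.60]
raw6 = [7.56, 8.05, 8.07, 7.84]

f_stat, p_value = f_oneway(raw1, raw2, raw3, raw4, raw5, raw6)
print(f_stat, p_value)   # approximately F = 6.89, p = 0.0009: the factor is significant
```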

In the example just considered, each gradation class of the factor had the same number of observations. But, as mentioned in the introductory part, the numbers of observations may differ, and this in no way complicates the analysis of variance procedure. This is shown in the next example.

Example 2. It is required to find out whether there is a dependence of the cost of production per unit of production on the size of the enterprise division. The factor (unit size) is divided into three gradation classes (groups): small, medium, large. The data corresponding to these groups on the cost of production of a unit of the same type of product for a certain period is summarized.

Small    Medium   Large
48       47       46
50       61       57
63       63       57
72       47       55
         43       32
         59       59
         58
Average:    58.25    54.0     51.0
Dispersion: 128.25   65.00    107.60

The number of gradation classes (groups) of the factor is a = 3; the numbers of observations in the classes (groups) are n1 = 4, n2 = 7, n3 = 6. The total number of observations is n = 17.

Number of degrees of freedom:

va = a − 1 = 2 ,

ve = n − a = 17 − 3 = 14 ,

v = n − 1 = 16 .

Let's calculate the sums of squared deviations. The general average of the observations is x̄ = 917/17 ≈ 53.9.

SSa = 4·(58.25 − 53.9)² + 7·(54.0 − 53.9)² + 6·(51.0 − 53.9)² ≈ 126,

SSe = 3·128.25 + 6·65.00 + 5·107.60 = 1312.75.

Let's calculate the variances:

σa² = SSa/va ≈ 126/2 ≈ 63,

σe² = SSe/ve ≈ 1312.75/14 ≈ 94.

Let's calculate the actual Fisher ratio:

F = σa²/σe² ≈ 63/94 ≈ 0.67.

The critical value of the Fisher ratio at α = 0.05 with va = 2 and ve = 14 degrees of freedom is Fcrit ≈ 3.74.

Since the actual value of the Fisher ratio is less than the critical one (0.67 < 3.74), we conclude that the size of the enterprise's division does not have a significant impact on the cost of production.

Or, which is the same, with a probability of 95% we do not reject the null hypothesis that the average cost of producing a unit of the same product in small, medium and large divisions of the enterprise does not differ significantly.
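For illustration, Example 2 can be reproduced in the same way (a sketch under the same SciPy assumption as above; f_oneway accepts groups of unequal size):

```python
from scipy.stats import f_oneway

# unit production cost from Example 2; the groups have different sizes
small  = [48, 50, 63, 72]
medium = [47, 61, 63, 47, 43, 59, 58]
large  = [46, 57, 57, 55, 32, 59]

f_stat, p_value = f_oneway(small, medium, large)
print(f_stat, p_value)   # F is well below the critical value, p > 0.05: size is not significant
```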

One-way ANOVA in MS Excel

One-way analysis of variance can be carried out using the MS Excel procedure One-way ANOVA. We use it to analyze data on the relationship between the type of raw materials used and the profit of the enterprise from example 1.

In the MS Excel menu, execute the command Tools/Data Analysis and select the analysis tool One-way ANOVA.

In the Input interval window, indicate the data area (in our case it is $A$2:$E$7). We indicate how the factor is grouped - by columns or by rows (in our case, by rows). If the first column contains the names of the factor classes, check the box Labels in the first column. In the Alpha window, indicate the significance level α = 0.05.

The first table - Summary - shows, for each gradation class, the number of observations, the sum, the average and the variance. The second table - Analysis of Variance - contains data on the variation between groups, within groups and in total: the sum of squared deviations (SS), the number of degrees of freedom (df) and the variance (MS). The last three columns contain the actual value of the Fisher ratio (F), the p-level (P-value) and the critical value of the Fisher ratio (F crit).

Source of variation   SS      df   MS         F          P-value    F crit
Between groups        2.929   5    0.58585    6.891119   0.000936   2.77285
Within groups         1.530   18   0.085017
Total                 4.460   23

Since the actual value of the Fisher ratio (6.89) is greater than the critical one (2.77), with a probability of 95% we reject the null hypothesis about the equality of the average profit when using all types of raw materials, that is, we conclude that the type of raw materials used affects the profit of the enterprise.

Two-factor analysis of variance without repetition: the essence of the method, formulas, example

Two-factor analysis of variance is used to check the possible dependence of the resulting characteristic on two factors - A and B. Here a is the number of gradations of factor A and b is the number of gradations of factor B. In the statistical complex, the total sum of squared deviations is divided into three components:

SS = SSa + SSb + SSe,

where

SS = Σi Σj (xij − x̄)² is the total sum of squared deviations,

SSa = b·Σi (x̄i. − x̄)² is the sum of squared deviations explained by the influence of factor A,

SSb = a·Σj (x̄.j − x̄)² is the sum of squared deviations explained by the influence of factor B,

SSe = SS − SSa − SSb is the unexplained sum of squared deviations, or sum of squared error deviations,

x̄ is the general average of the observations,

x̄i. is the average of the observations in each gradation of factor A,

x̄.j is the average of the observations in each gradation of factor B.

Variances are calculated as follows:

σa² = SSa/va is the variance explained by the influence of factor A,

σb² = SSb/vb is the variance explained by the influence of factor B,

σe² = SSe/ve is the unexplained variance (error variance),

va = a − 1 is the number of degrees of freedom of the variance explained by the influence of factor A,

vb = b − 1 is the number of degrees of freedom of the variance explained by the influence of factor B,

ve = (a − 1)(b − 1) is the number of degrees of freedom of the unexplained variance,

v = ab − 1 is the total number of degrees of freedom.

If the factors do not depend on each other, then to determine the significance of the factors, two null hypotheses and corresponding alternative hypotheses are put forward:

for factor A:

H0: μ1A = μ2A = ... = μaA,

H1: not all μiA are equal;

for factor B:

H0: μ1B = μ2B = ... = μbB,

H1: not all μiB are equal.

To determine the influence of factor A, the actual Fisher ratio Fa = σa²/σe² is compared with the critical value of the Fisher ratio; to determine the influence of factor B, the actual Fisher ratio Fb = σb²/σe² is compared with the critical value of the Fisher ratio.

If the actual Fisher ratio is greater than the critical one, the corresponding null hypothesis is rejected at the significance level α: the factor significantly influences the data with probability P = 1 − α. If the actual Fisher ratio is less than the critical one, the null hypothesis cannot be rejected: the factor does not significantly influence the data with probability P = 1 − α.
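A minimal sketch of these calculations in Python (an illustration added to this text; the 3x3 matrix below is placeholder data, not the fuel-consumption table of Example 3, and SciPy is assumed only for the critical values):

```python
import numpy as np
from scipy.stats import f

def two_way_anova_no_rep(x, alpha=0.05):
    """x is an a-by-b matrix: rows are gradations of factor A, columns are gradations of factor B."""
    x = np.asarray(x, dtype=float)
    a, b = x.shape
    grand = x.mean()
    ss_a = b * ((x.mean(axis=1) - grand) ** 2).sum()   # explained by factor A
    ss_b = a * ((x.mean(axis=0) - grand) ** 2).sum()   # explained by factor B
    ss_total = ((x - grand) ** 2).sum()
    ss_e = ss_total - ss_a - ss_b                      # unexplained (error)

    va, vb, ve = a - 1, b - 1, (a - 1) * (b - 1)
    f_a = (ss_a / va) / (ss_e / ve)                    # actual Fisher ratio for factor A
    f_b = (ss_b / vb) / (ss_e / ve)                    # actual Fisher ratio for factor B
    return (f_a, f.ppf(1 - alpha, va, ve)), (f_b, f.ppf(1 - alpha, vb, ve))

# placeholder 3x3 table: rows = engine size classes, columns = fuel types
print(two_way_anova_no_rep([[7.5, 8.2, 8.0],
                            [8.1, 9.0, 8.6],
                            [9.2, 10.1, 9.8]]))
```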

Two-way ANOVA without repetitions: an example

Example 3. Information is given on average fuel consumption per 100 kilometers in liters depending on engine size and type of fuel.

It is necessary to check whether fuel consumption depends on engine size and type of fuel.

Solution. For factor A the number of gradation classes is a = 3; for factor B the number of gradation classes is b = 3.

We calculate the sums of squared deviations:

SSa ≈ 6.26,

SSb ≈ 16.09,

SSe ≈ 2.37,

SS ≈ 24.72.

Corresponding variances:

σa² = SSa/va ≈ 6.26/2 ≈ 3.13,

σb² = SSb/vb ≈ 16.09/2 ≈ 8.04,

σe² = SSe/ve ≈ 2.37/4 ≈ 0.59.

The actual Fisher ratio for factor A is Fa = σa²/σe² ≈ 3.13/0.59 ≈ 5.28; the critical value of the Fisher ratio at α = 0.05 with va = 2 and ve = 4 degrees of freedom is Fcrit ≈ 6.94. Since the actual Fisher ratio is less than the critical one, with a probability of 95% we accept the hypothesis that engine size does not affect fuel consumption. However, if we choose the significance level α = 0.1, the critical value becomes Fcrit ≈ 4.32; the actual value of the Fisher ratio then exceeds it, and with a probability of 90% we can accept that engine size affects fuel consumption.

The actual Fisher ratio for factor B is Fb = σb²/σe² ≈ 8.04/0.59 ≈ 13.56; the critical value of the Fisher ratio is Fcrit ≈ 6.94. Since the actual Fisher ratio is greater than the critical value, we accept with 95% probability that the type of fuel affects its consumption.

Two-way ANOVA without repetitions in MS Excel

Two-factor analysis of variance without repetitions can be carried out using the MS Excel procedure. We use it to analyze data on the relationship between the type of fuel and its consumption from example 3.

In the MS Excel menu, execute the command Tools/Data Analysis and select the analysis tool Two-way ANOVA without repetitions.

We fill in the data in the same way as in the case of one-way analysis of variance.


As a result of the procedure, two tables are displayed. The first table is Totals. It contains data on all classes of factor gradation: number of observations, total value, mean value and variance.

The second table - Analysis of Variance - contains data on the sources of variation: dispersion between rows, dispersion between columns, error dispersion, total dispersion, sum of squared deviations (SS), degrees of freedom (df), dispersion (MS). The last three columns contain the actual value of the Fisher ratio (F), p-level (P-value) and the critical value of the Fisher ratio (F crit).

Source of variation   SS      df   MS         F          P-value    F crit
Rows (factor A)       6.26    2    3.13       5.275281   0.075572   6.944276
Columns (factor B)    16.09   2    8.043333   13.55618   0.016529   6.944276
Error                 2.37    4    0.593333
Total                 24.72   8

Factor A (engine displacement) is grouped in rows. Since the actual Fisher ratio of 5.28 is less than the critical one of 6.94, we accept with 95% probability that fuel consumption does not depend on engine size.

Factor B (type of fuel) is grouped in columns. The actual Fisher ratio of 13.56 is greater than the critical one of 6.94, so we accept with 95% probability that fuel consumption depends on its type.

Two-factor analysis of variance with repetitions: the essence of the method, formulas, example

Two-factor analysis of variance with repetitions is used to check not only the possible dependence of the resulting characteristic on two factors A and B, but also the possible interaction of the factors A and B. Here a is the number of gradations of factor A, b is the number of gradations of factor B, and r is the number of repetitions. In the statistical complex, the total sum of squared deviations is divided into four components:

SS = SSa + SSb + SSab + SSe,

where

SS is the total sum of squared deviations,

SSa is the sum of squared deviations explained by the influence of factor A,

SSb is the sum of squared deviations explained by the influence of factor B,

SSab is the sum of squared deviations explained by the interaction of factors A and B,

SSe is the unexplained sum of squared deviations, or sum of squared error deviations,

x̄ is the general average of the observations,

x̄i.. is the average of the observations in each gradation of factor A,

x̄.j. is the average of the observations in each gradation of factor B,

x̄ij. is the average of the observations in each combination of gradations of factors A and B,

n = abr is the total number of observations.

Variances are calculated as follows:

σa² = SSa/va is the variance explained by the influence of factor A,

σb² = SSb/vb is the variance explained by the influence of factor B,

σab² = SSab/vab is the variance explained by the interaction of factors A and B,

σe² = SSe/ve is the unexplained variance, or error variance,

va = a − 1 is the number of degrees of freedom of the variance explained by the influence of factor A,

vb = b − 1 is the number of degrees of freedom of the variance explained by the influence of factor B,

vab = (a − 1)(b − 1) is the number of degrees of freedom of the variance explained by the interaction of factors A and B,

ve = ab(r − 1) is the number of degrees of freedom of the unexplained variance, or error variance,

v = abr − 1 is the total number of degrees of freedom.

If the factors do not depend on each other, then to determine the significance of the factors and of their interaction, three null hypotheses and the corresponding alternative hypotheses are put forward:

for factor A:

H0: μ1A = μ2A = ... = μaA,

H1: not all μiA are equal;

for factor B:

H0: μ1B = μ2B = ... = μbB,

H1: not all μiB are equal;

for the interaction of factors A and B:

H0: the interaction of factors A and B has no effect on the resulting characteristic,

H1: the interaction of factors A and B has an effect.

To determine the influence of factor A, of factor B, and of the interaction of factors A and B, the corresponding actual Fisher ratios Fa = σa²/σe², Fb = σb²/σe², Fab = σab²/σe² are compared with the critical value of the Fisher ratio.

If the actual Fisher ratio is greater than the critical Fisher ratio, then the null hypothesis should be rejected at the significance level α . This means that the factor significantly influences the data: the data depends on the factor with probability P = 1 − α .

If the actual Fisher ratio is less than the critical Fisher ratio, then the null hypothesis cannot be rejected at the significance level α. This means that the factor does not significantly influence the data with probability P = 1 − α.
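A sketch of the same analysis in Python using the statsmodels library (an assumption of this illustration - the original article performs the analysis with the MS Excel tool; the income figures below are invented placeholders):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# illustrative data: income for 2 advertising campaigns x 3 stores x 2 repetitions
data = pd.DataFrame({
    "campaign": ["c1"] * 6 + ["c2"] * 6,
    "store":    ["s1", "s1", "s2", "s2", "s3", "s3"] * 2,
    "income":   [20, 22, 31, 29, 25, 27, 21, 23, 33, 30, 24, 26],
})

# main effects of both factors plus their interaction
model = ols("income ~ C(campaign) * C(store)", data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)   # SS, df, F and p for A, B and A:B
print(anova_table)
```

Each row of the resulting table is interpreted exactly as above: an F value larger than the critical one (equivalently, a p-level below α) means the corresponding source of variation is significant.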

Two-way ANOVA with repetitions: an example

Example 4. Data are collected on store income depending on the advertising campaign carried out (factor A) and the specific store (factor B), with several repeated observations for each combination of gradations. For the interaction of factors A and B, the actual Fisher ratio is less than the critical one; therefore, the interaction of the advertising campaign and the specific store is not significant.

Two-way ANOVA with repetitions in MS Excel

Two-way analysis of variance with replicates can be performed using the MS Excel procedure. We use it to analyze data on the relationship between store income and the choice of a specific store and the advertising campaign from example 4.

In the MS Excel menu, execute the command Tools/Data Analysis and select the analysis tool Two-way ANOVA with repetitions.

We fill in the data in the same way as in the case of two-factor analysis of variance without repetitions, with the addition that in the Rows per sample field you need to enter the number of repetitions.

As a result of the procedure, two tables are displayed. The first table consists of three parts: the first two correspond to each of the two advertising campaigns, the third contains data about both advertising campaigns. The columns of the table contain information about all gradation classes of the second factor - store: number of observations, total value, mean value and dispersion.

The second table contains data on the sum of squared deviations (SS), the number of degrees of freedom (df), dispersion (MS), the actual value of the Fisher ratio (F), p-level (P-value) and the critical value of the Fisher ratio (F crit) for various sources of variation: two factors, which are given in rows (sample) and columns, interaction of factors, error (within) and total indicators (total).

Source of variation   MS         F          P-value    F crit
Sample (factor A)     8.013339   0.500252   0.492897   4.747221
Columns (factor B)    189.1904   11.81066   0.001462   3.88529
Interaction           6.925272   0.432327   0.658717   3.88529
Within (error)        16.01861

For factor A (the advertising campaign, given in rows), the actual Fisher ratio is less than the critical one; therefore, with a probability of 95%, store income does not depend significantly on the advertising campaign carried out.

For factor B (the store, given in columns), the actual Fisher ratio is greater than the critical one; therefore, with a probability of 95%, revenues differ significantly between stores.

For the interaction of factors A and B, the actual Fisher ratio is less than the critical one; therefore, with a probability of 95%, the interaction of the advertising campaign and the specific store is not significant.


This article discusses analysis of variance: the characteristic features of its application, the methods of variance analysis, and the conditions for its use. The need for the method is identified and justified, and the stages of classical analysis of variance are outlined.


The main purpose of analysis of variance is to examine the significance of differences between means. If you are simply comparing the means of two samples, analysis of variance will give the same result as the ordinary t-test for independent samples (if two independent groups of objects or observations are compared) or the t-test for dependent samples (if two variables are compared on the same set of objects or observations).

It may seem strange that a procedure for comparing means is called analysis of variance. In reality, this is because when we examine the statistical significance of a difference between the means of two (or more) groups, we are actually comparing (i.e., analyzing) sample variances. The fundamental concept of analysis of variance was proposed by Fisher in 1920. Perhaps a more natural term would be analysis of sums of squares or analysis of variation, but by tradition the term analysis of variance is used.

Analysis of variance is a method of mathematical statistics aimed at finding dependencies in experimental data by examining the significance of differences in average values. Unlike the t-test, it allows the average values of three or more groups to be compared. It was developed by R. Fisher for analyzing the results of experimental studies. In the literature, the designation ANOVA (ANalysis Of Variance) is also found.

When conducting market research, the question of the comparability of results often arises. For example, when conducting surveys on the consumption of a product in different regions of the country, it is necessary to draw conclusions as to what extent the survey data differ or do not differ from each other. It makes no sense to compare individual indicators, and therefore the comparison and subsequent assessment procedure is carried out using averaged values and deviations from this averaged assessment. The variation of the trait is studied, and dispersion can be taken as a measure of variation. Dispersion σ² is a measure of variation, defined as the average of the squared deviations of a characteristic from its mean.

In practice, problems of a more general nature often arise - problems of checking the significance of differences between the averages of several sample populations. For example, it is necessary to evaluate the influence of different raw materials on the quality of products, or to solve the problem of the influence of the amount of fertilizer on the agricultural yield.

Sometimes analysis of variance is used to establish the homogeneity of several populations (the variances of these populations are the same by assumption; if the analysis of variance shows that the mathematical expectations are the same, then in this sense the populations are homogeneous). Homogeneous populations can be combined into one and thereby obtain more complete information about it, and therefore more reliable conclusions.

Analysis of Variance Methods

  1. Fisher method - F test; The method is used in one-way analysis of variance, when the total variance of all observed values ​​is decomposed into variance within individual groups and variance between groups.
  2. The "general linear model" method. It is based on correlation or regression analysis used in multivariate analysis.

The one-factor dispersion model has the form: x_ij = μ + F_i + ε_ij,
where x_ij is the value of the variable under study obtained at the i-th level of the factor (i = 1, 2, ..., t) with the j-th serial number (j = 1, 2, ..., n); F_i is the effect caused by the influence of the i-th level of the factor; ε_ij is the random component, or disturbance, caused by the influence of uncontrollable factors, i.e. the variation within an individual level.

The simplest case of analysis of variance is one-way analysis for two or more independent groups, when the groups are compared on a single characteristic. During the analysis, the null hypothesis of equality of means is tested. When two groups are analyzed, analysis of variance is equivalent to the two-sample Student's t-test for independent samples, and the value of the F-statistic is equal to the square of the corresponding t-statistic.

To confirm the equality of variances, Levene's test is usually used. If the hypothesis of equality of variances is rejected, the main analysis is not applicable. If the variances are equal, then the Fisher F-test is used to estimate the ratio of between-group and within-group variability. If the F-statistic exceeds the critical value, the null hypothesis is rejected and a conclusion about the inequality of the means is made. When the means of two groups are analyzed, the results can be interpreted directly after applying the Fisher test.
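A short sketch of this sequence of checks (Levene's test for equality of variances, then the F-test) using SciPy; the library choice and the data are assumptions made for illustration only:

```python
from scipy.stats import levene, f_oneway

# illustrative measurements in three groups
group1 = [12.1, 13.4, 11.8, 12.9, 13.0]
group2 = [14.2, 13.8, 14.9, 13.5, 14.1]
group3 = [12.8, 13.1, 12.5, 13.9, 12.6]

stat, p_levene = levene(group1, group2, group3)
if p_levene < 0.05:
    print("variances differ significantly; the classical F test is not applicable")
else:
    f_stat, p_value = f_oneway(group1, group2, group3)
    print("F =", round(f_stat, 2), "p =", round(p_value, 4))
```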

Many factors. The world is complex and multidimensional in nature. Situations when a certain phenomenon is completely described by one variable are extremely rare. For example, if we are trying to learn how to grow large tomatoes, we should consider factors related to the plant's genetic structure, the soil type, light, temperature, etc. Thus, when conducting a typical experiment, one has to deal with a large number of factors. The main reason why using ANOVA is preferable to repeated comparison of two samples at different factor levels using a series of t-tests is that analysis of variance is significantly more efficient and, for small samples, more informative. Some effort is needed to master the ANOVA technique implemented in STATISTICA and to experience its full benefits in specific studies.

The two-factor variance model has the form:

x_ijk = μ + F_i + G_j + I_ij + ε_ijk,

where x_ijk is the value of the observation in cell ij with number k; μ is the overall average; F_i is the effect caused by the influence of the i-th level of factor A; G_j is the effect caused by the influence of the j-th level of factor B; I_ij is the effect caused by the interaction of the two factors, i.e. the deviation of the average of the observations in cell ij from the sum of the first three terms of the model; ε_ijk is the disturbance caused by the variation of the variable within a single cell. It is assumed that ε_ijk has a normal distribution law N(0; σ²) and that the averages over the marked indices F., G., Ii., I.j are all equal to zero.

There are conditions for using variance analysis:

  1. The objective of the study is to determine the strength of the influence of one (up to 3) factors on the result or to determine the strength of the combined influence of various factors (gender and age, physical activity and nutrition, etc.).
  2. The factors being studied must be independent (unrelated) to each other. For example, it is impossible to study the joint influence of work experience and age, height and weight of children, etc. on the morbidity of the population.
  3. The selection of groups for the study is carried out randomly (random selection). The organization of a dispersion complex with the implementation of the principle of randomness in the selection of options is called randomization (translated from English - random), i.e. chosen at random.
  4. Both quantitative and qualitative (attributive) characteristics can be used.

When conducting one-way analysis of variance, the following conditions of applicability are assumed:

  1. Normality of distribution of analyzed groups or correspondence of sample groups to general populations with normal distribution.
  2. Independence (not relatedness) of the distribution of observations in groups.
  3. Availability of frequency (repetition) of observations.

The normality of the distribution is described by the Gauss (de Moivre) curve, which can be written as a function y = f(x); it is one of the distribution laws used for the approximate description of phenomena that are random and probabilistic in nature. The subject of biomedical research is probabilistic phenomena, and the normal distribution is found quite often in such research.

Classical analysis of variance is carried out in the following stages:

  1. Construction of a dispersion complex.
  2. Calculation of average squared deviations.
  3. Calculation of variance.
  4. Comparison of factor and residual variances.
  5. Evaluation of the results using the theoretical values of the Fisher-Snedecor distribution.

Modern applications of analysis of variance cover a wide range of problems in economics, biology and technology and are usually interpreted in terms of the statistical theory of identifying systematic differences between the results of direct measurements made under certain changing conditions.

Thanks to the automation of variance analysis, a researcher can conduct various statistical studies using a computer, spending less time and effort on data calculations. Currently, there are many application software packages that implement the apparatus of variance analysis; the most common software products are MS Excel, Statistica, Stadia and SPSS.

Most statistical methods are implemented in modern statistical software products. With the development of algorithmic programming languages, it became possible to create additional blocks for processing statistical data.

Analysis of variance is a powerful modern statistical method for processing and analyzing experimental data in psychology, biology, medicine and other sciences. It is very closely related to the specific methodology for designing and conducting experimental research.

Analysis of variance is used in all areas of scientific research where it is necessary to analyze the influence of various factors on the variable under study.


Analysis of variance

1. Concept of analysis of variance

Analysis of variance is the analysis of the variability of a trait under the influence of controlled variable factors. In foreign literature, analysis of variance is often referred to as ANOVA (Analysis Of Variance).

The task of ANOVA is to isolate, from the overall variability of the trait, variability of different kinds:

a) variability due to the action of each of the independent variables under study;

b) variability due to the interaction of the independent variables being studied;

c) random variability due to all other unknown variables.

The variability due to the action of the variables under study and their interaction is related to the random variability. An indicator of this relationship is Fisher's F test.

The formula for calculating the F criterion includes estimates of variances, that is, the distribution parameters of the characteristic, therefore the F criterion is a parametric criterion.

The more the variability of the trait is due to the variables (factors) under study or their interaction, the higher the empirical values of the criterion.

The null hypothesis in the analysis of variance states that the average values of the studied effective characteristic are the same in all gradations.

The alternative hypothesis states that the average values of the resulting characteristic in different gradations of the factor under study are different.

Analysis of variance allows us to state that a characteristic changes, but it does not indicate the direction of these changes.

Let's begin our consideration of variance analysis with the simplest case, when we study the action of only one variable (one factor).

2. One-way analysis of variance for unrelated samples

2.1. Purpose of the method

The method of one-factor analysis of variance is used when changes in an effective characteristic under the influence of changing conditions or gradations of a factor are studied. In this version of the method, each of the gradations of the factor corresponds to a different sample of subjects. There must be at least three gradations of the factor. (There may be two gradations, but in this case we will not be able to establish nonlinear dependencies, and it seems more reasonable to use simpler methods.)

A nonparametric version of this type of analysis is the Kruskal-Wallis H test.
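A sketch of this nonparametric alternative using SciPy (the library and the ordinal scores below are illustrative assumptions, not part of the original text):

```python
from scipy.stats import kruskal

# three gradations of the factor, illustrative ordinal scores
low    = [4, 5, 6, 5, 4, 6]
medium = [6, 7, 7, 8, 6, 7]
high   = [8, 9, 8, 7, 9, 9]

h_stat, p_value = kruskal(low, medium, high)
print(h_stat, p_value)   # a small p-value indicates that the gradations differ in location
```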

Hypotheses

H 0: Differences between factor grades (different conditions) are no greater than random differences within each group.

H 1: Differences between factor grades (different conditions) are greater than random differences within each group.

2.2. Limitations of One-Way Analysis of Variance for Unrelated Samples

1. One-way analysis of variance requires at least three gradations of the factor and at least two subjects in each gradation.

2. The resulting characteristic must be normally distributed in the sample under study.

True, it is usually not indicated whether we are talking about the distribution of the characteristic in the entire surveyed sample or in that part of it that makes up the dispersion complex.

3. An example of solving a problem using the method of one-way analysis of variance for unrelated samples.

Three different groups of six subjects were given lists of ten words. The words were presented to the first group at a low speed (1 word per 5 seconds), to the second group at a medium speed (1 word per 2 seconds), and to the third group at a high speed (1 word per second). Recall performance was predicted to depend on the speed of word presentation. The results are presented in Table 1.

Table 1. Number of words reproduced by each subject at the low, medium and high presentation speeds, with group totals.

H0: Differences in the volume of word reproduction between groups are no more pronounced than the random differences within each group.

H1: Differences in the volume of word reproduction between groups are more pronounced than the random differences within each group.

Using the experimental values presented in Table 1, we establish the quantities needed to calculate the F criterion.

The calculation of the main quantities for one-way analysis of variance is presented in the table:

Table 2

Table 3

Sequence of operations in one-way analysis of variance for unrelated samples

The designation SS, often found in this and the following tables, is an abbreviation of "sum of squares". This abbreviation is most often used in translated sources.

SS fact means the variability of the characteristic due to the action of the factor under study;

SS total is the overall variability of the trait;

SS random is the variability due to unaccounted factors, the "random" or "residual" variability;

MS is the "mean square", the average value of the corresponding SS, i.e. SS divided by its number of degrees of freedom;

df is the number of degrees of freedom, which, when considering nonparametric criteria, we denoted by the Greek letter ν.

Conclusion: H 0 is rejected. H 1 is accepted. Differences in word recall between groups were greater than random differences within each group (α=0.05). So, the speed of presentation of words affects the volume of their reproduction.

An example of solving the problem in Excel is presented below:

Initial data:

Using the command: Tools->Data Analysis->One-way ANOVA, we get the following results:

Topic 13. Analysis of variance

Lecture 1. Questions:

1. Concept and models of variance analysis.

2. One-way analysis of variance.

Analysis of variance as a research method appeared in the works of R. Fisher (1918-1935) in connection with research in agriculture aimed at identifying the conditions under which a tested variety of agricultural crop gives the maximum yield. Analysis of variance was further developed in the works of Yates. It makes it possible to answer the question of whether certain factors have a significant influence on the variability of a trait whose values can be obtained as a result of experiment. When testing statistical hypotheses, the variations of the studied quantities are assumed to be random. In analysis of variance, one or more factors are changed in a prescribed way, and these changes can affect the results of observations. The study of such influence is the purpose of analysis of variance.

Currently, there is an increasingly widespread use of variance analysis in economics, sociology, biology, etc., especially after the advent of software that eliminated the problems of the cumbersomeness of statistical calculations.

In practical activities, in various fields of science, we are often faced with the need to evaluate the influence of various factors on certain indicators. Often these factors are of a qualitative nature (for example, a qualitative factor influencing the economic effect may be the introduction of a new production management system) and then analysis of variance acquires particular value, since it becomes the only statistical method of research that gives such an assessment.

Analysis of variance makes it possible to determine whether one or another of the factors under consideration has a significant impact on the variability of a trait, as well as to quantify the “specific weight” of each source of variability in their totality. But analysis of variance allows us to give a positive answer only about the presence of a significant influence, otherwise the question remains open and requires additional research (most often, an increase in the number of experiments).

The following terms are used in analysis of variance.

Factor (X) is something that we believe should influence the result (resultative attribute) Y.

Factor level (or processing method; sometimes literally, for example, the method of soil cultivation) - the values X_i (i = 1, 2, ..., I) that the factor can take.

Response – the value of the measured characteristic (result value Y).

The ANOVA technique varies depending on the number of independent factors being studied. If the factors causing variability in the average value of the characteristic belong to one source, we have a simple grouping, or one-factor analysis of variance; with a double grouping, two-factor analysis of variance; and further three-factor, ..., m-factor analysis. Factors in multivariate analysis are usually denoted by Latin letters: A, B, C, etc.



The task of variance analysis is to study the influence of certain factors (or levels of factors) on the variability of the average values ​​of observed random variables.

The essence of variance analysis. Analysis of variance consists in isolating and assessing the individual factors that cause variability. For this purpose, the total variance of the observed population (the total variance of the trait), caused by all sources of variability, is decomposed into variance components generated by the independent factors. Each of these components provides an estimate of the variance σ1², σ2², ..., caused by a particular source of variability, in the overall population. To test the significance of these component variance estimates, they are compared with the estimate of the random (residual) variance (Fisher's test).

For example, in two-factor analysis we get a decomposition of the form:

C = C_A + C_B + C_AB + C_ε,

where

C is the total variance of the studied trait;

C_A is the share of variance caused by the influence of factor A;

C_B is the share of variance caused by the influence of factor B;

C_AB is the share of variance caused by the interaction of factors A and B;

C_ε is the share of variance caused by unaccounted random causes (random variance).

In analysis of variance, the hypothesis H0 is considered: none of the factors under consideration has an effect on the variability of the trait. The significance of each variance estimate is checked by the value of its ratio to the estimate of the random variance and compared with the corresponding critical value, at significance level α, using tables of critical values of the Fisher-Snedecor F distribution (Appendix 4). The hypothesis H0 regarding a particular source of variability is rejected if F_calc > F_cr (for example, for factor B: S_B²/S_ε² > F_cr).

Variance analysis considers experiments of 3 types:

a) experiments in which all factors have systematic (fixed) levels;

b) experiments in which all factors have random levels;

c) experiments in which there are factors that have random levels, as well as factors that have fixed levels.

Cases a), b), c) correspond to three models that are considered in analysis of variance.

The initial data for analysis of variance is usually presented in the form of the following table:

Observation number j | Factor levels
                     | A1     A2     ...    Ap
1                    | x11    x21    ...    xp1
2                    | x12    x22    ...    xp2
3                    | x13    x23    ...    xp3
...                  | ...    ...    ...    ...
n                    | x1n    x2n    ...    xpn
TOTALS               |

Consider a single factor that takes p different levels, and assume that at each level n observations are made, giving N = np observations in total. (We will limit ourselves to considering the first model of variance analysis - all factors have fixed levels.)

Let the results be presented in the form x_ij (i = 1, 2, ..., p; j = 1, 2, ..., n).

It is assumed that each observation is the sum of the overall average, the effect of the selected factor level, and a random variation:

x_ij = m + A_i + e_ij,

where m is the overall average;

A_i is the effect caused by the i-th level of the factor;

e_ij is the variation of the results within an individual factor level. The term e_ij takes into account all uncontrollable factors.

Let observations at a fixed factor level be normally distributed around the mean m + A i with a common variance s 2 .

Then (the dot instead of the index denotes the averaging of the corresponding observations over this index):

X_ij − X.. = (X_i. − X..) + (X_ij − X_i.).     (12.3)

After squaring both sides of the equation and summing over i and j, we get

Σ_i Σ_j (X_ij − X..)² = n Σ_i (X_i. − X..)² + Σ_i Σ_j (X_ij − X_i.)²,

since the cross term vanishes: Σ_j (X_ij − X_i.) = 0 for every i.

Otherwise, the sum of squares can be written S = S1 + S2. The value S1 is calculated from the deviations of the p averages from the overall average X.., therefore S1 has (p − 1) degrees of freedom. The value S2 is calculated from the deviations of the N observations from the p sample means and therefore has N − p = np − p = p(n − 1) degrees of freedom. S has (N − 1) degrees of freedom. Based on the calculation results, a variance analysis table is constructed.

ANOVA table

If the hypothesis that the influence of all levels is equal is true, then both M 1 and M 2 (mean squares) will be unbiased estimates of s 2. This means that the hypothesis can be tested by calculating the ratio (M 1 / M 2) and comparing it with F cr. with ν 1 = (p-1) and ν 2 = (N-p) degrees of freedom.

If F_calc > F_cr, then the hypothesis of an insignificant influence of factor A on the result of the observations is rejected.

To assess the significance of differences between individual means when F_calc > F_table, the following are calculated:

a) the experimental error: s_e = √M_2 (the square root of the residual mean square);

b) the error of the difference of two means: s_d = √(2·M_2/n);

c) the least significant difference: LSD = t_cr · s_d, where t_cr is the critical value of Student's t for the chosen significance level and the degrees of freedom of the residual variance.

Comparing the differences between the average values of the variants with the LSD, one concludes whether the differences between the averages are significant.
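A minimal sketch of this least-significant-difference calculation in Python (SciPy is assumed; the numbers are illustrative placeholders, equal group sizes are assumed):

```python
import numpy as np
from scipy.stats import t

def lsd(ms_error, df_error, n_per_group, alpha=0.05):
    """Least significant difference for comparing two group means (equal group sizes assumed)."""
    s_d = np.sqrt(2.0 * ms_error / n_per_group)      # error of the difference of two means
    t_crit = t.ppf(1 - alpha / 2, df_error)          # two-sided critical value of Student's t
    return t_crit * s_d

# illustrative numbers: residual mean square 0.085 with 18 degrees of freedom, 4 observations per group
print(lsd(ms_error=0.085, df_error=18, n_per_group=4))
```

Any pair of group means differing by more than this value would be considered significantly different.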

Comment. The use of analysis of variance assumes that:

1) M(ε_ij) = 0,

2) D(ε_ij) = σ² = const,

3) ε_ij ~ N(0, σ), or x_ij ~ N(a, σ).

Analysis of variance

Coursework in the discipline: “Systems analysis”

Performer student gr. 99 ISE-2 Zhbanov V.V.

Orenburg State University

Faculty of Information Technology

Department of Applied Informatics

Orenburg-2003

Introduction

Purpose of the work: to get acquainted with such a statistical method as analysis of variance.

Dispersion analysis (from the Latin dispersio - scattering) is a statistical method that allows one to analyze the influence of various factors on the variable under study. The method was developed by the biologist R. Fisher in 1925 and was originally used to evaluate experiments in crop production. Subsequently, the general scientific significance of analysis of variance for experiments in psychology, pedagogy, medicine, etc. became clear.

The purpose of analysis of variance is to test the significance of differences between means by comparing variances. The variance of the measured characteristic is decomposed into independent terms, each of which characterizes the influence of a particular factor or their interaction. Subsequent comparison of such terms allows us to assess the significance of each factor under study, as well as their combination /1/.

If the null hypothesis (that the means are equal in several groups of observations selected from the population) is true, the estimate of the variance associated with within-group variability should be close to the estimate of between-group variance.

When conducting market research, the question of comparability of results often arises. For example, when conducting surveys on the consumption of a product in different regions of the country, it is necessary to draw conclusions to what extent the survey data differ or do not differ from each other. It makes no sense to compare individual indicators, and therefore the comparison and subsequent assessment procedure is carried out using some averaged values ​​and deviations from this averaged assessment. Variation of the trait is studied. Dispersion can be taken as a measure of variation. Dispersion σ 2 is a measure of variation, defined as the average of the deviations of a characteristic squared.

In practice, problems of a more general nature often arise - the problem of checking the significance of differences in the averages of several sample populations. For example, it is necessary to assess the influence of various raw materials on the quality of manufactured products, to solve the problem of the influence of the amount of fertilizers on the yield of agricultural products.

Sometimes analysis of variance is used to establish the homogeneity of several populations (the variances of these populations are the same by assumption; if the analysis of variance shows that the mathematical expectations are the same, then in this sense the populations are homogeneous). Homogeneous populations can be combined into one and thereby obtain more complete information about it, and therefore more reliable conclusions /2/.

1 Analysis of variance

1.1 Basic concepts of analysis of variance

In the process of observing the object under study, the qualitative factors change arbitrarily or in a prescribed way. The specific realization of a factor (for example, a certain temperature regime, a selected piece of equipment or material) is called the factor level or processing method. An analysis of variance model with fixed levels of factors is called model I; a model with random factors is called model II. By varying the factor, it is possible to study its influence on the magnitude of the response. Currently, the general theory of analysis of variance has been developed for model I.

Depending on the number of factors that determine the variation of the resulting characteristic, analysis of variance is divided into single-factor and multifactor.

The main schemes for organizing source data with two or more factors are:

Cross-classification, characteristic of model I, in which each level of one factor is combined, when planning the experiment, with each gradation of the other factor;

Hierarchical (cluster) classification, characteristic of model II, in which each randomly selected value of one factor corresponds to its own subset of values of the second factor.

If the dependence of the response on qualitative and quantitative factors is simultaneously studied, i.e. factors of a mixed nature, then covariance analysis is used /3/.

Thus, these models differ in the way they select factor levels, which obviously primarily affects the possibility of generalizing the experimental results obtained. For ANOVA of single-factor experiments, the difference between these two models is not so significant, but in multivariate ANOVA it can be quite important.

When conducting analysis of variance, the following statistical assumptions must be met: regardless of the level of the factor, the response values ​​have a normal (Gaussian) distribution law and the same variance. This equality of variances is called homogeneity. Thus, a change in the processing method affects only the position of the random response variable, which is characterized by the average value or median. Therefore, all response observations belong to the shift family of normal distributions.

The ANOVA technique is said to be “robust.” This term, used by statisticians, means that given assumptions may be violated to some extent, but the technique can still be used.

When the law of distribution of response values ​​is unknown, nonparametric (most often rank) analysis methods are used.

Analysis of variance is based on dividing the variance into parts, or components. The variation due to the influence of the factor underlying the grouping is characterized by the between-group variance δ². It is a measure of the variation of the partial (group) averages around the general average and is determined by the formula:

δ² = Σ_j n_j (x̄_j − x̄)² / Σ_j n_j,

where k is the number of groups;

n_j is the number of units in the j-th group;

x̄_j is the partial average for the j-th group;

x̄ is the overall average for the whole set of units.

The variation due to the influence of other factors is characterized in each group by the within-group variance σ_j²:

σ_j² = Σ_i (x_ij − x̄_j)² / n_j.

Between the total variance σ0², the average of the within-group variances σ̄² = Σ_j n_j σ_j² / Σ_j n_j and the between-group variance δ² there is the relationship (the rule of addition of variances):

σ0² = σ̄² + δ².
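A numerical sketch of this rule of addition of variances in Python (NumPy is an assumption of this illustration; the three groups are arbitrary example data):

```python
import numpy as np

groups = [np.array([48., 50., 63., 72.]),
          np.array([47., 61., 63., 47., 43., 59., 58.]),
          np.array([46., 57., 57., 55., 32., 59.])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
n = len(all_values)

total_var = ((all_values - grand_mean) ** 2).sum() / n                        # sigma_0^2
between_var = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / n  # delta^2
avg_within_var = sum(((g - g.mean()) ** 2).sum() for g in groups) / n         # average of sigma_j^2

print(total_var, between_var + avg_within_var)   # the two numbers coincide
```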

1.2 One-way analysis of variance

The one-factor variance model has the form:

x_ij = μ + F_i + ε_ij , (1)

where x ij is the value of the variable under study, obtained at the i-th level of the factor (i=1,2,...,t) with the j-th serial number (j=1,2,...,n);

F i – effect caused by the influence of the i-th level of the factor;

ε ij – random component, or disturbance caused by the influence of uncontrollable factors, i.e. variation within a particular level.

Basic prerequisites for analysis of variance:

The mathematical expectation of the disturbance ε ij is equal to zero for any i, i.e.

M(ε ij) = 0; (2)

The disturbances ε ij are mutually independent;

The variance of the variable x ij (or disturbance ε ij) is constant for

any i, j, i.e.

D(ε ij) = σ 2 ; (3)

The variable x ij (or the disturbance ε ij) has a normal law

distribution N(0;σ 2).

The influence of factor levels can be either fixed (systematic; model I) or random (model II).

Suppose, for example, it is necessary to find out whether there are significant differences between batches of products in terms of some quality indicator, i.e. to check the influence on quality of one factor - the batch of products. If we include all batches of raw materials in the study, then the influence of the level of such a factor is systematic (model I), and the conclusions obtained are applicable only to the individual batches that were involved in the study. If we include only a randomly selected part of the batches, then the influence of the factor is random (model II). In multifactor complexes, a mixed model III is possible, in which some factors have random levels while others have fixed levels.


