Examples of calculating variance. Calculation of group, intergroup and total variance (according to the rule of adding variances)

Network topology refers to the physical or electrical configuration of the network's cabling and connections.

In describing the topology of networks, several specialized terms are used: network node - a computer or network switching device; network branch - a path connecting two adjacent nodes; terminal node - a node located at the end of only one branch; intermediate node - a node located at the ends of more than one branch; adjacent nodes are nodes connected by at least one path that does not contain any other nodes.
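The terms above can be illustrated with a minimal sketch. The node names and the small star-shaped network below are hypothetical and serve only to show how terminal and intermediate nodes are told apart by counting the branches that end at each node.

```python
# Hypothetical network: each branch is a pair of adjacent nodes.
branches = [("hub", "pc1"), ("hub", "pc2"), ("hub", "pc3")]

# Count how many branches end at each node.
degree = {}
for a, b in branches:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

# A terminal node sits at the end of exactly one branch;
# an intermediate node sits at the ends of more than one branch.
terminal = sorted(node for node, d in degree.items() if d == 1)
intermediate = sorted(node for node, d in degree.items() if d > 1)

print("terminal:", terminal)
print("intermediate:", intermediate)
```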

There are only 5 main types of network topologies:

1. “Shared Bus” topology. In this case, computers are connected and exchange data over a common communication channel, called a shared bus. A shared bus is a very common topology for local area networks. Transmitted information can propagate in both directions. Using a common bus reduces wiring costs and unifies the connection of various modules. The main advantages of this scheme are its low cost and the ease of running cable throughout the premises. Its most serious disadvantage is low reliability: any defect in the cable or in any of the numerous connectors completely paralyzes the entire network. Another disadvantage is low performance, since with this connection method only one computer at a time can transmit data to the network. The bandwidth of the communication channel is therefore always shared among all network nodes.

2. Star topology. In this case, each computer is connected by a separate cable to a common device, called a hub, which is located at the center of the network.

The function of a hub is to direct information transmitted by a computer to one or all other computers on the network. The main advantage of this topology over a common bus is greater reliability. Any problems with the cable affect only the computer to which this cable is connected, and only a malfunction of the hub can bring down the entire network. In addition, the hub can play the role of an intelligent filter of information coming from nodes on the network and, if necessary, block transmissions prohibited by the administrator. The disadvantages of a star topology include the higher cost of network equipment due to the need to purchase a hub. In addition, the ability to increase the number of nodes in the network is limited by the number of hub ports. Currently, a hierarchical star is the most common type of connection topology in both local and global networks.

3. “Ring” topology. In networks with a ring topology, data in the network is transmitted sequentially from one station to another along the ring, usually in one direction:

If the computer recognizes the data as intended for it, then it copies it to its internal buffer. In a network with a ring topology, it is necessary to take special measures so that in the event of a failure or disconnection of any station, the communication channel between the remaining stations is not interrupted. The advantage of this topology is ease of management, the disadvantage is the possibility of failure of the entire network if there is a failure in the channel between two nodes.

4. Mesh topology. The mesh topology is characterized by a computer connection scheme in which physical communication lines are established with all nearby computers:

In a network with a mesh topology, only those computers between which intensive data exchange occurs are directly connected; computers that are not directly connected exchange data through transit transmissions via intermediate nodes. The mesh topology allows a large number of computers to be connected and is typical of global networks. Its advantages are resistance to failures and overloads, because there are several routes that bypass individual nodes.

5. Mixed topology. While small networks usually have a standard star, ring, or bus topology, large networks tend to have arbitrary connections between computers. In such networks, individual subnetworks with a standard topology can be identified, which is why they are called networks with mixed topology.

The variance (dispersion) of a random variable is a measure of the spread of its values. Low variance means that the values are clustered close together; high variance indicates a wide spread. The concept of variance is used in statistics. For example, if you compare the variance of two groups of values (such as male and female patients), you can test the significance of a variable. Variance is also used when building statistical models, since low variance can be a sign that you are overfitting the values.

Steps

Calculating sample variance

Record the sample values. In most cases, statisticians only have access to samples of specific populations. For example, statisticians generally do not analyze the maintenance costs of every car in Russia; they analyze a random sample of several thousand cars. Such a sample will help estimate the average cost of a car, but the resulting value will most likely not match the true value exactly.
- For example, let’s analyze the number of buns sold in a cafe over 6 days, taken in random order. The sample looks like this: 17, 15, 23, 7, 9, 13. This is a sample, not a population, because we do not have data on buns sold for each day the cafe is open.
- If you are given a population rather than a sample of values, continue to the next section.
Write down the formula for sample variance. Variance is a measure of the spread of values of a quantity. The closer the variance is to zero, the more tightly the values are grouped together. When working with a sample of values, use the following formula to calculate variance:
- $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
- $s^2$ is the variance. Variance is measured in squared units of measurement.
- $x_i$ is each value in the sample.
- $\sum$ is the summation sign. That is, from each value $x_i$ you need to subtract $\bar{x}$, square the result, and then add up the results.
- $\bar{x}$ is the sample mean.
- $n$ is the number of values in the sample.
Calculate the sample mean. It is denoted as x̅. The sample mean is calculated as a simple arithmetic mean: add up all the values in the sample, and then divide the result by the number of values in the sample.
- In our example, add the values in the sample: 15 + 17 + 23 + 7 + 9 + 13 = 84
  Now divide the result by the number of values in the sample (in our example there are 6): 84 ÷ 6 = 14.
  Sample mean x̅ = 14.
- The sample mean is the central value around which the values in the sample are distributed. If the values in the sample cluster around the sample mean, then the variance is small; otherwise the variance is large.
Subtract the sample mean from each value in the sample. Now calculate the difference $x_i - \bar{x}$, where $x_i$ is each value in the sample. Each result shows the deviation of a particular value from the sample mean, that is, how far this value is from the sample mean.
- In our example:
  $x_1 - \bar{x} = 17 - 14 = 3$
  $x_2 - \bar{x} = 15 - 14 = 1$
  $x_3 - \bar{x} = 23 - 14 = 9$
  $x_4 - \bar{x} = 7 - 14 = -7$
  $x_5 - \bar{x} = 9 - 14 = -5$
  $x_6 - \bar{x} = 13 - 14 = -1$
- The results are easy to check, since their sum must equal zero. This follows from the definition of the mean: the negative values (distances from the mean down to smaller values) are exactly offset by the positive values (distances from the mean up to larger values).
Square each result obtained. As noted above, the sum of the differences $x_i - \bar{x}$ must be equal to zero. This means that the average deviation is always zero, which says nothing about the spread of the values. To solve this problem, square each difference $x_i - \bar{x}$. You will then get only non-negative numbers, whose sum is zero only when all the values are identical.
- In our example:
  $(x_1 - \bar{x})^2 = 3^2 = 9$
  $(x_2 - \bar{x})^2 = 1^2 = 1$
  $(x_3 - \bar{x})^2 = 9^2 = 81$
  $(x_4 - \bar{x})^2 = (-7)^2 = 49$
  $(x_5 - \bar{x})^2 = (-5)^2 = 25$
  $(x_6 - \bar{x})^2 = (-1)^2 = 1$
- You have now found the squared difference $(x_i - \bar{x})^2$ for each value in the sample.
Calculate the sum of the squared differences. That is, find the part of the formula written as $\sum (x_i - \bar{x})^2$. Here the sign $\sum$ means the sum of the squared differences over every value $x_i$ in the sample. You have already found the squared differences $(x_i - \bar{x})^2$ for each value $x_i$ in the sample; now just add these squares.
- In our example: 9 + 1 + 81 + 49 + 25 + 1 = 166 .
Divide the result by n - 1, where n is the number of values in the sample. Some time ago, to calculate sample variance, statisticians simply divided the result by n; this gives the mean of the squared deviations, which is ideal for describing the variance of that particular sample. But remember that any sample is only a small part of the population. If you take another sample and perform the same calculations, you will get a different result. As it turns out, dividing by n - 1 (rather than just n) gives a better estimate of the population variance, which is what you are interested in. Dividing by n - 1 has become standard, so it is included in the formula for sample variance.
- In our example, the sample includes 6 values, that is, n = 6.
  Sample variance: $s^2 = \frac{166}{6 - 1} = 33.2$
The difference between variance and standard deviation. Note that the formula contains an exponent, so the variance is measured in squared units of the quantity being analyzed. Such a quantity is sometimes awkward to work with; in those cases, use the standard deviation, which is the square root of the variance. That is why the sample variance is denoted $s^2$ and the sample standard deviation is denoted $s$.
- In our example, the sample standard deviation is $s = \sqrt{33.2} \approx 5.76$.
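The steps of this section can be retraced in a short Python sketch. It uses the bun sample from the text; the variable names are illustrative, and the comparison with `statistics.variance` from the standard library is included only as a cross-check.

```python
import math
import statistics

# Sample from the text: buns sold on 6 randomly chosen days.
sample = [17, 15, 23, 7, 9, 13]

n = len(sample)
mean = sum(sample) / n                      # sample mean: 84 / 6
deviations = [x - mean for x in sample]     # these always sum to zero
sum_sq = sum(d ** 2 for d in deviations)    # sum of squared differences
sample_variance = sum_sq / (n - 1)          # divide by n - 1, not n
sample_std = math.sqrt(sample_variance)     # standard deviation

print(mean, sum_sq, sample_variance, round(sample_std, 2))
# prints: 14.0 166.0 33.2 5.76

# Cross-check against the standard library (also divides by n - 1).
print(statistics.variance(sample))
```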
Calculating Population Variance
1. Analyze a set of values. The population includes all values of the quantity under consideration. For example, if you are studying the age of residents of the Leningrad region, the population includes the ages of all residents of that region. When working with a population, it is recommended to create a table and enter the population values in it. Consider the following example:
  - In a certain room there are 6 aquariums. Each aquarium contains the following number of fish:
    $x_1 = 5$
    $x_2 = 5$
    $x_3 = 8$
    $x_4 = 12$
    $x_5 = 15$
    $x_6 = 18$
2. Write down the formula for population variance. Since the population includes all values of the quantity, the formula below gives the exact value of the population variance. To distinguish population variance from sample variance (which is only an estimate), statisticians use different symbols:
  - $\sigma^2 = \sum (x_i - \mu)^2 / n$
  - $\sigma^2$ is the population variance (read as “sigma squared”). Variance is measured in squared units.
  - $x_i$ is each value in the population.
  - $\sum$ is the summation sign. That is, from each value $x_i$ you need to subtract $\mu$, square the result, and then add up the results.
  - $\mu$ is the population mean.
  - $n$ is the number of values in the population.
3. Calculate the population mean. When working with a population, its mean is denoted as μ (mu). The population mean is calculated as a simple arithmetic mean: add up all the values in the population, and then divide the result by the number of values in the population.
  - Keep in mind that averages are not always calculated as the arithmetic mean.
  - In our example, the population mean is $\mu = \frac{5 + 5 + 8 + 12 + 15 + 18}{6} = 10.5$
4. Subtract the population mean from each value in the population. The closer the difference is to zero, the closer that value is to the population mean. Find the difference between each value in the population and its mean, and you will get a first idea of how the values are distributed.
  - In our example:
    $x_1 - \mu = 5 - 10.5 = -5.5$
    $x_2 - \mu = 5 - 10.5 = -5.5$
    $x_3 - \mu = 8 - 10.5 = -2.5$
    $x_4 - \mu = 12 - 10.5 = 1.5$
    $x_5 - \mu = 15 - 10.5 = 4.5$
    $x_6 - \mu = 18 - 10.5 = 7.5$
5. Square each result obtained. The differences will be both positive and negative; if they are plotted on a number line, they will lie to the right and left of the population mean. This is not suitable for calculating variance, since positive and negative numbers cancel each other out. So square each difference to get only non-negative numbers.
  - In our example, $(x_i - \mu)^2$ for each population value (from $i = 1$ to $i = 6$):
    $(-5.5)^2 = 30.25$
    $(-5.5)^2 = 30.25$
    $(-2.5)^2 = 6.25$
    $(1.5)^2 = 2.25$
    $(4.5)^2 = 20.25$
    $(7.5)^2 = 56.25$
  - To calculate the mean of these results, find their sum and divide it by $n$: $((x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2) / n$, where $x_n$ is the last value in the population.
  - Written with the summation sign, this expression is $\sum (x_i - \mu)^2 / n$, which is the formula for the population variance.
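As a check on the steps above, here is a minimal Python sketch that applies the population formula to the aquarium data from the text. The variable names are illustrative; `statistics.pvariance` from the standard library divides by n and is used only to confirm the manual result.

```python
import statistics

# Population from the text: number of fish in each of the 6 aquariums.
fish = [5, 5, 8, 12, 15, 18]

n = len(fish)
mu = sum(fish) / n                           # population mean: 63 / 6
squared = [(x - mu) ** 2 for x in fish]      # squared deviations from mu
population_variance = sum(squared) / n       # divide by n, not n - 1

print(mu, sum(squared), population_variance)
# prints: 10.5 145.5 24.25

# Cross-check against the standard library.
print(statistics.pvariance(fish))
```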

Conversely, if is a non-negative a.e. function such that , then there is an absolutely continuous probability measure on such that it is its density.

Replacing the measure in the Lebesgue integral:

where is any Borel function that is integrable with respect to the probability measure.

Variance: types and properties

The concept of variance

Variance in statistics is defined as the mean of the squared deviations of the individual values of a characteristic from the arithmetic mean. Depending on the initial data, it is determined using the simple or the weighted variance formula:

1. Simple variance (for ungrouped data) is calculated using the formula:

2. Weighted variance (for variation series):

where n is the frequency (the number of times the value X occurs)
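A minimal sketch of both calculations, assuming the standard textbook forms: the simple variance is the mean of the squared deviations, and the weighted variance weights each squared deviation by the frequency of its value. The function names and the small data set are hypothetical.

```python
def simple_variance(values):
    """Mean of squared deviations from the arithmetic mean (ungrouped data)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def weighted_variance(values, frequencies):
    """Squared deviations weighted by frequency (variation series)."""
    total = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, frequencies)) / total


# Hypothetical data: the same four observations, ungrouped and grouped.
ungrouped = [2, 4, 4, 6]
print(simple_variance(ungrouped))                 # 2.0
print(weighted_variance([2, 4, 6], [1, 2, 1]))    # 2.0
```

Both calls describe the same observations, so the two formulas give the same result.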

An example of finding variance

Example 4. The following data is available for a group of 20 correspondence-course students. Build an interval distribution series for the characteristic, calculate its mean value, and study its variance.

Let's build an interval grouping. We determine the width (step) of the interval using the formula:

$h = (X_{\max} - X_{\min}) / n$

where $X_{\max}$ is the maximum value of the grouping characteristic; $X_{\min}$ is the minimum value of the grouping characteristic; $n$ is the number of intervals.

We take $n = 5$. The step is $h = (192 - 159) / 5 = 6.6$.

Let's create an interval grouping

For further calculations, we will build an auxiliary table:

$X'_i$ is the midpoint of the interval (for example, the midpoint of the interval 159 to 165.6 is 162.3).
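The interval boundaries and midpoints can be derived from the minimum, maximum, and number of intervals given in the example. The sketch below uses only those three numbers; the variable names are illustrative, and rounding is applied to hide floating-point noise.

```python
x_min, x_max = 159, 192   # smallest and largest height in the example
k = 5                     # number of intervals chosen in the text

h = (x_max - x_min) / k   # interval width (step)

# Interval boundaries: x_min, x_min + h, ..., x_max.
edges = [round(x_min + j * h, 1) for j in range(k + 1)]

# Midpoint of each interval, used as its representative value.
midpoints = [round((lo + hi) / 2, 2) for lo, hi in zip(edges, edges[1:])]

print(h)
print(edges)
print(midpoints)
```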

We determine the average height of students using the weighted arithmetic average formula:

Let's determine the variance using the formula:

The formula can be transformed like this:

$\sigma^2 = \overline{x^2} - (\bar{x})^2$

From this formula it follows that the variance is equal to the difference between the mean of the squares of the values and the square of their mean.
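The identity "mean of the squares minus the square of the mean" can be checked numerically. The sketch below reuses the six aquarium values from the population example earlier in the text; the variable names are illustrative.

```python
values = [5, 5, 8, 12, 15, 18]
n = len(values)

mean = sum(values) / n                              # 10.5
mean_of_squares = sum(x ** 2 for x in values) / n   # 807 / 6

# Shortcut form: mean of the squares minus the square of the mean.
variance_shortcut = mean_of_squares - mean ** 2

# Definition: mean of the squared deviations from the mean.
variance_definition = sum((x - mean) ** 2 for x in values) / n

print(mean_of_squares, variance_shortcut, variance_definition)
# prints: 134.5 24.25 24.25
```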

In variation series with equal intervals, the variance can be calculated by the method of moments, which relies on the second property of variance (dividing all values by the interval width). Calculating the variance this way is less laborious and uses the following formula:

where $i$ is the width of the interval; $A$ is a conventional zero, for which it is convenient to take the midpoint of the interval with the highest frequency; $m_1^2$ is the square of the first-order moment; $m_2$ is the second-order moment.
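A sketch of the method of moments, assuming its usual form $\sigma^2 = i^2 (m_2 - m_1^2)$, where $m_1$ and $m_2$ are the frequency-weighted means of $(x - A)/i$ and of its square. The interval midpoints follow from the height example; the frequencies are hypothetical, since the grouping table is not reproduced here.

```python
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]  # interval midpoints
freqs = [2, 5, 8, 4, 1]   # HYPOTHETICAL frequencies (20 students in total)
i = 6.6                   # interval width
A = 175.5                 # conventional zero: midpoint with the highest frequency

total = sum(freqs)

# Shifted and scaled values u = (x - A) / i.
u = [(x - A) / i for x in midpoints]

m1 = sum(uj * f for uj, f in zip(u, freqs)) / total        # first-order moment
m2 = sum(uj ** 2 * f for uj, f in zip(u, freqs)) / total   # second-order moment
variance_moments = i ** 2 * (m2 - m1 ** 2)

# Direct weighted variance, for comparison.
mean = sum(x * f for x, f in zip(midpoints, freqs)) / total
variance_direct = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / total

print(round(variance_moments, 4), round(variance_direct, 4))
```

Both results agree up to floating-point rounding, which is the point of the method: the arithmetic is done on small numbers such as -2, -1, 0, 1, 2 instead of the raw values.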

The variance of an alternative characteristic (a characteristic that takes only two mutually exclusive values within a statistical population; such variability is called alternative) can be calculated using the formula:

Substituting q = 1 - p into this variance formula, we get:
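A minimal sketch, assuming the standard form of this variance, $\sigma^2 = pq = p(1 - p)$, where $p$ is the share of units that have the characteristic and $q$ is the share that do not. The ten observations are hypothetical.

```python
from statistics import pvariance

# HYPOTHETICAL population of 10 units: 1 = has the characteristic, 0 = does not.
flags = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

p = sum(flags) / len(flags)   # share of units with the characteristic
q = 1 - p                     # share of units without it

variance_pq = p * q                  # shortcut form p * (1 - p)
variance_direct = pvariance(flags)   # population variance of the 0/1 values

print(round(variance_pq, 4), round(variance_direct, 4))
```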

Types of variance

Total variance measures the variation of a characteristic across the entire population under the influence of all the factors that cause this variation. It is equal to the mean square of the deviations of the individual values of the characteristic x from the overall mean of x, and can be calculated as simple variance or weighted variance.

Within-group variance characterizes random variation, i.e. the part of the variation that is due to unaccounted factors and does not depend on the factor attribute underlying the grouping. It is equal to the mean square of the deviations of the individual values of the characteristic within a group from the arithmetic mean of that group, and can be calculated as simple variance or as weighted variance.

Thus, within-group variance measures variation of a trait within a group and is determined by the formula:

where $\bar{x}_i$ is the group mean; $n_i$ is the number of units in the group.

For example, in a study of how workers' qualifications affect labor productivity in a workshop, the within-group variances show the variation in output in each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.) except differences in qualification category, since within a group all workers have the same qualifications.

The average of within-group variances reflects random variation, that is, that part of the variation that occurred under the influence of all other factors, with the exception of the grouping factor. It is calculated using the formula:

Intergroup variance characterizes the systematic variation of the resulting characteristic, which is due to the influence of the factor attribute underlying the grouping. It is equal to the mean square of the deviations of the group means from the overall mean. Intergroup variance is calculated using the formula:
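The rule of adding variances named in the title states that the total variance equals the mean of the within-group variances plus the intergroup variance. The sketch below checks this on hypothetical output figures for two qualification groups, assuming the usual weighting of each group by its number of units; the group names and values are invented for illustration.

```python
from statistics import mean, pvariance

# HYPOTHETICAL output per worker, grouped by qualification category.
groups = {
    "category 3": [10, 12, 14],
    "category 4": [15, 17, 19, 21],
}

all_values = [x for g in groups.values() for x in g]
n_total = len(all_values)
overall_mean = mean(all_values)

# Mean of the within-group variances, weighted by group size.
mean_within = sum(pvariance(g) * len(g) for g in groups.values()) / n_total

# Intergroup variance: squared deviations of group means from the overall mean.
between = sum(
    (mean(g) - overall_mean) ** 2 * len(g) for g in groups.values()
) / n_total

# Total variance of all values taken together.
total = pvariance(all_values)

print(round(mean_within, 4), round(between, 4), round(total, 4))
```

The first two printed numbers add up to the third, which is the rule of adding variances.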