How to find the frequencies of an interval series. Grouping data and constructing a distribution series

The results of grouping are presented as distribution series, which are laid out in the form of tables.

A distribution series is one of the types of groupings.

A distribution series is an ordered distribution of the units of the population under study into groups according to a certain varying characteristic.

Depending on the characteristic on which the distribution series is based, attributive and variational distribution series are distinguished:

  • Attributive series are distribution series constructed according to qualitative characteristics.
  • Variational series are distribution series constructed in ascending or descending order of the values of a quantitative characteristic.
A variational distribution series consists of two columns:

The first column gives the quantitative values of the varying characteristic, which are called variants and are denoted x. A discrete variant is expressed as an integer; an interval variant is given as a range "from ... to ...". Depending on the type of variants, either a discrete or an interval variation series can be constructed.
The second column gives the number of units with each variant, expressed as frequencies or relative frequencies:

Frequencies are absolute numbers that show how many times a given value of the characteristic occurs in the population; they are denoted f. The sum of all frequencies must equal the number of units in the entire population.

Relative frequencies are frequencies expressed as a share of the total. Their sum equals 1 when they are expressed as fractions of one, or 100% when they are expressed as percentages.

Graphic representation of distribution series

The distribution series are visually presented using graphical images.

The distribution series are depicted as:
  • Polygon
  • Histograms
  • Cumulates
  • Ogives

Polygon

When constructing a polygon, the values of the varying characteristic are plotted on the horizontal axis (x-axis), and frequencies or relative frequencies on the vertical axis (y-axis).

The polygon in Fig. 6.1 is based on data from the micro-census of the population of Russia in 1994.

Fig. 6.1. Household size distribution

Condition: Data are given on the distribution of 25 employees of an enterprise by tariff category:
4; 2; 4; 6; 5; 6; 4; 1; 3; 1; 2; 5; 2; 6; 3; 1; 2; 3; 4; 5; 4; 6; 2; 3; 4
Task: Construct a discrete variation series and depict it graphically as a distribution polygon.
Solution:
In this example, the variants are the employees' tariff categories. To determine the frequencies, count the number of employees in each tariff category.
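A minimal Python sketch of this count, using only the standard library: `collections.Counter` tallies the frequency f of each tariff category, and dividing by N = 25 gives the relative frequencies in percent. The helper name `discrete_series` is illustrative, not part of any library.

```python
from collections import Counter

def discrete_series(values):
    """Return (variant, frequency, percent of total) rows in ascending order of variant."""
    counts = Counter(values)  # frequency f of each variant x
    total = len(values)
    return [(x, f, 100 * f / total) for x, f in sorted(counts.items())]

# Tariff categories of the 25 employees from the condition above
categories = [4, 2, 4, 6, 5, 6, 4, 1, 3, 1, 2, 5, 2, 6, 3, 1, 2, 3, 4, 5, 4, 6, 2, 3, 4]
for x, f, pct in discrete_series(categories):
    print(f"category {x}: {f} employees ({pct:.0f}%)")
```

The (x, f) pairs produced here are exactly the points that the distribution polygon connects.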

The polygon is used for discrete variation series.

To construct a distribution polygon (Fig. 1), we plot the quantitative values of the varying characteristic (the variants) on the abscissa (X) axis, and the frequencies or relative frequencies on the ordinate axis.

If the values of a characteristic are expressed as intervals, the series is called an interval series.
Interval distribution series are depicted graphically as a histogram, cumulate or ogive.

Statistical table

Condition: Data are given on the size of deposits of 20 individuals in one bank (thousand rubles): 60; 25; 12; 10; 68; 35; 2; 17; 51; 9; 3; 130; 24; 85; 100; 152; 6; 18; 7; 42.
Task: Construct an interval variation series with equal intervals.
Solution:

  1. The initial population consists of 20 units (N = 20).
  2. Using the Sturges formula, we determine the required number of groups: n = 1 + 3.322 lg 20 = 5.32 ≈ 5.
  3. We calculate the width of an equal interval: i = (152 - 2) / 5 = 30 thousand rubles.
  4. We divide the initial population into 5 groups with an interval of 30 thousand rubles.
  5. We present the grouping results in the table:

When a continuous characteristic is recorded this way, a value that appears twice (as the upper limit of one interval and the lower limit of the next) is assigned to the group in which it is the upper limit.
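The five steps and the boundary rule can be sketched in Python. This is a minimal illustration, not a prescribed method: `equal_interval_series` is a hypothetical helper, it rounds the Sturges result to the nearest integer (which gives 5 groups for N = 20), and it uses `bisect.bisect_left` so that a value equal to an upper limit stays in the lower group.

```python
import math
from bisect import bisect_left

def equal_interval_series(values):
    """Group values into equal intervals (hypothetical helper).

    A value equal to an interval boundary goes to the group where it is the upper limit.
    """
    n_groups = round(1 + 3.322 * math.log10(len(values)))  # Sturges; 5.32 -> 5 for N = 20
    x_min, x_max = min(values), max(values)
    width = (x_max - x_min) / n_groups                     # equal interval i
    uppers = [x_min + width * (k + 1) for k in range(n_groups)]
    counts = [0] * n_groups
    for v in values:
        # bisect_left finds the first upper limit >= v, so a value equal to an
        # upper limit stays in that group; x_min itself falls into the first group.
        idx = min(bisect_left(uppers, v), n_groups - 1)  # guard against rounding at x_max
        counts[idx] += 1
    lowers = [x_min] + uppers[:-1]
    return list(zip(lowers, uppers, counts))

deposits = [60, 25, 12, 10, 68, 35, 2, 17, 51, 9, 3, 130, 24, 85, 100, 152, 6, 18, 7, 42]
for lower, upper, f in equal_interval_series(deposits):
    print(f"{lower:.0f}-{upper:.0f} thousand rubles: {f}")
```

The first interval is treated as closed on both ends so that the minimum deposit (2) is counted.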

Histogram

To construct a histogram, the interval boundaries are marked on the abscissa axis, and rectangles are built on them whose heights are proportional to the frequencies (or relative frequencies).

Fig. 6.2 shows a histogram of the distribution of the Russian population by age group in 1997.

Fig. 6.2. Distribution of the Russian population by age groups

Condition: The distribution of 30 employees of the company by monthly salary is given

Task: Display the interval variation series graphically in the form of a histogram and cumulate.
Solution:

  1. The unknown boundary of the open (first) interval is determined from the width of the second interval: 7000 - 5000 = 2000 rubles. Using the same width, we find the lower limit of the first interval: 5000 - 2000 = 3000 rubles.
  2. To construct a histogram in a rectangular coordinate system, we plot along the abscissa axis segments corresponding to the intervals of the variation series.
    These segments serve as the lower bases of the rectangles, and the corresponding frequencies (or relative frequencies) serve as their heights.
  3. Let's build a histogram:

To construct a cumulate, calculate the accumulated frequencies (or relative frequencies). They are obtained by successively summing the frequencies of the current and all preceding intervals and are denoted S. The accumulated frequencies show how many units of the population have a characteristic value no greater than the one under consideration.

Cumulates

The distribution of a characteristic in a variation series by accumulated frequencies (or relative frequencies) is depicted using a cumulate.

A cumulate, or cumulative curve, unlike a polygon, is constructed from accumulated frequencies or relative frequencies. The values of the characteristic are placed on the abscissa axis, and the accumulated frequencies or relative frequencies on the ordinate axis (Fig. 6.3).

Fig. 6.3. Cumulate of household size distribution

4. Let's calculate the accumulated frequencies:
The accumulated frequency of the first interval is 0 + 4 = 4, of the second 4 + 12 = 16, of the third 4 + 12 + 8 = 24, and so on.

When constructing a cumulate, the accumulated frequency (or relative frequency) of each interval is plotted against its upper limit.

Ogive

An ogive is constructed like a cumulate, the only difference being that the accumulated frequencies are placed on the abscissa axis and the characteristic values on the ordinate axis.

A type of cumulate is the concentration curve, or Lorenz curve. To construct it, a percentage scale from 0 to 100 is marked on both axes of a rectangular coordinate system. The accumulated relative frequencies (in percent) are laid off on the abscissa axis, and the accumulated shares (in percent) of the total volume of the characteristic on the ordinate axis.

A uniform distribution of the characteristic corresponds to the diagonal of the square on the graph (Fig. 6.4). With an uneven distribution, the graph is a curve that bends away from the diagonal; the higher the concentration of the characteristic, the farther the curve lies from the diagonal.
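A minimal Python sketch of how the points of a concentration curve are obtained: accumulate the number of units and the volume of the characteristic group by group, and express both as percentages of their totals. The helper `concentration_curve` and the sample data are hypothetical; the groups are assumed to be ordered from the smallest to the largest average value.

```python
def concentration_curve(frequencies, volumes):
    """Points (cumulative % of units, cumulative % of volume) of a concentration curve.

    frequencies: number of units in each group
    volumes: total volume of the characteristic held by each group
    Groups are assumed to be ordered from the smallest to the largest average value.
    """
    total_f, total_v = sum(frequencies), sum(volumes)
    points = [(0.0, 0.0)]
    cum_f = cum_v = 0
    for f, v in zip(frequencies, volumes):
        cum_f += f
        cum_v += v
        points.append((100 * cum_f / total_f, 100 * cum_v / total_v))
    return points

# Hypothetical data: five groups of 20 units holding 5, 10, 15, 25 and 45 units of volume
uneven = concentration_curve([20] * 5, [5, 10, 15, 25, 45])
even = concentration_curve([1] * 4, [1] * 4)  # uniform distribution
print(uneven)
print(even)
```

For the uniform case every point has equal coordinates, i.e. it lies on the diagonal; the uneven case falls below it.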

Fig. 6.4. Concentration curve

The simplest way to summarize statistical material is to construct distribution series, which can serve as the result of a statistical summary. A distribution series in statistics is an ordered distribution of population units into groups according to one characteristic, qualitative or quantitative. If a series is constructed by a qualitative characteristic, it is called attributive; if by a quantitative one, it is called variational.

A variation series is characterized by two elements: the variant (X) and the frequency (f). A variant is an individual value of the characteristic for a single unit or group of the population. The number showing how many times a particular value of the characteristic occurs is called its frequency; a frequency expressed as a relative number is called a relative frequency. A variation series can be interval, when the boundaries “from” and “to” are specified, or discrete, when the characteristic under study takes specific individual values.

Let's look at the construction of variation series using examples.

Example. Data are available on the tariff categories of 60 workers in one of the plant's workshops.

Distribute the workers by tariff category and build a variation series.

To do this, we write down all values of the characteristic in ascending order and count the number of workers in each group.

Table 1.4. Distribution of workers by tariff category

Worker's category (X) | Number of workers, persons (f) | In % of total (relative frequency)

We obtained a discrete variation series in which the characteristic under study (the worker's category) is represented by specific numbers. For clarity, variation series are depicted graphically; a distribution polygon was constructed from this series.

Fig. 1.1. Polygon of the distribution of workers by tariff category

We will consider the construction of an interval series with equal intervals using the following example.

Example. Data are known on the value of fixed capital of 50 companies in million rubles. It is required to show the distribution of firms by cost of fixed capital.

To show the distribution of firms by value of fixed capital, we first decide how many groups to form. Suppose we decide to form 5 groups of enterprises. Then we determine the size of the interval using the formula i = (x_max - x_min) / n, where n is the number of groups.

In our example, the interval is 7 million rubles, so the group boundaries are 10, 17, 24, and so on.

By successively adding the interval to the minimum value of the characteristic, we obtain the groups of firms by value of fixed capital.

A value that coincides with a boundary belongs to the group in which it is the upper limit (i.e., a value of 17 goes to the first group, 24 to the second, etc.).
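A short Python sketch of building the groups by adding the interval to the minimum and applying the upper-limit rule. The starting value 10, the interval 7 and the count of 5 groups are taken as assumptions from the boundaries 17 and 24 and the range "from 10 to 24" mentioned in this example; `group_of` is a hypothetical helper.

```python
from bisect import bisect_left

# Assumed from the text: first group starts at 10, equal interval 7, 5 groups
x_min, interval, n_groups = 10, 7, 5
bounds = [x_min + k * interval for k in range(n_groups + 1)]  # 10, 17, 24, ...
uppers = bounds[1:]

def group_of(value):
    """1-based group number; a boundary value goes to the group where it is the upper limit.

    Values above the last upper limit are not handled by this sketch.
    """
    return bisect_left(uppers, value) + 1

print(bounds)
print(group_of(17), group_of(24))
```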

Let's count the number of firms in each group.

Table 1.5. Distribution of firms by value of fixed capital (million rubles)

Value of fixed capital, million rubles (X) | Number of firms (frequency, f) | Accumulated frequencies (cumulative)

This distribution is an interval variation series, from which it follows, for example, that 36 firms have fixed capital worth from 10 to 24 million rubles.

Interval distribution series can be represented graphically in the form of a histogram.

The results of data processing are presented in statistical tables. A statistical table has a subject and a predicate.

The subject is the population, or part of it, that the table characterizes.

The predicate consists of the indicators that characterize the subject.

Tables are classified as simple, group and combination tables, with simple or complex development of the predicate.

The subject of a simple table contains a list of individual units.

If the subject contains a grouping of units, the table is called a group table: for example, enterprises grouped by number of workers, or the population grouped by gender.

The subject of a combination table contains a grouping by two or more characteristics. For example, the population is divided by gender and, within each gender, into groups by education, age, etc.

Combination tables make it possible to identify and characterize the relationships among several indicators and the patterns of their change in space and time. To keep the table clear, limit its subject to two or three characteristics, with a limited number of groups for each of them.

The predicate in tables can be developed in different ways. With a simple development of the predicate, all its indicators are located independently of each other.

In complex development of the predicate, the indicators are combined with each other.

When constructing any table, one must proceed from the purposes of the study and the content of the processed material.

In addition to tables, statistics also uses graphs and diagrams. In a diagram, statistical data are depicted using geometric shapes. Diagrams are divided into line and bar charts; there are also pictorial charts (drawings and symbols), pie charts (the whole circle represents the entire population, and the areas of individual sectors show the shares of its components), and radial charts (built in polar coordinates). A cartogram combines an outline map or site plan with a diagram.

If the random variable under study is continuous, ranking and grouping the observed values often fails to reveal the characteristic features of their variation. This is because individual values of a continuous random variable can differ from each other by arbitrarily small amounts, so identical values rarely occur among the observed data, and the frequencies of the variants differ little from one another.

It is also impractical to construct a discrete series for a discrete random variable with a large number of possible values. In such cases, interval variation distribution series should be built.

To construct such a series, the entire range of variation of the observed values of the random variable is divided into a number of partial intervals, and the frequency with which the values fall into each partial interval is counted.

An interval variation series is an ordered set of intervals of values of a random variable together with the corresponding frequencies or relative frequencies of the values falling into each of them.

To build an interval series, you need to:

  1. determine the number of partial intervals;
  2. determine the width of the intervals;
  3. set the lower and upper limits of each interval;
  4. group the observation results.

1. The number and width of the grouping intervals have to be chosen in each specific case based on the goals of the study, the sample size and the degree of variation of the characteristic in the sample.

The number of intervals k can be estimated approximately from the sample size n alone in one of the following ways:

  • by the Sturges formula: k = 1 + 3.32 lg n;
  • using Table 1.

Table 1

2. Intervals of equal width are generally preferred. To determine the interval width h, calculate:

  • the range of variation R of the sample values: R = x_max - x_min,

where x_max and x_min are the maximum and minimum sample variants;

  • the width of each interval h, using the formula h = R/k.

3. The lower limit of the first interval, x_h1, is chosen so that the minimum sample variant x_min falls approximately in the middle of this interval: x_h1 = x_min - 0.5h.

The limits of the following intervals are obtained by adding the length of the partial interval h to the limit of the previous interval:

x_hi = x_h(i-1) + h.

The interval scale is extended in this way as long as x_hi satisfies the relation:

x_hi < x_max + 0.5h.

4. The characteristic values are grouped according to the interval scale: for each partial interval, the sum n_i of the frequencies of the variants falling into the i-th interval is calculated. An interval includes the values of the random variable that are greater than or equal to its lower limit and less than its upper limit.
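Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `interval_scale` is a hypothetical helper, the Sturges estimate is rounded to the nearest integer when k is not given, and the sample data in the usage line are made up. Because the first interval starts half a width below x_min and limits are added while x_hi < x_max + 0.5h, the scale typically ends up with k + 1 intervals.

```python
import math

def interval_scale(values, k=None):
    """Interval series with x_h1 = x_min - h/2 and intervals [lower, lower + h)."""
    n = len(values)
    if k is None:
        k = round(1 + 3.32 * math.log10(n))  # Sturges estimate of the number of intervals
    x_min, x_max = min(values), max(values)
    h = (x_max - x_min) / k                  # width h = R / k
    start = x_min - 0.5 * h                  # lower limit of the first interval
    lowers = []
    j = 0
    while start + j * h < x_max + 0.5 * h:   # keep adding limits while x_hi < x_max + 0.5h
        lowers.append(start + j * h)
        j += 1
    counts = [0] * len(lowers)
    for v in values:
        i = math.floor((v - start) / h)      # lower <= v < lower + h
        counts[min(i, len(lowers) - 1)] += 1
    return h, [(lo, lo + h, c) for lo, c in zip(lowers, counts)]

# Hypothetical sample with k = 5: h = 2, intervals start at -1, 1, 3, 5, 7, 9
h, rows = interval_scale([0, 2, 3, 5, 6, 7, 8, 10], k=5)
for lower, upper, n_i in rows:
    print(f"[{lower}, {upper}): {n_i}")
```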

Polygon and histogram

For clarity, various statistical distribution graphs are constructed.

From the data of a discrete variation series, a polygon of frequencies or of relative frequencies is constructed.

A frequency polygon is a broken line whose segments connect the points (x_1; n_1), (x_2; n_2), ..., (x_k; n_k). To construct a frequency polygon, the variants x_i are plotted on the abscissa axis and the corresponding frequencies n_i on the ordinate axis. The points (x_i; n_i) are connected by straight segments, which gives the frequency polygon (Fig. 1).

A polygon of relative frequencies is a broken line whose segments connect the points (x_1; W_1), (x_2; W_2), ..., (x_k; W_k). To construct it, the variants x_i are plotted on the abscissa axis and the corresponding relative frequencies W_i on the ordinate axis. The points (x_i; W_i) are connected by straight segments, which gives the polygon of relative frequencies.

For a continuous characteristic, it is advisable to build a histogram.

A frequency histogram is a stepped figure consisting of rectangles whose bases are the partial intervals of length h and whose heights equal the ratio n_i/h (the frequency density).

To construct a frequency histogram, the partial intervals are laid off on the abscissa axis, and above each of them a segment parallel to the abscissa axis is drawn at the height n_i/h.
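The rectangle heights n_i/h can be computed directly. A minimal sketch, using as sample input the lending-rate intervals and frequencies of Table 3.14 from section 3.3.2 of this text; `frequency_density` is an illustrative name, and the width is taken separately for each interval so that unequal intervals are handled as well.

```python
def frequency_density(intervals):
    """Height of each histogram rectangle: frequency divided by interval width (n_i / h)."""
    return [f / (upper - lower) for lower, upper, f in intervals]

# Table 3.14: lending-rate intervals (%) and numbers of banks
table_3_14 = [(12.0, 14.0, 5), (14.0, 16.0, 9), (16.0, 18.0, 4), (18.0, 20.0, 15),
              (20.0, 22.0, 11), (22.0, 24.0, 2), (24.0, 26.0, 4)]
print(frequency_density(table_3_14))
```

With equal widths the densities are simply the frequencies scaled by 1/h, so the histogram's shape is unchanged; with unequal intervals, using densities keeps wide intervals from looking artificially tall.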

The results of grouping the collected statistical data are usually presented in the form of distribution series. A distribution series is an ordered distribution of population units into groups according to the characteristic being studied.

Distribution series are divided into attributive and variational, depending on the characteristic that forms the basis of the grouping. If the characteristic is qualitative, the distribution series is called attributive. An example of an attributive series is the distribution of enterprises and organizations by type of ownership (see Table 3.1).

If the characteristic by which the distribution series is constructed is quantitative, then the series is called variational.

A variational distribution series always consists of two parts: the variants and the corresponding frequencies (or relative frequencies). A variant is a value that the characteristic can take in population units, while a frequency is the number of observation units that have a given value of the characteristic. The sum of the frequencies always equals the size of the population. Sometimes relative frequencies are calculated instead of frequencies: these are frequencies expressed either as fractions of one (then their sum is 1) or as percentages of the population size (then their sum is 100%).

Variation series can be discrete or interval. In discrete series (Table 3.7), the variants are expressed as specific numbers, most often integers.

Table 3.8. Distribution of employees by length of service in the insurance company
Time worked in the company, full years (variants) | Number of employees, persons (frequencies) | In % of total (relative frequencies)
Up to a year | 15 | 11,6
1 | 17 | 13,2
2 | 19 | 14,7
3 | 26 | 20,2
4 | 10 | 7,8
5 | 18 | 13,9
6 | 24 | 18,6
Total | 129 | 100,0

In interval series (see Table 3.2), the values of the indicator are given as intervals. An interval has two boundaries: lower and upper. Intervals can be open or closed. Open intervals lack one of the boundaries; thus, in Table 3.2 the first interval has no lower boundary and the last one has no upper boundary. When constructing an interval series, either equal or unequal intervals are used, depending on how widely the values of the characteristic are spread (Table 3.2 shows a variation series with equal intervals).

If a characteristic takes a limited number of values, usually no more than 10, discrete distribution series are constructed. If there are more variants, a discrete series loses its clarity, and the interval form of the variation series is preferable. An interval distribution series is also constructed when the characteristic varies continuously, i.e., when its values within certain limits can differ from each other by an arbitrarily small amount.

3.3.1. Construction of discrete variation series

Let's consider the methodology for constructing discrete variation series using an example.

Example 3.2. The following data are available on the number of members in each of 60 families:

To get an idea of the distribution of families by the number of their members, a variation series should be constructed. Since the characteristic takes a limited number of integer values, we construct a discrete variation series. To do this, it is recommended to first write down all values of the characteristic (the number of family members) in ascending order, i.e., to rank the statistical data:

Then count the number of families of each size. The number of family members (the value of the varying characteristic) is the variant (denoted x), and the number of families of that size is the frequency (denoted f). We present the grouping results as the following discrete variation distribution series:

Table 3.11.
Number of family members (x) | Number of families (f)
1 | 8
2 | 14
3 | 20
4 | 9
5 | 5
6 | 4
Total | 60

3.3.2. Construction of interval variation series

Let us demonstrate the technique for constructing interval variation distribution series using the following example.

Example 3.3. As a result of statistical observation, the following data were obtained on the average interest rate of 50 commercial banks (%):

Table 3.12.
14,7 19,0 24,5 20,8 12,3 24,6 17,0 14,2 19,7 18,8
18,1 20,5 21,0 20,7 20,4 14,7 25,1 22,7 19,0 19,6
19,0 18,9 17,4 20,0 13,8 25,6 13,0 19,0 18,7 21,1
13,3 20,7 15,2 19,9 21,9 16,0 16,9 15,3 21,4 20,4
12,8 20,8 14,3 18,0 15,1 23,8 18,5 14,4 14,4 21,0

As we can see, such an array of data is extremely inconvenient to view, and no pattern in the changes of the indicator is visible. Let's construct an interval distribution series.

  1. Let's determine the number of intervals.

    In practice, the number of intervals is often set by the researcher based on the objectives of each specific observation. It can also be calculated using the Sturges formula

    n = 1 + 3.322lgN,

    where n is the number of intervals;

    N is the volume of the population (number of observation units).

    For our example we get: n = 1 + 3.322 lg 50 = 6.6 ≈ 7.

  2. Let us determine the size of the intervals (i) using the formula

    i = (x_max - x_min) / n,

    where x_max is the maximum value of the characteristic;

    x_min is the minimum value of the characteristic.

    For our example: i = (25.6 - 12.3) / 7 = 1.9.

    The intervals of a variation series are clear if their boundaries have “round” values, so let’s round the value of the interval 1.9 to 2, and the minimum value of the characteristic 12.3 to 12.0.

  3. Let's determine the boundaries of the intervals.

    Intervals, as a rule, are written in such a way that the upper limit of one interval is also the lower limit of the next interval. So, for our example we get: 12.0-14.0; 14.0-16.0; 16.0-18.0; 18.0-20.0; 20.0-22.0; 22.0-24.0; 24.0-26.0.

    Such an entry means that the characteristic is continuous. If the variants of a characteristic take strictly defined values, for example only integers, but there are too many of them for a discrete series, you can form an interval series in which the lower boundary of an interval does not coincide with the upper boundary of the previous one (this indicates that the characteristic is discrete). For example, for the distribution of enterprise employees by age, you can form the following age groups (years): 18-25, 26-33, 34-41, 42-49, 50-57, 58-65, 66 and over.

    Alternatively, in our example we could make the first and last intervals open, i.e., write: up to 14.0; 24.0 and above.

  4. Based on the initial data, we construct a ranked series by writing the values of the characteristic in ascending order. We present the results in a table:
    Table 3.13. Ranked series of interest rates of commercial banks
    Bank rate, % (variants)
    12,3 17,0 19,9 23,8
    12,8 17,4 20,0 24,5
    13,0 18,0 20,0 24,6
    13,3 18,1 20,4 25,1
    13,8 18,5 20,4 25,6
    14,2 18,7 20,5
    14,3 18,8 20,7
    14,4 18,9 20,7
    14,7 19,0 20,8
    14,7 19,0 21,0
    15,1 19,0 21,0
    15,2 19,0 21,1
    15,3 19,0 21,4
    16,0 19,6 21,9
    16,9 19,7 22,7
  5. Let's count the frequencies.

    When counting frequencies, a value of the characteristic may fall on the boundary of an interval. In this case, you can follow the rule: the unit is assigned to the interval for which its value is the upper limit. Thus, the value 16.0 in our example belongs to the second interval.
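Steps 1 to 3 can be checked with a short Python sketch. Rounding the width to 2 and the starting point to 12 follows the text; using `round` and `math.floor` for that rounding is our assumption, as are the variable names.

```python
import math

N, x_min, x_max = 50, 12.3, 25.6  # values from the bank example

n_raw = 1 + 3.322 * math.log10(N)  # Sturges: about 6.64
n = round(n_raw)                   # 7 intervals
i_raw = (x_max - x_min) / n        # about 1.9
i = round(i_raw)                   # rounded to a "round" width of 2
start = math.floor(x_min)          # 12.3 rounded down to 12
bounds = [start + k * i for k in range(n + 1)]  # 12, 14, ..., 26
print(round(n_raw, 2), n, round(i_raw, 2), bounds)
```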

The grouping results obtained in our example will be presented in a table.

Table 3.14. Distribution of commercial banks by lending rate
Lending rate, % | Number of banks, units (frequencies) | Accumulated frequencies
12,0-14,0 | 5 | 5
14,0-16,0 | 9 | 14
16,0-18,0 | 4 | 18
18,0-20,0 | 15 | 33
20,0-22,0 | 11 | 44
22,0-24,0 | 2 | 46
24,0-26,0 | 4 | 50
Total | 50 | -

The last column of the table gives the accumulated frequencies, obtained by successively summing the frequencies starting from the first interval (5 for the first interval, 5 + 9 = 14 for the second, 5 + 9 + 4 = 18 for the third, etc.). An accumulated frequency of 33, for example, shows that 33 banks have a lending rate not exceeding 20% (the upper limit of the corresponding interval).
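The accumulated-frequency column can be reproduced with `itertools.accumulate` from the Python standard library. In this sketch the frequencies and upper limits are those of Table 3.14; pairing each running sum with its interval's upper limit gives the points of the cumulate, and `banks_not_exceeding` is an illustrative helper.

```python
from itertools import accumulate

upper_limits = [14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]  # Table 3.14
frequencies = [5, 9, 4, 15, 11, 2, 4]

cumulative = list(accumulate(frequencies))             # running sums S
cumulate_points = list(zip(upper_limits, cumulative))  # each S plotted at its upper limit

def banks_not_exceeding(rate):
    """Accumulated frequency at a given interval upper limit."""
    return dict(cumulate_points)[rate]

print(cumulative)
print(banks_not_exceeding(20.0))
```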

When grouping data to construct variation series, unequal intervals are sometimes used. This applies when the values of a characteristic follow an arithmetic or geometric progression, or when applying the Sturges formula produces “empty” interval groups that contain no observation units. The interval boundaries are then set by the researcher, based on common sense and the objectives of the survey, or by formulas. For data that change in an arithmetic progression, for example, the size of the intervals is calculated as follows.

In many cases, when a statistical population includes a large or even infinite number of variants (which most often occurs with continuous variation), it is impossible and impractical to form a separate group for each variant. In such cases, statistical units can be combined into groups only on the basis of intervals, i.e., groups with set limits for the values of the varying characteristic. These limits are given by two numbers, the lower and upper limits of each group. Using intervals produces an interval distribution series.

An interval series is a variation series whose variants are presented as intervals.

An interval series can be formed with equal or unequal intervals; the choice depends mainly on the representativeness of the statistical population and on convenience. If the population is large enough (representative) in number of units and fully homogeneous in composition, it is advisable to form the interval series with equal intervals. This principle is usually applied to populations with a relatively small range of variation, i.e., where the maximum and minimum variants differ by a factor of several. In this case, the equal interval is calculated as the ratio of the range of variation of the characteristic to the chosen number of intervals. To determine the equal interval, the Sturges formula can be used (usually when the variation of the characteristic is small and the statistical population has many units):

i = (X_max - X_min) / (1 + 3.322 lg n),   (5.1)

where i is the equal interval; X_max, X_min are the maximum and minimum variants in the statistical population; n is the number of units in the population.

Example. Calculate the size of an equal interval for the density of radioactive contamination with cesium-137 in 100 settlements of the Krasnopolsky district of the Mogilev region, given that the initial (minimum) variant is 1 Ci/km² and the final (maximum) variant is 65 Ci/km². Using formula 5.1, we get:

i = (65 - 1) / (1 + 3.322 lg 100) = 64 / 7.644 ≈ 8.4.

Consequently, to form an interval series with equal intervals by the density of cesium-137 contamination of settlements in the Krasnopolsky district, the equal interval can be taken as 8 Ci/km².

With an uneven distribution, i.e., when the maximum and minimum variants differ by a factor of hundreds, the principle of unequal intervals can be applied when forming an interval series. Unequal intervals usually widen as one moves to larger values of the characteristic.

Intervals can be closed or open. Closed intervals have both a lower and an upper boundary. Open intervals have only one boundary: the first interval has only an upper boundary, and the last one only a lower boundary.

It is advisable to evaluate interval series, especially those with unequal intervals, by the distribution density, most simply calculated as the ratio of the frequency (or relative frequency) of an interval to its width.

To form an interval series in practice, you can use the layout of Table 5.3.

Table 5.3. The procedure for forming an interval series of settlements in the Krasnopolsky district by density of radioactive contamination with cesium-137

The main advantage of an interval series is its maximum compactness; at the same time, in an interval distribution series the individual variants of the characteristic are hidden in the corresponding intervals.

When an interval series is depicted in a rectangular coordinate system, the interval boundaries are plotted on the abscissa axis and the frequencies of the series on the ordinate axis. The graph of an interval series differs from a distribution polygon in that each interval has a lower and an upper boundary, so two abscissas correspond to one ordinate value. Therefore, on the graph of an interval series each interval is marked not by a point, as in a polygon, but by a line connecting two points. These horizontal lines are joined by vertical lines, giving a stepped figure commonly called a distribution histogram (Fig. 5.3).

When an interval series is plotted for a sufficiently large statistical population, the histogram approaches a symmetric distribution shape. When the statistical population is small, the histogram is, as a rule, asymmetric.

In some cases, it is useful to form a series of accumulated frequencies, i.e., a cumulative series. A cumulative series can be built from a discrete or an interval distribution series. When it is depicted in a rectangular coordinate system, the variants are plotted on the abscissa axis and the accumulated frequencies (or relative frequencies) on the ordinate axis. The resulting curve is called the cumulate of the distribution (Fig. 5.4).

Forming and graphing the various types of variation series simplifies the calculation of the main statistical characteristics, which are discussed in detail in topic 6, and helps to better understand the distribution laws of the statistical population. Analysis of a variation series is especially important when one needs to identify and trace the relationship between variants and frequencies (relative frequencies). This relationship shows in the fact that the number of cases per variant is related in a certain way to the size of that variant: as the values of the varying characteristic increase, their frequencies change systematically. This means that the numbers in the frequency column do not fluctuate chaotically but change in a definite direction, order and sequence.

If the frequencies change systematically, we are on the way to identifying a pattern. The system, order and sequence in the changes of frequencies reflect general causes and general conditions characteristic of the entire population.

It should not be assumed that the distribution pattern always appears in ready-made form. There are many variation series in which the frequencies jump erratically, sometimes increasing, sometimes decreasing. In such cases, it is advisable to find out what kind of distribution the researcher is dealing with: either the distribution has no inherent pattern at all, or its nature has not yet been revealed. The first case is rare; the second is quite common.

For example, when an interval series is formed, the total number of statistical units may be small, and each interval may contain only a few variants (say, 1-3 units). In such cases, no pattern can be expected to emerge. For random observations to produce a regular result, the law of large numbers must come into play, i.e., each interval should contain not a few but tens or hundreds of statistical units. To this end, the number of observations should be increased as much as possible; this is the surest way to detect patterns in mass processes. If there is no real opportunity to increase the number of observations, a pattern can be revealed by reducing the number of intervals in the distribution series. Fewer intervals mean more units in each interval, so the random fluctuations of individual units offset each other and are “smoothed out”, revealing the pattern.

Forming and constructing variation series gives only a general, approximate picture of the distribution of the statistical population. For example, a histogram expresses the relationship between the values of a characteristic and its frequencies (relative frequencies) only in rough form. Variation series are therefore essentially just the basis for further, more in-depth study of the internal regularities of the statistical distribution.

TEST QUESTIONS FOR TOPIC 5

1. What is variation? What causes variation in a trait in a statistical population?

2. What types of varying characteristics can occur in statistics?

3. What is a variation series? What types of variation series can there be?

4. What is a ranked series? What are its advantages and disadvantages?

5. What is a discrete series and what are its advantages and disadvantages?

6. What is the procedure for forming an interval series, what are its advantages and disadvantages?

7. How are ranked, discrete and interval distribution series represented graphically?

8. What is the cumulate of distribution and what does it characterize?


