A statistical series is given. Distribution series

Theory of statistics: lecture notes Burkhanova Inessa Viktorovna

1. Statistical distribution series

As a result of processing and systematizing primary statistical observation data, groupings called distribution series are obtained.

Statistical distribution series represent an ordered arrangement of units of the population being studied into groups according to grouping characteristics.

There are attributive and variational distribution series.

Attributive is a distribution series constructed according to qualitative features. It characterizes the composition of the population according to various essential characteristics.

A variational distribution series is built on a quantitative characteristic. It consists of variants (options), the values of the characteristic, and frequencies, the counts of individual variants or of each group of the variation series. The frequencies show how often different variants (attribute values) occur in the distribution series. The sum of all frequencies determines the size of the entire population.

Group sizes are expressed in absolute and relative values. In absolute terms, a group size is the number of population units in that group; in relative terms, it is a share of the total, expressed as a fraction or a percentage.

Depending on the nature of the variation of the attribute, discrete and interval variational distribution series are distinguished. In a discrete variation series, group distributions are composed according to a characteristic that changes discretely and takes only integer values.

In an interval variational distribution series, the grouping characteristic can take any value within a certain interval.

Variation series consist of two elements: variants and frequencies.

A variant (option) is an individual value that the varying characteristic takes in the distribution series.

A frequency is the number of occurrences of an individual variant or the size of each group of the variation series. Frequencies expressed as fractions of a unit or as percentages of the total are called relative frequencies.

The rules and principles for constructing interval distribution series follow those for constructing statistical groupings. If the interval series is constructed with equal intervals, the frequencies show how densely each interval is filled with population units. To compare the occupancy of intervals of unequal width, an indicator characterizing the distribution density is computed.

Distribution density is the ratio of the number of population units to the width of the interval.
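The definition above can be illustrated with a minimal Python sketch. The function name `distribution_density` and the interval series used here are hypothetical and are not taken from the lecture notes.

```python
def distribution_density(frequencies, widths):
    """Return frequency / interval width for each interval of the series."""
    if len(frequencies) != len(widths):
        raise ValueError("frequencies and widths must have the same length")
    return [f / w for f, w in zip(frequencies, widths)]


# Hypothetical interval series with unequal interval widths.
intervals = [(0, 10), (10, 30), (30, 60)]
frequencies = [20, 30, 15]

widths = [hi - lo for lo, hi in intervals]
densities = distribution_density(frequencies, widths)
print(densities)  # [2.0, 1.5, 0.5]
```

Although the second interval has the largest frequency, the first interval is the most densely filled, which is why densities rather than raw frequencies are compared when the widths differ.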


Objective: learn to compile statistical distributions of samples and to construct polygons, histograms, and empirical distribution functions.

Mathematical statistics is a branch of applied mathematics devoted to methods of collecting, grouping and analyzing statistical information obtained as a result of observations or experiments.

The general population is a set of objects that are homogeneous with respect to some attribute.

The sample population (sample) is a collection of randomly selected objects.

A sample is called repeated (sampling with replacement) if the selected object is returned to the population before the next one is selected.

A sample is called non-repeated (sampling without replacement) if the selected object is not returned to the population.

The number of objects in a population or sample is called its size (volume).

A sample is called representative if each of its items is selected at random from the population and all items have the same probability of being included in the sample.

The numerical value of a quantitative characteristic is called a variant (option).

The statistical distribution of a sample is a list of variants and their corresponding frequencies or relative frequencies.

A variation series is a series of variants ranked in ascending (or descending) order together with their corresponding frequencies.

A variation series is called discrete if any of its variants differ by a constant value, and interval if the variants can differ from one another by an arbitrarily small amount.

A discrete statistical series is specified by a table that lists the variants and their frequencies or relative frequencies. The graphical representation of a discrete statistical series is called the polygon of frequencies (relative frequencies). It is a broken line whose vertices have coordinates $(x_i, n_i)$ or $(x_i, w_i)$, where $x_i$ are the variants, $n_i$ their frequencies, and $w_i = n_i / n$ their relative frequencies.

Example. The distribution law of a discrete statistical series and the frequency polygon.

An interval statistical series is used for continuous random variables and for discrete random variables when the sample size is large. An interval series is a table that lists the partial intervals with their frequencies, frequency densities, or relative frequency densities. The graphical representation of an interval statistical series is called a histogram. It is a stepped figure made of rectangles whose bases are the intervals of attribute values and whose heights are the frequencies of the intervals (or the frequency densities, when the intervals are of unequal width).

Example. The distribution law of an interval statistical series and the histogram.

(55;60) (60;65) (65;70) (70;75) (75;80) (80;85) (85;90)

Algorithm for constructing an interval series:

Let a sample of size $n$ be given.

1) find the sample range $R = x_{\max} - x_{\min}$,

2) determine the number of classes $k$ using one of the formulas:

$k = 1 + 3.322 \lg n$ (Sturges' formula),

$k = 5 \lg n$ (Brooks' formula),

3) find the width of the class interval $h = R / k$,

4) find the boundaries of the partial intervals, each obtained from the previous one by adding $h$: $a_{i+1} = a_i + h$, starting from the lower boundary $a_1$ of the first interval,

5) count the number of variants falling into each interval.
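The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the number of classes is taken from Sturges' formula and rounded up, the first interval starts at the sample minimum, intervals are half-open with the last one closed on the right, and the function name `build_interval_series` and the sample are hypothetical.

```python
import math


def build_interval_series(sample):
    """Group a sample into equal-width intervals and count frequencies."""
    n = len(sample)
    x_min, x_max = min(sample), max(sample)
    sample_range = x_max - x_min                 # step 1: sample range
    if sample_range == 0:
        raise ValueError("all sample values are equal; no intervals to build")
    k = math.ceil(1 + 3.322 * math.log10(n))     # step 2: Sturges, rounded up (assumption)
    h = sample_range / k                         # step 3: class interval width
    # step 4: boundaries, starting at the sample minimum (assumption)
    bounds = [x_min + i * h for i in range(k + 1)]
    # step 5: count values per interval; the maximum goes into the last interval
    counts = [0] * k
    for x in sample:
        index = min(int((x - x_min) / h), k - 1)
        counts[index] += 1
    return list(zip(bounds[:-1], bounds[1:])), counts


# Hypothetical sample of size 10.
sample = [2, 4, 4, 5, 7, 8, 9, 10, 11, 12]
intervals, counts = build_interval_series(sample)
for (lo, hi), count in zip(intervals, counts):
    print(f"[{lo}, {hi}): {count}")
```

Rounding the number of classes up and starting at the minimum are design choices of this sketch; other conventions (rounding to the nearest integer, starting half a step below the minimum) are also in use.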

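The cumulative curve (cumulate) is the curve of accumulated frequencies. For a discrete series, the cumulate is a broken line connecting the points $(x_i, n_i^{\text{acc}})$ or $(x_i, w_i^{\text{acc}})$, where $n_i^{\text{acc}}$ and $w_i^{\text{acc}}$ are the accumulated frequency and accumulated relative frequency of the variant $x_i$. For an interval variation series, the broken line starts from the point whose abscissa is the beginning of the first interval and whose ordinate is the accumulated frequency 0. The other points correspond to the ends of the intervals.

The empirical distribution function is the relative frequency with which the characteristic takes a value less than a given $x$, that is, $F^*(x) = n_x / n$, where $n_x$ is the number of observations smaller than $x$ and $n$ is the sample size.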

For a discrete variation series, the empirical function is a discontinuous step function; for an interval series, it coincides with the cumulate.
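For a discrete series, the empirical distribution function can be sketched in Python as below. The helper name `empirical_cdf` and the series are hypothetical; the strict inequality follows the definition "a value less than a given one".

```python
def empirical_cdf(variants, frequencies):
    """Return F(x) = (number of observations strictly less than x) / n."""
    n = sum(frequencies)
    pairs = list(zip(variants, frequencies))

    def F(x):
        n_x = sum(f for v, f in pairs if v < x)
        return n_x / n

    return F


# Hypothetical discrete series: variants 1, 2, 4 with frequencies 2, 5, 3.
F = empirical_cdf([1, 2, 4], [2, 5, 3])
print(F(1), F(2), F(3), F(5))  # 0.0 0.2 0.7 1.0
```

The function is constant between variants and jumps at each variant by its relative frequency, which is the step shape described above.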

Basic numerical characteristics of a variation series:

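Arithmetic mean of a variation series: $\bar{x} = \frac{1}{n}\sum_i x_i n_i$, where $x_i$ are the variants of a discrete series or the midpoints of the intervals, $n_i$ are the corresponding frequencies, and $n = \sum_i n_i$.

Basic properties of the arithmetic mean:

6) $\bar{x} = \frac{1}{n}\sum_{i=1}^{l} \bar{x}_i n_i$, where $\bar{x}$ is the overall mean, $\bar{x}_i$ is the mean of the $i$-th group of size $n_i$, and $l$ is the number of groups.

Variance (dispersion) of a variation series: $\sigma^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2 n_i$.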

Basic properties of dispersion:

2) ,

3) ,

4) ,

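5) $\sigma^2 = \overline{\sigma_i^2} + \delta^2$, where $\sigma^2$ is the total variance, $\sigma_i^2$ is the variance of the $i$-th group, $\overline{\sigma_i^2}$ is the arithmetic mean of the group variances, and $\delta^2$ is the intergroup variance.

6) $\sigma_{\bar{x}}^2 = \sigma^2 / n$ is the variance of the mean value.

Standard deviation: $\sigma = \sqrt{\sigma^2}$.

Coefficient of variation: $V = \frac{\sigma}{\bar{x}} \cdot 100\%$.

Median of a variation series: $Me = x_{Me} + h \cdot \frac{n/2 - S_{Me-1}}{n_{Me}}$, where $x_{Me}$ is the beginning of the median interval, $h$ is its length, $n$ is the sample size, $S_{Me-1}$ is the sum of the frequencies of the intervals preceding the median interval, and $n_{Me}$ is the frequency of the median interval. For a discrete series, the median is the value of the attribute that falls in the middle of the ranked series of observations.

Mode: $Mo = x_{Mo} + h \cdot \frac{n_{Mo} - n_{Mo-1}}{(n_{Mo} - n_{Mo-1}) + (n_{Mo} - n_{Mo+1})}$, where $x_{Mo}$ is the beginning of the modal interval, $h$ is its length, $n_{Mo}$ is the frequency of the modal interval, and $n_{Mo-1}$ and $n_{Mo+1}$ are the frequencies of the intervals preceding and following the modal interval, respectively. For a discrete series, the mode is the variant that corresponds to the highest frequency.

Initial moment of the $k$-th order: $\nu_k = \frac{1}{n}\sum_i x_i^k n_i$, where $x_i$ are the variants, $n_i$ their frequencies, and $n$ the sample size.

Central moment of the $k$-th order: $\mu_k = \frac{1}{n}\sum_i (x_i - \bar{x})^k n_i$, where $\bar{x}$ is the arithmetic mean.

Asymmetry coefficient: $A = \mu_3 / \sigma^3$.

Excess (kurtosis): $E = \mu_4 / \sigma^4 - 3$.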
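The characteristics listed above can be computed for an interval series with a short Python sketch. The function names are hypothetical, the frequencies are invented for illustration (only the interval boundaries 5-10, ..., 35-40 echo the homework table), and the formulas used are the standard grouped-data ones: interval midpoints stand in for the variants, and a single modal interval is assumed.

```python
import math


def midpoints(intervals):
    return [(lo + hi) / 2 for lo, hi in intervals]


def initial_moment(xs, ns, k):
    """k-th initial moment: sum(x^k * n) / sum(n)."""
    return sum(x ** k * n for x, n in zip(xs, ns)) / sum(ns)


def central_moment(xs, ns, k):
    """k-th central moment: sum((x - mean)^k * n) / sum(n)."""
    mean = initial_moment(xs, ns, 1)
    return sum((x - mean) ** k * n for x, n in zip(xs, ns)) / sum(ns)


def interval_median(intervals, ns):
    """Median by linear interpolation inside the median interval."""
    if sum(ns) == 0:
        raise ValueError("empty series")
    half = sum(ns) / 2
    accumulated = 0
    for (lo, hi), f in zip(intervals, ns):
        if accumulated + f >= half:
            return lo + (hi - lo) * (half - accumulated) / f
        accumulated += f


def interval_mode(intervals, ns):
    """Mode from the modal interval and its two neighbours."""
    i = max(range(len(ns)), key=lambda j: ns[j])
    f_prev = ns[i - 1] if i > 0 else 0
    f_next = ns[i + 1] if i + 1 < len(ns) else 0
    lo, hi = intervals[i]
    return lo + (hi - lo) * (ns[i] - f_prev) / ((ns[i] - f_prev) + (ns[i] - f_next))


# Hypothetical frequencies for the intervals 5-10, ..., 35-40.
intervals = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]
ns = [2, 6, 12, 20, 10, 7, 3]

xs = midpoints(intervals)
mean = initial_moment(xs, ns, 1)
variance = central_moment(xs, ns, 2)
std = math.sqrt(variance)
cv = std / mean * 100                      # coefficient of variation, %
median = interval_median(intervals, ns)
mode = interval_mode(intervals, ns)
asymmetry = central_moment(xs, ns, 3) / std ** 3
excess = central_moment(xs, ns, 4) / std ** 4 - 3

print(f"mean={mean}, median={median}, mode={mode:.2f}")
print(f"variance={variance:.2f}, std={std:.2f}, V={cv:.1f}%")
print(f"A={asymmetry:.3f}, E={excess:.3f}")
```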

Review questions:

1. General population and sample, their sizes.

2. Statistical distribution of a sample. Variation series.

3. Discrete statistical series. Frequency polygon.

4. Interval statistical series. Histogram.

5. Algorithm for constructing an interval statistical series.

6. Empirical distribution function. Cumulative curve.

7. The arithmetic mean of a variation series and its properties.

8. Variance and its properties. Standard deviation.

Test tasks:

1. A person's handwriting, including the slant of the letters, is said to be closely related to character. A low slant (30–40 degrees) indicates a quick temper and excitability, excessive directness, and haste in actions; a slant of 40–50 degrees characterizes a harmoniously developed nature; a slant of 50–90 degrees indicates self-control and a narrow range of hobbies.

Among the students of the institute, the handwriting of a sample of 50 people was studied. It turned out that 30% of them had handwriting with a low slant, 50% had a slant of 40–50 degrees, and 20% had a slant of 50–90 degrees.

Find the distribution of frequencies, relative frequencies, build a polygon and a histogram.

2. The distribution of a characteristic obtained from observations is given. Required:

4. The height (cm) of 25-year-old men was studied. From a random sample of size 35: 175, 167, 168, 169, 168, 170, 174, 173, 177, 172, 174, 167, 173, 172, 171, 171, 170, 167, 174, 177, 171, 172, 169, 171, 173, 173, 168, 173, 172, 166, 164, 168, 172, 174, find the interval statistical distribution series and construct a frequency histogram.

Homework assignments:

The distribution of a characteristic obtained from observations is given. Required:

1) construct the polygon (histogram), the cumulate, and the empirical distribution function;

2) find: the arithmetic mean, mode and median, variance, standard deviation and coefficient of variation, and the initial and central moments of the $k$-th order.

5-10 10-15 15-20 20-25 25-30 30-35 35-40

Topic No. 12 “Finding point and interval estimates of distribution parameters”

Objective: learn to determine point and interval statistical estimates of the parameters of a normally distributed general population from sample data.

Brief theoretical information:

A statistical estimate (statistic) $q^*$ of an unknown parameter $q$ of the population distribution is a function of the results of observations.

A statistical estimate $q^*$ is a random variable.

An estimate determined by a single number that depends on the sample data is called a point estimate.

Requirements for point statistical estimates:

1) consistency (convergence in probability to the estimated parameter as $n \to \infty$),

2) unbiasedness (absence of systematic error for any sample size: $M(q^*) = q$),

3) efficiency (among all possible estimates, the efficient estimate has the smallest variance).

Point estimates of general parameters of a normally distributed population:

An interval estimate is an estimate determined by two numbers, the ends of an interval.

Interval estimates allow us to establish the accuracy and reliability of a point estimate.

The accuracy of an estimate is the absolute deviation $|q - q^*|$ of $q^*$ from $q$.

The marginal sampling error $\delta$ is the maximum permissible absolute deviation of $q^*$ from $q$.

The reliability (confidence probability) of the estimate $q^*$ is the probability $\gamma$ with which the inequality $|q - q^*| < \delta$ holds. Usually $\gamma$ = 0.95; 0.99; 0.999…

The probability that the unknown parameter does not satisfy $|q - q^*| < \delta$ equals $\alpha = 1 - \gamma$, the significance level.

A confidence interval is the interval $(q^* - \delta;\ q^* + \delta)$ that covers the unknown parameter with the given reliability $\gamma$.

Interval estimates of normal distribution parameters:

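1) Confidence interval for the mathematical expectation $a$ with known variance $\sigma^2$:

$\bar{x} - t\frac{\sigma}{\sqrt{n}} < a < \bar{x} + t\frac{\sigma}{\sqrt{n}}$, where $t$ is found from the table of the Laplace function, taking into account that $\Phi(t) = \gamma / 2$.

2) Confidence interval for the mathematical expectation with unknown variance:

$\bar{x} - t_\gamma\frac{s}{\sqrt{n}} < a < \bar{x} + t_\gamma\frac{s}{\sqrt{n}}$, where $s$ is the corrected sample standard deviation and $t_\gamma$ is found from the table of Student coefficients.

3) Confidence interval for the variance when the mathematical expectation $a$ is known:

$\frac{\sum (x_i - a)^2}{\chi_2^2} < \sigma^2 < \frac{\sum (x_i - a)^2}{\chi_1^2}$, where $\chi_1^2$ and $\chi_2^2$ are the critical points of the $\chi^2$ distribution found from the table at $1 - \alpha/2$ and at $\alpha/2$, respectively, with $k = n$ degrees of freedom.

4) Confidence interval for the variance when the mathematical expectation is unknown:

$\frac{(n-1)s^2}{\chi_2^2} < \sigma^2 < \frac{(n-1)s^2}{\chi_1^2}$, where $\chi_1^2$ is found from the $\chi^2$ distribution table at $1 - \alpha/2$ and $\chi_2^2$ is found at $\alpha/2$, with $k = n - 1$ degrees of freedom.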

Example 1. Calculate unbiased estimates of population parameters from sample data: 64 63 71 68 73 71 74 73 70 75 68 67 73.

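$\bar{x} = \frac{1}{n}\sum x_i = \frac{910}{13} = 70$,

$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{172}{12} \approx 14.33$,

$s = \sqrt{s^2} \approx 3.79$.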
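The unbiased estimates for the sample of Example 1 can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by $n - 1$ (the corrected sample variance). This is a verification sketch, not part of the original notes.

```python
import statistics

# Sample from Example 1.
sample = [64, 63, 71, 68, 73, 71, 74, 73, 70, 75, 68, 67, 73]

mean = statistics.fmean(sample)             # sample mean
corrected_var = statistics.variance(sample)  # divides by n - 1
corrected_std = statistics.stdev(sample)     # square root of the corrected variance

print(mean, round(corrected_var, 2), round(corrected_std, 2))  # 70.0 14.33 3.79
```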

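Example 2. Find confidence intervals for the mathematical expectation, variance, and standard deviation at a significance level of 0.05, if the sample used in Example 1 is drawn from a normally distributed population.

Solution. We use the data from Example 1 ($n = 13$, $\bar{x} = 70$, $s \approx 3.79$) to find the confidence interval for the mathematical expectation with unknown variance. With $t_\gamma = 2.18$ for $\gamma = 0.95$ and $n = 13$:

$70 - 2.18 \cdot \frac{3.79}{\sqrt{13}} < a < 70 + 2.18 \cdot \frac{3.79}{\sqrt{13}}$,

$67.71 < a < 72.29$.

We use the data from Example 1 to find the confidence interval for the variance with an unknown mathematical expectation:

$\frac{(n-1)s^2}{\chi_2^2} < \sigma^2 < \frac{(n-1)s^2}{\chi_1^2}$,

where $\chi_1^2 = \chi^2(1 - \alpha/2;\ n - 1) = \chi^2(0.975;\ 12) = 4.4$ and $\chi_2^2 = \chi^2(\alpha/2;\ n - 1) = \chi^2(0.025;\ 12) = 23.3$, so that

$\frac{172}{23.3} < \sigma^2 < \frac{172}{4.4}$, that is, $7.4 < \sigma^2 < 39.1$, and for the standard deviation $2.7 < \sigma < 6.3$.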
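The interval calculations of Example 2 can be sketched in Python using only the standard library. The critical values are hard-coded table values for 12 degrees of freedom (Student's $t$ for reliability 0.95 and the two $\chi^2$ critical points); the constant names are chosen for this illustration, and a statistics package would normally supply these quantiles.

```python
import math
import statistics

# Sample from Example 1.
sample = [64, 63, 71, 68, 73, 71, 74, 73, 70, 75, 68, 67, 73]
n = len(sample)
mean = statistics.fmean(sample)
s2 = statistics.variance(sample)   # corrected sample variance
s = math.sqrt(s2)

# Table values for k = n - 1 = 12 degrees of freedom (assumed inputs).
T_CRIT = 2.179        # Student coefficient, reliability 0.95
CHI2_SMALL = 4.404    # chi-square critical point at 0.975
CHI2_LARGE = 23.337   # chi-square critical point at 0.025

# Mathematical expectation, variance unknown.
delta = T_CRIT * s / math.sqrt(n)
mean_ci = (mean - delta, mean + delta)

# Variance and standard deviation, expectation unknown.
var_ci = ((n - 1) * s2 / CHI2_LARGE, (n - 1) * s2 / CHI2_SMALL)
std_ci = tuple(math.sqrt(v) for v in var_ci)

print("expectation:", tuple(round(v, 2) for v in mean_ci))
print("variance:   ", tuple(round(v, 2) for v in var_ci))
print("std. dev.:  ", tuple(round(v, 2) for v in std_ci))
```

Note that the larger $\chi^2$ critical point produces the lower bound of the variance interval and the smaller one produces the upper bound.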

Review questions:

1. Statistical estimate of an unknown parameter of the theoretical distribution.

2. Point estimate.

3. Requirements for point estimates: unbiasedness, consistency, efficiency.

4. General and sample means.

5. General and sample variances.

6. Correction factor. Corrected sample variance.

7. General standard deviation and its point estimate.

8. Estimation of the variance and standard deviation of the sample mean.

9. Interval estimate of an unknown population parameter.

10. Confidence probability and significance level.

11. Confidence interval.

12. Rule for finding the confidence interval.

13. Confidence interval for the mathematical expectation with known variance.

14. Confidence interval for the mathematical expectation with unknown variance.

15. Confidence interval for the variance with known mathematical expectation.

16. Confidence interval for the variance with unknown mathematical expectation.

Test tasks:

1. When checking the academic performance of a faculty, 50 students were randomly tested and were distributed according to the test results as follows ($x_i$ is the score, $n_i$ is the number of students with that score):

Find the sample average communication distance.

3. Find the spread of the average score in task 1 of testing 50 students.

4. Find an estimate of the spread of reading speed for the distribution presented in the table, having first determined the average reading speed from the relative frequencies.

5. Find unbiased estimates of the general mean, variance, and standard deviation of the population from a sample of size 12 describing the duration, in seconds, of physical activity before the onset of an angina attack: 289, 208, 259, 243, 232, 210, 251, 246, 224, 239, 220, 211.

6. A sample of size 10 is given, consisting of systolic pressure values in men in the initial stage of shock: 127, 124, 155, 129, 77, 147, 65, 109, 145, 141. Determine the variance and standard deviation of the sample mean.

7. Using a non-repeated sampling scheme, 100 people were selected from the 400 subjects in the evoked-potential experiments of Franzen and Offenloch, and their latent periods were measured. The results are shown in the table:

The standard deviation is specified. Find:

a) the probability that the average latent period of all 400 people differs from the average period in the sample by no more than 0.31 ms (in absolute value),

b) boundaries within which the average value of the latent period is likely to be contained,

c) the sample size for which confidence limits with a maximum error would occur with a confidence probability.

8. The distribution of Carlson’s daily visits to Baby during the month is shown in the table:

Determine the boundaries within which the average number of visits is likely to lie.

9. A random variable is normally distributed with known standard deviation $\sigma = 3$. Find confidence intervals for estimating the unknown mathematical expectation $a$ from the sample mean $\bar{x} = 24.5$, given the sample size and the reliability of the estimate.

10. A quantitative characteristic of the general population is normally distributed. From a sample, the sample mean $\bar{x} = 20.2$ and the corrected standard deviation were found. Estimate the unknown mathematical expectation using a confidence interval with a reliability of 0.95.

11. For 9 applicants for the position of manager, a professional indicator characterizing the ability to lead people was assessed. Assuming the indicator is normally distributed with the given standard deviation (in arbitrary units), determine the confidence interval for the true standard deviation of the indicator at the given reliability.

Homework assignments:

1. Find estimates of the general mean, variance, and standard deviation if the population is specified by a distribution table:

Estimate with a reliability of 0.95 the mathematical expectation of a normally distributed characteristic of the population using a confidence interval.

4. Find confidence intervals for the mathematical expectation, variance, and standard deviation at a confidence probability of 0.95 if a sample is taken from the population:

67 70 69 68 74 72 66 66 74 69 72 78 67

Topic No. 13 "Testing statistical hypotheses about the equality of variances and mathematical expectations"

Objective: learn to test statistical hypotheses about the equality of variances and mathematical expectations of normal general populations.

Brief theoretical information:

A statistical hypothesis is a hypothesis about the form of an unknown distribution or about the parameters of known distributions.

The null (main) hypothesis $H_0$ is the hypothesis put forward.

The competing (alternative) hypothesis $H_1$ is a hypothesis that contradicts the null hypothesis.

An error of the first kind (type I error) is the rejection of a correct hypothesis.

An error of the second kind (type II error) is the acceptance of an incorrect hypothesis.

The probability of making a type I error is the significance level $\alpha$.

A statistical criterion (test statistic) $K$ is a random variable that serves to test the null hypothesis.

The observed value $K_{\text{obs}}$ is the value of the criterion calculated from the samples.

The critical region is the set of criterion values at which the null hypothesis is rejected.

The hypothesis acceptance region is the set of criterion values at which the hypothesis is accepted.

If $K_{\text{obs}}$ belongs to the critical region, the hypothesis is rejected; if it belongs to the acceptance region, the hypothesis is accepted.

Critical points $K_{\text{cr}}$ are the points separating the critical region from the acceptance region.

Critical points are found from the requirement that, provided the null hypothesis is true, the probability of the criterion falling into the critical region equals the accepted significance level.

For each criterion there are corresponding tables from which the critical point that satisfies this requirement is found.

Once $K_{\text{cr}}$ is found, $K_{\text{obs}}$ is calculated from the sample data, and $H_0$ is rejected if $K_{\text{obs}} > K_{\text{cr}}$ (right-sided critical region), if $K_{\text{obs}} < K_{\text{cr}}$ (left-sided), or if $K_{\text{obs}} < K_{\text{cr,left}}$ or $K_{\text{obs}} > K_{\text{cr,right}}$ (two-sided).

Comparing two variances of normal populations:

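Let the general populations $X$ and $Y$ be normally distributed. From independent samples of sizes $n_1$ and $n_2$ drawn from these populations, the corrected sample variances $s_X^2$ and $s_Y^2$ are found. It is required to test, from the corrected variances and at a given significance level $\alpha$, the null hypothesis $H_0: D(X) = D(Y)$.

1) put forward a competing hypothesis ($H_1: D(X) > D(Y)$),

2) find $F_{\text{obs}}$, the ratio of the larger corrected variance to the smaller one,

3) from the table of critical points of the Fisher-Snedecor distribution find $F_{\text{cr}}(\alpha; k_1, k_2)$, where $k_1 = n_1 - 1$, $k_2 = n_2 - 1$, $n_1$ is the size of the sample with the larger corrected variance, and $n_2$ is the size of the sample with the smaller one,

4) if $F_{\text{obs}} < F_{\text{cr}}$, accept the null hypothesis; otherwise accept the alternative.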
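The comparison of two variances can be sketched in Python as follows. The function names and the two samples are hypothetical; the critical point is passed in as a table value, since the standard library has no Fisher-Snedecor quantile function, and both samples are assumed to have nonzero variance.

```python
import statistics


def f_observed(sample_x, sample_y):
    """Return (F_obs, k1, k2): larger corrected variance over the smaller one.

    k1 is the number of degrees of freedom of the sample with the larger
    corrected variance, k2 that of the sample with the smaller one.
    """
    var_x = statistics.variance(sample_x)   # corrected variance (n - 1)
    var_y = statistics.variance(sample_y)
    if var_x >= var_y:
        return var_x / var_y, len(sample_x) - 1, len(sample_y) - 1
    return var_y / var_x, len(sample_y) - 1, len(sample_x) - 1


def accept_null(f_obs, f_crit):
    """The null hypothesis of equal variances is kept when F_obs < F_cr."""
    return f_obs < f_crit


# Hypothetical samples.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

f_obs, k1, k2 = f_observed(x, y)
F_CRIT = 6.39  # table value F_cr(0.05; 4, 4), supplied by hand
print(f_obs, k1, k2, accept_null(f_obs, F_CRIT))  # 4.0 4 4 True
```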

Introduction

Since time immemorial, humanity has kept records of the many phenomena and objects connected with its daily life and has carried out the related calculations. People obtained varied data, although of differing completeness at different stages of social development. Such data were taken into account daily in making business decisions and, in generalized form, at the state level in determining the direction of economic and social policy and the nature of foreign policy.

The dependence of a nation's well-being on the amount of useful product created, of the strategic security of states and peoples on the size of the adult male population, of treasury income on the size of taxable resources, and so on, has long been clearly recognized and has been reflected in various accounting activities.

With the achievements of economic science, it became possible to calculate indicators that summarize the results of the reproduction process at the level of society: the total social product, national income, and gross national product.

All of this information, in ever-increasing volumes, is provided to society by statistics, which is a necessary part of the state apparatus. Statistics can thus speak about many things vividly and convincingly in the language of statistical indicators.

For the statistical analysis of the data in this work, I used Excel (to calculate formulas and plot graphs).

Statistical distribution series, their meaning and application in statistics

As a result of processing and systematizing primary statistical observation data, groupings called distribution series are obtained. In them, the number of observation units in each group is known and is presented in absolute and relative terms.

A statistical distribution series is an ordered distribution of units of the population being studied into groups according to a certain varying characteristic. It characterizes the composition (structure) of the phenomenon under study, allows us to judge the homogeneity of the population, the pattern of distribution and the limits of variation of units of the population.

Statistical series are divided into attributive and variational series.

Attributive series are series constructed according to attributive characteristics and arranged in ascending or descending order of the observed values, that is, according to qualitative characteristics that have no numerical expression and that characterize a property or quality of the socio-economic phenomenon being studied.

Attributive distribution series characterize the composition of the population according to certain essential characteristics.

Taken over several periods, these data make it possible to study changes in structure.

The number of groups in an attributive distribution series equals the number of gradations (varieties) of the attributive characteristic.

An example of an attributive distribution series is given in Table 1.

Table 1. Distribution of 1st year students by academic performance

The elements of this distribution series are the gradations of the attributive characteristic "Academic performance" ("passing" and "failing") and the size of each group in absolute (persons) and relative (%) terms.

46 students passed the exam in the discipline. Their share amounted to 92%.

Variational series are series built on a quantitative characteristic.

Variational distribution series consist of two elements: variants and frequencies.

Variants (options) are the numerical values of the quantitative characteristic in the variational distribution series. They can be positive or negative, absolute or relative. Thus, when enterprises are grouped by the results of economic activity, positive variants mean profit and negative ones mean loss.

Frequencies are the counts of individual variants or of each group of the variation series, i.e., numbers showing how often particular variants occur in the distribution series. The sum of all frequencies is called the size (volume) of the population and equals the number of elements in the entire population.

Relative frequencies are frequencies expressed as relative values (fractions of a unit or percentages). The sum of the relative frequencies equals one, or 100%. Replacing frequencies with relative frequencies makes it possible to compare variation series with different numbers of observations.
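Converting frequencies to relative frequencies is a single division by the population size, as the small Python sketch below shows. The function name is hypothetical; the count of 46 passing students is the one quoted earlier in the text, and the count of 4 failing students is derived from it and the stated 92% share.

```python
def relative_frequencies(frequencies):
    """Divide each frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


# 46 students passed (92% of the group), so 4 did not.
counts = [46, 4]
shares = relative_frequencies(counts)
print(shares)                       # [0.92, 0.08]
print([s * 100 for s in shares])    # the same shares as percentages
```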

Variation series, depending on the nature of the variation, are divided into discrete and interval.

A discrete variational distribution series is a series in which groups are composed according to a characteristic that changes discretely and takes only integer values.

An example of a discrete variational distribution series is given in Table 2.

Table 2. Distribution of students by exam score

Column 1 of Table 2 presents the variants of the discrete variation series, column 2 the frequencies, and column 3 the relative frequencies. In the case of continuous variation, the value of the characteristic for population units can take any value within certain limits, and the values can differ from one another by an arbitrarily small amount.

An interval variational distribution series is a series in which the grouping characteristic that forms the basis of the grouping can take on any values, including fractional ones, in a certain interval.

It is advisable to construct an interval distribution series, first of all, with a continuous variation of a characteristic, and also if a discrete variation manifests itself over a wide range, i.e. the number of variants of a discrete characteristic is quite large.

The rules and principles for constructing interval distribution series are similar to those for constructing statistical groupings. If the interval series is constructed with equal intervals, the frequencies show how densely each interval is filled with population units. With unequal intervals, the frequencies alone do not show how densely each interval is filled. To compare the occupancy of such intervals, an indicator characterizing the distribution density is computed: the ratio of the number of population units to the width of the interval.

An example of an interval variation distribution is given in Table 3.

Table 3. Distribution of construction firms in the region by average number of employees*

* - The numbers are conditional

The presented distribution series is an interval series whose groups are formed on the basis of a continuous characteristic.

For clarity, distribution series can be analyzed using their graphical representation. For this purpose, a polygon, histogram, ogive, and cumulate of the distribution are constructed.

Calculation part of task No. 5

There are sample data (5% mechanical sample) on the average annual cost of fixed production assets and the output of enterprises in the economic sector for the reporting period.

Table 4. Initial data

Product output, million rubles.

According to the original data:

1. Construct a statistical series of distribution of enterprises by the average annual cost of fixed production assets, forming four groups of enterprises at equal intervals, characterizing them by the number of enterprises and the share of enterprises.

2. Calculate the general indicators of the distribution series:

a) the average annual cost of fixed production assets, weighting the values of the attribute by the absolute number of enterprises and by their share;

b) mode and median;

c) construct graphs of the distribution series and determine the value of the mode and median on them.

Solution:

1. First, determine the length of the interval using the formula:

e=(x max - x min)/k,

where k is the number of groups in the grouping (from the condition k=4),

x max and x min - maximum and minimum values ​​of the distribution series,

e=(60 - 20)/4=10 million rubles.

Then we define the lower and upper interval limits for each group:

Group number    Lower limit, million rubles    Upper limit, million rubles
1               20                             30
2               30                             40
3               40                             50
4               50                             60
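The interval limits follow mechanically from the minimum value, the maximum value, and the number of groups. A minimal Python sketch, assuming the values quoted in the text (minimum 20, maximum 60, four groups); the function name is hypothetical:

```python
def equal_intervals(x_min, x_max, k):
    """Split the range [x_min, x_max] into k intervals of equal width."""
    width = (x_max - x_min) / k
    return [(x_min + i * width, x_min + (i + 1) * width) for i in range(k)]


groups = equal_intervals(20, 60, 4)
for number, (lower, upper) in enumerate(groups, start=1):
    print(number, lower, upper)
```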

Let us create a worksheet (Table 5) summarizing the initial data:

Table 5. Worksheet

Columns: groups of enterprises by average annual cost of fixed production assets (OPF), million rubles; enterprise No.; average annual cost of OPF, million rubles; product output, million rubles.

Let us calculate the share of enterprises in each group of the distribution series using the formula:

$$d_i = \frac{f_i}{F},$$

where $d_i$ is the share of enterprises in group $i$;

$f_i$ is the number of enterprises in group $i$;

$F$ is the total number of enterprises.

Substitute the data into the formula. The results are entered in the final Table 6.

All formulas and calculations for Table 6 were carried out in Excel and are given in Appendix 1.

Table 6. Distribution of enterprises by average annual value of fixed production assets

This grouping shows that the largest group of enterprises (33.3%) has an average annual cost of fixed production assets between 40 and 50 million rubles.

2. a) Calculate the average annual cost of fixed production assets using the weighted arithmetic mean formula, weighting the values by the absolute number of enterprises:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i},$$

and by share:

$$\bar{x} = \frac{\sum x_i d_i}{\sum d_i}.$$

To calculate the mean of an interval series, each interval must be represented by a single (discrete) value $x_i$, namely the simple arithmetic mean of the lower and upper limits of the interval:

$$x_i = \frac{x_i^{\text{lower}} + x_i^{\text{upper}}}{2}.$$

Substitute the data into the formulas. The results are recorded in Table 7.

All formulas and calculations for Table 7 were carried out in Excel and are given in Appendix 1.

Table 7. Calculation of the average annual cost of fixed production assets (OPF)

The two averages coincide, which confirms that the calculations are correct. The average annual cost of OPF is 41.333 million rubles.
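The two weightings can be checked with a short Python sketch. The group frequencies 5, 8, 10 and 7 are an assumption: they are not quoted in the text and were chosen because they agree with the figures that are quoted (30 enterprises in total, 10 in the 40 - 50 group, accumulated frequency 23 for that group, mean 41.333).

```python
# Interval limits follow from x_min = 20, x_max = 60 and k = 4.
intervals = [(20, 30), (30, 40), (40, 50), (50, 60)]
# Assumed frequencies (not given in the text, see the note above).
frequencies = [5, 8, 10, 7]

total = sum(frequencies)
shares = [f / total for f in frequencies]
# Each interval is represented by the mean of its lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

mean_by_counts = sum(x * f for x, f in zip(midpoints, frequencies)) / total
mean_by_shares = sum(x * d for x, d in zip(midpoints, shares)) / sum(shares)

print(round(mean_by_counts, 3), round(mean_by_shares, 3))
```

Both weightings give the same value because the shares are the frequencies divided by a common constant.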

b) Calculate the mode and median of this series.

The mode is the value of the attribute that occurs most frequently in the population being studied. For an interval variation series, the mode is calculated using the formula:

$$Mo = x_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $x_{Mo}$ is the lower limit of the modal interval;

$i_{Mo}$ is the width of the modal interval;

$f_{Mo}$ is the frequency of the modal interval;

$f_{Mo-1}$ is the frequency of the interval preceding the modal one;

$f_{Mo+1}$ is the frequency of the interval following the modal one.

First, we identify the modal interval as the one with the highest frequency. The largest number of enterprises, 10, has an average annual cost of fixed production assets in the range of 40 - 50 million rubles, so this interval is the modal one.

Substitute the data into the formula.

The calculation shows that the modal value of the average annual cost of OPF of the enterprises is 44 million rubles.
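A minimal Python sketch of the interval mode formula follows. The modal interval (40 - 50) and its frequency (10) come from the text; the neighbouring frequencies 8 and 7 are assumptions chosen to agree with the other quoted figures, and the function name is hypothetical.

```python
def interval_mode(lower, width, f_mo, f_prev, f_next):
    """Mode of an interval series: interpolate inside the modal interval."""
    rise = f_mo - f_prev   # excess over the preceding interval
    fall = f_mo - f_next   # excess over the following interval
    return lower + width * rise / (rise + fall)


# f_prev=8 and f_next=7 are assumed values, not quoted in the text.
mode = interval_mode(lower=40, width=10, f_mo=10, f_prev=8, f_next=7)
print(mode)
```

With these inputs the sketch reproduces the value of 44 million rubles stated above.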

The median is the value located in the middle of an ordered variation series, dividing it into two equal parts. For an interval variation series, the median is calculated using the formula:

$$Me = x_{Me} + i_{Me} \cdot \frac{\frac{F}{2} - S_{Me-1}}{f_{Me}},$$

where $x_{Me}$ is the lower limit of the median interval;

$i_{Me}$ is the width of the median interval;

$F$ is the sum of the frequencies of the series;

$S_{Me-1}$ is the accumulated frequency of the intervals preceding the median interval;

$f_{Me}$ is the frequency of the median interval.

We first determine the median interval, that is, the interval that contains the ordinal position of the median. To do this, we accumulate the frequencies until the running total exceeds half the size of the population (30/2 = 15). The results are entered in calculation Table 8.

Table 8. Calculation of the median

In the “Sum of accumulated frequencies” column, the value 23 corresponds to the interval 40 - 50. This is the median interval, in which the median is located.

Substitute the data into the formula.

The calculation shows that half of the enterprises have an average annual cost of fixed production assets of up to 42 million rubles, while the other half have more than this amount.
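The search for the median interval and the interpolation inside it can be sketched in Python as follows. The frequencies 5, 8, 10 and 7 are again an assumption consistent with the quoted totals (30 enterprises, accumulated frequency 23 for the 40 - 50 group); the function name is hypothetical.

```python
from itertools import accumulate


def interval_median(intervals, frequencies):
    """Median of an interval series by linear interpolation."""
    half = sum(frequencies) / 2
    cumulative = list(accumulate(frequencies))
    for i, running_total in enumerate(cumulative):
        if running_total >= half:  # first interval that reaches half the total
            lower, upper = intervals[i]
            before = cumulative[i - 1] if i > 0 else 0
            return lower + (upper - lower) * (half - before) / frequencies[i]


intervals = [(20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [5, 8, 10, 7]  # assumed values, see the note above
median = interval_median(intervals, frequencies)
print(median)
```

Under these assumptions the accumulated frequencies are 5, 13, 23 and 30, so the 40 - 50 group is the first to reach 15, and the result agrees with the 42 million rubles stated above.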

c) Construct graphs of this distribution series based on the data obtained:

Fig. 1.

Fig. 2. Cumulative curve of the distribution of enterprises by average annual cost of fixed production assets

A sample obtained in the course of experimental research is an unordered set of numbers written in the sequence in which the measurements were made. Typically, the sample is drawn up as a table whose first row (or column) contains the measurement number i and whose second row (or column) contains the recorded value of the random variable (the attribute). In this form, the sample is the primary record of the statistical material, which can then be processed in various ways. As an example, consider the results achieved by shot putters at an athletics competition, shown in Table 1. The first row of this table contains the measurement numbers and the second their numerical values in meters.

Table 1

Shot put competition results

x i 16.36 14.91 15.31 14.26 14.77 13.88 14.97 14.01 14.07 14.48
x i 14.44 14.81 13.81 15.15 15.23 15.69 14.29 14.15 14.57 13.92
x i 13.62 14.92 15.73 13.22 14.65 14.80 13.04 15.10 13.30

As can be seen from Table 1, a simple statistical aggregate ceases to be a convenient form of presenting statistical material even with a relatively small sample size: it is cumbersome and hard to take in at a glance. It is very difficult to analyze experimental data in this form, let alone draw any conclusions from them. The statistical material must therefore be processed before further study. The simplest way to process a sample is ranking, that is, arranging the variants in ascending or descending order of their values. Table 2 below shows the ranked sample, with its elements arranged in ascending order.

Table 2

Ranked competition results in shot put

x i 13.04 13.22 13.30 13.62 13.81 13.88 13.92 14.01 14.07 14.15
x i 14.26 14.29 14.44 14.48 14.57 14.65 14.77 14.80 14.81 14.91
x i 14.92 14.97 15.10 15.15 15.23 15.31 15.69 15.73 16.36

But even in this form, the experimental data are hard to survey and of little use for direct analysis. To make the statistical material more compact and clear, it must therefore be processed further: a so-called statistical series is constructed. The construction of a statistical series begins with grouping.

Grouping is the process of organizing and systematizing data obtained during an experiment, aimed at extracting the information contained in them. In the process of grouping, the sample is distributed into groups or grouping intervals, each of which contains a certain range of values ​​of the characteristic being studied. The grouping process begins by dividing the entire range of variation of a characteristic into grouping intervals.

For each specific purpose of statistical research, sample size, and degree of variation of the characteristic in the sample, there is an optimal number of intervals and an optimal width for each of them. An approximate value of the optimal number of intervals k can be determined from the sample size n either from the data given in Table 3 or by the Sturges formula (lg denotes the base-10 logarithm):

k = 1 + 3.322 lg n.

Table 3

Determining the number of grouping intervals

The value of k obtained from the formula is almost always fractional and must be rounded to a whole number, since the number of intervals cannot be fractional. Practice shows that, as a rule, it is better to round down, because the formula gives good results for large n and somewhat overestimated results for small n.

Let us consider the grouping of sample variants using a specific example, that of the shot putters (see Tables 1 and 2). We determine the number of grouping intervals from the data given in Table 3. With a sample size of n = 29 it is advisable to choose k = 5 intervals (the Sturges formula gives k = 5.9).
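The Sturges estimate for the shot put sample can be reproduced with a few lines of Python. This is only a sketch of the formula quoted above; the function name is hypothetical, and rounding down follows the recommendation in the text.

```python
import math


def sturges(n):
    """Approximate optimal number of grouping intervals for sample size n."""
    return 1 + 3.322 * math.log10(n)


k_exact = sturges(29)      # fractional value given by the formula
k = math.floor(k_exact)    # rounded down, as the text recommends
print(round(k_exact, 1), k)
```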

Let us agree to use intervals of equal width in the example under consideration. In this case, once the number of grouping intervals has been determined, the width of each of them is calculated from the relation:

$$h = \frac{x_{\max} - x_{\min}}{k}.$$

Here $h$ is the width of the intervals, and $x_{\max}$ and $x_{\min}$ are, respectively, the maximum and minimum values of the attribute in the sample. They are read directly from the ranked data (see Table 2). In this case:

$$h = \frac{16.36 - 13.04}{5} = 0.664 \text{ (m)}.$$

Here it is necessary to dwell on the accuracy with which the interval width is determined. Two situations are possible: the accuracy of the calculated value of h either matches the accuracy of the experiment or exceeds it. In the latter case, two approaches can be used to determine the interval boundaries. From a theoretical point of view, it is most correct to construct the intervals with the value of h exactly as obtained. This approach introduces no additional distortion into the processing of the experimental data. However, for practical purposes in statistical studies related to physical culture and sports, it is customary to round the obtained value of h to the accuracy of the measurements. The reason is that, for a clear presentation of the results, it is convenient for the interval boundaries to be possible values of the attribute. Thus, the obtained interval width should be rounded to the accuracy of the experiment. Note in particular that the rounding must be done not in the usual mathematical sense but upward, so as not to reduce the overall range of variation of the characteristic: the sum of the widths of all intervals must not be less than the difference between the maximum and minimum values of the characteristic. In the example under consideration, the experimental data are recorded to the nearest hundredth (0.01 m), so the interval width obtained above should be rounded up to the nearest hundredth. As a result we get:

h= 0.67 (m).
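Rounding upward to the measurement accuracy is easy to get wrong with binary floating point, so the sketch below uses Python's `decimal` module. The function name and the `precision` parameter are hypothetical; the input values are those of the shot put example.

```python
from decimal import Decimal, ROUND_CEILING


def interval_width(x_min, x_max, k, precision="0.01"):
    """Return the raw width and the width rounded up to the given precision."""
    raw = (Decimal(x_max) - Decimal(x_min)) / k
    rounded = raw.quantize(Decimal(precision), rounding=ROUND_CEILING)
    return raw, rounded


raw_h, h = interval_width("13.04", "16.36", 5)
print(raw_h, h)

# The rounded intervals must cover the whole range of variation.
assert 5 * h >= Decimal("16.36") - Decimal("13.04")
```

Ordinary rounding to the nearest hundredth would happen to give the same width here, but for a raw width such as 0.662 it would give 0.66, and five intervals would then no longer cover the full range.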

After determining the width of the grouping intervals, their boundaries must be determined (below, the subscripts H and B mark the lower and upper limits of an interval, respectively). It is advisable to take the lower limit of the first interval, $x_{H1}$, equal to the minimum value of the attribute in the sample, $x_{\min}$:

$$x_{H1} = x_{\min}.$$

In the example under consideration, $x_{H1} = 13.04$ (m).

To obtain the upper limit of the first interval, $x_{B1}$, add the interval width to the lower limit of the first interval:

$$x_{B1} = x_{H1} + h.$$

Note that the upper limit of each interval (here, the first) is at the same time the lower limit of the next one (in this case, the second): $x_{H2} = x_{B1}$.

The lower and upper boundaries of all remaining intervals are determined in a similar way:

$$x_{B,i} = x_{H,i+1} = x_{H,i} + h.$$

In this example:

$$x_{B1} = x_{H2} = x_{H1} + h = 13.04 + 0.67 = 13.71 \text{ (m)},$$

$$x_{B2} = x_{H3} = x_{H2} + h = 13.71 + 0.67 = 14.38 \text{ (m)},$$

$$x_{B3} = x_{H4} = x_{H3} + h = 14.38 + 0.67 = 15.05 \text{ (m)},$$

$$x_{B4} = x_{H5} = x_{H4} + h = 15.05 + 0.67 = 15.72 \text{ (m)},$$

$$x_{B5} = x_{H5} + h = 15.72 + 0.67 = 16.39 \text{ (m)}.$$

Before grouping the variants, we introduce the concept of the midpoint (middle value) of an interval, $x_i$: the value of the attribute equidistant from the ends of the interval. Since it lies half an interval width above the lower boundary, it is convenient to determine it from the relation:

$$x_i = x_{H,i} + \frac{h}{2},$$

where $x_{H,i}$ is the lower limit of the i-th interval and $h$ is its width. The midpoints of the intervals will be used later when processing grouped data.
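The boundaries and midpoints of the example can be generated in one pass. The sketch below uses `decimal.Decimal` so that the boundaries stay exact to the hundredth; the variable names are hypothetical.

```python
from decimal import Decimal

x_min = Decimal("13.04")   # lower limit of the first interval
h = Decimal("0.67")        # interval width, already rounded up
k = 5                      # number of intervals

# k intervals need k + 1 boundaries; each upper limit is the next lower limit.
boundaries = [x_min + i * h for i in range(k + 1)]
intervals = list(zip(boundaries[:-1], boundaries[1:]))
# The midpoint lies half a width above the lower limit.
midpoints = [lower + h / 2 for lower, _ in intervals]

for (lower, upper), middle in zip(intervals, midpoints):
    print(lower, upper, middle)
```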

After the boundaries of all intervals have been determined, the sample variants are distributed among these intervals. First, however, it must be decided which interval should receive a value lying exactly on the border of two intervals, that is, a value that coincides with the upper limit of one interval and the lower limit of the adjacent one. Such a value could be assigned to either of the two adjacent intervals; to eliminate the ambiguity, we agree to assign it to the upper interval. The following argument supports this choice: since the minimum value of the attribute coincides with the lower limit of the first interval and is included in that interval, a value falling on the boundary of two intervals should likewise be assigned to the interval whose lower limit it equals.
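One way to implement this boundary convention is a binary search over the boundaries, as in the Python sketch below. The function name is hypothetical, and keeping the maximum possible value in the last interval is an added assumption, since the text does not discuss that case.

```python
from bisect import bisect_right

boundaries = [13.04, 13.71, 14.38, 15.05, 15.72, 16.39]


def interval_index(value, boundaries):
    """0-based index of the interval for value; a border value goes upward."""
    if not boundaries[0] <= value <= boundaries[-1]:
        raise ValueError("value lies outside the grouping range")
    # bisect_right counts the boundaries that are <= value, so a value equal
    # to an inner boundary lands in the interval that starts at that boundary.
    index = bisect_right(boundaries, value) - 1
    # Assumption: a value equal to the last boundary stays in the last interval.
    return min(index, len(boundaries) - 2)


print(interval_index(13.04, boundaries), interval_index(13.71, boundaries))
```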

Let us now turn to the statistical table (Table 4), which consists of seven columns.

Table 4

Tabular presentation of results in shot put

The first three columns of the statistical table contain, respectively, the numbers of the grouping intervals $i$, their lower and upper boundaries $x_{H,i}$ - $x_{B,i}$, and the midpoints of the intervals $x_i$.

The fourth column contains the frequencies of the intervals. The frequency of an interval is the number of variants, i.e. measurement results, that fall within this interval. It is customary to denote this quantity by $n_i$. The sum of the frequencies of all intervals always equals the sample size n, which can be used to check that the grouping is correct.

The fifth column of Table 4 contains the accumulated frequency of the interval: the number obtained by adding the frequency of the current interval to the frequencies of all preceding intervals. The accumulated frequency is usually denoted by the Latin letter $N_i$. It shows how many variants have values no greater than the upper limit of the interval.

The sixth column of the table contains the relative frequency. The relative frequency is the frequency expressed in relative terms, i.e. the ratio of the frequency to the sample size. The sum of all relative frequencies always equals 1. The relative frequency is denoted by $f_i$:

$$f_i = \frac{n_i}{n}.$$

The relative frequency of an interval is related to the probability that the random variable falls into this interval. According to Bernoulli's theorem, as the number of experiments increases without limit, the relative frequency of an event converges in probability to its probability. If the event is understood to be that the value of the studied quantity falls into a certain interval, it becomes clear that, for a large number of experiments, the relative frequency of the interval approaches the probability that the measured random variable falls into this interval.

Both the frequency and the relative frequency describe the repeatability of results in a sample. Comparing their statistical usefulness, it should be noted that the relative frequency is considerably more informative than the frequency. Indeed, if, as in Table 4, the frequency of the second interval is 8, so that 8 results fell into this interval, it is difficult to tell whether this is a little or a lot: in a sample of 1000 variants this frequency is small, whereas in a sample of 20 it is large. For an objective assessment, the frequency must be compared with the sample size. The relative frequency, by contrast, immediately tells what proportion of the results fell within the interval under consideration (approximately 28% in the example given). The relative frequency therefore gives a clearer picture of the repeatability of a characteristic in a sample. Another important advantage of the relative frequency deserves special mention: it makes it possible to compare samples of different sizes. The frequency is not suitable for this purpose.

The seventh column of the table contains the accumulated relative frequency, that is, the ratio of the accumulated frequency to the sample size. It is denoted by $F_i$:

$$F_i = \frac{N_i}{n}.$$

The accumulated relative frequency shows what proportion of the sample variants have values that do not exceed the upper limit of the interval.

The last row of the statistical table is used to check the grouping.
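As a minimal sketch, the columns described above can be computed from the Table 1 data with the boundaries obtained earlier. The variable names are hypothetical; the grouping rule is the one agreed above (a value on an inner boundary goes to the upper interval).

```python
from itertools import accumulate

results = [
    16.36, 14.91, 15.31, 14.26, 14.77, 13.88, 14.97, 14.01, 14.07, 14.48,
    14.44, 14.81, 13.81, 15.15, 15.23, 15.69, 14.29, 14.15, 14.57, 13.92,
    13.62, 14.92, 15.73, 13.22, 14.65, 14.80, 13.04, 15.10, 13.30,
]
boundaries = [13.04, 13.71, 14.38, 15.05, 15.72, 16.39]

n = len(results)
k = len(boundaries) - 1
frequencies = [0] * k
for x in results:
    for i in range(k):
        is_last = i == k - 1
        # Half-open intervals [lower, upper); the last one also includes its upper limit.
        if boundaries[i] <= x < boundaries[i + 1] or (is_last and x == boundaries[-1]):
            frequencies[i] += 1
            break

accumulated = list(accumulate(frequencies))             # N_i
relative = [n_i / n for n_i in frequencies]             # f_i = n_i / n
accumulated_relative = [N_i / n for N_i in accumulated]  # F_i = N_i / n

for row in zip(range(1, k + 1), frequencies, accumulated, relative, accumulated_relative):
    print("{} {:3d} {:3d} {:.3f} {:.3f}".format(*row))
print("check:", sum(frequencies), round(sum(relative), 3))
```

The final line plays the role of the control row: the frequencies must add up to the sample size and the relative frequencies to 1.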

After filling out the table, let us return to the definition of a statistical series. As a rule, a statistical series is presented as a table whose first row lists the intervals and whose second row lists the corresponding frequencies or relative frequencies. Thus, a statistical series is a double series of numbers that establishes the relationship between the numerical value of the characteristic being studied and its repeatability in the sample. A significant advantage of statistical series is that, unlike simple statistical aggregates, they give a clear picture of the characteristic features of the variation of the attribute.


Page creation date: 2016-08-20

With a large number of observations (on the order of hundreds), a simple statistical aggregate ceases to be a convenient form of recording statistical material - it becomes too cumbersome and not very visual. To make it more compact and clear, the statistical material must be subjected to additional processing - the so-called “statistical series” is constructed.

Let us assume that we have at our disposal the results of observations of a continuous random variable, presented as a simple statistical aggregate. Let us divide the entire range of observed values into intervals (classes) and count the number of values $m_i$ falling in the i-th interval. Dividing this number by the total number of observations $n$ gives the frequency (relative frequency) corresponding to this interval:

$$p_i^* = \frac{m_i}{n}.$$

The sum of the frequencies of all intervals must obviously be equal to one.

Let us build a table that lists the intervals in the order of their location along the abscissa axis together with the corresponding frequencies. This table is called a statistical series:

Here $I_i$ denotes the i-th interval; $x_i$, $x_{i+1}$ are its boundaries; $p_i^*$ is the corresponding frequency; $k$ is the number of intervals.

Example 1. 500 measurements of the lateral aiming error were made when firing from an aircraft at a ground target. The measurement results (in thousandths of a radian) are summarized in a statistical series:

Here $I_i$ denotes the intervals of aiming error values, $m_i$ is the number of observations in a given interval, and $p_i^*$ are the corresponding frequencies.

When grouping the observed values of a random variable into intervals, the question arises of which interval should receive a value lying exactly on the border of two intervals. In these cases it can be recommended (purely as a convention) to regard the value as belonging equally to both intervals and to add 1/2 to the counts of both.
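A minimal Python sketch of this half-and-half convention is given below. The data and the function name are hypothetical; `fractions.Fraction` keeps the half counts exact. Values on the outermost boundaries have only one neighbouring interval and are counted there in full, which is an added assumption.

```python
from fractions import Fraction


def count_with_split(values, boundaries):
    """Count values per interval; a value on an inner boundary adds 1/2 to both sides."""
    k = len(boundaries) - 1
    counts = [Fraction(0)] * k
    for x in values:
        if x in boundaries[1:-1]:
            j = boundaries.index(x)          # border between intervals j-1 and j
            counts[j - 1] += Fraction(1, 2)
            counts[j] += Fraction(1, 2)
        else:
            for i in range(k):
                if boundaries[i] <= x <= boundaries[i + 1]:
                    counts[i] += 1
                    break                    # values outside the range are ignored
    return counts


counts = count_with_split([0, 1, 2, 3, 4], [0, 2, 4])
print(counts, sum(counts))
```

The total of the counts still equals the number of observations, so the frequencies continue to sum to one.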

The number of intervals into which the statistical material is grouped should not be too large (otherwise the distribution series becomes inexpressive and its frequencies show irregular fluctuations); on the other hand, it should not be too small (with few intervals, the statistical series describes the properties of the distribution too coarsely). Practice shows that in most cases it is reasonable to choose on the order of 10 - 20 intervals. The richer and more homogeneous the statistical material, the more intervals can be chosen when compiling the statistical series. The intervals may be of equal or unequal length. It is simpler, of course, to make them equal. However, when recording data on random variables that are distributed very unevenly, it is sometimes convenient to choose narrower intervals in the region of highest distribution density than in the region of low density.

A statistical series is often presented graphically as a so-called histogram. The histogram is constructed as follows. The intervals are laid off along the abscissa axis, and on each interval, as a base, a rectangle is constructed whose area equals the frequency of that interval. To construct the histogram, the frequency of each interval is divided by its length, and the result is taken as the height of the rectangle. When the intervals are of equal length, the heights of the rectangles are proportional to the corresponding frequencies. It follows from the method of construction that the total area of the histogram equals one.
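The height rule can be checked numerically. In the Python sketch below the boundaries and counts are hypothetical (they are not the data of Example 1), and the function name is an assumption.

```python
def histogram_heights(boundaries, counts):
    """Height of each rectangle: relative frequency divided by interval length."""
    n = sum(counts)
    heights = []
    for i, m in enumerate(counts):
        length = boundaries[i + 1] - boundaries[i]
        heights.append((m / n) / length)
    return heights


boundaries = [0, 1, 2, 4, 8]    # hypothetical, deliberately unequal lengths
counts = [10, 20, 40, 30]       # hypothetical numbers of observations

heights = histogram_heights(boundaries, counts)
area = sum(h * (boundaries[i + 1] - boundaries[i]) for i, h in enumerate(heights))
print(heights, area)
```

The third interval holds the most observations, yet its rectangle is no taller than the second one because it is twice as long; the total area stays equal to one (up to floating-point rounding).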

As an example, we can give the histogram for the aiming error, constructed from the data of the statistical series considered in Example 1 (Fig. 7.3.1).

Obviously, as the number of experiments increases, ever narrower intervals can be chosen; the histogram will then approach more and more closely a certain curve bounding an area equal to one. It is easy to see that this curve is the graph of the distribution density of the random variable.

Using the data of the statistical series, one can also approximately construct the statistical distribution function of the random variable. Constructing the exact statistical distribution function, with several hundred jumps at all the observed values, is too laborious and is not worth the effort. In practice it is usually sufficient to construct the statistical distribution function at several points. It is convenient to take as these points the boundaries $x_1, x_2, \ldots, x_{k+1}$ of the intervals that appear in the statistical series. Then, obviously,

$$F^*(x_1) = 0; \quad F^*(x_2) = p_1^*; \quad F^*(x_3) = p_1^* + p_2^*; \quad \ldots; \quad F^*(x_{k+1}) = \sum_{i=1}^{k} p_i^* = 1. \qquad (7.3.2)$$

By connecting the obtained points with a broken line or a smooth curve, we obtain an approximate graph of the statistical distribution function.
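The points of the approximate statistical distribution function are simply running totals of the frequencies, attached to the interval boundaries. A minimal Python sketch with hypothetical boundaries and counts (not the data of Example 1); the function name is an assumption:

```python
from itertools import accumulate


def distribution_function_points(boundaries, counts):
    """Pairs (boundary, value of the statistical distribution function there)."""
    n = sum(counts)
    # Zero at the first boundary, then accumulated counts divided by n.
    values = [0.0] + [c / n for c in accumulate(counts)]
    return list(zip(boundaries, values))


boundaries = [0, 1, 2, 4, 8]    # hypothetical interval boundaries
counts = [10, 20, 40, 30]       # hypothetical numbers of observations
points = distribution_function_points(boundaries, counts)
print(points)
```

Joining these points with straight segments gives the broken-line graph described above.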

Example 2. Construct an approximate statistical distribution function of the aiming error from the statistical series of Example 1.


