Application of Chebyshev's law of large numbers in economics. The concept of Lyapunov's central limit theorem

The law of large numbers is the central law of probability theory because it formulates a fundamental connection between regularity and randomness: a large number of random events, taken together, produces a regular pattern, which makes it possible to predict the course of events. In its most general form it is expressed by Chebyshev's theorem:

Let (X1; X2; …; Xn; …) be independent random variables (their number is assumed unlimited), and let their variances be uniformly bounded (that is, the variances of all these random variables do not exceed some constant C):

D(Xi) ≤ C, i = 1, 2, … (1)

Then, no matter how small the positive number ε is, the limiting probability relation holds:

lim(n→∞) P( |(X1 + X2 + … + Xn)/n − (M(X1) + M(X2) + … + M(Xn))/n| < ε ) = 1, (2)

that is, if the number of random variables is large enough, the probability

P( |(X1 + X2 + … + Xn)/n − (M(X1) + M(X2) + … + M(Xn))/n| < ε ) (3)

is arbitrarily close to one.

Thus, Chebyshev's theorem states that if we consider a sufficiently large number n of independent random variables (X1; X2; …; Xn), then the event that the deviation of the arithmetic mean of these random variables from the arithmetic mean of their mathematical expectations is arbitrarily small in absolute value can be regarded as practically certain (its probability is close to one).

Proof. Consider the arithmetic mean of the random variables (X1; X2; …; Xn):

X̄ = (X1 + X2 + … + Xn)/n. (4)

Using the properties of the mathematical expectation and of the variance of a sum of independent random variables, we obtain

M(X̄) = (M(X1) + M(X2) + … + M(Xn))/n;  D(X̄) = (D(X1) + D(X2) + … + D(Xn))/n². (5)

Taking condition (1) into account, we establish that

D(X̄) ≤ nC/n² = C/n. (6)

Thus, as n → ∞ the variance D(X̄) tends to zero. That is, as n grows, the spread of the values of the random variable X̄ around its mathematical expectation decreases without limit. This means that for large n the value X̄ practically ceases to deviate from M(X̄); more precisely, the probability that X̄ deviates at all noticeably from its mathematical expectation — a constant — tends to zero. Namely, for any arbitrarily small positive number ε,

lim(n→∞) P(|X̄ − M(X̄)| ≥ ε) = 0.

So, according to the Chebyshev theorem just proved, the arithmetic mean of a large number of independent random variables (X1; X2; …; Xn), while itself a random variable, in effect loses the character of randomness and behaves like an unchanging constant. This constant is equal to the arithmetic mean of the mathematical expectations of the values (X1; X2; …; Xn). This is the law of large numbers.
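A minimal numerical sketch of this statement (Python with NumPy; the particular mixture of distributions below is an arbitrary assumption, chosen only so that the variances stay uniformly bounded): as n grows, the arithmetic mean of the observations stays ever closer to the arithmetic mean of the corresponding expectations.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n):
    """n independent variables drawn from three different bounded-variance laws."""
    kinds = rng.integers(0, 3, size=n)
    uniform = rng.uniform(0.0, 2.0, size=n)        # expectation 1.0
    exponential = rng.exponential(1.0, size=n)     # expectation 1.0
    normal = rng.normal(3.0, 1.0, size=n)          # expectation 3.0
    x = np.where(kinds == 0, uniform, np.where(kinds == 1, exponential, normal))
    expectations = np.where(kinds == 0, 1.0, np.where(kinds == 1, 1.0, 3.0))
    return x, expectations

for n in (10, 1_000, 100_000):
    x, m = sample_mixture(n)
    print(f"n={n:>7}: |mean(X) - mean(M(X))| = {abs(x.mean() - m.mean()):.5f}")
```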

Another proof of Chebyshev's theorem can be given. For this we use Chebyshev's inequality, which is valid for both discrete and continuous random variables and is of interest in its own right. Chebyshev's inequality makes it possible to estimate the probability that the deviation of a random variable from its mathematical expectation does not exceed, in absolute value, a given positive number. Let us present a proof of Chebyshev's inequality for discrete random variables.



Chebyshev's inequality. The probability that the deviation of a random variable X from its mathematical expectation M(X) is less in absolute value than a positive number ε is not less than 1 − D(X)/ε²:

P(|X − M(X)| < ε) ≥ 1 − D(X)/ε².

Proof: Since the events consisting in the fulfilment of the inequalities |X − M(X)| < ε and |X − M(X)| ≥ ε are opposite, the sum of their probabilities equals 1, i.e. P(|X − M(X)| < ε) + P(|X − M(X)| ≥ ε) = 1. Hence the probability we are interested in is P(|X − M(X)| < ε) = 1 − P(|X − M(X)| ≥ ε). (*)

We now estimate P(|X − M(X)| ≥ ε). To do this, write out the variance of the discrete random variable X:

D(X) = (x1 − M(X))²p1 + (x2 − M(X))²p2 + … + (xn − M(X))²pn.

All terms of this sum are non-negative. Let us discard those terms for which |xi − M(X)| < ε (for the remaining terms |xj − M(X)| ≥ ε); the sum can only decrease from this. For definiteness, let us agree that the first k terms are the ones discarded (assuming the possible values in the distribution table are numbered in this order). Thus,

D(X) ≥ (x_{k+1} − M(X))²p_{k+1} + (x_{k+2} − M(X))²p_{k+2} + … + (xn − M(X))²pn.

Since both sides of the inequality |xj − M(X)| ≥ ε are positive, squaring them gives the equivalent inequality (xj − M(X))² ≥ ε². Using this remark and replacing each of the factors (xj − M(X))² in the remaining sum by the number ε² (the right-hand side can only decrease from this, so the inequality is preserved), we get

D(X) ≥ ε²(p_{k+1} + p_{k+2} + … + pn). (**)

By the addition theorem, the sum p_{k+1} + p_{k+2} + … + pn is the probability that X takes one (no matter which) of the values x_{k+1}, …, xn, and for each of these values the deviation satisfies |xj − M(X)| ≥ ε. It follows that this sum expresses the probability P(|X − M(X)| ≥ ε). This allows us to rewrite inequality (**) as

D(X) ≥ ε²·P(|X − M(X)| ≥ ε), that is, P(|X − M(X)| ≥ ε) ≤ D(X)/ε². (***)

Substituting (***) into (*), we get P(|X − M(X)| < ε) ≥ 1 − D(X)/ε², which is what needed to be proved.
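A small numerical check of the inequality (a sketch in Python with NumPy; the exponential distribution here is an arbitrary choice): the empirical frequency of large deviations never exceeds the Chebyshev bound D(X)/ε², although the bound is usually far from tight.

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ exponential with mean 1: M(X) = 1, D(X) = 1
x = rng.exponential(1.0, size=1_000_000)
m, d = 1.0, 1.0

for eps in (0.5, 1.0, 2.0, 3.0):
    freq = np.mean(np.abs(x - m) >= eps)   # empirical P(|X - M(X)| >= eps)
    bound = min(d / eps**2, 1.0)           # Chebyshev bound D(X)/eps^2, capped at 1
    print(f"eps={eps}: empirical {freq:.4f} <= bound {bound:.4f}")
```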

Second proof of Chebyshev's theorem.

Let us again introduce the random variable X̄ — the arithmetic mean of the random variables (X1; X2; …; Xn):

X̄ = (X1 + X2 + … + Xn)/n.

Using the properties of the mathematical expectation and of the variance, we obtain

M(X̄) = (M(X1) + M(X2) + … + M(Xn))/n;  D(X̄) = (D(X1) + D(X2) + … + D(Xn))/n². (*)

Applying Chebyshev's inequality to the quantity X̄, we have

P(|X̄ − M(X̄)| < ε) ≥ 1 − D(X̄)/ε². (**)

Taking relation (*) into account,

P( |(X1 + … + Xn)/n − (M(X1) + … + M(Xn))/n| < ε ) ≥ 1 − (D(X1) + … + D(Xn))/(n²ε²).

By hypothesis D(Xi) ≤ C, so (D(X1) + … + D(Xn))/n² ≤ C/n. (***) Substituting the right-hand side of (***) into inequality (**), we have

P( |(X1 + … + Xn)/n − (M(X1) + … + M(Xn))/n| < ε ) ≥ 1 − C/(nε²).

Hence, passing to the limit as n → ∞, we obtain

lim(n→∞) P( |(X1 + … + Xn)/n − (M(X1) + … + M(Xn))/n| < ε ) ≥ 1.

Since a probability cannot exceed one, we finally get

lim(n→∞) P( |(X1 + … + Xn)/n − (M(X1) + … + M(Xn))/n| < ε ) = 1, (7)

which is what we needed to prove.

Let us dwell on one important particular case of Chebyshev's theorem. Namely, consider the case when the independent random variables (X1; X2; …; Xn) have identical distribution laws and, consequently, identical numerical characteristics:

M(X1) = M(X2) = … = M(Xn) = a;  D(X1) = D(X2) = … = D(Xn) = σ². (8)

Then for the random variable X̄, according to (5), we have:

M(X̄) = a;  D(X̄) = σ²/n. (9)

The limiting probability relation (7) in this case takes the form:

lim(n→∞) P( |(X1 + X2 + … + Xn)/n − a| < ε ) = 1. (10)

The conclusion that follows from (10) is of great value in combating random errors when making measurements of various kinds.

Let, for example, it be required to measure a certain quantity A. We perform not one but several (n) independent repeated measurements of this quantity. Every measurement is subject to a random error associated with the imperfection of the measuring device, all kinds of random interference with the measurement, and so on. Therefore the results (X1; X2; …; Xn) of the individual successive measurements of the desired quantity A will, generally speaking, not coincide with it — they are random variables. Moreover, they are random variables with identical distributions, because the measurements are repeated under the same external conditions. Then for the quantity X̄ — the arithmetic mean of the results of all n measurements — the limiting probability relation (10) holds. This means that this arithmetic mean loses the character of randomness, turning into A, the true value of the measured quantity. This, incidentally, is also evidenced by formulas (9), according to which:

M(X̄) = A;  D(X̄) = σ²/n → 0 as n → ∞. (11)

That is, having carried out a sufficiently large number of repeated measurements of the desired quantity A, in each of which a random measurement error is possible, and then finding the arithmetic mean of the results of these measurements according to the formula

A ≈ X̄ = (X1 + X2 + … + Xn)/n, (12)

we can obtain the value of A practically free of random errors.

This conclusion is a consequence of the law of large numbers. In this case the law manifests itself in the fact that when the measurement results are summed in (4), the random errors of the individual measurements, which in principle occur equally often with a plus and with a minus sign, largely cancel each other out. The remaining error is, in addition, divided by n, that is, it decreases by a further factor of n. So for large values of n the value X̄ will be almost exactly equal to the measured quantity A. This conclusion is, naturally, widely used in practice.
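A sketch of this effect in Python (the "true" value, noise level and numbers of measurements below are arbitrary assumptions made only for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

A = 9.81          # assumed true value being measured
sigma = 0.30      # standard deviation of a single measurement's random error

for n in (5, 50, 500, 50_000):
    measurements = A + rng.normal(0.0, sigma, size=n)   # X_i = A + random error
    print(f"n={n:>6}: mean = {measurements.mean():.4f}, "
          f"error of the mean = {abs(measurements.mean() - A):.4f}")
```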

Note. In the average X̄ only the random measurement errors cancel each other out, that is, the errors associated with the action of random factors (interference). Systematic (constant) errors, that is, errors inherent in every measurement, naturally remain in X̄. For example, a misaligned (unadjusted) pointer of an instrument introduces a constant (systematic) error into every measurement, and therefore introduces it into the arithmetic mean of the results of these measurements as well. Systematic errors must be eliminated before the measurements are taken and must not be allowed to arise during the measurement process.

Furthermore, if α is the scale division of the measuring device, then all repeated measurements are made only to within α. Naturally, the arithmetic mean of the results of all the measurements can then also be stated only to within α, that is, with an accuracy determined by the accuracy of the device.

Therefore one should not think that, having made a sufficiently large number of repeated measurements of the quantity A and then found the arithmetic mean of their results, we obtain the exact value of A. We obtain it only to within the accuracy of the measuring device, and even then only if the systematic measurement error has been excluded.

Here is another important special case of the law of large numbers. Let X = k be the number of occurrences of some event A in n repeated trials (X is a random variable), and let p and q = 1 − p be the probabilities of occurrence and non-occurrence of the event A in one trial. Consider the random variable k/n — the relative frequency of occurrence of the event A in n trials. Let us also introduce n random variables (X1, X2, …, Xn), which are the numbers of occurrences of the event A in the first, second, …, n-th trial. Then k = X1 + X2 + … + Xn, M(Xi) = p, and by the law of large numbers the relative frequency k/n for large n practically coincides with the probability p of the event A occurring in one trial. This conclusion is the basis for finding the probabilities of many random events whose probabilities cannot be found in any other (theoretical) way.

For example, let the trial be a toss of a deformed (asymmetrical) coin, and let the event A be the appearance of heads. The probability of the event A is difficult to find by the classical formula or by any other theoretical formula, because such a formula would somehow have to reflect the characteristics of the coin's deformation. Therefore only one real path leads to the goal: toss the coin repeatedly (the larger the number of tosses n, the better) and determine empirically the relative frequency k/n of the appearance of heads. If n is large, then in accordance with the law of large numbers one can assert with high probability that P(A) ≈ k/n.
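A sketch of this empirical procedure (Python; the "deformation" is modeled by an assumed true probability of 0.37, which the experimenter is supposed not to know):

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = 0.37   # unknown to the experimenter; assumed here only to drive the simulation

for n in (100, 10_000, 1_000_000):
    k = rng.binomial(n, p_true)          # number of heads in n tosses
    print(f"n={n:>8}: relative frequency k/n = {k / n:.4f}")
```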

The law of large numbers manifests itself in many natural and social phenomena.

Example 1. As is known, a gas placed in a closed vessel exerts pressure on the walls of the vessel. By the gas laws, at constant gas temperature this pressure is constant. The pressure is produced by the chaotic impacts of individual molecules against the walls of the vessel. The speeds and directions of motion of the molecules all differ, so the forces with which different molecules strike the walls also differ. However, the gas pressure on the walls of the vessel is determined not by the force of impact of individual molecules but by their average force. And this average, being the mean of a huge number of independently acting forces, remains practically unchanged by the law of large numbers. Therefore the gas pressure on the walls of the vessel remains practically unchanged.

Example 2. An insurance company dealing, for example, with auto insurance pays out different insured amounts for different insured events (car accidents and road traffic accidents). However, the average value of these payments, being the mean of a large number n of independent insured amounts, is practically constant by the law of large numbers. It can be determined by examining the actual statistics of insurance claims. For the insurance company to avoid losses, the average premium charged to its clients must be higher than the average amount the company pays out to its clients. But this premium must not be too high if the company is to remain competitive (to compare favourably in attractiveness with other insurance companies).

At the beginning of the course we already noted that the mathematical laws of probability theory are obtained by abstracting from real statistical regularities inherent in mass random phenomena. The presence of these regularities is connected precisely with the mass character of the phenomena, that is, with the large number of homogeneous experiments performed or with the large number of random influences which, in their totality, generate a random variable subject to a well-defined law. The property of stability of mass random phenomena has been known to mankind since ancient times. In whatever field it manifests itself, its essence reduces to the following: the specific features of each individual random phenomenon have almost no effect on the average result of a mass of such phenomena; the random deviations from the average, inevitable in each individual phenomenon, cancel out, level out and smooth out in the mass. It is this stability of averages that constitutes the physical content of the "law of large numbers" understood in the broad sense of the word: for a very large number of random phenomena their average result practically ceases to be random and can be predicted with a high degree of certainty.

In the narrow sense, the term "law of large numbers" in probability theory denotes a series of mathematical theorems, each of which establishes, under certain conditions, the fact that the average characteristics of a large number of experiments approach certain definite constants.

In 2.3 we already formulated the simplest of these theorems — J. Bernoulli's theorem. It asserts that with a large number of experiments the frequency of an event approaches (more precisely, converges in probability to) the probability of this event. With other, more general forms of the law of large numbers we shall become acquainted in this chapter. All of them establish the fact, and the conditions, of convergence in probability of certain random variables to constant, non-random quantities.

The law of large numbers plays an important role in the practical applications of probability theory. The property of random variables, under certain conditions, to behave almost like non-random ones allows one to operate confidently with these quantities and to predict the results of mass random phenomena with almost complete certainty.

The possibilities for such predictions in the field of mass random phenomena are further broadened by another group of limit theorems, which concern not the limiting values of random variables but the limiting laws of distribution. This is the group of theorems known as the "central limit theorem". We have already said that, when a sufficiently large number of random variables are summed, the distribution law of the sum approaches the normal law without limit, provided certain conditions hold. These conditions, which can be formulated mathematically in various ways — in a more or less general form — essentially reduce to the requirement that the influence of the individual terms on the sum be uniformly small, that is, that the sum contain no terms that clearly dominate all the rest in their influence on the dispersion of the sum. The various forms of the central limit theorem differ from one another in the conditions under which this limiting property of the sum of random variables is established.

The various forms of the law of large numbers, together with the various forms of the central limit theorem, form the set of so-called limit theorems of probability theory. The limit theorems make it possible not only to make scientific forecasts in the field of random phenomena, but also to assess the accuracy of these forecasts.

In this chapter we shall consider only some of the simplest forms of the limit theorems: first the theorems belonging to the "law of large numbers" group, then those belonging to the "central limit theorem" group.

lim(n→∞) P( |(X1 + X2 + … + Xn)/n − (M(X1) + M(X2) + … + M(Xn))/n| < ε ) = 1.

The meaning of Chebyshev's law of large numbers is as follows. While an individual random variable can take values ​​very far from its mathematical expectation, the arithmetic mean of a large number of random variables with a probability close to unity takes a value that differs little from the arithmetic mean of their mathematical expectations.
A special case of Chebyshev's law of large numbers. Let X1, X2, …, Xn, … be a sequence of pairwise independent random variables that have uniformly bounded variances, i.e. D(Xi) ≤ C, and the same mathematical expectation, M(Xi) = a. Then, whatever ε > 0 may be, the relation

lim(n→∞) P( |(X1 + X2 + … + Xn)/n − a| < ε ) = 1

is valid.

This follows directly from the formula above, since (M(X1) + M(X2) + … + M(Xn))/n = na/n = a.

Comment. One says that a random variable Xn converges in probability to the number a if, for an arbitrarily small ε > 0, the probability of the inequality |Xn − a| < ε approaches unity without limit as n increases. Convergence in probability does not mean that lim(n→∞) Xn = a. Indeed, in the latter case the inequality |Xn − a| < ε holds for all sufficiently large values of n. In the case of convergence in probability this inequality may fail for some arbitrarily large values of n. However, the failure of the inequality for large values of n is a very rare (unlikely) event. Taking this into account, the special case of Chebyshev's law of large numbers can be formulated as follows.
The arithmetic mean of pairwise independent random variables X1, X2, …, Xn with uniformly bounded variances and identical mathematical expectations M(Xi) = a converges in probability to a.
Let us explain the meaning of this special case of Chebyshev's law of large numbers. Suppose we want to find the true value a of some physical quantity (for example, the size of some part). To do this we make a series of measurements independent of each other. Every measurement is accompanied by some error. Therefore every possible measurement result Xi is a random variable (the index i is the number of the measurement). Let us assume that in each measurement there is no systematic error, i.e. deviations from the true value a of the measured quantity in either direction are equally probable. In this case the mathematical expectations of all the random variables are the same and equal to the measured value a, i.e. M(Xi) = a.
Let us finally assume that the measurements are made with some guaranteed accuracy. This means that for all measurements D(Xi) ≤ C. Thus we are within the conditions of Chebyshev's law of large numbers, and therefore, if the number of measurements is large enough, we can say with practical certainty that, whatever ε > 0 may be, the arithmetic mean of the measurement results differs from the true value a by less than ε.


Lecture 13.

Law of large numbers. Chebyshev's inequality. Theorems of Chebyshev and Bernoulli.
The study of statistical regularities has made it possible to establish that, under certain conditions, the overall behavior of a large number of random variables almost loses its random character and becomes regular (in other words, random deviations from some average behavior cancel each other out). In particular, if the influence of the individual terms on a sum is uniformly small, the distribution law of the sum approaches the normal law. The mathematical formulation of this statement is given by a group of theorems called the law of large numbers.

Chebyshev's inequality.
Chebyshev's inequality, used to prove further theorems, is valid for both continuous and discrete random variables. Let us prove it for discrete random variables.
Theorem 13.1 (Chebyshev's inequality). p(|X − M(X)| < ε) ≥ 1 − D(X)/ε². (13.1)

Proof. Let X be given by the distribution series

X:  x1   x2   …   xn
p:  p1   p2   …   pn

Since the events |X − M(X)| < ε and |X − M(X)| ≥ ε are opposite, p(|X − M(X)| < ε) + p(|X − M(X)| ≥ ε) = 1, and therefore p(|X − M(X)| < ε) = 1 − p(|X − M(X)| ≥ ε). Let us find p(|X − M(X)| ≥ ε).

D(X) = (x1 − M(X))²p1 + (x2 − M(X))²p2 + … + (xn − M(X))²pn. Let us exclude from this sum those terms for which |xi − M(X)| < ε; suppose, for definiteness, that these are the first k terms. Then

D(X) ≥ (x_{k+1} − M(X))²p_{k+1} + (x_{k+2} − M(X))²p_{k+2} + … + (xn − M(X))²pn ≥ ε²(p_{k+1} + p_{k+2} + … + pn).

Note that p_{k+1} + p_{k+2} + … + pn is the probability that |X − M(X)| ≥ ε, since it is the sum of the probabilities of all the possible values of X for which this inequality holds. Hence D(X) ≥ ε²·p(|X − M(X)| ≥ ε), or p(|X − M(X)| ≥ ε) ≤ D(X)/ε². Then for the probability of the opposite event p(|X − M(X)| < ε) ≥ 1 − D(X)/ε², which is what needed to be proved.
Theorems of Chebyshev and Bernoulli.

Theorem 13.2 (Chebyshev's theorem). If X1, X2, …, Xn are pairwise independent random variables whose variances are uniformly bounded (D(Xi) ≤ C), then for an arbitrarily small number ε > 0 the probability of the inequality

| (X1 + X2 + … + Xn)/n − (M(X1) + M(X2) + … + M(Xn))/n | < ε

will be arbitrarily close to 1 if the number of random variables is large enough.

Comment. In other words, under these conditions

lim(n→∞) p( |(X1 + X2 + … + Xn)/n − (M(X1) + M(X2) + … + M(Xn))/n| < ε ) = 1.

Proof. Consider the new random variable X̄ = (X1 + X2 + … + Xn)/n and find its mathematical expectation. Using the properties of the mathematical expectation, we obtain M(X̄) = (M(X1) + M(X2) + … + M(Xn))/n. Apply Chebyshev's inequality to X̄:

p(|X̄ − M(X̄)| < ε) ≥ 1 − D(X̄)/ε².

Since the random variables under consideration are independent, then, taking the conditions of the theorem into account, we have

D(X̄) = (D(X1) + D(X2) + … + D(Xn))/n² ≤ nC/n² = C/n.

Using this result, we can write the previous inequality in the form

p( |(X1 + … + Xn)/n − (M(X1) + … + M(Xn))/n| < ε ) ≥ 1 − C/(nε²).

Passing to the limit as n → ∞ and noting that a probability cannot be greater than 1, we conclude that

lim(n→∞) p( |(X1 + … + Xn)/n − (M(X1) + … + M(Xn))/n| < ε ) = 1.

The theorem has been proven.
Corollary.

If X1, X2, …, Xn are pairwise independent random variables with uniformly bounded variances, having the same mathematical expectation equal to a, then for any arbitrarily small ε > 0 the probability of the inequality

| (X1 + X2 + … + Xn)/n − a | < ε

will be as close to 1 as desired if the number of random variables is large enough. In other words,

lim(n→∞) p( |(X1 + X2 + … + Xn)/n − a| < ε ) = 1.

Conclusion: the arithmetic mean of a sufficiently large number of random variables takes values close to the arithmetic mean of their mathematical expectations, that is, it loses the character of a random variable. For example, if a series of measurements of some physical quantity is carried out, and: a) the result of each measurement does not depend on the results of the others, that is, all the results are pairwise independent random variables; b) the measurements are made without systematic errors (their mathematical expectations are equal to one another and equal to the true value a of the measured quantity); c) a certain accuracy of measurement is guaranteed and, consequently, the variances of the random variables under consideration are uniformly bounded; then, with a sufficiently large number of measurements, their arithmetic mean will turn out to be arbitrarily close to the true value of the measured quantity.
Bernoulli's theorem.
Theorem 13.3 (Bernoulli's theorem). If in each of n independent experiments the probability p of the occurrence of an event A is constant, then with a sufficiently large number of trials the probability that the absolute deviation of the relative frequency m/n of occurrences of A in n experiments from p is arbitrarily small is as close to 1 as desired:

lim(n→∞) p( |m/n − p| < ε ) = 1. (13.2)

Proof. Let us introduce the random variables X1, X2, …, Xn, where Xi is the number of occurrences of A in the i-th experiment. Each Xi can take only two values: 1 (with probability p) and 0 (with probability q = 1 − p). Moreover, the random variables under consideration are pairwise independent and their variances are uniformly bounded (since D(Xi) = pq and p + q = 1, whence pq ≤ ¼). Consequently Chebyshev's theorem can be applied to them, with M(Xi) = p:

lim(n→∞) p( |(X1 + X2 + … + Xn)/n − p| < ε ) = 1.

But (X1 + X2 + … + Xn)/n = m/n, because Xi takes the value 1 when A appears in the given experiment and the value 0 when A does not. Thus

lim(n→∞) p( |m/n − p| < ε ) = 1.

Q.E.D.
Comment. From Bernoulli's theorem it does not follow that lim(n→∞) m/n = p. The theorem speaks only of the probability that the difference between the relative frequency and the theoretical probability becomes arbitrarily small. The difference is as follows: with the ordinary convergence considered in mathematical analysis, the inequality |m/n − p| < ε holds for every n starting from some value; in our case there may be individual values of n for which this inequality fails. This type of convergence is called convergence in probability.

Lecture 14.

Lyapunov's central limit theorem. Moivre-Laplace limit theorem.
The law of large numbers does not examine the form of the limiting distribution law of a sum of random variables. This question is considered by a group of theorems called the central limit theorem. They assert that the distribution law of a sum of random variables, each of which may have a different distribution, approaches the normal law when the number of terms is sufficiently large. This explains the importance of the normal law for practical applications.
Characteristic functions.

To prove the central limit theorem, the method of characteristic functions is used.
Definition 14.1. The characteristic function of a random variable X is the function

g(t) = M(e^{itX}). (14.1)

Thus g(t) is the mathematical expectation of the complex random variable U = e^{itX} associated with X. In particular, if X is a discrete random variable given by its distribution series, then

g(t) = Σk e^{it·xk}·pk. (14.2)

For a continuous random variable with distribution density f(x),

g(t) = ∫_{−∞}^{+∞} e^{itx} f(x) dx. (14.3)

Example 1. Let X be the number of sixes obtained in one throw of a die (X = 1 with probability 1/6 and X = 0 with probability 5/6). Then, according to formula (14.2), g(t) = (5/6)·e^{it·0} + (1/6)·e^{it·1} = 5/6 + e^{it}/6.

Example 2. Let us find the characteristic function of a normalized continuous random variable distributed according to the normal law, with density f(x) = (1/√(2π))·e^{−x²/2}. According to formula (14.3),

g(t) = (1/√(2π)) ∫_{−∞}^{+∞} e^{itx} e^{−x²/2} dx = e^{−t²/2}

(here we completed the square in the exponent, used the Poisson integral ∫_{−∞}^{+∞} e^{−u²/2} du = √(2π), and the fact that i² = −1).
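A quick numerical cross-check of both examples (a sketch in Python with NumPy; the sample size and the test point t are arbitrary): the empirical mean of e^{itX} should be close to the closed-form expressions above.

```python
import numpy as np

rng = np.random.default_rng(4)
t = 1.7  # an arbitrary test point

# Example 1: X = 1 with probability 1/6, else 0
x1 = (rng.integers(1, 7, size=1_000_000) == 6).astype(float)
print(np.mean(np.exp(1j * t * x1)), 5/6 + np.exp(1j * t) / 6)

# Example 2: standard normal
x2 = rng.standard_normal(1_000_000)
print(np.mean(np.exp(1j * t * x2)), np.exp(-t**2 / 2))
```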

Properties of characteristic functions.
1. The density f(x) can be found from a known characteristic function g(t) by the formula

f(x) = (1/2π) ∫_{−∞}^{+∞} e^{−itx} g(t) dt (14.4)

(transformation (14.3) is called the Fourier transform, and transformation (14.4) the inverse Fourier transform).

2. If the random variables X and Y are related by Y = aX, then their characteristic functions are related by

g y (t) = g x (at). (14.5)

3. The characteristic function of a sum of independent random variables equals the product of the characteristic functions of the terms: for Y = X1 + X2 + … + Xn,

gY(t) = gX1(t)·gX2(t)·…·gXn(t). (14.6)
Theorem 14.1 (central limit theorem for identically distributed terms). If X1, X2, …, Xn, … are independent random variables with the same distribution law, mathematical expectation m and variance σ², then as n increases without bound the distribution law of the sum

Yn = X1 + X2 + … + Xn

approaches the normal law without limit.

Proof.

Let us prove the theorem for continuous random variables X1, X2, …, Xn (for discrete variables the proof is similar). By the conditions of the theorem the characteristic functions of the terms are identical: gX1(t) = gX2(t) = … = gx(t). Then, by property 3, the characteristic function of the sum Yn is

gYn(t) = [gx(t)]^n.

Let us expand the function gx(t) in a Maclaurin series:

gx(t) = gx(0) + gx'(0)·t + [gx''(0)/2 + α(t)]·t², where α(t) → 0 as t → 0.

Assuming that m = 0 (that is, transferring the origin to the point m), we obtain

gx(0) = 1,  gx'(0) = i·M(X) = 0,  gx''(0) = i²·M(X²) = −σ²

(because m = 0, so M(X²) = D(X) = σ²). Substituting these results into the Maclaurin formula, we find that

gx(t) = 1 − [σ²/2 − α(t)]·t².

Consider the new random variable Zn = Yn/(σ√n), which differs from Yn in that for every n its variance equals 1 (and its expectation equals 0). Since Yn and Zn are connected by a linear dependence, it is enough to prove that Zn is distributed according to the normal law or, what is the same thing, that its characteristic function approaches the characteristic function of the normal law (see Example 2). By the properties of characteristic functions,

gZn(t) = gYn(t/(σ√n)) = [gx(t/(σ√n))]^n.

Taking the logarithm of the resulting expression:

ln gZn(t) = n·ln(1 − k), where k = [σ²/2 − α(t/(σ√n))]·t²/(σ²n).

Expanding ln(1 − k) in a series as n → ∞ and limiting ourselves to the first term, ln(1 − k) ≈ −k. From here

ln gZn(t) ≈ −n·k = −t²/2 + α(t/(σ√n))·t²/σ²,

where the last term tends to 0, since α(t/(σ√n)) → 0 as n → ∞. Hence ln gZn(t) → −t²/2, that is,

gZn(t) → e^{−t²/2}

— the characteristic function of the normal distribution. So, with an unlimited increase in the number of terms, the characteristic function of Zn approaches the characteristic function of the normal law without limit; therefore the distribution law of Zn (and of Yn) approaches the normal law without limit. The theorem is proved.

A. M. Lyapunov proved the central limit theorem under conditions of a more general form:

Theorem 14.2 (Lyapunov's theorem). If the random variable X is the sum of a very large number of mutually independent random variables X1, X2, …, Xn for which the following condition is satisfied:

lim(n→∞) ( Σk bk ) / ( Σk Dk )^{3/2} = 0, (14.7)

where bk is the third absolute central moment of the variable Xk and Dk is its variance, then X has a distribution close to the normal one (Lyapunov's condition means that the influence of each individual term on the sum is negligible).
In practice the central limit theorem can be used even for a fairly small number of terms, since probabilistic calculations usually require only modest accuracy. Experience shows that for a sum of as few as about ten terms their distribution law can already be replaced by the normal one.
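A sketch illustrating this (Python; the uniform distribution and the choice of n = 10 terms are arbitrary assumptions): already for ten terms the standardized sum passes a rough comparison with the normal law.

```python
import numpy as np

rng = np.random.default_rng(5)

n_terms, n_samples = 10, 200_000
x = rng.uniform(0, 1, size=(n_samples, n_terms))   # each term: uniform on [0, 1]
m, var = 0.5, 1 / 12                                # expectation and variance of one term

z = (x.sum(axis=1) - n_terms * m) / np.sqrt(n_terms * var)   # standardized sum

# Compare a few empirical probabilities with the standard normal ones
for a in (1.0, 2.0):
    print(f"P(|Z| < {a}): empirical {np.mean(np.abs(z) < a):.4f}"
          f"  (normal: {0.6827 if a == 1.0 else 0.9545})")
```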

A special case of the central limit theorem for discrete random variables is the Moivre-Laplace theorem.

Theorem 14.3 (Moivre-Laplace theorem). If n independent experiments are performed, in each of which an event A occurs with probability p, then the following relation holds:

p( α ≤ (Y − np)/√(npq) ≤ β ) → Φ(β) − Φ(α) as n → ∞, (14.8)

where Y is the number of occurrences of the event A in n experiments, q = 1 − p, and Φ(x) = (1/√(2π)) ∫₀^{x} e^{−u²/2} du is the Laplace function.

Proof.

We assume that Y = X1 + X2 + … + Xn, where Xi is the number of occurrences of the event A in the i-th experiment. Then the random variable

Zn = (Y − M(Y)) / σ(Y)

(see Theorem 14.1) can be considered normally distributed and normalized; therefore the probability of its falling into the interval (α, β) can be found by the formula

p(α ≤ Zn ≤ β) ≈ Φ(β) − Φ(α).

Since Y has the binomial distribution, M(Y) = np and σ(Y) = √(npq). Then Zn = (Y − np)/√(npq). Substituting this expression into the previous formula, we obtain equality (14.8).

Corollary.

Under the conditions of the Moivre-Laplace theorem, the probability Pn(k) that the event A will occur in n experiments exactly k times can, for a large number of experiments, be found by the formula

Pn(k) ≈ φ(x) / √(npq), (14.9)

where x = (k − np)/√(npq) and φ(x) = (1/√(2π))·e^{−x²/2} (the values of this function are given in special tables).

Example 3. Find the probability that in 100 tosses of a coin the number of heads lies between 40 and 60.

Let us apply formula (14.8), taking into account that p = 0.5. Then np = 100·0.5 = 50 and √(npq) = √(100·0.5·0.5) = 5, so for Y between 40 and 60 the standardized variable (Y − 50)/5 lies between −2 and 2. Hence

P(40 ≤ Y ≤ 60) ≈ Φ(2) − Φ(−2) = 2Φ(2) = 2·0.4772 = 0.9544.

Example 4. Under the conditions of the previous example, find the probability that heads will appear exactly 45 times.

We find x = (45 − 50)/5 = −1 and φ(−1) = φ(1) = 0.2420. Then, by formula (14.9),

P100(45) ≈ 0.2420/5 = 0.0484.
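These two approximations are easy to check against the exact binomial distribution (a sketch using scipy.stats, which is assumed to be available):

```python
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

# Example 3: P(40 <= Y <= 60)
exact = binom.cdf(60, n, p) - binom.cdf(39, n, p)
approx = norm.cdf(2) - norm.cdf(-2)
print(f"P(40 <= Y <= 60): exact {exact:.4f}, normal approximation {approx:.4f}")

# Example 4: P(Y = 45)
exact_45 = binom.pmf(45, n, p)
approx_45 = norm.pdf((45 - mu) / sigma) / sigma
print(f"P(Y = 45): exact {exact_45:.4f}, local approximation {approx_45:.4f}")
```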

Lecture 15.

Basic concepts of mathematical statistics. Population and sample. Variation series, statistical series. Grouped sample. Grouped statistical series. Frequency polygon. Sample distribution function and histogram.
Mathematical statistics deals with the establishment of patterns that govern mass random phenomena, based on the processing of statistical data obtained as a result of observations. The two main tasks of mathematical statistics are:

Determining how to collect and group these statistics;

Development of methods for analyzing the obtained data depending on the objectives of the study, which include:

a) estimation of the unknown probability of an event; estimation of an unknown distribution function; estimation of the parameters of a distribution whose form is known; estimation of the dependence on other random variables, etc.;

b) testing statistical hypotheses about the form of an unknown distribution or about the values of the parameters of a known distribution.

To solve these problems one selects, from a large population of homogeneous objects, a limited number of objects, and on the basis of studying them makes a prediction about the characteristic of interest for the whole population.

Let us define the basic concepts of mathematical statistics.

Population – the entire set of available objects.

Sample – a set of objects randomly selected from the population.

Population size N and sample size n – the numbers of objects in the population and in the sample, respectively.

Types of sampling:

Repeated (with replacement) – each selected object is returned to the population before the next one is selected;

Non-repeated (without replacement) – a selected object is not returned to the population.
Comment. In order to be able to draw conclusions from the study of the sample about the behavior of the characteristic of the population that interests us, the sample must correctly represent the proportions of the population, that is, it must be representative. Taking into account the law of large numbers, it can be asserted that this condition is satisfied if each object is selected at random and every object has the same probability of being included in the sample.
Primary processing of results.

Let the random variable X of interest take in the sample the value x1 n1 times, the value x2 n2 times, …, the value xk nk times, where n1 + n2 + … + nk = n is the sample size. The observed values x1, x2, …, xk are called variants, and n1, n2, …, nk are their frequencies. Dividing each frequency by the sample size, we obtain the relative frequencies wi = ni/n.

A sequence of variants written in ascending order is called a variation series, and a list of the variants and their corresponding frequencies or relative frequencies is called a statistical series:


xi:  x1   x2   …   xk
ni:  n1   n2   …   nk
wi:  w1   w2   …   wk

When 20 series of 10 die throws were performed, the numbers of sixes turned out to be 1, 1, 4, 0, 1, 2, 1, 2, 2, 0, 5, 3, 3, 1, 0, 2, 2, 3, 4, 1. Let us compose the variation series: 0, 1, 2, 3, 4, 5. The statistical series of absolute and relative frequencies has the form:

xi:  0     1     2     3     4     5
ni:  3     6     5     3     2     1
wi:  0.15  0.3   0.25  0.15  0.1   0.05
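A sketch of this tabulation in Python (the data are the twenty counts listed above):

```python
from collections import Counter

data = [1, 1, 4, 0, 1, 2, 1, 2, 2, 0, 5, 3, 3, 1, 0, 2, 2, 3, 4, 1]
n = len(data)

counts = Counter(data)                    # frequencies n_i
for x in sorted(counts):                  # variation series in ascending order
    print(f"x={x}: n_i={counts[x]}, w_i={counts[x] / n:.2f}")
```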

If a continuous characteristic is being studied, the variation series may consist of a very large number of values. In this case it is more convenient to use a grouped sample. To obtain it, the interval containing all observed values of the characteristic is divided into several equal partial intervals of length h, and for each partial interval one finds ni — the sum of the frequencies of the variants falling into the i-th interval. The table compiled from these results is called a grouped statistical series.

Frequency polygon. Sample distribution function and histogram.
To visualize the behavior of the random variable under study in the sample, various graphs can be constructed. One of them is the frequency polygon: a broken line whose segments connect the points with coordinates (x1, n1), (x2, n2), …, (xk, nk), where the xi are plotted on the abscissa axis and the ni on the ordinate axis. If relative frequencies (wi) rather than absolute ones (ni) are plotted on the ordinate axis, we obtain the relative frequency polygon (Fig. 1).

By analogy with the distribution function of a random variable, one can define a function giving the relative frequency of the event X < x.

Definition 15.1. The sample (empirical) distribution function is the function F*(x) that determines, for each value x, the relative frequency of the event X < x. Thus,

F*(x) = nx / n, (15.1)

where nx is the number of variants smaller than x, and n is the sample size.
Comment. In contrast to the empirical distribution function, which is found experimentally, the distribution function F(x) of the population is called the theoretical distribution function. F(x) determines the probability of the event X < x, while F*(x) gives its relative frequency. For sufficiently large n, as follows from Bernoulli's theorem, F*(x) tends in probability to F(x).

From the definition of the empirical distribution function it is clear that its properties coincide with the properties of F(x), namely:


  1. 0 ≤F* (x) ≤ 1.

  2. F* (x) is a non-decreasing function.

  3. If x1 is the smallest variant, then F*(x) = 0 for x ≤ x1; if xk is the largest variant, then F*(x) = 1 for x > xk.
For a continuous characteristic a convenient graphic illustration is the histogram: a step figure consisting of rectangles whose bases are the partial intervals of length h and whose heights are ni/h (frequency histogram) or wi/h (relative frequency histogram). In the first case the area of the histogram equals the sample size, in the second it equals unity (Fig. 2).
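A sketch of the empirical distribution function and a relative frequency histogram in Python (NumPy and matplotlib are assumed to be available; the sample here is simulated normal data):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
sample = rng.normal(loc=10.0, scale=2.0, size=500)   # simulated continuous characteristic

# Empirical distribution function F*(x) = (number of sample values below x) / n
xs = np.sort(sample)
F_star = np.arange(1, len(xs) + 1) / len(xs)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.step(xs, F_star, where="post")
ax1.set_title("Empirical distribution function")

ax2.hist(sample, bins=12, density=True)              # heights w_i / h, total area 1
ax2.set_title("Relative frequency histogram")
plt.show()
```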

Lecture 16.

Numerical characteristics of a statistical distribution: sample mean, variance estimates, mode and median estimates, estimates of the initial and central moments. Statistical description and calculation of estimates of the parameters of a two-dimensional random vector.
One of the tasks of mathematical statistics is to estimate the values ​​of the numerical characteristics of the random variable being studied using the available sample.

Definition 16.1. The sample mean is the arithmetic mean of the values of the random variable observed in the sample:

x̄ = (1/n) Σi ni·xi, (16.1)

where the xi are the variants and the ni their frequencies.

Comment. The sample mean serves to estimate the mathematical expectation of the random variable under study. The question of how accurate such an estimate is will be discussed later.

Definition 16.2. The sample variance is

Dв = (1/n) Σi ni·(xi − x̄)², (16.2)

and the sample standard deviation is

σв = √Dв. (16.3)

Just as in the theory of random variables, it can be proved that the sample variance can also be computed by the formula

Dв = (1/n) Σi ni·xi² − (x̄)². (16.4)

Example 1. Let us find the numerical characteristics of a sample given by the statistical series

xi:  2   5   7   8
ni:  3   8   7   2

Here n = 20, so

x̄ = (2·3 + 5·8 + 7·7 + 8·2)/20 = 111/20 = 5.55;
Dв = (4·3 + 25·8 + 49·7 + 64·2)/20 − 5.55² = 34.15 − 30.8025 ≈ 3.35;
σв ≈ 1.83.

Other characteristics of the variation series are:

- the mode M0 – the variant with the highest frequency (in the previous example M0 = 5);

- the median me – the variant that divides the variation series into two parts with equal numbers of variants. If the number of variants is odd (n = 2k + 1), then me = x_{k+1}; for even n = 2k, me = (x_k + x_{k+1})/2. In particular, in Example 1 the median is me = (5 + 5)/2 = 5.
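A sketch of these calculations for the data of Example 1 (Python; the standard library statistics module and NumPy are used):

```python
import numpy as np
from statistics import mode, median

# Expand the statistical series of Example 1 into the raw sample
values, freqs = [2, 5, 7, 8], [3, 8, 7, 2]
sample = np.repeat(values, freqs)

mean = sample.mean()
var = sample.var()            # (1/n) * sum((x - mean)^2): the sample variance D_в
print(f"mean = {mean}, variance = {var:.4f}, std = {np.sqrt(var):.4f}")
print(f"mode = {mode(sample)}, median = {median(sample)}")
```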

Estimates of the initial and central moments (the so-called empirical moments) are determined similarly to the corresponding theoretical moments:

- the initial empirical moment of order k is

Mk = (1/n) Σi ni·xi^k. (16.5)

In particular, M1 = x̄, that is, the first-order initial empirical moment equals the sample mean.

- the central empirical moment of order k is

mk = (1/n) Σi ni·(xi − x̄)^k. (16.6)

In particular, m2 = Dв, that is, the second-order central empirical moment equals the sample variance.
Statistical description and calculation of the characteristics of a two-dimensional random vector.
In the statistical study of two-dimensional random variables, the main task is usually to identify the relationship between the components.

A two-dimensional sample is a set of values of a random vector: (x1, y1), (x2, y2), …, (xn, yn). For it one can determine the sample means of the components,

x̄ = (1/n) Σi xi,  ȳ = (1/n) Σi yi,

and the corresponding sample variances and standard deviations. In addition, one can calculate the conditional means: ȳx – the arithmetic mean of the observed values of Y corresponding to X = x, and x̄y – the mean of the observed values of X corresponding to Y = y.

If there is a dependence between the components of a two-dimensional random variable, it can be of different kinds: a functional dependence, if each possible value of X corresponds to one value of Y, or a statistical one, in which a change of one quantity leads to a change in the distribution of the other. If a change in one quantity changes the mean value of the other, the statistical dependence between them is called a correlation.

Lecture 17.

Basic properties of statistical estimates of distribution parameters: unbiasedness, consistency, efficiency. Unbiasedness and consistency of the sample mean as an estimate of the mathematical expectation. Bias of the sample variance. An example of an unbiased variance estimate. Asymptotically unbiased estimates. Methods for constructing estimates: the maximum likelihood method, the method of moments, the quantile method, the least squares method, the Bayesian approach to estimation.
Having obtained statistical estimates of the distribution parameters (sample mean, sample variance, etc.), one needs to make sure that they serve as a sufficiently good approximation to the corresponding characteristics of the population. Let us determine the requirements they must meet.

Let Θ* be a statistical estimate of an unknown parameter Θ of the theoretical distribution. Let us extract several samples of the same size n from the population and calculate for each of them the estimate of the parameter Θ: Θ1*, Θ2*, …, Θk*. Then the estimate Θ* can be considered as a random variable taking the possible values Θ1*, Θ2*, …, Θk*. If the mathematical expectation of Θ* is not equal to the estimated parameter, then when calculating the estimates we will obtain systematic errors of the same sign (an overestimate if M(Θ*) > Θ and an underestimate if M(Θ*) < Θ). Therefore it is necessary to require that M(Θ*) = Θ.

Definition 17.1. A statistical estimate Θ* is called unbiased if its mathematical expectation equals the estimated parameter Θ for any sample size:

M(Θ*) = Θ. (17.1)

An estimate whose mathematical expectation is not equal to the estimated parameter is called biased.

However, unbiasedness is not a sufficient condition for a good approximation to the true value of the estimated parameter. If the possible values of Θ* can deviate significantly from its mean value, that is, if the variance of Θ* is large, then the value found from the data of a single sample may differ substantially from the estimated parameter. Therefore it is necessary to impose restrictions on the variance as well.
Definition 17.2. A statistical estimate is called efficient if, for a given sample size n, it has the smallest possible variance.
When considering large samples, statistical estimates are also subject to the requirement of consistency.
Definition 17.3. A statistical estimate is called consistent if, as n → ∞, it tends in probability to the estimated parameter (an unbiased estimate is consistent if its variance tends to 0 as n → ∞).
Let us make sure that the sample mean x̄ is an unbiased estimate of the mathematical expectation M(X).

We consider x̄ as a random variable, and the sample values x1, x2, …, xn of the random variable under study as independent, identically distributed random variables X1, X2, …, Xn with mathematical expectation a. From the properties of the mathematical expectation it follows that

M(x̄) = M((X1 + X2 + … + Xn)/n) = (M(X1) + M(X2) + … + M(Xn))/n = na/n = a.

But since each of the variables X1, X2, …, Xn has the same distribution as the population, a = M(X), that is, M(x̄) = M(X), which is what needed to be proved. The sample mean is not only an unbiased but also a consistent estimate of the mathematical expectation. Indeed, assuming that X1, X2, …, Xn have bounded variances, it follows from Chebyshev's theorem that their arithmetic mean, that is x̄, tends in probability as n increases to the mathematical expectation a of each of them, that is, to M(X). Consequently the sample mean is a consistent estimate of the mathematical expectation.

Unlike the sample mean, the sample variance is a biased estimate of the population variance. It can be proved that

M(Dв) = (n − 1)/n · DG, (17.2)

where DG is the true value of the population variance. Another estimate of the variance can therefore be proposed: the corrected variance s², calculated by the formula

s² = n/(n − 1) · Dв = (1/(n − 1)) Σi ni·(xi − x̄)². (17.3)

Such an estimate is unbiased. Associated with it is the corrected standard deviation

s = √( (1/(n − 1)) Σi ni·(xi − x̄)² ). (17.4)
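A sketch that makes the bias visible (Python; the normal population with variance 4 and the sample size n = 5 are arbitrary assumptions): averaged over many samples, Dв settles near (n − 1)/n · DG = 3.2, while the corrected variance s² settles near 4.

```python
import numpy as np

rng = np.random.default_rng(7)
true_var, n, trials = 4.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
d_v = samples.var(axis=1, ddof=0)      # sample variance D_в (divides by n)
s2 = samples.var(axis=1, ddof=1)       # corrected variance s² (divides by n - 1)

print(f"mean of D_в ≈ {d_v.mean():.3f}  (theory: {(n - 1) / n * true_var})")
print(f"mean of s²  ≈ {s2.mean():.3f}  (theory: {true_var})")
```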

Definition 17.4. An estimate of some characteristic is called asymptotically unbiased if, for a sample x1, x2, …, xn,

lim(n→∞) M(Θ*(x1, x2, …, xn)) = Θ, (17.5)

where Θ is the true value of the quantity being studied.
Methods for constructing estimates.
1. Maximum likelihood method.
Let X be a discrete random variable which, as a result of n trials, took the values x1, x2, …, xn. Let us assume that the distribution law of this variable is known and is determined by a parameter Θ, but the numerical value of this parameter is unknown. Let us find its point estimate.

Let p(xi, Θ) be the probability that, as a result of a trial, the variable X takes the value xi. The likelihood function of the discrete random variable X is the function of the argument Θ defined by the formula:

L (X 1 , X 2 , …, X n ; Θ) = p(x 1 ,Θ) p(x 2 ,Θ)… p(x n ,Θ).

As a point estimate of the parameter Θ we then take the value Θ* = Θ*(x1, x2, …, xn) at which the likelihood function reaches its maximum. The estimate Θ* is called the maximum likelihood estimate.

Since the functions L and ln L attain their maximum at the same value of Θ, it is more convenient to look for the maximum of ln L, the log-likelihood function. To do this one needs to: 1) find the derivative d ln L / dΘ; 2) set it equal to zero (the resulting equation is called the likelihood equation) and solve it for Θ; 3) check that the second derivative is negative at the solution, so that the point is indeed a maximum of the likelihood function.


Advantages of the maximum likelihood method: the estimates obtained are consistent (although they may be biased), are asymptotically normally distributed for large n, and have the smallest variance among other asymptotically normal estimates; if an efficient estimate Θ* exists for the estimated parameter Θ, the likelihood equation has the unique solution Θ*; the method makes the fullest use of the sample data and is therefore especially useful for small samples.

Disadvantage of the maximum likelihood method: computational complexity.
For a continuous random variable with a known type of distribution density f(x) and an unknown parameter Θ, the likelihood function has the form:

L (X 1 , X 2 , …, X n ; Θ) = f(x 1 ,Θ) f(x 2 ,Θ)… f(x n ,Θ).

The maximum likelihood estimation of an unknown parameter is carried out in the same way as for a discrete random variable.
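As a sketch, a maximum likelihood estimate of the parameter λ of an exponential density f(x, λ) = λe^{−λx} (an assumed example; here the likelihood equation can be solved analytically, giving λ* = 1/x̄, and the code simply checks that numerical maximization of ln L agrees):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)
lam_true = 2.5                              # assumed "unknown" parameter
x = rng.exponential(1 / lam_true, size=2000)

def neg_log_likelihood(lam):
    # -ln L(x; lam) for the exponential density f(x, lam) = lam * exp(-lam * x)
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100), method="bounded")
print(f"numerical MLE: {res.x:.4f},  analytic 1/mean: {1 / x.mean():.4f}")
```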
2. Method of moments.
The method of moments is based on the fact that the initial and central empirical moments are consistent estimates of the initial and central theoretical moments, respectively, so one can equate the theoretical moments to the corresponding empirical moments of the same order.

If the form of the distribution density f(x, Θ) is specified and it is determined by one unknown parameter Θ, then to estimate this parameter one equation suffices. For example, one can equate the first-order initial moments:

ν1(Θ) = M(X) = M1 = x̄,

thereby obtaining an equation for determining Θ. Its solution Θ* is a point estimate of the parameter, which is a function of the sample mean and hence of the sample variants:

Θ* = ψ(x1, x2, …, xn).

If the known form of the distribution density f(x, Θ1, Θ2) is determined by two unknown parameters Θ1 and Θ2, then two equations are required, for example

ν1 = M1,  μ2 = m2

(theoretical moments on the left, empirical ones on the right) — a system of two equations in the two unknowns Θ1 and Θ2. Its solutions Θ1* and Θ2* are point estimates, functions of the sample variants:

Θ1* = ψ1(x1, x2, …, xn),

Θ2* = ψ2(x1, x2, …, xn).
3. Least squares method.

If one needs to estimate the dependence between the quantities y and x, and the form of the function connecting them is known but the values of the coefficients entering it are not, these values can be estimated from the available sample by the least squares method. For this the function y = φ(x) is chosen so that the sum of the squared deviations of the observed values y1, y2, …, yn from φ(xi) is minimal:

F = Σi (yi − φ(xi))² → min.

For this it is necessary to find the stationary point of the function F(a, b, c), that is, to solve the system

∂F/∂a = 0,  ∂F/∂b = 0,  ∂F/∂c = 0

(the solution is, of course, possible only when the specific form of the function φ is known).

Let us consider as an example the fitting of the parameters of a linear function by the least squares method.

In order to estimate the parameters a and b of the function y = ax + b, we form

F(a, b) = Σi (yi − a·xi − b)².

Then ∂F/∂a = −2 Σi xi·(yi − a·xi − b) = 0 and ∂F/∂b = −2 Σi (yi − a·xi − b) = 0. From here

a Σi xi² + b Σi xi = Σi xi·yi,  a Σi xi + b·n = Σi yi.

Dividing both resulting equations by n and recalling the definitions of the empirical moments, we obtain expressions for a and b in the form

a = ( (1/n) Σi xi·yi − x̄·ȳ ) / ( (1/n) Σi xi² − x̄² ),  b = ȳ − a·x̄.

Therefore the relation between x and y can be written as y = a·x + b with the coefficients a and b found above (see the sketch below).
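A sketch of this fit in Python (the data here are simulated from an assumed line y = 2x + 1 with noise):

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)   # assumed underlying line + noise

# Closed-form least squares estimates for y = a*x + b
a = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean()**2)
b = y.mean() - a * x.mean()
print(f"a ≈ {a:.3f}, b ≈ {b:.3f}")
```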


4. Bayesian approach to obtaining estimates.
Let (Y, X) be a random vector for which the density p(y|x) of the conditional distribution of Y for each value X = x is known. If the experiment yields only values of Y while the corresponding values of X remain unknown, then to estimate some given function φ(X) it is proposed to take as its approximate value the conditional mathematical expectation M(φ(X)|Y), calculated by the formula

M(φ(X)|Y = y) = ∫ φ(x) p(y|x) r(x) dx / q(y), where q(y) = ∫ p(y|x) r(x) dx,

r(x) being the density of the (prior) distribution of X and q(y) the density of the unconditional distribution of Y. The problem can be solved only when r(x) is known. Sometimes, however, it is possible to construct a consistent estimate of q(y) that depends only on the values of Y obtained in the sample.

Lecture 18.

Interval estimation of unknown parameters. Estimation accuracy, confidence probability (reliability), confidence interval. Construction of confidence intervals for estimating the mathematical expectation of a normal distribution with known and unknown variance. Confidence intervals for estimating the standard deviation of a normal distribution.
For a sample of small size a point estimate may differ significantly from the estimated parameter, leading to gross errors. Therefore in this case it is better to use interval estimates, that is, to indicate an interval into which the true value of the estimated parameter falls with a given probability. Of course, the shorter the interval, the more accurate the estimate of the parameter. So if the inequality |Θ* − Θ| < δ holds, the number δ > 0 characterizes the accuracy of the estimate (the smaller δ, the more accurate the estimate). But statistical methods allow us to say only that this inequality holds with some probability.

Definition 18.1. The reliability (confidence probability) of the estimate Θ* of the parameter Θ is the probability γ with which the inequality |Θ* − Θ| < δ holds:

γ = p(|Θ* − Θ| < δ) = p(Θ* − δ < Θ < Θ* + δ).

Thus γ is the probability that Θ falls into the interval (Θ* − δ, Θ* + δ).

Definition 18.2. A confidence interval is an interval into which the unknown parameter falls with a given reliability γ.
Constructing confidence intervals.
1. Confidence interval for estimating the mathematical expectation of a normal distribution with a known variance.

Let the random variable X under study be distributed according to the normal law with a known standard deviation σ, and let it be required to estimate its mathematical expectation a from the value of the sample mean x̄. We consider the sample mean as a random variable, and the sample values x1, x2, …, xn as identically distributed independent random variables X1, X2, …, Xn, each of which has mathematical expectation a and standard deviation σ. Then M(x̄) = a and σ(x̄) = σ/√n (we use the properties of the mathematical expectation and the variance of a sum of independent random variables). Let us estimate the probability of the inequality |x̄ − a| < δ. Apply the formula for the probability that a normally distributed random variable falls into a given interval:

p(|x̄ − a| < δ) = 2Φ(δ/σ(x̄)) = 2Φ(δ√n/σ) = 2Φ(t), where t = δ√n/σ.

From here δ = tσ/√n, and the previous equality can be rewritten as follows:

p( x̄ − tσ/√n < a < x̄ + tσ/√n ) = 2Φ(t) = γ. (18.1)

So, the value of the mathematical expectation a falls, with probability (reliability) γ, into the interval (x̄ − tσ/√n, x̄ + tσ/√n), where the value of t is determined from the tables of the Laplace function so that 2Φ(t) = γ.
Example. Let us find the confidence interval for the mathematical expectation of a normally distributed random variable if the sample size is n = 49, x̄ = 2.8, σ = 1.4, and the confidence probability is γ = 0.9.

Let us determine t for which Φ(t) = 0.9 : 2 = 0.45: t = 1.645. Then δ = tσ/√n = 1.645·1.4/7 = 0.329, and

2.8 − 0.329 < a < 2.8 + 0.329, or 2.471 < a < 3.129;

that is, a lies in this interval with reliability 0.9.
2. Confidence interval for estimating the mathematical expectation of a normal distribution with unknown variance.

If it is known that the random variable X under study is distributed according to the normal law with an unknown standard deviation, then to find a confidence interval for its mathematical expectation we construct a new random variable

T = (x̄ − a)√n / s, (18.2)

where x̄ is the sample mean, s is the corrected standard deviation, and n is the sample size. This random variable, whose possible values will be denoted by t, has the Student distribution (see Lecture 12) with k = n − 1 degrees of freedom.

Since the Student distribution density S(t, n) does not depend explicitly on a and σ, one can specify the probability of T falling into the interval (−tγ, tγ), taking into account the evenness of the density, as follows:

p( |x̄ − a|√n / s < tγ ) = 2 ∫₀^{tγ} S(t, n) dt = γ. From here we get

p( x̄ − tγ·s/√n < a < x̄ + tγ·s/√n ) = γ. (18.3)

Thus a confidence interval for a has been obtained, where tγ can be found from the corresponding table for given n and γ.

Example. Let the sample size be n = 25, x̄ = 3, s = 1.5. Let us find the confidence interval for a with γ = 0.99. From the table we find tγ(n = 25, γ = 0.99) = 2.797. Then δ = tγ·s/√n = 2.797·1.5/5 = 0.839, and

3 − 0.839 < a < 3 + 0.839, or 2.161 < a < 3.839;

that is, a lies in this interval with probability 0.99.
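Both worked examples above are easy to reproduce numerically (a sketch using scipy.stats, which is assumed to be available; the table values 1.645 and 2.797 correspond to the quantiles computed below):

```python
import numpy as np
from scipy import stats

# Known sigma (example with n = 49, mean 2.8, sigma 1.4, gamma = 0.9)
n, mean, sigma, gamma = 49, 2.8, 1.4, 0.9
t = stats.norm.ppf((1 + gamma) / 2)                # 2*Phi(t) = gamma  ->  t ≈ 1.645
delta = t * sigma / np.sqrt(n)
print(f"known sigma:   {mean - delta:.3f} < a < {mean + delta:.3f}")

# Unknown sigma (example with n = 25, mean 3, s = 1.5, gamma = 0.99)
n, mean, s, gamma = 25, 3.0, 1.5, 0.99
t_gamma = stats.t.ppf((1 + gamma) / 2, df=n - 1)   # Student quantile ≈ 2.797
delta = t_gamma * s / np.sqrt(n)
print(f"unknown sigma: {mean - delta:.3f} < a < {mean + delta:.3f}")
```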
3. Confidence intervals for estimating the standard deviation of a normal distribution.

We shall look for a confidence interval of the form (s − δ, s + δ), where s is the corrected sample standard deviation and δ satisfies the condition p(|σ − s| < δ) = γ.

Let us write this inequality in the form s − δ < σ < s + δ or, denoting δ/s = q,

s(1 − q) < σ < s(1 + q). (18.4)

Let us consider the random variable χ, determined by the formula

χ = s√(n − 1) / σ,

which is distributed according to the chi-square law with n − 1 degrees of freedom (see Lecture 12). Its distribution density does not depend on the estimated parameter σ but only on the sample size n. Let us transform inequality (18.4) so that it takes the form χ1 < χ < χ2. Assume that q < 1. Then (18.4) is equivalent to

1/(s(1 + q)) < 1/σ < 1/(s(1 − q)),

or, after multiplying by s√(n − 1),

√(n − 1)/(1 + q) < s√(n − 1)/σ < √(n − 1)/(1 − q),

that is,

√(n − 1)/(1 + q) < χ < √(n − 1)/(1 − q).

The probability of this event is found from the chi-square density, and there are tables for the chi-square distribution from which one can find q for given n and γ without solving this equation. Thus, having calculated the value of s from the sample and determined the value of q from the table, one can find the confidence interval (18.4) into which the value σ falls with the given probability γ.
Comment. If q > 1, then, taking into account the condition σ > 0, the confidence interval for σ has the boundaries

0 < σ < s(1 + q). (18.5)

Example. Let n = 20, s = 1.3. Let us find the confidence interval for σ with a given reliability γ = 0.95. From the corresponding table we find q(n = 20, γ = 0.95) = 0.37. Therefore the limits of the confidence interval are 1.3·(1 − 0.37) = 0.819 and 1.3·(1 + 0.37) = 1.781. So 0.819 < σ < 1.781 with reliability 0.95.


