Convergence in probability: definition. Limit theorems of probability theory


Limit theorems of probability theory

Convergence of sequences of random variables and probability distributions

1.1.1.1 Convergence of random variables

Let there be a probability space $(\Omega, \mathcal{F}, P)$ with a sequence of random variables $X_1, X_2, \ldots$ and a random variable $X$ defined on it. In probability theory, the following types of convergence of sequences of random variables are considered.

A sequence of random variables $\{X_n\}$ converges in probability to a random variable $X$ if for any $\varepsilon > 0$

$$P\{|X_n - X| \ge \varepsilon\} \to 0, \qquad n \to \infty.$$

This type of convergence is denoted as follows: $X_n \xrightarrow{P} X$, or $P\text{-}\lim_{n\to\infty} X_n = X$.

A sequence of random variables $\{X_n\}$ converges to a random variable $X$ with probability 1 (or almost surely) if

$$P\{\omega : X_n(\omega) \to X(\omega),\ n \to \infty\} = 1,$$

that is, if $X_n(\omega) \to X(\omega)$ for all $\omega \in \Omega$ except, perhaps, for $\omega$ from some set $A$ of zero probability ($P(A) = 0$). We will denote convergence with probability 1 as follows: $X_n \xrightarrow{\text{a.s.}} X$, or $X_n \to X$ ($P$-a.s.). Convergence with probability 1 is convergence almost everywhere with respect to the probability measure $P$.

Note that the convergence $\{X_n \to X\}$ is an event from the $\sigma$-algebra $\mathcal{F}$ that can be represented in the form

$$\{X_n \to X\} = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{k \ge n} \Big\{ |X_k - X| < \tfrac{1}{m} \Big\}.$$

Let us formulate some theorems that establish criteria for almost sure convergence.

Theorem 1.1. $X_n \xrightarrow{\text{a.s.}} X$ if and only if for any $\varepsilon > 0$

$$P\Big\{ \sup_{k \ge n} |X_k - X| \ge \varepsilon \Big\} \to 0, \qquad n \to \infty, \tag{1.1}$$

or, what is the same,

$$P\Big\{ \bigcup_{k \ge n} \{ |X_k - X| \ge \varepsilon \} \Big\} \to 0, \qquad n \to \infty.$$

Theorem 1.2. If the series

$$\sum_{n=1}^{\infty} P\{ |X_n - X| \ge \varepsilon \}$$

converges for any $\varepsilon > 0$, then $X_n \xrightarrow{\text{a.s.}} X$.
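
A simple illustration of how Theorem 1.2 is applied (the example is ours, not part of the original text): suppose the tail probabilities decay quadratically, say

$$P\{|X_n - X| \ge \varepsilon\} \le \frac{C(\varepsilon)}{n^2} \quad \text{for all } n.$$

Then $\sum_{n=1}^{\infty} P\{|X_n - X| \ge \varepsilon\} \le C(\varepsilon) \sum_{n=1}^{\infty} n^{-2} < \infty$ for every $\varepsilon > 0$, and Theorem 1.2 gives $X_n \xrightarrow{\text{a.s.}} X$.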

It can be shown that convergence with probability 1 entails convergence in probability (this follows from (1.1)). The converse statement is, generally speaking, false, but the following theorem is true.

Theorem 1.3. If $X_n \xrightarrow{P} X$, then there is a subsequence $\{n_k\}$ such that $X_{n_k} \xrightarrow{\text{a.s.}} X$ as $k \to \infty$.

The connection between convergence of random variables and convergence of their mathematical expectations is established by the following theorems.

Theorem 1.4 (Levi, on monotone convergence). Let there be a monotone non-decreasing sequence of non-negative random variables $0 \le X_1 \le X_2 \le \ldots$ having finite mathematical expectations bounded by one and the same constant: $E X_n \le C < \infty$. Then the sequence converges with probability 1 to some random variable $X$, and

$$E X_n \to E X, \qquad n \to \infty.$$

Theorem 1.5 (Lebesgue, on dominated convergence). Let $X_n \xrightarrow{\text{a.s.}} X$ and $|X_n| \le Y$ for all $n$, where $Y$ is a non-negative random variable having a finite mathematical expectation. Then the random variable $X$ also has a finite mathematical expectation and

$$E X_n \to E X, \qquad n \to \infty.$$

A sequence of random variables $\{X_n\}$ converges to a random variable $X$ in mean of order $p > 0$ if

$$E|X_n - X|^p \to 0, \qquad n \to \infty.$$

We will denote such convergence by $X_n \xrightarrow{L^p} X$. When $p = 2$ one speaks of convergence in the mean square and denotes it $X_n \xrightarrow{\text{m.s.}} X$. By virtue of the generalized Chebyshev inequality, convergence in probability follows from convergence in mean of order $p$. From convergence in probability, and even from convergence almost surely, convergence in mean of order $p$ does not follow. Thus, convergence in probability is the weakest of the three types of convergence we have considered.
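
A standard counterexample (ours, not from the original text) showing that convergence in probability, and even almost sure convergence, does not imply convergence in mean of order $p$: on $\Omega = [0, 1]$ with Lebesgue measure put $X_n(\omega) = n\,\mathbf{1}_{[0, 1/n]}(\omega)$ and $X = 0$. Then $P\{|X_n| \ge \varepsilon\} = 1/n \to 0$ and, moreover, $X_n(\omega) \to 0$ for every $\omega > 0$, so $X_n \to 0$ both in probability and almost surely; yet $E|X_n|^p = n^p \cdot n^{-1} = n^{p-1}$, which does not tend to zero for any $p \ge 1$.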

A sequence $\{X_n\}$ is said to be fundamental in probability (almost surely, in mean of order $p$) if for any $\varepsilon > 0$

$$P\{ |X_n - X_m| \ge \varepsilon \} \to 0, \qquad n, m \to \infty$$

(and analogously in the other two senses).

Theorem 1.6 (Cauchy convergence criterion). In order for the sequence $\{X_n\}$ to converge in any of these senses (in probability, almost surely, in mean of order $p$), it is necessary and sufficient that it be fundamental in the corresponding sense.

1.1.1.2 Weak convergence of distributions

It is said that the probability distributions of random variables $X_n$ weakly converge to the distribution of a random variable $X$ if, for any continuous bounded function $f$,

$$E f(X_n) \to E f(X), \qquad n \to \infty. \tag{1.2}$$

We will denote weak convergence as follows: $X_n \Rightarrow X$. Note that convergence in probability implies weak convergence. The reverse is not true; however, when the limit is a constant ($X = c$), weak convergence entails convergence in probability.

Condition (1.2) can be rewritten using the Lebesgue integral with respect to the distribution measures as follows:

$$\int_{\mathbb{R}} f(x)\, P_{X_n}(dx) \to \int_{\mathbb{R}} f(x)\, P_{X}(dx), \qquad n \to \infty.$$

For random variables having probability densities $p_n(x)$ and $p(x)$, weak convergence means that

$$\int_{-\infty}^{\infty} f(x)\, p_n(x)\, dx \to \int_{-\infty}^{\infty} f(x)\, p(x)\, dx$$

for any continuous bounded function $f$.

If we are talking about the distribution functions $F_n(x)$ and $F(x)$ corresponding to $X_n$ and $X$, then weak convergence means that $F_n(x) \to F(x)$ at every point $x$ at which $F$ is continuous.
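
As a numerical illustration (a sketch of ours, not part of the original text), the following Python snippet estimates $E f(X_n)$ for the bounded continuous test function $f = \tanh$, where $X_n$ is a standardized sum of $n$ exponential variables; by the central limit theorem $X_n \Rightarrow N(0, 1)$, so the gap to $E f(Z)$ should shrink as $n$ grows. The choice of $f$ and of the exponential example is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_gap(n, trials=200_000):
    """Estimate |E f(X_n) - E f(Z)| for the bounded continuous f = tanh.
    X_n is a standardized sum of n exponential(1) variables (i.e. a
    Gamma(n, 1) variable), Z is standard normal; by the CLT, X_n => Z."""
    s = rng.gamma(shape=n, scale=1.0, size=trials)   # sum of n exponential(1) variables
    x_n = (s - n) / np.sqrt(n)                       # standardized sum
    z = rng.standard_normal(trials)
    return abs(np.tanh(x_n).mean() - np.tanh(z).mean())

for n in (1, 5, 25, 125):
    print(n, weak_gap(n))   # the gap shrinks as n grows (up to Monte Carlo noise)
```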


In what follows we shall have to operate extensively with derivatives and integrals of random processes. Both operations - differentiation and integration - presuppose, as is well known, the convergence of a certain sequence of quantities to a limit. But for random variables, which are specified not deterministically but by their probability distributions, the concept of convergence to a limit (and with it the concepts of continuity, differentiability and integrability of random functions) cannot have the same meaning that it has in ordinary analysis. For a sequence of random variables only a probabilistic definition of convergence to a limit is possible, which, incidentally, opens up more diverse possibilities in the choice of the definition itself. Probabilistic convergence is also essential for considering the so-called ergodic properties of random functions, which we will address in the next section.

Let us start, for simplicity, by considering various types of convergence of a sequence of random variables $\xi_N$ to a (non-random) number $a$.

One of the types of probabilistic convergence is convergence in the mean square, which means that the mean square of the deviation from the number $a$ goes to zero as $N \to \infty$:

$$\lim_{N \to \infty} \langle (\xi_N - a)^2 \rangle = 0, \tag{19.1}$$

which is written in the form

$$\underset{N \to \infty}{\text{l.i.m.}}\ \xi_N = a.$$

The designation l.i.m. is made up of the initial letters of the English name of this limit (limit in the mean square). The use of this type of convergence is most appropriate in cases where one has to deal with quadratic combinations of random variables (in particular, those having an energy meaning).

Equality (19.1) obviously assumes the finiteness of the mean square $\langle \xi_N^2 \rangle$ and of the mean value $\langle \xi_N \rangle$, since $\langle \xi_N \rangle^2 \le \langle \xi_N^2 \rangle$. Subtracting and adding $\langle \xi_N \rangle$ inside the brackets in (19.1), we rewrite this equality differently:

$$\lim_{N \to \infty} \left[ (\langle \xi_N \rangle - a)^2 + D\xi_N \right] = 0.$$

But the limit of the sum of two non-negative quantities can be equal to zero only if the limits of both terms are equal to zero, i.e.

$$\lim_{N \to \infty} \langle \xi_N \rangle = a, \qquad \lim_{N \to \infty} D\xi_N = 0.$$

Thus, $a$ is the limit of the sequence of mean values, and the limit of the variances is zero.
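
A quick numerical check of this decomposition (our sketch, with arbitrary illustrative numbers): the mean square deviation from $a$ splits into the squared bias plus the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 2.0
# xi has mean a + 0.3 (bias 0.3) and standard deviation 0.5
xi = a + 0.3 + 0.5 * rng.standard_normal(1_000_000)

mse = np.mean((xi - a) ** 2)                    # <(xi - a)^2>
decomposed = (xi.mean() - a) ** 2 + xi.var()    # (<xi> - a)^2 + D(xi)
print(mse, decomposed)                          # both close to 0.09 + 0.25 = 0.34
```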

Another type of probabilistic convergence to $a$ - convergence in probability - is defined as follows:

$$\lim_{N \to \infty} P\{ |\xi_N - a| \ge \varepsilon \} = 0, \tag{19.2}$$

where $\varepsilon$, as usual, is any arbitrarily small positive number. In this case one writes

$$\xi_N \xrightarrow{P} a \quad \text{or} \quad P\text{-}\lim_{N \to \infty} \xi_N = a.$$

Equality (19.2) means that the probability of $\xi_N$ falling anywhere outside an arbitrarily narrow interval $(a - \varepsilon, a + \varepsilon)$ becomes zero in the limit. Due to the arbitrary smallness of $\varepsilon$, this in turn means that the probability density of the random variable $\xi_N$ goes over into $\delta(x - a)$ as $N \to \infty$. However, it does not at all follow from this that $a$ is the limit of the sequence of means $\langle \xi_N \rangle$ and that $D\xi_N$ tends to zero. Moreover, $\langle \xi_N \rangle$ and $D\xi_N$ can grow without bound with increasing $N$ or even be infinite for any $N$. Let, for example, $\xi_N$ be non-negative and distributed according to Cauchy's law:

$$w_N(x) = \frac{2}{\pi}\,\frac{N}{1 + N^2 x^2}, \qquad x \ge 0.$$

For any $\varepsilon > 0$ the probability $P\{\xi_N \ge \varepsilon\} = 1 - \frac{2}{\pi}\arctan(N\varepsilon)$ tends to zero as $N \to \infty$, so that $\xi_N \xrightarrow{P} 0$, whereas the limit in the mean square does not exist. At the same time, the normalization condition is always satisfied:

$$\int_0^{\infty} w_N(x)\, dx = \frac{2}{\pi}\arctan(Nx)\Big|_0^{\infty} = 1,$$

so $w_N(x)$ tends to $\delta(x)$ as $N \to \infty$. However, it is not difficult to verify that for any $N$ both $\langle \xi_N \rangle$ and $D\xi_N$ are infinite.
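
The following simulation (ours, not from the original text) illustrates this example: samples with density $w_N$ can be generated as $|C|/N$ with $C$ standard Cauchy. The exceedance probability $P\{\xi_N \ge \varepsilon\}$ dies out as $N$ grows, while the sample mean never settles down because the true mean is infinite.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 0.05

for N in (10, 100, 1_000, 10_000):
    # |C| / N has exactly the density w_N(x) = (2/pi) * N / (1 + N^2 x^2), x >= 0
    xi = np.abs(rng.standard_cauchy(200_000)) / N
    print(N, (xi >= eps).mean(), xi.mean())
    # first column -> 0: convergence in probability to 0;
    # second column stays erratic: the true mean is infinite for every N
```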

Convergence in probability is often called convergence in the sense of the law of large numbers. Random variables $\xi_N$ are said to be asymptotically constant if there is a sequence of constants $a_N$ such that

$$\lim_{N \to \infty} P\{ |\xi_N - a_N| \ge \varepsilon \} = 0.$$

If all $a_N$ are the same (equal to $a$), then this equality goes over into (19.2), i.e., it means that $\xi_N$ converges in probability to $a$, or that the difference $\xi_N - a$ converges in probability to zero.

Convergence in probability should be clearly distinguished from ordinary convergence

$$\lim_{N \to \infty} x_N = a.$$

Indeed, nothing can be mathematically proven regarding the behavior of empirical numbers, i.e. measured values. Mathematically provable statements relate only to theoretical concepts, including the concept of probability as defined by the original axioms. In convergence in probability we are not asserting that $\xi_N \to a$ as $N \to \infty$, but that the probability of the event $\{|\xi_N - a| < \varepsilon\}$ tends to unity. The connection of this statement with experience lies in the "measurement axiom", according to which probability is measured by the relative frequency of occurrence of the random event in question in a sufficiently long series of trials, in a sufficiently large ensemble of systems, etc.

To better understand this fundamental aspect of the issue, let us dwell on some limit theorems of probability theory united under the common name of the law of large numbers, namely on theorems related to the case when $\xi_N$ in (19.2) is the arithmetic mean of $N$ random variables:

$$\xi_N = \frac{1}{N} \sum_{n=1}^{N} x_n. \tag{19.3}$$

We perform a series of $N$ trials, take their results $x_1, \ldots, x_N$ and calculate the average (19.3). We then check whether the event (let us call it $B_N$)

$$|\xi_N - a| < \varepsilon$$

has occurred. In order to measure the probability of the event $B_N$, we must carry out a very large number $M$ of series of $N$ trials, i.e. we must have a collective of such series. The law of large numbers (19.2) states that the longer the series forming the collective (the larger $N$), the closer $P(B_N)$ is to unity, i.e., according to the "measurement axiom", the larger the fraction of series in which $B_N$ occurs (in the limit, almost all of them):

$$P(B_N) \approx \frac{M_{B_N}}{M} \to 1, \qquad N \to \infty,$$

where $M_{B_N}$ is the number of series in which the event $B_N$ occurred.
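
A small numerical illustration of this two-level frequency scheme (a sketch of ours, using coin flips with $p = 1/2$ as the underlying trials): for each $N$ we run $M$ series, compute the relative frequency in each, and record the fraction of series in which the event $B_N = \{|\xi_N - p| < \varepsilon\}$ occurred.

```python
import numpy as np

rng = np.random.default_rng(3)
p, eps, M = 0.5, 0.05, 2_000          # success probability, tolerance, number of series

for N in (10, 100, 1_000, 10_000):
    freqs = rng.binomial(N, p, size=M) / N          # relative frequency in each of M series
    share_of_B = np.mean(np.abs(freqs - p) < eps)   # fraction of series where B_N occurred
    print(N, share_of_B)                            # approaches 1 as N grows
```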

Thus, this is a completely meaningful statement, but it becomes so only when the mathematical concept of probability is clearly put into correspondence with the empirical concept of relative frequency. Without this, the law of large numbers remains merely a theorem, logically following from a certain system of axioms for the quantity $P$, which is defined as a countably additive, non-negative set function normalized to unity.

Often this question, which we have already touched upon in § 1, is presented in the educational literature rather confusingly, without a clear indication that the "measurement axiom", which connects the concepts of probability theory with real phenomena, with experiment and practice, is not contained in the mathematical theory as such. One can come across statements that the foundation for the success of applying probability theory to various problems of natural science and technology is laid precisely in the law of large numbers. If this were the case, it would mean that the foundation of practical success is a logical consequence of certain abstract axioms, and that these mathematical axioms themselves prescribe how empirical quantities should behave.

In principle, it would be possible to start from other axioms and construct another probability theory, the conclusions of which, while different from those of the existing theory, would be just as logically flawless and just as non-binding for real phenomena. The situation here is the same as with the various possible geometries. But as soon as a mathematical theory is supplemented with certain methods of measuring the quantities with which it operates, and thereby becomes a physical theory, the situation changes. The correctness or incorrectness of the theory then ceases to be a question only of its logical consistency and becomes a question of its correspondence to real things and phenomena. The question of the truth of the axioms themselves acquires content, since now it can be subjected to experimental and, more generally, practical verification.

However, even before such verification, an internal correspondence is necessary between both parts of the physical theory: the established methods for measuring quantities should not be in conflict with the equations to which the mathematical part of the theory subjects these quantities. For example, Newton's equations of motion assume that force is a vector, and are therefore incompatible with a way of measuring force that would characterize it only by its absolute value. Perhaps in reality the force is not a vector but, say, a tensor; but that is a different question, concerning how well the given physical theory as a whole reflects objective reality. We are speaking now only of the fact that the presence of a contradiction between the mathematical and the measurement parts of a physical theory makes it untenable even before any experimental verification of its consequences.

From this point of view, the law of large numbers differs from other, logically equally legitimate, theorems of probability theory only in that, as will be seen from what follows, it especially clearly and explicitly shows the compatibility of the mathematical definition of probability with the frequency method of measuring it. It shows that the frequency "measurement axiom" does not contradict the mathematical theory, but the latter, of course, does not and cannot replace this "axiom".

The proof of various theorems of the law-of-large-numbers type usually uses Chebyshev's inequality, proven by him in his dissertation of 1846. Let a random variable $\xi$ have finite variance $D\xi$. Chebyshev's inequality states that

$$P\{ |\xi - \langle \xi \rangle| \ge \varepsilon \} \le \frac{D\xi}{\varepsilon^2}. \tag{19.4}$$

If, in particular, $\langle \xi \rangle = 0$, then inequality (19.4) takes the form

$$P\{ |\xi| \ge \varepsilon \} \le \frac{\langle \xi^2 \rangle}{\varepsilon^2}. \tag{19.5}$$

Although inequalities (19.4) and (19.5) give only a very rough estimate of P (a more accurate estimate can be obtained if the distribution law is known), they are very useful and important for theoretical constructions.
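
To see just how rough the bound is, here is a small comparison (ours, not from the original text) for a standard normal variable, for which $\langle \xi \rangle = 0$ and $D\xi = 1$:

```python
import numpy as np

rng = np.random.default_rng(4)
xi = rng.standard_normal(1_000_000)    # <xi> = 0, D(xi) = 1

for eps in (1.0, 2.0, 3.0):
    actual = np.mean(np.abs(xi) >= eps)
    bound = 1.0 / eps**2               # Chebyshev bound D(xi) / eps^2
    print(eps, actual, bound)          # e.g. for eps = 3: about 0.0027 versus 0.111
```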

When Chebyshev's inequality is applied to the arithmetic mean (19.3) of $N$ random variables, inequality (19.5) allows us to prove Chebyshev's theorem, which is a quite general expression of the law of large numbers. Namely, if $x_1, x_2, \ldots$ is a sequence of pairwise independent random variables having uniformly bounded variances ($D x_n \le C$), then

$$\lim_{N \to \infty} P\left\{ \left| \frac{1}{N}\sum_{n=1}^{N} x_n - \frac{1}{N}\sum_{n=1}^{N} \langle x_n \rangle \right| < \varepsilon \right\} = 1. \tag{19.6}$$

Indeed,

$$\langle \xi_N \rangle = \frac{1}{N}\sum_{n=1}^{N} \langle x_n \rangle, \qquad D\xi_N = \frac{1}{N^2}\sum_{n=1}^{N} D x_n \le \frac{C}{N}.$$

According to Chebyshev's inequality (applied to $\xi_N - \langle \xi_N \rangle$),

$$P\{ |\xi_N - \langle \xi_N \rangle| \ge \varepsilon \} \le \frac{D\xi_N}{\varepsilon^2} \le \frac{C}{N\varepsilon^2},$$

whence theorem (19.6) follows for the probability of the opposite event, i.e., the convergence in probability $\xi_N - \langle \xi_N \rangle \xrightarrow{P} 0$.

A special case of Chebyshev's theorem is Poisson's theorem. Let $x_n$ be random variables that record the outcome of the $n$-th trial: $x_n = 1$ or $0$ according to the occurrence or non-occurrence of event $A$ in that trial, in which the probability of $A$ is $p_n$. Then

$$\langle x_n \rangle = p_n, \qquad D x_n = p_n(1 - p_n) \le \tfrac{1}{4},$$

and Chebyshev's theorem gives

$$\lim_{N \to \infty} P\left\{ \left| \frac{1}{N}\sum_{n=1}^{N} x_n - \frac{1}{N}\sum_{n=1}^{N} p_n \right| < \varepsilon \right\} = 1. \tag{19.7}$$

This is Poisson's theorem. An even more special case is when $p_n = p$ for all $n$. Then we come to Bernoulli's theorem, one of the first formulations of the law of large numbers:

$$\lim_{N \to \infty} P\left\{ \left| \frac{\mu_N}{N} - p \right| < \varepsilon \right\} = 1, \tag{19.8}$$

where $\mu_N = \sum_{n=1}^{N} x_n$ is the number of occurrences of $A$ in $N$ trials, so that $\mu_N / N$ is its relative frequency.

Let us dwell on this simplest form of the law. Theorem (19.8) shows that with an increasing number of trials $N$ the relative frequency of event $A$, i.e., the empirical quantity $\mu_N / N$, converges in probability to $p$, the probability of event $A$. If this were not so, it would be pointless to measure probability by relative frequency. But since it is so, the frequency method of measuring probabilities, both $p$ (based on the relative frequency of occurrence of event $A$ in a series of $N$ trials) and $P(B_N)$ (based on the relative frequency of occurrence of the event $B_N$ in a group of $M$ series of trials), can be accepted as a complement to the mathematical theory, since it does not contradict it. After this, it is already possible to ask and to test experimentally whether the resulting physical theory reflects real statistical laws.

It is curious that in order for Theorem (19.8) to hold for any value of $p$, i.e., for the convergence in probability

$$\frac{\mu_N}{N} \xrightarrow{P} p,$$

it is enough to require that this convergence take place only for small $p$ (the relative frequency of low-probability events must be small).

Let us now write down Chebyshev's theorem for the case when all $\langle x_n \rangle$ are equal to one and the same value $a$. Then

$$\frac{1}{N}\sum_{n=1}^{N} \langle x_n \rangle = a,$$

and the theorem takes the form

$$\lim_{N \to \infty} P\left\{ \left| \frac{1}{N}\sum_{n=1}^{N} x_n - a \right| < \varepsilon \right\} = 1,$$

which is the basis of the arithmetic-mean rule in measurements. Individual $x_n$ may deviate greatly from $a$, but with probability close to unity the mean $\xi_N$ is close to $a$ for large $N$. This occurs because, when the average value is calculated, the random deviations of the individual terms compensate one another, and in the vast majority of cases the total deviation turns out to be very small.

The deviations of $x_n$ from $a$ may be random measurement errors. But if the reading accuracy of the instrument itself is no better than some $\delta$, i.e., there is a systematic error associated with the value of a scale division, then the accuracy of the result is no better than $\delta$ for any $N$, so it is pointless, appealing to the law of large numbers, to try in this case to obtain the value of $a$ with an error less than $\delta$. Nevertheless, there is a fairly widespread misconception that the arithmetic mean allows one to exceed this lower limit on measurement accuracy and to obtain, say, with a panel ammeter, a current reading accurate to microamperes.
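
A short simulation (ours, with made-up numbers) makes the point: averaging suppresses the random error roughly as $1/\sqrt{N}$, but the error of the mean stalls at the systematic offset no matter how large $N$ is.

```python
import numpy as np

rng = np.random.default_rng(5)
a, sigma, systematic = 1.000, 0.050, 0.020   # true value, random error, fixed offset

for N in (10, 100, 10_000, 1_000_000):
    readings = a + systematic + sigma * rng.standard_normal(N)
    print(N, abs(readings.mean() - a))       # approaches 0.020, not zero
```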

Another situation is also possible: the measured quantity itself may be random (a noise current, etc.). Then we can be sure that $\xi_N \xrightarrow{P} \langle x \rangle$ as $N \to \infty$, i.e., the arithmetic mean tends (in probability) to the mathematical expectation of the random variable.

The condition of mutual independence of the results of measuring a random variable requires, generally speaking, that the measurements be taken at sufficiently large intervals of time. However, for the law of large numbers to be valid this independence condition itself is not necessary, since Chebyshev's inequality requires only that $D\xi_N \to 0$ as $N \to \infty$. We will not dwell on more general theorems and on necessary and sufficient conditions under which the law of large numbers holds for the arithmetic mean, since these conditions relate to the quantity $\xi_N$ itself and are therefore of less practical interest than narrower conditions related to the individual terms $x_n$.

In 1909 E. Borel (and later, in more general form, F. P. Cantelli and then A. N. Kolmogorov) proved a statement stronger than the law of large numbers. By Bernoulli's theorem

$$\frac{\mu_N}{N} \xrightarrow{P} p.$$

According to Borel (the strengthened, or strong, law of large numbers)

$$P\left\{ \lim_{N \to \infty} \frac{\mu_N}{N} = p \right\} = 1, \tag{19.9}$$

that is, with certainty or, as they say, "almost surely," the relative frequency has the probability as its limit. This is an even firmer basis for measuring probability by relative frequency.

Based on (19.9), we can introduce another type of probabilistic convergence - convergence in the sense of the strong law of large numbers, which is also called convergence with probability 1, or almost sure convergence:

$$P\left\{ \lim_{N \to \infty} \xi_N = a \right\} = 1. \tag{19.10}$$

Briefly this can be written as

$$\xi_N \to a \quad \text{almost surely (a.s.).}$$

Sometimes, in connection with definition (19.10), confusion arises because it involves the usual limit of a sequence of random variables. It may seem that we are retreating here from the statement made above that convergence of random variables can have only a probabilistic meaning. But that is exactly what is meant in this case as well. Among the various realizations of the sequence $\xi_N$ there are also realizations that converge to $a$ in the usual sense. It can be shown that the set of such realizations has a definite probability. Almost sure convergence means that this probability, that is, the probability of the random event $\{\lim_{N \to \infty} \xi_N = a\}$, is equal to one. In other words, the realizations converging to $a$ in the usual sense "almost exhaust" the set of all possible realizations of the sequence. Thus, in (19.10) we do not depart from the probabilistic definition of convergence, although now what is meant is not the limit of a probability (as in convergence in probability) but the probability of a limit.

Let us present two of the conditions for almost sure convergence to $a$. One of them is necessary and sufficient:

$$\lim_{N \to \infty} P\left\{ \sup_{n \ge N} |\xi_n - a| \ge \varepsilon \right\} = 0 \quad \text{for any } \varepsilon > 0.$$

However, in practice this condition can hardly ever be verified. Another, stronger, sufficient condition is that for any $\varepsilon > 0$ the series

$$\sum_{N=1}^{\infty} P\{ |\xi_N - a| \ge \varepsilon \}$$

must converge.
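
As an illustration of how the series condition works (a worked sketch of ours, under the additional assumption that the terms $x_n$ in (19.3) are independent, identically distributed, with mean $a$, variance $\sigma^2$ and finite fourth central moment $\mu_4$): a direct computation gives

$$\langle (\xi_N - a)^4 \rangle = \frac{N\mu_4 + 3N(N-1)\sigma^4}{N^4} \le \frac{\mu_4 + 3\sigma^4}{N^2},$$

so that, by the Markov-type inequality for fourth moments,

$$P\{ |\xi_N - a| \ge \varepsilon \} \le \frac{\langle (\xi_N - a)^4 \rangle}{\varepsilon^4} \le \frac{\mu_4 + 3\sigma^4}{N^2 \varepsilon^4}.$$

The series over $N$ converges for every $\varepsilon > 0$, hence $\xi_N \to a$ almost surely; this is the classical fourth-moment proof of the strong law of large numbers.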

Other sufficient conditions, and in general a detailed mathematical discussion of questions relating to probabilistic convergence, can be found in the books (Chapter 3) and (Chapter 1).

Convergence in the mean square entails (by virtue of Chebyshev's inequality) convergence in probability, and if all the $\xi_N$ are almost surely uniformly bounded in absolute value, then, conversely, convergence in probability implies convergence in the mean square. Almost sure convergence also entails convergence in probability, but not convergence in the mean square; at the same time, convergence in the mean square does not entail almost sure convergence.

Adaptation algorithms involve a realization of the gradient or of its estimates, which depend on a random process. Consequently, the vectors $\mathbf{c}[n]$ generated by the algorithm are also random, and the usual concept of convergence, well known to us from courses in mathematical analysis and used in § 2.15, is not directly applicable to them. Therefore it is necessary to introduce new concepts of convergence, understood not in the usual but in the probabilistic sense.

There are three main types of such convergence: convergence in probability, convergence in the mean square, and almost sure convergence.

A random vector $\mathbf{c}[n]$ converges in probability to $\mathbf{c}^*$ as $n \to \infty$ if, for any $\varepsilon > 0$, the probability that the norm $\|\mathbf{c}[n] - \mathbf{c}^*\|$ exceeds $\varepsilon$ tends to zero, or, briefly, if

$$\lim_{n \to \infty} P\{ \|\mathbf{c}[n] - \mathbf{c}^*\| > \varepsilon \} = 0. \tag{3.29}$$

Convergence in probability does not, of course, require that every sequence of random vectors converge to $\mathbf{c}^*$ in the usual sense. Moreover, for any particular realization we cannot claim that ordinary convergence takes place.

A random vector $\mathbf{c}[n]$ converges to $\mathbf{c}^*$ in the mean square as $n \to \infty$ if the mathematical expectation of the squared norm tends to zero, i.e., if

$$\lim_{n \to \infty} M\{ \|\mathbf{c}[n] - \mathbf{c}^*\|^2 \} = 0. \tag{3.30}$$

Convergence in the mean square entails convergence in probability, but it likewise does not imply ordinary convergence for each particular sequence of random vectors. Convergence in the mean square is associated with the study of the second-order moment, which is calculated quite simply and, in addition, has a clear energy meaning. These circumstances explain the relatively widespread use of precisely this concept of convergence in physics. But the very fact that in both types of convergence the probability that a given sequence of random vectors converges to $\mathbf{c}^*$ in the usual sense may be zero sometimes causes dissatisfaction. After all, we always operate with one realization of the gradient and the random vectors corresponding to it, and it is desirable that the limit exist precisely for the sequence of random vectors that we are now observing, and not for the family of sequences of random vectors corresponding to the family of realizations, which we may never see.

This desire can be realized if we invoke the concept of almost certain convergence, or, what is the same, convergence with probability one.

Since $\mathbf{c}[n]$ is a random vector, the convergence of the sequence $\mathbf{c}[n]$ to $\mathbf{c}^*$ in the usual sense can be considered as a random event. A sequence of random vectors $\mathbf{c}[n]$ converges to $\mathbf{c}^*$ almost certainly, or with probability one, if the probability of ordinary convergence to $\mathbf{c}^*$ is equal to one, i.e., if

$$P\left\{ \lim_{n \to \infty} \mathbf{c}[n] = \mathbf{c}^* \right\} = 1. \tag{3.31}$$

It follows that, neglecting a set of realizations of the random vectors having total probability zero, we have ordinary convergence. Of course, the rate of convergence depends on the realization and is itself random.

The convergence of adaptation algorithms is equivalent to the stability of systems described by stochastic difference or differential equations. The stability of these systems must be understood in a probabilistic sense: in probability, in mean square and almost certainly (or with probability one). Probabilistic stability is a relatively new section of the theory of stability, which is now being intensively developed.
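
As a minimal sketch (ours; the variable names, the quadratic criterion and the step size are assumptions for illustration, not the algorithm of the original text), the following Python snippet runs a stochastic-gradient iteration with step $\gamma[n] = 1/n$ on noisy observations of a quadratic criterion and estimates $M\{\|\mathbf{c}[n] - \mathbf{c}^*\|^2\}$ over repeated realizations; the decrease of this quantity is an empirical counterpart of convergence in the mean square.

```python
import numpy as np

rng = np.random.default_rng(6)
c_star = np.array([1.0, -2.0])              # optimum vector (assumed for the example)

def one_realization(steps):
    """Stochastic gradient for J(c) = 0.5 * E||c - x||^2 with x = c_star + noise.
    The realization of the gradient at step n is (c - x); the step 1/n satisfies
    the usual conditions sum 1/n = inf, sum (1/n)^2 < inf."""
    c = np.zeros(2)
    for n in range(1, steps + 1):
        x = c_star + rng.standard_normal(2)   # noisy observation
        c = c - (1.0 / n) * (c - x)           # step along the gradient realization
    return c

for steps in (10, 100, 1_000, 10_000):
    ends = np.array([one_realization(steps) for _ in range(200)])
    mean_sq = np.mean(np.sum((ends - c_star) ** 2, axis=1))
    print(steps, mean_sq)    # decreases roughly like 1/steps: mean-square convergence
```

With a constant step instead of $1/n$ the iterate would keep fluctuating around $\mathbf{c}^*$, and mean-square convergence to the point itself would be lost; this is why decreasing steps are assumed in the sketch.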



