Covariance and correlation coefficient. Correlation moment, continuous random variables, linear dependence


The correlation moment of random variables X and Y is defined as C_xy = M[(X - M(X))·(Y - M(Y))], where

C_xy = Σ_i Σ_j (x_i - M(X))·(y_j - M(Y))·p_ij

for discrete random variables X and Y, and

C_xy = ∫∫ (x - M(X))·(y - M(Y))·f(x, y) dx dy

for continuous random variables.

The correlation moment serves to characterize the relationship between random variables. In particular, for independent random variables X and Y the correlation moment C_xy is equal to zero.

By definition, the correlation moment has a dimension equal to the product of the dimensions of the quantities X and Y. This means that the magnitude of the correlation moment depends on the units of measurement of the random variables. For example, if measuring X and Y in centimeters gives C_xy = 2 cm², then measuring the same X and Y in millimeters gives C_xy = 200 mm². This dependence of the correlation moment on the units of measurement makes it difficult to compare different systems of random variables. To eliminate this drawback, a dimensionless characteristic r_xy of the relationship between the quantities X and Y, called the correlation coefficient, is introduced:

r_xy = C_xy/(σ_x·σ_y).

If the random variables X and Y are independent, then r_xy = 0. If the random variables X and Y are related by the exact linear dependence Y = aX + b, then r_xy = 1 for a > 0 and r_xy = -1 for a < 0. In general, the double inequality -1 ≤ r_xy ≤ 1 holds.

Independence of two random variables X and Y is, in the general case, not equivalent to their uncorrelatedness (i.e. to the equality r_xy = 0). However, for the normally distributed components of a two-dimensional random variable, uncorrelatedness is equivalent to independence.
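
As a quick illustration of these definitions (a minimal sketch; the joint probabilities below are made up for the example), the following Python fragment computes C_xy and r_xy directly from a discrete joint distribution:

```python
import numpy as np

# Hypothetical joint distribution of a discrete pair (X, Y):
# rows correspond to the values of X, columns to the values of Y.
x_vals = np.array([-1.0, 1.0, 4.0])
y_vals = np.array([-2.0, 1.0, 3.0])
p = np.array([
    [0.05, 0.20, 0.15],
    [0.02, 0.08, 0.10],
    [0.03, 0.02, 0.35],
])                                   # joint probabilities p_ij, sum to 1
assert np.isclose(p.sum(), 1.0)

# Marginal laws and mathematical expectations
p_x, p_y = p.sum(axis=1), p.sum(axis=0)
m_x, m_y = x_vals @ p_x, y_vals @ p_y

# Correlation moment C_xy = sum_ij (x_i - M(X)) (y_j - M(Y)) p_ij
c_xy = ((x_vals - m_x)[:, None] * (y_vals - m_y)[None, :] * p).sum()

# Correlation coefficient r_xy = C_xy / (sigma_x * sigma_y)
sigma_x = np.sqrt(((x_vals - m_x) ** 2) @ p_x)
sigma_y = np.sqrt(((y_vals - m_y) ** 2) @ p_y)
r_xy = c_xy / (sigma_x * sigma_y)
print(c_xy, r_xy)                    # r_xy always lies between -1 and 1
```

Changing the units of x_vals and y_vals rescales c_xy but leaves r_xy unchanged, which is exactly the drawback and remedy described above.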

The distribution law of a system of two discrete random variables (X, Y) is given by the following table. Find:

1) the laws of distribution of the random variables X and Y;

2) the conditional law of distribution of the random variable X, provided that Y = 1;

3) the mathematical expectations M(X), M(Y) and the center of dispersion;

4) the dispersions D(X) and D(Y);

5) the correlation moment C_xy and the correlation coefficient r_xy.

1. Adding the probabilities along the rows, we obtain the probabilities of the possible values of the random variable X: p(-1) = 0.4, p(1) = 0.2, p(4) = 0.4. Consequently, the distribution law of X has the following form:

X    -1    1    4
p    0.4   0.2  0.4

Check: 0.4 + 0.2 + 0.4 = 1.

Adding the probabilities along the columns, we obtain the probabilities of the possible values of the random variable Y: p(-2) = 0.1, p(1) = 0.3, p(3) = 0.6. Let us write the distribution law of Y:

Y    -2    1    3
p    0.1   0.3  0.6

Check: 0.1 + 0.3 + 0.6 = 1.

2. Let us find the conditional probabilities for the random variable X, provided that Y = y_2 = 1: p(-1 | 1) = p_12 / p(Y = 1), and similarly for the other possible values of X.

Thus the conditional distribution (X | Y = 1) has the following table

3. Based on the definition, we calculate the mathematical expectations: M(X) = -1·0.4 + 1·0.2 + 4·0.4 = 1.4 and M(Y) = -2·0.1 + 1·0.3 + 3·0.6 = 1.9; the point (1.4; 1.9) is the center of dispersion.

4. The dispersions are found from the marginal laws by the computational formula: D(X) = M(X²) - (M(X))², D(Y) = M(Y²) - (M(Y))².

5. Let us create a table of the system of centered random variables X - M(X) = X - 1.4 and Y - M(Y) = Y - 1.9.

Let us calculate the correlation moment: C_xy = Σ_i Σ_j (x_i - 1.4)·(y_j - 1.9)·p_ij.
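
Since the joint table itself is not reproduced in this fragment, here is a small sketch that repeats steps 1, 2, 3 and 5 mechanically; the joint probabilities p_ij are hypothetical, chosen only so that their row and column sums match the marginal laws found above:

```python
import numpy as np

x_vals = np.array([-1.0, 1.0, 4.0])
y_vals = np.array([-2.0, 1.0, 3.0])
# Hypothetical joint probabilities p_ij (the original table is not shown here);
# their row and column sums reproduce the marginal laws obtained above.
p = np.array([
    [0.05, 0.15, 0.20],
    [0.03, 0.05, 0.12],
    [0.02, 0.10, 0.28],
])

# Step 1: marginal laws of X (row sums) and Y (column sums)
p_x, p_y = p.sum(axis=1), p.sum(axis=0)    # [0.4 0.2 0.4] and [0.1 0.3 0.6]

# Step 2: conditional law of X given Y = 1 (second column, renormalized)
p_x_given_y1 = p[:, 1] / p_y[1]

# Steps 3 and 5: expectations, center of dispersion, correlation moment
m_x, m_y = x_vals @ p_x, y_vals @ p_y      # 1.4 and 1.9
c_xy = ((x_vals - m_x)[:, None] * (y_vals - m_y)[None, :] * p).sum()
print(p_x_given_y1, (m_x, m_y), c_xy)
```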

A system of two continuous random variables (X, Y) has a uniform distribution in the region D = {(x, y) : -1 ≤ x ≤ 3, 0 ≤ y ≤ x + 1}. Find:

1) the distribution density;

2) the probability that the point (X, Y) falls into the region G;

3) the densities f1(x) and f2(y) of the distributions of the random variables X and Y, as well as the conditional densities φ(x|y) and ψ(y|x);

4) the distribution functions F1(x) and F2(y) of the random variables X and Y;

5) the mathematical expectations M(X), M(Y) and the center of dispersion;

6) the dispersions D(X) and D(Y);

7) the correlation moment C_xy and the correlation coefficient r_xy.

1. By the condition, the density function has the form f(x, y) = a if -1 ≤ x ≤ 3 and 0 ≤ y ≤ x + 1, and f(x, y) = 0 if (x, y) ∉ D.

To find the parameter a, we use the relation ∬_D f(x, y) dx dy = 1, where the integration domain D is shown in Fig. 7.

The region D is bounded on the left and right by the lines x = -1 and x = 3, and below and above by the lines y1(x) = 0 and y2(x) = x + 1. Passing to the repeated integral, we have:

∫_{-1}^{3} dx ∫_{0}^{x+1} a dy = ∫_{-1}^{3} a·(x + 1) dx = 8a.

Since 8a = 1, a = 1/8, and the density function has the form

f(x, y) = 1/8 if (x, y) ∈ D, and f(x, y) = 0 if (x, y) ∉ D.

2. Let us depict the region G, which is a circle of radius 2 with the center at the point (2, 0) (see Fig. 8). Since the function f(x, y) is equal to zero outside D, the required probability is obtained by integrating the density 1/8 over the part of the region G that lies inside D.

3. Let us find the densities f1(x) and f2(y). For -1 ≤ x ≤ 3 the section of D is 0 ≤ y ≤ x + 1, and therefore

f1(x) = ∫_{0}^{x+1} (1/8) dy = (x + 1)/8,

while f1(x) = 0 outside the interval [-1, 3]. For 0 ≤ y ≤ 4 we similarly obtain

f2(y) = ∫_{y-1}^{3} (1/8) dx = (4 - y)/8,

and f2(y) = 0 outside [0, 4].
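
As a cross-check of this solution (not part of the original text), one can verify the normalization constant a = 1/8 and the marginal densities numerically; the sketch below assumes scipy is available:

```python
import numpy as np
from scipy import integrate

def f(y, x, a=1.0 / 8.0):
    """Joint density: equals a inside D = {-1 <= x <= 3, 0 <= y <= x + 1}, else 0."""
    return a if (-1.0 <= x <= 3.0 and 0.0 <= y <= x + 1.0) else 0.0

# Normalization: the double integral over D must equal 1, which forces a = 1/8.
total, _ = integrate.dblquad(f, -1.0, 3.0, lambda x: 0.0, lambda x: x + 1.0)
print(total)                                    # ~1.0

# Marginal densities obtained by integrating out the other variable.
f1 = lambda x: integrate.quad(lambda y: f(y, x), 0.0, max(x + 1.0, 0.0))[0]
f2 = lambda y: integrate.quad(lambda x: f(y, x), -1.0, 3.0)[0]
print(f1(1.0), (1.0 + 1.0) / 8.0)               # both ~0.25, i.e. (x + 1)/8
print(f2(1.0), (4.0 - 1.0) / 8.0)               # both ~0.375, i.e. (4 - y)/8
```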

STATE COMMITTEE FOR SCIENCE AND TECHNOLOGY OF THE REPUBLIC OF AZERBAIJAN

BAKU RESEARCH AND TRAINING CENTER

GRADUATE STUDENT OF THE DEPARTMENT OF PEDIATRIC SURGERY

AMU named after N. NARIMANOV

MUKHTAROVA EMIL GASAN ogly

CORRELATION MOMENTS. COEFFICIENT OF CORRELATION

INTRODUCTION

Probability theory is a mathematical science that studies patterns in random phenomena.

What is meant by random phenomena?

In the scientific study of physical and technical problems, one often encounters phenomena of a special type, which are usually called random. A random phenomenon is a phenomenon that, when the same experiment is repeated many times, proceeds somewhat differently each time.

Let's give an example of a random phenomenon.

The same body is weighed several times on an analytical balance: the results of repeated weighings are somewhat different from each other. These differences are due to the influence of various minor factors accompanying the weighing operation, such as random vibrations of the equipment, errors in reading the instrument, etc.

It is obvious that there is not a single physical phenomenon in nature in which elements of randomness would not be present to one degree or another. No matter how accurately and in detail the experimental conditions are fixed, it is impossible to ensure that when the experiment is repeated, the results coincide completely and exactly.

Randomness inevitably accompanies any natural phenomenon. However, in a number of practical problems these random elements can be neglected by considering, instead of the real phenomenon, its simplified scheme, i.e. a model, and by assuming that under the given experimental conditions the phenomenon proceeds in a perfectly definite way. In doing so, the most important, fundamental, decisive factors are singled out from the countless factors influencing the phenomenon, while the influence of the remaining, minor factors is simply neglected. When studying patterns within the framework of a certain theory, the main factors influencing a particular phenomenon enter into the concepts or definitions with which the theory in question operates.

Like any science that develops a general theory of any range of phenomena, probability theory also contains a number of basic concepts on which it is based. Naturally, not all basic concepts can be strictly defined, since to define a concept means to reduce it to other, more well-known ones. This process must be finite and end with primary concepts that are only explained.

One of the first concepts in probability theory is the concept of an event.

An event is any fact that may or may not occur as a result of an experiment.

Let's give examples of events.

A - the birth of a boy or girl;

B - selection of one or another opening in a chess game;

C - belonging to one or another zodiac sign.

Considering the above events, we see that each of them has some degree of possibility: some greater, others less. In order to quantitatively compare events with each other according to the degree of their possibility, obviously, it is necessary to associate a certain number with each event, which is greater, the more possible the event is. This number is called the probability of an event. Thus, the probability of an event is a numerical characteristic of the degree of objective possibility of an event.

The unit of probability is the probability of a certain event, equal to 1, and the probability of any event is a number lying between 0 and 1.

Probability is usually denoted by the letter P.

Let's look at the example of the eternal problem of Shakespeare's Hamlet "to be or not to be?" How can you determine the probability of an event?

It is quite obvious that a person, an object or any other phenomenon can be in one of two and no more than two states: presence (“to be”) and absence (“not to be”). That is, there are two possible events, but only one of them can occur. If these two outcomes are regarded as equally possible, then the probability of, for example, existence is 1/2.

In addition to the concept of event and probability, one of the main concepts of probability theory is the concept of a random variable.

A random variable is a quantity that, as a result of an experiment, can take on one or another value, and it is not known in advance which one.

Random variables that take only values which are separate from each other and which can be listed in advance are called discontinuous, or discrete, random variables.

For example:

1. Number of surviving and deceased patients.

2. The total number of children from patients admitted to the hospital overnight.

Random variables whose possible values ​​continuously fill a certain interval are called continuous random variables.

For example, weighing error on an analytical balance.

Note that modern probability theory primarily operates with random variables, rather than events, which the “classical” theory of probability was mainly based on.

CORRELATION MOMENTS. COEFFICIENT OF CORRELATION.

Correlation moments, correlation coefficient - these are numerical characteristics that are closely related to the concept of a random variable introduced above, or more precisely with a system of random variables. Therefore, to introduce and define their meaning and role, it is necessary to explain the concept of a system of random variables and some properties inherent in them.

Two or more random variables that describe some phenomenon are called a system or a complex of random variables.

A system of several random variables X, Y, Z, …, W is usually denoted by (X, Y, Z, …, W).

For example, a point on a plane is described not by one coordinate, but by two, and in space - even by three.

The properties of a system of several random variables are not limited to the properties of individual random variables included in the system, but also include mutual connections (dependencies) between random variables. Therefore, when studying a system of random variables, one should pay attention to the nature and degree of dependence. This dependence may be more or less pronounced, more or less close. And in other cases, random variables turn out to be practically independent.

The random variable Y is called independent from a random variable X, if the distribution law of the random variable Y does not depend on what value the variable X took.

It should be noted that the dependence and independence of random variables is always a mutual phenomenon: if Y does not depend on X, then the value X does not depend on Y. Taking this into account, we can give the following definition of the independence of random variables.

Random variables X and Y are called independent if the distribution law of each of them does not depend on what value the other takes. Otherwise, the values ​​of X and Y are called dependent.

The law of distribution of a random variable is any relation that establishes a connection between the possible values of the random variable and the probabilities corresponding to them.

The concept of “dependence” of random variables, which is used in probability theory, is somewhat different from the usual concept of “dependence” of variables, which is used in mathematics. Thus, a mathematician by “dependence” means only one type of dependence - complete, rigid, so-called functional dependence. Two quantities X and Y are called functionally dependent if, knowing the value of one of them, you can accurately determine the value of the other.

In probability theory, there is a slightly different type of dependence - probabilistic dependence. If the value Y is related to the value X by a probabilistic dependence, then, knowing the value of X, it is impossible to accurately indicate the value of Y, but you can indicate its distribution law, depending on what value the value X has taken.

The probabilistic relationship may be more or less close; As the tightness of the probabilistic dependence increases, it becomes closer and closer to the functional one. Thus, functional dependence can be considered as an extreme, limiting case of the closest probabilistic dependence. Another extreme case is the complete independence of random variables. Between these two extreme cases lie all gradations of probabilistic dependence - from the strongest to the weakest.

Probabilistic dependence between random variables is often encountered in practice. If random variables X and Y are in a probabilistic relationship, this does not mean that with a change in the value of X, the value of Y changes in a completely definite way; this only means that with a change in the value of X, the value of Y tends to also change (increase or decrease as X increases). This trend is observed only in general terms, and in each individual case deviations from it are possible.

Examples of probabilistic dependence.

Let us select one patient with peritonitis at random. The random variable T is the time from the onset of the disease; the random variable O is the level of homeostatic disturbances. There is a clear relationship between these quantities, since the value of T is one of the most important factors determining the value of O.

At the same time, there is a weaker probabilistic relationship between the random variable T and the random variable M, which reflects mortality in the given pathology, since the random variable T, although it influences the random variable M, is not its main determining factor.

Moreover, if we consider the T value and the B value (the age of the surgeon), then these values ​​are practically independent.

So far we have discussed the properties of systems of random variables, giving only verbal explanation. However, there are numerical characteristics through which the properties of both individual random variables and a system of random variables are studied.

To characterize the correlation dependence between quantities, the correlation moment and the correlation coefficient are used.

Definition 2. The correlation moment μ_xy of random variables X and Y is the mathematical expectation of the product of the deviations of these variables:

μ_xy = M[(X - M(X))·(Y - M(Y))].

To calculate the correlation moment of discrete quantities, the expression

μ_xy = Σ_i Σ_j (x_i - M(X))·(y_j - M(Y))·p_ij   (3.12)

is used, and for continuous ones, the expression

μ_xy = ∫∫ (x - M(X))·(y - M(Y))·f(x, y) dx dy.   (3.13)

Remark. The correlation moment μ_xy can be rewritten in the form

μ_xy = M(XY) - M(X)·M(Y).   (3.14)

Indeed, using the properties of the mathematical expectation (see §§ 2.2; 2.6), we have

μ_xy = M[(X - M(X))·(Y - M(Y))] = M(XY) - M(X)·M(Y) - M(Y)·M(X) + M(X)·M(Y) = M(XY) - M(X)·M(Y).

Theorem. The correlation moment of two independent random variables X and Y is equal to zero.

Proof. According to the remark,

μ_xy = M(XY) - M(X)·M(Y),

and since X and Y are independent random variables, M(XY) = M(X)·M(Y) (see §§ 2.2; 2.6), and, therefore, μ_xy = 0.

From the definition of the correlation moment it follows that it has a dimension equal to the product of the dimensions of the quantities X and Y, i.e. its value depends on the units of measurement of the random variables. Therefore, for the same two quantities, the magnitude of the correlation moment can have different values depending on the units in which the quantities were measured. To eliminate this drawback, it was agreed to take as a measure of the relationship (dependence) of two random variables X and Y the dimensionless quantity

r_xy = μ_xy/(σ_x·σ_y),   (3.15)

where σ_x = σ(X), σ_y = σ(Y), called the correlation coefficient.

Example 1. Let a two-dimensional discrete random variable (X, Y) be specified by the distribution law given in the table.

Adding up the probabilities in the rows, we find the probabilities of the possible values of X, and therefore the distribution law of X, from which M(X) and σ_x are computed.

By adding up the probabilities in the columns, we find the probabilities of possible values of Y. Hence the distribution law of Y:

Y
p   1/3   1/2   1/6

and, therefore, M(Y) and σ_y are computed.

Hence, using formula (3.14), the correlation moment μ_xy is found.

Thus, the correlation coefficient is obtained from formula (3.15): r_xy = μ_xy/(σ_x·σ_y).

Theorem. The absolute value of the correlation moment of two random variables does not exceed the product of their standard deviations:

|μ_xy| ≤ σ_x·σ_y.   (3.16)

Proof. Introducing the random variable Z1 = σ_y·X - σ_x·Y, let us find its variance. We have

D(Z1) = σ_y²·D(X) - 2·σ_x·σ_y·μ_xy + σ_x²·D(Y) = 2·σ_x²·σ_y² - 2·σ_x·σ_y·μ_xy ≥ 0

(any variance is non-negative). From here μ_xy ≤ σ_x·σ_y.

By introducing the random variable Z2 = σ_y·X + σ_x·Y, we similarly find μ_xy ≥ -σ_x·σ_y.

As a result we have |μ_xy| ≤ σ_x·σ_y.
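
A quick numerical sanity check of this inequality (a sketch over randomly generated joint tables; the grid of values and probabilities is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    # Arbitrary random joint distribution on a 3x3 grid of values.
    x_vals = rng.normal(size=3)
    y_vals = rng.normal(size=3)
    p = rng.random((3, 3))
    p /= p.sum()

    p_x, p_y = p.sum(axis=1), p.sum(axis=0)
    m_x, m_y = x_vals @ p_x, y_vals @ p_y
    sigma_x = np.sqrt(((x_vals - m_x) ** 2) @ p_x)
    sigma_y = np.sqrt(((y_vals - m_y) ** 2) @ p_y)
    mu_xy = ((x_vals - m_x)[:, None] * (y_vals - m_y)[None, :] * p).sum()

    assert abs(mu_xy) <= sigma_x * sigma_y + 1e-12   # inequality (3.16)

print("checked |mu_xy| <= sigma_x * sigma_y on 1000 random cases")
```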

Definition 2. Random variables X and Y are called uncorrelated if μ_xy = 0, and correlated if μ_xy ≠ 0.

Example 1. Independent random variables X and Y are uncorrelated, since by the theorem above μ_xy = 0.

Example 2. Let the random variables X and Y be connected by the linear dependence Y = AX + B. Let us find the correlation coefficient. We have:

μ_xy = M[(X - M(X))·(Y - M(Y))] = M[(X - M(X))·A·(X - M(X))] = A·D(X) = A·σ_x²,

and σ_y = σ(AX + B) = |A|·σ_x, therefore

r_xy = μ_xy/(σ_x·σ_y) = A·σ_x²/(σ_x·|A|·σ_x) = A/|A|.

Thus, the correlation coefficient of random variables related by a linear dependence is equal to ±1 (more precisely, r_xy = 1 if A > 0 and r_xy = -1 if A < 0).
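
A short numerical illustration (synthetic data, illustrative constants A and B): for an exact linear dependence Y = AX + B the sample correlation coefficient comes out as +1 or -1 up to rounding:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)

for A, B in [(2.5, 1.0), (-0.7, 3.0)]:
    y = A * x + B                   # exact linear dependence Y = AX + B
    r = np.corrcoef(x, y)[0, 1]     # sample correlation coefficient
    print(A, round(r, 6))           # ~ +1 for A > 0, ~ -1 for A < 0
```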

Let us note some properties of the correlation coefficient.

From example 1 it follows:

1) If X and Y are independent random variables, then the correlation coefficient is zero.

Note that the converse statement is, generally speaking, false: uncorrelated random variables may be dependent (an example is given later in this text).

2) The absolute value of the correlation coefficient does not exceed unity:

|r_xy| ≤ 1.

Indeed, dividing both sides of inequality (3.16) by the product σ_x·σ_y, we arrive at the desired inequality.

3) As can be seen from formula (3.15), taking into account formula (3.14), the correlation coefficient characterizes the relative magnitude of the deviation of the mathematical expectation of the product M(XY) from the product of the mathematical expectations M(X)·M(Y) of the quantities X and Y. Since this deviation occurs only for dependent quantities, we can say that the correlation coefficient characterizes the closeness of the relationship between X and Y.

3. Linear correlation. This type of correlation is quite common.

Definition. The correlation dependence between random variables X and Y is called a linear correlation if both regression functions y(x) and x(y) are linear. In this case both regression lines are straight lines; they are called the lines of regression.

Let us derive the equation of the regression line of Y on X, i.e. let us find the coefficients of the linear function g(x) = Ax + B.

Let us denote M(X) = a, M(Y) = b, M[(X - a)²] = σ_x², M[(Y - b)²] = σ_y². Using the properties of the mathematical expectation (§§ 2.2; 2.6) we find:

M(Y) = M[g(X)] = M(AX + B) = A·M(X) + B,

i.e. b = A·a + B, whence B = b - A·a.

M(XY) = M[X·g(X)] = M(AX² + BX) = A·M(X²) + B·M(X) = A·M(X²) + (b - A·a)·a,

or, according to property 1 of the dispersion (§§ 2.3; 2.6), M(X²) = σ_x² + a², so that

M(XY) = A·(σ_x² + a²) + (b - A·a)·a = A·σ_x² + ab,

whence, by formula (3.14), μ_xy = M(XY) - ab = A·σ_x², and therefore A = μ_xy/σ_x².

The resulting coefficient is called the regression coefficient of Y on X and is denoted by ρ_{y/x} = μ_xy/σ_x².

Thus, the equation of the regression line of Y on X has the form

y - b = ρ_{y/x}·(x - a), i.e. y = b + (μ_xy/σ_x²)·(x - a).

Similarly, one can obtain the equation of the regression line of X on Y:

x - a = ρ_{x/y}·(y - b), where ρ_{x/y} = μ_xy/σ_y².
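
To make the regression coefficient concrete, here is a sketch on synthetic data (the parameters 1.5 and 4.0 are made up) showing that μ_xy/σ_x² coincides with the slope of the least-squares line of Y on X:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=50_000)
y = 1.5 * x + 4.0 + rng.normal(scale=3.0, size=x.size)   # noisy linear relation

mu_xy = np.mean((x - x.mean()) * (y - y.mean()))   # correlation moment
rho_yx = mu_xy / x.var()                           # regression coefficient of Y on X
slope, intercept = np.polyfit(x, y, deg=1)         # least-squares line y = slope*x + intercept

print(round(rho_yx, 3), round(slope, 3))                             # both close to 1.5
print(round(y.mean() - rho_yx * x.mean(), 3), round(intercept, 3))   # both close to 4.0
```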

To describe a system of two random variables, in addition to the mathematical expectations and dispersions of the components, other characteristics are used; these include the correlation moment and the correlation coefficient (briefly mentioned at the end of T.8, clause 8.6).

The correlation moment (or covariance, or moment of connection) of two random variables X and Y is the m.o. of the product of the deviations of these quantities (see equality (5), clause 8.6):

μ_xy = M[(X - M(X))·(Y - M(Y))].

Corollary 1. For the correlation moment of the r.v. X and Y the following equality is also valid:

μ_xy = M(X°·Y°),

where X° = X - M(X) and Y° = Y - M(Y) are the corresponding centered r.v. (see clause 8.6).

In this case: if (X, Y) is a two-dimensional discrete r.v., then the covariance is calculated by the formula

μ_xy = Σ_i Σ_j (x_i - M(X))·(y_j - M(Y))·p_ij;   (8)

if (X, Y) is a two-dimensional continuous r.v., then the covariance is calculated by the formula

μ_xy = ∫∫ (x - M(X))·(y - M(Y))·f(x, y) dx dy.   (9)

Formulas (8) and (9) were obtained on the basis of formulas (6) in paragraph 12.1. There is also a computational formula

μ_xy = M(XY) - M(X)·M(Y),   (10)

which is derived from the definition of the correlation moment and the properties of the m.o.; indeed,

μ_xy = M[(X - M(X))·(Y - M(Y))] = M(XY) - M(X)·M(Y) - M(Y)·M(X) + M(X)·M(Y) = M(XY) - M(X)·M(Y).

Consequently, formulas (8) and (9) can be rewritten in the form

μ_xy = Σ_i Σ_j x_i·y_j·p_ij - M(X)·M(Y);   (11)

μ_xy = ∫∫ x·y·f(x, y) dx dy - M(X)·M(Y).

The correlation moment serves to characterize the relationship between the quantities X and Y.

As will be shown below, the correlation moment is equal to zero if X and Y are independent; therefore, if the correlation moment is not equal to zero, then X and Y are dependent random variables.

Theorem 12.1. The correlation moment of two independent random variables X and Y is equal to zero, i.e. for independent r.v. X and Y, μ_xy = 0.

Proof. Since X and Y are independent random variables, their deviations X - M(X) and Y - M(Y) are also independent. Using the properties of the mathematical expectation (the mathematical expectation of the product of independent r.v. is equal to the product of the mathematical expectations of the factors) and the fact that M[X - M(X)] = 0, M[Y - M(Y)] = 0, we obtain

μ_xy = M[(X - M(X))·(Y - M(Y))] = M[X - M(X)] · M[Y - M(Y)] = 0.

Remark. From this theorem it follows that if μ_xy ≠ 0, then the r.v. X and Y are dependent; in such cases the r.v. X and Y are called correlated. However, from μ_xy = 0 the independence of the r.v. X and Y does not follow.

In the case μ_xy = 0 the r.v. X and Y are called uncorrelated. Thus, independence implies uncorrelatedness; the converse statement is, generally speaking, false (see Example 2 below).

Let us consider the main properties of the correlation moment.

Covariance properties:

1. The covariance is symmetric, i.e. μ_xy = μ_yx.

This follows directly from formula (8).

2. The following equalities hold: μ_xx = D(X), μ_yy = D(Y), i.e. the dispersion of an r.v. is its covariance with itself.

These equalities follow directly from the definition of dispersion and equality (8), respectively, for Y = X and X = Y.

3. The following equalities are valid:

D(X + Y) = D(X) + D(Y) + 2·μ_xy,  D(X - Y) = D(X) + D(Y) - 2·μ_xy.

These equalities are derived from the definitions of the dispersion and covariance of the r.v. X ± Y and property 2.

Indeed, by the definition of dispersion (taking into account that the r.v. X - M(X) and Y - M(Y) are centered) we have

D(X + Y) = M[((X + Y) - M(X + Y))²] = M[((X - M(X)) + (Y - M(Y)))²]
         = M[(X - M(X))²] + 2·M[(X - M(X))·(Y - M(Y))] + M[(Y - M(Y))²] = D(X) + 2·μ_xy + D(Y).

Similarly, the second part of property 3 is derived from the equality

D(X - Y) = M[((X - M(X)) - (Y - M(Y)))²].

4. Let a, b, c, d be constant numbers; then the following equality is valid:

μ_{aX+b, cY+d} = a·c·μ_xy.

Usually this is called the property of first-order homogeneity and invariance under a shift of the arguments.

Let us prove this equality, using the properties of the m.o.:

μ_{aX+b, cY+d} = M[((aX + b) - M(aX + b))·((cY + d) - M(cY + d))] = M[a·(X - M(X))·c·(Y - M(Y))] = a·c·μ_xy.
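
A brief numerical check of properties 1-4 (a sketch using sample covariances of synthetic data; the constants a, b, c, d are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=x.size)

cov = lambda u, v: np.mean((u - u.mean()) * (v - v.mean()))

print(np.isclose(cov(x, y), cov(y, x)))                              # property 1: symmetry
print(np.isclose(cov(x, x), x.var()))                                # property 2: cov(X, X) = D(X)
print(np.isclose(np.var(x + y), x.var() + y.var() + 2 * cov(x, y)))  # property 3
a, b, c, d = 2.0, -1.0, 3.0, 5.0
print(np.isclose(cov(a * x + b, c * y + d), a * c * cov(x, y)))      # property 4
```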

Theorem 12.2. The absolute value of the correlation moment of two arbitrary random variables X and Y does not exceed the geometric mean of their variances, i.e.

|μ_xy| ≤ √(D(X)·D(Y)) = σ_x·σ_y.

Proof. Note that for independent r.v. the inequality holds trivially (see Theorem 12.1). So, let the r.v. X and Y be dependent. Let us consider the standardized r.v.

X* = (X - M(X))/σ_x  and  Y* = (Y - M(Y))/σ_y

and calculate the dispersion of the r.v. X* ± Y*. Taking into account property 3, we have, on the one hand, D(X* ± Y*) ≥ 0; on the other hand,

D(X* ± Y*) = D(X*) + D(Y*) ± 2·μ_{X*Y*}.

Since X* and Y* are normalized (standardized) r.v., their m.o. is equal to zero and their variance is equal to 1; moreover, using the properties of the m.o.,

μ_{X*Y*} = M(X*·Y*) = M[(X - M(X))·(Y - M(Y))]/(σ_x·σ_y) = μ_xy/(σ_x·σ_y).

Therefore

D(X* ± Y*) = 2 ± 2·μ_xy/(σ_x·σ_y) ≥ 0,

whence -1 ≤ μ_xy/(σ_x·σ_y) ≤ 1, i.e. |μ_xy| ≤ σ_x·σ_y.

The statement has been proven.

From the definition and properties of the covariance it follows that it characterizes both the degree of dependence of the r.v. and their scattering around the point (M(X), M(Y)). The dimension of the covariance is equal to the product of the dimensions of the random variables X and Y. In other words, the magnitude of the correlation moment depends on the units of measurement of the random variables. For this reason, for the same two quantities X and Y, the magnitude of the correlation moment will have different values depending on the units in which the values were measured.

Let, for example, X and Y be measured in centimeters and μ_xy = 2 cm²; if X and Y are measured in millimeters, then μ_xy = 200 mm². This feature of the correlation moment is a disadvantage of this numerical characteristic, since comparison of the correlation moments of different systems of random variables becomes difficult.

In order to eliminate this drawback, a new numerical characteristic is introduced: the “correlation coefficient”.

The correlation coefficient r_xy of random variables X and Y is the ratio of the correlation moment to the product of the standard deviations of these quantities:

r_xy = μ_xy/(σ_x·σ_y).   (13)

Since the dimension of μ_xy is equal to the product of the dimensions of the quantities X and Y, σ_x has the dimension of the quantity X and σ_y has the dimension of the quantity Y, the ratio r_xy is just a number (i.e. a “dimensionless quantity”). Thus, the value of the correlation coefficient does not depend on the choice of the units of measurement of the r.v.; this is an advantage of the correlation coefficient over the correlation moment.

In T.8, clause 8.3 we introduced the concept of a normalized r.v. X* = (X - M(X))/σ_x, formula (18), and proved the theorem that M(X*) = 0 and D(X*) = 1 (see also Theorem 8.2). Here we prove the following statement.

Theorem 12.3. For any two random variables X and Y the equality r_xy = μ_{X*Y*} is true. In other words, the correlation coefficient of any two r.v. X and Y is equal to the correlation moment of the corresponding normalized r.v. X* and Y*.

Proof. By the definition of the normalized random variables,

X* = (X - M(X))/σ_x  and  Y* = (Y - M(Y))/σ_y.

Taking into account the properties of the mathematical expectation, we obtain

μ_{X*Y*} = M(X*·Y*) = M[(X - M(X))·(Y - M(Y))]/(σ_x·σ_y) = μ_xy/(σ_x·σ_y) = r_xy.

The statement has been proven.
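
A quick numerical illustration of Theorem 12.3 (a sketch on synthetic data): the sample correlation coefficient of X and Y coincides with the sample correlation moment of the standardized variables:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200_000)
y = 0.8 * x + rng.normal(size=x.size)    # a correlated pair

x_star = (x - x.mean()) / x.std()        # normalized (standardized) variables
y_star = (y - y.mean()) / y.std()

r_xy = np.corrcoef(x, y)[0, 1]
mu_star = np.mean(x_star * y_star)       # correlation moment of X*, Y*
print(round(r_xy, 6), round(mu_star, 6)) # the two numbers coincide
```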

Let's look at some commonly encountered properties of the correlation coefficient.

Properties of the correlation coefficient:

1. The correlation coefficient in absolute value does not exceed 1, i.e. |r_xy| ≤ 1.

This property follows directly from formula (13), the definition of the correlation coefficient, and Theorem 12.2.

2. If the random variables X and Y are independent, then the correlation coefficient is equal to zero, i.e. r_xy = 0.

This property is a direct consequence of the definition (13) and Theorem 12.1.

Let us formulate the following property as a separate theorem.

Theorem 12.4. If the r.v. X and Y are connected by a linear functional dependence, i.e. Y = aX + b (a ≠ 0), then |r_xy| = 1, with r_xy = 1 for a > 0 and r_xy = -1 for a < 0. Conversely, if |r_xy| = 1, then the r.v. X and Y are connected by a linear functional dependence, i.e. there exist constants a and b such that the equality Y = aX + b holds.

Proof. Let Y = aX + b. Then, on the basis of property 4 of the covariance, we have

μ_xy = μ_{X, aX+b} = a·μ_xx = a·D(X) = a·σ_x²,

and since σ_y = |a|·σ_x, therefore

r_xy = μ_xy/(σ_x·σ_y) = a·σ_x²/(σ_x·|a|·σ_x) = a/|a| = ±1.

Hence |r_xy| = 1; the equality in one direction is obtained.

Let further |r_xy| = 1. Then two cases should be considered: 1) r_xy = 1 and 2) r_xy = -1. So, let us consider the first case. By Theorem 12.3, r_xy is the correlation moment of the standardized r.v. X* = (X - M(X))/σ_x and Y* = (Y - M(Y))/σ_y; therefore (see the proof of Theorem 12.2)

D(X* - Y*) = D(X*) + D(Y*) - 2·μ_{X*Y*} = 2 - 2·r_xy = 0.

A random variable whose variance is zero is constant; since M(X* - Y*) = 0, we get that X* - Y* = 0, i.e. Y* = X*. Hence,

(Y - M(Y))/σ_y = (X - M(X))/σ_x,

i.e. Y = (σ_y/σ_x)·X + (M(Y) - (σ_y/σ_x)·M(X)), a linear functional dependence with a positive coefficient.

Similarly, it is shown that for r_xy = -1 there holds (check it yourself!)

Y = -(σ_y/σ_x)·X + (M(Y) + (σ_y/σ_x)·M(X)).

Some conclusions:

1. If X and Y are independent r.v., then r_xy = 0.

2. If the r.v. X and Y are linearly related to each other, then |r_xy| = 1.

3. In the other cases, -1 < r_xy < 1.

In this case one says that the r.v. X and Y are connected by a positive correlation if r_xy > 0, and by a negative correlation if r_xy < 0. The closer |r_xy| is to one, the more reason there is to believe that the r.v. X and Y are related by a linear relationship.

Note that the correlation moments and dispersions of a system of r.v. are usually given in the form of the correlation matrix

K = ( D(X)   μ_xy )
    ( μ_xy   D(Y) ).

Obviously, the determinant of the correlation matrix satisfies det K = D(X)·D(Y) - μ_xy² ≥ 0, by Theorem 12.2.
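
For completeness, a sketch of how such a matrix can be assembled numerically; numpy.cov returns exactly this matrix of dispersions and correlation moments (the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)
y = -0.6 * x + rng.normal(size=x.size)

K = np.cov(x, y, ddof=0)           # [[D(X), mu_xy], [mu_xy, D(Y)]]
print(K)
print(np.linalg.det(K) >= 0)       # the determinant is non-negative (Theorem 12.2)
print(np.corrcoef(x, y))           # the corresponding matrix of correlation coefficients
```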

As already noted, if two random variables are dependent, they can be either correlated or uncorrelated. In other words, the correlation moment of two dependent quantities may be non-zero, but it may also be equal to zero.

Example 1. The distribution law of a discrete r.v. (X, Y) is given by a table.

Find the correlation coefficient r_xy.

Solution. We find the laws of distribution of the components X and Y by summing the probabilities of the joint table over the rows and over the columns, respectively.

Now let us calculate the m.o. of the components; these values could also be found directly from the r.v. distribution table. M(Y) is found likewise: find it yourself.

Let us calculate the variances of the components, using the computational formula D(X) = M(X²) - (M(X))².

Let us create the distribution law of the product X·Y, and then find M(XY).

When compiling the table of this distribution law, the following steps should be performed:

1) keep only the distinct values among all possible products x_i·y_j;

2) to determine the probability of a given value of X·Y, add up all the corresponding probabilities located at those intersections of the main table that favor the occurrence of this value.

In our example, the r.v. X·Y takes only three different values. The first value corresponds to the product of the value from the second row and the value from the first column, so at their intersection stands the corresponding probability; similarly, the second value is obtained from the sum of the probabilities located at the intersections of the first row (0.15; 0.40; 0.05) and of the one value that is at the intersection of the second row and the second column; and finally, the third value is at the intersection of the second row and the third column.

From our table we find M(XY).

We find the correlation moment using the computational formula μ_xy = M(XY) - M(X)·M(Y).

We find the correlation coefficient using the formula r_xy = μ_xy/(σ_x·σ_y).

Thus, r_xy < 0, i.e. a negative correlation.

Exercise. The distribution law of a discrete r.v. (X, Y) is given by a table.

Find the correlation coefficient r_xy.

Let us look at an example showing that two dependent random variables may be uncorrelated.

Example 2. A two-dimensional random variable (X, Y) is given by a density function f(x, y) that is constant inside a region symmetric about both coordinate axes and equal to zero outside it.

Let us prove that X and Y are dependent but uncorrelated random variables.

Solution. Let us use the previously calculated distribution densities f1(x) and f2(y) of the components X and Y. Since f(x, y) ≠ f1(x)·f2(y), X and Y are dependent quantities. To prove that X and Y are uncorrelated, it is enough to make sure that μ_xy = 0.

Let us find the correlation moment using the formula

μ_xy = ∫∫ (x - M(X))·(y - M(Y))·f(x, y) dx dy.

Since the density function f(x, y) is symmetric about the axis OY, M(X) = 0; similarly, M(Y) = 0 due to the symmetry of f(x, y) about the axis OX. Therefore, taking the constant factor outside the integral sign, we obtain

μ_xy = ∫∫ x·y·f(x, y) dx dy.

The inner integral is equal to zero (the integrand is odd, and the limits of integration are symmetric with respect to the origin); therefore, μ_xy = 0, i.e. the dependent random variables X and Y are not correlated with each other.

So, from the correlation of two random variables, their dependence follows, but from the uncorrelatedness it is still impossible to conclude that these variables are independent.
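
A compact numerical illustration of the same phenomenon (a sketch using a different but analogous construction: X uniform on [-1, 1] and Y = X², which are clearly dependent yet uncorrelated):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2                          # Y is a deterministic function of X, hence strongly dependent

mu_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_xy = np.corrcoef(x, y)[0, 1]
print(round(mu_xy, 4), round(r_xy, 4))   # both ~0: dependent but uncorrelated
```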

However, for normally distributed r.v. such a conclusion is an exception: from the uncorrelatedness of normally distributed r.v. their independence does follow.

The next paragraph is devoted to this issue.


So far we have discussed the properties of systems of random variables, giving only verbal explanation. However, there are numerical characteristics through which the properties of both individual random variables and a system of random variables are studied.

One of the most important numerical characteristics of a random variable is its mathematical expectation.

Consider a discrete random variable X having possible values x1, x2, …, xn with probabilities p1, p2, …, pn. We need to characterize by some number the position of the values of the random variable on the abscissa axis, taking into account that these values have different probabilities. For this purpose it is natural to use the so-called “weighted average” of the values xi, in which each value xi is taken with a “weight” proportional to the probability of this value. Thus, denoting the “weighted average” by M[X] or m_x, we get

m_x = (x1·p1 + x2·p2 + … + xn·pn)/(p1 + p2 + … + pn),

or, given that p1 + p2 + … + pn = 1,

m_x = Σ_i x_i·p_i.   (1)

The mathematical expectation of a random variable is the sum of the products of all possible values ​​of a random variable and the probabilities of these values.

For greater clarity, let us consider a mechanical interpretation of the introduced concept. Let the points with abscissas x1, x2, …, xn be located on the abscissa axis, and let the masses p1, p2, …, pn, with p1 + p2 + … + pn = 1, be concentrated at these points. Then the mathematical expectation is nothing more than the abscissa of the center of gravity of the given system of material points.

Formula (1) for the mathematical expectation corresponds to the case of a discrete random variable. For a continuous variable X the mathematical expectation is naturally expressed not as a sum but as an integral:

m_x = ∫ x·f(x) dx,   (2)

where f(x) is the distribution density of the variable X.

Formula (2) is obtained from formula (1) if the individual values x_i in it are replaced by the continuously varying parameter x, the corresponding probabilities p_i by the probability element f(x) dx, and the finite sum by an integral.

In the mechanical interpretation, the mathematical expectation of a continuous random variable retains the same meaning - the abscissa of the center of gravity in the case when the mass distribution along the abscissa is continuous with the density f(x).

It should be noted that the mathematical expectation does not exist for every random variable; such cases, however, are, according to some scientists, of no significant interest for practice.
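
As a sanity check on formulas (1) and (2), here is a small sketch (a made-up discrete distribution and a standard exponential density) that computes the expectation as a weighted sum and as an integral:

```python
import numpy as np
from scipy import integrate

# Discrete case, formula (1): M[X] = sum_i x_i * p_i  (hypothetical distribution)
x_vals = np.array([-1.0, 1.0, 4.0])
p_vals = np.array([0.4, 0.2, 0.4])
m_discrete = np.dot(x_vals, p_vals)
print(m_discrete)                        # 1.4

# Continuous case, formula (2): M[X] = integral of x * f(x) dx,
# here for the exponential density f(x) = lam * exp(-lam * x), x >= 0.
lam = 2.0
f = lambda x: lam * np.exp(-lam * x)
m_continuous, _ = integrate.quad(lambda x: x * f(x), 0.0, np.inf)
print(m_continuous)                      # 1 / lam = 0.5
```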

In addition to the mathematical expectation, other numerical characteristics of random variables, the moments, are also important.

The concept of moment is widely used in mechanics to describe the distribution of masses (statistical moments, moments of inertia, etc.). Exactly the same techniques are used in probability theory to describe the basic properties of the distribution of a random variable. Most often, two types of moments are used in practice: initial and central.

The initial moment of the s-th order of a discontinuous random variable X is a sum of the form

α_s[X] = Σ_i x_i^s · p_i.

Obviously, this definition coincides with the definition of the initial moment of order s in mechanics if masses p1, …, pn are concentrated at the points x1, …, xn of the abscissa axis.

For a continuous random variable X, the initial moment of the s-th order is the integral

α_s[X] = ∫ x^s · f(x) dx.

It is obvious that

α_s[X] = M[X^s],

i.e. the initial moment of the s-th order of a random variable X is nothing more than the mathematical expectation of the s-th power of this random variable.

Before defining the central moment, we introduce the concept of a “centered random variable.”

Let there be a random variable X with mathematical expectation m_x. The centered random variable corresponding to the value X is the deviation of the random variable X from its mathematical expectation:

X° = X - m_x.

It is easy to see that the mathematical expectation of a centered random variable is equal to zero.

Centering a random variable is equivalent to moving the origin of coordinates to a point whose abscissa is equal to the mathematical expectation.

The central moment of order s of a random variable X is the mathematical expectation of the s-th power of the corresponding centered random variable:

μ_s[X] = M[(X - m_x)^s].

For a discontinuous random variable, the s-th central moment is expressed by the sum

μ_s[X] = Σ_i (x_i - m_x)^s · p_i,

and for a continuous one by the integral

μ_s[X] = ∫ (x - m_x)^s · f(x) dx.

Of utmost importance is the second central moment, which is called the dispersion (variance) and is denoted D[X]. For the dispersion we have

D[X] = M[(X - m_x)²].

The dispersion of a random variable is a characteristic of the scattering of the values of the random variable around its mathematical expectation. The word “dispersion” itself means “scattering”.

The mechanical interpretation of dispersion is nothing more than the moment of inertia of a given mass distribution relative to the center of gravity.

In practice, the quantity

σ[X] = √(D[X]),

called the standard deviation (otherwise the “standard”) of the random variable X, is also often used.
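
A minimal sketch (the same made-up discrete distribution as above) computing the dispersion and the standard deviation, and checking the identity D[X] = M[X²] - (M[X])²:

```python
import numpy as np

# The same hypothetical discrete distribution as in the earlier sketches
x_vals = np.array([-1.0, 1.0, 4.0])
p_vals = np.array([0.4, 0.2, 0.4])

m_x = np.dot(x_vals, p_vals)                  # mathematical expectation
d_x = np.dot((x_vals - m_x) ** 2, p_vals)     # dispersion (second central moment)
sigma_x = np.sqrt(d_x)                        # standard deviation
alpha_2 = np.dot(x_vals ** 2, p_vals)         # second initial moment M[X^2]

print(m_x, d_x, sigma_x)
print(np.isclose(d_x, alpha_2 - m_x ** 2))    # D[X] = M[X^2] - (M[X])^2
```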

Now let's move on to considering the characteristics of systems of random variables.

The initial moment of order k, s of the system (X, Y) is the mathematical expectation of the product of X^k and Y^s:

α_{k,s} = M[X^k · Y^s].

The central moment of order k, s of the system (X, Y) is the mathematical expectation of the product of the k-th and s-th powers of the corresponding centered quantities:

μ_{k,s} = M[(X - m_x)^k · (Y - m_y)^s].

For discontinuous random variables

μ_{k,s} = Σ_i Σ_j (x_i - m_x)^k · (y_j - m_y)^s · p_ij,

where p_ij is the probability that the system (X, Y) takes the values (x_i, y_j), and the sum is taken over all possible values of the random variables X, Y.

For continuous random variables

μ_{k,s} = ∫∫ (x - m_x)^k · (y - m_y)^s · f(x, y) dx dy,

where f(x, y) is the distribution density of the system.

In addition to the numbers k and s, which characterize the order of the moment in relation to individual quantities, the total order of the moment k + s, equal to the sum of the exponents of X and Y, is also considered. According to the total order, the moments are classified into first, second, etc. In practice, only the first and second moments are usually applied.

The first initial moments represent the mathematical expectations of the values X and Y included in the system:

α_{1,0} = m_x,  α_{0,1} = m_y.

The set of mathematical expectations m_x, m_y is a characteristic of the position of the system. Geometrically, these are the coordinates of the middle point on the plane around which the point (X, Y) is scattered.

The second central moments of the system also play an important role in practice. Two of them represent the dispersions of the values X and Y:

D(X) = μ_{2,0} = M[(X - m_x)²],  D(Y) = μ_{0,2} = M[(Y - m_y)²],

characterizing the scattering of the random point in the direction of the Ox and Oy axes.

A special role is played by the second mixed central moment:

K_xy = μ_{1,1} = M[(X - m_x)·(Y - m_y)],

called the correlation moment (otherwise the “moment of connection”) of the random variables X and Y.

The correlation moment is a characteristic of a system of random variables that describes, in addition to the dispersion of the values ​​X and Y, also the connection between them. In order to verify this, we note that the correlation moment of independent random variables is equal to zero.

Note that the correlation moment characterizes not only the dependence of the quantities but also their dispersion. Therefore, to characterize the connection between the quantities (X, Y) in its pure form, we pass from the moment K_xy to the characteristic

r_xy = K_xy/(σ_x·σ_y),   (3)

where σ_x, σ_y are the standard deviations of the values X and Y. This characteristic is called the correlation coefficient of the values X and Y.

From formula (3) it is clear that for independent random variables the correlation coefficient is equal to zero, since for such variables K_xy = 0.

Random variables for which r_xy = 0 are called uncorrelated (unconnected).

Note, however, that the uncorrelated nature of random variables does not imply their independence.

The correlation coefficient characterizes not every kind of dependence, but only the so-called linear dependence. The linear probabilistic dependence of random variables consists in the fact that when one random variable increases, the other tends to increase (or decrease) according to a linear law. Thus, the correlation coefficient characterizes the degree of closeness of the linear relationship between the random variables.

There are several methods for determining the correlation coefficient. Here we will give an example using the Pearson product-moment correlation coefficient

r_xy = Σ(x_i - m_x)(y_i - m_y) / √( Σ(x_i - m_x)² · Σ(y_i - m_y)² ),   (4)

computed from a data table (in our example, the relative content of T-lymphocytes in % and the IgG level in g/l).

Substituting the obtained values into formula (4), we obtain r_xy = 0.9933. That is, the correlation coefficient of the dynamics of T-lymphocytes and immunoglobulin G in children with peritonitis is 0.9933, which indicates a strong connection between these indicators.
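
The underlying data table is not reproduced in this fragment, so the sketch below uses made-up paired measurements only to show how formula (4) is evaluated in practice; with the real data it would return the value 0.9933 quoted above:

```python
import numpy as np

# Hypothetical paired measurements (the real table is not reproduced here):
t_lymphocytes = np.array([52.0, 48.0, 61.0, 55.0, 43.0, 58.0])   # relative content, %
igg_level     = np.array([10.1,  9.2, 12.3, 10.8,  8.5, 11.7])   # IgG level, g/l

# Pearson correlation coefficient, formula (4)
dx = t_lymphocytes - t_lymphocytes.mean()
dy = igg_level - igg_level.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(r, 4))
print(round(np.corrcoef(t_lymphocytes, igg_level)[0, 1], 4))      # same value via numpy
```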


