Maximum likelihood and the method of moments. Methods for obtaining estimates

In works intended as a first introduction to mathematical statistics, maximum likelihood estimates (abbreviated MLE) are usually considered:

Thus, the probability density corresponding to the sample is written out first. Since the sample elements are independent, this density is the product of the densities of the individual sample elements. The joint density is evaluated at the point corresponding to the observed values. Viewed as a function of the parameter (for the given sample elements), this expression is called the likelihood function. Then, in one way or another, the value of the parameter at which the joint density attains its maximum is sought. This value is the maximum likelihood estimate.

It is well known that maximum likelihood estimators belong to the class of best asymptotically normal estimators. However, for finite sample sizes MLEs are unacceptable in a number of problems, because they are worse (their variance and mean squared error are larger) than other estimates, in particular unbiased ones. That is why GOST 11.010-81 uses unbiased estimates, rather than MLEs, for estimating the parameters of the negative binomial distribution. It follows that one should a priori prefer MLEs to other kinds of estimates, if at all, only at the stage of studying the asymptotic behavior of estimates.

In some cases MLEs can be found explicitly, in the form of specific formulas suitable for calculation.

In most cases analytical solutions do not exist, and numerical methods must be used to find the MLE. This is the case, for example, with samples from the gamma distribution or the Weibull-Gnedenko distribution. In many works, the system of maximum likelihood equations is solved by some iterative method, or the likelihood function is maximized directly, as illustrated by the sketch below.
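For illustration, here is a minimal sketch (Python with NumPy and SciPy assumed; it is not the procedure of any particular standard or paper) that maximizes the gamma log-likelihood with a general-purpose optimizer, starting from the method-of-moments estimates:

```python
import numpy as np
from scipy import optimize, special

def gamma_neg_loglik(params, x):
    """Negative log-likelihood of a two-parameter gamma distribution
    (shape a, scale b) for the sample x."""
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf  # keep the optimizer inside the admissible region
    return -np.sum((a - 1) * np.log(x) - x / b - special.gammaln(a) - a * np.log(b))

def gamma_mle(x):
    """Numerical MLE; the method-of-moments estimates serve as the starting point."""
    m, v = np.mean(x), np.var(x)
    start = np.array([m * m / v, v / m])          # moment estimates of (a, b)
    res = optimize.minimize(gamma_neg_loglik, start, args=(x,), method="Nelder-Mead")
    return res.x

rng = np.random.default_rng(0)
sample = rng.gamma(shape=5.0, scale=11.0, size=50)   # synthetic data, for illustration only
print(gamma_mle(sample))   # estimates of (a, b); accuracy depends on n and on the stopping rule
```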

However, the use of numerical methods gives rise to numerous problems. The convergence of iterative methods requires justification. In a number of examples the likelihood function has many local maxima, and therefore the natural iterative procedures do not converge. For the data of the Railway Transport Research Institute (VNII) on steel fatigue tests, the maximum likelihood equation has 11 roots. Which of the eleven should be used as the parameter estimate?

As these difficulties became recognized, papers began to appear proving the convergence of algorithms for finding maximum likelihood estimates for specific probabilistic models and specific algorithms.

However, a theoretical proof of the convergence of the iterative algorithm is not everything. The question arises of a reasonable rule for stopping the calculations once the required accuracy has been reached. In most cases it remains unresolved.

But that is not all. The accuracy of the calculations must be linked to the sample size: the larger it is, the more accurately the parameter estimates must be found, otherwise one cannot speak of consistency of the estimation method. Moreover, as the sample size grows, it becomes necessary to increase the number of digits used in the computer, to move from single to double precision and beyond, again for the sake of obtaining consistent estimates.

Thus, in the absence of explicit formulas for maximum likelihood estimates, finding the MLE runs into a number of computational problems. Specialists in mathematical statistics allow themselves to ignore all these problems when discussing MLEs in theoretical terms. Applied statistics, however, cannot ignore them. The problems noted call into question the expediency of the practical use of MLEs.

Example 1. In statistical problems of standardization and quality management, the family of gamma distributions is used. The gamma distribution density has the form
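The formula itself is not reproduced in this text. A standard way of writing the three-parameter gamma density with shape parameter a, scale parameter b and shift parameter c, consistent with the description of formula (7) given below (the notation in GOST 11.011-83 may differ in details), is

$$ f(x) = \frac{1}{\Gamma(a)}\left(\frac{x-c}{b}\right)^{a-1} e^{-(x-c)/b}\,\frac{1}{b}, \qquad x > c, $$

and f(x) = 0 for x ≤ c.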

The probability density in formula (7) is determined by three parameters a, b, c, where a > 2, b > 0. Here a is the shape parameter, b is the scale parameter and c is the shift parameter. The factor 1/Γ(a) is a normalizing constant; it is introduced so that the density integrates to one.

Here Γ(a) is one of the special functions used in mathematics, the so-called gamma function, after which the distribution given by formula (7) is named: Γ(a) is defined by the integral of x^(a-1) e^(-x) over x from 0 to infinity.

Detailed solutions of the problems of estimating the parameters of the gamma distribution are contained in the state standard GOST 11.011-83 "Applied statistics. Rules for determining estimates and confidence limits for the parameters of the gamma distribution", developed by us. This publication is currently used as methodological material for the engineering staff of industrial enterprises and applied research institutes.

Since the gamma distribution depends on three parameters, there are 2^3 - 1 = 7 variants of the parameter estimation problem. They are described in Table 1. Table 2 gives real data on the operating time of cutting tools to the limiting state, in hours. The ordered sample (variation series) of size n = 50 is taken from the state standard. It is these data that will serve as the source material for demonstrating various methods of parameter estimation.

Selecting the "best" estimates within a particular parametric model of applied statistics is research work extended over time. Let us distinguish two stages. The asymptotic stage: estimates are constructed and compared by their properties as the sample size grows without limit. At this stage one considers such characteristics of estimates as consistency, asymptotic efficiency, etc. The finite-sample stage: estimates are compared, say, at n = 10. It is clear that the study begins with the asymptotic stage: to compare estimates, one must first construct them and be sure that they are not absurd (such confidence is provided by a proof of consistency).

Example 2. Estimation of the gamma distribution parameters by the method of moments in the case of three unknown parameters (row 7 of Table 1).

In accordance with the above reasoning, to estimate the three parameters it is enough to use three sample moments: the sample arithmetic mean

x̄ = (1/n) Σ x_i,

the sample variance

s² = (1/n) Σ (x_i - x̄)²,

and the sample third central moment

m3 = (1/n) Σ (x_i - x̄)³.

Equating the theoretical moments, expressed through the distribution parameters, to the sample moments, we obtain the system of equations of the method of moments:

x̄ = c + ab,   s² = ab²,   m3 = 2ab³.

Solving this system, we find the method-of-moments estimates. Substituting the second equation into the third, we obtain the method-of-moments estimate of the scale parameter:

b* = m3 / (2s²).

Substituting this estimate into the second equation, we find the method-of-moments estimate of the shape parameter:

a* = s² / (b*)².

Finally, from the first equation we find the estimate of the shift parameter:

c* = x̄ - a* b*.

For the real data of Table 2 given above, the sample arithmetic mean is x̄ = 57.88, the sample variance is s² = 663.00, and the sample third central moment is m3 = 14927.91. By the formulas just obtained, the method-of-moments estimates are a* = 5.23, b* = 11.26, c* = -1.01.
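A minimal sketch (Python with NumPy assumed) of the computation by the formulas above; with the sample moments just quoted it reproduces the estimates up to rounding:

```python
import numpy as np

def gamma_moment_estimates(x):
    """Method-of-moments estimates (a*, b*, c*) of the three-parameter gamma distribution."""
    xbar = np.mean(x)
    s2 = np.mean((x - xbar) ** 2)      # sample variance (divisor n)
    m3 = np.mean((x - xbar) ** 3)      # sample third central moment
    b = m3 / (2.0 * s2)                # scale:  b* = m3 / (2 s^2)
    a = s2 / b ** 2                    # shape:  a* = s^2 / b*^2
    c = xbar - a * b                   # shift:  c* = xbar - a* b*
    return a, b, c

# Check against the moments quoted in the text (Table 2 data):
xbar, s2, m3 = 57.88, 663.00, 14927.91
b = m3 / (2 * s2)
a = s2 / b ** 2
c = xbar - a * b
print(round(a, 2), round(b, 2), round(c, 2))   # 5.23 11.26 -1.01
```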

The estimates of the gamma distribution parameters obtained by the method of moments are functions of the sample moments. In accordance with what was said above, they are asymptotically normal random variables. Table 3 gives the method-of-moments estimates and their asymptotic variances for various combinations of known and unknown parameters of the gamma distribution.

All the method-of-moments estimates given in Table 3 are included in the state standard. They cover all formulations of the problems of estimating the gamma distribution parameters (see Table 1), except those in which only one parameter, a or b, is unknown. For these exceptional cases special estimation methods have been developed.

Since the asymptotic distribution of the method-of-moments estimates is known, it is not difficult to formulate rules for testing statistical hypotheses about the values of the distribution parameters, as well as to construct confidence limits for the parameters. For example, in the probabilistic model where all three parameters are unknown, in accordance with the third row of Table 3 the lower confidence limit for the parameter a, corresponding to confidence probability 0.95, has the following asymptotic form

and the upper confidence limit for the same confidence probability is as follows

where a* is the method-of-moments estimate of the shape parameter (Table 3).
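As a sketch of how such asymptotic limits are computed in practice (Python with SciPy assumed): the expression for the asymptotic variance of a* is given in Table 3, which is not reproduced here, so it enters the function as an assumed input asy_var.

```python
from scipy import stats

def asymptotic_limits(a_star, asy_var, n, level=0.95):
    """One-sided asymptotic confidence limits a* -/+ u_level * sqrt(asy_var / n).
    asy_var is the asymptotic variance of a* taken from Table 3 (assumed given)."""
    u = stats.norm.ppf(level)                      # normal quantile, e.g. 1.645 for 0.95
    half_width = u * (asy_var / n) ** 0.5
    return a_star - half_width, a_star + half_width
```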

Example 3. Let us find the MLE for a sample from a normal distribution, each element of which has the density

f(x) = (1/(σ√(2π))) exp(-(x - m)²/(2σ²)).

Thus, the two-dimensional parameter (m, σ²) is to be estimated.

The product of the probability densities of the sample elements, i.e. the likelihood function, has the form

L(x_1, …, x_n; m, σ²) = (2πσ²)^(-n/2) exp(-Σ (x_i - m)² / (2σ²)).

We need to solve the optimization problem: maximize L(x_1, …, x_n; m, σ²) over m and σ².

As in many other cases, the optimization problem is easier to solve if we pass to the logarithm of the likelihood function, i.e. to the function

called the log-likelihood function. For a sample from the normal distribution

ln L(x_1, …, x_n; m, σ²) = -(n/2) ln(2π) - (n/2) ln σ² - Σ (x_i - m)² / (2σ²).   (9)

A necessary condition for a maximum is that the partial derivatives of the log-likelihood function with respect to the parameters equal 0, i.e.

∂ ln L / ∂m = 0,   ∂ ln L / ∂σ² = 0.   (10)

System (10) is called the system of maximum likelihood equations. In the general case the number of equations equals the number of unknown parameters, and each equation is written by setting to 0 the partial derivative of the log-likelihood function with respect to one of the parameters.

When differentiating with respect to m, the first two terms on the right-hand side of formula (9) vanish, and the last term gives the equation

Σ (x_i - m) / σ² = 0.

Therefore the maximum likelihood estimate m* of the parameter m is the sample arithmetic mean:

m* = x̄ = (1/n) Σ x_i.

To find the estimate of the variance, it is necessary to solve the equation

∂ ln L / ∂σ² = -n/(2σ²) + Σ (x_i - m)² / (2σ⁴) = 0.

It is easy to see that this equation is equivalent to σ² = (1/n) Σ (x_i - m)².

Therefore the maximum likelihood estimate (σ²)* of the variance σ², taking into account the previously found estimate of the parameter m, is the sample variance:

(σ²)* = (1/n) Σ (x_i - x̄)².

So, the system of maximum likelihood equations is solved analytically; the MLEs of the mathematical expectation and variance of the normal distribution are the sample arithmetic mean and the sample variance. Note that the latter estimate is biased.

Note that under the conditions of Example 3 the maximum likelihood estimates coincide with the method-of-moments estimates. Moreover, the form of the method-of-moments estimates is obvious and requires no reasoning.
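A minimal numerical check of this conclusion (Python with NumPy and SciPy assumed, synthetic data): maximizing the log-likelihood (9) with a general-purpose optimizer returns, up to optimizer tolerance, the sample mean and the biased sample variance.

```python
import numpy as np
from scipy import optimize

def neg_loglik(params, x):
    """Minus the log-likelihood (9) of a normal sample; params = (m, sigma2)."""
    m, s2 = params
    if s2 <= 0:
        return np.inf
    n = x.size
    return 0.5 * n * np.log(2 * np.pi) + 0.5 * n * np.log(s2) + np.sum((x - m) ** 2) / (2 * s2)

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=3.0, size=200)   # synthetic sample, for illustration only

res = optimize.minimize(neg_loglik, x0=[np.median(x), 1.0], args=(x,),
                        method="L-BFGS-B", bounds=[(None, None), (1e-9, None)])
print(res.x)                 # numerical MLE of (m, sigma^2)
print(x.mean(), x.var())     # sample mean and biased sample variance: the same values up to tolerance
```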

Example 4. Let us try to grasp the hidden meaning of the following phrase of the founder of modern statistics, Ronald Fisher: "Nothing is easier than to invent an estimate of a parameter." The classic was being ironic: he meant that it is easy to invent a bad estimate. A good estimate does not need to be invented (!): it should be obtained in a standard way, using the maximum likelihood principle.

Problem. According to H0, the mathematical expectations of three independent Poisson random variables are related by a linear dependence.

Realizations of these variables are given. The two parameters of the linear dependence need to be estimated and H0 tested.

For clarity, one can picture a regression line that passes through the mean values at the observation points. Suppose certain values have been obtained. What can be said about the slope and about the validity of H0?

Naive approach

It would seem that the parameters can be estimated from elementary common sense. We obtain the estimate of the slope of the regression line by dividing the increment of the observed values in passing from x1 = -1 to x3 = +1 by 2, and we find the estimate of the level as the arithmetic mean:

It is easy to check that the mathematical expectations of these estimates equal the parameters being estimated (the estimates are unbiased).

Once the estimates are obtained, H0 is tested in the usual way by the Pearson chi-square test:

X² = Σ (observed - expected)² / expected.

Estimates of the expected frequencies can be obtained from the parameter estimates:

Moreover, if our estimates are "correct", the Pearson distance will be distributed as a chi-square random variable with one degree of freedom: 3 - 2 = 1. Recall that we estimate two parameters by fitting the data to our model. The total, however, is not fixed, so there is no need to subtract an additional unit.

However, when we substitute the observed values, we get a strange result:

On the one hand, it is clear that for these frequencies there is no reason to reject H0, but we cannot check this by the chi-square test, since the estimate of the expected frequency at the first point turns out to be negative. So the estimates found from "common sense" do not allow us to solve the problem in the general case.

Maximum likelihood method

The random variables are independent and have Poisson distributions. The probability of obtaining the observed values is:

According to the maximum likelihood principle, the values of the unknown parameters must be sought by requiring that the probability of obtaining the observed values be maximal:

If the parameters are constants, then we are dealing with an ordinary probability. Fisher proposed the new term "likelihood" for the case when the constants are treated as variables. If the likelihood is the product of the probabilities of independent events, it is natural to turn the product into a sum and work with the log-likelihood:

Here all terms that do not depend on the parameters are collected and discarded in the final expression. To find the maximum of the log-likelihood, we set the derivatives with respect to the parameters equal to zero:

Solving these equations, we get:
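The explicit solution is not reproduced in the text above. Assuming, as in the naive calculation, that the observation points are x1 = -1, x2 = 0, x3 = +1, that the observed values are n1, n2, n3 (notation introduced here for illustration) and that the model takes the mean value μ + a·x_i at point x_i, the two likelihood equations give

$$ \mu^* = \frac{n_1+n_2+n_3}{3}, \qquad a^* = \mu^*\,\frac{n_3-n_1}{n_1+n_3}, $$

so that the fitted expected frequencies at the outer points are 2μ* n1/(n1+n3) and 2μ* n3/(n1+n3), and the one at the middle point is μ* itself.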

These are the "correct" expressions for the estimates. The estimate of the level coincides with the one proposed by common sense, but the estimates of the slope differ. What can be said about the formula for the slope?

1) It seems strange that the answer depends on the frequency at the middle point, since intuitively the slope of the line is determined by the outer points.
2) However, if H0 is true (the regression is a straight line), then for large values of the observed frequencies they become close to their mathematical expectations. Then the middle frequency is close to the half-sum of the outer ones, and the maximum likelihood estimate becomes close to the result obtained from common sense.

3) The advantage of the maximum likelihood estimates begins to be felt when we notice that all the expected frequencies are now always positive:

This was not the case for the "naive" estimates, so the chi-square test could not always be applied (an attempt to replace a negative or zero expected frequency by one does not save the situation).

4) Numerical calculations show that the naive estimates can be used only when the expected frequencies are sufficiently large. If they are used for small values, the computed Pearson distance will often be unjustifiably large; see the sketch below.
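A minimal numerical sketch of points 3) and 4) (Python with NumPy and SciPy assumed; the observed values and the design points -1, 0, +1 are illustrative assumptions, and the closed-form ML solution displayed above is used):

```python
import numpy as np
from scipy import stats

x = np.array([-1.0, 0.0, 1.0])           # assumed observation points

def expected_frequencies(n1, n2, n3):
    """Expected frequencies for the naive and the ML estimates of the linear Poisson model."""
    obs = np.array([n1, n2, n3], dtype=float)
    mu = obs.mean()                        # level estimate (the same for both methods)
    a_naive = (n3 - n1) / 2.0              # naive slope: increment over the outer points, divided by 2
    a_mle = mu * (n3 - n1) / (n1 + n3)     # closed-form ML slope for this model
    return obs, mu + a_naive * x, mu + a_mle * x

def pearson(obs, exp):
    """Pearson distance; under H0 it is asymptotically chi-square with 3 - 2 = 1 d.f."""
    return np.sum((obs - exp) ** 2 / exp)

obs, e_naive, e_mle = expected_frequencies(1, 2, 10)     # small illustrative frequencies
print(e_naive)    # the first expected frequency is negative: chi-square test not applicable
print(e_mle)      # all expected frequencies are positive
print(pearson(obs, e_mle), stats.chi2.ppf(0.95, df=1))   # distance vs. the 5% critical value
```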

Conclusion: the right choice of estimates is important, since otherwise it will not be possible to test the hypothesis by the chi-square test. A seemingly obvious estimate may turn out to be unusable!

Let ξ be a continuous random variable with density f(x; β_1, …, β_k). The form of the density is known, but the values of the parameters are unknown. The likelihood function is the function L(x_1, …, x_n; β_1, …, β_k) = f(x_1; β_1, …, β_k) · … · f(x_n; β_1, …, β_k), where x_1, …, x_n is a sample of size n from the distribution of the random variable ξ. It is easy to give the likelihood function a probabilistic meaning: consider a random vector whose components are independent, identically distributed random variables with the same law as ξ. Then the probability element of this vector is L(x_1, …, x_n; β_1, …, β_k) dx_1 … dx_n, i.e. the likelihood function is connected with the probability of obtaining the fixed sample in a sequence of n experiments. The main idea of the maximum likelihood method is that, as estimates of the parameters β_j, one takes those values which deliver the maximum of the likelihood function for the given fixed sample; i.e. the sample obtained in the experiment is regarded as the most probable one.

Finding the estimates of the parameters β_j reduces to solving a system of k equations (k is the number of unknown parameters): ∂L/∂β_j = 0, j = 1, …, k (19). Since the function ln L has its maximum at the same point as the likelihood function, the system of likelihood equations (19) is often written in the form ∂ ln L/∂β_j = 0, j = 1, …, k (20). As estimates of the unknown parameters one should take those solutions of system (19) or (20) that really depend on the sample and are not constants.

In the case where ξ is discrete with a known distribution series, the likelihood function is the product of the probabilities of the observed values (21), and the estimates are sought as solutions of the analogous system. It can be shown that maximum likelihood estimates possess the property of consistency. It should be noted that the maximum likelihood method leads to more complicated calculations than the method of moments, but in theory it is more efficient: maximum likelihood estimates deviate less from the true values of the estimated parameters than the estimates obtained by the method of moments. For the distributions most often encountered in applications, the parameter estimates obtained by the method of moments and by the maximum likelihood method coincide in most cases.

Example 1. The deviation ξ of a part's size from the nominal value is a normally distributed random variable. It is required to determine the systematic error and the variance of the deviation from a sample. By assumption, ξ is a normally distributed random variable with mathematical expectation a (the systematic error) and variance σ², which are to be estimated from a sample of size n: x_1, …, x_n. In this case the likelihood function is the product of normal densities, and system (19), after excluding solutions that do not depend on x_1, …, x_n, shows that the maximum likelihood estimates coincide with the empirical mean and the empirical variance already known to us.

Example 2. Estimate the parameter μ from a sample of an exponentially distributed random variable. The likelihood equation leads to a solution that coincides with the estimate of the same parameter obtained by the method of moments, see (17).

Example 3. Using the maximum likelihood method, estimate the probability of heads if in ten tosses of a coin heads appeared 8 times. Let the probability to be estimated be p, and consider the random variable equal to the number of heads, with its distribution series. The likelihood function (21) then gives, as the estimate of the unknown probability p, the frequency of heads in the experiment: p* = 8/10 = 0.8.
Concluding the discussion of methods for finding estimates, let us emphasize that even with a very large amount of experimental data we still cannot indicate the exact value of the estimated parameter; moreover, as has been noted repeatedly, the estimates we obtain are close to the true values of the estimated parameters only "on average" or "in most cases". Therefore an important statistical problem, which we consider next, is the problem of determining the accuracy and reliability of the estimation we perform.

The task of estimating distribution parameters is to obtain the most plausible estimates of the unknown parameters of the population distribution from sample data. In addition to the method of moments, the maximum likelihood method is also used to determine point estimates of distribution parameters. The maximum likelihood method was proposed by the English statistician R. Fisher in 1912.

Let a sample x_1, x_2, …, x_n be drawn to estimate an unknown parameter θ of a random variable X from a population with probability density p(x) = p(x; θ). We will treat the sample results as a realization of the n-dimensional random variable (X_1, X_2, …, X_n). The method of moments considered earlier for obtaining point estimates of unknown parameters of a theoretical distribution does not always give the best estimates. A method for finding estimates possessing the required (best) properties is the maximum likelihood method.

The maximum likelihood method is based on finding the extremum of a certain function, called the likelihood function.

The likelihood function of a discrete random variable X is the function of the argument θ

L(x_1, x_2, …, x_n; θ) = p(x_1; θ) p(x_2; θ) … p(x_n; θ),

where x_1, …, x_n are the fixed sample values, θ is the unknown parameter to be estimated, and p(x_i; θ) is the probability of the event X = x_i.

The likelihood function of a continuous random variable X is the function of the argument θ:

L(x_1, x_2, …, x_n; θ) = f(x_1; θ) f(x_2; θ) … f(x_n; θ),

where f(x_i; θ) is the given probability density function at the points x_i.

As a point estimate of the distribution parameter θ one takes the value at which the likelihood function reaches its maximum. This estimate θ* is called the maximum likelihood estimate. Since the functions L and ln L attain their maximum at the same values of θ, one usually uses ln L to find the extremum (maximum), as it is more convenient.

To determine the maximum point of ln L, one uses the standard algorithm for finding the extremum of a function:

d ln L / dθ = 0.

In the case when the probability density depends on two unknown parameters θ_1 and θ_2, the critical points are found by solving the system of equations:

∂ ln L / ∂θ_1 = 0,   ∂ ln L / ∂θ_2 = 0.

So, according to the maximum likelihood method, the value θ* at which the likelihood of the sample x_1, x_2, …, x_n is maximal is taken as the estimate of the unknown parameter θ.

Task 8. Let us find the maximum likelihood estimate of the probability p in a Bernoulli scheme.

We carry out n independent repeated trials and count the number of successes, which we denote by m. By Bernoulli's formula, the probability of obtaining m successes in n trials, P_n(m) = C(n, m) p^m (1 - p)^(n - m), is the likelihood function of this discrete random variable.

Solution: we set up the likelihood function

L = C(n, m) p^m (1 - p)^(n - m).

According to the maximum likelihood method, we look for the value of p that maximizes L, and with it ln L.

Taking the logarithm of L, we have:

ln L = ln C(n, m) + m ln p + (n - m) ln(1 - p).

The derivative of ln L with respect to p has the form

d ln L / dp = m/p - (n - m)/(1 - p),

and at the extremum point it equals zero. Solving the equation m/p - (n - m)/(1 - p) = 0, we obtain

p* = m/n.

Let us check the sign of the second derivative at the point obtained:

d² ln L / dp² = -m/p² - (n - m)/(1 - p)².

Since it is negative for any value of the argument, the value p* found is a maximum point.

Hence the best estimate is p* = m/n.

So, according to the maximum likelihood method, the relative frequency m/n of the event A serves as the estimate of the probability p in a Bernoulli scheme.
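A minimal numerical illustration (Python with NumPy and SciPy assumed) that the frequency m/n indeed maximizes the log-likelihood of Task 8:

```python
import numpy as np
from scipy import special

def log_likelihood(p, n, m):
    """ln L = ln C(n, m) + m ln p + (n - m) ln(1 - p) for the Bernoulli scheme."""
    log_binom = special.gammaln(n + 1) - special.gammaln(m + 1) - special.gammaln(n - m + 1)
    return log_binom + m * np.log(p) + (n - m) * np.log(1 - p)

n, m = 10, 8
p_grid = np.linspace(0.01, 0.99, 9801)                    # fine grid of candidate p values
p_best = p_grid[np.argmax(log_likelihood(p_grid, n, m))]
print(p_best)                                             # approximately 0.8 = m / n
```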

If the sample x_1, x_2, …, x_n is drawn from a normally distributed population, then the maximum likelihood estimates of the mathematical expectation and the variance have the form

m* = x̄ = (1/n) Σ x_i,   (σ²)* = (1/n) Σ (x_i - x̄)².

The values found coincide with the estimates of these parameters obtained by the method of moments. Since the variance estimate is biased, it is multiplied by the Bessel correction n/(n - 1); it then takes the form

s² = (1/(n - 1)) Σ (x_i - x̄)²,

coinciding with the corrected sample variance.

Task 9. Let a Poisson distribution be given:

P(X = m) = λ^m e^(-λ) / m!,

where at m = x_i we have p(x_i; λ) = λ^(x_i) e^(-λ) / x_i!. Let us find the maximum likelihood estimate of the unknown parameter λ.

Solution:

We construct the likelihood function L and its logarithm ln L. We have:

L = λ^(x_1 + … + x_n) e^(-nλ) / (x_1! … x_n!),

ln L = (x_1 + … + x_n) ln λ - nλ - ln(x_1! … x_n!).

We find the derivative of ln L:

d ln L / dλ = (x_1 + … + x_n)/λ - n

and solve the equation (x_1 + … + x_n)/λ - n = 0. The resulting estimate of the distribution parameter takes the form

λ* = (x_1 + … + x_n)/n = x̄.

Since the second derivative d² ln L / dλ² = -(x_1 + … + x_n)/λ² is negative at this point, it is a maximum point. Thus, the sample mean can be taken as the maximum likelihood estimate of the parameter λ for the Poisson distribution.

It can be verified that for the exponential distribution with density f(x; λ) = λ e^(-λx), x ≥ 0, the likelihood function for the sample values x_1, x_2, …, x_n has the form

L = λ^n e^(-λ(x_1 + … + x_n)).

The estimate of the parameter λ of the exponential distribution is

λ* = 1 / x̄ = n / (x_1 + … + x_n).

The advantage of the maximum likelihood method is that, under quite general conditions, it yields "good" estimates possessing such properties as consistency, asymptotic normality and efficiency for large samples.

The main disadvantage of the method is the difficulty of solving the likelihood equations, as well as the fact that the distribution law being analyzed is not always known.

Maximum likelihood method.

This method consists in taking, as the point estimate of the parameter, the value of the parameter at which the likelihood function reaches its maximum.

For a random time to failure τ with probability density f(t; α), the likelihood function is determined by formula (12.11):

L = f(t_1; α) f(t_2; α) … f(t_N; α),   (12.11)

i.e. it is the joint probability density of independent measurements of the random variable τ with probability density f(t; α).

If the random variable is discrete and takes the values z_1, z_2, … with probabilities P_1(α), P_2(α), … respectively, then the likelihood function is taken in a different form, namely as the product of those probabilities whose indices indicate which values were actually observed.

Maximum likelihood estimates of the parameter are determined from the likelihood equation

d ln L / dα = 0.   (12.12)

The value of the maximum likelihood method rests on the following two propositions:

If an efficient estimate of the parameter exists, then the likelihood equation (12.12) has a unique solution.

Under certain general conditions of an analytical nature imposed on the function f(t; α), the solution of the likelihood equation converges to the true value of the parameter as the number of observations grows.

Let's consider an example of using the maximum likelihood method for normal distribution parameters.

Example:

We have a sample t_i (i = 1, …, N) from a population with a normal distribution density; the mathematical expectation m and the variance σ² are to be estimated.

We need to find the maximum likelihood estimates.

Likelihood function:

L = ∏ (1/(σ√(2π))) exp(-(t_i - m)²/(2σ²));

ln L = -N ln σ - (N/2) ln(2π) - Σ (t_i - m)²/(2σ²).

Likelihood equations:

∂ ln L / ∂m = Σ (t_i - m)/σ² = 0;

∂ ln L / ∂σ = -N/σ + Σ (t_i - m)²/σ³ = 0.

The solution of these equations has the form: m* = (1/N) Σ t_i, the statistical mean; σ²* = (1/N) Σ (t_i - m*)², the statistical variance. The variance estimate is biased; an unbiased estimate is s² = (1/(N - 1)) Σ (t_i - m*)².

The main disadvantage of the maximum likelihood method is the computational difficulty that arises when solving the likelihood equations, which, as a rule, are transcendental.

Method of moments.

This method was proposed by K. Pearson and is the very first general method of point estimation of unknown parameters. It is still widely used in practical statistics, since it often leads to a relatively simple computational procedure. The idea of the method is that the moments of the distribution, which depend on the unknown parameters, are equated to the empirical moments. Taking the number of moments equal to the number of unknown parameters and composing the corresponding equations, we obtain the required number of equations. The first two statistical moments are calculated most often: the sample mean t̄ = (1/N) Σ t_i and the sample variance s² = (1/N) Σ (t_i - t̄)². Estimates obtained by the method of moments are not the best in terms of efficiency; however, they are very often used as first approximations.

Let's look at an example of using the method of moments.

Example: Consider the exponential distribution

f(t; λ) = λ e^(-λt),   t > 0, λ > 0;

t_i (i = 1, …, N) is a sample from a population with this distribution density. We need to find an estimate of the parameter λ.

We set up the equation: the theoretical mean 1/λ is equated to the sample mean, 1/λ = t̄ = (1/N) Σ t_i. Thus λ* = 1/t̄.

Quantile method.

This is an empirical method of the same kind as the method of moments. It consists in equating the quantiles of the theoretical distribution to the empirical quantiles. If several parameters are to be estimated, the corresponding equalities are written for several quantiles.

Let us consider the case where the distribution law F(t; α, β) has two unknown parameters α, β. Suppose the function F(t; α, β) has a continuously differentiable density that takes positive values for any admissible parameter values α, β. If the tests are carried out according to a plan in which r >> 1, then the moment of occurrence of the i-th failure can be regarded as an empirical quantile of the corresponding level, i = 1, 2, …, determined by the empirical distribution function. If t_l and t_r, the moments of occurrence of the l-th and r-th failures, are known exactly, then the values of the parameters α and β could be found from the equations F(t_l; α, β) = q_l and F(t_r; α, β) = q_r, where q_l and q_r are the corresponding empirical quantile levels.
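As a sketch of the computational side of the quantile method (Python with NumPy and SciPy assumed; the two-parameter Weibull law, the failure times and the quantile levels below are chosen purely for illustration and are not taken from the text): the theoretical distribution function at two observed failure times is equated to two empirical quantile levels and the resulting system is solved numerically.

```python
import numpy as np
from scipy import optimize

def weibull_cdf(t, alpha, beta):
    """Two-parameter Weibull distribution function F(t; alpha, beta) = 1 - exp(-(t/alpha)**beta)."""
    return 1.0 - np.exp(-(t / alpha) ** beta)

def quantile_equations(params, t_l, t_r, q_l, q_r):
    """Residuals of the quantile-method system F(t_l) = q_l, F(t_r) = q_r."""
    alpha, beta = params
    return [weibull_cdf(t_l, alpha, beta) - q_l,
            weibull_cdf(t_r, alpha, beta) - q_r]

# hypothetical failure times and empirical quantile levels, for illustration only
t_l, t_r = 120.0, 480.0
q_l, q_r = 0.2, 0.8
alpha0, beta0 = (t_l + t_r) / 2, 1.0                       # rough starting point
alpha, beta = optimize.fsolve(quantile_equations, [alpha0, beta0],
                              args=(t_l, t_r, q_l, q_r))
print(alpha, beta)     # parameter values matching the two chosen quantiles
```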


