The maximum likelihood method. Methods for obtaining estimates

Let ξ be a continuous random variable with density f(x; β₁, …, β_k). The form of the density is known, but the values of the parameters β₁, …, β_k are unknown. The likelihood function is the function

$L(x_1, \dots, x_n; \beta_1, \dots, \beta_k) = \prod_{i=1}^{n} f(x_i; \beta_1, \dots, \beta_k)$

(here x₁, …, x_n is a sample of size n from the distribution of the random variable ξ). It is easy to see that the likelihood function can be given a probabilistic meaning, namely: consider a random vector whose components are independent, identically distributed random variables with the same law as ξ. Then the probability element of this vector has the form

$dP = f(x_1; \beta_1, \dots, \beta_k) \cdots f(x_n; \beta_1, \dots, \beta_k)\, dx_1 \cdots dx_n = L\, dx_1 \cdots dx_n,$

i.e. the likelihood function is associated with the probability of obtaining the fixed sample in a sequence of n experiments. The main idea of the maximum likelihood method is that, as estimates of the parameters β₁, …, β_k, one takes the values that deliver the maximum of the likelihood function for the given fixed sample; that is, the sample obtained in the experiment is regarded as the most probable one. Finding the estimates of the parameters β_j reduces to solving a system of k equations (k is the number of unknown parameters):

$\frac{\partial L}{\partial \beta_j} = 0, \qquad j = 1, \dots, k. \tag{19}$

Since log L has its maximum at the same point as the likelihood function, the system of likelihood equations (19) is often written in the form

$\frac{\partial \log L}{\partial \beta_j} = 0, \qquad j = 1, \dots, k. \tag{20}$

As estimates of the unknown parameters one should take those solutions of system (19) or (20) that actually depend on the sample and are not constants.

In the case where ξ is discrete with a distribution series p(x; β₁, …, β_k), the likelihood function is the function

$L(x_1, \dots, x_n; \beta_1, \dots, \beta_k) = \prod_{i=1}^{n} p(x_i; \beta_1, \dots, \beta_k), \tag{21}$

and the estimates are again sought as solutions of system (19) or of its equivalent (20).

It can be shown that maximum likelihood estimates have the property of consistency. It should be noted that the maximum likelihood method leads to more complicated calculations than the method of moments, but theoretically it is more efficient: maximum likelihood estimates deviate less from the true values of the estimated parameters than estimates obtained by the method of moments. For the distributions most frequently encountered in applications, the parameter estimates obtained by the method of moments and by the maximum likelihood method coincide in most cases.

Example 1. The deviation ξ of a part's size from the nominal value is a normally distributed random variable. It is required to determine the systematic error and the variance of the deviation from a sample.

By hypothesis, ξ is a normally distributed random variable with mathematical expectation a (the systematic error) and variance σ², both to be estimated from a sample x₁, …, x_n of size n. In this case the likelihood function is

$L = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - a)^2\right).$

System (20) then gives, excluding the solutions that do not depend on the x_i,

$\hat a = \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2,$

i.e. the maximum likelihood estimates in this case coincide with the empirical mean and variance already known to us.

Example 2. Estimate the parameter μ from a sample of an exponentially distributed random variable. The likelihood function has the form

$L = \mu^n \exp\!\left(-\mu \sum_{i=1}^{n} x_i\right).$

The likelihood equation $\frac{n}{\mu} - \sum_{i=1}^{n} x_i = 0$ leads to the solution $\hat\mu = n / \sum_{i=1}^{n} x_i = 1/\bar x$, which coincides with the estimate of the same parameter obtained by the method of moments, see (17).

Example 3. Using the maximum likelihood method, estimate the probability of heads if, in ten tosses of a coin, heads appeared 8 times.

Let the probability to be estimated be p. Consider a random variable ξ with the distribution series P(ξ = 1) = p, P(ξ = 0) = 1 − p. The likelihood function (21) has the form

$L = p^8 (1-p)^2.$

The equation $\frac{\partial \log L}{\partial p} = \frac{8}{p} - \frac{2}{1-p} = 0$ gives, as the estimate of the unknown probability p, the frequency of heads in the experiment: p̂ = 0.8.
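Example 3 is easy to verify numerically. The following sketch (an illustration added here, not part of the original text) evaluates the log-likelihood log L(p) = 8 log p + 2 log(1 − p) on a grid and confirms that it peaks at the observed frequency p = 0.8:

```python
import numpy as np

# Data from Example 3: 8 heads in 10 coin tosses.
heads, n = 8, 10

# Log-likelihood of the sample: ln L(p) = 8 ln p + 2 ln(1 - p).
p = np.linspace(0.001, 0.999, 9999)
log_lik = heads * np.log(p) + (n - heads) * np.log(1.0 - p)

p_hat = p[np.argmax(log_lik)]
print(f"grid maximum at p = {p_hat:.3f}")   # ~0.800
print(f"analytic MLE     p = {heads / n}")  # 0.8, the observed frequency
```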
Concluding the discussion of methods for finding estimates, we emphasize that even with a very large amount of experimental data we still cannot state the exact value of the estimated parameter; moreover, as has been noted repeatedly, the estimates we obtain are close to the true values of the estimated parameters only "on average" or "in most cases." Therefore an important statistical problem, which we consider next, is the problem of determining the accuracy and reliability of the estimates we construct.

Maximum likelihood method.

This method consists in taking, as a point estimate of the parameter, the value at which the likelihood function attains its maximum.

For a random time to failure τ with probability density f(t, α), the likelihood function is determined by formula (12.11):

$L(t_1, \dots, t_N; \alpha) = \prod_{i=1}^{N} f(t_i, \alpha),$

i.e. it is the joint probability density of N independent observations of the random variable τ with probability density f(t, α).

If the random variable is discrete and takes the values Z₁, Z₂, … with the probabilities P₁(α), P₂(α), …, respectively, then the likelihood function is taken in a different form, namely

$L(\alpha) = P_{i_1}(\alpha)\, P_{i_2}(\alpha) \cdots P_{i_N}(\alpha),$

where the indices of the probabilities indicate which of the values were actually observed.

Maximum likelihood estimates of the parameter α are determined from the likelihood equation (12.12):

$\frac{\partial \ln L(\alpha)}{\partial \alpha} = 0.$

The value of the maximum likelihood method rests on the following two assertions:

If an efficient estimate of the parameter exists, then the likelihood equation (12.12) has a unique solution.

Under some general conditions of an analytical nature imposed on the function f(t, α), the solution of the likelihood equation converges to the true value of the parameter as the number of observations grows.

Let's consider an example of using the maximum likelihood method for normal distribution parameters.

Example:

We have a sample t_i (i = 1, …, N) from a population with distribution density

$f(t; m, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(t-m)^2}{2\sigma^2}\right),$

with the parameters m and σ² unknown.

We need to find the maximum likelihood estimates of these parameters.

Likelihood function:

$L = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(t_i - m)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-N/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(t_i - m)^2\right);$

$\ln L = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(t_i - m)^2.$

Likelihood equations:

$\frac{\partial \ln L}{\partial m} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(t_i - m) = 0;$

$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}(t_i - m)^2 = 0.$

The solution of these equations has the form: $\hat m = \frac{1}{N}\sum_{i=1}^{N} t_i$, the statistical average, and $\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(t_i - \hat m)^2$, the statistical variance. The variance estimate is biased; an unbiased estimate is $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(t_i - \hat m)^2$.
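A quick numerical check of these formulas (a sketch added for illustration; the sample below is simulated, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(loc=2.0, scale=1.5, size=1000)  # simulated sample, true m = 2.0, sigma = 1.5
N = t.size

m_hat = t.sum() / N                                 # MLE: statistical average
var_mle = ((t - m_hat) ** 2).sum() / N              # MLE: biased, divisor N
var_unbiased = ((t - m_hat) ** 2).sum() / (N - 1)   # unbiased, divisor N - 1

print(m_hat, var_mle, var_unbiased)  # close to 2.0 and 2.25
```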

The main disadvantage of the maximum likelihood method is the computational difficulty of solving the likelihood equations, which are, as a rule, transcendental.

Method of moments.

This method was proposed by K. Pearson and is the very first general method for the point estimation of unknown parameters. It is still widely used in practical statistics, since it often leads to a relatively simple computational procedure. The idea of the method is that the moments of the distribution, which depend on the unknown parameters, are equated to the empirical moments. Taking as many moments as there are unknown parameters and composing the corresponding equations, we obtain the required number of equations. Most often the first two statistical moments are calculated: the sample mean $\bar t = \frac{1}{N}\sum_{i=1}^{N} t_i$ and the sample variance $S^2 = \frac{1}{N}\sum_{i=1}^{N}(t_i - \bar t)^2$. Estimates obtained using the method of moments are not the best in terms of efficiency; however, they are very often used as first approximations.

Let's look at an example of using the method of moments.

Example: Consider the exponential distribution:

$f(t) = \lambda e^{-\lambda t}$, t > 0, λ > 0; t_i (i = 1, …, N) is a sample from a population with this distribution density. We need to find an estimate of the parameter λ.

Let us set up the moment equation. The first theoretical moment of the exponential distribution is $M[t] = 1/\lambda$, and the corresponding empirical moment is the sample mean $\bar t$. Equating them, $\bar t = 1/\lambda$. Thus $\hat\lambda = 1/\bar t = N / \sum_{i=1}^{N} t_i$.
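A sketch of this computation (the sample is simulated for illustration; note that for the exponential law the method-of-moments estimate coincides with the maximum likelihood estimate, as mentioned earlier):

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true = 0.5
t = rng.exponential(scale=1.0 / lam_true, size=2000)  # mean of Exp(lambda) is 1/lambda

# Method of moments: equate the first theoretical moment 1/lambda to the sample mean.
lam_hat = 1.0 / t.mean()   # equivalently N / t.sum()
print(lam_hat)             # close to 0.5
```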

Quantile method.

This is an empirical method of the same kind as the method of moments. It consists in equating the quantiles of the theoretical distribution to the empirical quantiles. If several parameters are to be estimated, the corresponding equalities are written for several quantiles.

Let us consider the case of a distribution law F(t, α, β) with two unknown parameters α, β. Let the function F(t, α, β) have a continuously differentiable density that takes positive values for all admissible values of the parameters α, β. If tests are carried out according to the adopted plan with r ≫ 1 failures, then the moment of occurrence of the i-th failure can be considered as an empirical quantile of level i/N, i = 1, 2, …, of the empirical distribution function. If t_l and t_r, the moments of occurrence of the l-th and r-th failures, were known exactly, the values of the parameters α and β could be found from the equations

$F(t_l; \alpha, \beta) = \frac{l}{N}, \qquad F(t_r; \alpha, \beta) = \frac{r}{N}.$
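As a concrete illustration (the two-parameter Weibull law and all numbers below are my own choices, not taken from the text), the following sketch equates two theoretical quantiles to empirical ones and solves the resulting system numerically:

```python
import numpy as np
from scipy.optimize import fsolve

rng = np.random.default_rng(2)
alpha_true, beta_true = 1.8, 10.0
t = np.sort(rng.weibull(alpha_true, size=200) * beta_true)  # simulated failure times

# Empirical quantiles: the l-th and r-th order statistics estimate levels l/N and r/N.
N, l, r = t.size, 50, 150
t_l, t_r, q_l, q_r = t[l - 1], t[r - 1], l / N, r / N

def weibull_cdf(x, a, b):
    """Two-parameter Weibull distribution function F(t; a, b)."""
    return 1.0 - np.exp(-((x / b) ** a))

def equations(params):
    a, b = params
    return [weibull_cdf(t_l, a, b) - q_l,
            weibull_cdf(t_r, a, b) - q_r]

a_hat, b_hat = fsolve(equations, x0=[1.0, t.mean()])
print(a_hat, b_hat)  # roughly 1.8 and 10.0
```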


Maximum likelihood estimation is a popular statistical method used to fit a statistical model to data and to provide estimates of the model's parameters.

It corresponds to many well-known estimation methods in statistics. For example, suppose you are interested in the heights of people in Ukraine, and you have height data for a number of people rather than for the entire population. If height is assumed to be a normally distributed variable with unknown variance and mean, then the mean and variance of the sample are the most likely values of the mean and variance of the entire population.

For a fixed data set and an underlying probability model, the maximum likelihood method yields the values of the model parameters that make the observed data most probable. Maximum likelihood estimation provides a unique and simple way of determining solutions in the case of a normal distribution.

The maximum likelihood estimation method is applied to a wide range of statistical models, including:

  • linear models and generalized linear models;
  • factor analysis;
  • structural equation modeling;
  • many situations, as part of hypothesis testing and confidence interval formation;
  • discrete choice models.

Essence of the method

Let $x_1, \dots, x_n$ be a sample from a distribution with density (or probability mass function) $f(x \mid \theta)$ depending on an unknown parameter $\theta$. The likelihood function is the joint density evaluated at the fixed sample and regarded as a function of $\theta$:

$L(\theta) = f(x_1, \dots, x_n \mid \theta),$

and the value

$\hat\theta = \arg\max_{\theta} L(\theta)$

is called the maximum likelihood estimate of the parameter. Thus, a maximum likelihood estimator is an estimator that maximizes the likelihood function for a fixed sample realization.

Often the log-likelihood function $\ln L(\theta)$ is used instead of the likelihood function. Since the logarithm increases monotonically over its entire domain, the maximum of the one function is attained at the same point as the maximum of the other, and vice versa. Thus

$\hat\theta = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \ln L(\theta).$

If the likelihood function is differentiable, then a necessary condition for an extremum is that its gradient be equal to zero:

$\frac{\partial \ln L(\theta)}{\partial \theta} = 0.$

A sufficient condition for a maximum can be formulated as negative definiteness of the Hessian, the matrix of second derivatives:

$H = \frac{\partial^2 \ln L(\theta)}{\partial \theta \, \partial \theta'} \prec 0.$

To evaluate the properties of maximum likelihood estimates, the so-called information matrix is used, equal by definition to

$I(\theta) = \mathbb{E}\!\left[ \frac{\partial \ln L}{\partial \theta}\, \frac{\partial \ln L}{\partial \theta'} \right].$

At the optimal point, the information matrix coincides with the mathematical expectation of the Hessian, taken with a minus sign:

$I(\theta) = -\mathbb{E}[H].$
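A small numerical illustration of the identity $I(\theta) = -\mathbb{E}[H]$ (an example of my own construction, using the Bernoulli model, for which the closed form is $I(p) = 1/(p(1-p))$ per observation):

```python
import numpy as np

p = 0.3
rng = np.random.default_rng(3)
x = rng.binomial(1, p, size=200_000)  # Bernoulli sample

# Second derivative of ln f(x|p) = x ln p + (1-x) ln(1-p) with respect to p:
hessian = -x / p**2 - (1 - x) / (1 - p) ** 2

info_mc = -hessian.mean()          # Monte Carlo estimate of -E[H]
info_exact = 1.0 / (p * (1 - p))   # closed-form Fisher information
print(info_mc, info_exact)         # both ~4.76
```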

Properties

  • Maximum likelihood estimates may, generally speaking, be biased (see the examples), but they are consistent, asymptotically efficient and asymptotically normal. Asymptotic normality means that

$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N\!\left(0,\, I^{-1}(\theta)\right),$

where $I(\theta)$ is the asymptotic information matrix.

Asymptotic efficiency means that the asymptotic covariance matrix $I^{-1}(\theta)$ is a lower bound among all consistent asymptotically normal estimators.

Examples

Consider a sample $x_1, \dots, x_n$ from a normal distribution $N(\mu, \sigma^2)$ with both parameters unknown. The log-likelihood is

$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$

To find its maximum, we equate the partial derivatives to zero:

$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0, \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0,$

from which it can be seen that the likelihood function reaches its maximum at the point

$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu)^2,$

where $\hat\mu$ is the sample mean and $\hat\sigma^2$ is the sample variance.

Conditional maximum likelihood method

The conditional maximum likelihood method (conditional ML) is used in regression models. The essence of the method is that not the complete joint distribution of all variables (dependent variable and regressors) is used, but only the conditional distribution of the dependent variable given the factors, that is, in fact, the distribution of the random errors of the regression model. The full likelihood function is the product of the "conditional likelihood function" and the density of the distribution of the factors. Conditional ML is equivalent to the full version of ML in the case when the distribution of the factors does not depend on the estimated parameters. This condition is often violated in time series models, such as the autoregressive model, where the regressors are past values of the dependent variable, which means their values also obey the same AR model; that is, the distribution of the regressors depends on the estimated parameters. In such cases the results of applying the conditional and the full maximum likelihood method will differ.
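To make the distinction concrete, here is a sketch (with simulated data; the Gaussian AR(1) specification $y_t = \rho y_{t-1} + \varepsilon_t$ is my own illustration). Conditioning on the first observation, maximizing the conditional likelihood reduces to least squares of $y_t$ on $y_{t-1}$; the full likelihood would additionally include the stationary density of $y_1$, which depends on $\rho$:

```python
import numpy as np

rng = np.random.default_rng(4)
rho_true, n = 0.7, 500

# Simulate a stationary Gaussian AR(1): y_t = rho * y_{t-1} + eps_t.
y = np.empty(n)
y[0] = rng.normal(scale=1.0 / np.sqrt(1 - rho_true**2))  # draw from the stationary law
for t in range(1, n):
    y[t] = rho_true * y[t - 1] + rng.normal()

# Conditional ML: maximize the likelihood of y_2..y_n given y_1.
# For Gaussian errors this is exactly OLS of y_t on y_{t-1}.
rho_cml = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(rho_cml)  # close to 0.7; full ML would add the density of y_1
```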




The renowned taxonomist Joe Felsenstein (1978) was the first to propose that phylogenetic theories should be evaluated not by parsimony-based reasoning but by means of mathematical statistics. The maximum likelihood method for phylogenetics was developed as a result.

This method is based on prior knowledge of the possible pathways of evolution; that is, it requires a model of character change to be built before the analysis. The laws of statistics are used to construct these models.

Likelihood here means the probability of observing the data given a certain model of events. Different models can make the observed data more or less probable. For example, if you toss a coin a hundred times and get heads only once, you may assume that the coin is defective. If you accept this model, the likelihood of the result obtained will be quite high. If, instead, you keep the model that the coin is fair, you would expect to see heads in about fifty cases rather than one: getting only one head in 100 tosses of a fair coin is statistically improbable. In other words, the probability of getting one head in one hundred tosses is very low under the fair-coin model.

Likelihood is a mathematical quantity. It is usually calculated using the formula

$L = \Pr(D \mid H),$

where $\Pr(D \mid H)$ is the probability of obtaining the data D given that hypothesis H is accepted. The vertical bar in the formula reads "given". Since L often turns out to be a small value, studies usually use the natural logarithm of the likelihood.
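The coin example above can be made numerical. The sketch below (illustrative; the head probabilities of the two hypotheses are my choices) computes $\ln L = \ln \Pr(D \mid H)$ for the data "one head in 100 tosses" under a fair-coin hypothesis and under a defective-coin hypothesis:

```python
from scipy.stats import binom

heads, n = 1, 100

log_lik_fair = binom.logpmf(heads, n, 0.5)        # fair coin: Pr(heads) = 0.5
log_lik_defective = binom.logpmf(heads, n, 0.01)  # defective coin: Pr(heads) = 0.01

print(log_lik_fair)       # about -64.7: data nearly impossible under this model
print(log_lik_defective)  # about -1.0: data quite plausible under this model
```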

It is important to distinguish between the probability of obtaining the observed data and the probability that the accepted model of events is correct. The likelihood of the data says nothing about the probability of the model itself. The philosopher of biology E. Sober used the following example to make this distinction clear. Imagine that you hear a loud noise in the room above you. You might hypothesize that it is caused by gnomes bowling in the attic. For this model, your observation (a loud noise above you) has high likelihood: if gnomes really were bowling above you, you would almost certainly hear it. However, the probability that your hypothesis is true, that is, that it was gnomes who made the noise, is something else entirely; it was almost certainly not gnomes. So in this case your hypothesis gives the data high likelihood but is itself highly improbable.

Using this system of reasoning, the maximum likelihood method makes it possible to statistically evaluate phylogenetic trees obtained by traditional cladistics. Essentially, the method searches for the cladogram that provides the highest probability of the available data set.

Let us consider an example illustrating the use of the maximum likelihood method. Suppose we have four taxa for which the nucleotide sequences of a certain DNA region have been determined (Fig. 16).

If the model assumes the possibility of reversions, then we can root this tree at any node. One of the possible rooted trees is shown in Fig. 17.2.

We do not know which nucleotides were present at the locus in question in the common ancestors of taxa 1-4 (these ancestors correspond to nodes X and Y on the cladogram). For each of these nodes there are four possible nucleotides that could have been present in the ancestral forms, giving 16 phylogenetic scenarios leading to tree 2. One of these scenarios is depicted in Fig. 17.3.

The probability of this scenario can be determined by the formula:

$P = P_A \cdot P_{AG} \cdot P_{AC} \cdot P_{AT} \cdot P_{TT} \cdot P_{TT},$

where P_A is the probability of the presence of nucleotide A at the root of the tree, equal to the average frequency of nucleotide A (in the general case = 0.25); P_AG is the probability of replacing A with G; P_AC is the probability of replacing A with C; P_AT is the probability of replacing A with T; the last two factors are the probabilities of nucleotide T being conserved at nodes X and Y, respectively.

Another possible scenario that yields the same data is shown in Fig. 17.4. Since there are 16 such scenarios, the probability of each of them can be determined, and the sum of these probabilities is the probability of the tree shown in Fig. 17.2:

$P_{\text{tree 2}} = \sum_{j=1}^{16} P_{\text{scenario } j},$

where P_tree 2 is the probability of observing the data at the locus marked with an asterisk, given tree 2.

The probability of observing all the data at all loci of a given sequence is the product of the probabilities for each locus i from 1 to N:

$P = \prod_{i=1}^{N} P_i.$

Since these values are very small, another indicator is used: the natural logarithm of the likelihood, ln L_i, for each locus i. The log-likelihood of the tree is then the sum of the log-likelihoods of the individual loci:

$\ln L_{\text{tree}} = \sum_{i=1}^{N} \ln L_i.$
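A sketch of the whole computation described above (my own illustration: it assumes an arbitrary 4-taxon tree shape with internal nodes X and Y, equal branch lengths, and the Jukes-Cantor substitution probabilities introduced below). For one site it enumerates all 16 assignments of nucleotides to X and Y, sums the scenario probabilities to get the site likelihood, and then sums ln L_i across sites:

```python
import numpy as np
from itertools import product

NUC = "ACGT"

def jc_prob(i, j, alpha, t):
    """Jukes-Cantor probability of going from nucleotide i to j over time t."""
    if i == j:
        return 0.25 + 0.75 * np.exp(-4 * alpha * t)
    return 0.25 - 0.25 * np.exp(-4 * alpha * t)

def site_likelihood(leaves, alpha=1.0, t=0.1):
    """Sum over all 16 ancestral assignments at internal nodes X and Y.

    Assumed (illustrative) tree shape: root X has children taxa 1, 2 and node Y;
    Y has children taxa 3, 4. All branches share the same length t.
    """
    x1, x2, x3, x4 = leaves
    total = 0.0
    for x, y in product(NUC, repeat=2):       # the 16 scenarios
        p = 0.25                              # stationary frequency at the root
        p *= jc_prob(x, x1, alpha, t) * jc_prob(x, x2, alpha, t)
        p *= jc_prob(x, y, alpha, t)
        p *= jc_prob(y, x3, alpha, t) * jc_prob(y, x4, alpha, t)
        total += p
    return total

# Toy alignment: one string per taxon; ln L_tree is the sum of per-site ln L_i.
alignment = ["ACGT", "ACGT", "ACGA", "ACGA"]
lnL_tree = sum(np.log(site_likelihood(site)) for site in zip(*alignment))
print(lnL_tree)
```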

The ln L_tree value is the logarithm of the likelihood of observing the data under the chosen evolutionary model and a tree with its characteristic branching order and branch lengths. Computer programs used in the maximum likelihood method (for example, the already mentioned cladistic package PAUP) search for the tree with the maximum ln L. Twice the difference between the log-likelihoods of two models, 2Δ (where Δ = ln L_tree A − ln L_tree B), follows the known statistical distribution χ². This allows one to evaluate whether one model is reliably better than the other, which makes maximum likelihood a powerful tool for testing hypotheses.
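A sketch of such a test (the log-likelihood values and the number of degrees of freedom are invented for illustration):

```python
from scipy.stats import chi2

lnL_A = -1234.6  # log-likelihood of the richer model (hypothetical value)
lnL_B = -1241.9  # log-likelihood of the simpler nested model (hypothetical value)
df = 1           # difference in the number of free parameters

stat = 2 * (lnL_A - lnL_B)   # 2*Delta = 14.6
p_value = chi2.sf(stat, df)
print(stat, p_value)         # small p: the richer model fits reliably better
```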

In the case of four taxa, ln L must be calculated for 15 trees. With a large number of taxa it becomes impossible to evaluate all trees, so heuristic search methods are used (see above).

In the example considered, we used the values of the probabilities of replacement (substitution) of nucleotides in the course of evolution. Calculating these probabilities is itself a statistical task. To reconstruct an evolutionary tree, we must make certain assumptions about the substitution process and express these assumptions in the form of a model.

In the simplest model, the probabilities of replacing any nucleotide by any other nucleotide are considered equal. This simple model has only one parameter, the substitution rate, and is known as the one-parameter Jukes-Cantor model, or JC (Jukes and Cantor, 1969). When using this model, we need to know the rate at which nucleotide substitution occurs. If we know that at the moment t = 0 a certain site contains the nucleotide G, then we can calculate the probability that after a period of time t the site still contains G, and the probability that the site has come to hold another nucleotide, for example A. These probabilities are denoted P(gg) and P(ga), respectively. If the rate of substitution to each particular alternative nucleotide is α per unit time, then

$P(gg) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}, \qquad P(ga) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}.$

Since, according to the one-parameter model, all substitutions are equally probable, the more general statement for any nucleotides i and j is

$P(ii) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}, \qquad P(ij) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t} \quad (i \neq j).$
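A direct transcription of these formulas (a sketch; the values of α and t are arbitrary). The checks confirm that the four probabilities out of any nucleotide sum to one, and that both expressions tend to 1/4, the stationary frequency, as t grows:

```python
import numpy as np

def jc_same(alpha, t):
    """P(gg): probability that the nucleotide is unchanged after time t."""
    return 0.25 + 0.75 * np.exp(-4 * alpha * t)

def jc_change(alpha, t):
    """P(ga): probability of change to one particular other nucleotide."""
    return 0.25 - 0.25 * np.exp(-4 * alpha * t)

alpha, t = 1.0, 0.5
print(jc_same(alpha, t) + 3 * jc_change(alpha, t))  # = 1.0 (rows sum to one)
print(jc_same(alpha, 100.0))                        # -> 0.25 for large t
```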

More complex evolutionary models have also been developed. Empirical observations indicate that some substitutions occur more often than others. Substitutions in which one purine is replaced by another purine, or one pyrimidine by another pyrimidine, are called transitions, while replacements of a purine by a pyrimidine, or of a pyrimidine by a purine, are called transversions. One might expect transversions to occur more often than transitions, since only one of the three possible substitutions for any nucleotide is a transition. However, the opposite usually occurs: transitions tend to occur more often than transversions. This is particularly true of mitochondrial DNA.

Another reason why some nucleotide substitutions occur more often than others is unequal base frequencies. For example, the mitochondrial DNA of insects is richer in adenine and thymine than that of vertebrates. If some bases are more common, we can expect some substitutions to occur more often than others. For example, if a sequence contains very little guanine, substitutions involving this nucleotide are unlikely to occur.

The models differ in which parameter or parameters (for example, the base composition, the substitution rates) are held fixed and which are allowed to vary. There are dozens of evolutionary models; the best known of them are presented below.

The already mentioned Jukes-Cantor (JC) model is characterized by equal base frequencies, π_A = π_C = π_G = π_T, equal rates of transversions and transitions, α = β, and all substitutions being equally probable.

The Kimura two-parameter (K2P) model assumes equal base frequencies, π_A = π_C = π_G = π_T, while transversions and transitions have different rates, α ≠ β.

The Felsenstein model (F81) assumes that the base frequencies differ, π_A ≠ π_C ≠ π_G ≠ π_T, while the substitution rates are the same, α = β.

The general reversible model (REV) assumes different base frequencies, π_A ≠ π_C ≠ π_G ≠ π_T, and different rates for all six pairs of substitutions.

The models mentioned above assume that substitution rates are the same at all sites. However, a model can also take into account differences in substitution rates between sites. The values of the base frequencies and substitution rates can either be assigned a priori, or these values can be obtained from the data using special programs, for example PAUP.

Bayesian analysis

The maximum likelihood method estimates the likelihood of phylogenetic models after they have been constructed from the available data. However, knowledge of the general patterns of evolution of a given group makes it possible to propose a series of the most probable models of phylogeny without recourse to the basic data (for example, nucleotide sequences). Once these data are obtained, it is possible to evaluate their fit to the pre-built models and to revise the probabilities of these initial models. The method that makes this possible is called Bayesian analysis; it is the newest of the methods for studying phylogeny (see the detailed review: Huelsenbeck et al., 2001).

In standard terminology, the initial probabilities are called prior probabilities (since they are adopted before the data are obtained), and the revised probabilities are called posterior probabilities (since they are computed after the data are obtained).

The mathematical basis of Bayesian analysis is Bayes' theorem, in which the prior probability of a tree, Pr[Tree], and the likelihood, Pr[Data | Tree], are used to calculate the posterior probability of the tree, Pr[Tree | Data]:

$\Pr[\text{Tree} \mid \text{Data}] = \frac{\Pr[\text{Data} \mid \text{Tree}] \cdot \Pr[\text{Tree}]}{\Pr[\text{Data}]}.$

The posterior probability of a tree can be thought of as the probability that the tree reflects the true course of evolution. The tree with the highest posterior probability is selected as the most probable model of phylogeny. The posterior probability distribution of trees is calculated using computer simulation methods.
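A minimal numerical illustration of Bayes' theorem applied to trees (all numbers invented): with equal priors over three candidate trees, the posterior probabilities are simply the normalized likelihoods.

```python
import numpy as np

# Hypothetical log-likelihoods Pr[Data | Tree] for three candidate trees.
lnL = np.array([-1234.6, -1236.1, -1241.9])
prior = np.array([1 / 3, 1 / 3, 1 / 3])   # equal prior probabilities Pr[Tree]

# Bayes' theorem: posterior ∝ likelihood × prior (subtract max for stability).
w = np.exp(lnL - lnL.max()) * prior
posterior = w / w.sum()                   # Pr[Tree | Data]
print(posterior)  # the first tree has the highest posterior probability
```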

Maximum likelihood and Bayesian analysis require evolutionary models that describe changes in traits. Creating mathematical models of morphological evolution is currently not possible. For this reason, statistical methods of phylogenetic analysis are applied only to molecular data.


