Establishment of the distribution function of reliability indicators from the results of processing statistical data. Application of the gamma distribution in the reliability theory of technical systems. The gamma distribution and its distribution function

4. Random variables and their distributions

Gamma distributions

Let us move on to the family of gamma distributions. They are widely used in economics and management, in the theory and practice of reliability and testing, in various fields of technology, in meteorology, and so on. In particular, in many situations quantities such as the total service life of a product, the length of a chain of conductive dust particles, the time for a product to reach the limiting state under corrosion, and the operating time to the k-th failure, k = 1, 2, …, follow a gamma distribution. The life expectancy of patients with chronic diseases and the time to achieve a certain effect during treatment also have a gamma distribution in some cases. This distribution is the most adequate for describing demand in economic and mathematical models of inventory management (logistics).

The gamma distribution density has the form

f(x) = (1 / (b Γ(a))) · ((x − c)/b)^(a−1) · exp(−(x − c)/b),  x ≥ c.     (17)

The probability density in formula (17) is determined by three parameters a, b, c, where a > 0 and b > 0. Here a is the shape parameter, b the scale parameter and c the shift parameter. The factor 1/Γ(a) is a normalizing constant; it is introduced so that the density integrates to one.

Here Γ(a) is one of the special functions used in mathematics, the so-called "gamma function", after which the distribution given by formula (17) is named:

Γ(a) = ∫₀^∞ x^(a−1) e^(−x) dx.

For fixed a, formula (17) specifies a scale-shift family of distributions generated by the distribution with density

f(x) = x^(a−1) e^(−x) / Γ(a),  x ≥ 0.     (18)

A distribution of the form (18) is called a standard gamma distribution. It is obtained from formula (17) at b = 1 and c = 0.

A special case of the gamma distribution with a = 1 is the exponential distribution (with λ = 1/b). For natural (integer) a and c = 0, gamma distributions are called Erlang distributions. The work of the Danish scientist K. A. Erlang (1878-1929), an employee of the Copenhagen Telephone Company who studied the functioning of telephone networks in 1908-1922, marked the beginning of queuing theory. This theory deals with probabilistic and statistical modeling of systems in which a flow of requests is serviced, with the aim of making optimal decisions. Erlang distributions are used in the same application areas as exponential distributions. This is based on the following mathematical fact: the sum of k independent random variables, exponentially distributed with the same parameter λ and shift c, has a gamma distribution with shape parameter a = k, scale parameter b = 1/λ and shift parameter kc. At c = 0 we obtain the Erlang distribution.

If the random variable X has a gamma distribution with a shape parameter a such that d = 2a is an integer, b = 1 and c = 0, then 2X has a chi-square distribution with d degrees of freedom.

A random variable X with a gamma distribution has the following characteristics: expected value M(X) = ab + c, variance D(X) = σ² = ab².
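These relations are easy to check numerically. The following sketch is an illustration added here; it uses Python with NumPy and SciPy (an assumption of this text, not a tool named in the source) and the parameterization of formula (17):

```python
import numpy as np
from scipy import stats

a, b, c = 3.0, 2.0, 1.0                 # shape, scale and shift, as in formula (17)
X = stats.gamma(a, loc=c, scale=b)

# Expected value M(X) = a*b + c and variance D(X) = a*b**2
print(X.mean(), a * b + c)              # 7.0  7.0
print(X.var(), a * b ** 2)              # 12.0 12.0

# For a = 1 the gamma distribution reduces to the exponential one with lambda = 1/b
print(np.allclose(stats.gamma(1.0, scale=b).pdf(2.5),
                  stats.expon(scale=b).pdf(2.5)))

# If X has a gamma distribution with b = 1, c = 0 and 2a an integer,
# then 2X has a chi-square distribution with 2a degrees of freedom
a2 = 2.5
x = np.linspace(0.1, 20.0, 5)
print(np.allclose(stats.gamma(a2).cdf(x / 2.0), stats.chi2(2.0 * a2).cdf(x)))
```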

Gamma distribution

The gamma distribution is a two-parameter distribution. It occupies a fairly important place in the theory and practice of reliability. The distribution density is bounded on one side (0 ≤ t < ∞). If the shape parameter α takes an integer value k, the gamma distribution describes the time until the occurrence of k events (for example, failures), provided that they are independent and occur with a constant intensity λ (see Fig. 4.4).

The gamma distribution is widely used to describe the occurrence of failures of aging elements, recovery time, and time between failures of redundant systems. For different parameters, the gamma distribution takes on various forms, which explains its widespread use.

The probability density of the gamma distribution is determined by the equality

f(t) = λ^α t^(α−1) e^(−λt) / Γ(α),  t ≥ 0,

where λ > 0, α > 0.

The distribution density curves are shown in Fig. 4.5.

Fig. 4.5.

Distribution function

F(t) = (1/Γ(α)) ∫₀^(λt) u^(α−1) e^(−u) du,  t ≥ 0.

The expectation and variance are equal, respectively, to M(X) = α/λ and D(X) = α/λ².

At α < 1 the failure rate decreases monotonically, which corresponds to the running-in (burn-in) period of the product; at α > 1 it increases, which is typical of the period of wear and aging of elements. At α = 1 the gamma distribution coincides with the exponential distribution; at α > 10 it approaches the normal law. If α takes arbitrary positive integer values, such a gamma distribution is called an Erlang distribution. If λ = 1/2 and α is a multiple of 1/2, then the gamma distribution coincides with the χ² (chi-square) distribution.
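The behavior of the failure rate described above can be illustrated, for example, with the following Python/SciPy sketch (the parameter values are chosen arbitrarily for illustration and are not taken from the text):

```python
import numpy as np
from scipy import stats

# Failure (hazard) rate of the gamma distribution: h(t) = f(t) / (1 - F(t))
def hazard(alpha, lam, t):
    d = stats.gamma(alpha, scale=1.0 / lam)   # rate lambda corresponds to scale 1/lambda
    return d.pdf(t) / d.sf(t)                 # sf(t) = 1 - F(t)

t = np.array([0.5, 1.0, 2.0, 4.0])
print(hazard(0.5, 1.0, t))   # alpha < 1: rate decreases (running-in period)
print(hazard(1.0, 1.0, t))   # alpha = 1: constant rate, the exponential case
print(hazard(3.0, 1.0, t))   # alpha > 1: rate increases (wear and aging)
```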

Establishment of the distribution function of reliability indicators based on the results of processing statistical information data

The most complete characteristic of the reliability of a complex system is its distribution law, expressed as the distribution function, the distribution density or the reliability function.

The form of the theoretical distribution function can be judged from the empirical distribution function (Fig. 4.6), which is determined from the relation

F̂(t) = n(t) / N,

where n(t) is the number of failures recorded up to time t; N is the number of items under test; t_i < t < t_{i+1} is the time interval over which the empirical function is determined.

Fig. 4.6.

The empirical function is constructed by summing the increments obtained in each time interval:

F̂(t_k) = (1/N) Σ_{i=1..k} Δn_i,

where Δn_i is the number of failures in the i-th interval and k is the number of intervals.

The empirical reliability function is the complement of the distribution function; it is determined by the formula

P̂(t) = 1 − F̂(t).

The probability density estimate is found from the histogram. The construction of a histogram reduces to the following. The entire range of time values t is divided into intervals t_1, t_2, …, t_k, and for each of them the probability density is estimated using the formula

f̂(t) = n_i / (N (t_{i+1} − t_i)),  t_i < t ≤ t_{i+1},

where n_i is the number of failures in the i-th interval, i = 1, 2, …, k; (t_{i+1} − t_i) is the length of the i-th interval; N is the number of items tested; k is the number of intervals.

An example of a histogram is shown in Fig. 4.7.

Fig. 4.7.

By smoothing the step histogram into a smooth curve, one can judge the distribution law of the random variable from its appearance. In practice, the least squares method, for example, is often used to smooth the curve. To establish the distribution law more precisely, the number of intervals should be at least five, and the number of realizations falling into each interval should be at least ten.
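For example, the empirical distribution function, the empirical reliability function and the histogram density estimate can be computed as in the following sketch; the failure times are hypothetical illustrative data, not taken from the text:

```python
import numpy as np

# Hypothetical failure times (hours) of N tested items - illustrative data only
t_fail = np.array([120.0, 340.0, 560.0, 610.0, 700.0,
                   820.0, 905.0, 980.0, 1100.0, 1250.0])
N = t_fail.size

# Empirical distribution function F(t) = n(t)/N and reliability function P(t) = 1 - F(t)
def F_emp(t):
    return np.searchsorted(np.sort(t_fail), t, side="right") / N

print(F_emp(600.0), 1.0 - F_emp(600.0))

# Histogram estimate of the density: f_i = n_i / (N * (t_{i+1} - t_i))
edges = np.linspace(0.0, 1400.0, 8)          # k = 7 intervals
counts, _ = np.histogram(t_fail, bins=edges)
f_hat = counts / (N * np.diff(edges))
print(f_hat)
```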

Discrepancies in the understanding of reliability terminology

The problem of terminology is quite complex in various fields of science and in human activity in general. It is known that disputes about terms have been going on for many centuries. Translations of poetry clearly confirm this idea. For example, the translations of such a world-famous masterpiece as "Hamlet" by B. L. Pasternak and P. P. Gnedich differ greatly: in the first of them the meaning of the tragedy outweighs the music of the verse, unlike the second. And the original "Hamlet", written in the language of the 16th century, is difficult for non-English readers to understand, and for the English as well, since the language itself has evolved greatly over several centuries, like any other language, in accordance with the law of synchronism-desynchronism.

A similar picture is observed in world religions. The translation of the Bible from Church Slavonic into Russian, which lasted 25 years, "divorced" (to the point of halting the translation) St. Philaret of Moscow (Drozdov) and the major church writer St. Theophan the Recluse (the publication of his collected works in 42 volumes is planned in the near future). Translations and clarifications of the Bible, the "book of books", drive people into camps of irreconcilable enemies; sects, heretics and heroes are born, and sometimes even blood is shed. The numerous translations into Russian of Immanuel Kant's fundamental philosophical work "Critique of Pure Reason" only reinforce our thesis about the complexity of the problem of terminology (a super-large system) in various fields of science and human activity in general.

Antinomic phenomena also take place in science and technology. One approach to ensuring the correctness and adequacy of terminology was outlined by G. Leibniz. Proceeding from the level of development of science and technology in the 17th century, he proposed to end disputes by defining terms using a universal language in digital form (0011…).

Note that in the science of reliability, the way to define terms is traditionally decided at the state level with the help of state standards (GOSTs). However, the emergence of increasingly highly intelligent technical systems, the interaction and rapprochement of living and inanimate objects operating in them, poses new, very difficult tasks for teaching in pedagogy and psychology, and forces us to look for creative compromise solutions.

For a mature employee who has worked in a specific scientific field, and in particular in the field of reliability, the relevance of terminology issues is beyond doubt. As Gottfried Wilhelm Leibniz wrote (in his work on the creation of a universal language), there would be less controversy if the terms were defined.

We will try to smooth out discrepancies in the understanding of reliability terminology with the following remarks.

We say "distribution function" (DF), omitting the words "of the operating time" or "of the time to failure". Operating time is most often understood as a time category. For non-repairable systems it is more correct to speak of the DF of the time to failure, and for repairable systems of the time between failures. Since operating time is most often understood as a random variable, the probability of failure-free operation (PFFO) is identified with 1 – DF, which in this case is called the reliability function (RF). The consistency of this approach is ensured by considering a complete group of events. Then

PFFO = RF = 1 – DF.

The same is true for the distribution density (DP), which is the first derivative of the DF, in particular with respect to time, and, figuratively speaking, characterizes the “rate” of the occurrence of failures.

The completeness of the description of the reliability of a product (in particular, for single-use products), including the dynamics of its behavior, is characterized by the failure rate, defined as the ratio of the distribution density to the reliability function. Physically it is understood as a change in the state of the product; mathematically it is introduced in queuing theory through the concept of a failure flow and a number of assumptions about the failures themselves (stationarity, ordinariness, etc.).

Those interested in these issues, which arise when choosing reliability indicators at the product design stage, can be referred to the works of such eminent authors as A. M. Polovko, B. V. Gnedenko, B. R. Levin, who came from the reliability laboratory at Moscow University headed by A. N. Kolmogorov, as well as A. Ya. Khinchin, E. S. Ventzel, I. A. Ushakov, G. V. Druzhinin, A. D. Solovyov, F. Beichelt and F. Proschan, the founders of the statistical theory of reliability.

  • See: Kolmogorov A. N. Basic Concepts of Probability Theory. Moscow: Mir, 1974.

This article describes the formula syntax and usage of the GAMMA.DIST function in Microsoft Excel.

Returns the gamma distribution. This function can be used to study variables that have a skewed distribution. The gamma distribution is widely used in the analysis of queuing systems.

Syntax

GAMMA.DIST(x, alpha, beta, cumulative)

The arguments to the GAMMA.DIST function are described below.

    x – required argument. The value at which you want to evaluate the distribution.

    Alpha – required argument. A parameter of the distribution.

    Beta – required argument. A parameter of the distribution. If beta = 1, GAMMA.DIST returns the standard gamma distribution.

    Cumulative – required argument. A logical value that determines the form of the function. If cumulative is TRUE, GAMMA.DIST returns the cumulative distribution function; if it is FALSE, the probability density function is returned.

Notes

Example

Copy the sample data from the following table and paste it into cell A1 of a new Excel worksheet. To display the results of formulas, select them and press F2, then press Enter. If necessary, change the width of the columns to see all the data.

Data	Description
A2	The value x at which you want to evaluate the distribution
A3	The alpha distribution parameter
A4	The beta distribution parameter

Formula	Description	Result
GAMMA.DIST(A2,A3,A4,FALSE)	Probability density, using the x, alpha and beta values in cells A2, A3 and A4, with the cumulative argument FALSE.
GAMMA.DIST(A2,A3,A4,TRUE)	Cumulative distribution function, using the x, alpha and beta values in cells A2, A3 and A4, with the cumulative argument TRUE.
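For readers working outside Excel, an approximately equivalent calculation can be written, for example, in Python with SciPy; the numeric values of x, alpha and beta below are placeholders, since the worksheet values themselves are not reproduced in the text:

```python
from scipy import stats

# Placeholder values standing in for cells A2 (x), A3 (alpha), A4 (beta);
# the original worksheet values are not reproduced in the text above
x, alpha, beta = 10.0, 9.0, 2.0

# GAMMA.DIST(x, alpha, beta, FALSE) corresponds to the probability density
print(stats.gamma.pdf(x, a=alpha, scale=beta))

# GAMMA.DIST(x, alpha, beta, TRUE) corresponds to the cumulative distribution function
print(stats.gamma.cdf(x, a=alpha, scale=beta))
```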

The simplest type of gamma distribution is a distribution with density

f(x) = x^(a−1) e^(−x) / Γ(a),  x > 0,     (1)

where a > 0 is the shape parameter and Γ(a) is the gamma function, i.e.

Γ(a) = ∫₀^∞ x^(a−1) e^(−x) dx.     (2)

Each distribution can be "expanded" into a scale-shift family. Indeed, for a random variable X₀ having distribution function F₀(x), consider the family of random variables X = bX₀ + c, where b > 0 is the scale parameter and c is the shift parameter. Then the distribution function of X is F(x) = F₀((x − c)/b).

Including each distribution with a density of the form (1) in a scale-shift family, we obtain the family of gamma distributions in its generally accepted parameterization:

f(x) = (1 / (b Γ(a))) · ((x − c)/b)^(a−1) · exp(−(x − c)/b),  x ≥ c.

Here a > 0 is the shape parameter, b > 0 the scale parameter and c the shift parameter; the gamma function Γ(a) is given by formula (2).

Other parameterizations are found in the literature. Thus, instead of the scale parameter b, the parameter λ = 1/b is often used. Sometimes a two-parameter family is considered, omitting the shift parameter c but retaining the scale parameter b or its analogue, the parameter λ. For some applied problems (for example, in studying the reliability of technical devices) this is justified, since from substantive considerations it seems natural to accept that the probability density is positive for positive values of the argument and only for them. This assumption is associated with a long discussion in the 1980s about "prescribed reliability indicators", on which we shall not dwell.

Special cases of the gamma distribution for certain parameter values have special names. For a = 1 we have the exponential distribution. For natural (integer) a the gamma distribution is an Erlang distribution, used in particular in queuing theory. If a random variable X has a gamma distribution with b = 1, c = 0 and a shape parameter a such that d = 2a is an integer, then 2X has a chi-square distribution with d degrees of freedom.

Applications of the gamma distribution

The gamma distribution has wide applications in various fields of the technical sciences (in particular, reliability and test theory), in meteorology, medicine and economics. In particular, quantities such as the total service life of a product, the length of a chain of conductive dust particles, the time for a product to reach the limiting state under corrosion, and the operating time to the k-th failure may follow a gamma distribution. The life expectancy of patients with chronic diseases and the time to achieve a certain effect during treatment also have a gamma distribution in some cases. This distribution has turned out to be the most adequate for describing demand in a number of economic and mathematical models of inventory management.

The possibility of using the gamma distribution in a number of applied problems can sometimes be justified by its reproducibility property: the sum of k independent random variables, exponentially distributed with the same parameter λ, has a gamma distribution with shape parameter a = k, scale parameter b = 1/λ and zero shift. Therefore, the gamma distribution is often used in those application areas that use the exponential distribution.
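This reproducibility property is easy to check by simulation; the following sketch (Python with NumPy/SciPy, an illustrative choice with arbitrary parameter values) compares the sum of k exponential variables with the corresponding gamma distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, lam, n = 4, 0.5, 100_000        # number of summands, rate, number of replications

# Sum of k independent Exp(lambda) variables in each replication
sums = rng.exponential(scale=1.0 / lam, size=(n, k)).sum(axis=1)

# Compare with Gamma(shape = k, scale = 1/lambda) via the Kolmogorov-Smirnov test
res = stats.kstest(sums, "gamma", args=(k, 0.0, 1.0 / lam))
print(res.statistic, res.pvalue)   # a small statistic and a large p-value are expected
```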

Hundreds of publications are devoted to various questions of statistical theory related to the gamma distribution (see summaries). This article, which does not claim to be comprehensive, examines only some mathematical and statistical problems associated with the development of a state standard.

2. DESCRIPTION OF UNCERTAINTIES IN DECISION-MAKING THEORY

2.3.4. Interval data in parameter estimation problems (using the example of the gamma distribution)

Let us consider a parametric estimation problem that is classical in applied mathematical statistics. The initial data are a sample x_1, x_2, …, x_n consisting of n real numbers. In the probabilistic model of simple random sampling its elements x_1, x_2, …, x_n are considered to be realizations of n independent identically distributed random variables. We will assume that these quantities have a density f(x). In parametric statistical theory it is assumed that the density f(x) is known up to a finite-dimensional parameter, i.e. it belongs to a given parametric family and corresponds to some value of that parameter. This, of course, is a very strong assumption that requires justification and verification; nevertheless, parametric estimation theory is widely used in various application areas.

All observational results are determined with some accuracy, in particular, they are recorded using a finite number of significant figures (usually 2 - 5). Consequently, all real distributions of observational results are discrete.

It is generally believed that these discrete distributions are fairly well approximated by continuous ones. Refining this statement, we arrive at the model already considered, according to which only the quantities

y_j = x_j + ε_j,  j = 1, 2, …, n,

are available to the statistician, where the x_j are the "true" values and the ε_j are the observation errors (including sampling errors). In the probabilistic model we assume that the n pairs (x_j, ε_j) form a simple random sample from some two-dimensional distribution, while x_1, x_2, …, x_n is a sample from a distribution with density f(x). It must be taken into account that x_j and ε_j are realizations of dependent random variables (if they were considered independent, the distribution of y_j would be the convolution of the distributions of x_j and ε_j).

Let a characteristic of the magnitude of the error be chosen, for example the mean square error. In classical mathematical statistics this error is considered negligible (tending to zero) for a fixed sample size n, and the general results are proved in the asymptotics n → ∞. Thus, in classical mathematical statistics the passage to the limit "error → 0" is carried out first, and then the passage to the limit n → ∞. In the statistics of interval data it is assumed that the sample size is quite large (n → ∞), but all measurements correspond to one and the same error characteristic; limit theorems useful for analyzing real data are obtained as the error tends to 0. In the statistics of interval data the passage to the limit n → ∞ is carried out first, and then "error → 0". Thus, both theories use the same two limit passages, n → ∞ and "error → 0", but in a different order.

The claims of both theories are fundamentally different.

The presentation below is based on the example of estimating the parameters of the gamma distribution, although similar results can be obtained for other parametric families, as well as for hypothesis testing problems (see below), etc. Our goal is to demonstrate the main features of the interval-statistics approach; its development was stimulated by the preparation of GOST 11.011-83. Note that the formulation used in the statistics of objects of non-numerical nature corresponds to the approach adopted in the general theory of stability. According to this approach, the sample x = (x_1, x_2, …, x_n) is associated with a set of permissible deviations G(x), i.e. the set of possible values of the vector of observation results y = (y_1, y_2, …, y_n).

If it is known that the absolute error of the measurement results does not exceed Δ, then the set of permissible deviations has the form

G(x) = { y : |y_i − x_i| ≤ Δ, i = 1, 2, …, n }.

If it is known that the relative error does not exceed δ, then the set of permissible deviations has the form

G(x) = { y : |y_i − x_i| ≤ δ |x_i|, i = 1, 2, …, n }.

Stability theory makes it possible to take into account the "worst" deviations, i.e. it leads to minimax-type conclusions, while specific error models allow one to draw conclusions about the behavior of statistics "on average".

Estimates of gamma distribution parameters.

As is known, a random variable X has a gamma distribution if its density is as follows:

f(x) = x^(a−1) e^(−x/b) / (b^a Γ(a)),  x ≥ 0,

where a is the shape parameter, b the scale parameter and Γ(a) the gamma function. Note that there are other ways to parameterize the family of gamma distributions. Since M(X) = ab and D(X) = ab², the method of moments estimates have the form

a* = x̄² / s²,  b* = s² / x̄,

where x̄ is the sample arithmetic mean and s² is the sample variance. It can be shown that for large n the mean squared errors of these estimates are given, up to infinitesimals of higher order, by formulas (11).

The maximum likelihood estimate a* of the shape parameter is determined by relation (12), in which the inverse of an auxiliary function of the shape parameter (related to the gamma function) appears; for large n this relation can be simplified. As with the method of moments estimates, the maximum likelihood estimate b* of the scale parameter has the form b* = x̄ / a*. For large n its mean squared error can be written out up to infinitesimals of higher order. Using the properties of the gamma function, it can be shown that for large a these expressions simplify further, again up to infinitesimals of higher order. Comparing them with formulas (11), we see that the mean squared errors of the method of moments estimates are greater than the corresponding mean squared errors of the maximum likelihood estimates.

Thus, from the point of view of classical mathematical statistics, maximum likelihood estimates have an advantage over method of moments estimates.
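In the classical error-free setting, the two estimators can be compared on simulated data, for example as in the following sketch (Python with NumPy/SciPy; the "true" parameter values are chosen arbitrarily for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a_true, b_true, n = 5.0, 2.0, 1000          # arbitrary "true" parameters
x = rng.gamma(shape=a_true, scale=b_true, size=n)

# Method of moments: a* = xbar**2 / s**2, b* = s**2 / xbar
xbar, s2 = x.mean(), x.var()
a_mom, b_mom = xbar ** 2 / s2, s2 / xbar

# Maximum likelihood (with the shift parameter fixed at zero)
a_mle, _, b_mle = stats.gamma.fit(x, floc=0.0)

print(a_mom, b_mom)
print(a_mle, b_mle)
```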

The need to take measurement errors into account.

Let us introduce the quantity v whose expression in terms of the sample is given by formula (13). From the properties of the function involved it follows that for small v the estimate a* is large. Due to the consistency of the maximum likelihood estimate, it follows from formula (13) that v converges in probability, as n grows, to a limit determined by a.

According to the model of interval data statistics, however, the observation results are not x_i but y_i; therefore, instead of v, the analogous quantity (14) is calculated from the real data. By virtue of the law of large numbers, with a sufficiently small error characteristic (which justifies the approximation of the terms in formula (14)), or, equivalently, with a sufficiently small maximum absolute error in formula (1) or a sufficiently small maximum relative error, the quantity calculated from the y_i converges in probability (assuming that all errors are identically distributed) to a limit that differs from the limit for the error-free data. Thus, the presence of errors introduces a bias into a*(y) that, generally speaking, does not disappear as the sample size increases; therefore the maximum likelihood estimate ceases to be consistent. Here a*(y) is the value determined by formula (12) with x_i replaced by y_i, i = 1, 2, …, n. From formula (13) it also follows that the influence of measurement errors grows together with the shape parameter a.

From the formulas obtained it follows that, up to infinitesimals of higher order, relation (16) holds. In order to find the asymptotic distribution of the estimate, we extract, using formula (16) and the expression for the quantity calculated from the y_i, the main terms of the corresponding expansions. As a result, the quantity of interest is represented as a sum of independent identically distributed random variables (up to a residual term of order 1/n). In each term two parts are distinguished: one corresponding to the "true" values x_i and a second that includes the measurement errors. Based on representation (17), it can be shown that the distribution of the maximum likelihood estimate a* is asymptotically normal; from the form of the parameters of this asymptotic distribution (as n → ∞) and formula (15), one of the main relations of the statistics of interval data follows:

(18)

Relation (18) clarifies the statement about the inconsistency of a*. It also follows from it that it makes no sense to increase the sample size n indefinitely in order to improve the accuracy of estimating the parameter a, since in this case only the second term in (18) decreases, while the first remains constant.

In accordance with the general approach of the statistics of interval data, the standard proposes to determine the rational sample size n_rat from the condition of "equalizing the errors" of the various types in formula (18) (an approach proposed in the monograph cited), i.e. from the condition that the metrological term and the statistical term are equal. Simplifying this equation under an additional assumption, we obtain an explicit expression for n_rat. According to the above, it is advisable to use only samples of volume n not exceeding n_rat: exceeding the rational sample size does not significantly increase the estimation accuracy.

Application of methods of stability theory.

Let us find the asymptotic notna (the maximum possible deviation of the statistic caused by the measurement errors). As follows from the form of the main linear term in formula (17), the solution of the corresponding optimization problem under restrictions on the absolute errors can be written out explicitly. However, the pairs involved do not form a simple random sample, because the expressions for them include the sample arithmetic mean. Nevertheless, for large n the sample mean can be replaced by M(x_1). We then obtain an explicit expression, valid for a > 1; thus, up to infinitesimals of higher order, the notna has a simple form.

Let us apply the obtained results to the construction of confidence intervals. In the formulation of classical mathematical statistics (i.e., for errors tending to zero), the confidence interval for the shape parameter a, corresponding to a given confidence probability, is the usual interval based on the asymptotic normality of a*, in which the quantile of the corresponding order of the standard normal distribution (with mathematical expectation 0 and variance 1) appears.

When the problem is formulated in the statistics of interval data (i.e., for a non-vanishing error characteristic), one should consider the corresponding confidence interval, which takes the notna into account, both in the probabilistic formulation (the pairs form a simple random sample) and in the optimization formulation. In both formulations the length of the confidence interval does not tend to 0 as n → ∞. If restrictions are imposed on the maximum relative error, i.e. the value δ is given, then the notna can be found using the following approximate calculation rules.

(II) The relative error of the product and the quotient is equal to the sum of the relative errors of the factors or, respectively, the dividend and the divisor.

It can be shown that, within the framework of interval data statistics with restrictions on the relative error, rules (I) and (II) become rigorous statements in the corresponding asymptotics.

Let us denote the relative error of a quantity t by OP(t) and its absolute error by AP(t).

From rule (I) it follows that the relative error of the quantity under consideration is expressed through δ, and from rule (II) a similar expression follows for the ratio involved. Since the considerations are carried out as n → ∞, then, by virtue of Chebyshev's inequality, relation (19) holds in probability, since both the numerator and the denominator in (19) lie, with probability close to 1, in an interval whose width is determined by a constant d obtained from the same Chebyshev inequality. Since, when (19) holds, the corresponding approximation is valid up to infinitesimals of higher order, using the last three relations we arrive at relation (20).

Let's apply one more rule of approximate calculations.

(III) The maximum absolute error of the sum is equal to the sum of the maximum absolute errors of the terms.

From (20) and rule (III) a bound for the total maximum absolute error follows, and from (15) and (21) an expression for the overall error is obtained, whence, in accordance with the previously derived formula for the rational sample size (with the absolute error characteristic replaced by the relative one), we obtain an expression for n_rat. In particular, for a = 5.00 and δ = 0.01 we obtain n_rat ≈ 50, i.e. in the situation in which the data on the operating time of cutters up to the limiting state were obtained, it is irrational to carry out more than 50 observations.

In accordance with the previous considerations, the asymptotic confidence interval for a, corresponding to confidence probability 0.95, has a form that takes the notna into account. In particular, under the conditions of the numerical example above, the asymptotic confidence interval obtained with allowance for measurement errors is wider than the classical one.

For large a, by the considerations given when deriving formula (19), the relative and absolute errors of the observation results y_i can be related to one another:

(21)

Therefore, for large a the corresponding expressions simplify. Thus, the arguments carried out make it possible to find the asymptotic behavior of the integral defining the quantity under study.

Comparison of estimation methods. Let us study the influence of measurement errors (with restrictions on the absolute error) on the method of moments estimates.

We have a* = x̄²/s² and b* = s²/x̄, and the error of these estimates depends on the way in which s² is calculated. If the formula

s² = (1/n) Σ (y_i − ȳ)²     (22)

is used, then, compared with the analysis of the influence of errors on the estimate a*, a new point arises here: the need to take into account the errors in the random component of the deviation of the estimate from the estimated parameter, whereas when considering the maximum likelihood estimate the errors produced only a bias. Using Chebyshev's inequality we obtain the bound (23). If s² is calculated using the formula

s² = (1/n) Σ y_i² − ȳ²     (24)

then similar calculations show that the error for large a is significantly greater. Although the right-hand sides of formulas (22) and (24) are identically equal, the errors of calculations using these formulas differ greatly. This is due to the fact that in formula (24) the last operation is finding the difference of two large numbers that are approximately equal in value (for a sample from a gamma distribution with a large value of the shape parameter).
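The numerical effect described here can be reproduced, for example, with the following sketch, which evaluates the two variance formulas in single precision on a sample shifted by a large constant (an artificial illustration; the shift of 10^7 is chosen only to make the cancellation visible):

```python
import numpy as np

rng = np.random.default_rng(2)
# Gamma sample shifted by a large constant, stored in single precision,
# so that the values are large and nearly equal to each other
x = (1.0e7 + rng.gamma(shape=100.0, scale=1.0, size=10_000)).astype(np.float32)
n = x.size
xbar = x.mean(dtype=np.float32)

# Formula (22): mean of squared deviations - numerically stable
s2_stable = np.sum((x - xbar) ** 2, dtype=np.float32) / n

# Formula (24): mean of squares minus squared mean - prone to cancellation
s2_naive = np.sum(x * x, dtype=np.float32) / n - xbar ** 2

print(s2_stable, s2_naive)   # the second value can be badly distorted in float32
```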

From the results obtained an expression for the resulting error follows. When deriving this formula, linearization of the influence of the errors was used (selection of the main linear term). Using the relationship (21) between absolute and relative errors, the result can also be written in terms of the relative error. This formula differs from the one given in the literature cited above.

Thus, whereas in classical mathematical statistics:

a) estimates are consistent, and their accuracy improves as the sample size grows;

b) to increase the accuracy of estimation it is advisable to increase the sample size without limit;

c) maximum likelihood estimates are better than method of moments estimates,

in the statistics of interval data, which takes measurement errors into account, respectively:

a) there are no consistent estimates: for any estimate a_n there is a constant c such that its mean squared error does not fall below this constant;

b) it makes no sense to consider sample sizes larger than the "rational sample size";

c) method of moments estimates are, over a wide range of the parameters, better than maximum likelihood estimates.

It is clear that the above results are valid not only for the considered problem of estimating the parameters of the gamma distribution, but also for many other formulations of applied mathematical statistics.

Metrological, methodological, statistical and computational errors. It is advisable to distinguish several types of errors in statistical data. Errors caused by inaccuracy in measuring the source data are called metrological. Their maximum value can be estimated by means of the notna.

Methodological errors are caused by the inadequacy of the probabilistic-statistical model, i.e. the deviation of reality from its premises. Inadequacy usually does not disappear as the sample size increases. It is advisable to study methodological errors using the "general scheme of stability", which generalizes the model of contamination by large outliers that is popular in the theory of robust statistical procedures. Methodological errors are not considered in this chapter.

Statistical error is the error traditionally considered in mathematical statistics. Its characteristics are the variance of an estimate, the complement to 1 of the power of a test at a fixed alternative, etc. As a rule, the statistical error tends to 0 as the sample size increases.

The computational error is determined by the calculation algorithms, in particular by the rounding rules. At the level of pure mathematics the right-hand sides of formulas (22) and (24), which define the sample variance s², are identically equal, while at the level of computational mathematics formula (22) under certain conditions gives significantly more correct significant figures than formula (24).

Above, using the example of the problem of estimating the parameters of the gamma distribution, the combined effect of metrological and computational errors was considered, with the calculation errors estimated according to the classical rules of manual computation. It turned out that with this approach the method of moments estimates have an advantage over the maximum likelihood estimates over a wide range of parameter values. However, if only metrological errors are taken into account, as was done above in examples 1-5, then similar calculations show that estimates of these two types have (for sufficiently large n) the same error.

We do not consider the computational error in detail here.

A number of interesting results about its role in statistics were obtained by N.N. Lyashenko and M.S. Nikulin.

