Maximum entropy value. Entropy in our lives

Information theory

At the origins of information theory stands Claude Shannon, who in 1947-48 worked on the efficiency of communication systems. From this work the goal of the theory was formulated: to increase the capacity of the communication channel. An effective system is one that, other conditions and costs being equal, transmits more information. The analysis typically considers two objects: the source of information and the channel over which the information is transmitted.

So, there are certain events. Information about them is transmitted over a communication channel in symbolic form, as a signal. A channel can be considered good if it meets two conditions: first, information is transmitted through it at high speed; second, the interference affecting the transmission degrades the quality of the information only slightly. To find the conditions under which such transmission is possible, we need to introduce some information characteristics.

The basic principles of information theory show up most clearly with a discrete source and a discrete channel, so we will begin our acquaintance with the topic under this assumption.

1.1 Quantitative measure of information.

First, let's figure out what makes sense to transmit over the channel.

If the recipient already knows what information will be transmitted, there is obviously no need to transmit it. Only what is unexpected is worth conveying; the greater the surprise, the more information the event carries. For example, suppose you work at a computer. A message that today's work must finish in 45 minutes, according to the schedule, is unlikely to be news to you: that was perfectly clear even before the announcement. Such a message contains zero information, and there is no point in passing it on. Now another example. The message is: in an hour your boss will hand you a round-trip plane ticket to Moscow and also allocate a sum of money for entertainment. This information is unexpected and therefore contains many units of information. These are the kinds of messages that are worth sending through the channel. The conclusion is simple: the more surprise a message carries, the more information it contains.

Surprise is characterized by probability, and it is probability that enters the measure of information.

A few more examples. We have two boxes, one with white balls and the other with black balls. How much information is contained in a message saying which box holds the white balls? The probability that any given box contains the white balls is 0.5. We will call this the pre-experiment, or a priori, probability.

Now we take out one ball. Whichever ball we draw, after such an experiment we know with certainty which box the white balls are in, so the corresponding probability becomes 1. This probability is called the post-experiment, or a posteriori, probability.

Let's look at this example in terms of the amount of information. Our source of information is the boxes with balls. Initially, the uncertainty about the balls was characterized by a probability of 0.5. Then the source "spoke" and gave out information: we pulled out a ball, and everything became determined with probability 1. It is logical to take as a quantitative measure of information the degree to which the uncertainty about an event is reduced as a result of the experiment. In our example this ratio is 1/0.5.

Now a more complex example. It is known that the part size can be 120, 121, 122, ..., 180 mm, that is, it takes one of 61 values. The prior probability that the part size is i mm is 1/61.

We have a very imperfect measuring tool that allows us to measure the part with an accuracy of ±5 mm. The measurement gave 130 mm, but in fact the size could be 125, 126, ..., 135 mm: 11 values in all. Some uncertainty therefore remains after the experiment, characterized by an a posteriori probability of 1/11. The degree of reduction of uncertainty is (1/11):(1/61). As above, this ratio is the amount of information.

It is most convenient to express the amount of information through a logarithmic function, with the logarithm taken to base two. Let us denote the amount of information by I, the a priori probability by P_prior and the a posteriori probability by P_post. Then

I = log2 (P_post / P_prior).    (1)

In the first example I = log2 (1/0.5) = 1 bit of information; in the second I = log2 ((1/11)/(1/61)) = log2 (61/11) ≈ 2.47 bits. A bit is one binary unit of information.
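As an illustration, here is a minimal Python sketch that evaluates formula (1) for the two examples above; the function name `information_gain` and the variable names are ours, introduced only for this illustration.

```python
import math

def information_gain(p_prior: float, p_post: float) -> float:
    """Amount of information by formula (1): I = log2(p_post / p_prior)."""
    return math.log2(p_post / p_prior)

# Example 1: two boxes of balls -- a priori 0.5, a posteriori 1.
print(information_gain(0.5, 1.0))        # 1.0 bit

# Example 2: part size -- a priori 1/61, a posteriori 1/11.
print(information_gain(1 / 61, 1 / 11))  # about 2.47 bits
```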

Now let us turn to a real source of information: a set of independent events (messages) x_1, x_2, ..., x_N with different prior probabilities p(x_1), p(x_2), ..., p(x_N). This set represents data about the parameters of some object; it is the information about that object. Usually, once the source issues a message, it becomes reliably known which parameter was issued, so the posterior probability is 1. The amount of information contained in each event is then

I(x_i) = log2 (1 / p(x_i)) = -log2 p(x_i).    (2)

This value is always greater than zero. There are as many such values as there are events, which is not very convenient for characterizing the source as a whole. Therefore the concept of entropy is introduced. Entropy is the average amount of information per event (message) of the source. It is found according to the rule for computing a mathematical expectation:

H(X) = Σ_i p(x_i) I(x_i).    (3)

Or, using the properties of the logarithmic function,

H(X) = -Σ_i p(x_i) log2 p(x_i).    (4)
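A short Python sketch of formula (4); the helper name `entropy` and the example probabilities are ours, chosen only to illustrate the calculation.

```python
import math

def entropy(probs):
    """Source entropy by formula (4): H = -sum p_i * log2 p_i (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A four-message source with unequal probabilities.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/message
# The same number of messages, equally probable -> maximum entropy log2(4) = 2.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/message
```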

The dimension of entropy is bits/message. Let us dwell on the properties of entropy, starting with an example. Suppose there is a binary source of information with a priori probabilities of events p1 and p2 that make up a complete group, so that p2 = 1 - p1. The entropy of such a source is

H(X) = -p1 log2 p1 - (1 - p1) log2 (1 - p1).

It is easy to see that if one of the probabilities is zero, then the other is equal to 1, and the entropy expression gives zero.

The dependence of entropy on p1 is plotted in Fig. 1.

Note that the entropy is maximum at a probability of 0.5 and is never negative.

The first property of entropy. Entropy is maximum when the source events are equally probable. In our binary-source example this maximum value is 1. If the source is not binary and contains N words, the maximum entropy is Hmax = log2 N.

The second property of entropy. If the probability of one source message is 1 and the others, forming with it a complete group of events, are zero, then the entropy is zero. Such a source generates no information.

The third property of entropy is the entropy addition theorem. Let us look at this in more detail. Suppose there are two sources of information represented by the ensembles of messages X and Y.

Each source has its own entropy, H(X) and H(Y). Next, these sources are combined, and we need to find the entropy of the combined ensemble (X, Y). Each pair of messages x_i and y_j occurs with probability p(x_i, y_j). The amount of information in such a pair is

I(x_i, y_j) = -log2 p(x_i, y_j).
Proceeding in the familiar way, we find the average amount of information per pair of messages of the combined ensemble. This will be its entropy. Two cases are possible here: the combined ensembles can be statistically independent or dependent.

Consider first the case of independent ensembles, when the appearance of a message y_j does not depend in any way on x_i. Let us write the expression for the entropy:

H(X, Y) = -Σ_{i=1}^{n} Σ_{j=1}^{m} p(x_i, y_j) log2 p(x_i, y_j),    (7)

where n and m are the numbers of messages in the ensembles X and Y.

Since with independence the two-dimensional probability is p(x_i, y_j) = p(x_i) p(y_j), the previous general formula gives

H(X, Y) = H(X) + H(Y),

where H(X) and H(Y) are determined by the already known formulas.

Next we consider the more complex case. Suppose the message ensembles are in a statistical relationship, that is, the appearance of x_i suggests, with some probability, the appearance of y_j. This is characterized by the conditional probability p(y_j | x_i); the slash in the notation denotes the condition. With conditional probabilities introduced, the two-dimensional probability can be written as the product of one-dimensional ones:

p(x_i, y_j) = p(x_i) p(y_j | x_i).

Taking this into account, let us find the expression for the entropy. The transformation goes like this:

H(X, Y) = -Σ_i Σ_j p(x_i) p(y_j | x_i) [log2 p(x_i) + log2 p(y_j | x_i)].

Given that the sum of the probabilities of all events equals 1, the first double sum in the last expression gives the entropy of the source X, H(X).

The second double sum is called the conditional entropy and is denoted H(Y | X). Thus,

H(X, Y) = H(X) + H(Y | X).

In a similar way it can be proven that H(X, Y) = H(Y) + H(X | Y).

In the last expressions we encountered the conditional entropy, which is determined by the connection between the combined message ensembles. If the ensembles are statistically independent, then p(y_j | x_i) = p(y_j) and the conditional entropy H(Y | X) = H(Y). As a result we again obtain the familiar formula H(X, Y) = H(X) + H(Y).

If the messages are absolutely dependent, that is, in a functional relationship, p(y_j | x_i) takes one of two values: either 1, when y_j is uniquely determined by x_i, or 0 otherwise. The conditional entropy H(Y | X) is then equal to 0, since the second ensemble of messages holds no surprise and therefore carries no information.
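The sketch below checks the addition theorem numerically for a small joint distribution; the 2x2 distribution itself is an invented example, not taken from the text.

```python
import math

def H(probs):
    """Entropy of a probability list, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint probabilities p(x_i, y_j) for a 2x2 ensemble (rows: x, columns: y).
p_xy = [[0.30, 0.10],
        [0.20, 0.40]]

p_x = [sum(row) for row in p_xy]                     # marginal of X
H_xy = H([p for row in p_xy for p in row])           # joint entropy H(X,Y)
# Conditional entropy H(Y|X) = -sum p(x,y) * log2 p(y|x)
H_y_given_x = -sum(p * math.log2(p / p_x[i])
                   for i, row in enumerate(p_xy) for p in row if p > 0)

print(H_xy, H(p_x) + H_y_given_x)   # both about 1.846: H(X,Y) = H(X) + H(Y|X)
```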

After introducing entropy and its properties, let us return to a single source of information. Keep in mind that any real source of information operates in time: its symbols (signs) occupy definite places in a sequence. A source of information is called stationary if the probability of a symbol does not depend on its place in the sequence. One more definition: the symbols of a source may be statistically (probabilistically) related to each other. An ergodic source of information is one in which the statistical relationship between signs extends to a finite number of previous signs. If this relationship covers only two neighboring signs, such a source is called a simple (first-order) Markov chain. This is the source we will now consider. The scheme of symbol generation by the source is shown in Fig. 2.

The appearance of a symbol x_j depends on which symbol the source produced at the previous moment. This dependence is described by the conditional probability p(x_j | x_i). Let us find the entropy of such a source, proceeding from the general understanding of entropy as the mathematical expectation of the amount of information. Suppose two symbols are produced, as shown in Fig. 2. The amount of information given out by the source in this situation is

I(x_j | x_i) = -log2 p(x_j | x_i).

Averaging this amount over all possible subsequent symbols, we obtain the partial entropy, given that the previous symbol was always x_i:

H(X | x_i) = -Σ_j p(x_j | x_i) log2 p(x_j | x_i).    (13)

Averaging this partial entropy once more, over all previous symbols, we obtain the final result:

H_2(X) = -Σ_i p(x_i) Σ_j p(x_j | x_i) log2 p(x_j | x_i).    (14)

The index 2 in the entropy designation indicates that the statistical relationship extends only to two adjacent symbols.

Let us dwell on the properties of the entropy of an ergodic source.

When the symbols of the source are independent, p(x_j | x_i) = p(x_j), formula (14) simplifies and reduces to the usual form (4).

The presence of statistical (probabilistic) connections between source symbols always decreases the entropy: H_2(X) ≤ H(X).
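A sketch of formula (14) for a two-symbol Markov source; the transition probabilities are made-up numbers used only to show that H_2 comes out below the entropy of the memoryless source with the same symbol probabilities.

```python
import math

# Hypothetical first-order Markov source over symbols {a, b}.
P = [[0.9, 0.1],    # p(next | prev = a)
     [0.3, 0.7]]    # p(next | prev = b)
# Stationary symbol probabilities of this chain: p = [0.75, 0.25].
p = [0.75, 0.25]

H1 = -sum(pi * math.log2(pi) for pi in p)                       # formula (4)
H2 = -sum(p[i] * P[i][j] * math.log2(P[i][j])
          for i in range(2) for j in range(2))                  # formula (14)

print(H1, H2)   # about 0.811 and 0.572: memory lowers the entropy, H2 <= H1
```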

So, a source of information has maximum entropy if two conditions are met: all symbols of the source are equally probable (entropy property) and there are no statistical connections between the symbols of the source.

To show how efficiently the source symbols are used, a redundancy parameter is introduced:

R = 1 - H(X) / Hmax.    (15)

The value of R lies in the range from 0 to 1.

The attitude towards this parameter is twofold. On the one hand, the less redundancy, the more efficiently the source operates. On the other hand, the greater the redundancy, the less interference and noise affect the delivery of information from such a source to the consumer. For example, the presence of statistical relationships between symbols increases redundancy, but at the same time increases transmission fidelity. Individual missing characters can be predicted and restored.

Let us look at an example. The source is the letters of the Russian alphabet, 32 in all. The maximum entropy is

Hmax = log2 32 = 5 bits/message.

Since there are statistical relationships between letters and the probabilities of their appearance in text are far from equal, the real entropy is about 3 bits/message. Hence the redundancy is

R = 1 - 3/5 = 0.4.

The next characteristic of the source is its productivity (performance); it characterizes the rate at which the source generates information. Suppose each message of the source is issued over a certain time t_i. Averaging these times, we find the average time t_avg for issuing one message. The average amount of information produced by the source per unit of time is the source productivity H'(X):

H'(X) = H(X) / t_avg.    (16)
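A small sketch of formulas (15) and (16) using the Russian-alphabet example from the text; the average issue time of 0.1 s per letter is an assumed value, introduced only for the illustration.

```python
import math

N = 32                     # letters in the alphabet
H_max = math.log2(N)       # 5 bits/message
H_real = 3.0               # real entropy quoted in the text, bits/message

R = 1 - H_real / H_max     # redundancy, formula (15)
t_avg = 0.1                # assumed average time per letter, seconds
productivity = H_real / t_avg   # formula (16), bits per second

print(R, productivity)     # 0.4 and 30.0
```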

So, let's summarize. The characteristics of an ergodic source of information are as follows:

the amount of information in each sign,

entropy,

redundancy,

performance.

It should be noted that the strong point of the introduced measure of the amount of information, and of all the characteristics built on it, is its universality: all the concepts introduced above apply to any kind of information, sociological, technical, and so on. The weak side of the measure is that it does not reflect the significance or value of the information. By this measure, a message about winning a pen in a lottery and a message about winning a car are equally important.

1.2. Information characteristics of the channel

Let us remember that information is transmitted through a communication channel. We previously introduced the information characteristics of the information source, and now we will introduce the information characteristics of the channel. Let's imagine the situation as shown in Fig. 1.

Fig. 1

At the channel input there is an input alphabet consisting of the set of symbols {x_i}, and at the output the alphabet {y_j}.

Let us represent the communication channel with a mathematical model. The best-known representation of a discrete channel is as a graph. The graph nodes are the received (y_j) and transmitted (x_i) letters of the alphabets; the edges reflect the possible connections between these letters (Fig. 2).

The relationships between letters of the alphabets are usually described by conditional probabilities, for example p(y_i | x_i), the probability of receiving y_i given that x_i was transmitted. This is the probability of correct reception. In the same way one can introduce the conditional probabilities of erroneous reception, for example p(y_j | x_i) with j ≠ i. The reason these probabilities are non-zero is interference, from which no real channel is free. Note that n and m, the numbers of symbols (letters) in the transmitted and received alphabets, are not necessarily equal. Further definitions are introduced on the basis of this model.

A symmetric channel is a channel in which the probabilities of correct reception are the same for all symbols, and the probabilities of erroneous reception are also all equal. For such a channel the conditional probability can be written as

p(y_j | x_i) = 1 - p for j = i, and p(y_j | x_i) = p / (m - 1) for j ≠ i.

Here p is the probability of erroneous reception. If this probability does not depend on which symbols were transmitted before the given one, the channel is called a channel without memory. As an example, Fig. 3 below shows the graph of a symmetric binary channel without memory.

Fig. 3

Let us further assume that the alphabet at the channel output contains an additional symbol, which appears when the receiver's decoder cannot recognize the transmitted symbol: in this case it refuses to make a decision. This outcome is called an erasure. Such a channel is called a channel without memory with erasure, and its graph is shown in Fig. 4. The erasure position is indicated there by a question mark.

Fig. 4

The simplest channel with memory is the Markov channel. In it, the probability of error depends on whether the previous symbol was received correctly or erroneously.

Along with the graph, there is another description of a communication channel: the channel matrix. This is the set of conditional probabilities p(y_j | x_i) or p(x_i | y_j). Together with the a priori probabilities p(x_i) and p(y_j), it gives a full picture of the statistics of a noisy channel. For example, the channel matrix of a discrete channel is the n × m array

P(Y | X) = [ p(y_j | x_i) ].
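As an illustration, here is a sketch that builds the channel matrix of a binary symmetric channel without memory and of a binary channel with erasure, like those in Figs. 3 and 4; the error probability 0.1 and erasure probability 0.2 are arbitrary example values.

```python
def binary_symmetric_channel(p_err):
    """Channel matrix p(y_j | x_i) for a memoryless binary symmetric channel."""
    return [[1 - p_err, p_err],
            [p_err, 1 - p_err]]

def binary_erasure_channel(p_erase):
    """Outputs: 0, 1 and '?' (erasure); rows correspond to inputs 0 and 1."""
    return [[1 - p_erase, 0.0, p_erase],
            [0.0, 1 - p_erase, p_erase]]

print(binary_symmetric_channel(0.1))   # [[0.9, 0.1], [0.1, 0.9]]
print(binary_erasure_channel(0.2))     # [[0.8, 0.0, 0.2], [0.0, 0.8, 0.2]]
```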

Any message we deal with in information theory is a collection of information about some physical system: for example, a message about a normal or increased percentage of defects, about the chemical composition of raw materials, or about the temperature of a furnace. To the input of an air-defense control system a message can be transmitted that two targets are in the air, flying at a certain altitude and speed. The same input may receive a message that a certain number of fighters are currently combat-ready at a certain airfield, or that the airfield has been disabled by enemy fire, or that the first target has been shot down while the second continues to fly on a changed course. Any of these messages describes the state of some physical system.

Obviously, if the state of the physical system were known in advance, there would be no point in transmitting the message. The message becomes meaningful only when the state of the system is unknown in advance, by chance.

Therefore, as the object about which information is transmitted, we will consider a certain physical system that may randomly end up in one state or another, that is, a system inherently possessing some degree of uncertainty. Obviously, the information obtained about the system will, generally speaking, be the more valuable and meaningful, the greater the uncertainty of the system before this information is received ("a priori"). A natural question arises: what do a "larger" or "smaller" degree of uncertainty mean, and how can uncertainty be measured?

To answer this question, let's compare two systems, each of which has some uncertainty.

As the first system, take a coin, which as a result of a toss can end up in one of two states: 1) heads (the coat of arms) or 2) tails (the number). The second is a die, which has six possible states: 1, 2, 3, 4, 5 and 6. Which system has more uncertainty? Obviously the second, since it has more possible states, each of which it can end up in with the same probability.

It may seem that the degree of uncertainty is determined by the number of possible states of the system. However, in the general case this is not so. Consider, for example, a technical device that can be in two states: 1) operational and 2) faulty. Suppose that before receiving information (a priori) the probability that the device operates properly is 0.99, and the probability of failure is 0.01. Such a system has only a very small degree of uncertainty: it is almost certain that the device will work properly. When tossing a coin there are also two possible states, but the degree of uncertainty is much greater. We see that the degree of uncertainty of a physical system is determined not only by the number of its possible states, but also by the probabilities of those states.

Let us move on to the general case. Consider a system X that can take a finite set of states x_1, x_2, ..., x_n with probabilities p_1, p_2, ..., p_n, where

p_i = P(X ~ x_i)    (18.2.1)

is the probability that the system X assumes the state x_i (the symbol X ~ x_i denotes the event: the system is in the state x_i). Obviously, Σ_i p_i = 1.

Let us write these data in the form of a table, where the top row lists the possible states of the system and the bottom row lists the corresponding probabilities:

x_1  x_2  ...  x_n
p_1  p_2  ...  p_n

This table is written in the same way as the distribution series of a discrete random variable with possible values x_1, x_2, ..., x_n having probabilities p_1, p_2, ..., p_n. Indeed, a physical system with a finite set of states and a discrete random variable have much in common; to reduce the first to the second it is enough to assign each state some numerical value (say, the state number). We emphasize that to describe the degree of uncertainty of the system it is completely unimportant which values are written in the top row of the table; only the number of these values and their probabilities matter.

As a measure of a priori uncertainty of a system (or a discontinuous random variable), information theory uses a special characteristic called entropy. The concept of entropy is fundamental in information theory.

The entropy of a system is the sum of the products of the probabilities of the various states of the system by the logarithms of these probabilities, taken with the opposite sign:

H(X) = -Σ_{i=1}^{n} p_i log p_i.    (18.2.2)

Entropy, as we will see later, has a number of properties that justify its choice as a characteristic of the degree of uncertainty. First, it vanishes when one of the states of the system is certain and the others are impossible. Second, for a given number of states it reaches a maximum when these states are equally probable, and it grows as the number of states increases. Finally, and this is the most important thing, it has the property of additivity: when several independent systems are combined into one, their entropies add up.

The logarithm in formula (18.2.2) can be taken to any base. Changing the base is equivalent simply to multiplying the entropy by a constant, and the choice of base amounts to the choice of a unit of measurement of entropy. If the base is 10, we speak of "decimal units" of entropy; if it is 2, of "binary units". In practice it is most convenient to use logarithms to base 2 and measure entropy in binary units; this agrees well with the binary number system used in electronic digital computers.
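A quick numerical check that changing the logarithm base only rescales the entropy by a constant factor; the probabilities are an arbitrary example of our own.

```python
import math

p = [0.5, 0.3, 0.2]
H_bits = -sum(pi * math.log2(pi) for pi in p)     # binary units
H_dec  = -sum(pi * math.log10(pi) for pi in p)    # decimal units

# The ratio is the constant log10(2), regardless of the distribution.
print(H_bits, H_dec, H_dec / H_bits, math.log10(2))
```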

In what follows, unless otherwise stated, the symbol log will always denote the binary logarithm.

The appendix (Table 6) gives binary logarithms of integers from 1 to 100.

It is easy to verify that when 2 is chosen as the base of the logarithms, the unit of entropy is the entropy of the simplest system, one having two equally possible states:

x_1  x_2
1/2  1/2

Indeed, by formula (18.2.2) we have

H(X) = -( (1/2) log2 (1/2) + (1/2) log2 (1/2) ) = 1.

The unit of entropy defined in this way is called a "binary unit" and is sometimes denoted bit (from the English "binary digit"). It is the entropy of one digit of a binary number if that digit is equally likely to be zero or one.

Let us measure in binary units the entropy of a system that has n equally probable states:

H(X) = -Σ_{i=1}^{n} (1/n) log2 (1/n) = log2 n,

that is, the entropy of a system with equally possible states is equal to the logarithm of the number of states.

For example, for a system with eight states H(X) = log2 8 = 3 binary units.

Let us prove that when the state of the system is known exactly in advance, its entropy is zero. Indeed, in this case all the probabilities in formula (18.2.2) vanish except one, say p_1, which is equal to one. The term -p_1 log p_1 vanishes because log 1 = 0. The remaining terms also vanish, since

lim_{p → 0} p log p = 0.

Let us prove that the entropy of a system with a finite set of states reaches a maximum when all states are equally probable. To do this, consider the entropy of the system (18.2.2) as a function of the probabilities p_1, p_2, ..., p_n and find the conditional extremum of this function under the condition

Σ_{i=1}^{n} p_i = 1.    (18.2.4)

Using the method of undetermined Lagrange multipliers, we look for the extremum of the function

F = -Σ_{i=1}^{n} p_i log p_i + λ Σ_{i=1}^{n} p_i.    (18.2.5)

Differentiating (18.2.5) with respect to p_i and equating the derivatives to zero, we obtain the system of equations

-log p_i - log e + λ = 0   (i = 1, ..., n),    (18.2.6)

from which it is clear that the extremum (in this case a maximum) is reached at equal values of p_i. From condition (18.2.4) it follows that in this case

p_1 = p_2 = ... = p_n = 1/n,    (18.2.7)

and the maximum entropy of the system is

H_max = log n,    (18.2.8)

i.e., the maximum value of the entropy of a system with a finite number of states is equal to the logarithm of the number of states and is reached when all states are equally probable.
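A numerical sanity check of (18.2.7)-(18.2.8): random distributions over n states never exceed the entropy of the uniform one. The sampling scheme below is our own illustration.

```python
import math, random

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

n = 5
uniform = [1 / n] * n
best = 0.0
for _ in range(10000):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    best = max(best, H([x / s for x in w]))

print(best, H(uniform), math.log2(n))   # best stays below log2(5), about 2.32
```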

The calculation of entropy by formula (18.2.2) can be somewhat simplified by introducing the special function

η(p) = -p log p,    (18.2.9)

where the logarithm is taken to base 2.

Formula (18.2.2) then takes the form

H(X) = Σ_{i=1}^{n} η(p_i).    (18.2.10)

The function η(p) is tabulated; the appendix (Table 7) gives its values for p from 0 to 1 in steps of 0.01.

Example 1. Determine the entropy of a physical system consisting of two aircraft (a fighter and a bomber) participating in air combat. As a result of the battle, the system may end up in one of four possible states:

1) both planes are not shot down;

2) the fighter is shot down, the bomber is not shot down;

3) the fighter was not shot down, the bomber was shot down;

4) both planes were shot down.

The probabilities of these states are respectively 0.2; 0.3; 0.4 and 0.1.

Solution. We write the conditions in the form of a table:

x_1   x_2   x_3   x_4
0.2   0.3   0.4   0.1

By formula (18.2.10), H(X) = η(0.2) + η(0.3) + η(0.4) + η(0.1) ≈ 0.464 + 0.521 + 0.529 + 0.332 ≈ 1.85 (binary units).
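A sketch that repeats the calculation of Example 1 with the η function of (18.2.9); only the probabilities from the example are used, and the helper name `eta` is ours.

```python
import math

def eta(p):
    """eta(p) = -p * log2(p), formula (18.2.9); eta(0) is taken as 0."""
    return 0.0 if p == 0 else -p * math.log2(p)

states = [0.2, 0.3, 0.4, 0.1]          # probabilities of the four combat outcomes
H = sum(eta(p) for p in states)        # formula (18.2.10)
print(round(H, 2))                     # about 1.85 binary units
```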

For a source with dependent messages, entropy is likewise calculated as the expected value of the amount of information per element of the message. The amount of information and the entropy are logarithmic measures and are measured in the same units.


The entropy of combined, statistically independent sources of information is equal to the sum of their entropies. Entropy characterizes the average uncertainty of choosing one state from the ensemble, completely ignoring the substantive side of the ensemble. By analogy, the entropy of an ecosystem is a measure of the disorder of the ecosystem, or of the amount of energy unavailable for use: the higher the entropy, the less stable the ecosystem is in time and space.

4.1.2. Entropy and performance of a discrete message source


Example 3. Determine the maximum possible entropy of a system consisting of three elements, each of which can be in four possible states. The combined system has 4^3 = 64 equally possible states, so its maximum entropy is log2 64 = 6 binary units.

It should be noted that the entropy value obtained in this case will be less than for a source of independent messages. This follows from the fact that, in the presence of dependence between messages, the uncertainty of choice decreases and the entropy decreases accordingly. Let us determine the entropy of a binary source. The graph of dependence (4.4) is presented in Fig. 4.1. As the graph shows, the entropy of a binary source varies from zero to one.

Basic properties of entropy

It is usually noted that entropy characterizes a given probability distribution in terms of the degree of uncertainty of the outcome of a trial, that is, the uncertainty of the choice of a particular message. Indeed, it is easy to verify that the entropy is zero if and only if one of the probabilities equals one and all the others equal zero; this corresponds to complete certainty of the choice.

Another visual interpretation of the concept of entropy is possible as a measure of the “diversity” of messages created by a source. It is easy to see that the above properties of entropy are quite consistent with the intuitive idea of ​​the measure of diversity. It is also natural to assume that the more diverse the possibilities for choosing this element are, the greater the amount of information contained in a message element.

An expression representing the mathematical expectation of the amount of information in the chosen element, for a source in the i-th state, can be called the entropy of that state. The source entropy per message element defined above depends on how the messages are divided into elements, that is, on the choice of the alphabet. Nevertheless, entropy has the important property of additivity.

Let us note some properties of entropy. Entropy is perhaps one of the most difficult concepts to grasp that you can encounter in a physics course, at least in classical physics.

For example, if you ask me where I live and I answer "in Russia", then my entropy for you will be high; after all, Russia is a big country. If I tell you my zip code, 603081, then my entropy for you will decrease, because you have received more information.

The entropy of your knowledge about me has decreased by approximately six characters. What if I told you that the sum is 59? There are only 10 possible microstates for this macrostate, so its entropy is only one symbol. As you can see, different macrostates have different entropies. We measure entropy as the number of symbols needed to write down the number of microstates.

In other words, entropy is how we describe a system. For example, if we heat a gas a little, then the speed of its particles will increase, therefore, the degree of our ignorance about this speed will increase, that is, entropy will increase. Or, if we increase the volume of gas by retracting the piston, our ignorance of the position of the particles will increase, and the entropy will also increase.

On the one hand, this expands the possibilities of using entropy in the analysis of the most varied phenomena; on the other hand, it demands an additional assessment of each situation that arises. That is the first point. Second, the Universe is not an ordinary finite object with boundaries: it is infinity itself in time and space.


The amount of this information is called entropy. Suppose some message includes n_1 elements a_1 of the alphabet, n_2 elements a_2, and so on. The quantity H = -Σ p_i log2 p_i is called the entropy of the message source. Entropy is maximum if all states of the message elements are equally probable. In information theory it is proven that the entropy of a source with dependent elements never exceeds that of a source with independent ones; that is, the presence of probabilistic connections reduces the entropy of the message source.

A game of billiards begins with the balls arranged in a neat pyramid on the table. Then the first stroke of the cue breaks the pyramid. The balls roll across the table along bizarre trajectories, collide repeatedly with the rails and with each other, and finally come to rest in some new arrangement. For some reason the new arrangement is always less ordered. Why? You can try endlessly: the positions of the balls on the table will change each time, but we will never arrive at the same ordered pyramid that stood on the table before the first stroke. The system spontaneously passes into less ordered states, never into more ordered ones. For the system to pass into an ordered state, outside intervention is necessary: one of the players takes the triangular rack and forms a new pyramid, and this requires an expenditure of energy. There is no way to make the balls spontaneously line up into a pyramid through their collisions with each other and with the rails.

The growth of disorder on a billiard table is not something anyone controls (although energy is needed for it to happen), because a good billiard table is made specifically so that the energy of a ball is the same at every point. What happens on the billiard table illustrates another great principle by which our Universe is organized: the principle of maximum entropy. Of course, this great principle of the universe is not limited to the billiard table alone, so let us figure it out.

Entropy is a measure of the disorder of a system. The less order there is in a system, the higher its entropy. It probably makes sense to talk about what is considered order and what is disorder.

Order can be understood as a regular arrangement of particles, when distances and directions are repeated, and from the location of several particles one can predict the location of the next one. If the particles are uniformly mixed without any visible law of arrangement, it is a disorder. If the particles are neatly collected in one area of ​​space, this is order. If they are scattered everywhere, it is a mess. If different components of the mixture are in different places, this is order. If everything is mixed up, it's a mess. In general, ask your mother or wife - she will explain.

The entropy of a gas (incidentally, the word "gas" is a corruption of the Greek "chaos") is higher than that of a liquid, and the entropy of a liquid is higher than that of a solid. Generally speaking, raising the temperature increases the disorder. Of all states of matter, a rigid crystal at a temperature of absolute zero has the least entropy; this entropy is taken to be zero.

Entropy changes in various processes. If a process involves no change in energy, then it proceeds spontaneously only if it leads to an increase in the entropy of the system. (What happens when both entropy and energy change, we will discuss a little later.) This is why, after the cue strike, the balls on a billiard table move into a less ordered arrangement. The changes of entropy in various systems can be summarized as the maximum entropy principle:

Any system spontaneously strives to occupy the most disordered state available to it.

Very often the same thing is formulated as the principle of non-decrease of entropy:

The entropy of an isolated system cannot decrease.

This formulation has given rise, and continues to give rise, to much debate about the heat death of the Universe: the Universe is by definition an isolated system (since it has no environment with which it could exchange mass or energy), so its entropy gradually increases. Consequently, the Universe will eventually reach a state of complete, homogeneous disorder in which no object can exist that differs in any way from its surroundings. The topic is fascinating in the highest degree, but let us talk about it some other time.

Entropy is defined as the average value of the self-information of the ensemble: H(X) = -Σ_i p(x_i) log2 p(x_i).

The maximum entropy method, like the maximum information method, is based on searching among all possible probability distributions for the one that has maximum entropy of the form (3.19). Thus the maximum entropy criterion is used to remove the uncertainty of the solution, and the functional (3.19) acts as a kind of "measure of quality" of the image.

The meaning of such a quality measure can be understood by turning to the problem of estimating probability densities in mathematical statistics. When the moments of a random distribution are known, the estimate obtained by maximizing expression (3.19) is the least biased of all possible estimates. One can therefore expect that maximizing (3.19) under the restrictions imposed by the image-formation process will give a good estimate of the distribution density. Let us consider the process of image formation and clarify the physical meaning of the maximum entropy criterion.
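To make the idea concrete, here is a small sketch, under assumptions of our own, that finds the maximum-entropy distribution on the values 1..6 subject to a prescribed mean: the solution has the exponential (Gibbs) form p_i ~ exp(-λ x_i), and λ is found by bisection. This is the standard moment-constrained version of the principle, not the imaging functional (3.19) itself.

```python
import math

def maxent_with_mean(values, target_mean, iters=100):
    """Maximum-entropy distribution on `values` with a given mean: p_i ~ exp(-lam * x_i)."""
    def mean_for(lam):
        w = [math.exp(-lam * x) for x in values]
        z = sum(w)
        return sum(x * wi / z for x, wi in zip(values, w)), [wi / z for wi in w]

    lo, hi = -50.0, 50.0            # bisection on lambda (the mean decreases as lambda grows)
    p = None
    for _ in range(iters):
        mid = (lo + hi) / 2
        m, p = mean_for(mid)
        if m > target_mean:
            lo = mid
        else:
            hi = mid
    return p

# Die faces 1..6 constrained to have mean 4.5 instead of 3.5.
p = maxent_with_mean(range(1, 7), 4.5)
print([round(x, 3) for x in p])                 # weights tilted toward the larger faces
print(sum(x * i for i, x in enumerate(p, 1)))   # close to 4.5
```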

Let the total intensity of the source be N, and let the intensity n_i be emitted from the i-th point. Let us count the number of ways in which a given object can be formed from N rays:

W = N! / (n_1! n_2! ... n_M!).

Now let us find the distribution {n_i} that is formed in the greatest number of cases, i.e., the one that maximizes W.

Replacing W by its logarithm (the maximum does not shift) and using Stirling's formula, we get

ln W ≈ N ln N - Σ_i n_i ln n_i = -N Σ_i (n_i / N) ln (n_i / N).

To solve the problem it is also necessary to take into account the restrictions imposed by the formation equations:

as well as the restriction on the total intensity of the image, Σ_i n_i = N.

These expressions form the basis of the maximum entropy method. The physical meaning of the maximum entropy criterion is a search for the probability distribution at the channel input that most often produces the given output distribution, or, equivalently, a search for the most plausible source distribution under the given formation conditions. In this sense the maximum entropy method can be regarded as a maximum likelihood method for the ray model of image formation.

Let us consider one of the most common forms of the maximum entropy method. Along with the formation of the image we will consider the parallel formation of a noise field:

Based on the above reasoning, we find that the noise field can be created in a number of ways defined analogously to W.

To solve the problem it is necessary to maximize the joint probability of forming the image and the noise field.

Taking the logarithm of this expression gives the sum of the noise and image entropies:

Taking into account the restrictions on the formation process and maintaining the number of rays (total intensity), we obtain the following optimization problem:

where the indicated quantities are the Lagrange multipliers of the optimization problem. To solve the system, we take the partial derivatives of (3.25) with respect to the unknowns and equate them to zero:

Substituting the expressions from (3.26) and (3.27) into the constraint equations, we find

From equations of the form (3.28), the Lagrange multipliers are determined, which are used to find the input distribution function:

The exponential form of (3.29) ensures that the solution is positive. The entropy functional itself is significantly nonlinear, which leads to an interesting feature of equations (3.29): their solutions can contain spatial frequencies that were absent from the spectrum of the distorted image. This allows one to speak of the possibility of "super-resolution", i.e., the restoration of information destroyed by a formation system with a limited bandwidth (Chapter 5 is devoted to the super-resolution effect and the assessment of its capabilities). Note also that the solutions obtained on the basis of (3.29) are of higher quality than those of linear restoration algorithms, but they require the solution of a complex system of nonlinear equations.

There is an alternative to the entropy expression (3.19), proposed by Burg for estimating power spectra. This form of entropy is the sum of the logarithms of the estimated intensities:

H_B = Σ_i ln f_i.    (3.30)

The reconstruction method based on expression (3.30) can also be used in image-processing practice. Suppose the noisy samples of the spectrum are known,

where the quantities involved are, respectively, the samples of the true and noise spectra. Let us impose a restriction on the discrepancy between the true and noisy samples of the spectrum of the observed image:

Then to find a solution it is necessary to maximize a simpler functional:

It should be noted that a large number of algorithms based on both (3.19) and (3.30) have appeared recently, using a wide variety of restrictions arising from the formulation of each specific task. True, the existence of two entropy norms raises some doubts: first, because it is unclear which one to use in practice, and second, because the formulation of the restoration problem is not sufficiently clear.

There is another interesting feature of algorithms based on the search for maximum entropy. Let us turn to expressions (3.27)-(3.29) for the case of an ideal formation system in the presence of additive noise. It is easy to see that the maximum entropy algorithm in this case claims to separate the image from the noise without any a priori characteristics of the noise or the signal. However, a more careful analysis shows that the solution of equations of the form (3.28) gives a paradoxical result: the signal and the noise turn out to be linearly related. Indeed, the signal estimate here is equal to

and the noise estimate will be:

In practical applications, to avoid this effect, the expression for the entropy of the noise is taken with a certain weighting coefficient, and instead of (3.24) the following functional is considered:

This technique, however, leaves the physical meaning of the resulting transformations unclear.

Another disadvantage of the maximum entropy method is that its best results are obtained when reconstructing objects consisting of individual impulses on a homogeneous background; attempts to apply the method to spatially extended objects cause fluctuations to appear.

The presented results concerning the maximum entropy and maximum information methods can be combined into a single scheme based on the construction of algorithms for estimating the distribution density by the maximum likelihood method. Thus, the algorithms considered can be included in the group of statistical regularization methods described in § 2.4. The only difference is that these algorithms are based on a different statistical model: the representation of the image itself as a probability density. Such a model immediately leads to the nonlinearity of the functionals under consideration. However, the disadvantages noted above force us to look for algorithms that, while keeping the advantages of information-theoretic restoration methods (unlimited frequency band, non-negativity of the solution, etc.), allow a wider class of images to be restored.


