Pattern recognition can be carried out by several families of methods, which are surveyed below.

Brute-force method. In this method, the recognized object is compared against a database that stores, for each known object, many variants of its appearance. For optical pattern recognition this means enumerating the object at different angles, scales, offsets, deformations, and so on; for letters, it means enumerating fonts and font properties. For sound pattern recognition, the sample is compared with known patterns (for example, a word spoken by many people). A deeper variant analyzes the characteristics of the image itself: in optical recognition this may be the determination of geometric characteristics, while a sound sample is subjected to frequency and amplitude analysis.
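The offset-enumeration part of the brute-force idea can be sketched as follows; this is an illustrative toy (the image, template, and function names are made up, not taken from the source), matching a small binary template against every shift inside an image and keeping the offset with the fewest mismatches.

```python
# Hedged sketch of brute-force matching: enumerate all shifts of a binary
# template inside an image; the offset with the fewest mismatches wins.
def count_mismatches(image, template, dy, dx):
    h, w = len(template), len(template[0])
    return sum(
        image[dy + i][dx + j] != template[i][j]
        for i in range(h) for j in range(w)
    )

def best_match(image, template):
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    candidates = [
        (count_mismatches(image, template, dy, dx), dy, dx)
        for dy in range(H - h + 1) for dx in range(W - w + 1)
    ]
    return min(candidates)  # (mismatch count, row offset, col offset)

image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
template = [
    [1, 1],
    [1, 0],
]
print(best_match(image, template))  # -> (0, 1, 1): exact match at offset (1, 1)
```

Enumerating rotations, scales, and deformations would extend the same loop over a larger set of transformed templates.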

The next method is the use of artificial neural networks (ANNs). It requires either a very large number of examples of the recognition task or a special neural network structure that takes the specifics of the task into account. Nevertheless, this method is highly effective and productive.

Methods based on estimates of the distribution densities of feature values. These are borrowed from the classical theory of statistical decisions, in which the objects of study are treated as realizations of a multidimensional random variable distributed in the feature space according to some law. They rest on a Bayesian decision-making framework that uses the prior probabilities of objects belonging to each class and the conditional distribution densities of the features.

The group of methods based on estimating the distribution densities of feature values is directly related to the methods of discriminant analysis. The Bayesian approach to decision making is one of the most developed parametric methods in modern statistics; the analytical form of the distribution law (the normal law) is assumed known, and only a small number of parameters (mean vectors and covariance matrices) need to be estimated. The main difficulties in using this method are the need to keep the entire training sample in memory to compute the density estimates and the high sensitivity to the representativeness of the training sample.
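A minimal one-dimensional sketch of this parametric approach, assuming normally distributed classes; the training data and all names are invented for illustration. Each class is summarized by its sample mean and variance, and an object is assigned to the class maximizing the product of the prior and the conditional density.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of a 1-D class sample."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

def density(x, mean, var):
    # Normal (Gaussian) density with the estimated parameters.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Made-up training data for two classes.
class_samples = {
    "A": [1.0, 1.2, 0.8, 1.1],
    "B": [3.0, 2.8, 3.2, 3.1],
}
params = {c: fit_gaussian(s) for c, s in class_samples.items()}
total = sum(len(s) for s in class_samples.values())
prior = {c: len(s) / total for c, s in class_samples.items()}

def classify(x):
    # Bayesian rule: maximize prior probability times conditional density.
    return max(params, key=lambda c: prior[c] * density(x, *params[c]))

print(classify(1.1), classify(2.9))  # -> A B
```

Note that only two parameters per class are kept after fitting, which is exactly the economy the parametric assumption buys.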

Methods based on assumptions about the class of decision functions. In this group, the form of the decision function is assumed known and a quality functional is specified for it. Based on this functional, the optimal approximation to the decision function is found using the training sequence. The quality functional of the decision rule is usually tied to the classification error. The main advantage of the method is the clarity of the mathematical formulation of the recognition problem. The possibility of extracting new knowledge about the nature of an object, in particular knowledge about the mechanisms of interaction of features, is fundamentally limited here by the given structure of interaction fixed in the chosen form of decision functions.

Method of comparison with the prototype. This is the simplest extensional recognition method in practice. It is used when the recognized classes form compact geometric groupings in the feature space; the center of the geometric grouping (or the object closest to the center) is then chosen as the prototype point.

To classify an unknown object, the prototype closest to it is found, and the object is assigned to the same class as that prototype. Obviously, no generalized images are formed in this method. Various types of distance can be used as the proximity measure.
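The prototype idea can be sketched as follows; the classes, points, and helper names are illustrative assumptions, with the coordinate-wise mean used as the class center and Euclidean distance as the measure.

```python
def prototype(points):
    """Center of a class grouping: the coordinate-wise mean of its points."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def classify(x, prototypes):
    """Assign x to the class of the nearest prototype (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(prototypes, key=lambda c: dist2(x, prototypes[c]))

# Two made-up compact groupings.
classes = {
    "A": [(0, 0), (1, 0), (0, 1)],
    "B": [(5, 5), (6, 5), (5, 6)],
}
prototypes = {c: prototype(pts) for c, pts in classes.items()}
print(classify((1, 1), prototypes))  # -> A
```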

The k-nearest-neighbors method. When classifying an unknown object, a given number (k) of its geometrically nearest neighbors in the feature space, with already known class membership, is found. The decision on the class of the unknown object is made by analyzing the information about these nearest neighbors. The need to reduce the number of objects in the training sample (diagnostic precedents) is a disadvantage of this method, since it lowers the representativeness of the training sample.
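A minimal sketch of k-nearest-neighbors voting, with invented sample data and squared Euclidean distance assumed as the metric:

```python
from collections import Counter

def knn_classify(x, sample, k):
    """sample: list of (point, label) pairs; vote among the k nearest points."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(sample, key=lambda pl: dist2(x, pl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training sample (diagnostic precedents).
sample = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
          ((4, 4), "B"), ((5, 4), "B"), ((4, 5), "B")]
print(knn_classify((1, 1), sample, k=3))  # -> A
```

Note that the whole sample is scanned for every query, which is precisely the enumeration cost listed as a drawback in the table below.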

Since different recognition algorithms behave differently on the same sample, the question arises of a synthetic decision rule that would use the strengths of all the algorithms. For this purpose there exists a synthetic method, collectives of decision rules, which combines the positive sides of each of the methods.

To conclude the review of recognition methods, we present the essence of the above in a summary table, adding some other methods used in practice.

Table 1. Classification of recognition methods, their areas of application, and their limitations.

| Classification of recognition methods | Application area | Limitations (disadvantages) |
|---|---|---|
| Intensional recognition methods | | |
| Methods based on density estimates | Problems with a known distribution law (normal); large statistics must be collected | The entire training sample must be enumerated during recognition; high sensitivity to non-representativeness of the training sample and to artifacts |
| Methods based on assumptions about the class of decision functions | Classes must be well separable | The form of the decision function must be known in advance; inability to take into account new knowledge about correlations between features |
| Boolean (logical) methods | Small problems | Exhaustive search is required when selecting logical decision rules; high labor intensity |
| Linguistic methods | — | The task of inferring a grammar from a given set of statements (descriptions of objects) is difficult to formalize; unsolved theoretical problems |
| Extensional recognition methods | | |
| Method of comparison with a prototype | Problems of small dimension of the feature space | Strong dependence of classification results on the metric; the optimal metric is unknown |
| k-nearest-neighbors method | Problems of small dimension of the feature space | Strong dependence of classification results on the metric; complete enumeration of the training sample during recognition; high computational effort |
| Algorithms for calculating estimates (ABO) | Problems of small dimension in the number of classes and features | Dependence of classification results on the metric; complete enumeration of the training sample during recognition; high technical complexity of the method |
| Collectives of decision rules (a synthetic method) | Problems of small dimension in the number of classes and features | Very high technical complexity of the method; a number of unsolved theoretical problems, both in determining the areas of competence of the constituent methods and in the constituent methods themselves |

Such problems are solved by people quite often, for example, when crossing a street by the traffic lights: recognizing the color of the lit traffic-light signal and knowing the traffic rules allows one to make the correct decision about whether the street may be crossed at the moment.

In the course of biological evolution, many animals have learned, with the help of their visual and auditory apparatus, to solve pattern recognition problems well enough. Creating artificial pattern recognition systems remains a complex theoretical and technical problem. The need for such recognition arises in a variety of areas, from military affairs and security systems to the digitization of all kinds of analog signals.

Traditionally, pattern recognition tasks are included in the range of artificial intelligence tasks.

Directions in pattern recognition

Two main directions can be distinguished:

  • Studying the recognition abilities that living beings possess, explaining and modeling them;
  • Development of the theory and methods for constructing devices designed to solve individual recognition problems in applied areas.

Formal statement of the problem

Pattern recognition is the assignment of source data to a specific class by extracting the essential features that characterize these data from the total mass of irrelevant detail.

When formulating recognition problems, one tries to use mathematical language, attempting, in contrast to the theory of artificial neural networks, where the basis is obtaining a result by experiment, to replace experiment with logical reasoning and mathematical proof.

Monochrome images are most often considered in pattern recognition problems, which makes it possible to treat an image as a function on the plane. If on a point set T of the plane the function x(x,y) expresses at each point of the image its characteristic (brightness, transparency, optical density), then such a function is a formal record of the image.

The set of all possible functions x(x,y) on the plane T is a model of the set of all images X. By introducing a notion of similarity between images, one can pose the recognition problem. The specific form of such a statement depends strongly on the subsequent stages of recognition under one approach or another.

Pattern recognition methods

For optical pattern recognition, one can apply an exhaustive search over views of the object at different angles, scales, offsets, etc. For letters, one must enumerate the font, font properties, etc.

The second approach is to find the outline of the object and examine its properties (connectivity, presence of corners, etc.).

Another approach is to use artificial neural networks. This method requires either a large number of examples of the recognition task (with correct answers) or a special neural network structure that takes the specifics of the task into account.

Perceptron as a pattern recognition method

F. Rosenblatt, introducing the concept of a brain model, whose task is to show how psychological phenomena can arise in a physical system with known structure and functional properties, described the simplest discrimination experiments. These experiments relate entirely to pattern recognition methods, but differ in that the solution algorithm is not deterministic.

The simplest experiment from which one can obtain psychologically meaningful information about a system amounts to presenting the model with two different stimuli and requiring it to respond to them in different ways. The purpose of such an experiment may be to study the possibility of their spontaneous discrimination by the system in the absence of intervention by the experimenter or, conversely, to study forced discrimination, in which the experimenter seeks to train the system to carry out the required classification.

In a perceptron training experiment, a certain sequence of images is usually presented, including representatives of each of the classes to be distinguished. According to some rule of memory modification, correct choices of reaction are reinforced. The perceptron is then presented with a control stimulus, and the probability of obtaining the correct reaction for stimuli of the given class is determined. Depending on whether the chosen control stimulus coincides with one of the images used in the training sequence, different results are obtained:

  • 1. If the control stimulus does not coincide with any of the training stimuli, then the experiment involves not only pure discrimination but also elements of generalization.
  • 2. If a control stimulus excites a set of sensory elements entirely different from those that were activated by previously presented stimuli of the same class, then the experiment is a study of pure generalization.

Perceptrons do not have the capacity for pure generalization, but they perform quite satisfactorily in discrimination experiments, especially if the control stimulus matches one of the images with which the perceptron has already accumulated some experience closely enough.

Examples of pattern recognition problems

  • Letter recognition.
  • Barcode recognition.
  • License plate recognition.
  • Face recognition.
  • Speech recognition.
  • Image recognition.
  • Recognition of local areas of the earth's crust in which mineral deposits are located.




Lecture No. 17. PATTERN RECOGNITION METHODS

The following groups of recognition methods are distinguished:

  • Proximity function methods
  • Discriminant function methods
  • Statistical recognition methods
  • Linguistic methods
  • Heuristic methods

The first three groups of methods are focused on the analysis of features expressed as numbers or vectors with numerical components.

A group of linguistic methods provides pattern recognition based on the analysis of their structure, described by the corresponding structural features and the relationships between them.

The group of heuristic methods combines characteristic techniques and logical procedures used by humans in pattern recognition.

Proximity Function Methods

The methods of this group are based on functions that estimate the measure of proximity between the recognized image, represented by the vector x* = (x*_1, …, x*_n), and the reference images of the various classes, represented by the vectors x^i = (x^i_1, …, x^i_n), i = 1, …, N, where i is the number of the image class.

The recognition procedure in this method consists of computing the distance between the point of the recognized image and each of the points representing the reference images, i.e., of computing all the values d_i, i = 1, …, N. The image is assigned to the class for which d_i has the smallest value among all i = 1, …, N.

A function that assigns to each pair of vectors x^i, x* a real number as a measure of their proximity, i.e., that determines the distance between them, can be chosen fairly arbitrarily. In mathematics, such a function is called a metric on the space. It must satisfy the following axioms:

r(x, y) = r(y, x);

r(x, y) > 0 if x ≠ y, and r(x, y) = 0 if x = y;

r(x, y) <= r(x, z) + r(z, y).

The listed axioms are satisfied, in particular, by the following functions:

a) d_i = [ sum_j (x^i_j − x*_j)² ]^(1/2), j = 1, 2, …, n;

b) d_i = sum_j |x^i_j − x*_j|, j = 1, 2, …, n;

c) d_i = max_j |x^i_j − x*_j|, j = 1, 2, …, n.

The first of them is called the Euclidean norm of the vector space. Accordingly, spaces in which this function is used as a metric are called Euclidean spaces.
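The three proximity functions can be written directly; the vectors below are made up, and the assertions check the symmetry and triangle-inequality axioms on one sample triple:

```python
def euclidean(x, y):
    # (a) Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def city_block(x, y):
    # (b) sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    # (c) maximum absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(x, y))

x, y, z = (0, 0), (3, 4), (1, 1)
for metric in (euclidean, city_block, chebyshev):
    assert metric(x, y) == metric(y, x)                 # symmetry axiom
    assert metric(x, y) <= metric(x, z) + metric(z, y)  # triangle inequality

print(euclidean(x, y), city_block(x, y), chebyshev(x, y))  # -> 5.0 7 4
```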

Often, the mean-square difference between the coordinates of the recognized image x* and the reference image x^i is chosen as the proximity function, i.e., the function

d_i = (1/n) sum_j (x^i_j − x*_j)², j = 1, 2, …, n.

The value d_i is interpreted geometrically as the squared distance between points in the feature space, normalized by the dimension of the space.

It often turns out that different features are not equally important in recognition. To take this into account when computing the proximity functions, the coordinate differences corresponding to more important features are multiplied by larger coefficients, and those corresponding to less important features by smaller ones.

In this case,

d_i = (1/n) sum_j w_j (x^i_j − x*_j)², j = 1, 2, …, n,

where w_j are the weighting coefficients.

The introduction of weighting coefficients is equivalent to scaling the axes of the feature space and, accordingly, stretching or compressing the space in certain directions.

The purpose of these deformations of the feature space is to place the points of the reference images in a way that corresponds to the most reliable recognition under conditions of significant scatter of the images of each class in the vicinity of the reference point.
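A small sketch of the weighted proximity function d_i = (1/n)·sum_j w_j (x^i_j − x*_j)²; the vectors and weights are illustrative assumptions:

```python
def weighted_sq_distance(x_ref, x, weights):
    """Mean weighted squared coordinate difference; larger weights
    stress the features considered more important."""
    n = len(x)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_ref, x)) / n

reference = (1.0, 1.0)
image = (2.0, 3.0)
equal = weighted_sq_distance(reference, image, (1.0, 1.0))
stressed = weighted_sq_distance(reference, image, (4.0, 1.0))  # 1st feature weighted 4x
print(equal, stressed)  # -> 2.5 4.0
```

Multiplying a coordinate difference by w_j is the same as rescaling that axis by sqrt(w_j), which is the stretching of the space described above.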

Groups of image points close to one another in the feature space are called clusters, and the task of identifying such groups is called the clustering problem.

The task of identifying clusters belongs to unsupervised pattern recognition, i.e., to recognition problems in the absence of examples of correct recognition.

Discriminant function methods

The idea of the methods of this group is to construct functions that define boundaries in the space of images, dividing the space into areas corresponding to the classes of images. The simplest and most frequently used functions of this kind are functions that depend linearly on the values of the features. In the feature space they correspond to dividing surfaces in the form of hyperplanes. In the case of a two-dimensional feature space, a straight line acts as the separating function.

The general form of a linear decision function is given by the formula

d(x) = w_1 x_1 + w_2 x_2 + … + w_n x_n + w_{n+1} = W·x + w_{n+1},

where x is the image vector and W = (w_1, w_2, …, w_n) is the vector of weighting coefficients.

In the case of a partition into two classes X_1 and X_2, the discriminant function d(x) allows recognition according to the rule:

x belongs to X_1 if d(x) > 0;

x belongs to X_2 if d(x) < 0.

If d(x) = 0, we have a case of uncertainty.

In the case of a partition into several classes, several functions are introduced. Each class of images is then assigned a certain combination of signs of the discriminant functions.

For example, if three discriminant functions are introduced, then the following option for identifying image classes is possible:

x belongs to X_1 if d_1(x) > 0, d_2(x) < 0, d_3(x) < 0;

x belongs to X_2 if d_1(x) < 0, d_2(x) > 0, d_3(x) < 0;

x belongs to X_3 if d_1(x) < 0, d_2(x) < 0, d_3(x) > 0.

It is assumed that for other combinations of the values d_1(x), d_2(x), d_3(x) we have a case of uncertainty.
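A toy sketch of this sign-combination scheme; the three linear discriminant functions in a two-dimensional feature space are invented for illustration:

```python
# Three illustrative linear discriminant functions in a 2-D feature space.
def d1(x): return x[0] - 1          # positive to the right of x1 = 1
def d2(x): return x[1] - 1          # positive above x2 = 1
def d3(x): return -x[0] - x[1] + 1  # positive below the line x1 + x2 = 1

def classify(x):
    signs = (d1(x) > 0, d2(x) > 0, d3(x) > 0)
    table = {
        (True, False, False): "X1",
        (False, True, False): "X2",
        (False, False, True): "X3",
    }
    return table.get(signs, "uncertain")  # other sign combinations are undecided

print(classify((2.0, 0.5)))  # d1>0, d2<0, d3<0 -> X1
print(classify((0.5, 2.0)))  # -> X2
print(classify((0.2, 0.2)))  # -> X3
```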

A variant of the discriminant function method is the decision function method. In it, given m classes, m functions d_i(x), called decision functions, are assumed to exist, such that if x belongs to X_i then d_i(x) > d_j(x) for all j ≠ i; i.e., the decision function d_i(x) has the maximum value among all the functions d_j(x), j = 1, …, m.

An illustration of this method is a classifier based on the minimum Euclidean distance in the feature space between the image point and the reference. Let us show this.

The Euclidean distance between the feature vector of the recognized image x and the vector of the reference image x^i is given by the formula

||x^i − x|| = [ sum_j (x^i_j − x_j)² ]^(1/2), j = 1, 2, …, n.

The vector x is assigned to the class i for which the value ||x^i − x|| is minimal.

Instead of the distance, one can compare its square, i.e.,

||x^i − x||² = (x^i − x)·(x^i − x)^T = x·x − 2 x·x^i + x^i·x^i.

Since the value x·x is the same for all i, the minimum of ||x^i − x||² coincides with the maximum of the decision function

d_i(x) = 2 x·x^i − x^i·x^i,

that is, x belongs to X_i if d_i(x) > d_j(x) for all j ≠ i.

Thus, the minimum-distance classifying machine is based on linear decision functions. The general structure of such a machine uses decision functions of the form

d_i(x) = w_{i1} x_1 + w_{i2} x_2 + … + w_{in} x_n + w_{i,n+1}.

It can be visually represented by the corresponding block diagram.

For a machine performing classification by minimum distance, the following equalities hold: w_{ij} = 2 x^i_j, w_{i,n+1} = −x^i·x^i.
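The equivalence between minimum-distance classification and the linear decision functions d_i(x) = 2x·x^i − x^i·x^i can be checked numerically; the reference vectors here are made up:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Illustrative reference images, one per class.
references = {"X1": (0.0, 0.0), "X2": (4.0, 0.0), "X3": (0.0, 4.0)}

def by_distance(x):
    # Direct rule: class with the minimum squared Euclidean distance.
    return min(references, key=lambda c: dot(
        [xi - ri for xi, ri in zip(x, references[c])],
        [xi - ri for xi, ri in zip(x, references[c])]))

def by_decision_function(x):
    # Equivalent rule: maximize the linear function d_i(x) = 2 x.x^i - x^i.x^i.
    return max(references,
               key=lambda c: 2 * dot(x, references[c])
                             - dot(references[c], references[c]))

for point in [(1.0, 1.0), (3.0, 0.5), (0.5, 3.0)]:
    assert by_distance(point) == by_decision_function(point)
print(by_decision_function((3.0, 0.5)))  # -> X2
```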

Equivalent recognition by the discriminant function method can be carried out by defining the discriminant functions as the differences d_ij(x) = d_i(x) − d_j(x).

An advantage of the discriminant function method is the simple structure of the recognition machine, as well as the possibility of implementing it predominantly through linear decision blocks.

Another important advantage of the discriminant function method is the ability to automatically train a machine for correct recognition based on a given (training) sample of images.

At the same time, the automatic learning algorithm turns out to be very simple in comparison with other recognition methods.

For these reasons, the discriminant function method has gained wide popularity and is very often used in practice.

Self-training procedures for pattern recognition

Let us consider methods for constructing a discriminant function from a given (training) sample, for the problem of dividing images into two classes. Given two sets of images belonging respectively to classes A and B, the solution is sought as a vector of weighting coefficients W = (w_1, w_2, …, w_n, w_{n+1}) with the property that for any image the following conditions hold:

x belongs to class A if sum_j w_j x_j + w_{n+1} > 0, j = 1, 2, …, n;

x belongs to class B if sum_j w_j x_j + w_{n+1} < 0, j = 1, 2, …, n.

If the training sample consists of N images of both classes, the task reduces to finding a vector w that ensures the validity of the system of inequalities

x_{11} w_1 + x_{21} w_2 + … + x_{n1} w_n + w_{n+1} > 0;

x_{12} w_1 + x_{22} w_2 + … + x_{n2} w_n + w_{n+1} < 0;

. . .

x_{1i} w_1 + x_{2i} w_2 + … + x_{ni} w_n + w_{n+1} > 0;

. . .

x_{1N} w_1 + x_{2N} w_2 + … + x_{nN} w_n + w_{n+1} > 0.

Here x^i = (x^i_1, x^i_2, …, x^i_n, x^i_{n+1}) is the vector of feature values of an image from the training sample; the sign > corresponds to image vectors x belonging to class A, and the sign < to image vectors x belonging to class B.

The required vector w exists if classes A and B are separable, and does not exist otherwise. The values of the components of w can be found either in advance, at a stage preceding the hardware implementation of the recognition system, or directly by the recognition system itself in the course of its operation. The latter approach provides greater flexibility and autonomy of the system. Let us consider it using the example of the device called a perceptron, invented in 1957 by the American scientist Rosenblatt. A schematic representation of the perceptron, which assigns an image to one of two classes, is presented in the following figure.

[Figure: perceptron scheme. The sensory retina S is connected to the associative retina A; the outputs x_1, x_2, …, x_n, x_{n+1} of the associative elements are summed with weights to produce the reaction R.]

The device consists of sensory elements of the retina S, which are randomly connected to the associative elements of the retina A. Each element of the second retina produces an output signal only if a sufficient number of the sensory elements connected to its input are in an excited state. The response of the whole system, R, is proportional to the sum of the reactions of the elements of the associative retina, taken with certain weights.

Denoting by x_i the reaction of the i-th associative element and by w_i the weighting coefficient of that reaction, the reaction of the system can be written as R = sum_j w_j x_j, j = 1, …, n. If R > 0, the image presented to the system belongs to class A; if R < 0, it belongs to class B. This classification procedure corresponds to the classification principles considered earlier, and the perceptron model of pattern recognition is, apart from the sensory retina, an implementation of a linear discriminant function. The principle adopted in the perceptron for forming the values x_1, x_2, …, x_n corresponds to some algorithm for generating features from the signals of the primary sensors.

In general, there may be several elements R forming the perceptron's reaction; one then speaks of a retina R of reacting elements in the perceptron.

The perceptron scheme can be extended to the case of more than two classes by increasing the number of elements of the retina R to the number of distinguishable classes and introducing a block for determining the maximum reaction, in accordance with the scheme in the figure above. In this case, the image is assigned to the class with number i if R_i > R_j for all j.

The perceptron training process consists of selecting the values of the weighting coefficients w_j so that the output signal corresponds to the class to which the recognized image belongs.

Let us consider the perceptron algorithm using the example of recognizing objects of two classes, A and B. Objects of class A should produce the value R = +1, and objects of class B the value R = −1.

The learning algorithm is as follows.

If the next image x belongs to class A but R < 0 (a recognition error occurs), then the coefficients w_j whose indices correspond to values x_j > 0 are increased by some amount dw, and the remaining coefficients w_j are decreased by dw. The value of the reaction R thereby receives an increment toward positive values, corresponding to correct classification.

If x belongs to class B but R > 0 (a recognition error occurs), then the coefficients w_j whose indices correspond to values x_j < 0 are increased by dw, and the remaining coefficients w_j are decreased by the same amount. The value of the reaction R thereby receives an increment toward negative values, corresponding to correct classification.

Thus, the algorithm changes the weight vector w if and only if the image presented at the k-th training step was incorrectly classified at that step, and leaves w unchanged otherwise. A proof of the convergence of this algorithm is given in [Tu, Gonzalez]. With a proper choice of dw and linearly separable image classes, such training eventually leads to a vector w that ensures correct classification.
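A sketch of this training procedure for bipolar features (x_j in {−1, +1}), for which the rule described above reduces to adding dw·x to w on a class-A error and subtracting it on a class-B error; the toy sample and function names are invented:

```python
def train_perceptron(samples, dw=0.5, epochs=100):
    """samples: list of (features, target) with bipolar features in {-1, +1}
    and target +1 (class A) or -1 (class B). Each feature vector is augmented
    with a constant +1 component playing the role of x_{n+1}."""
    n = len(samples[0][0]) + 1
    w = [0.0] * n
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            xa = list(x) + [1.0]                       # augmented vector
            r = sum(wi * xi for wi, xi in zip(w, xa))  # reaction R
            predicted = 1 if r > 0 else -1
            if predicted != target:                    # error: shift R toward target
                w = [wi + dw * target * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:                                # converged: no mistakes made
            break
    return w

# Linearly separable toy sample: class A has a positive first feature.
samples = [((1, 1), 1), ((1, -1), 1), ((-1, 1), -1), ((-1, -1), -1)]
w = train_perceptron(samples)
for x, target in samples:
    r = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    assert (1 if r > 0 else -1) == target
print("trained weights:", w)
```

As the convergence theorem cited above requires, the loop terminates only once every training image is classified correctly.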

Statistical recognition methods.

Statistical methods are based on minimizing the probability of classification error. The probability P of incorrect classification of an image submitted for recognition and described by a feature vector x is determined by the formula

P = sum_i p(i)·prob(D(x) ≠ i | x belongs to class i), i = 1, …, m,

where m is the number of classes;

p(i) = prob(x belongs to class i) is the a priori probability that an arbitrary image x belongs to the i-th class (the frequency of appearance of images of the i-th class);

D(x) is the function that makes the classification decision (it maps a feature vector x to a class number i from the set {1, 2, …, m});

prob(D(x) ≠ i | x belongs to class i) is the probability of the event "D(x) ≠ i" given that x belongs to class i, i.e., the probability that the function D(x) makes an erroneous decision for a given x belonging to the i-th class.

It can be shown that the probability of misclassification reaches its minimum if D(x) = i exactly when p(x|i)·p(i) > p(x|j)·p(j) for all j ≠ i, where p(x|i) is the distribution density of the images of the i-th class in the feature space.

According to the above rule, the point x is assigned to the class corresponding to the maximum value of p(i)·p(x|i), i.e., the product of the a priori probability (frequency) of appearance of images of the i-th class and the density of the class-i image distribution in the feature space. This classification rule is called Bayesian, since it follows from Bayes' formula, well known in probability theory.

Example. Suppose it is necessary to recognize discrete signals at the output of an information channel subject to noise.

Each input signal is a 0 or a 1. As a result of transmission, the value x appears at the channel output; it is the signal with superimposed Gaussian noise of zero mean and standard deviation σ.

To synthesize a classifier performing the signal recognition, we use the Bayesian classification rule.

We combine the signals representing ones into class No. 1 and the signals representing zeros into class No. 2. Let it be known in advance that, on average, out of every 1000 signals, a signals represent ones and b signals represent zeros. Then the a priori probabilities of the appearance of signals of the 1st and 2nd classes (ones and zeros), respectively, can be taken equal to

p(1) = a/1000, p(2) = b/1000.

Since the noise is Gaussian, i.e., obeys the normal (Gaussian) distribution law, the density of the class-1 image distribution as a function of x, that is, the probability of obtaining the output value x when the signal 1 is applied at the input, is given by

p(x|1) = (2πσ²)^(-1/2) exp(-(x - 1)²/(2σ²)),

and the density of the class-2 image distribution as a function of x, i.e., the probability of obtaining the output value x when the signal 0 is applied at the input, is given by

p(x|2) = (2πσ²)^(-1/2) exp(-x²/(2σ²)).

Applying the Bayesian decision rule leads to the conclusion that a class-2 signal, i.e., a zero, was transmitted if

p(2)·p(x|2) > p(1)·p(x|1),

or, written out, if

b·exp(-x²/(2σ²)) > a·exp(-(x - 1)²/(2σ²)).

Dividing the left side of the inequality by the right side, we obtain

(b/a)·exp((1 - 2x)/(2σ²)) > 1,

whence, after taking logarithms, we find

1 - 2x > 2σ² ln(a/b),

x < 0.5 - σ² ln(a/b).

From the resulting inequality it follows that for a = b, i.e., with equal a priori probabilities of the signals 0 and 1, the image is assigned the value 0 when x < 0.5 and the value 1 when x > 0.5.

If it is known in advance that one of the signals appears more often and the other less often, i.e., for unequal a and b, the response threshold of the classifier shifts in one direction or the other.

Thus, for a/b = 2.71 (corresponding to 2.71 times more frequent transmission of ones) and σ² = 0.1, the image is assigned the value 0 if x < 0.4 and the value 1 if x > 0.4. If information about the a priori probabilities is not available, statistical recognition methods based on classification rules other than the Bayesian one can be used.

In practice, however, methods based on the Bayes rule are the most common, owing to their higher efficiency and to the fact that in most pattern recognition problems it is possible to specify the a priori probabilities of appearance of the images of each class.
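The threshold computation in this example can be checked numerically. A small sketch (the function name is illustrative), treating the prior ratio 2.71 as e so that the threshold comes out exactly 0.4:

```python
import math

def bayes_threshold(prior_ratio, noise_var):
    """Decision threshold for distinguishing Gaussian-corrupted 0/1 signals:
    decide '0' when the output x is below the returned value.
    prior_ratio = p(one)/p(zero); noise_var = variance of the channel noise."""
    return 0.5 - noise_var * math.log(prior_ratio)

print(round(bayes_threshold(1.0, 0.1), 3))     # equal priors -> 0.5
print(round(bayes_threshold(math.e, 0.1), 3))  # ones e times more frequent -> 0.4
```

With more frequent ones the threshold moves down, so fewer outputs are read as zeros, exactly as described above.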

Linguistic methods of pattern recognition.

Linguistic methods of pattern recognition are based on the analysis of the description of an idealized image, presented in the form of a graph or a chain of characters, which is a phrase or sentence of a certain language.

Consider the idealized images of letters obtained as a result of the first stage of linguistic recognition described above. These idealized images can be specified by graph descriptions, presented, for example, in the form of connection matrices, as was done in the example discussed above. The same description can also be represented by a phrase (expression) of a formal language.

Example. Let three images of the letter A be given, obtained as a result of preliminary image processing. Let's denote these images by identifiers A1, A2 and A3.

To describe the presented images linguistically, we will use PDL (Picture Description Language). The PDL vocabulary includes the following symbols:

1. Names of the simplest images (primitives). As applied to the case under consideration, the primitives and their corresponding names are as follows.

Images in the form of a line directed:

up and to the left (Left), straight up (North), up and to the right (Right), horizontally to the right (East).

Names: L, N, R, E.

2. Symbols of binary operations (+, *, -). Their meanings correspond to the sequential connection of primitives (+), the connection of the beginnings and the ends of primitives (*), and the connection of the ends of primitives only (-).

3. Right and left parentheses ((, )). Parentheses define the order of operations in an expression.

The considered images A1, A2 and A3 are described in the PDL language, respectively, by the following expressions.

T(1)=R+((R-(L+N))*E)-L

T(2)=(R+N)+((N+R)-L)*E-L

T(3)=(N+R)+(R-L)*E-(L+N)

After the linguistic description of an image has been constructed, it is necessary, using some recognition procedure, to analyze whether or not the image belongs to the class of interest to us (the class of letters A), i.e. whether or not the image has the corresponding structure. To do this, it is first of all necessary to describe the class of images that have the structure of interest.

Obviously, the letter A always contains the following structural elements: a left leg, a right leg, and a head. Let's call these elements STL, STR, TR, respectively.

Then, in the PDL language, the class of symbols A (SIMB A) is described by the expression

SIMB A = STL + TR - STR

The left "leg" STL is always a chain of the elements R and N, which can be written as

STL ‑> R ¦ N ¦ (STL + R)¦ (STL + N)

(STL is the character R or N, or a string obtained by adding the characters R or N to the source STL string)

The right "leg" STR is always a chain of the elements L and N, which can be written as

STR ‑> L¦N¦ (STR + L)¦(STR + N)

The head of the letter, TR, is a closed contour made up of the element E and chains of the STL and STR types.

In PDL, the TR structure is described by the expression

TR ‑> (STL - STR) * E

We finally get the following description of the letter class A:

SIMB A ‑> (STL + TR - STR),

STL ‑> R¦N¦ (STL + R)¦(STL + N)

STR ‑> L¦N¦ (STR + L)¦(STR + N)

TR ‑> (STL - STR) * E

The recognition procedure in this case can be implemented as follows.

1. The expression corresponding to the image is compared with the reference structure STL + TR - STR.

2. Each element of the structure (STL, TR, STR) is matched, where possible, i.e. when the image description is comparable with the reference, to some subexpression of the expression T(A). For example:

for A1: STL=R, STR=L, TR=(R-(L+N))*E

for A2: STL = R + N, STR = L, TR = ((N + R) - L) * E

for A3: STL = N + R, STR = L + N, TR = (R - L) * E

3. The expressions STL, STR, and TR are compared with their corresponding reference structures.

4. If the structure of each expression STL, STR, TR matches the reference, it is concluded that the image belongs to the letter class A. If at any of these stages a discrepancy between the structure of the analyzed expression and the reference is detected, it is concluded that the image does not belong to the class SIMB A. The comparison of expression structures can be carried out using the algorithmic languages LISP, PLANNER, PROLOG and other similar artificial intelligence languages.

In the example under consideration, all STL chains are composed of the symbols N and R, and all STR chains of the symbols L and N, which corresponds to the given structure of these chains. The structure of TR in the images under consideration also corresponds to the reference, since it consists of the "difference" of chains of the STL and STR types, "multiplied" by the symbol E.

Thus, we come to the conclusion that the images under consideration belong to the class SIMB A.
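The chain-checking part of this procedure can be sketched as follows. This is a minimal illustrative sketch, not a full PDL parser: it assumes the expression has already been decomposed into its STL, TR, and STR subexpressions (as in step 2), represents chains as lists of primitive names, and all function names are hypothetical.

```python
def is_stl(chain):
    """STL -> R | N | (STL + R) | (STL + N): a non-empty chain over {R, N}."""
    return len(chain) > 0 and all(c in ("R", "N") for c in chain)

def is_str(chain):
    """STR -> L | N | (STR + L) | (STR + N): a non-empty chain over {L, N}."""
    return len(chain) > 0 and all(c in ("L", "N") for c in chain)

def is_tr(stl_part, str_part):
    """TR -> (STL - STR) * E: the head is valid when both chains are valid."""
    return is_stl(stl_part) and is_str(str_part)

def is_simb_a(stl, str_chain, tr_stl, tr_str):
    """SIMB A -> STL + TR - STR: steps 2-4 of the recognition procedure."""
    return is_stl(stl) and is_tr(tr_stl, tr_str) and is_str(str_chain)
```

For image A2 of the example, STL = R + N, STR = L, and TR = ((N + R) - L) * E, so is_simb_a(["R", "N"], ["L"], ["N", "R"], ["L"]) returns True.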


Synthesis of a fuzzy controller for a DC electric drive in the MatLab environment

Synthesis of a fuzzy controller with one input and output.

The challenge is getting the drive to follow various input signals accurately. The control action is developed by a fuzzy controller, in which the following functional blocks can be structurally distinguished: a fuzzifier, a rule block, and a defuzzifier.

Fig.4 Generalized functional diagram of a system with two linguistic variables.

Fig.5 Schematic diagram of a fuzzy controller with two linguistic variables.

The fuzzy control algorithm in the general case is a transformation of the input variables of a fuzzy controller into its output variables using the following interrelated procedures:

1. transformation of input physical variables received from measuring sensors from the control object into input linguistic variables of a fuzzy controller;

2. processing logical statements, called linguistic rules, regarding the input and output linguistic variables of the controller;

3. transformation of the output linguistic variables of the fuzzy controller into physical control variables.

Let us first consider the simplest case, when only two linguistic variables are introduced to control the servo drive:

“angle” is an input variable;

“control action” is the output variable.

We will synthesize the controller in the MatLab environment using the "Fuzzy Logic" toolbox. It allows you to create fuzzy inference and fuzzy classification systems within the MatLab environment, with the ability to integrate them into Simulink. The basic concept of the Fuzzy Logic Toolbox is the FIS structure, a fuzzy inference system (Fuzzy Inference System). The FIS structure contains all the data necessary to implement the input-output functional mapping based on fuzzy logical inference according to the diagram shown in Fig. 6.


Figure 6. Fuzzy inference.

X is the crisp input vector; the fuzzifier maps it to a vector of fuzzy sets corresponding to X; logical inference produces the result in the form of another vector of fuzzy sets; Y is the crisp output vector.

The fuzzy module allows you to build fuzzy systems of two types: Mamdani and Sugeno. In Mamdani-type systems, the knowledge base consists of rules of the form "If x1 = low and x2 = medium, then y = high". In Sugeno-type systems, the knowledge base consists of rules of the form "If x1 = low and x2 = medium, then y = a0 + a1·x1 + a2·x2". Thus, the main difference between the Mamdani and Sugeno systems lies in how the values of the output variable are specified in the rules that form the knowledge base: in Mamdani-type systems the output values are specified by fuzzy terms, while in Sugeno-type systems they are specified as a linear combination of the input variables. In our case, we will use the Sugeno system, because it lends itself better to optimization.
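The Sugeno scheme just described can be sketched in a few lines. This is an illustrative sketch, not the toolbox API: the triangular membership function, the product as the AND operation, and all names are assumptions made for the example.

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def sugeno(x1, x2, rules):
    """First-order Sugeno inference. Each rule is a tuple
    (mf1, mf2, (a0, a1, a2)): it fires with weight mf1(x1)*mf2(x2),
    and the crisp output is the weighted average of a0 + a1*x1 + a2*x2."""
    num = den = 0.0
    for mf1, mf2, (a0, a1, a2) in rules:
        w = mf1(x1) * mf2(x2)          # degree of rule fulfillment
        num += w * (a0 + a1 * x1 + a2 * x2)
        den += w
    return num / den if den > 0 else 0.0
```

A single rule fully fired at x1 = x2 = 0.5 with conclusion y = x1 + x2 returns 1.0, the weighted average collapsing to the one conclusion.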

To control the servo drive, two linguistic variables are introduced: “error” (by position) and “control action”. The first of them is the input, the second is the output. Let us define a term set for the specified variables.

Basic components of fuzzy logical inference. Fuzzifier.

For each linguistic variable, we define a basic term set comprising five fuzzy sets, which can be designated: negative high, negative low, zero, positive low, positive high.

First of all, let us subjectively define what is meant by the terms "big error", "small error", etc., by specifying membership functions for the corresponding fuzzy sets. At this point one can only be guided by the required accuracy, the known parameters of the class of input signals, and common sense; no one has yet proposed a strict algorithm for choosing the parameters of membership functions. In our case, the linguistic variable "error" will look as follows.

Fig.7. Linguistic variable "error".

It is more convenient to present the linguistic variable “control” in the form of a table:

Table 1

Rule block.

Let's consider the sequence of defining several rules that describe some situations:

Suppose, for example, that the output angle is equal to the input signal (ie, the error is zero). Obviously, this is the desired situation, and therefore we do not have to do anything (the control action is zero).

Now consider another case: the position error is much greater than zero. Naturally, we must compensate for it by generating a large positive control signal.

Thus, two rules have been drawn up, which can be formally written as follows:

IF error = zero, THEN control action = zero.

IF error = large positive, THEN control action = large positive.

Fig.8. Formation of control with a small positive error in position.

Fig.9. Formation of control with zero position error.

The table below shows all the rules corresponding to all situations for this simple case.

Table 2

In total, for a fuzzy controller with n inputs and one output, up to N1·N2·…·Nn control rules can be defined, where Ni is the number of fuzzy sets for the i-th input; however, for normal functioning of the controller it is not necessary to use all possible rules, and one can get by with fewer. In our case, all 5 possible rules are used to generate the fuzzy control signal.

Defuzzifier.

Thus, the resulting action U is determined by the rule that is fulfilled. If several rules fire at once, the resulting action U is found from the relationship

U = Σ un·μUn(un) / Σ μUn(un), summed over the n triggered rules (defuzzification by the region-center method), where un is the physical value of the control signal corresponding to each of the fuzzy sets UBO, UMo, UZ, UMp, UBP, and μUn(u) is the degree of membership of the control signal u in the corresponding fuzzy set Un ∈ {UBO, UMo, UZ, UMp, UBP}. There are also other defuzzification methods, in which the output linguistic variable is proportional to the "strongest" or "weakest" rule.
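The relationship above can be sketched as a discrete center-of-region computation. The list-of-pairs input format and the function name are assumptions made for illustration.

```python
def defuzzify_centroid(fired):
    """Region-center defuzzification over the triggered rules.

    fired: list of (mu, u) pairs, where u is the physical control value
    of a fuzzy set and mu its degree of membership.  Implements
    U = sum(mu_n * u_n) / sum(mu_n)."""
    num = sum(mu * u for mu, u in fired)
    den = sum(mu for mu, _ in fired)
    return num / den if den > 0 else 0.0
```

Two rules fired equally at control values 10 and 20 yield the midpoint 15, as expected from the weighted average.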

Let us model the process of controlling an electric drive using the fuzzy controller described above.

Fig. 10. Block diagram of the system in the Matlab environment.

Fig. 11. Block diagram of the fuzzy controller in the Matlab environment.

Fig. 12. Transient process under a single step action.

Fig. 13. Transient process under harmonic input action for a model with a fuzzy controller containing one input linguistic variable.

Analysis of the characteristics of the drive with the synthesized control algorithm shows that they are far from optimal and worse than those obtained when control is synthesized by other methods (the settling time under a unit step action is too long, and the error under a harmonic action is large). This is explained by the fact that the parameters of the membership functions were chosen quite arbitrarily, and only the position error was used as the controller input. Naturally, there can be no talk of any optimality of the resulting controller. Therefore, the task of optimizing the fuzzy controller becomes relevant in order to achieve the highest possible indicators of control quality. That is, the task is to optimize the objective function f(a1, a2, …, an), where a1, a2, …, an are the coefficients that determine the type and characteristics of the fuzzy controller. To optimize the fuzzy controller, we will use the ANFIS block from the Matlab environment. In addition, one way to improve the characteristics of the controller is to increase the number of its inputs: this makes the controller more flexible and improves its performance. Let us add one more input linguistic variable, the rate of change of the input signal (its derivative). The number of rules will increase accordingly. The schematic diagram of the controller then takes the form:

Fig. 14 Schematic diagram of a fuzzy controller with three linguistic variables.

Let the second input variable be the rate of change of the input signal. For it we define the basic term set Tn as:

Tn=("negative (BO)", "zero (Z)", "positive (BP)").

The location of the membership functions for all linguistic variables is shown in the figure.

Fig. 15. Membership functions of the linguistic variable “error”.

Fig. 16. Membership functions of the linguistic variable “input signal speed”.

Due to the addition of one more linguistic variable, the number of rules will increase to 3x5=15. The principle of their compilation is completely similar to that discussed above. All of them are shown in the following table:

Table 3

Fuzzy control signal (columns: position error; rows: speed)

For example: IF error = zero AND derivative of the input signal = positive, THEN control action = small negative.

Fig. 17. Formation of control under three linguistic variables.

Due to the increase in the number of inputs and, accordingly, the rules themselves, the structure of the fuzzy controller will become more complex.

Fig. 18. Block diagram of a fuzzy controller with two inputs.


Fig.20. Transient process under harmonic input action for a model with a fuzzy controller containing two input linguistic variables.

Fig. 21. Error signal under harmonic input action for a model with a fuzzy controller containing two input linguistic variables.

Let us simulate the operation of the fuzzy controller with two inputs in the Matlab environment, using the same block diagram of the model as before. From the graph of the transient process under harmonic input action, it can be seen that the accuracy of the system has increased significantly, but its oscillation has also increased, especially where the derivative of the output coordinate approaches zero. Obviously, the reasons for this, as mentioned above, are the suboptimal choice of the membership function parameters for both the input and output linguistic variables. Therefore, we optimize the fuzzy controller using the ANFISedit block in the Matlab environment.

Optimization of a fuzzy controller.

Let us consider the use of genetic algorithms to optimize the fuzzy controller. Genetic algorithms are adaptive search methods that have recently been used frequently to solve functional optimization problems. They are based on an analogy with the genetic processes of biological organisms: biological populations develop over several generations, obeying the laws of natural selection and the principle of "survival of the fittest" discovered by Charles Darwin. By imitating this process, genetic algorithms are capable of "evolving" solutions to real problems if those solutions are suitably encoded.

Genetic algorithms work with a collection of “individuals”—a population—each of which represents a possible solution to a given problem. Each individual is assessed by the measure of its “adaptability” according to how “good” the solution to the problem corresponding to it is. The fittest individuals are able to “reproduce” offspring by “crossbreeding” with other individuals in the population. This leads to the emergence of new individuals that combine some of the characteristics they inherit from their parents. The least fit individuals are less likely to reproduce, so whatever traits they possessed will gradually disappear from the population.

In this way, an entire new population of feasible solutions is reproduced by selecting the best representatives of the previous generation, crossing them, and obtaining many new individuals. This new generation contains a higher proportion of the characteristics possessed by the good members of the previous generation. Thus, from generation to generation, good characteristics spread throughout the population. Ultimately, the population converges to an optimal solution to the problem.

There are many ways to implement the idea of biological evolution within the framework of genetic algorithms. The traditional one can be represented as the block diagram shown in Figure 22, where:

1. Initialization of the initial population: generation of a given number of candidate solutions to the problem, from which the optimization process begins;

2. Application of crossover and mutation operators;

3. Stopping conditions: usually the optimization process continues until a solution of a given accuracy is found, or until it is determined that the process has converged (i.e., the solution has not improved over the last N generations).
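The three steps above can be sketched as a minimal real-coded genetic algorithm. All names, operator choices (tournament selection, arithmetic crossover, Gaussian mutation), and parameter values are illustrative assumptions; here the stopping condition is simply a fixed generation budget.

```python
import random

def genetic_minimize(f, bounds, pop_size=30, generations=60, seed=1):
    """Minimal real-coded genetic algorithm for minimizing f on an interval."""
    rng = random.Random(seed)
    lo, hi = bounds
    # 1. Initialization: a given number of candidate solutions.
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if f(a) < f(b) else b  # the fitter individual wins
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            # 2. Crossover (arithmetic) and mutation (Gaussian).
            alpha = rng.random()
            child = alpha * p1 + (1.0 - alpha) * p2
            child += rng.gauss(0.0, 0.02 * (hi - lo))
            children.append(min(max(child, lo), hi))
        pop = children
    # 3. Stopping: fixed generation budget; return the best individual found.
    return min(pop, key=f)

# Minimize (x - 2)^2 on [-10, 10]; the population should converge near x = 2.
best = genetic_minimize(lambda x: (x - 2.0) ** 2, (-10.0, 10.0))
```
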

In the Matlab environment, genetic algorithms are represented by a separate toolbox, as well as by the ANFIS package. ANFIS is an abbreviation of Adaptive-Network-Based Fuzzy Inference System. ANFIS is one of the first variants of hybrid neuro-fuzzy networks, a special type of feedforward neural network. The architecture of a neuro-fuzzy network is isomorphic to a fuzzy knowledge base. Neuro-fuzzy networks use differentiable implementations of triangular norms (multiplication and probabilistic OR), as well as smooth membership functions. This makes it possible to tune neuro-fuzzy networks with fast algorithms based on error backpropagation, as well as with genetic algorithms. The architecture and operating rules of each layer of the ANFIS network are described below.

ANFIS implements the Sugeno fuzzy inference system as a five-layer feedforward neural network. The purpose of the layers is as follows: the first layer is the terms of the input variables; the second layer - antecedents (premises) of fuzzy rules; the third layer is the normalization of the degrees of compliance with the rules; the fourth layer is the conclusion of the rules; fifth layer - aggregation of the result obtained from various rules.

Network inputs are not allocated to a separate layer. Figure 23 shows an ANFIS network with one input variable (“error”) and five fuzzy rules. For linguistic evaluation of the input variable “error”, 5 terms are used.


Fig. 23. Structure of the ANFIS network.

Let us introduce the notation needed for the further presentation:

x1, …, xn are the network inputs;

y is the network output;

Rr is the fuzzy rule with index r;

m is the number of rules, r = 1, …, m;

Aj,r is the fuzzy term with membership function μj,r(xj), used for the linguistic evaluation of the variable xj in the r-th rule (j = 1, …, n; r = 1, …, m);

b0,r, b1,r, …, bn,r are the real numbers in the conclusion of the r-th rule (r = 1, …, m).

The ANFIS network operates as follows.

Layer 1. Each node of the first layer represents one term with a bell-shaped membership function. The network inputs are connected only to their own terms. The number of nodes in the first layer equals the sum of the cardinalities of the term sets of the input variables. The output of a node is the degree to which the value of the input variable belongs to the corresponding fuzzy term:

μ(x) = 1 / (1 + |(x - c)/a|^(2b)),

where a, b and c are the tunable parameters of the membership function.

Layer 2. The number of nodes in the second layer is m. Each node of this layer corresponds to one fuzzy rule. A node of the second layer is connected to those nodes of the first layer that form the antecedent of the corresponding rule. Therefore, each node of the second layer can receive from 1 to n input signals. The output of the node is the degree of fulfillment of the rule, which is calculated as the product of the input signals. Let us denote the outputs of the nodes of this layer by τr, r = 1, …, m.

Layer 3. The number of nodes in the third layer is also m. Each node of this layer calculates the relative degree of fulfillment of the fuzzy rule:

τ*r = τr / (τ1 + τ2 + … + τm).

Layer 4. The number of nodes in the fourth layer is also m. Each node is connected to one node of the third layer as well as to all network inputs (the connections to the inputs are not shown in Fig. 23). The fourth-layer node calculates the contribution of one fuzzy rule to the network output:

yr = τ*r · (b0,r + b1,r·x1 + … + bn,r·xn).

Layer 5. The single node of this layer sums the contributions of all rules:

y = y1 + y2 + … + ym.

Typical neural network training procedures can be used to tune the ANFIS network, since it uses only differentiable functions. Usually a combination of gradient descent in the form of error backpropagation and least squares is used. The backpropagation algorithm tunes the parameters of the rule antecedents, i.e. the membership functions. The coefficients of the rule conclusions are estimated by the least squares method, since they are linearly related to the network output. Each iteration of the tuning procedure is performed in two steps. In the first step, a training sample is fed to the inputs, and the optimal parameters of the fourth-layer nodes are found by the iterative least squares method from the discrepancy between the desired and actual behavior of the network. In the second step, the residual is propagated from the network output back to the inputs, and the parameters of the first-layer nodes are modified by the backpropagation method. The rule-conclusion coefficients found in the first step do not change. The iterative tuning procedure continues until the residual falls below a predetermined value. To tune the membership functions, optimization algorithms other than backpropagation can also be used, for example the Levenberg-Marquardt method.
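The five layers can be sketched for the one-input case of Fig. 23. This is an illustrative forward pass only (no training): the generalized bell membership function matches the layer 1 description, while the function names and parameter layout are assumptions.

```python
def gbellmf(x, a, b, c):
    """Layer 1: generalized bell membership function
    mu(x) = 1 / (1 + |(x - c)/a|**(2*b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(x, mf_params, conclusions):
    """ANFIS forward pass for one input and m rules.

    mf_params: per-rule (a, b, c) of the bell membership function.
    conclusions: per-rule (b0, b1) of the linear conclusion b0 + b1*x."""
    # Layers 1-2: with one input, the membership degree is also the
    # rule firing strength (the product over antecedents has one factor).
    tau = [gbellmf(x, a, b, c) for a, b, c in mf_params]
    # Layer 3: normalize the firing strengths.
    total = sum(tau)
    tau_bar = [t / total for t in tau]
    # Layer 4: contribution of each rule to the output.
    contrib = [tb * (b0 + b1 * x)
               for tb, (b0, b1) in zip(tau_bar, conclusions)]
    # Layer 5: aggregate the contributions.
    return sum(contrib)
```

With two symmetric terms centered at -1 and +1 and constant conclusions -1 and +1, the output is 0 at x = 0 and positive for x > 0, as the normalization of layer 3 requires.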

Fig. 24. The ANFISedit workspace.

Let us now try to optimize the fuzzy controller for a single step action. The desired transient process has approximately the following form:

Fig.25. Desired transition process.

From the graph shown in Fig. 25 it follows that most of the time the motor must run at full power to ensure maximum speed, and when approaching the desired value it must slow down smoothly. Guided by these simple considerations, we take the following sample of values, presented below in table form, as the training sample:

Table 4


Error value | Control value


Fig.26. Type of training sample.

We will conduct training in 100 steps. This is more than enough for the convergence of the method used.

Fig.27. The process of training a neural network.

During the learning process, the parameters of the membership functions are formed in such a way that, for a given error value, the controller produces the required control. Between the nodal points, the dependence of the control on the error is an interpolation of the table data; the interpolation method depends on how the neural network is trained. In fact, after training, the fuzzy controller model can be represented as a nonlinear function of one variable, whose graph is presented below.

Fig.28. Graph of control versus position error inside the controller.

Having saved the found parameters of the membership functions, we simulate the system with an optimized fuzzy controller.


Fig. 29. Transient process under harmonic input action for a model with an optimized fuzzy controller containing one input linguistic variable.

Fig.30. Error signal under harmonic input action for a model with a fuzzy controller containing two input linguistic variables.


It follows from the graphs that the optimization of the fuzzy controller by neural network training was successful: the oscillation and the magnitude of the error were significantly reduced. The use of a neural network is therefore well justified for optimizing controllers whose operating principle is based on fuzzy logic. However, even the optimized controller cannot satisfy the accuracy requirements, so it is advisable to consider another control method, in which the fuzzy controller does not control the object directly but combines several control laws depending on the current situation.

Such problems are solved quite often, for example, when crossing a street according to the traffic lights. Recognizing the color of the lit traffic light and knowing the rules of the road allows one to make the right decision about whether or not the street can be crossed at the moment.

Creating artificial pattern recognition systems remains a complex theoretical and technical problem. The need for such recognition arises in a variety of areas, from military affairs and security systems to the digitization of all kinds of analog signals.

Traditionally, pattern recognition tasks are included in the range of artificial intelligence tasks.

Directions in pattern recognition

Two main directions can be distinguished:

  • The study of the recognition abilities that living things possess, the explanation and modeling of them;
  • Development of theory and methods for constructing devices designed to solve individual problems for applied purposes.

Formal statement of the problem

Pattern recognition is the assignment of source data to a certain class by identifying significant features that characterize this data from the total mass of unimportant data.

When formulating recognition problems, one tries to use mathematical language, attempting, in contrast to the theory of artificial neural networks, where the basis is obtaining a result by experiment, to replace experiment with logical reasoning and mathematical proof.

Classic formulation of the pattern recognition problem: given a set of objects, a classification must be performed on them. The set is represented by subsets, called classes. Given: information about the classes, a description of the entire set, and the description of an object whose membership in a particular class is unknown. It is required, from the available information about the classes and the description of the object, to determine the class to which the object belongs.
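As an illustration of this formulation, here is a minimal nearest-class-mean classifier: the class information is summarized by a mean feature vector per class, and an unknown object is assigned to the class with the nearest mean. The approach and all names are illustrative assumptions, not a method prescribed by the text.

```python
def class_means(training):
    """training: dict mapping class label -> list of feature vectors.
    Returns the mean feature vector of each class."""
    means = {}
    for label, vectors in training.items():
        n = len(vectors)
        means[label] = [sum(v[i] for v in vectors) / n
                        for i in range(len(vectors[0]))]
    return means

def assign_class(x, means):
    """Assign object x to the class with the nearest mean
    (squared Euclidean distance in feature space)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(means, key=lambda label: dist2(x, means[label]))
```
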

Monochrome images are most often considered in pattern recognition problems, which makes it possible to treat an image as a function on the plane. If we consider the set of points of the plane, with a function expressing a characteristic of the image at each point (brightness, transparency, optical density), then such a function is a formal record of the image.

The set of all possible functions on the plane is a model of the set of all images. By introducing the concept of similarity between images, one can pose the recognition problem. The specific form of such a statement strongly depends on the subsequent stages of recognition in a particular approach.

Some graphic pattern recognition methods

For optical pattern recognition, one can apply the method of searching through the appearance of an object at various angles, scales, offsets, etc. For letters, one needs to search through fonts, font properties, etc.

The second approach is to find the outline of the object and examine its properties (connectivity, presence of corners, etc.)

Another approach is to use artificial neural networks. This method requires either a large number of examples of the recognition task (with correct answers), or a special neural network structure that takes into account the specifics of this task.

Perceptron as a pattern recognition method

F. Rosenblatt, introducing the concept of a brain model whose task is to show how psychological phenomena can arise in a physical system with known structure and functional properties, described the simplest discrimination experiments. These experiments relate entirely to pattern recognition methods, but differ in that the solution algorithm is not deterministic.

The simplest experiment from which one can obtain psychologically significant information about a certain system boils down to the fact that the model is presented with two different stimuli and is required to respond to them in different ways. The purpose of such an experiment may be to study the possibility of their spontaneous discrimination by the system in the absence of intervention on the part of the experimenter, or, conversely, to study forced discrimination, in which the experimenter seeks to train the system to carry out the required classification.

In an experiment with perceptron training, a certain sequence of images is usually presented, which includes representatives of each of the classes to be distinguished. According to some rule of memory modification, the correct choice of response is reinforced. The perceptron is then presented with a control stimulus and the probability of obtaining the correct response for stimuli of a given class is determined. Depending on whether the selected control stimulus coincides or does not coincide with one of the images that were used in the training sequence, different results are obtained:

  1. If the control stimulus does not coincide with any of the training stimuli, then the experiment involves not only pure discrimination but also elements of generalization.
  2. If the control stimulus excites a set of sensory elements completely different from those activated by previously presented stimuli of the same class, then the experiment is a study of pure generalization.

Perceptrons do not have the capacity for pure generalization, but they function quite satisfactorily in discrimination experiments, especially if the control stimulus matches closely enough to one of the images with which the perceptron has already accumulated some experience.

Examples of pattern recognition problems

  • Barcode recognition
  • License plate recognition
  • Image recognition
  • Recognition of local areas of the earth's crust in which mineral deposits are located


Currently there are many tasks in which a decision must be made depending on whether an object is present in an image, or in which the object must be classified. The ability to "recognize" is considered a basic property of biological creatures, while computer systems do not fully possess this ability.

Let's consider the common elements of classification models.

Class - a set of objects having common properties. Objects of the same class are assumed to show “similarity”. For a recognition task an arbitrary number of classes greater than 1 can be defined; the number of classes is denoted by S. Each class has its own identifying class label.

Classification - the process of assigning class labels to objects according to some description of the properties of these objects. A classifier is a device that receives a set of object features as input and produces a class label as output.

Verification - the process of matching an object instance against a single object model or class description.

By an image (pattern) we will understand the name of a region in the feature space onto which a set of objects or phenomena of the material world is mapped. Feature - a quantitative description of a particular property of the object or phenomenon being studied.

Feature space - an N-dimensional space defined for a given recognition task, where N is the fixed number of features measured for any object. A vector x from the feature space corresponding to an object of the recognition task is an N-dimensional vector with components (x_1, x_2, …, x_N), which are the values of the features for that object.

In other words, pattern recognition can be defined as the assignment of source data to a certain class by identifying significant features or properties that characterize this data from the total mass of unimportant details.

Examples of classification problems are:

  • character recognition;
  • speech recognition;
  • establishing a medical diagnosis;
  • weather forecast;
  • face recognition;
  • classification of documents, etc.

Most often the source material is an image received from a camera. The problem can be formulated as obtaining feature vectors for each class present in the image under consideration. The process can be viewed as encoding: assigning to each feature a value from the feature space for each class.

Consider two classes of objects: adults and children. Height and weight can be chosen as features. As the figure shows, these two classes form two disjoint sets, which can be explained by the chosen features. However, it is not always possible to choose measured parameters that work as class features: for example, these same parameters are not suitable for producing disjoint classes of football players and basketball players.

The second task of recognition is to extract characteristic features or properties from the source images; this task can be classified as preprocessing. In speech recognition, for instance, we can distinguish features such as vowels and consonants. A feature must be a characteristic property of a particular class and, at the same time, common to all members of that class. Features that characterize the differences between classes are inter-class features. Features common to all classes carry no useful information and are not considered in the recognition task. Feature selection is one of the important tasks in building a recognition system.

Once the features have been determined, the optimal decision procedure for classification must be determined. Consider a pattern recognition system designed to recognize M different classes, denoted m_1, m_2, …, m_M. We can then assume that the image space consists of M regions, each containing points corresponding to images from one class. The recognition problem can then be viewed as constructing the boundaries separating the M classes based on the adopted measurement vectors.

Solving the problems of image preprocessing, feature extraction, and optimal decision-making and classification is usually associated with the need to estimate a number of parameters, which leads to the problem of parameter estimation. In addition, it is obvious that feature extraction can use additional information about the nature of the classes.

Objects can be compared based on their representation as measurement vectors. It is convenient to represent measurement data as real numbers. Then the similarity of the feature vectors of two objects can be described using the Euclidean distance

ρ(x, y) = sqrt( Σ_{i=1}^{d} (x_i - y_i)^2 ),

where d is the dimension of the feature vector.
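As a sketch, the distance above can be computed directly; this is a minimal Python illustration, not tied to any particular library:

```python
from math import sqrt

def euclidean_distance(x, y):
    """Euclidean distance between two feature vectors of equal dimension d."""
    assert len(x) == len(y), "both vectors must have the same dimension d"
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Example: two 2-dimensional feature vectors.
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # prints 5.0
```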

There are 3 groups of pattern recognition methods:

  • Comparison with a sample. This group includes classification by nearest mean and classification by distance to the nearest neighbor. Structural recognition methods can also be included in this group.
  • Statistical methods. As the name suggests, statistical methods use statistical information when solving a recognition problem. The method determines whether an object belongs to a specific class based on probability. In some cases this comes down to determining the posterior probability that an object belongs to a specific class, given that its features have taken the corresponding values. An example is the method based on the Bayesian decision rule.
  • Neural networks. A separate class of recognition methods. Their distinctive feature compared to the others is the ability to learn.
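To illustrate the statistical group, here is a minimal sketch of a Bayesian decision rule. The priors and the one-dimensional Gaussian class-conditional densities below are made-up assumptions for illustration only:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, std):
    """One-dimensional normal density p(x | class)."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))

# Hypothetical priors P(class) and class-conditional parameters (mean, std)
# for a single feature (height in cm); all numbers are made up.
classes = {
    "child": (0.3, 120.0, 15.0),
    "adult": (0.7, 175.0, 10.0),
}

def classify(x):
    # Choose the class with the largest posterior:
    # P(class | x) is proportional to p(x | class) * P(class).
    return max(classes,
               key=lambda c: classes[c][0] * gaussian_pdf(x, classes[c][1], classes[c][2]))

print(classify(130.0))  # prints child
```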

Classification by nearest mean

In the classical pattern recognition approach, an unknown object to be classified is represented as a vector of elementary features. A feature-based recognition system can be developed in various ways: the reference vectors can be known to the system in advance as a result of training, or predicted in real time based on some model.

A simple classification algorithm is to group a class's reference data using the class's mathematical-expectation (mean) vector:

x_c(i) = (1 / n_i) Σ_{j=1}^{n_i} x(i, j),

where x(i, j) is the j-th reference feature vector of class i and n_i is the number of reference vectors of class i.

An unknown object is then assigned to class i if it is significantly closer to the mathematical-expectation vector of class i than to the mathematical-expectation vectors of the other classes. This method is suitable for problems in which the points of each class lie compactly and far from the points of other classes.
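The nearest-mean rule can be sketched as follows; the height/weight reference data for the two hypothetical classes are made up for illustration:

```python
from math import sqrt

def mean_vector(samples):
    """Per-coordinate mean of a class's reference feature vectors."""
    n = len(samples)
    return tuple(sum(s[k] for s in samples) / n for k in range(len(samples[0])))

def nearest_mean_class(x, class_refs):
    """Assign x to the class whose mean vector is closest (Euclidean distance)."""
    def dist(a, b):
        return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    means = {c: mean_vector(refs) for c, refs in class_refs.items()}
    return min(means, key=lambda c: dist(x, means[c]))

# Hypothetical (height cm, weight kg) reference vectors for two classes.
refs = {
    "child": [(115, 21), (125, 26), (130, 28)],
    "adult": [(170, 70), (180, 82), (165, 60)],
}
print(nearest_mean_class((128, 27), refs))  # prints child
```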

Difficulties arise if the classes have a slightly more complex structure, as in the figure. Here class 2 is split into two disjoint regions that are poorly described by a single mean value, and class 3 is too elongated: samples of class 3 with large values of the x_2 coordinate are closer to the mean of class 1 than to that of class 3.

In some cases the described problem can be solved by changing the way the distance is calculated.

We take into account a measure of the “scatter” of class values, σ_i, along each coordinate direction i; the standard deviation is the square root of the variance. The scaled Euclidean distance between a vector x and the mathematical-expectation vector x_c is

ρ_s(x, x_c) = sqrt( Σ_{i=1}^{d} ((x_i - x_c,i) / σ_i)^2 ).

This distance formula reduces the number of classification errors, but in reality most problems cannot be represented by classes of such simple structure.
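The scaled distance can be sketched as follows; the example values are assumptions chosen to show that a coordinate with large scatter contributes less:

```python
from math import sqrt

def scaled_euclidean(x, x_c, sigma):
    """Distance between x and a class mean x_c, with each coordinate
    difference divided by that coordinate's standard deviation sigma_i."""
    return sqrt(sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, x_c, sigma)))

# With a large spread (sigma = 10) along the first axis, a difference of 10
# there counts the same as a difference of 1 along the second axis.
print(scaled_euclidean((10.0, 1.0), (0.0, 0.0), (10.0, 1.0)))  # prints ~1.414
```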

Classification by distance to nearest neighbor

Another approach to classification is to assign an unknown feature vector x to the class whose individual sample is most similar to this vector. This rule is called the nearest neighbor rule. Nearest neighbor classification can be more efficient even when the classes have complex structures or overlap.

This approach requires no assumptions about the distribution models of the feature vectors in space. The algorithm uses only information about the known reference samples. The method is based on computing the distance from x to each sample in the database and finding the minimum distance. The advantages of this approach are obvious:

  • you can add new samples to the database at any time;
  • tree and grid data structures reduce the number of calculated distances.

In addition, the solution will be better if we search the database not for one nearest neighbor but for k. For k > 1, this gives a better sampling of the distribution of vectors in the d-dimensional space. However, effective use of k values depends on whether there is a sufficient number of samples in each region of the space. If there are more than two classes, making the right decision becomes more difficult.
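A minimal k-nearest-neighbour sketch along these lines, with made-up reference samples:

```python
from collections import Counter
from math import sqrt

def knn_classify(x, samples, k=3):
    """samples: list of (feature_vector, class_label) pairs.
    Classify x by majority vote among its k nearest reference samples."""
    def dist(a, b):
        return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    nearest = sorted(samples, key=lambda s: dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up reference samples for two classes "A" and "B".
samples = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"),
           ((5.0, 5.0), "B"), ((5.2, 4.8), "B")]
print(knn_classify((1.1, 1.0), samples, k=3))  # prints A
```

A linear scan over the database is shown here for clarity; as noted above, tree and grid data structures can reduce the number of distances actually computed.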

Literature

  • M. Castrillon, O. Deniz, D. Hernández and J. Lorenzo, “A comparison of face and facial feature detectors based on the Viola-Jones general object detection framework,” International Journal of Computer Vision, no. 22, pp. 481-494, 2011.
  • Y.-Q. Wang, “An Analysis of Viola-Jones Face Detection Algorithm,” IPOL Journal, 2013.
  • L. Shapiro and G. Stockman, Computer Vision, Binom. Knowledge Laboratory, 2006.
  • N. G. Zagoruiko, Recognition Methods and Their Application, Soviet Radio, 1972.
  • J. Tu and R. Gonzalez, Mathematical Principles of Pattern Recognition, Moscow: Mir, 1974.
  • Khan, H. Abdullah and M. Shamian Bin Zainal, “Efficient eyes and mouth detection algorithm using combination of Viola-Jones and skin color pixel detection,” International Journal of Engineering and Applied Sciences, vol. 3, no. 4, 2013.
  • V. Gaede and O. Gunther, “Multidimensional Access Methods,” ACM Computing Surveys, pp. 170-231, 1998.

