Sequential pattern recognition methods. Adaptive control of complex systems based on the theory of pattern recognition

Living systems, including humans, have been confronted with the problem of pattern recognition ever since they appeared. Information coming from the senses is processed by the brain, which sorts it, makes a decision, and then, using electrochemical impulses, transmits the necessary signal further, for example to the organs of movement, which carry out the required actions. The environment then changes and the cycle repeats. On closer inspection, each of these stages is accompanied by recognition.

With the development of computer technology, it has become possible to solve a number of problems that arise in everyday life and to make the results easier, faster, and better to obtain: consider life-support systems, human-computer interaction, robotic systems, and so on. Note, however, that some tasks still lack satisfactory solutions, such as the recognition of fast-moving similar objects or of handwritten text.

The purpose of this work is to study the history of pattern recognition systems. Its objectives are to:

  • identify the qualitative changes, both theoretical and technical, that have occurred in the field of pattern recognition, and indicate their causes;
  • discuss the methods and principles used in computing;
  • give examples of the prospects expected in the near future.

1. What is pattern recognition?

Early research in computing largely followed the classic scheme of mathematical modeling: mathematical model, algorithm, calculation. Typical tasks were modeling the processes occurring during atomic explosions, calculating ballistic trajectories, economic applications, and so on. Alongside this classical framework, however, methods of a completely different nature arose, and as the practice of solving some problems showed, they often gave better results than approaches based on overcomplicated mathematical models. Their idea was to abandon the attempt to build a comprehensive mathematical model of the object under study (adequate models were often practically impossible to construct anyway) and instead to answer only the specific questions of interest, seeking those answers from considerations common to a wide class of problems. Research of this kind included recognition of visual images, forecasting of crop yields and river levels, distinguishing oil-bearing strata from aquifers on the basis of indirect geophysical data, and so on. A specific answer in these tasks was required in a fairly simple form, for example whether an object belongs to one of several pre-fixed classes. And the initial data were, as a rule, fragmentary information about the objects under study, for example a set of pre-classified objects. From a mathematical point of view, this means that pattern recognition (the name this class of problems received in our country) is a far-reaching generalization of the idea of function extrapolation.

The importance of such a formulation for the technical sciences is beyond doubt, and this in itself justifies numerous studies in the area. The problem of pattern recognition, however, also has a broader, natural-science aspect (it would be strange if something so important for artificial cybernetic systems had no significance for natural ones). The context of this science organically absorbed questions posed by ancient philosophers about the nature of our knowledge and our ability to recognize images, patterns, and situations in the surrounding world. In fact, there is little doubt that the mechanisms for recognizing the simplest images, such as an approaching dangerous predator or food, were formed long before the emergence of elementary language and the formal logical apparatus. And there is no doubt that such mechanisms are well developed in higher animals, which in their daily life also urgently need the ability to distinguish a rather complex system of natural signs. Thus, in nature we see that the phenomena of thinking and consciousness clearly rest on the ability to recognize images, and further progress in the science of intelligence is directly tied to how deeply we understand the fundamental laws of recognition. While these questions go far beyond the standard definition of pattern recognition (in the English-language literature the term supervised learning is more common), they have deep connections with this relatively narrow, but still far from exhausted, direction.

Pattern recognition has already become firmly established in daily life and is one of the most vital areas of knowledge for the modern engineer. In medicine it helps doctors make more accurate diagnoses; in factories it is used to predict defects in batches of goods. Biometric personal identification systems also rest on the results of this discipline at their algorithmic core. Further development of artificial intelligence, in particular the design of fifth-generation computers capable of more direct communication with people in natural languages and through speech, is unthinkable without recognition. And robotics, with artificial control systems containing recognition systems as vital subsystems, is just a stone's throw away.

That is why the development of pattern recognition attracted, from the very beginning, a great deal of attention from specialists of various profiles: cyberneticists, neurophysiologists, psychologists, mathematicians, economists, and so on. It is largely for this reason that modern pattern recognition is itself fueled by the ideas of these disciplines. Without claiming completeness (impossible in a short essay), we will describe the history of pattern recognition and its key ideas.

Definitions

Before proceeding to the main methods of pattern recognition, we present several necessary definitions.

Pattern recognition (of objects, signals, situations, phenomena, or processes) is the task of identifying an object or determining some of its properties from its image (optical recognition), audio recording (acoustic recognition), or other characteristics.

One of the basic concepts is that of a set, which has no formal definition. In a computer, a set is represented as a collection of non-repeating elements of the same type, where "non-repeating" means that an element is either in the set or not. The universal set includes all elements possible for the problem being solved; the empty set contains none.

An image is a classification grouping in a classification system that unites (distinguishes) a certain group of objects according to some criterion. Images have a characteristic property: familiarity with a finite number of phenomena from a set makes it possible to recognize an arbitrarily large number of its representatives. Images also have an objective character, in the sense that different people, trained on different observational material, mostly classify the same objects in the same way, independently of one another. In the classical formulation of the recognition problem, the universal set is partitioned into image classes. Each mapping of an object onto the perceptive organs of the recognition system, regardless of its position relative to those organs, is usually called an image of the object, and sets of such images united by common properties are the images to be recognized.

The method of assigning an element to an image is called a decision rule. Another important concept is the metric: a way of determining the distance between elements of the universal set. The smaller this distance, the more similar the objects (symbols, sounds, and so on) that we recognize. Typically, elements are specified as a set of numbers and the metric as a function. The effectiveness of a program depends on the choice of image representation and metric implementation: one recognition algorithm with different metrics will make mistakes with different frequencies.

Training is usually understood as the process of developing in a system some reaction to groups of identical external signals through repeated application of external corrections to the system. Such external corrections are usually called "rewards" and "punishments", and the mechanism that generates them almost completely determines the learning algorithm. Self-learning differs from training in that no additional information about the correctness of the reaction is supplied to the system.

Adaptation is the process of changing the parameters and structure of the system, and possibly control actions, based on current information in order to achieve a certain state of the system under initial uncertainty and changing operating conditions.

Learning is the process by which a system gradually acquires the ability to respond with the required reactions to certain sets of external influences, while adaptation is the adjustment of the parameters and structure of the system in order to achieve the required quality of control under continuously changing external conditions.

Examples of pattern recognition tasks include letter recognition, speech recognition, face recognition, and so on.


Currently, there are many tasks in which it is necessary to make a decision depending on the presence of an object in an image, or to classify that object. The ability to "recognize" is considered a basic property of biological creatures, one that computer systems do not yet fully possess.

Let us consider the common elements of classification models.

A class is a set of objects with common properties; objects of the same class are assumed to be "similar". For a recognition task, an arbitrary number of classes greater than 1 can be defined, denoted by S. Each class has its own identifying class label.

Classification is the process of assigning class labels to objects according to some description of their properties. A classifier is a device that receives a set of object attributes as input and produces a class label as a result.

Verification is the process of matching an object instance against a single object model or class description.

By an image we will understand the name of a region in feature space onto which many objects or phenomena of the material world are mapped. A feature is a quantitative description of a particular property of the object or phenomenon under study.

The feature space is an N-dimensional space defined for a given recognition task, where N is a fixed number of features measured for any object. The vector x from the feature space corresponding to an object of the recognition task is an N-dimensional vector with components (x_1, x_2, …, x_N), the feature values for this object.

In other words, pattern recognition can be defined as the assignment of source data to a specific class by extracting the essential features or properties characterizing this data from the total mass of unimportant detail.

Examples of classification problems are:

  • character recognition;
  • speech recognition;
  • establishing a medical diagnosis;
  • weather forecasting;
  • face recognition;
  • classification of documents, etc.

Most often, the source material is an image received from a camera. The problem can be formulated as obtaining feature vectors for each class in the image under consideration. The process can be viewed as encoding: assigning to each class a value for every feature in the feature space.

Consider, for example, two classes of objects: adults and children, with height and weight chosen as features. As the figure shows, these two classes form two disjoint sets, which is explained by the chosen features. It is not always possible, however, to select the right measured parameters as class features: the same parameters would not produce disjoint classes of, say, football players and basketball players.

The second recognition task is to extract characteristic features or properties from the source images; it can be classified as preprocessing. In speech recognition, for instance, one can distinguish features such as vowels and consonants. A feature must be a characteristic property of a particular class and at the same time common to that class. Features that characterize the differences between classes are interclass features; features common to all classes carry no useful information and are not considered in the recognition task. Feature selection is one of the important tasks involved in building a recognition system.

Once the features have been determined, the optimal decision procedure for classification must be found. Consider a pattern recognition system designed to recognize M different classes, denoted m_1, m_2, …, m_M. The image space can then be assumed to consist of M regions, each containing the points corresponding to images of one class, and the recognition problem can be viewed as constructing the boundaries separating the M classes on the basis of the adopted measurement vectors.

Solving the problems of image preprocessing, feature extraction, and optimal classification is usually associated with the need to estimate a number of parameters, which leads to a parameter estimation problem. In addition, feature extraction can obviously use additional information about the nature of the classes.

Objects can be compared on the basis of their representation as measurement vectors. It is convenient to represent measurement data as real numbers; the similarity of the feature vectors x and y of two objects can then be described by the Euclidean distance

D(x, y) = sqrt( Σ_{i=1..d} (x_i − y_i)² ),

where d is the dimension of the feature vector.
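As a minimal sketch, the Euclidean distance above can be implemented directly; the (height, weight) feature vectors below are invented purely for illustration:

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two feature vectors of equal dimension d."""
    assert len(x) == len(y), "vectors must come from the same feature space"
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Two hypothetical objects described by (height_cm, weight_kg) features.
child = (120.0, 25.0)
adult = (175.0, 70.0)
print(euclidean_distance(child, adult))  # smaller distance = more similar objects
```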

There are three groups of pattern recognition methods:

  • Comparison with a sample. This group includes classification by nearest mean and classification by distance to the nearest neighbor. Structural recognition methods can also be included here.
  • Statistical methods. As the name suggests, these use statistical information in solving the recognition problem, deciding whether an object belongs to a specific class on the basis of probability. In some cases this reduces to determining the posterior probability that an object belongs to a class, given that its features have taken particular values. An example is the method based on the Bayesian decision rule.
  • Neural networks. A separate class of recognition methods whose distinctive feature is the ability to learn.
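To make the statistical group concrete, here is one possible sketch of a Bayes decision rule for a single feature, under the assumption (not from the text) of Gaussian class-conditional densities; the class parameters and priors are invented:

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian N(mean, std^2) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def bayes_classify(x, priors, params):
    """Pick the class with the largest posterior P(class|x) ∝ P(x|class) * P(class)."""
    posteriors = {c: priors[c] * gaussian_pdf(x, *params[c]) for c in priors}
    return max(posteriors, key=posteriors.get)

# Hypothetical one-feature (height) example; all numbers are made up.
priors = {"child": 0.3, "adult": 0.7}
params = {"child": (120.0, 15.0), "adult": (172.0, 10.0)}  # (mean, std) per class
print(bayes_classify(130.0, priors, params))  # prints "child"
```

Since the denominator P(x) is the same for every class, comparing the unnormalized products is enough to find the maximum-posterior class.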

Classification by nearest mean

In the classical pattern recognition approach, an unknown object to be classified is represented as a vector of elementary features. A feature-based recognition system can be built in various ways: the reference vectors may be known to the system in advance as a result of training, or predicted in real time from some model.

A simple classification algorithm is to group the reference data of each class around the class expectation (mean) vector

x_c(i) = (1/n_i) Σ_{j=1..n_i} x(i, j),

where x(i, j) is the j-th reference feature vector of class i and n_i is the number of reference vectors of class i.

An unknown object is then assigned to class i if it is significantly closer to the expectation vector of class i than to the expectation vectors of the other classes. The method is suitable for problems in which the points of each class lie compactly and far from the points of other classes.
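A minimal nearest-mean classifier along these lines might look as follows; the (height, weight) reference samples reuse the adults/children example from above and are invented:

```python
def class_mean(samples):
    """Mean (expectation) vector of a list of reference feature vectors."""
    n = len(samples)
    dim = len(samples[0])
    return tuple(sum(s[i] for s in samples) / n for i in range(dim))

def nearest_mean_classify(x, references):
    """Assign x to the class whose mean vector is closest in Euclidean distance."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    means = {label: class_mean(samples) for label, samples in references.items()}
    return min(means, key=lambda label: dist2(x, means[label]))

# Hypothetical (height, weight) reference data for two compact classes.
references = {
    "child": [(110, 20), (120, 25), (130, 30)],
    "adult": [(165, 60), (175, 75), (185, 85)],
}
print(nearest_mean_classify((125, 28), references))  # prints "child"
```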

Difficulties arise when the classes have a slightly more complex structure, as in the figure. In this case class 2 splits into two disjoint sections that are poorly described by a single mean value, and class 3 is too elongated: samples of class 3 with large values of the x_2 coordinate are closer to the mean of class 1 than to that of class 3.

In some cases the described problem can be solved by changing the way the distance is calculated.

We take into account the "scatter" of class values, σ_i, along each coordinate direction i; this standard deviation is the square root of the variance. The scaled Euclidean distance between a vector x and the expectation vector x_c is then

D(x, x_c) = sqrt( Σ_{i=1..d} ((x_i − x_c,i) / σ_i)² ).

This distance formula reduces the number of classification errors, but in reality most problems cannot be represented by classes of such simple shape.
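The scaled distance can be sketched as below; the sample points describe a hypothetical class elongated along its first axis, so that after scaling a point far along that axis still counts as close:

```python
import math

def std_per_coordinate(samples):
    """Standard deviation sigma_i of each feature across a class's samples."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(dim)]
    return [math.sqrt(sum((s[i] - means[i]) ** 2 for s in samples) / n)
            for i in range(dim)]

def scaled_distance(x, center, sigmas):
    """Euclidean distance with each axis divided by the class scatter sigma_i."""
    return math.sqrt(sum(((xi - ci) / si) ** 2
                         for xi, ci, si in zip(x, center, sigmas)))

# Hypothetical elongated class: large spread along the first axis only.
samples = [(0.0, 1.0), (10.0, 1.2), (20.0, 0.8)]
sigmas = std_per_coordinate(samples)
center = (10.0, 1.0)  # the class mean of the samples above
print(scaled_distance((20.0, 1.0), center, sigmas))  # small despite |x_1 - c_1| = 10
```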

Classification by distance to nearest neighbor

Another approach to classification is to assign an unknown feature vector x to the class of the individual sample it most resembles. This is called the nearest neighbor rule. Nearest neighbor classification can be more effective even when the classes have complex structure or overlap.

This approach requires no assumptions about the distribution of feature vectors in space; the algorithm uses only the known reference samples. The method consists of computing the distance from x to each sample in the database and finding the minimum. The advantages of this approach are obvious:

  • you can add new samples to the database at any time;
  • tree and grid data structures reduce the number of calculated distances.

In addition, the solution improves if we search the database not for one nearest neighbor but for k. For k > 1 this gives a better sample of the distribution of vectors in d-dimensional space, although the effective use of larger k depends on there being a sufficient number of samples in every region of the space. With more than two classes, making the right decision also becomes harder.
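A k-nearest-neighbor rule along the lines just described can be sketched as follows; the 2-D samples and labels are invented for illustration:

```python
from collections import Counter

def knn_classify(x, samples, k=3):
    """Classify x by majority vote among the k nearest reference samples.

    `samples` is a list of (feature_vector, label) pairs; the squared
    Euclidean distance is used, since it preserves the nearest-neighbor order.
    """
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(samples, key=lambda s: dist2(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D reference samples for two classes.
samples = [((1, 1), "A"), ((2, 1), "A"), ((1, 2), "A"),
           ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify((1.5, 1.5), samples, k=3))  # prints "A"
print(knn_classify((5.5, 5.5), samples, k=3))  # prints "B"
```

A linear scan over the database is shown for clarity; the tree and grid structures mentioned above would replace the `sorted` call to avoid computing every distance.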


Pattern recognition applies to objects, processes, signals, situations, and other entities characterized by a finite set of certain properties and features. Such problems are solved quite often, for example when crossing a street by traffic light: recognizing the color of the lit signal and knowing the traffic rules allows one to make the right decision about whether the street can be crossed.

The need for such recognition arises in the most diverse areas, from military affairs and security systems to the digitization of analog signals.

The problem of image recognition has acquired particular importance under conditions of information overload, when a person can no longer cope with a linear, sequential understanding of incoming messages and the brain switches to a mode of simultaneous perception and thinking, for which such recognition is characteristic.

It is no coincidence, then, that the problem of image recognition has become a subject of interdisciplinary research, including in connection with work on creating artificial intelligence, and that the creation of technical image recognition systems attracts ever more attention.


Directions in pattern recognition

Two main directions can be distinguished:

  • The study of the recognition abilities that living things possess, the explanation and modeling of them;
  • Development of theory and methods for constructing devices designed to solve individual problems for applied purposes.

Formal statement of the problem

Pattern recognition is the assignment of source data to a certain class by identifying significant features that characterize this data from the total mass of unimportant data.

When formulating recognition problems, one tries to use mathematical language, striving, in contrast to the theory of artificial neural networks, where the basis is obtaining a result by experiment, to replace experiment with logical reasoning and mathematical proof.

The classic formulation of the pattern recognition problem: given a set of objects, a classification must be made over them. The set is represented by subsets called classes. Given are information about the classes, a description of the entire set, and the description of an object whose membership in a specific class is unknown. It is required, from the available information about the classes and the description of the object, to determine the class to which the object belongs.

Monochrome images are most often considered in pattern recognition problems, which makes it possible to treat an image as a function on a plane. If we consider a point set on the plane T, with a function f(x, y) expressing the characteristics of the image at each point (brightness, transparency, optical density), then this function is a formal record of the image.

The set of all possible functions f(x, y) on the plane T is a model of the set of all images X. Introducing a notion of similarity between images, one can pose the recognition task; the specific form of such a statement depends strongly on the subsequent stages of recognition under one approach or another.
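On a discrete grid, this view of a monochrome image as a function f(x, y) can be sketched in a few lines; the brightness values below are invented:

```python
# A monochrome image as a function f(x, y) -> brightness over a small grid T.
image = [
    [0, 0, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 255, 0],
]

def f(x, y):
    """Brightness of the image at the point (x, y) of the grid T."""
    return image[y][x]

print(f(2, 1))  # prints 255
```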

Some graphic pattern recognition methods

For optical pattern recognition, one can apply the method of enumerating views of the object at various angles, scales, offsets, and so on. For letters, one needs to enumerate the font, font properties, etc.

The second approach is to find the contour of the object and examine its properties (connectivity, presence of corners, etc.).

Another approach is to use artificial neural networks. This method requires either a large number of examples of the recognition task (with correct answers) or a special neural network structure that takes into account the specifics of the task.

Perceptron as a pattern recognition method

F. Rosenblatt, introducing the concept of a brain model whose task is to show how psychological phenomena can arise in a physical system with known structure and functional properties, described the simplest discrimination experiments. These experiments relate entirely to pattern recognition methods, but differ in that the solution algorithm is not deterministic.

The simplest experiment from which psychologically meaningful information about a system can be obtained amounts to presenting the model with two different stimuli and requiring it to respond to them in different ways. The purpose of such an experiment may be to study the possibility of spontaneous discrimination by the system without intervention from the experimenter, or, conversely, forced discrimination, in which the experimenter seeks to train the system to perform the required classification.

In a perceptron training experiment, a certain sequence of images is usually presented, containing representatives of each of the classes to be distinguished. According to some rule of memory modification, the correct choice of response is reinforced. The perceptron is then presented with a control stimulus, and the probability of obtaining the correct response for stimuli of that class is determined. Depending on whether or not the chosen control stimulus coincides with one of the images used in the training sequence, different results are obtained:

  1. If the control stimulus does not coincide with any of the training stimuli, then the experiment involves not only pure discrimination but also elements of generalization.
  2. If the control stimulus excites a set of sensory elements entirely different from those activated by previously presented stimuli of the same class, then the experiment is a study of pure generalization.

Perceptrons lack the capacity for pure generalization, but they perform quite satisfactorily in discrimination experiments, especially when the control stimulus matches closely enough one of the images with which the perceptron has already accumulated some experience.
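The reinforcement scheme described above can be sketched as a toy single-layer perceptron with the classic error-driven weight update; the two stimulus classes below are invented and linearly separable:

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a single-layer perceptron, adjusting weights only on wrong answers.

    `samples` is a list of (feature_vector, target) pairs with target in {-1, +1}.
    Returns (weights, bias). A toy sketch of the reinforcement ("reward/punishment")
    rule, not Rosenblatt's original hardware formulation.
    """
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if out != target:  # wrong response: nudge weights toward the target
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
    return w, b

# Two linearly separable stimulus classes (invented data).
samples = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((3.0, 3.0), 1), ((4.0, 2.0), 1)]
w, b = train_perceptron(samples)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
print([predict(x) for x, _ in samples])  # matches the training targets
```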

Examples of pattern recognition problems

  • Letter recognition
  • Barcode recognition
  • License plate recognition
  • Face recognition
  • Speech recognition
  • Image recognition
  • Recognition of local areas of the earth's crust in which mineral deposits are located

In the process of biological evolution, many animals solved pattern recognition problems well enough with the help of their visual and auditory apparatus, yet the creation of artificial pattern recognition systems remains a complex theoretical and technical problem.

Traditionally, pattern recognition tasks are included in the range of artificial intelligence tasks.


Links

  • Yuri Lifshits, course "Modern Problems of Theoretical Computer Science": lectures on statistical methods of image recognition, face recognition, and text classification
  • Journal of Pattern Recognition Research


Wikimedia Foundation. 2010.



I have long wanted to write a general article covering the very basics of image recognition, a kind of guide to the basic methods: when to use them, what problems they solve, what you can hack together in an evening, and what is better not to attempt without a team of twenty people.

I have been writing articles on optical recognition for a long time, so various people write to me a couple of times a month with questions on the topic. Sometimes you get the feeling that you and they live in different worlds. On the one hand, you understand that the person is most likely a professional in a related field, but knows very little about optical recognition methods. The most annoying thing is when he tries to apply a method from a neighboring field of knowledge, which is logical, but does not quite work in image recognition; he does not understand this and takes offense if you start explaining things from the very basics. And since explaining from the basics takes a lot of time, which is often unavailable, it all becomes even sadder.

This article is intended so that a person who has never worked with image recognition methods can, within 10-15 minutes, form a basic mental picture of the subject and understand in which direction to dig. Many of the techniques described here are also applicable to radar and audio processing.
I'll start with a couple of principles that we always begin explaining to a potential customer, or to anyone who wants to start doing optical recognition:

  • When solving a problem, always start from the simplest approach. It is much easier to put an orange tag on a person than to track that person with cascade detectors. It is much easier to take a camera with a higher resolution than to develop a super-resolution algorithm.
  • A strict formulation of the problem matters orders of magnitude more in optical recognition than in systems programming: one extra word in the technical specification can add 50% more work.
  • There are no universal solutions to recognition problems. You cannot build an algorithm that will simply “recognize any inscription.” A street sign and a sheet of text are fundamentally different objects. A general algorithm can probably be built (Google offers a good example), but it would require enormous work by a large team and would consist of dozens of different subroutines.
  • OpenCV is a bible: it has many methods and can solve 50% of almost any problem, yet OpenCV is only a small part of what can actually be done. In one study I saw the conclusion: “The problem cannot be solved using OpenCV methods, therefore it is unsolvable.” Try to avoid this; do not be lazy, and soberly evaluate each new task from scratch, without reaching for OpenCV templates.
It is very difficult to give universal advice, or to describe a structure around which a solution to arbitrary computer vision problems could be built. The purpose of this article is to structure what can be used. I will try to divide the existing methods into three groups. The first group is preliminary filtering and image preparation. The second is logical processing of the filtering results. The third is decision-making algorithms built on that logical processing. The boundaries between the groups are very arbitrary: solving a problem does not always require methods from all three groups; sometimes two are enough, and sometimes even one.

The list of methods given here is not complete. I suggest adding in the comments important methods that I did not mention, with two or three words about each.

Part 1. Filtering

In this group I have placed methods that let you select areas of interest in an image without analyzing those areas. Most of these methods apply a single transformation to every point of the image. No image analysis is performed at the filtering level, but the points that pass the filter can be treated as areas with special characteristics.
Binarization by threshold; selection of a histogram area
The simplest transformation is binarization of the image by a threshold. For RGB and grayscale images, the threshold is a color value. There are ideal problems for which such a transformation is sufficient. Suppose you want to automatically select objects on a white sheet of paper:




The choice of threshold largely determines the result of binarization. In this case, the image was binarized by its mean color. Usually, binarization is performed by an algorithm that selects the threshold adaptively: such an algorithm can take the expectation or the mode, or pick the largest peak of the histogram.

Binarization can give very interesting results when working with histograms, including when we consider the image not in RGB but in HSV, for example to segment the colors of interest. On this principle you can build both a tag detector and a human-skin detector.
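A minimal sketch of the idea in plain NumPy (a toy image stands in for a real photograph; in practice you would reach for `cv2.threshold`, or `cv2.inRange` for the HSV case):

```python
import numpy as np

def binarize(img, threshold=None):
    """Binarize a grayscale image; by default the threshold is the mean
    intensity, as in the example above."""
    if threshold is None:
        threshold = img.mean()
    return (img > threshold).astype(np.uint8)

# toy "objects on a white sheet": bright background, two dark blobs
img = np.full((8, 8), 240, dtype=np.uint8)
img[1:3, 1:3] = 30
img[5:7, 4:7] = 50
mask = binarize(img)   # 1 = paper, 0 = object
```

The mean works here only because the dark objects are small relative to the sheet; Otsu-style threshold selection is the usual robust choice.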
Classical filtering: Fourier, low-pass filter, high-pass filter
Classic filtering and signal-processing methods from radar can be successfully applied to many pattern recognition tasks. The traditional radar method, almost never applied to images in its pure form, is the Fourier transform (more precisely, the FFT). One of the few exceptions where the one-dimensional Fourier transform is used is image compression. For image analysis, a one-dimensional transform is usually not enough; you need the much more resource-intensive two-dimensional transform.
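A minimal NumPy illustration of frequency-domain filtering (a toy striped image, not a real compression pipeline): keeping only the DC term of the 2-D spectrum acts as an extreme low-pass filter.

```python
import numpy as np

img = np.zeros((8, 8))
img[:, ::2] = 1.0          # vertical stripes: high frequency plus a DC offset

spec = np.fft.fft2(img)    # 2-D spectrum of the image

lowpass = np.zeros((8, 8))
lowpass[0, 0] = 1          # keep only the zero-frequency (DC) component
smoothed = np.real(np.fft.ifft2(spec * lowpass))  # stripes vanish, mean remains
```

Multiplying the spectrum by a mask and inverting is exactly the frequency-domain counterpart of convolving with a filter kernel.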

In practice, few people actually compute it; it is usually much faster and easier to convolve the region of interest with a ready-made filter tuned for high (HPF) or low (LPF) frequencies. This approach, of course, does not allow spectrum analysis, but a typical video-processing task needs a result rather than an analysis.


The simplest examples are filters emphasizing low frequencies (a Gaussian filter) and high frequencies (a Gabor filter).
For each image point, a window is selected and multiplied element-wise with a filter of the same size; the sum of the products becomes the new point value. Applying low-pass and high-pass filters produces images of the following kind:
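The window-times-filter operation can be sketched in a few lines of NumPy (a naive "valid" convolution on a toy image; real code would use `cv2.filter2D` or `scipy.signal.convolve2d`):

```python
import numpy as np

def convolve2d(img, kernel):
    """Naive 'valid' convolution: slide the kernel window over the image,
    multiply element-wise, and sum to get each new point value."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0   # low-pass (LPF)
laplace = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])       # high-pass (HPF)

img = np.zeros((5, 5))
img[2, 2] = 1.0                      # a single bright point
smoothed = convolve2d(img, gauss)    # the point is spread out
edges = convolve2d(img, laplace)     # strong response at the intensity change
```

Both kernels are symmetric, so convolution and correlation coincide here.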



Wavelets
But what if we convolve the signal with some arbitrary characteristic function? The result is called a “wavelet transform.” This definition of wavelets is not strictly correct, but traditionally, in many teams, wavelet analysis means searching for an arbitrary pattern in an image by convolution with a model of that pattern. There is a set of classical functions used in wavelet analysis, including the Haar wavelet, the Morlet wavelet, the Mexican hat wavelet, and others. The Haar primitives, which several of my previous articles covered, belong to this family of functions for two-dimensional space.


Above are four examples of classical wavelets: the Haar wavelet, the Meyer wavelet, the Mexican hat wavelet, and the Daubechies wavelet. A good example of this extended interpretation of wavelets is the problem of finding a glint in the eye, for which the wavelet is the glint itself:

Classical wavelets are usually used either for searches of this kind or for image classification (to be described below).
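In this loose sense, "wavelet analysis" is just convolving the signal with a model of the pattern. A 1-D sketch with a Ricker (Mexican hat) wavelet; the toy signal and the embedded position 20 are arbitrary choices for illustration:

```python
import numpy as np

def mexican_hat(n=9, sigma=1.5):
    """Sampled 1-D Ricker ('Mexican hat') wavelet."""
    x = np.arange(n) - n // 2
    a = (x / sigma) ** 2
    return (1 - a) * np.exp(-a / 2)

w = mexican_hat()
signal = np.zeros(50)
signal[20:29] = w                    # embed the pattern starting at sample 20

# the correlation response peaks where the pattern sits
resp = np.correlate(signal, w, mode="valid")
center = int(np.argmax(resp)) + len(w) // 2   # center of the best match
```

The same scheme in 2-D, with the glint model as the kernel, is the glint detector mentioned above.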
Correlation
After such a liberal interpretation of wavelets on my part, it is worth mentioning the actual correlation that underlies them. Correlation is an indispensable tool when filtering images. A classic application is correlating a video stream to find shifts or optical flow. The simplest shift detector is also, in a sense, a difference correlator: where the images did not correlate, there was movement.
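A minimal shift detector via 1-D cross-correlation (NumPy, on a toy pulse signal; real optical-flow code works on 2-D patches the same way):

```python
import numpy as np

def find_shift(a, b):
    """Estimate the integer shift of signal b relative to a from the
    position of the cross-correlation peak."""
    corr = np.correlate(b, a, mode="full")
    return int(np.argmax(corr)) - (len(a) - 1)

signal = np.array([0, 0, 1, 3, 1, 0, 0, 0], dtype=float)
moved = np.roll(signal, 2)          # the pulse moved 2 samples to the right
shift = find_shift(signal, moved)   # -> 2
```

For large images, the same peak search is usually done via the FFT (phase correlation) rather than direct correlation.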

Filtering functions
An interesting class of filters is function filtering. These are purely mathematical filters that detect a simple mathematical function in the image (a straight line, a parabola, a circle). An accumulator image is constructed in which, for each point of the original image, the set of functions generating it is drawn. The most classic transformation is the Hough transform for straight lines: for each point (x;y), the set of points (a;b) of the lines y=ax+b passing through it (that is, for which the equality holds) is drawn. The result is beautiful pictures:


(a plus to whoever first finds the catch in this picture and definition and explains it; another plus to whoever first says what is shown here)
The Hough transform can find any parameterizable function, for example circles; there is also a modified transform that can search for arbitrary shapes. Mathematicians are terribly fond of this transform, but in image processing it unfortunately does not always work: the speed is very low, and it is very sensitive to binarization quality. Even in ideal situations I preferred to make do with other methods.
An analogue of the Hough transform for straight lines is the Radon transform. It is computed via the FFT, which gives a performance gain when there are many points; in addition, it can be applied to a non-binarized image.
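A compact sketch of the Hough voting scheme, using the (θ, ρ) parameterization ρ = x·cos θ + y·sin θ (which avoids the unbounded slope of y = ax + b); the toy point set and accumulator size are arbitrary:

```python
import numpy as np

def hough_peak(points, n_theta=180, max_rho=20.0):
    """Each point votes for every line rho = x*cos(t) + y*sin(t) through it;
    the accumulator peak is the dominant line."""
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((n_theta, int(2 * max_rho) + 1), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round(rho + max_rho).astype(int)   # shift rho into bin indices
        acc[np.arange(n_theta), idx] += 1
    t, r = np.unravel_index(acc.argmax(), acc.shape)
    return t, r - max_rho, int(acc.max())           # (theta in degrees, rho, votes)

# ten points on the vertical line x = 5
pts = [(5, i) for i in range(10)]
theta, rho, votes = hough_peak(pts)
```

In OpenCV the equivalent is `cv2.HoughLines` on a binarized edge image.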
Contour filtering
A separate class of filters is border and contour filtering. Contours are very useful when we want to move from working with an image to working with the objects in that image. When an object is rather complex but well distinguished, often the only way to work with it is to extract its contours. There are a number of algorithms that solve the contour-filtering problem:

Most often, Canny is used: it works well, and its implementation is available in OpenCV (Sobel is there too, but it finds contours worse).
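The gradient step that Sobel (and the first stage of Canny) relies on can be sketched in plain NumPy on a toy step-edge image; `cv2.Canny` adds non-maximum suppression and hysteresis on top of this:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the two 3x3 Sobel kernels ('valid' region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    mag = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            win = img[i:i + 3, j:j + 3]
            mag[i, j] = np.hypot(np.sum(win * kx), np.sum(win * ky))
    return mag

img = np.zeros((6, 8))
img[:, 4:] = 1.0                # vertical step edge between columns 3 and 4
mag = sobel_magnitude(img)      # strong response only along that edge
```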



Other filters
The filters above, with modifications, help solve 80-90% of problems. Besides them, there are rarer filters used in local tasks; there are dozens of them, so I will not list them all. Interesting ones include iterative filters, as well as the ridgelet and curvelet transforms, which fuse classical wavelet filtering with analysis in the Radon-transform domain. The beamlet transform works beautifully on the boundary between the wavelet transform and logical analysis, allowing contours to be extracted:

But these transformations are very specific and tailored for rare tasks.

Part 2. Logical processing of filtering results

Filtering provides a set of data suitable for processing, but often you cannot simply take this data and use it without further work. This section covers several classical methods for moving from the image to the properties of objects, or to the objects themselves.
Morphology
The transition from filtering to logic is, in my opinion, marked by the methods of mathematical morphology. In essence, these are the simplest operations of growing and eroding binary images. They allow you to remove noise from a binary image by enlarging or shrinking its existing elements. There are contouring algorithms based on mathematical morphology, but usually some kind of hybrid algorithm, or algorithms in combination, are used.
Contour analysis
Algorithms for obtaining boundaries were already mentioned in the section on filtering. The resulting boundaries are quite simply converted into contours. For the Canny algorithm this happens automatically; other algorithms require additional binarization. A contour for a binary image can be obtained, for example, with the beetle (border-following) algorithm.
A contour is a unique characteristic of an object, which often makes it possible to identify the object by its outline. There is a powerful mathematical apparatus for doing this, called contour analysis.

To be honest, I have never managed to apply contour analysis in real problems: it requires too-ideal conditions. Either the boundary cannot be found, or there is too much noise. But if you need to recognize something under ideal conditions, contour analysis is a great option: it works very fast, with beautiful mathematics and clear logic.
Special points
Special points are unique characteristics of an object that allow it to be matched against itself or against similar classes of objects. There are several dozen ways to extract such points. Some methods find points that are stable across adjacent frames; some stay stable across a large time gap or a change in lighting; some find points that remain stable even when the object rotates. Let us start with methods that find points which are not very stable but are computed quickly, and then proceed in order of increasing complexity:
First class. Points that are stable over a period of seconds. They are used to track an object between adjacent video frames, or to stitch together images from neighboring cameras. Such points include local maxima of the image, corners (the best detector is perhaps the Harris detector), points of maximum dispersion, certain gradients, and so on.
Second class. Points that are stable under lighting changes and small movements of the object. They serve primarily for training and subsequent classification of object types: a pedestrian classifier or a face classifier is the product of a system built precisely on such points. Some of the previously mentioned wavelets can be the basis for such points (Haar primitives, glint search, and other specific functions). These include the points found by the histogram of oriented gradients (HOG) method.
Third class. Stable points. I know of only two methods (plus their modifications) that provide complete stability: they find special points even when the image is rotated. Computing such points takes longer than with the other methods, though still within bounded time. Unfortunately, these methods are patented (although in Russia algorithms cannot be patented, so use them for the domestic market).
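The core idea behind HOG-style features, a histogram of gradient directions weighted by gradient magnitude, can be sketched as follows (a simplified whole-image histogram; the real HOG method computes these per cell and normalizes per block):

```python
import numpy as np

def orientation_histogram(img, bins=9):
    """Histogram of unsigned gradient orientations, weighted by magnitude
    (a crude, global version of the HOG descriptor idea)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 180.0), weights=mag)
    return hist

img = np.zeros((8, 8))
img[:, 4:] = 1.0                   # a vertical edge: all gradients point horizontally
h = orientation_histogram(img)     # all energy lands in the 0-degree bin
```

A production implementation is available as `cv2.HOGDescriptor`.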

Part 3. Training

The third part of the story is devoted to methods that do not work with the image directly but make it possible to take decisions. Mostly these are various methods of machine learning and decision making. Recently, Yandex posted a lecture series on this topic on Habr; it is a very good selection, also available in text form. For a serious study of the topic, I highly recommend watching it. Here I will try to outline several main methods used specifically in pattern recognition.
In 80% of situations, the essence of learning in a recognition task is as follows:
There is a training sample containing several classes of objects, say, photos with and without a person. For each image there is a set of features extracted by some detector: Haar, HOG, SURF, or some wavelet. The learning algorithm must build a model that can analyze a new image and decide which object is present in it.
How is this done? Each training image is a point in feature space whose coordinates are the weights of the individual features in that image. Suppose our features are “presence of eyes,” “presence of a nose,” “presence of two hands,” “presence of ears,” and so on, extracted by detectors pre-trained on body parts similar to human ones. A person maps to one characteristic point in this space, a monkey to another, a horse to a third. The classifier is trained on a sample of examples. But not all the photographs show hands, others have no eyes, and in a third the monkey ends up with a human nose due to a detector error. A trained human classifier automatically partitions the feature space so as to say: if the features fall into certain ranges, then it is a person. In essence, the goal of the classifier is to draw the regions of feature space characteristic of each class of objects. This is what successive approximation to the answer looks like for one classifier (AdaBoost) in a two-dimensional space:
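The "point in feature space" picture can be made concrete with a tiny nearest-centroid classifier. The feature values below are invented purely for illustration; a real system would use a trained classifier such as AdaBoost or an SVM:

```python
import numpy as np

# hypothetical feature scores in [0, 1]: [eyes, nose, two hands, ears]
train_X = np.array([
    [0.9, 0.9, 0.8, 0.9],   # person
    [0.8, 1.0, 0.9, 0.8],   # person
    [0.9, 0.6, 0.7, 0.9],   # monkey
    [0.8, 0.5, 0.8, 1.0],   # monkey
])
train_y = np.array([0, 0, 1, 1])    # 0 = person, 1 = monkey

def nearest_centroid_predict(X, train_X, train_y):
    """Assign each row of X to the class whose centroid is closest in feature space."""
    classes = np.unique(train_y)
    centroids = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

query = np.array([[0.85, 0.95, 0.85, 0.85]])   # strong "nose" score: person-like
pred = nearest_centroid_predict(query, train_X, train_y)
```

Here the class regions are simply the half-spaces nearest each centroid; AdaBoost and SVMs draw far more flexible boundaries.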


There are a great many classifiers, each working better in some particular task. Selecting a classifier for a specific task is largely an art. Here are some beautiful pictures on the topic.
Simple case, one-dimensional separation
Let's look at the simplest case of classification, when the feature space is one-dimensional and we need to separate two classes. The situation occurs more often than you might think: for example, when you need to distinguish two signals, or match a pattern against a sample. Suppose we have a training sample. We get an image where the X axis is a measure of similarity and the Y axis is the number of events with that measure. When the sought object is similar to the sample, we get the left Gaussian; when it is not, the right one. A threshold of X=0.4 separates the samples so as to minimize the probability of making any wrong decision. Searching for such a separator is the task of classification.
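This threshold search is easy to reproduce on synthetic data (Gaussian scores with invented means 0.7 and 0.2; the X=0.4 in the text is the analogous separator for its own data):

```python
import numpy as np

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)    # similarity scores for matching objects
impostor = rng.normal(0.2, 0.1, 1000)   # similarity scores for non-matching ones

def best_threshold(pos, neg):
    """Scan candidate thresholds and pick the one minimizing total errors:
    positives below the threshold plus negatives above it."""
    cands = np.linspace(0.0, 1.0, 201)
    errors = [(pos < t).sum() + (neg > t).sum() for t in cands]
    return float(cands[int(np.argmin(errors))])

t = best_threshold(genuine, impostor)   # lands near the midpoint, around 0.45
```

With equal spreads and equal class sizes, the minimum-error threshold sits near the midpoint of the two means; skewed costs or priors shift it, which is exactly the point of the note below about minimum error not always being optimal.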


A small note: the criterion that minimizes the error is not always optimal. The following graph comes from a real iris-recognition system. For such a system, the criterion is chosen to minimize the probability of falsely admitting an unauthorized person to the facility. That probability is called a “type I error,” “probability of false alarm,” or “false positive”; in the English-language literature, the “False Accept Rate.”
AdaBoost is one of the most common classifiers; the Haar cascade, for example, is built on it. It is usually used when binary classification is needed, but nothing prevents training it for a larger number of classes.
SVM is one of the most powerful classifiers, with many implementations. On the learning tasks I have encountered, it worked similarly to AdaBoost. It is considered quite fast, but its training is harder than AdaBoost's and requires choosing the right kernel.

There are also neural networks and regression, but classifying them even briefly and showing how they differ would require an article much longer than this one.
________________________________________________
I hope I have managed to give a quick overview of the methods used, without diving into mathematics and detailed descriptions. Maybe this will help someone. Although, of course, the article is incomplete: there is not a word about working with stereo images, about least squares with a Kalman filter, or about the adaptive Bayes approach.
If you like the article, I'll try to write a second part with a selection of examples of how existing image recognition problems are solved.

And finally

What to read?
1) I once really liked Bernd Jähne's book “Digital Image Processing,” which is written simply and clearly while still giving almost all the mathematics. Good for getting acquainted with existing methods.
2) The classic of the genre is R. Gonzalez and R. Woods, “Digital Image Processing.” For some reason it was harder for me than the first one: much less mathematics, but more methods and pictures.
3) “Image processing and analysis in computer vision problems,” written on the basis of a course taught at one of the departments of Phystech. There are a great many methods with detailed descriptions. But in my opinion the book has two big drawbacks: it is strongly tied to the software package that ships with it, and too often the description of a simple method turns into a mathematical jungle from which it is hard to extract the method's structural diagram. The authors have, however, made a convenient website where almost all the content is presented: wiki.technicalvision.ru
