Machine translation. Volume and savings

Speakers: Irina Rybnikova and Anastasia Ponomareva.

We will talk about the history of machine translation and how we use it at Yandex.

Back in the 17th century, scholars speculated about the existence of some universal language connecting all the others, but that is probably reaching too far back. Let's look at things closer to our time. We all want to understand the people around us: wherever we go, we want to read the signs, the announcements, the information about concerts. The idea of the Babel fish haunts scientists and keeps appearing in literature, in cinema, everywhere. We want to shorten the time it takes to get access to information. We want to read articles about Chinese technologies, understand any site we open, and we want all of that here and now.

In this context, it is impossible not to talk about machine translation. This is what helps solve this problem.

The starting point is considered to be 1954, when in the USA, on an IBM 701 machine, 60 sentences on the general topic of organic chemistry were translated from Russian into English, all based on 250 glossary terms and six grammatical rules. It was called the Georgetown Experiment, and it caused such a stir that newspapers were full of headlines saying that in another three to five years the problem would be completely solved and everyone would be happy. But, as you know, everything went a little differently.

Rule-based machine translation emerged in the 1970s. It relied on bilingual dictionaries, but also on those very sets of rules that helped describe any language. Any language, but with restrictions.

Writing down the rules required serious linguistic experts. It is quite complex work; it still could not take context into account or fully cover any language, but it relied on experts and did not require high computing power at the time.

If we talk about quality, a classic example is a quote from the Bible, which was translated back then like this. It was not good enough yet, so people kept working on quality. In the 1990s the statistical translation model, SMT, appeared; it dealt with the probabilistic distribution of words and sentences, and it differed fundamentally in that it knew nothing at all about rules or about linguistics. It received as input a huge number of parallel texts, paired in one language and the other, and then made decisions itself. It was easy to maintain, did not require a crowd of experts, did not require waiting. You could load the data and get the result.

The requirements for input data were moderate, from 1 to 10 million segments (segments are sentences or short phrases). But there were difficulties: context was still not taken into account, and things were not that simple. In Russia, for example, cases like these appeared.

I also like the example of the GTA game translations; the result was excellent. But things did not stand still. A fairly important milestone was 2016, when neural machine translation was launched. It was quite an epoch-making event that changed life a great deal. My colleague, after looking at the translations and how we use them, said: "Cool, it speaks in my words." And that was really great.

What are its features? High requirements for input, for training material. It is difficult to maintain this inside a company, but a significant increase in quality is what it was started for. Only a high-quality translation will solve the assigned tasks and make life easier for all participants in the process, including translators, who do not want to correct a bad translation; they want to do new creative tasks and leave routine template phrases to the machine.

There are two approaches to evaluating machine translation quality. The first is expert assessment, a linguistic analysis of texts, that is, testing by real linguists and experts for fidelity to the meaning and for language literacy. In some cases experts were simply asked to proofread the translated text and to assess how effective it was from this point of view.

What are the features of this method? No reference translation is required; we look at the finished translated text as it is and evaluate it objectively from any angle. But it is expensive and time-consuming.

There is a second approach - automatic reference metrics. There are many of them, each has pros and cons. I won’t go into depth; you can read about these keywords in more detail later.

What is their feature? In essence, this is a comparison of machine-translated texts with some reference translation. These are quantitative metrics that show the discrepancy between the reference translation and the actual result. It is fast, cheap, and quite convenient. But there are peculiarities.

In fact, hybrid methods are now most often used. This is when something is initially evaluated automatically, then the error matrix is ​​analyzed, and then an expert linguistic analysis is carried out on a smaller corpus of texts.

Lately it has also become common practice to invite not linguists but simply users. An interface is made: show which translation you like better. Or when you go to online translators, you enter text and can often vote for which result you like better, whether a given approach works or not. In fact, we are all training these engines right now: everything we give them to translate they use for training and for improving their quality.

I would like to tell you how we use machine translation in our work. I give the floor to Anastasia.

We at Yandex, in the localization department, realized quite quickly that machine translation technology had great potential, and we decided to try to use it in our daily tasks. Where did we start? We decided to conduct a small experiment: to translate the same texts through a regular neural network translator and also to put together a custom-trained machine translator. To do this, we prepared corpora of Russian-English texts accumulated over the years that we at Yandex have been localizing texts into these languages. Then we came with this corpus of texts to our colleagues from Yandex.Translator and asked them to train the engine.

When the engine was trained, we translated the next batch of texts and, as Irina said, evaluated the results with the help of experts. We asked translators to look at literacy, style, spelling, and how well the meaning was conveyed. But the real turning point was when one of the translators said: "I recognize my style, I recognize my translations."

To back up these impressions, we decided to calculate statistical indicators. First, we calculated the BLEU score for translations made through a regular neural network engine and got the following figure (0.34). It would seem that it needs to be compared with something. We again went to our colleagues from Yandex.Translator and asked them what BLEU score is considered the threshold for a translation made by a real person. It is from 0.6.

Then we checked the results on the translations from the trained engine. We got 0.5. The results are truly encouraging.
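For readers who want to reproduce this kind of measurement, here is a minimal sketch of how a corpus-level BLEU score can be computed with the open-source sacrebleu library. The hypothesis and reference sentences below are invented for illustration, and sacrebleu reports BLEU on a 0-100 scale, so a value of 34 corresponds to the 0.34 mentioned above.

```python
# Hedged sketch: computing corpus-level BLEU with sacrebleu.
# The sentences are made up; in practice you would pass the full
# machine-translated test set and its human reference translations.
import sacrebleu

hypotheses = ["the ad is shown in search results"]          # machine output
references = [["the ad is displayed in search results"]]    # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f} (on a 0-100 scale)")
```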

Let me give you an example. This is a real Russian phrase from the Direct documentation. It was translated through a regular neural network engine, and then through a neural network engine trained on our texts. Already in the very first line we notice that the traditional type of advertising in Direct was not recognized. In the trained engine our translation appears, and even the abbreviation is almost correct.

We were very encouraged by the results and decided that it was probably worth using the machine engine for other pairs and other texts, not just for that basic set of technical documentation. A series of experiments followed over several months. We encountered a large number of peculiarities and problems; these are the most common problems we had to solve.

I’ll tell you more about each one.

If you, like us, are going to build a custom engine, you will need a fairly large amount of high-quality parallel data. A large engine can be trained starting from about 10 thousand sentences; in our case we prepared 135 thousand parallel sentences.

Your engine will not show equally good results on all types of text. On technical documentation, where there are long sentences and clear structure, on user documentation, and even in the interface, where there are short but unambiguous buttons, you will most likely do well. But perhaps, like us, you will run into problems with marketing texts.

We conducted an experiment translating music playlists and got this example.

This is what the machine translator thinks about "star factory" participants: that they are shock workers of labor.

When translating through a machine engine, context is not taken into account. This example is not so funny, but very real, from the Direct technical documentation. It would seem that when you read technical documentation, the meaning of the term is clear from the context, and the term is technical. But no, the machine engine missed it.

You will also have to take into account that the quality and meaning of the translation will depend heavily on the source language. We translate a phrase into French from Russian and get one result. We take a similar phrase with the same meaning, but from English, and we get a different result.

If, as in our texts, you have a large number of tags, markup, and other technical features, you will most likely have to track them, edit them, and write some scripts.

Here are examples of real phrases from the Browser. In parentheses is technical information that should not be translated, in particular plural forms. In English they are in English, and in German they should also remain in English, but they get translated. You will have to keep track of such things.

The machine engine knows nothing about your naming conventions. For example, we have an agreement that we always write Yandex.Disk in the Latin alphabet in all languages. But in French it turns into the French word for disk.

Abbreviations are sometimes recognized correctly, sometimes not. In this example, BY, denoting that it belongs to the Belarusian technical requirements for advertising, turns into a preposition in English.

One of my favorite examples is new and borrowed words. Here is a telling example: the word "disclaimer" was treated as "originally Russian." Terminology will have to be verified for each part of the text.

And one more, not so significant problem - outdated spelling.

Previously, the Internet was a new thing and was capitalized in all texts, and when we trained our engine, "Internet" was capitalized everywhere. Now is a new era, and we already write "internet" with a lowercase letter. If you want your engine to keep writing "internet" with a lowercase letter, you will have to retrain it.

We did not despair; we solved these problems. First, we changed the text corpora and tried translating other topics. We passed our comments to our colleagues from Yandex.Translator, re-trained the neural network, looked at and evaluated the results, and asked for improvements, for example tag recognition and HTML markup processing.

I'll show you real use cases. We are good at machine translation for technical documentation. This is a real case.

Here is the phrase in English and Russian. The translator who handled this documentation was very encouraged by the appropriate choice of terminology. Another example.

The translator appreciated the choice of "is" instead of a dash, the fact that the structure of the phrase changed to an English one, the adequate choice of the term, which is correct, and the word "you," which is not in the original but makes this translation truly English and natural.

Another case is on-the-fly interface translation. One of the services decided not to bother with localization and to translate texts right at load time. But after a change of engine, roughly once a month, the translation of the word "delivery" would change. We suggested that the team connect not an ordinary neural network engine but ours, trained on technical documentation, so that the same term, agreed upon with the team and already used in the documentation, would always be used.

How does all this affect the money side? Historically, in the Russian-Ukrainian pair, the Ukrainian translation needs only minimal editing. Therefore, a couple of months ago we decided to switch to a post-editing workflow. This is how our savings grow. September is not over yet, but we estimate that we have reduced our post-editing costs by about a third for Ukrainian, and we are going to post-edit almost everything except marketing texts. A word from Irina to sum up.

Irina:
- It becomes obvious to everyone that we need to use this, it is already our reality, and we cannot exclude it from our processes and interests. But there are a few things to think about.

Decide on the types of documents and context you are working with. Is this technology right for you specifically?

The second point: we talked about Yandex.Translator because we are on good terms with them and have direct access to the developers, but in fact you need to decide which engine will be optimal specifically for you, for your language, your subject area. The next talk will be devoted to this topic. Be prepared that there are still difficulties; engine developers are working on them together, but for now they still occur.

I would also like to say what awaits us in the future. But in fact this is no longer the future; it is our present, what is happening here and now. We all need customization to fit our terminology and our texts, and this is what is now becoming public. Everyone is now working to make sure you do not have to go inside a company and negotiate with the developers of a specific engine about how to optimize it for you. You can get this in public, open engines via API.

Customization applies not only to texts but also to terminology, to adjusting terminology to your own needs. This is quite an important point. The second topic is interactive translation: when a translator translates a text, the technology can predict the next words based on the source language and the source text. This can make the work much easier.

As for what is really costly now: everyone is thinking about how to train engines much more effectively on smaller amounts of text. This is being worked on everywhere. I think the topic is very interesting, and it will become even more interesting in the future.

Lecture No. 8 Topic: Purpose of machine translation systems.

Purpose of machine translation

Machine translation (MT), or automatic translation (AT), is an intensively developing area of scientific research, experimental development, and already functioning systems in which a computer takes part in translating from one natural language (NL) to another. Machine translation systems (MT systems) provide quick and systematic access to information in a foreign language and ensure speed and uniformity in translating large flows of texts, mainly scientific and technical ones. MT systems operating on an industrial scale rely on large terminological data banks and, as a rule, require the involvement of a person as a pre-, inter-, or post-editor. Modern MT systems, especially those that rely on knowledge bases in a specific subject area during translation, are classified as artificial intelligence (AI) systems.

Main areas of use of MT

1. In industry information services, when there is a large array or a constant flow of foreign-language sources. If MT systems are used to produce signaling (alerting) information, post-editing is not required.

2. In large international organizations dealing with a multilingual, polythematic array of documents. These are the working conditions at the Commission of the European Communities in Brussels, where all documentation must appear simultaneously in nine working languages. Since the translation requirements here are high, MT output requires post-editing.

3. In services that translate technical documentation accompanying exported products. Translators cannot cope with extensive documentation within the required time frame (for example, specifications for aircraft and other complex objects can run to 10,000 pages or more). The structure and language of technical documentation are quite standard, which makes translation easier and even makes it preferable to manual translation, since it guarantees a uniform style across the entire array. Since the translation of specifications must be complete and accurate, MT output requires post-editing.

4. For simultaneous or near-simultaneous translation of a constant stream of similar messages. An example is the flow of weather reports in Canada, which must appear simultaneously in English and French.

In addition to the practical needs of business, there are also purely scientific incentives for developing MT: stable working experimental MT systems are a testing ground for various aspects of the general theory of understanding, speech communication, and information transformation, as well as for creating new, more effective models of MT itself.

In terms of scale and degree of development, MT systems can be divided into three main classes: industrial, developing, and experimental.

Linguistic support for machine translation systems

The MT process is a sequence of transformations applied to the input text that turns it into a text in the output language, which should recreate the meaning and, as a rule, the structure of the source text as fully as possible, but by means of the output language. The linguistic support of an MT system includes the entire complex of linguistic, metalinguistic, and so-called "extralinguistic" knowledge used in such transformation.

In classical MT systems, which carry out indirect translation of individual sentences (phrase-by-phrase translation), each sentence goes through a sequence of transformations consisting of three stages: analysis -> transfer (interlingual operations) -> synthesis. In turn, each of these stages is itself a rather complex system of intermediate transformations.

The goal of the analysis stage is to construct a structural description (intermediate, internal representation) of the input sentence. The task of the transfer stage (translation proper) is to transform the structure of the input sentence into the internal structure of the output sentence; this stage also includes replacing lexemes of the input language with their translation equivalents (lexical interlingual transformations). The goal of the synthesis stage is to construct a correct sentence of the target language from the structure obtained at the previous stages.
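As a rough illustration of these three stages, here is a deliberately tiny Python sketch of a transfer-style pipeline. The toy bilingual dictionary, the flat token structures, and the function names are hypothetical simplifications of what real analysis, transfer, and synthesis modules do.

```python
# A minimal, illustrative sketch of the classic analysis -> transfer -> synthesis
# pipeline. All names and the toy dictionary are hypothetical; a real system
# would use full morphological and syntactic analyzers at each step.

RU_EN_LEXICON = {"кошка": "cat", "спит": "sleeps"}  # toy bilingual dictionary

def analyze(source_sentence: str) -> list[dict]:
    """Build a (very simplified) structural description of the input sentence."""
    tokens = source_sentence.lower().split()
    return [{"lemma": t, "pos": "UNKNOWN"} for t in tokens]

def transfer(structure: list[dict]) -> list[dict]:
    """Replace source-language lexemes with target-language equivalents."""
    return [{**node, "lemma": RU_EN_LEXICON.get(node["lemma"], node["lemma"])}
            for node in structure]

def synthesize(structure: list[dict]) -> str:
    """Generate an output sentence from the transferred structure."""
    return " ".join(node["lemma"] for node in structure)

print(synthesize(transfer(analyze("Кошка спит"))))  # -> "cat sleeps"
```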

The linguistic support of a standard modern MT system includes:

1) dictionaries;

2) grammars;

3) formalized intermediate representations of units of analysis at different stages of transformation.

In addition to standard components, some MT systems may also have non-standard ones. For example, expert knowledge about the subject area can be specified using special conceptual networks rather than in the form of dictionaries and grammars.

Mechanisms (algorithms, procedures) for operating on the available dictionaries, grammars, and structural representations are classified as the mathematical and algorithmic support of an MT system.

One of the necessary requirements for modern MT systems is high modularity. From a linguistic point of view, this means that analysis and the processes that follow it are built with the theory of linguistic levels in mind. In the practice of creating MT systems, the following levels of analysis are distinguished:

Pre-syntactic analysis (this includes morphological analysis, MorphAn, as well as the analysis of set phrases, unidentified text elements, etc.);

Syntactic analysis, SynAn (builds a syntactic representation of the sentence, SynR); within it a number of sublevels can be distinguished that analyze different types of syntactic units;

Semantic analysis, SemAn, or logical-semantic analysis (builds the argument-predicate structure of statements or another type of semantic representation of the sentence and the text);

Conceptual analysis (analysis in terms of conceptual structures reflecting the semantics of the subject area). This level of analysis is used in MT systems that target a very restricted subject area. In essence, the conceptual structure is a projection of domain schemes onto linguistic structures, often not even semantic but syntactic ones. Only for very narrow domains and limited classes of texts does the conceptual structure coincide with the semantic one; in the general case there should not be a complete match, since the text is more detailed than any conceptual scheme.

Synthesis theoretically goes through the same levels as analysis, but in the opposite direction. In working systems, usually only the path from the syntactic representation to the string of words of the output sentence is implemented.

The linguistic differentiation of levels can also show up in the differentiation of the formal means used in the corresponding descriptions (the set of means is specified for each level separately). In practice, the linguistic means of MorphAn are often specified separately, while the means of SynAn and SemAn are combined. But the distinction between levels remains meaningful only if a single formalism, suitable for representing information from all the distinguished levels, is used in their descriptions.

From a technical point of view, the modularity of linguistic support means separating the structural representation of phrases and texts (as current, temporary knowledge about the text) from "permanent" knowledge about the language, and language knowledge from domain knowledge; separating dictionaries from grammars, grammars from the algorithms that process them, and algorithms from programs. The specific relationships between the various modules of the system (dictionaries vs. grammars, grammars vs. algorithms, algorithms vs. programs, declarative vs. procedural knowledge, etc.), including the distribution of linguistic data across levels, are the main thing that determines the specifics of an MT system.

Dictionaries. Analysis dictionaries are usually monolingual. They must contain all the information necessary to include a given lexical unit (LU) in the structural representation. Stem dictionaries (with morphological-syntactic information: part of speech, inflection type, the subclass characterizing the syntactic behavior of the LU, etc.) are often separated from dictionaries of word meanings, which contain semantic and conceptual information: the semantic class of the LU, its semantic valences, the conditions of their realization in a phrase, etc.

In many systems, dictionaries of common vocabulary and of terminology are kept separate. This division makes it possible, when moving to texts of another subject area, to limit the changes to the terminological dictionaries. Dictionaries of complex LUs (set phrases, constructions) usually form a separate array; the dictionary information in them indicates how such a unit is "assembled" during analysis. Part of the dictionary information can be specified in procedural form; for example, polysemous words can be associated with algorithms for resolving the corresponding type of ambiguity. New ways of organizing dictionary information for MT purposes are offered by so-called "lexical knowledge bases." The presence of heterogeneous information about a word (sometimes called the lexical universe of the word) brings such a dictionary closer to an encyclopedia than to traditional linguistic dictionaries.
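To make the split between stem dictionaries and word-sense dictionaries more concrete, here is a hypothetical sketch of how such entries could be represented. The field names and the sample entry are illustrative only and are not taken from any particular MT system.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of dictionary organization: a stem entry with
# morphological-syntactic information, and a sense entry with semantic
# information (class and valences), kept as separate records.

@dataclass
class StemEntry:
    stem: str
    part_of_speech: str
    inflection_type: str          # e.g. declension/conjugation class
    syntactic_subclass: str       # governs the syntactic behavior of the unit

@dataclass
class SenseEntry:
    lemma: str
    semantic_class: str
    valences: list[str] = field(default_factory=list)  # semantic valences and
                                                       # the conditions of their use

disk = StemEntry(stem="disk", part_of_speech="noun",
                 inflection_type="regular", syntactic_subclass="countable")
disk_sense = SenseEntry(lemma="disk", semantic_class="storage-device",
                        valences=["owner", "content"])
```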

Grammars and algorithms. Grammars and vocabulary define the linguistic model and form the bulk of the linguistic data. The algorithms for processing them, i.e., for correlating them with text units, belong to the mathematical and algorithmic support of the system.

Separating grammars from algorithms is important in practice because it allows the grammar rules to be changed without changing the algorithms (and, accordingly, the programs) that work with the grammars. But such a separation is not always possible. For a system with a procedural specification of the grammar, and all the more with a procedural representation of the dictionary information, such a division is irrelevant. Decision-making algorithms for cases of insufficient (incomplete input data) or redundant (ambiguous analysis) information are more empirical; formulating them requires linguistic intuition. Specifying the general control algorithm that determines the order in which different grammars are called (if a system has several) also requires linguistic justification. Nevertheless, the current trend is to separate grammars from algorithms, so that all linguistically meaningful information is specified in the static form of grammars, and to make the algorithms abstract enough to call and process different linguistic models.

The clearest separation of grammars and algorithms is seen in systems working with context-free grammars (CFGs), where the language model is specified by a finite set of rules, and the algorithm must, for an arbitrary sentence, produce a tree of its derivation according to the rules of the grammar, and if there are several such derivations, list them. Such an algorithm, which is a formal (in the mathematical sense) system, is called an analyzer (parser). The description of the grammar serves as input for the universal analyzer, just like the sentence being analyzed. Parsers are built for classes of grammars, although taking into account the specific features of a grammar can improve a parser's efficiency.
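As a small, hedged illustration of this separation between a declarative grammar and a universal analyzer, the sketch below defines a toy context-free grammar and lets NLTK's chart parser enumerate all derivation trees of a sentence. The grammar and the sentence are invented; the prepositional-phrase attachment ambiguity yields two trees, exactly the situation where the analyzer must list several derivations.

```python
# Toy CFG plus a universal chart parser (NLTK). The grammar is declarative
# data; the same parser works for any grammar of this class.
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | Det N PP
VP -> V NP | V NP PP
PP -> P NP
Det -> 'the' | 'a'
N  -> 'user' | 'page' | 'browser'
V  -> 'opens'
P  -> 'in'
""")

parser = nltk.ChartParser(grammar)
sentence = "the user opens a page in the browser".split()

# The PP can attach to the verb or to "a page", so two trees are printed.
for tree in parser.parse(sentence):
    print(tree)
```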

Grammars of the syntactic level are the most developed part, both from the point of view of linguistics and from the point of view of the formalisms available for them.

The main types of grammars and algorithms that implement them:

Chain grammar fixes the order of elements, that is, the linear structure of a sentence, specifying it in terms of grammatical classes of words (article + noun + preposition) or in terms of functional elements (subject + predicate);

Constituency grammar (immediate-constituent grammar, ICG) records linguistic information about the grouping of grammatical elements, for example a noun phrase (consisting of a noun, an article, an adjective, and other modifiers) or a prepositional phrase (consisting of a preposition and a noun phrase), and so on up to the sentence level. The grammar is constructed as a set of substitution rules, i.e., a calculus of productions of the form A -> B ... C. Immediate-constituent grammars are grammars of the generative type and can be used both in analysis and in synthesis: the sentences of a language are generated by repeated application of such rules;

Dependency grammar (DG) specifies a hierarchy of relations between sentence elements (the head word determines the form of the dependents). The analyzer in a DG is based on identifying heads and their dependents. The main element of a sentence is the finite verb, since it determines the number and nature of the dependent nouns. The analysis strategy in a DG can be top-down (first the heads are identified, then the dependents) or bottom-up (the heads are determined through a substitution process);
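A minimal, hand-built illustration of a dependency structure follows: each token stores the index of its head, and the finite verb is the root. This is a toy example, not the output of any particular dependency parser.

```python
# Toy dependency structure: parallel lists of tokens and head indices.
# -1 marks the root (the finite verb).
sentence = ["the", "engine", "translates", "documentation"]
heads    = [  1,      2,         -1,            2        ]

for word, head in zip(sentence, heads):
    governor = "ROOT" if head == -1 else sentence[head]
    print(f"{word} <- {governor}")
# the <- engine, engine <- translates, translates <- ROOT, documentation <- translates
```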

Bar-Hillel's categorial grammar is a version of constituency grammar in which there are only two primitive categories: sentences S and names n. The remaining categories are defined in terms of their ability to combine with these main ones in the constituent structure. Thus, an intransitive verb can be defined as n\S, because it combines with a name to its left to form the sentence S.

There are many ways to account for contextual conditions: metamorphosis grammars and their variants. All of them are extensions of context-free rules. In general terms, this means that production rules are rewritten as A[a] -> B[b], ..., C[c], where the lowercase letters indicate conditions, tests, instructions, etc., which relax the original rigid rules and give the grammar flexibility and efficiency.

In generalized phrase structure grammar (GPSG), meta-rules are introduced, which generalize regularities among the context-free rules.

Augmented transition network grammars (ATNs) provide tests and conditions on arcs, as well as instructions to be executed if the analysis follows a given arc. In various modifications of ATNs, weights can be assigned to arcs, and the analyzer can then choose the path with the highest weight. Conditions can be divided into two parts: context-free and context-sensitive.

A variety of ATN is the cascaded ATN. A cascade is an ATN equipped with a special action that causes the process in the given cascade to stop, stores information about the current configuration on a stack, and passes control to a deeper cascade, with a subsequent return to the original state. ATNs have a number of the capabilities of transformational grammars; they can also be used as a generating system.

The chart (graph-based) analysis method makes it possible to store partial results and to represent alternative analyses.

A newer and immediately popular method of grammatical description is lexical-functional grammar (LFG). It eliminates the need for transformation rules. Although LFG is based on a CFG, the test conditions in it are separated from the substitution rules and are "solved" as autonomous equations.

Unification grammars (UGs) represent the next stage of generalization of the analysis model after chart schemes: they are able to embody grammars of various types. A UG contains four components: a unification package, an interpreter for rules and lexical descriptions, programs for processing directed graphs, and an analyzer using a chart. UGs combine grammatical rules with dictionary descriptions, and syntactic valences with semantic ones.

The central problem of any natural language analysis system is the choice among variants. To solve it, syntactic-level grammars are supplemented with auxiliary grammars and methods for analyzing complex situations. Both filtering and heuristic methods are used. The filtering method consists in first obtaining all variants of the analysis of a sentence and then rejecting those that do not satisfy a certain system of filter conditions. The heuristic method constructs from the very beginning only part of the variants, those more plausible from the point of view of given criteria. The use of weights for selecting variants is an example of heuristic methods in analysis.

The semantic level is much less supported by theory and practical developments. The traditional task of semantics is to resolve the ambiguity of syntactic analysis, both structural and lexical. For this, the apparatus of selectional restrictions is used, which is tied to sentence frames, i.e., it fits into the syntactic model. The most common type of SemAn is based on so-called case grammars. The basis of such a grammar is the concept of a deep, or semantic, case. The case frame of a verb is an extension of the concept of valence: it is the set of semantic relations that can (obligatorily or optionally) accompany the verb and its variants in the text. Within one language, the same deep case is realized by different surface prepositional and case forms. Deep cases, in principle, make it possible to go beyond the boundaries of the sentence, and going out into the text means moving to the semantic level of analysis.
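As a hedged illustration of a case frame, here is a toy representation for the verb "send". The deep-case names and their surface realizations are simplified examples in the spirit of case grammar, not a fragment of any real MT system.

```python
# Toy case frame: a set of deep (semantic) cases for one verb, each marked
# as obligatory or optional, with a typical surface realization.
SEND_FRAME = {
    "verb": "send",
    "cases": {
        "Agent":      {"obligatory": True,  "surface": "subject"},
        "Object":     {"obligatory": True,  "surface": "direct object"},
        "Recipient":  {"obligatory": True,  "surface": "to + NP / indirect object"},
        "Instrument": {"obligatory": False, "surface": "by/with + NP"},
    },
}

# The deep cases stay the same across surface realizations:
# "The user sent the file to support" and "The file was sent to support by the user"
# fill the same Agent / Object / Recipient slots.
```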

Since semantic information, unlike syntactic information, which relies primarily on grammars, is concentrated mainly in dictionaries, grammars that "lexicalize" CFGs were intensively developed in the 1980s. Grammars based on the study of the properties of discourse are also being developed.

Over the past decades, a computer connected to the Internet has become the translator's most important tool. It provides access to huge amounts of information, as well as to electronic dictionaries and translators. Machine translation has become commonplace today.

The term "machine translation" (MT) refers to translating text from one natural language into another using special software. The program can be installed directly on a computer (or other device) or be accessible only with an Internet connection.

A little history

The idea of using a computing device for translation appeared back in 1947. Implementing it in those years was simply impossible, since computer technology was in its infancy. However, already in 1954 the first attempt at machine translation was made. The very first dictionary included only 250 words, and the grammar was limited to 6 rules. Nevertheless, this was enough to convince researchers that machine translation had a great future. Work in this direction began in many countries, the first machine translation systems (MTS) began to appear, and special theories were created.

At first, the development of machine translation was hampered by the low level of computer technology and its very high cost. However, the gradual penetration first of personal computers and then of the Internet into our lives has led to the rapid development of this field. Today, machine translation is actively used in a wide variety of areas of human activity.

Who needs it

The development of machine translation was also encouraged by the expansion of international relations. People began to travel to other countries more often, and business expansion abroad ceased to be something exceptional, even by the standards of small companies. As a result, difficulties in communication arise more and more often, and machine translation is increasingly used in business today. Even if the result of computer translation is far from ideal, it is still better than nothing at all.

With the help of MT systems it becomes possible to grasp the content of large volumes of text very quickly, which is simply impossible with the traditional approach. This can be very useful, for example, if you need to classify a large amount of information in a foreign language, or for linguistic analysis.

MT has also become commonplace in communication on the Internet, where high translation speed and understanding what your interlocutor has told you are very important. However, in this case you can safely forget about conveying literary images if you want to be understood correctly: only "dry" phrases, without any ambiguity.

Human participation

Despite the development of various approaches and the growth of computing power, the quality of machine translation is still far from ideal. Even if the successes in this area can be called impressive, it is only in comparison with the very first systems.

Modern MT systems have already learned to translate technical texts more or less adequately; such texts, as we know, do not contain the literary liberties so often found in fiction. The quality of translation is also strongly influenced by how closely the languages are related: translating between closely related languages gives a much better result than translating between distantly related ones, where the resulting text may turn out to be simply unreadable nonsense.

For this reason, machine translation cannot yet work without human intervention. A person either adapts the text in advance, eliminating all possible ambiguities (pre-editing), or edits the finished translation, removing the almost inevitable errors from it (post-editing). There is also the concept of inter-editing, when a person intervenes directly in the operation of the system, correcting inaccuracies "on the fly."

What types of MT systems are there?

To date, work in the field of MP has been divided into two main areas:

  • Statistical machine translation (Statistical Machine Translation, SMT);
  • Rule-based machine translation (Rule-based Machine Translation, RBMT).

In the first case, we are dealing with self-learning systems. Translation becomes possible as a result of constant analysis of a huge number of texts with the same content but in different languages. The system finds and exploits the patterns that are always present. The quality of translation with SMT is considered quite high, but only if the system has already managed to analyze a huge amount of information. For this you need not only the texts themselves, but also impressive computing power, which means that only large companies can work in this direction. Examples of such systems: Google Translate, Yandex.Translate, and Bing Translator from Microsoft.

In the case of RBMT systems, all the rules are created by people who then constantly refine them. Accordingly, the quality of the result depends on how fully linguists manage to describe the natural language they work with. It is the need to constantly keep the created linguistic database up to date that is the main disadvantage of RBMT systems. On the other hand, creating a translator capable of providing a satisfactory result does not require impressive computing power, which allows small companies to work in this direction. Examples include systems such as Multilect, Linguatec, and PROMT.

There is also a third option: hybrid machine translation (Hybrid Machine Translation, HMT). This method combines both approaches, SMT and RBMT. In theory, it allows you to gain the advantages of both technologies. This is the approach used by Systran, founded in 1968 and considered the oldest commercial company working in the field of MT.

Kontsevoy Daniil Sergeevich,
Private educational institution of higher education "Omsk Law Academy", Omsk

A translator in the field of professional communication is a person who has an active command of a foreign language in their professional sphere, who can construct foreign-language oral and written speech logically, coherently, and clearly, and, most importantly, who has mastered the technique of using machine translation systems, because even professionals cannot do without turning to electronic translators.

Machine translation is a process performed on a computer or other electronic device to convert text in one language into equivalent text in another language, as well as the result of such an action. Since there are no fully automated electronic translators capable of translating a text accurately and correctly, a specialist translator must either prepare the text beforehand or correct errors and omissions in the machine-processed text.

There are four forms of organizing interaction between a computer and a person when performing machine translation:

  • pre-editing: a person prepares the text for computer processing (simplifying the meaning of the text, eliminating ambiguous readings, marking up the text), after which machine translation is performed;
  • inter-editing: a person directly intervenes in the operation of the translation system, resolving problematic issues;
  • post-editing: the entire source text is subjected to machine processing, and a person corrects the result by editing the translated text;
  • mixed system.

Modern electronic translators are capable of producing a perceptually adequate translation of individual phrases and sentences; they serve to ease the work of a human translator and to relieve him of the routine work of looking up the meanings of certain words and phrases in dictionaries.

To master machine translation systems, it is necessary to at least have a general understanding of electronic translation technologies. There are several of them in machine translation:

1) Direct machine translation

Direct machine translation is the oldest approach to machine translation. With this method, the text in the source language is not subjected to structural analysis beyond morphology. The translation uses a large number of dictionaries and is word-for-word, apart from minor grammatical adjustments, for example concerning word order and morphology. A direct translation system is designed for a specific language pair. The lexicon is a repository of information about the specifics of words. These systems depend on the quality of dictionary preparation, morphological analysis, and text processing software. An example of a direct translation system is Systran.

2) Rule-based machine translation uses a large store of linguistic rules and bilingual dictionaries for each language pair. Types of rule-based machine translation include the Interlingua principle and Transfer machine translation.

  • Machine translation Interlingua

In machine translation based on the Interlingua principle, translation is carried out through an intermediate (semantic) model of the source language text. Interlingua is a language-independent model from which translations into any language can be generated. The Interlingua principle allows for the possibility of transforming text in the source language into a model common to several languages.

  • Transfer machine translation is based on the idea of Interlingua, using comparative analysis of two languages. The three stages of this process are analysis, transfer, and generation. First, the source language text is converted into an abstract or intermediate model of the source language, which is then transformed into a model of the target language, and finally rendered as text in the target language. This principle is simpler than Interlingua, but it is harder to avoid ambiguity.

3) Machine translation on text corpora

The corpus approach in machine translation uses a collection (corpus) of parallel bilingual texts. The main advantage of corpus-based machine translation systems is their self-tuning, i.e. they are able to remember the terminology and even the style of phrases from the texts of previous translations. Statistical machine translation and example-based machine translation are variants of the corpus approach.

  • Statistical machine translation

This is a type of machine translation based on the comparison of large volumes of language pairs. This approach uses statistical translation models. One of the approaches used is based on Bayes' theorem. Building statistical translation models is a fairly fast process, but the technology relies heavily on the availability of a multilingual text corpus: a minimum of 2 million words is required for each individual subject area, and more if we are talking about the language as a whole. Statistical machine translation requires special hardware in order to "average" translation models. An example of statistical machine translation is Google Translate.
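The Bayesian idea mentioned above is usually written as choosing the target sentence e that maximizes P(e|f) proportional to P(f|e) * P(e), that is, a translation-model score combined with a language-model score. The sketch below is a toy illustration of that scoring step; the candidate translations and their probabilities are invented.

```python
import math

# Toy noisy-channel scoring: pick the candidate e maximizing
# log P(f|e) + log P(e). All numbers below are invented for illustration.
candidates = [
    # (candidate translation, translation-model prob P(f|e), language-model prob P(e))
    ("free shipping is available", 0.40, 0.30),
    ("the shipping is for free",   0.50, 0.05),
]

def score(tm_prob: float, lm_prob: float) -> float:
    return math.log(tm_prob) + math.log(lm_prob)

best = max(candidates, key=lambda c: score(c[1], c[2]))
print(best[0])  # -> "free shipping is available"
```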

  • Machine translation with examples

Example-based machine translation systems are based on a parallel bilingual corpus of texts containing sentence pairs as examples. Each sentence is duplicated in the other language. Like statistical machine translation, this approach has a "learning" property: the more texts (examples) you have at your disposal, the better the machine translation result.
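Below is a minimal sketch of the example-based idea: find the most similar source sentence in a small parallel corpus and reuse its stored translation. The two-entry corpus and the similarity measure (difflib's SequenceMatcher) are simplifications for illustration; real systems also adapt the retrieved example.

```python
from difflib import SequenceMatcher

# Toy parallel corpus of (source, translation) pairs; invented for illustration.
corpus = [
    ("включите уведомления", "turn on notifications"),
    ("выключите уведомления", "turn off notifications"),
]

def translate_by_example(source: str) -> str:
    """Return the stored translation of the most similar corpus sentence."""
    best_src, best_tgt = max(
        corpus, key=lambda pair: SequenceMatcher(None, source, pair[0]).ratio()
    )
    return best_tgt  # a real system would also adapt the retrieved example

print(translate_by_example("включите уведомления о скидках"))
```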

Every translator in the field of professional communication faces the problem of choosing an appropriate translation program. Leaving aside paid services, we consider it necessary to analyze the best-known systems.

The electronic translator Google Translate, which was developed by Google in the mid-2000s, is very popular. This service is designed for translating texts and translating websites on the fly. The translator uses a self-learning machine translation algorithm based on language analysis of texts.

Unlike most machine translators, which use SYSTRAN technology, Google uses its own software. Google Translate is currently the most popular translator thanks to its simplicity and versatility. Because of this, the system is developing very quickly and is being optimized to meet user needs. Its current functions include: translation of an entire web page; search for information with simultaneous translation into another language; translation of text in images; translation of a spoken phrase; handwriting translation; and translation of dialogue.

The features of this machine translation system include:

  1. Translation options are controlled by a statistical algorithm.

Users can always suggest their own translations of certain words and/or select one of the translation options as the most suitable. A disadvantage of such an algorithm can be deliberately incorrect translation options, including obscene words.

  2. Coverage of world languages.

The program now works with more than a hundred languages, including Swahili, Chinese, and Welsh. Google Translate is able to translate from any supported language into any other supported language, but in most cases the translation is performed through English. The disadvantage of this mechanism is obvious: the quality of the translation suffers.

PROMT, developed in 1991, occupies a leading position in the Russian machine translator market.

PROMT, like Google Translate, uses its own software, which was significantly updated in 2010. Since then, PROMT has carried out translation based on hybrid technology. Its essence is that instead of one translation option, the program produces about a hundred translations of the same sentence, depending on the polysemy of words, the constructions, and the statistical results. The machine then selects the most likely of the proposed translations. Thus, the translator is able to learn quickly, but has the same disadvantages as all translators based on statistical text processing methods.

The translator's capabilities include: translation of words, phrases and texts, including using hot keys; translation of a selected area of ​​the screen with graphic text; translation of documents of various formats: doc(x), xls(x), ppt(x), rtf, html, xml, txt, ttx, pdf (including scanned ones), jpeg, png, tiff; use, editing and creation of specialized dictionaries and translation profiles; connection of Translation Memory databases and glossaries; integration into office applications, web browsers, corporate portals and websites.

The disadvantages of the translator are: a small number of language pairs with which the program works; complex interface; inaccuracies in the translations of professional vocabulary (which, however, is eliminated by connecting thematic dictionaries).

However, PROMT was recognized as the best English-Russian translator at the annual workshop on statistical machine translation under the auspices of the Association for Computational Linguistics (ACL) in 2013 and 2014.

There are many other machine translation systems, but they, one way or another, copy various features of the domestic PROMT translator or the American Google Translate.

Thus, a translator in the field of professional communication who knows machine translation technologies and knows how to choose the right electronic translator for specific purposes will be well prepared for successful professional activity, because at this stage of the development of computer technology it is too early to think about fully automatic machine translation. A human translator thinks in images and proceeds from a goal: to convey a specific thought to the listener or reader. It is still difficult to imagine a computer program with such capabilities. Modern machine translators play a supporting role: they are designed to relieve a person of routine work during the translation process. The age of paper dictionaries is over, and machine translation systems are coming to the aid of professional translators (and not only them).

List of used literature

  1. www.promt.ru
  2. www.translate.google.com
  3. Belonogov G.G. Zelenkov Yu.G. Interactive system for Russian-English and English-Russian machine translation, VINITI, 1993.
  4. Bulletin of Moscow University. Ser.19 Linguistics and intercultural communication. 2004. No. 4, p. 51.


Contents:
Introduction
1.1 What is machine translation?
1.2 The beginnings of machine translation
1.3 Stages in the development of machine translation
1.4 Modern machine translation
1.5 Machine translation on the Internet
Conclusion
Literature

Introduction.
The mechanization of translation is one of humanity's oldest dreams, and in the 20th century that dream became reality. This is largely due to society's steady movement toward globalization and the strengthening of socio-economic ties between states, even against the background of ethnic conflicts and political cataclysms, and to the integration of many previously "closed" countries into the world community. Knowledge of foreign languages is not only a useful everyday skill, but also one of the basic requirements when applying for a job. The need to know one or even several foreign languages is becoming ever more pressing: knowledge of a language (English or German) is necessary not only when travelling abroad on vacation, but also when receiving business partners from abroad, and in everyday life when reading the news or watching films. A large number of routine, everyday operations that previously did not require knowledge of a foreign language have therefore become much harder to carry out relying on one language alone, owing to the development of international integration and the widespread drive of business toward globalization. As a result, the services of translators who perform professional translations into English, German, and other languages and language pairs are increasingly in demand. Yet knowledge of foreign languages alone is no longer enough, since the volume of information that needs to be translated every day has grown enormously. At the same time, this task is being solved successfully: today it is easy to translate a contract or the content of a foreign website in just a few seconds. And all because the translation in this case is performed by a translator program: a person barely has time to blink, and the translation is already ready.
But today, as before, reality is not perfect. There is not a single machine translation system that, at the click of a few buttons, can produce a flawless translation of any text in any language without human intervention or at least editing. For now, these are only plans for the distant future, if such an ideal is achievable at all, since many question this assumption.

1.1 What is machine translation?

Machine translation is a translation process performed by a special computer program that allows you to convert text in one natural language into equivalent text in another language. This is also the name of the direction of scientific research related to the construction of such systems.
Modern machine, or automatic, translation can be considered in terms of the interaction between a computer program and a person:

      With post-editing, when the source text is processed by a machine, and a human editor corrects the result.
      With pre-editing, when a person adapts the text for processing by a machine, for example, eliminating possible ambiguous readings, simplifying and marking up the text, after which software processing begins.
      With inter-editing, in which a person intervenes in the operation of the translation system, resolving difficult cases.
      Mixed systems, including, for example, simultaneous pre- and post-editing.
The main goal of machine translation as a science is to develop an algorithm that completely automates the translation process.
To carry out machine translation, a special program implementing the translation algorithm is introduced into the computer; the algorithm is understood as a sequence of uniquely and strictly defined actions on the text for finding translation correspondences in a given pair of languages L1 - L2 for a given direction of translation (from one specific language to another). The machine translation system includes bilingual dictionaries equipped with the necessary grammatical information (morphological, syntactic, and semantic) to ensure the transmission of equivalent, variant, and transformational translation correspondences, as well as algorithmic tools for grammatical analysis that implement one of the formal grammars adopted for automatic text processing. There are also machine translation systems designed to translate among three or more languages, but these are currently experimental.
The most common is the following sequence of formal operations that provide analysis and synthesis in a machine translation system:
1. At the first stage, the text is entered and a search for the input word forms (words in a specific grammatical form, for example the dative plural) is carried out in the input dictionary (the dictionary of the language from which the translation is made), accompanied by morphological analysis, during which it is established that the given word form belongs to a certain lexeme (a word as a unit of the vocabulary). In the process of analysis, information related to other levels of organization of the language system can also be obtained from the form of the word.
2. The next stage includes the translation of idiomatic phrases, phraseological units, or cliches of the given subject area; the determination of the basic grammatical (morphological, syntactic, semantic, and lexical) characteristics of the elements of the input text within the input language; the resolution of homography (conversion homonymy of word forms: say, the English round can be a noun, an adjective, an adverb, a verb, or a preposition); and lexical analysis and translation of lexemes. Usually at this stage unambiguous words are separated from polysemous ones (those having more than one translation equivalent in the target language), after which unambiguous words are translated using lists of equivalents, while polysemous words are translated using so-called contextual dictionaries, whose dictionary entries are algorithms for querying the context for the presence or absence of contextual determinants of meaning.
3. Final grammatical analysis, during which the necessary grammatical information is determined taking into account the data of the target language (for example, with Russian nouns such as "sled" or "scissors", which are plural-only in Russian, the verb must be in the plural form even though the original may be in the singular).
4. Synthesis of output word forms and sentences as a whole in the target language.
Depending on the characteristics of the morphology, syntax, and semantics of the particular language pair, as well as the direction of translation, the general translation algorithm may include other stages, as well as modifications of these stages or of their order, but variations of this kind in modern systems are usually minor. Analysis and synthesis can be carried out either phrase by phrase or for the entire text entered into the computer's memory; in the latter case, the translation algorithm provides for identifying so-called anaphoric links.
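As a hedged illustration of stage 2 above, the sketch below shows how a "contextual dictionary" entry for a polysemous word might query the context for determinants of meaning. The rules and equivalents are invented, using the English word round mentioned above.

```python
# Toy contextual-dictionary entry: ordered (condition, equivalent) rules that
# are checked against the context of the word. All rules are invented examples.
CONTEXTUAL_RULES = {
    "round": [
        (lambda ctx: "table" in ctx,    "круглый"),  # adjective reading
        (lambda ctx: "of talks" in ctx, "раунд"),    # noun reading
        (lambda ctx: True,              "круг"),     # default equivalent
    ],
}

def translate_word(word: str, context: str) -> str:
    """Pick a translation equivalent by querying the context."""
    for condition, equivalent in CONTEXTUAL_RULES.get(word, []):
        if condition(context):
            return equivalent
    return word

print(translate_word("round", "a new round of talks"))  # -> "раунд"
print(translate_word("round", "a round table"))         # -> "круглый"
```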
Modern machine translation should be distinguished from the use of computers to assist human translators. In the latter case we mean an automatic dictionary that helps a person quickly select the desired translation equivalent. Although in both cases the computer works together with a person (a translator or an editor), the term "machine translation" implies that the main part of the work of translating and finding translation equivalents and correspondences is carried out by the machine itself, leaving the person only to control it and correct mistakes. A computer dictionary that helps a person is, by contrast, a purely auxiliary tool for quickly finding translation matches; at the same time, some functions inherent in machine translation systems can be implemented in such dictionaries to a limited extent.

1.2 Start of machine translation.

Machine translation technology as a scientific field has a history of almost a century, and the first ideas for automating the translation process appeared back in the 17th century.
As is commonly believed, the reasons for the emergence of machine translation were the rapidly growing flow of information in different languages of different countries and continents starting from the second half of the 20th century, the need to assimilate it for scientific and technological progress, the shortage of qualified translators (especially in certain fields), and the high cost of training them.
The English inventor Charles Babbage was the first to consider new methods of translation: in the late 1830s he proposed the design of the first computer in history. The essence of his idea was to use the machine's memory to store dictionaries: Babbage proposed that a store of 1,000 numbers of 50 decimal digits each (50 gear wheels per register) could hold dictionaries. However, Babbage never succeeded in bringing his idea to life.
The theoretical basis of the initial period of work on machine translation was the view of language as a code system. The pioneers of machine translation were mathematicians and engineers. Descriptions of their first experiments in using the newly emerging computers to solve cryptographic problems were published in the USA in the late 1940s. The birth date of machine translation as a research field is usually considered to be March 1947, when Warren Weaver, director of the natural sciences division of the Rockefeller Foundation, in a letter to Norbert Wiener first posed the problem of machine translation, comparing it to the problem of decryption and identifying the translation of text from one language to another as another area of application for decryption techniques.
This was followed by a heated discussion of the idea of automated translation and the theoretical development of the first technologies. Suggestions were made that human translators would be completely replaced by electronic systems, and many professional translators feared being left without work in the near future. Weaver's ideas formed the basis of an approach to machine translation built on the concept of an interlingua: the transfer of information is divided into two stages; first the source sentence is translated into an intermediary language (created on the basis of simplified English), and then the result of this translation is rendered in the target language.
After a series of discussions, the same Warren Weaver drew up a memorandum in 1949 in which he theoretically substantiated the fundamental possibility of creating machine translation systems. The machine translation systems of those years were quite different from modern ones. They were very large and expensive machines that occupied entire rooms and required a large staff of engineers, operators and programmers to maintain. These computers were used mainly for mathematical calculations for the needs of military institutions and of the mathematics and physics departments of universities (the latter also closely tied to the military sphere). Therefore, in the early stages the development of machine translation was actively supported by the military; in the USA the main attention was paid to the Russian-English direction, and in the USSR to the English-Russian direction.
In addition to obvious practical needs, an important role in the development of machine translation was played by the famous test of intelligence (the “Turing test”) proposed in 1950 by the English mathematician A. Turing, which effectively replaced the question of whether a machine can think with the question of whether a machine can communicate with a person in natural language in such a way that the person cannot distinguish it from a human interlocutor. Thus, for decades, the computer processing of natural language messages was at the focus of research in cybernetics (and subsequently in artificial intelligence), and productive cooperation was established between mathematicians, programmers and computer engineers on the one hand, and linguists on the other.
Soon funding for research began, and in 1952 the first conference on machine translation was held at the Massachusetts Institute of Technology, organized by the logician and mathematician Yehoshua Bar-Hillel.
In 1954, the first results were presented to the public: IBM, together with Georgetown University (USA), successfully carried out the first experiment. It went down in history as the Georgetown experiment, in which the first version of an electronic translator was presented. The experiment demonstrated the fully automatic translation of more than 60 sentences from Russian into English. The presentation had a positive impact on the development of machine translation over the next 12 years.
The experiment was designed and prepared to attract public and government attention. Paradoxically, it was based on a rather simple system: only 6 grammar rules and a dictionary of 250 entries. The system was specialized: organic chemistry was chosen as the subject area for translation. The program ran on an IBM 701 mainframe.
In the same year, 1954, the first experiment in machine translation in the USSR was carried out by I.K. Belskaya (the linguistic part) and D.Yu. Panov (the software part) at the Institute of Precision Mechanics and Computer Engineering of the USSR Academy of Sciences, and the first industrially usable machine translation algorithm and an English-to-Russian machine translation system on a general-purpose computer were developed by a team led by Yu.A. Motorin. After that, work began in many information institutes and scientific and educational organizations across the country. The work in this area of such domestic linguists as I.A. Melchuk and Yu.D. Apresyan (Moscow) deserves special mention; its result was the ETAP linguistic processor. In 1960, an experimental machine translation laboratory was set up within the Research Institute of Mathematics and Mechanics in Leningrad, later transformed into the Laboratory of Mathematical Linguistics of Leningrad State University.
The demonstration of the Georgetown experiment was widely reported in the mass media and was perceived as a success. It influenced the decisions of the governments of several states, first of all the USA, to invest in computational linguistics. The organizers of the experiment assured that within three to five years the problem of machine translation would be solved. The idea of machine translation stimulated the development of research in theoretical and applied linguistics around the world. Theories of formal grammars appeared; much attention was paid to the modeling of language and its individual aspects, of linguistic and mental activity, and to questions of linguistic form and the quantitative distribution of linguistic phenomena. New areas of linguistic science emerged: computational, mathematical, engineering, statistical and algorithmic linguistics, along with a number of other branches of applied and theoretical linguistics. During the 1950s, departments of applied linguistics and machine translation were opened in educational centers around the world. In the USSR, such departments were created in Moscow (Lomonosov Moscow State University and the Maurice Thorez Moscow State Pedagogical Institute of Foreign Languages, now MSLU), in Minsk (the Minsk State Pedagogical Institute of Foreign Languages), in Yerevan and Makhachkala, at Leningrad University, at the universities of Kyiv, Kharkov and Novosibirsk, and in a number of other cities. Research and development in machine translation was also carried out in France, England, the USA, Canada, Italy, Germany, Japan, the Netherlands, Bulgaria, Hungary and other countries, as well as in international organizations that handle a large volume of translations from various languages. Currently, research is also being conducted in countries such as Malaysia, Saudi Arabia and Iran.

1.3 Stages of development of machine translation.

As a result of such a successful start, it seemed that the creation of high-quality automatic translation systems was achievable within a few years. The emphasis was placed on the development of fully automatic systems producing high-quality translations; human involvement at the post-editing stage was seen as a temporary compromise. Professional translators seriously feared that they would soon be left without work.
However, machine translation research has experienced both ups and downs throughout its history. In the 1950s, significant investments were made in research, but the results quickly disappointed investors. One of the main reasons for the low quality of machine translation in those years was the limited capabilities of the hardware: small memory with slow access to the information it contained, and the inability to make full use of high-level programming languages. Another reason was the lack of a theoretical framework for solving linguistic problems. As a result, the first machine translation systems amounted to word-by-word translation of texts, without any syntactic, let alone semantic, integrity.
In 1959, the philosopher Yehoshua Bar-Hillel argued that high-quality, fully automatic translation could not be achieved in principle. He proceeded from the fact that the choice of one translation or another is determined by knowledge of extra-linguistic reality, and this knowledge is too extensive and diverse to be entered into a computer. Bar-Hillel did not, however, reject the idea of machine translation as such, considering a promising direction to be the development of machine systems oriented towards use by a human translator (a kind of “human-machine symbiosis”). Nevertheless, this position had a most unfavorable effect on the development of machine translation in the United States. In the early 1960s, the initial, euphoric stage in the development of machine translation came to an end. This was greatly facilitated by the publication of the so-called “Black Book of machine translation”: the 1966 report of the Automatic Language Processing Advisory Committee (ALPAC) of the US National Academy of Sciences, which stated that it was impossible to create universal high-quality machine translation systems in the foreseeable future. The committee concluded that machine translation was unprofitable: the ratio of cost to quality was clearly not in favor of the latter, and human resources were sufficient for the needs of translating technical and scientific texts. The consequence of this publication was a reduction in funding and a general decline in interest in machine translation, but research, especially theoretical work, was not curtailed completely, and the first translation systems remained in use in military and scientific institutions of the USSR and the USA.
A new stage in the development of machine translation technologies began in the 1970s. This revival was associated with advances in computing: the emergence of microcomputers, the development of networks, and the growth of memory resources. Programmers abandoned the idea of creating an “ideal” translating machine: new systems were developed with the goal of greatly increasing the speed of translation, but with the obligatory participation of a person at various stages of the translation process to achieve the best quality.
The revival of machine translation in the 1970s and 1980s is evidenced by the following facts: the Commission of the European Communities (CEC) bought the English-French version of Systran, as well as a system for translation from Russian into English (the latter was developed after the ALPAC report and continued to be used by the US Air Force and NASA); in addition, the CEC commissioned the development of French-English and Italian-English versions. At that time, thanks to the CEC, the foundations of the EUROTRA project were laid, based on the work of the SUSY and GETA groups. At the same time, there was a rapid expansion of work on machine translation systems in Japan; in the USA, the Pan American Health Organization (PAHO) ordered the development of a Spanish-English direction (the SPANAM system); the US Air Force funded the development of a machine translation system at the Linguistic Research Center of the University of Texas at Austin; and the TAUM group in Canada made significant progress with its METEO system (used primarily for translating weather reports). A number of projects started in the 1970s and 1980s later developed into full-fledged commercial systems. In our country, development of the foundations of machine translation technology was continued by a group of specialists at VINITI under the leadership of Professor G. G. Belonogov. As a result, in 1993 an industrial version of the RETRANS system for phraseological machine translation from Russian into English and vice versa was created, which was used in the ministries of defense, railways, and science and technology, as well as in the All-Russian Scientific Information Center.
The next stage of research in machine translation came in the 1990s. It was connected with the enormous progress of personal computers, the appearance of high-quality scanners and effective optical character recognition programs accessible to the mass user and, of course, with the advent of the global computer network, the Internet. All this gave new impetus to work on machine translation, attracted significant new investment into the area and led to serious practical results: reasonably effective machine translation systems and computer dictionaries appeared for work on a personal computer; machine translation systems were combined with optical character recognition and spell-checking systems; and special machine translation tools were created for working on the Internet, providing either translation of texts on the servers of the respective companies or online translation of Web pages, making it possible to overcome the language barrier and navigate foreign sites.

1.4 Modern machine translation.

Today's translation programs have far wider coverage and are based on more advanced translation technologies. Translation systems are actively used all over the world when it is necessary to quickly understand the meaning of a text or to translate large volumes of information frequently. Some developers have now managed to achieve very acceptable translation quality for particular language pairs and subject areas.
As noted above, modern machine translation should be distinguished from the use of computers to assist human translators, where an automatic dictionary merely helps a person quickly select the desired translation equivalent. The term “machine translation” implies that the machine takes on the main part of the work of translation and of finding translation equivalents and correspondences, leaving the person only control and the correction of errors; a computer dictionary, by contrast, is a purely auxiliary tool for quickly finding translation matches.
In translation practice and in information technology, there are two main approaches to machine translation. On the one hand, the output of machine translation can be used for a quick acquaintance with the content of a document in an unfamiliar language. In this case it serves as signal information and does not require careful editing. The other approach involves using machine translation in place of ordinary human translation. This requires careful editing and customization of the translation system for a specific subject area. What matters here are the completeness of the dictionary, its orientation towards the content and linguistic means of the texts to be translated, the effectiveness of the methods for resolving lexical ambiguity, and the effectiveness of the algorithms for extracting grammatical information, finding translation correspondences and performing synthesis. In practice, translation of this kind becomes economically viable if the volume of translated texts is large enough, the texts are sufficiently homogeneous, the system dictionaries are complete and allow further expansion, and the software is convenient for post-editing. Machine translation systems of this kind are used in organizations with a substantial need for prompt, high-quality translations.
Within machine translation technology there are two approaches: the traditional (rule-based) approach and the statistical approach (based on the statistical processing of dictionary databases). The traditional method is used by most developers of translation systems. The work of such a program proceeds in several stages and essentially consists in applying linguistic rules (algorithms). Accordingly, creating such an electronic translator involves developing the rules and populating the system's dictionary databases. The quality of the output translation depends on how well the necessary algorithms are developed, while a rich system vocabulary makes it possible to cope with a wide variety of texts. The statistical method works on a completely different principle: it is based on mathematical methods for producing a translation. More precisely, the entire operation of such a system rests on a statistical calculation of the probability that phrases from the source text match phrases stored in the translation system's database.
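As a highly simplified illustration of the statistical principle just described, the following Python sketch scores candidate translations by the product of phrase-match probabilities stored in a toy phrase table; the phrases and probabilities are invented and bear no relation to any real system's data.

```python
from itertools import product

# A toy phrase table: each source phrase maps to candidate target phrases
# with invented match probabilities.
PHRASE_TABLE = {
    "машинный перевод": [("machine translation", 0.8), ("machine transfer", 0.1)],
    "работает": [("works", 0.7), ("is working", 0.3)],
}

def translate(source_phrases):
    # Enumerate every combination of candidate target phrases and score it
    # as the product of the individual match probabilities; keep the best one.
    best, best_score = [], 0.0
    candidates = [PHRASE_TABLE[p] for p in source_phrases]
    for combo in product(*candidates):
        score = 1.0
        for _, prob in combo:
            score *= prob
        if score > best_score:
            best, best_score = [target for target, _ in combo], score
    return " ".join(best), best_score

print(translate(["машинный перевод", "работает"]))
# -> ('machine translation works', 0.56)
```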
In Russia, software products based on the traditional method of machine translation are developed by the PROMT company, the only manufacturer of translation programs in our country. Today PROMT is a leading developer of automated translation systems and has accumulated enormous technological expertise, which allows it to build translation systems with different functionality. Unique technologies for constructing translation systems and original algorithms for working with texts in natural languages became the basis on which all the company's software products were created, and made it possible to develop a wide range of solutions for automated translation from one language to another. PROMT software products are equally useful for solving business problems and for home use. Recently, PROMT has paid special attention to creating special tools and technologies for professional translators. At present, PROMT systems translate in 24 language directions. The general dictionary for one language pair contains from 40 to 200 thousand dictionary entries, each of which holds a structured description of the various linguistic information needed by the system's complex text analysis and synthesis algorithms. Topic dictionaries contain words and expressions specific to a subject area; their volume can range from 5 to 50 thousand dictionary entries. For example, specialized dictionaries covering more than 50 different topics have been developed for the English-Russian and Russian-English systems.

1.5 Machine translation on the Internet.

Online translation of information on the Internet is becoming increasingly popular. The Internet is rapidly turning from a predominantly English-speaking environment into a multilingual one, forcing Web site owners to provide information in several languages. Information and search sites that seek to attract multilingual users most often resort to machine translation services. Thus, a new translation service opened on the Canadian information retrieval portal InfiniT (http://www.infiniT.com). The site now offers online translation of text from English and German into French and vice versa. The growth in the number of visitors to the portal is attributed to the possibility of online translation of Web pages: the user only needs to enter the address of the Web page, select the direction of translation and click the translation button, and after a few seconds receives a fully translated Web page with its formatting preserved.
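The page-translation flow described above can be sketched roughly as follows: fetch the HTML, pass only the text nodes through a translator, and return the page with its markup (and hence formatting) intact. This is only an assumption-laden illustration: translate_text() is a placeholder rather than the actual InfiniT/PROMT service, and the parsing is deliberately simplified (comments and doctypes are dropped).

```python
from urllib.request import urlopen
from html.parser import HTMLParser

def translate_text(text, direction="en-fr"):
    # Placeholder: a real service would call its translation engine here.
    return text  # returned unchanged in this sketch

class PageTranslator(HTMLParser):
    def __init__(self, direction):
        super().__init__(convert_charrefs=True)
        self.direction = direction
        self.out = []

    def handle_starttag(self, tag, attrs):
        attr_str = "".join(f' {k}="{v}"' for k, v in attrs if v is not None)
        self.out.append(f"<{tag}{attr_str}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only visible text is translated; tags and attributes are re-emitted
        # unchanged, which is how the page formatting survives translation.
        self.out.append(translate_text(data, self.direction) if data.strip() else data)

def translate_page(url, direction="en-fr"):
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = PageTranslator(direction)
    parser.feed(html)
    return "".join(parser.out)

# Example (hypothetical URL):
# translated_html = translate_page("http://example.com", direction="en-fr")
```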
The new service makes it possible to overcome the language problem on the Canadian Internet, where, for historical reasons, two languages are widely used: English and French. In addition, the online translator gives residents of Canada who do not speak foreign languages access to sites in German. The service runs on PROMT's Internet server solution, PROMT Internet Translation Server version 2.0. The project was implemented jointly with the Softissimo company, which promotes PROMT products under the REVERSO brand. An interesting feature of Web sites presenting machine translation programs, electronic dictionaries and other linguistic software is that many of these products can be tried out interactively, using a version installed on the server and accessed remotely via a Web interface. On the server of the Web publishing house InfoArt (http://www.infoart.ru/misc/dict), an interactive demonstration of the Lingvo and MultiLex dictionaries has been set up: you can enter a word or phrase and instantly get a translation, an interpretation, examples of usage and common phrases.
The most universal product is PROMT Internet. By purchasing this package, you receive several programs for translating Web pages, and not only them. It is safe to say that the capabilities of this set of applications are quite sufficient for full-fledged work with documents in English, French and German. If you plan to use the universal translation program WebTranSite 98 or the WebView browser more than the other parts of the PROMT Internet package, and want to save some money, you can purchase these products separately. In that case, WebTranSite 98 will appeal to those who often translate small fragments of text not only from the Internet but also from office, e-mail and other programs, as well as from online help systems.
WebTranSite 98 is suitable for more than just translating Web pages. It is quite universal and allows you to process fragments
…


