I was unable to attend the recent 6th Machine Translation Summit (MT Summit VI, October 29 to November 1, 1997). I could only read the proceedings carefully and ask a few friends about it by e-mail. The theme of the summit was "Past, Present, Future", and one of its main items was celebrating the 50th anniversary of machine translation. It was particularly rare that the conference invited several pioneers of machine translation, such as Andrew Booth, an early advocate of machine translation research, and the founders of well-known machine translation systems, such as Peter Toma of SYSTRAN and Lehmann of METAL. They talked about the hardships of those years and disclosed some interesting anecdotes. Over half a century, machine translation research has gained more and more recognition, and its applications have become wider and wider. It is especially encouraging that so many commercial systems have now entered the market, both abroad and in China, especially machine translation products for the PC. At the same time, with the spread of the PC and the need to browse the Internet, translation products are beginning to find their way into ordinary households.

In the summer of 1996, I met Mr. Chang Tou at an academic conference in Singapore. When the conversation turned to machine translation research, he said that some Japanese computer companies were reporting that machine translation had begun to make money. I could share the long-awaited satisfaction with which he said this. A person engaged in research that seems to bring only investment and little output, who must also persuade others that the investment is worthwhile, bears heavy pressure. Only those who have gone through it themselves can appreciate this. Of course, Japan's situation may be the most typical. It is reported that from 1978 to 1993 Japan invested $200 million in machine translation research [Pedtke, 97].
Today, translation software in Japan sells about 500,000 copies per year, most of them priced between $100 and $1,000 [Kamei et al., 97]. In short, pessimistic and negative views of machine translation research and development are no longer common. But we should also see that users' dissatisfaction and disappointment with the translation quality of machine translation systems are still very common, and sometimes very strong. Any exaggeration of the translation quality of machine translation systems will still be very harmful. How to improve the translation quality of machine translation products remains a severe test before us, and it is the primary problem for machine translation research as it moves toward the new century. Where is the breakthrough point of machine translation research? This is the main question discussed in this article.

1. Breakthrough points in machine translation research

When theoretical or technical breakthrough points are mentioned, some people may think of "rule-based" or "linguistics-based" methods, of "corpus-based" or statistical methods, or of "empiricism" and "rationalism". In the early 1990s these did cause a debate in machine translation and in other fields of natural language processing. But people quickly recognized that combining linguistic methods with corpus and statistical methods is better than setting them against each other [Somers, 97]. The breakthrough points we want to discuss are the key issues that may bring about technical change.

1.1 From single-sentence processing to sentence-group processing

To date, most practical machine translation systems process one sentence at a time. That is to say, their analysis and generation are limited to the range of one isolated sentence. What they call context is this isolated sentence, rather than a paragraph or several coherent sentences.
Such a narrow context can hardly provide enough information for analysis, even for syntactic analysis, to guarantee that the analysis is correct. Most of the poor translation quality of machine translation systems is caused by analysis failures or wrongly resolved ambiguity. For example:
(a) Sorry I can't go with you, I am going to the bank. I'll get a money order for the immigration office.

In this example, an ambiguity such as the sense of "bank" is very difficult to resolve if the analysis is limited to a single sentence. But if the scope of the analysis is a sentence group or a paragraph, it should not be difficult. As another example:
(b) Yesterday I bought several books, the print quality is quite good, it is too expensive. (a literal rendering of the Chinese sentence)

Here, existing Chinese-English machine translation systems have difficulty resolving the ellipsis in the latter half of the sentence, so the English translation they generate is in fact syntactically incorrect English. If you have a Chinese-English machine translation system at hand, you may wish to try it.

Moreover, even if every isolated sentence in a paragraph is analyzed correctly, can a high-quality translation be generated? Not always. It is even more difficult when the source language and the target language are very different, as with English and Chinese, or Japanese and English. Consider:
(c) The school bus came and picked up the boys punctually as usual. When the bus drove near the school, John felt sick and his face went terribly white.

Here, if the first "(school) bus" is translated as "school bus" and the second "bus" is translated as a plain "bus", the result is a very awkward translation, since the reader can no longer tell that both refer to the same vehicle. Unfortunately, existing English-Chinese translation systems can only produce such awkward output, because they handle isolated single sentences. If you have an English-Chinese translation system, try it. In cooperation with the University of Southern California and SYSTRAN, we have explored processing at the paragraph level to improve translation quality [Hovy, 97].
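The "school bus" problem can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a description of any existing system: the two-entry lexicon, the function name `translate_terms`, and the rule "a bare head noun reuses the translation chosen for an earlier, fuller mention" are all hypothetical. The input is a sentence group given as lists of already extracted noun phrases.

```python
from typing import Dict, List

# Hypothetical bilingual lexicon; both entries are illustrative assumptions.
LEXICON: Dict[str, str] = {"school bus": "校车", "bus": "公共汽车"}


def translate_terms(sentence_group: List[List[str]]) -> List[List[str]]:
    """Translate noun phrases while sharing one memory across the whole group."""
    memory: Dict[str, str] = {}  # head noun -> translation chosen earlier
    output: List[List[str]] = []
    for phrases in sentence_group:
        translated: List[str] = []
        for phrase in phrases:
            head = phrase.split()[-1]  # crude head noun: the last word
            if phrase == head and head in memory:
                # A bare "bus" after "school bus": keep the earlier choice.
                choice = memory[head]
            else:
                # Fall back to the phrase itself when the lexicon has no entry.
                choice = LEXICON.get(phrase, phrase)
                memory[head] = choice
            translated.append(choice)
        output.append(translated)
    return output
```

When the two sentences of example (c) are passed as one group, `[["school bus"], ["bus"]]`, both mentions receive the same translation. When the second sentence is processed alone, `[["bus"]]`, the generic entry is chosen, which is the behavior of a system limited to isolated sentences.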
The new generation of machine translation will adopt sentence-group processing. By "sentence group" we mean a complete paragraph or a number of coherent sentences within a paragraph; in short, more than one sentence. Sentence-group processing is by no means a simple increase in the number of sentences. Its essence is processing based on text understanding. At least two problems must be faced here.

First, choosing the size of the sentence group. Generally speaking, a complete natural paragraph containing 6 to 8 sentences is ideal. A larger sentence group is not necessarily better. If the paragraph is too large, the correlation between its earlier and later parts weakens and the reliability of the related information drops, which may mislead the analysis. On the other hand, if the sentence group is too small, even if it is a natural paragraph, it is not ideal either, because there is not enough relevant information.

Second, building a sentence-group language model. A traditional machine translation system produces the syntax tree of one sentence during analysis. Even for several sentences from the same paragraph, there is no connection between their syntax trees. What should be established across multiple sentences is not a traditional syntax tree but a semantic network. This is what we mean by building a sentence-group language model.

As can be seen from the above, the depth of analysis in new-generation systems will differ greatly from that of today's machine translation systems. Here, the semantic network of the sentence group will replace the syntax tree of a single sentence [Nagao, 97]. Some typical problems in analysis that remain unsolved, such as reference resolution, depend fundamentally on semantics. Deep analysis and sentence-group processing are interdependent and interact with each other. Without a context at least as large as a sentence group, deep analysis cannot be carried out; at the same time, without deep analysis, sentence-group processing has no practical value.
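The first problem, choosing the size of the sentence group, can be sketched as follows. This is a minimal sketch under assumptions: the upper bound of 8 comes from the "6 to 8 sentences" guideline above, while the naive punctuation-based sentence splitter, the blank-line paragraph separator, and the choice to split an over-long paragraph into evenly sized groups are illustrative assumptions.

```python
import re
from typing import List

MAX_GROUP = 8  # upper bound from the "6 to 8 sentences" guideline


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def sentence_groups(text: str, max_size: int = MAX_GROUP) -> List[List[str]]:
    """Return sentence groups that never cross a paragraph boundary."""
    groups: List[List[str]] = []
    for paragraph in text.split("\n\n"):
        sentences = split_sentences(paragraph)
        if not sentences:
            continue
        # Smallest number of groups such that none exceeds max_size.
        n_groups = -(-len(sentences) // max_size)  # ceiling division
        base, extra = divmod(len(sentences), n_groups)
        start = 0
        for i in range(n_groups):
            size = base + (1 if i < extra else 0)  # keep sizes balanced
            groups.append(sentences[start:start + size])
            start += size
    return groups
```

A paragraph of ten sentences is returned as two groups of five rather than one group of eight and a remainder of two, so that no group becomes too small to carry useful context. Short paragraphs are kept whole.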
To meet the needs of deep analysis, the knowledge system used by the new generation of machine translation will change accordingly. The most pronounced change is that it will adopt a powerful knowledge base.

1.2 New knowledge system

What will the knowledge system of the new generation of machine translation contain, and what are its characteristics? It is well known that translation requires at least two kinds of knowledge: knowledge of language, and world knowledge, including common sense and specialized knowledge. This holds for human translation, and it should hold for machine translation as well. But the world knowledge in traditional machine translation systems is very limited, if not entirely absent. Expecting such a system to translate texts of every specialty and every kind is in fact unrealistic, because even a human translator cannot do so. The new knowledge system used in the new generation of machine translation, in addition to the traditional dictionaries and rules that mainly reflect linguistic knowledge, will chiefly include world knowledge or common sense. We call it a relational semantic knowledge base [Dong Zhendong, 97].

The new knowledge system should have the following main features: (1) it is a system that comprehensively describes the various semantic relations among concepts and their attributes, not just an online thesaurus; (2) the architecture it describes is a network, not a tree. What should these multiple semantic relations be? Based on our experience and our understanding of some existing systems, they may include the following:
Superordinate and hyponym, e.g.: living thing - animal
Synonymy or near-synonymy, e.g.: good - good; medical - medical (Chinese near-synonym pairs)
Antonymy, e.g.: long - short; fat person - thin person
Complementarity, e.g.: birth - death; buy - sell
Whole-part, e.g.: hand - person
Role-playing, e.g.: doctor - agent (medical treatment); hospital - location (medical treatment)
Possession, e.g.: morality; color - concrete object; price - product
Reference, e.g.: red - color; brave - courage; expensive - price
Domain-sharing, e.g.: theater - performing arts; singing - event
Mutual induction, e.g.: agent (buy) = agent (get) = result (receive); (receive) = premise (lose)
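A network of typed relations like the ones listed above can be sketched as a labeled directed graph. This is a hypothetical illustration, not the format of any actual knowledge base: the class name, the relation labels such as `hypernym` and `part_of`, and the extra edges added to connect the examples (for instance "person" under "animal") are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class SemanticNetwork:
    """Concepts are nodes; typed semantic relations are labeled directed edges."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, source: str, relation: str, target: str) -> None:
        self._edges[source].append((relation, target))

    def related(self, concept: str, relation: str) -> List[str]:
        """Concepts reachable from `concept` by one edge of the given type."""
        return [t for r, t in self._edges.get(concept, []) if r == relation]

    def ancestors(self, concept: str) -> Set[str]:
        """Follow 'hypernym' edges transitively."""
        seen: Set[str] = set()
        stack = [concept]
        while stack:
            for parent in self.related(stack.pop(), "hypernym"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def build_example() -> SemanticNetwork:
    """Small network built from the examples in the text plus assumed links."""
    net = SemanticNetwork()
    net.add("animal", "hypernym", "living thing")
    net.add("person", "hypernym", "animal")   # assumed link
    net.add("doctor", "hypernym", "person")   # assumed link
    net.add("hand", "part_of", "person")
    net.add("doctor", "agent_of", "medical treatment")
    net.add("hospital", "location_of", "medical treatment")
    net.add("red", "value_of", "color")
    net.add("buy", "complement", "sell")
    return net
```

Because a concept such as "doctor" has outgoing edges of several types, the structure is a network rather than a tree. For example, `build_example().ancestors("doctor")` follows only the hypernym edges, while `related("doctor", "agent_of")` reaches the event in which a doctor plays the agent role.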
We note that many scholars and academic institutions have begun work in this area in recent years [Huang Zengyang, 97; Miller et al., 90]. Some of these efforts cannot be considered true knowledge bases, or are just semantic dictionaries such as a thesaurus, but they point in the same direction: at least people realize that the construction of knowledge bases should be strengthened. Some existing systems have also vigorously expanded their original dictionaries, which greatly increases their knowledge content [Gerber & Yang, 97].

1.3 Improved translation generation

Machine translation researchers hold that another important drawback of existing machine translation systems is that their translations cannot break free from the constraints of the source-language syntax [Nagao, 97; Chang, 97], so it is difficult to generate natural or idiomatic translations. How to obtain idiomatic translations will be another breakthrough point of the new generation of machine translation research. How can this goal be achieved? Three ways can be foreseen now.

First, build larger bilingual example bases containing a large number of examples (translation memory). This is very effective for certain language environments, and it is also necessary. Such examples are easy to find in the translation of various signs, such as "keep right", "click here", "no smoking", and so on.

Second, design fixed translation templates. We are studying fixed templates that contain several role slots. For the Chinese verb "buy", for example: "agent - (spend) price - (for) beneficiary - (bought) possession" = "agent bought beneficiary possession for price". Here "agent", "price", "beneficiary" and "possession" are roles. The main task of analysis in the machine translation system is to fill the slots specified in the fixed template. Once the roles specified by a template have been obtained, the translation can be generated from the fixed sentence template.
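The slot-filling idea can be sketched in a few lines. This is a minimal sketch, assuming that analysis has already produced the role fillers; the `TEMPLATES` table, the function names, and the sample fillers are hypothetical, and only the role names and the English pattern for "buy" come from the example above.

```python
from string import Formatter
from typing import Dict, List

# Target-language sentence templates keyed by source verb (illustrative).
TEMPLATES: Dict[str, str] = {
    "buy": "{agent} bought {beneficiary} {possession} for {price}",
}


def required_slots(template: str) -> List[str]:
    """Names of the role slots that appear in a template."""
    return [field for _, field, _, _ in Formatter().parse(template) if field]


def generate(verb: str, roles: Dict[str, str]) -> str:
    """Fill the fixed template for `verb` with the roles found by analysis."""
    template = TEMPLATES[verb]
    missing = [slot for slot in required_slots(template) if slot not in roles]
    if missing:
        raise ValueError(f"unfilled role slots: {missing}")
    return template.format(**roles)
```

For instance, `generate("buy", {"agent": "Zhang San", "price": "100 yuan", "beneficiary": "his son", "possession": "a bicycle"})` yields "Zhang San bought his son a bicycle for 100 yuan". The word order of the output is fixed by the target-language template, not by the order of the roles in the source sentence.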
Since the two languages have been strictly matched in advance, and the translation is strictly fitted into a fixed sentence pattern, it is possible to break free from the constraints of the source language. The disadvantage of this method is that some information in the source text may be lost. This approach is better suited to developing translation systems for online browsing, especially Chinese-to-foreign-language machine translation systems.

Third, generate high-quality translations on the basis of the text understanding achieved by sentence-group processing. This also uses fixed translation templates, but the template is based not on a single word but on a story. In fact, this kind of research was explored several years ago, although the contexts were quite limited, such as traffic accidents.

2. Development trends worth noting

With the rapid development of the computer hardware and software platforms related to language processing, and with the continuous progress of machine translation technology itself, machine translation has begun to industrialize. Certain development trends in future machine translation products therefore deserve our attention.

2.1 Specialization by field

Existing large-scale machine translation systems generally contain multiple specialized technical dictionaries while sharing all their other parts. They claim to translate texts from every professional field simply by switching to a different specialized dictionary. We call such a system a "wild system" (in fact an "extensive system"). In human translation there cannot be a master translator of a hundred fields. No one knows everything, and everyone's knowledge structure is limited. Since people cannot do it, how can we expect existing machine translation systems to do it? The existing "wild system" is a stopgap product, not an ideal one. In the future it will give way to truly specialized systems, such as machine translation systems for the automotive field, for aviation, and so on.
Specialized systems differ significantly from "wild systems" in that they have different knowledge structures.
A specialized system contains not only a specialized dictionary but also other knowledge bases for its field, which may take the form of a rule base, a powerful bilingual corpus, or both. Moreover, such a system is dedicated to its field throughout its research and development. Some developers now worry that competition is too fierce. We believe that if everyone merely competes on the quality of "wild systems", low-level competition is inevitable. The research and development of machine translation systems will have a professional division of labor. With so many languages to face and so many fields to handle, machine translation research has plenty to do, and there is no need to fear that fierce competition will leave no way forward. The more diversified the systems become, the more the market will thrive. How to research and develop specialized systems will be a new topic for us.

2.2 Diversification of applications

Diversified applications will be another trend of future development. It is not easy to estimate how widely and how deeply the emergence of the Internet will affect human society. The arrival of the information society has made the need to overcome language barriers more urgent. In recent years, browsers with translation functions and online translation systems have sprung up like bamboo shoots after rain. Some of them are online dictionaries that let users look up words on the fly. Some offer online full-text translation embedded in a search engine; for example, SYSTRAN launched the AltaVista/Systran online multilingual translation system in December 1997. If you are interested, you can try it [Babelfish]. There are also machine translation providers that offer remote translation services over the Internet, charging, for example, $0.01 per word. Online translation systems face a more severe test: it is harder for them to cope with the great variety of language found online.
If the quality is too poor, it would be better to provide only a dictionary function. In addition, as speech recognition becomes practical, speech machine translation will become practical as well. By then we will see movie subtitle translation, automatic telephone translation, and systems that serve meetings, reservations, and so on.

2.3 Shortened development cycle

The development cycle is now significantly shorter than it was a decade ago. Researchers and developers should pay attention to this as well. Taking China as an example, China's first commercial machine translation system took 11 years from the start of research, and the system was still very immature when it was launched. The development cycle has shortened because of the great changes in the hardware environment and because of the accumulation and exchange of experience and information. In knowledge-intensive fields, especially knowledge engineering, duplicated effort has commonly slowed the pace of research. Nowadays the Internet promotes the sharing of technology and information far more powerfully. On the Internet, people can obtain a variety of dictionaries, corpora, and even analysis engines for scientific research. This will greatly accelerate the progress of experiments. We expect a variety of functional components, such as syntactic dictionaries, semantic dictionaries, analyzers, analysis engines and other language-processing tools, to become available as products. Developers of a system will then be able to purchase different components and assemble, revise, and debug them, without having to build everything themselves from scratch. In fact, the corpus-based approach can satisfy exactly this requirement: everyone contributes to building the corpora, and everyone shares them. The effort in Japan to share dictionaries among the machine translation systems of different companies [Kamei, 97] is worth learning from.
As machine translation moves out of the laboratory and toward industrialization, it faces sharper challenges, but it is also offered better opportunities. How to better combine the industrial development of machine translation with academic research is also a new issue [Gerber, 97]. We hope that machine translation will greet the arrival of the new century with new breakthroughs.