True Computational Linguistics?!
2003-3-26
- On reading R. Hausser's "Foundations of Computational Linguistics"
Liu Haitao
Roland Hausser
Foundations of Computational Linguistics: Man-Machine Communication in Natural Language. 1999. ISBN 3-540-66015-1. Second edition, 578 pp., 2001.
Grundlagen der Computerlinguistik: Mensch-Maschine-Kommunikation in natürlicher Sprache. 572 S. 2000. ISBN 3-540-67187-0.
Berlin, New York: Springer-Verlag.
From the perspective of information theory and cybernetics, language is a tool for exchanging information between humans, and between humans and machines. Theoretical linguistics focuses on the general properties of human language, whereas computational linguistics studies natural-language communication between humans and machines. Computational linguistics can also be seen as the science that studies language with a view to machines such as computers (Liu 1999). Strictly speaking, computational linguistics is a branch of linguistics, and problems of language should be its core concern. The bumpy history of the field over several decades has led more and more scholars to recognize the importance of this point (Hellwig 1988; Liu Yongquan 1997; Hutchins 1992). Of course, emphasizing the linguistic character of the field and its place in the humanities does not mean neglecting methods and practice. We simply believe that any computational-linguistic application that ignores basic linguistic research cannot develop in the long term and will remain a plaything in the hands of computer scientists.
This problem is particularly prominent in computational linguistics in China. Because of the constraints of the current educational system, it is difficult for students to move between disciplines. As a result, most people who work on computational linguistics in China come from science and engineering, which is very unfavorable to the building of the discipline. There are, of course, computational linguists who came from linguistics, such as Professor Feng Zhiwei (1996), but on the whole such people are too few. We do not deny that, as computer technology spreads, more and more linguists will join the ranks of computational linguists, and that some scholars from science and engineering, having recognized the importance of linguistics, keep improving their linguistic training; the computational linguistics group at Tsinghua University headed by Professor Huang Changning is one example.
In this regard, we believe that the experience of German scholars is worth learning from. A look at the web pages of German universities shows that they generally have dedicated computational linguistics programs, and that computational linguistics is generally part of the linguistics discipline. Given Germany's strong tradition in the humanities, studying computational linguistics books written by German scholars is bound to be rewarding. Unfortunately, many such works are written in German; Schmitz (1992), for example, is a distinctive introduction to computational linguistics that is well worth reading.
The author, Roland Hausser, is currently professor of computational linguistics at the University of Erlangen and has published several earlier books (Hausser 1984, 1986, 1989). In general, an author's basic definition of computational linguistics lets us guess what he will say in his book. The first sentence of the book is: "The central task of a future-oriented computational linguistics is the development of cognitive machines with which humans can talk freely in their own natural language." From this definition it is easy to see that the author emphasizes the following: human-machine communication, unrestricted applications, and the role of cognition in language communication. We should also remind readers that the subtitle of the book is "Man-Machine Communication in Natural Language".
To accomplish this task, the author has carefully organized and designed a relatively complete theory of language, whose theme is how to simulate the mechanism of human natural-language communication by machine. The book has four parts: "Theory of Language", "Theory of Grammar", "Morphology and Syntax", and "Semantics and Pragmatics". Each part consists of six chapters, and each chapter is divided into five sections. To help students master the key concepts and important issues, there are exercises at the end of each chapter, 772 in the whole book. I believe that this structure is very helpful to both teachers and students.
In the introduction, the author further clarifies the goal of computational linguistics: "to reproduce the natural process of information transfer on a suitable computer by modeling the speaker's production process and the hearer's interpretation process." In other words, the goal is to build an autonomous machine (robot) that can communicate freely in natural language. To achieve this goal, a deep understanding of the functional model of the natural-language communication mechanism is necessary.
This is precisely what many linguistic theories and many approaches in computational linguistics ignore. We cannot imagine how a process could be reconstructed on a computer without an in-depth understanding of it. In this part the author also briefly discusses the conditions that a model of the natural communication process must satisfy, the role of the parser in verifying and testing linguistic theory, and the significance of abstraction for research and for the formation of the discipline. The author further holds that "repeatable experiments" and other established methods of verification used in other sciences are not suitable for verifying linguistic theories. Instead, an electronic model is the most appropriate way to test theories of language and grammar. That computational linguistics serves to test theoretical linguistics seems to have become a consensus among many linguists (Starosta 1991; Feng 1996).
The author calls the theory of language proposed in this book SLIM. It rests on four main principles: Surface compositional (methodological principle), Linear (empirical principle), Internal (ontological principle), and Matching (functional principle). The name SLIM itself comes from the initial letters of the English names of these four principles.
Computational linguistics is an interdisciplinary field. It draws on many branches of traditional and theoretical linguistics, as well as lexicography, philosophy of language, analytic philosophy, logic, text processing, databases, and the processing of spoken and written language. In section 1.2, "Language Science and Its Components", the author clearly distinguishes "traditional grammar", "theoretical linguistics" and "computational linguistics". Because of its theoretical orientation, theoretical linguistics is not itself concerned with efficiency, and its role in computational linguistics is limited to formal language analysis and mathematical complexity theory. Traditional grammar, on the other hand, is favored by computational linguistics because its basic principles address concrete language data. The applications of computational linguistics include indexing and retrieval in text databases, machine translation, automatic text generation, automatic text checking, automatic content analysis, automatic tutoring, and automatic dialogue and information systems (pp. 21-22).
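For the first of these applications, text retrieval, quality is usually measured by recall and precision. The following is a minimal sketch of the standard definitions, not something taken from the book; the function name and the sample document identifiers are assumptions made only for illustration.

```python
def recall_precision(retrieved: set, relevant: set) -> tuple:
    """Return (recall, precision) for one query.

    retrieved: identifiers of the documents the system returned (hypothetical data)
    relevant:  identifiers of the documents that actually answer the query
    """
    hits = len(retrieved & relevant)
    # Recall: share of the relevant documents that were found.
    recall = hits / len(relevant) if relevant else 0.0
    # Precision: share of the returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


if __name__ == "__main__":
    r, p = recall_precision({1, 2, 3, 4}, {2, 4, 6})
    print(f"recall={r:.2f} precision={p:.2f}")
```

Linguistic knowledge, for example reducing word forms to a common base form before indexing, typically aims to raise recall without lowering precision.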
By analyzing "recall" and "precision" in text indexing and retrieval, the author briefly shows the significance and role of linguistic knowledge for improving results. Furthermore, there are two approaches to designing computational-linguistic applications: "smart" and "solid". The "smart" approach sidesteps the hard problems and restricts the application to an easily controlled domain, as many common language-processing systems do; a "solid" application is built on a full theoretical and practical understanding of the phenomena. The "smart" approach requires less investment and produces results quickly. Its drawback, however, is fatal: it scales very poorly, and there is almost no way to improve the accuracy of processing. In the long run, therefore, one should take the "solid" approach and start from the foundations. Any short-term behavior that circumvents difficulties harms the development of the discipline and the solution of its problems.
The author also stresses the importance of building various language-processing modules for applied research in computational linguistics, and elsewhere in the book he develops the modularization of computational linguistics further (p. 283). The author's concept of modularity is not entirely equivalent to the similar concept proposed by the reviewer (Liu Haitao 1995), but it is certain that modular design and a professional division of labor are attracting attention.
Many machine translation researchers believe that machine translation brings together all the problems of computational linguistics. I personally hold a similar view. The author, however, says that "translation avoids some of the most difficult problems of language production, such as the selection of content, serialization, and lexicalization" (p. 41).
The reason is that the author's theory and method study the whole process of language as a tool of communication, and so inevitably involve human cognition. Most of the computational linguistics we have encountered studies the computer processing of language itself, and this is only one part of the SLIM theory. The computational linguistics described in this book is thus a theory and method of how the tool of language is used, whereas other common theories of computational linguistics study the tool (language) itself. As we all know, a tool can be improved only through use, and only then does it yield benefits. The theory and methods proposed in this book may therefore be more effective in reaching the ultimate goals of computational linguistics. Our earlier understanding of computational linguistics may have been too one-sided and narrow.
After reviewing three common machine translation methods, the author speaks highly of the "interlingua" method. The interlingua approach and translation systems based on artificial-intelligence knowledge systems are the focus of current research on translation theory. We believe that, in terms of economy and modularity, the interlingua model is undoubtedly better suited to today's Internet age. At the same time, a machine translation system that uses an interlingua is almost completely equivalent to the model of natural-language communication proposed in this book (p. 48). Unfortunately, the author does not analyze the interlingua problem further, nor does he give the necessary references. The reviewer has written an article on interlingua that may serve as a reference (Liu Haitao 1993).
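The economy argument for an intermediate language (interlingua) can be made concrete by simple counting. The sketch below is the usual textbook argument rather than a calculation taken from the book, and the function names are illustrative assumptions: direct translation needs one module per ordered language pair, while an interlingua design needs one analysis module and one generation module per language.

```python
def direct_modules(n_languages: int) -> int:
    """Modules needed when every ordered pair of languages gets its own translator."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """Modules needed with one analysis and one generation module per language."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (2, 3, 10, 20):
        print(n, direct_modules(n), interlingua_modules(n))
```

Under these assumptions the two designs cost the same for three languages, and the interlingua design is cheaper for every larger number; adding one more language costs two new modules instead of 2n.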
Why does the author emphasize the importance of establishing a model of natural communication? He believes that it has a threefold significance for computational linguistics. Theoretically, discovering how natural language actually works is a problem of general interest. Methodologically, the model provides a unified functional view for developing the components of grammar on the computer, and allows the theoretical model to be verified objectively. Practically, it can serve as the basis for building "solid" advanced applications (p. 49).
Communication requires language, but cognition can be independent of language. If one really wants to build a theory (and an agent) that fulfills the goal of this book, the question of how to represent the cognitive structures of the human brain cannot be avoided. For this reason the topic of Chapter 3 is the "cognitive foundation of semantics". The main content of the chapter is the language-independent cognitive representation of meaning, that is, meaning and structure in cognition. Strictly speaking, this chapter is where the book really begins. That the author opens with such non-linguistic "cognition" makes his intention easy to see. How to add language on top of cognition is the topic of Chapter 4, "Language Communication".
Human-machine communication requires a language. Today's common practice is for humans to accommodate the machine and learn programming languages designed for particular machines. This is obviously unfair to people, because the computer should be a machine that serves humans, not a "master" for humans to wait upon. The ultimate goal of computational linguistics is to rescue people from this "vicious circle", which is one of the important reasons why computational linguistics belongs to the humanities. We have said many times (Liu 1999) that language research in the information age should consider the needs of both people and machines; this is what we mean.
Although the production and the understanding of language look like two different processes on the surface, the author holds that they share two sub-processes, the processing and the interpretation of signs; only the order of the two sub-processes is exactly opposite. In language communication there are two kinds of "meaning": one is the so-called "literal" meaning, and the other is the meaning that the speaker really wants to express.
The author calls the former "meaning1" and the latter "meaning2". The "meaning2" that the hearer (or receiver) obtains is not necessarily the speaker's original "meaning2", unless the speaker's context can be correctly reconstructed (p. 77). Because it is difficult to reconstruct the content of the information source, we can say that the human exchange of linguistic information can only ever be approximate. Following Frege's principle that "the meaning of a complex expression is a function of the meaning of its parts and their mode of composition", the author introduces a method of deriving meaning from the surface of language. It is worth mentioning that the author links this idea with context, so that "meaning1" is combined with the context or the situation in which language is used. This undoubtedly helps to improve the accuracy of understanding. In view of the importance of surface compositionality in the SLIM theory, we quote the principle: the syntactic and semantic properties of complex expressions derive systematically from the syntactic category and the "meaning1" of the basic units they contain (p. 80). According to the author's analysis, transformational generative grammar (TGG) essentially violates the principle of surface compositionality. Furthermore, the author asserts that anything that has no functional role in natural communication should not be regarded as a component of the human language mechanism (p. 83). We believe that this is meaningful not only for computational linguistics but also for general linguistics. In an article discussing "constituency" and "dependency", Hudson also noted that some of the devices used in linguistic analysis are redundant and unnecessary (Hudson 1980). The two authors reached similar conclusions from different angles, which is worth noting.
Besides TGG and its variants, the language theories of Grice, Wittgenstein, Austin and others also fail to comply with the principle of surface compositionality (SC).
In Chapter 5, "Using Language Signs on Suitable Contexts", the author addresses the problem of how the hearer correctly finds the context needed for understanding. Its theme is the role and mechanism of pragmatic interpretation. In this chapter the author also introduces the second important concept of the SLIM theory, the time-linear structure of natural-language signs: "The basic structure of natural-language signs is their time-linear order. It holds for the sentences in a text, the word forms in a sentence, and the allomorphs in a word form. Time-linear means: linear like time and in the direction of time" (p. 97). This definition can be regarded as a modern version of Saussure's second principle. Since time-linearity is a fundamental property of linguistic signs, linguistic theory should take it into account. Unfortunately, most theories of language are based on two-dimensional tree analysis. Only the LA-grammar used in this book treats time-linearity as a basic element of natural-language production and interpretation. In the following chapter, the author studies the "structure and function of signs" and their role in language communication from the perspective of semiotics.
In the first part, the author explores the basic mechanism of natural communication from the perspectives of cognitive psychology, linguistics, philosophy and semiotics. On the surface, this part has little to do with computational linguistics. However, the mechanism of human language communication is very complex and involves many disciplines. For building a more accurate model of natural communication, and thus simulating human natural-language communication by machine, such exploration and research are very useful, indeed indispensable. This should be seen as a major feature of the book, and it is one of our main reasons for recommending it.
Since this is a textbook of computational linguistics, we naturally expect to find well-established material in it. The author places this material in the second part, "Theory of Grammar". The first three chapters of this part present currently popular formal grammars, such as categorial grammar and phrase structure grammar, and introduce the basic concepts of parsing and their computational complexity. The purpose of presenting these grammar theories is to expose their shortcomings, especially their poor fit with the SLIM system as a whole. All of this paves the way for the LA-grammar created by the author. The next three chapters introduce LA-grammar (LAG, left-associative grammar) from various perspectives of computational linguistics and compare LAG with other grammars. The greatest difference between LAG and other formal grammars is that it was created for natural-language analysis rather than borrowed from other fields. Although it is hard for us to evaluate the merits of this originality, LAG avoids certain problems of traditional formal grammars from the outset, which is a great advantage in language analysis. For reasons of space the author cannot discuss LAG in more detail; interested readers may consult his earlier books (1986, 1989).
The author holds that dependency grammar is a "semi-formal" system of generative grammar (p. 129). I cannot agree with this view, which may derive from Gaifman (1965). In fact, there is much more in-depth research today that changes the original picture, for example Broeker (1999), Fraser (1993) and Heringer/Strecker/Wimmer (1980).
The first three chapters of the third part, "Morphology and Syntax", deal with morphology. That the problem of "words" occupies such an important place is related to the structure of the languages (English, German) that the author treats. For computational linguists in China, whose main object of processing is Chinese, there are no such complicated morphological problems, but how to segment words from continuous text gives us headache enough. The author, of course, does not address the special problem of Chinese word segmentation in the book. How to use the LAG theory to solve this "roadblock" problem in Chinese computational linguistics may be a meaningful task for Chinese scholars. The analysis and recognition of "new words" is a problem that any practical computational-linguistic application must face. The LAG-based mechanism of morphological analysis seems to solve this problem to some extent, but one has the impression that it applies only to regularly formed compound words.
The title of Chapter 15, "Corpus Analysis", leads one to expect that the author will introduce probability and statistics into his theory. In fact this is not the case: for the author, a corpus seems to be just a collection of texts used to verify and test the grammar system. In these short 11 pages we find only some description of word-frequency statistics and text annotation. Why does the author ignore the empirical methods that have become increasingly popular in computational linguistics in recent years, that is, the use of corpora as a knowledge source and of statistical methods in language processing? This may be related to the SLIM system as a whole. Although the SLIM method provides [...] (Liu Haitao 1993b). Readers who want to learn about statistical natural language processing may consult Manning & Schuetze (1999).
Of the three chapters on syntax, two describe the LA syntax of English and of German respectively. English has a relatively fixed word order, while German word order is freer. The successful LA description of the syntax of both languages shows that the formalism is expressive and widely applicable. In Chapter 16, "Basic Concepts of Syntax", the author holds that the three principles of syntax are "valency, agreement and word order". In my observation, valency plays a very important role in LAG; it is almost the "glue" of LAG. The author mentions Tesniere's role in connection with this concept, but in fact "valency" is not as prominent in Tesniere's book as one might imagine. That the concept of valency has such a prominent position today owes much to the work of linguists in the German-speaking countries. Regrettably, although the author is himself a linguist from a German-speaking country, his references do not include the German literature on valency theory. Of course, the absence of these references does not mean that the valency scholars had no influence on him. When describing the valency of verbs, the author uses the term "place", as in one-place verb, two-place verb and so on, rather than "monovalent verb", "bivalent verb" and so on. The use of "place" to express the concept of valency has a longer history than the term "valency" itself. The German linguist Buehler used "Leerstellen" in 1934 to express the concept of valency (Buehler 1934: 173). This book by Buehler appears in the references, although the author cites only its discussion of language as a tool (p. 90).
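The idea of valency as the "glue" of a time-linear, left-associative derivation can be illustrated with a deliberately small sketch. It is not Hausser's LA-grammar: the categories, rule packages and agreement checks of the book are left out, and the lexicon, the slot names and the function parse are assumptions invented for this illustration. Words are read strictly from left to right, each verb opens as many valency positions ("places") as it requires, and each noun cancels one open position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    surface: str
    category: str        # "noun" or "verb" in this toy lexicon
    places: tuple = ()   # valency positions a verb opens


# Hypothetical mini-lexicon: a one-place verb and a two-place verb.
LEXICON = {
    "Mary": Word("Mary", "noun"),
    "John": Word("John", "noun"),
    "sleeps": Word("sleeps", "verb", ("subject",)),
    "sees": Word("sees", "verb", ("subject", "object")),
}


def parse(sentence: str):
    """Combine the sentence start with the next word, one word at a time.

    Returns (accepted, trace); trace lists the valency positions still open
    after each word has been added.
    """
    waiting_nouns = 0   # nouns read before any position could take them
    open_places = []    # valency positions not yet cancelled
    seen_verb = False
    trace = []
    for surface in sentence.split():
        word = LEXICON[surface]
        if word.category == "verb":
            open_places.extend(word.places)
            seen_verb = True
        else:
            waiting_nouns += 1
        # Cancel one open position for each waiting noun, leftmost first.
        while waiting_nouns and open_places:
            open_places.pop(0)
            waiting_nouns -= 1
        trace.append((surface, tuple(open_places)))
    accepted = seen_verb and waiting_nouns == 0 and not open_places
    return accepted, trace


if __name__ == "__main__":
    for s in ("Mary sleeps", "Mary sees John", "Mary sees", "Mary sleeps John"):
        print(s, "->", parse(s)[0])
```

In this sketch a sentence is accepted only when every opened position has been cancelled and no noun is left over, so a two-place verb with a single noun, or a one-place verb with two nouns, is rejected.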
The last part of the book is "Semantics and Pragmatics". Its first three chapters describe the traditional approaches to semantic interpretation: their basic concepts, goals, methods and problems. The following three chapters study the semantics and pragmatics of LA-grammar within the theoretical system of SLIM, and creatively introduce the concept and method of "database semantics", which stores semantic and contextual information in the form of a database (word bank). In the last two chapters, the author describes how SLIM works as speaker and as hearer in natural-language communication. If the word bank is the core mechanism of SLIM for language production and interpretation, in other words the "knowledge base" of the SLIM theory, then how does the machine update its knowledge base automatically? For people, the automatic updating of the knowledge base is essential; can an agent constructed according to the SLIM theory, then, be called "intelligent"? This remains an open question.
By convention, reviewers list some of the misprints caused by the negligence of the author or the publisher. Although I did not search exhaustively, I found fewer than 10 misprints. For a book of more than 500 pages this is no small achievement. In the references I found two oversights: one concerns "Entwurf einer strukturalen Syntax"; the other is Harris (1951), which is mentioned on page 168 but is not listed in the references.
In short, this is a novel and distinctive textbook of computational linguistics. The book is well structured and rigorously organized. The original SLIM theory runs through the whole book, and the coverage is wide, touching on topics from the contexts in which language is used to the structure of language itself. The author has studied many local problems of language structure in depth, but all of this detailed research serves the ultimate goal of computational linguistics: building a machine with which humans can communicate freely in natural language. Of course, the SLIM theory itself needs further improvement. But this long-term, "solid" approach is very beneficial to the development of computational linguistics. Computational linguistics should solve the problems of human-computer communication in the information age; it should not be merely a "smart" plaything for a few scholars or laboratories. From this point of view, the theory described in the author's book is computational linguistics in the true sense.
To be sure, a "Foundations" of more than 500 pages cannot be done justice in so short a review. The above are only some of my impressions from reading the book, and I welcome corrections. Finally, I have to say that there is currently no second book that offers us such a "complete" account of the process of natural-language communication between humans and computers.
The detailed contents and the introduction of the book can be downloaded from http://www.Linguistik.uni-erlangen.de/~rrh/books/compling_foundations_intro.pdf. The author also maintains an errata list for the book on his homepage at http://www.linguistik.uni-erlangen.de/~rrh/books/errata. For the convenience of teachers, the teaching slides for the book and other people's reviews of it can be downloaded as well.
The review above is based mainly on the English edition; the content of the German edition is essentially the same. In some places necessary changes were made, mainly to correct spelling errors in the English edition. In addition, the German edition adds a conclusion, whose English version can be read on the author's homepage. If you read German, please read the German edition directly. The second edition corrects the misprints and other shortcomings of the first edition. Chapters 22-24 were heavily revised, one might say rewritten. The material on which these chapters are based was published in an issue of the journal "Artificial Intelligence" in an article on database semantics, an important paper that interested readers may wish to consult.
References
Broeker, N. (1999). Eine Dependenzgrammatik zur Kopplung heterogener Wissensquellen. Tuebingen: Max Niemeyer Verlag.
Buehler, K. (1934 [1965]). Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart: Fischer.
Feng Zhiwei (1996). Computer Processing of Natural Language. Shanghai: Shanghai Foreign Language Education Press.
Fraser, N. (1993). Dependency Parsing. PhD thesis, University College London.
Gaifman, H. (1965). Dependency Systems and Phrase-Structure Systems. Information and Control, 8, 304-337.
Harris, Z. (1951). Structural Linguistics. Chicago: The University of Chicago Press.
Hausser, Roland (1984). Surface Compositional Grammar. Muenchen: Wilhelm Fink.
Hausser, Roland (1986). NEWCAT: Natural Language Parsing Using Left-Associative Grammar. (Lecture Notes in Computer Science 231). Heidelberg: Springer.
Hausser, Roland (1989). Computation of Language. An Essay on Syntax, Semantics and Pragmatics in Natural Man-Machine Communication. Berlin, New York: Springer.
Hellwig, Peter (1988). Weichenstellungen fuer die maschinelle Sprachverarbeitung (Plenarvortrag). In B. Spillner (ed.), Angewandte Linguistik und Computer: Kongressbeitraege zur 18. Jahrestagung der Gesellschaft fuer Angewandte Linguistik, GAL e.V. (pp. 5-35). Tuebingen: Gunter Narr.
Heringer, H. J., Strecker, B., & Wimmer, R. (1980). Syntax: Fragen - Loesungen - Alternativen. Muenchen: Wilhelm Fink Verlag.
Hudson, R. (1980). Constituency and Dependency. Linguistics, 18, 179-198.
Hutchins, W.J. & Somers, H. (1992). An Introduction to Machine Translation. London: Academic Press.
Liu Haitao (1999). Aplikata Interlingvistiko (Applied Interlinguistics). GrKG/Humankybernetik, 40 (1), 31-41.
Liu Haitao (1993). "Interlingua Solutions in Natural Language Processing". Information Science, 14 (2).
Liu Haitao (1993b). "The Influence of Wittgenstein's Philosophy of Language on Computational Semantics". In: Research and Applications of Computational Linguistics. Beijing: Beijing Language College Press.
Liu Haitao (1995). "Modular Concept in Linguistic Application". Language Text Application, 1995 (4).
Liu Yongquan (1997). "Machine Translation Is, in the End, a Language Problem". Language Text Application, 1997 (3).
Manning, Christopher D. & Hinrich Schuetze (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Schmitz, Ulrich (1992). Computerlinguistik: Eine Einfuehrung. Opladen: Westdeutscher Verlag.
Starosta, Stanley (1991). Natural language parsing and linguistic theories: Can the marriage be saved? Review article of U. Reyle and Christian Rohrer (eds), Natural Language Parsing and Linguistic Theories, Studies in Linguistics and Philosophy, volume 35, Dordrecht: D. Reidel. Studies in Language 15 (1), 175-197.
Tesniere, L. (1959). Elements de syntaxe structurale. Paris: Editions Klincksieck.