Zhan Weidong
Summary: This paper reviews research in Chinese information processing since the 1980s, with the aim of exploring how grammar research on modern Chinese should be carried out for information processing. It surveys three major areas of work since the 1980s: the introduction of foreign theories and methods, together with macro-level theoretical exploration adapted to Chinese; the development of related application systems; and the construction of Chinese knowledge bases and the discovery of Chinese grammar rules. Our preliminary conclusion is that grammar research on modern Chinese for information processing should, on the premise of a clear positioning, greatly strengthen the study of Chinese phrase structure rules for use by computers.
Keywords: information processing; grammar research; language knowledge; category; rule
One
Language research oriented toward information processing is an interdisciplinary field. While it broadens the research field and widens the research horizon, the newly opened space inevitably comes with a somewhat blurred background and unclear levels. By sorting out the work that the computational linguistics community and the Chinese linguistics community in mainland China have each done from their own perspectives, this paper hopes to give a clear overall picture of the research teams and research results that converge on the road of "Chinese information processing", and on that basis to discuss the direction in which grammar research on modern Chinese for information processing should develop.
Like most survey articles, this one aims to offer some reference, in the choice of content and method, for further research. In particular, we hope it offers researchers with a linguistics background some suggestions on research orientation and strategy in this field. For this purpose I chose the period since the 1980s as the external frame of reference, simply so that the discussion can be relatively focused. Limited by available material and by space, this article basically does not cover the work of overseas scholars. The subtitle refers to "Modern Chinese Grammar Research" rather than "Modern Chinese Research" mainly because this article does not discuss speech processing. In addition, "grammar research" here is understood broadly and also includes semantics.
In Section Two we briefly describe the overall mode and pattern of current language information processing research. In Section Three we divide the research on Chinese information processing since the 1980s roughly into three parts and review each of them. Section Four sets out our thinking on some practical and theoretical issues, clarifies what issues Chinese grammar researchers should attend to and what principles and standards they should follow when working against the background of information processing, and states our view of the development direction of modern Chinese grammar research for information processing.
Two
Natural language information processing is an interdisciplinary research field that was born almost together with the computer. Researchers from computer science, linguistics, mathematics and other disciplines make up its main research force. As computers have become more widespread, their function has shifted from mainly numerical calculation to non-numerical information processing. Whether the information is numerical or non-numerical, the general mode by which a computer processes information can be summarized in the following three parts.
(1) Object of processing (input): a finite-length sequence over a finite set of symbols (M = a1 a2 ... an)
(2) Process (operation): apply a predetermined program to carry out a finite number of transformations
(3) Result of processing (output): a new symbolic expression (M')
When natural language is the input, part (2) of the mode above can be implemented with different strategies. For example, early man-machine dialogue systems used simple pattern matching; rule-based methods were developed later; and corpus-based statistical methods have become increasingly popular in recent years. In general, rule-based and statistical methods are intertwined, both complementary and competing, and together they form the basic pattern of theoretical and technical orientation in current natural language processing.
In our view, whichever method is used, it can be abstracted into two parts: knowledge about natural language, and the mechanism for expressing that knowledge. We assume that knowledge about natural language is objective, so the knowledge itself should be shared and does not differ between rule-based and statistical methods. Seen this way, the difference between the two methods comes down to a difference in the mechanism of expression. In general, rule-based methods most commonly use some formal grammar system to express the rules by which larger and smaller constituents of natural language combine, while statistical methods express the combination of language constituents through various statistics. Many papers, when comparing the two, attribute the difference to the following: in practice, the former's knowledge is obtained by human introspection while the latter's is obtained by computer from real corpora; in effect, the former's knowledge is coarse-grained while the latter's is fine-grained; and the former is less robust when facing its objects of processing while the latter is more robust. We think such comparisons look intuitive but are specious and rough, and do not reach the substance of the two methods. In fact, when considering the difference between the rule-based and the statistical method, the questions that really need answering are how each method organizes language knowledge, where the balance lies, how language knowledge is controlled, what the overall efficiency and cost of the system are, and so on.
Which method is more useful for natural language should not be settled by a sweeping conclusion; it should be discussed separately for natural language processing problems of different kinds and levels. For example, statistical methods achieve fairly good results in automatic word segmentation, part-of-speech tagging and speech recognition; what will work for syntactic structure and semantic analysis? This article does not expand on the specific strengths and weaknesses of rule-based and statistical methods. We prefer to view the current pattern this way: whichever method is used, it ultimately relies on reliable language knowledge to drive the computer to handle natural language correctly. In terms of the level of natural language knowledge, much research remains to be done, and it cannot be accomplished overnight. Moreover, the two methods are only different perspectives; consciously examining what they share and letting them complement each other may lead to more notable research. In fact, many researchers have discovered rules with statistical methods and then used the resulting rules for analysis, or have used statistical methods to add probability weights to the rules of a traditional context-free grammar, obtaining a probabilistic context-free grammar. Such work already shows a trend toward combining the two.
Statistical methods involve many mathematical formulas. Since this paper is mainly intended as a reference for Chinese linguistics researchers entering the field of Chinese information processing, the discussion concentrates on work related to the rule-based method in Chinese information processing.
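The idea of attaching probability weights to context-free rules can be made concrete with a small sketch. The toy rule counts, the relative-frequency estimate, and the names `estimate_rule_probs` and `tree_prob` are illustrative assumptions and are not taken from any system discussed in this paper.

```python
from collections import Counter

def estimate_rule_probs(observed_rules):
    """Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(observed_rules)
    lhs_counts = Counter(lhs for lhs, _ in observed_rules)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def tree_prob(tree, probs):
    """Probability of a parse tree: the product of the probabilities of its rules.

    A tree is a tuple (label, child, child, ...); a leaf is a plain string.
    """
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = probs.get((label, rhs), 0.0)  # unseen rules get probability 0 in this sketch
    for child in children:
        p *= tree_prob(child, probs)
    return p

# Hypothetical rule occurrences, as if read off a tiny hand-parsed corpus.
observed = [
    ("VP", ("V", "NP")), ("VP", ("V", "NP")), ("VP", ("V", "NP")), ("VP", ("V",)),
    ("NP", ("N",)),
    ("V", ("research",)),
    ("N", ("grammar",)),
]
probs = estimate_rule_probs(observed)
tree = ("VP", ("V", "research"), ("NP", ("N", "grammar")))
print(tree_prob(tree, probs))
```

The categories and rules stay the same as in a plain context-free grammar; only the weights are learned from data, which is one way the two approaches can be combined.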
For the rule-based method, the work to be done mainly includes:
1 Through rational reflection, people extract from natural language the knowledge that can be understood formally.
2 Express this language knowledge in some formal notation.
3 Turn the knowledge into algorithms, then write them as programs and enter them into the computer.
Generally speaking, this work should be done jointly by linguists and computer scientists. In the most typical case, linguists take on more of the task of extracting formalizable language knowledge from complex language phenomena, while computer scientists take on expressing that knowledge in some formal model, turning it into algorithms, and implementing it as programs.
In a rule-based framework, language knowledge can be divided into two parts: categories and rules. Extracting language knowledge from natural language means establishing a finite set of categories for the language and expressing the finite relations among those categories with a finite set of rules. Since the 1980s, foreign linguistics has seen one new school and theory after another. In outline, and from the standpoint of extracting language knowledge, they are nothing more than the results of different choices about which categories to establish and how to organize their expression. The same holds for the grammar research for information processing carried out by the Chinese computational linguistics community and the Chinese linguistics community. On this understanding, we now give a specific review.
Three
Since the 1980s, the main research in China on Chinese information processing falls roughly into three blocks:
(1) Introducing foreign theories and methods and, in light of the characteristics of Chinese, exploring theories for the computer processing of Chinese.
(2) Developing experimental and application systems related to Chinese information processing.
(3) Building Chinese knowledge bases and discovering Chinese grammar rules.
If we take the level of extraction of Chinese language knowledge as a yardstick and measure the development of modern Chinese grammar research over this not-so-short period, a broadly clear preliminary conclusion emerges. More results have been obtained in establishing the syntactic and semantic categories of modern Chinese; there has been some exploration of rules, but it is insufficient compared with the progress on categories. Taken as a whole, the discovery of regularities in Chinese language knowledge is still at a level that can hardly meet the needs of computer processing.
This view may be criticized as conservative, or at least as not very optimistic, but the following review of the three blocks of research since the 1980s may still support it. In any case, seeing the past and the present clearly is a precondition for designing the future.
It should be noted that the three parts covered here are only the main aspects. Dividing the work into three blocks is to a large extent a matter of descriptive convenience, and actual research is not limited to these. Since this paper discusses modern Chinese grammar research against the background of Chinese information processing research since the 1980s, our comments on blocks (1) and (2) are relatively brief; their purpose is to sketch the background. Block (3), which includes knowledge bases of considerable scale as well as exploration of Chinese grammar rules that is small in scale but active, is treated in more detail than the first two.
Let us first discuss block (1).
This block is dominated by the introduction of theories and methods from foreign computational linguistics. While scholars introduced many foreign theories and methods into China step by step, many also explored these theories and methods in depth in light of the characteristics of Chinese itself.
From the perspective of extracting language knowledge, the many grammatical theories can be roughly divided into two classes. One class focuses on discovering categories and establishing rules from language facts (corresponding to step 1 in Section Two). American descriptive linguistics (structuralist grammar), the dependency grammar (valency grammar) of the French linguist Tesnière, Fillmore's case grammar, the systemic functional grammar represented by Halliday, and the cognitive grammar advocated by Langacker belong to this class. The other class focuses on how to describe the language knowledge that has been discovered (corresponding to step 2 in Section Two). From transformational generative grammar at the end of the 1950s to the series of computation-oriented grammar theories that had appeared by the 1980s, such as augmented transition networks (ATN), government and binding theory (GB), functional unification grammar (FUG), lexical functional grammar (LFG), definite clause grammar (DCG), head-driven phrase structure grammar (HPSG), generalized phrase structure grammar (GPSG), categorial grammar (CG) and link grammar (LG), all belong to this class. The rapid development of computational linguistics provided a good proving ground for both classes of theory. Almost all of these distinctive grammar theories have been widely tried in applications such as machine translation and man-machine dialogue. Of course, the natural language in these experiments was almost always English. In China, the spark of machine translation research from the end of the 1950s was rekindled in the 1980s, and machine translation became the main attraction in natural language processing research. Representative work is mostly collected in the three volumes of "Language and Computers" published in succession.
In these volumes, the introductions of foreign work contained relatively little theory and mainly emphasized systems (the so-called first-, second- and third-generation man-machine dialogue systems, and so on). The reports on experimental systems by Fan Jiyan, Xu Zhimin, Li Jiazhi, Chen Yongming, Feng Zhiwei and others are representative of this work. In the brief transitional period when the new situation was opening up, the frameworks were evidently put together in some haste. Considering the hardware environment of the time, it is not hard to appreciate how much enthusiasm the researchers invested and the sense of urgent mission they showed.
The systematic and comprehensive introduction of grammar theories into Chinese began in the mid-1980s. With the founding of the "Chinese Information Journal" at the end of 1986, researchers in this area gained a solid forum. Articles introducing foreign grammar theories became important references for the study of Chinese information processing in China. The two journals "Foreign Language" and "Language Text Applications" also played an active role in dissemination. In addition, the biennial national joint academic conference on computational linguistics, held since 1991, has given researchers valuable opportunities for exchange and learning. The proceedings of each conference show the level of theory and practice in computational linguistics at that stage.
In the early 1990s, three introductory works on computational linguistics appeared. Qian Feng, Lu Zi, Liu Kai Wei and other scholars organized the basic theories and methods of natural language processing into a system, largely reflecting the basic research abroad. These three introductory works helped readers form an overall impression of computational linguistics. From the mid-1990s, another group of such systematic works appeared in China; the monographs on computational linguistics by Feng Zhiwei, Yao Tianshun and other scholars are representative. Compared with the earlier books, the works of this period on the one hand added more recent foreign research, and on the other hand also covered the system development and theoretical exploration of domestic scholars.
Compared with the introduction of specific grammar theories, discussion of natural language understanding at a more philosophical level has received little attention in China. In the 1980s, Ning Chunyan's "Several Fundamental Issues in Natural Language Issues", published in "Chinese Research", and Hubert L. Dreyfus's "What Computers Can't Do: The Limits of Artificial Intelligence" are among the few works of this kind. Later, in the linguistics community, Yuan Yulin analyzed some underlying problems of natural language understanding in his 1993 article "Language Hypothesis of Natural Language". Beyond these, such articles are rare in linguistics publications. It is worth noting that after wide experimentation with many formal grammar theories, the results do not seem to have matched initial expectations, and the difficulties of natural language processing have not eased. Most of these theories belong to the second class described above: they are formal descriptions of language knowledge, and they contribute little to discovering natural language knowledge itself. However advanced the method of expression, what is expressed is not necessarily improved in substance. This is the crux of why these grammar theories cannot fundamentally solve the problems of natural language processing.
In summary, the research in block (1) has played a decisive role in establishing the overall pattern of computational linguistics in China. Observations of many phenomena in this field, and judgments about research topics, can be traced to this broad background.
Next, block (2).
This block mainly involves practical applications with a strong engineering character. In processing Chinese by computer, the first problem encountered is the input and output of Chinese characters. The research most closely related to character input is character encoding. In China, after a "Warring States" period of competing encoding schemes, this problem has basically been resolved, and character input no longer constitutes an obstacle to Chinese information processing. Moreover, from the keyboard to OCR, handwriting recognition and voice input, there are many ways to input Chinese characters, and they meet a variety of needs. Closely related to character output is information compression technology for Chinese character fonts. Professor Wang Xuan of Peking University, known as the "contemporary Bi Sheng", successfully developed a Chinese character compression technology that solved this problem. Since then, the printing of Chinese documents has bidden farewell to "lead and fire" and entered the electronic age.
Researchers who had overcome the difficulties of character processing moved on and met a difficulty peculiar to Chinese: automatic word segmentation. In written Chinese, words are not separated by spaces as they are in alphabetic scripts, so what a sentence presents to the computer is in fact a string of individual Chinese characters. To process the sentence as a person does, the string of characters must first be cut into a string of words. This is a prerequisite for almost all other natural language processing applications, such as machine translation and man-machine dialogue. However, things have not gone as fortunately as they did with character processing. Although many automatic segmentation programs claim accuracy above 90%, the problem has not been finally solved. One reason is inherited: the nature of the word as a linguistic unit in Chinese has never been settled. Another, more important, reason lies in the characteristics of Chinese words themselves. Although the state has issued a standard (the modern Chinese word segmentation specification for information processing [2], Chinese national standard GB13715), considerable segmentation ambiguity remains in practice, and researchers are also plagued by unknown words that are not in the lexicon.
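To make the segmentation problem concrete, here is a minimal sketch of dictionary-based maximum matching, one common baseline technique. It is not the method of any particular system mentioned in this paper, and the tiny lexicon and function names are illustrative assumptions. The example string 研究生命起源 ("to study the origin of life") is a standard illustration of segmentation ambiguity.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, always taking the longest lexicon word that fits."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted even if unknown, so the scan always advances.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

def backward_max_match(text, lexicon, max_len=4):
    """Same idea, scanning right to left."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    return tokens[::-1]

# Hypothetical toy lexicon: study, graduate student, life, origin.
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"
print(forward_max_match(sentence, lexicon))   # ['研究生', '命', '起源']
print(backward_max_match(sentence, lexicon))  # ['研究', '生命', '起源']
```

The two scan directions disagree on this string, which is a cheap signal that the span is ambiguous. Choosing the right reading, and handling words absent from the lexicon, requires knowledge beyond the word list.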
In any case, Chinese word segmentation software is basically practical, and its output can basically meet the requirements of subsequent processing. By contrast, one step further on, in syntactic analysis, which is the core of natural language processing, the situation is less encouraging. Given the characteristics of Chinese, a large part of syntactic analysis can in fact be regarded as the structural analysis of phrases. When Chinese information processing reaches the stage of syntactic analysis (phrase structure analysis), it meets far greater difficulties than in the character-processing and word-processing stages. Take machine translation, the most prominent field in language information processing, where Wu Wei and others have made considerable efforts. Although the computer analysis of Chinese syntactic structure has developed to some extent, not much progress has been made on how Chinese language constituents combine. It is worth pointing out, however, that besides machine translation, applications such as Chinese generation, discourse understanding, information retrieval, automatic summarization and automatic proofreading have also created an urgent need for Chinese language knowledge. These needs have been the main driving force for information-processing-oriented research on Chinese from the start. Here computer scientists and linguists need to form a close alliance and bring their respective knowledge to bear. If practicality is overemphasized, or if work is constrained by manpower, funding and time, computer scientists will find it hard to truly digest and absorb linguistic theory and specific language knowledge. Linguists, for their part, may be misled into judging linguistic questions by the success or failure of a particular system, which makes it hard to truly advance Chinese grammar research.
Below we review block (3), the attempts and efforts of domestic researchers to explore Chinese language knowledge directly. It includes examples of computer scientists and linguists carrying out research jointly.
Let us first look at two large-scale projects in block (3) to build Chinese knowledge bases.
One is the case-based study of Chinese semantic relations advocated by Geek, Zhangpu, Lin Xu Lang and others. The other is the "Modern Chinese Grammatical Information Dictionary", developed under Zhu Dexi, Lu Jianming and Yu Shiwen. From the perspective of extracting Chinese language knowledge, these two projects have done considerable exploratory work in establishing the basic category systems of Chinese semantic knowledge and grammatical knowledge respectively.
The "Verb Dictionary" first proposes a Chinese case system made up of 22 cases governed by specific Chinese verbs. The 22 cases are grouped into two major categories and seven subclasses that organize the case relations of Chinese verbs, as shown below:
Role
    Main body: Agent (Shi) | Wave
    Object: Appraisal | Guest |
    Neighbor: Doing Affairs | Colleagues | Baseline
    Series: Junction | Partial Division | Quantity
Scenario
    ...: Tools | Materials |
    Environment: Range | Time | Position | Direction
    Root: Base | Cause | Purpose
For verbs themselves, the "Verb Dictionary" divides verbs into six classes according to the semantic relation between the action or state expressed by the verb and its subject and object. They are:
Other-acting verbs: the subject is the initiator of the action, and the action involves an object. Examples: eat, pay attention, research.
Self-acting verbs: the subject is the initiator of the action, and the action does not involve an object. Examples: walk, run, graduate.
Outward verbs: the subject is not the initiator of an action, and the action involves an object. Examples: encounter, know, understand.
Inward verbs: the subject is not the initiator of an action, and the action does not involve an object. Examples: fall ill, die.
Possessive verbs: verbs that express possession. Examples: have, own.
Relational verbs: verbs that express a relation. Examples: be, equal, belong to.
Within this overall framework, the "Verb Dictionary" describes the case relations of 1,000 common modern Chinese verbs and gives example sentences for each case frame of each verb. For example:
Love
[Basic] [Agent (Shi) {Military, Cat, Library} love Benefits {Children, Soldiers, Eyes, Body}] Readers should love books.
[Extension] [Department] Love the reputation of our company. ......
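A dictionary entry of this kind can be thought of as a small data structure: a verb, a frame type, and for each case slot the set of nouns recorded as possible fillers. The sketch below is only an illustration of that idea. The class name `CaseFrame`, the slot names `agent` and `patient`, and the English filler words are assumptions and do not reproduce the dictionary's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class CaseFrame:
    verb: str
    kind: str                                    # "basic" or "extended", as in the sample entry
    fillers: dict = field(default_factory=dict)  # case name -> set of recorded noun fillers

    def accepts(self, assignment):
        """True if every noun in the assignment is a recorded filler of its case slot."""
        return all(
            case in self.fillers and noun in self.fillers[case]
            for case, noun in assignment.items()
        )

# Hypothetical basic frame for "love", loosely modelled on the sample entry.
love_basic = CaseFrame(
    verb="love",
    kind="basic",
    fillers={
        "agent": {"reader", "soldier"},
        "patient": {"books", "children", "eyes", "body"},
    },
)

print(love_basic.accepts({"agent": "reader", "patient": "books"}))  # True
print(love_basic.accepts({"agent": "reader", "patient": "stone"}))  # False
```

Because the slots hold lists of individual nouns, any noun that was not recorded is rejected. Replacing the word lists with semantic classes of nouns would generalize the frame.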
As this simple sample shows, the "Verb Dictionary" is in effect a summary of the semantic collocation of Chinese verbs with nouns. At a time when Chinese semantics research had not reached consensus on how many cases should be established, the editors worked from a language-engineering standpoint and proceeded from the concrete description of each verb, with a strong practical orientation. This should be counted a considerable achievement. However, in semantic description, from setting up the theoretical framework to handling each word in the actual implementation, no step is easy. As the editors of the "Verb Dictionary" point out in the preface, in-depth study of case relations in modern Chinese is far from finished. Several questions remain open: whether each case in the verb frame system is appropriately set up (including its name and the total number); how to assign cases to each verb when describing its frames; how to move from merely listing the nouns that can stand in a given case relation to a verb toward organizing the verb's semantic combination frames with a semantic classification system for nouns; and how to build a semantic model of Chinese that a computer can put to effective use. On all of these there is still a long way to go. In any case, an exploratory product of a certain scale now exists as a reference for successors, and it offers both experience and inspiration.
The "Modern Chinese Grammatical Information Dictionary" takes the phrase-based grammar system proposed by Mr. Zhu Dexi as the theoretical basis for setting up its grammatical categories. It first selects specific functional criteria to establish a word-class system for Chinese, assigning each word to a class according to its syntactic functions. Then, guided by the same functional view, it sets up grammatical attribute items for each word class and marks each word's attribute values according to its actual usage. The dictionary has a large number of attribute items. For verbs, which are the research focus, more than 100 attributes are set up across the main database and sub-databases of the dictionary, marking, for example, whether a verb can reduplicate, whether it can be directly modified, whether it can take an object, and whether the object is nominal or a phrase. From the values a word takes on these attributes, we can roughly determine its distribution in actual Chinese text. As a simple example we give the following three verb entries.
Word | Sense | Attribute values (nominal object, double object, reduplication, VVO, separable, ...)
Keep 1 | store | nominal object
Keep 2 | guarantee |
Help | help | nominal object, double object, VV, Q
"Keep" has two senses. In the sense "store (items)", it can take aspect particles, can stand alone as a predicate (as in "I'll keep it"), and can take a nominal object, among other features. In the sense "guarantee", it has none of these features. In terms of distribution, the two senses of "keep" are roughly in complementary distribution syntactically. "Help", in addition to the above, also takes a further class of word (marked Q). Describing the functions of a word in this way readily brings to mind the so-called complex feature set. In fact, the "Modern Chinese Grammatical Information Dictionary" can be seen as a large-scale application of the complex-feature-set formalism to the description of Chinese words. Beneath the surface of discrete complex feature sets, the grammatical knowledge about more than 50,000 Chinese words still forms an organic whole, unified by the phrase-based grammar system. In the example above, attribute names such as "Quality" are taken directly from Mr. Zhu Dexi's account of the kinds of objects Chinese verbs take. Grammatical knowledge is relatively easier to pin down than semantics, but it is not free of problems. Do the attributes that the dictionary sets for each word class contain redundant information? How can consistency among the related attributes of a given word be ensured? How do the grammatical features of a word, recorded as static and isolated marks, change when the word takes part in combination? That is, how large is the gap between the dictionary's record of a word's usage and the richness of its use in actual language? All of these need further study.
These questions directly affect how far the grammatical attributes of words can play a major role in the computer analysis of Chinese sentences, and how they can be made to play that role more effectively. They are topics in urgent need of study.
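The attribute-value description discussed above can be sketched as a small feature lexicon with two basic operations: selecting entries by attribute value, and unifying two feature sets. The attribute names and values below are one reading of the three sample entries and are assumptions made for illustration. They are not the dictionary's actual attribute inventory.

```python
# Each entry (word, sense number) maps to a flat attribute-value set.
LEXICON = {
    ("keep", 1): {"cat": "V", "gloss": "store", "aspect_marker": True,
                  "stands_alone": True, "nominal_object": True},
    ("keep", 2): {"cat": "V", "gloss": "guarantee", "aspect_marker": False,
                  "stands_alone": False, "nominal_object": False},
    ("help", 1): {"cat": "V", "gloss": "help", "nominal_object": True,
                  "double_object": True, "reduplication": "VV"},
}

def select(lexicon, **constraints):
    """Return the entries whose recorded attributes match every constraint."""
    return sorted(
        entry for entry, feats in lexicon.items()
        if all(feats.get(attr) == value for attr, value in constraints.items())
    )

def unify(a, b):
    """Merge two flat feature sets; return None if any shared attribute conflicts."""
    merged = dict(a)
    for attr, value in b.items():
        if attr in merged and merged[attr] != value:
            return None
        merged[attr] = value
    return merged

print(select(LEXICON, nominal_object=True))  # [('help', 1), ('keep', 1)]
print(unify({"cat": "V", "aspect_marker": True}, LEXICON[("keep", 2)]))  # None
```

A rule that requires a verb able to take an aspect marker unifies with sense 1 of "keep" but fails on sense 2. This is how static lexical marks can constrain the combination of words during analysis.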
Neither of these two projects belongs directly to a particular application system in information processing. Rather, with generality in mind, they can be seen as infrastructure built for the larger goal of Chinese information processing. Whether rule-based or statistical, methods in practice cannot do without the support of such basic language knowledge bases. However, if we return to our consistent view of language knowledge as made up of categories and rules, it is not hard to see that building these Chinese knowledge bases mainly concerns setting up grammatical and semantic categories. It does not involve the relations among categories, that is, the summarizing of Chinese syntactic rules that directly serve information processing. Some macro-level explorations in Chinese information processing, together with studies of specific problems in Chinese, should be regarded as research on Chinese syntactic rules. Representative macro-level work includes "Calculative Linguistic Study", "The Challenge of Computational Linguistics on Theoretical Linguistics", a study of potential ambiguity, Bai Shuo's "Computer-assisted discovery of linguistic knowledge", the research and distribution statistics on Chinese sentence patterns by Luo Zhen and others, and studies proposed by some other scholars. This work places research on Chinese against the vast application background of computer information processing and is thought-provoking. Bai Shuo's research proposes a system that uses the computer to assist in discovering Chinese grammatical knowledge, gives a mathematical proof, and tests the approach in small-scale practice on the subclassification of verbs. It is of considerable reference value.
On this point it must be noted that Bai Shuo, at the end of his paper, explains in linguistic terms the Chinese knowledge extracted by computer. This is a very commendable practice. Much current work, especially some that uses statistical methods to process text, lacks the necessary explanation of this kind. It lists a lot of statistical formulas together with one or two simple examples, leaving readers, especially those with an arts background, baffled. This does little to advance research. Of course, this is only our own opinion. To return to the topic: besides the work mentioned above, information-processing-oriented studies of specific problems in Chinese include Ma Zhen and Lu Jianming's study of the ambiguity of Chinese "noun" + "verb" string combinations, Sun Honglin's experimental analysis of the grammar rules for the Chinese "V N" sequence in a tagged corpus, and Zhan Weidong's exploration of the automatic delimitation of Chinese "P ... We already have a fairly systematic set of grammatical and semantic categories for Chinese words (though these categories still need further adjustment and improvement), so the focus of the next stage of work should be on phrase categories and on rules.
Four
The above has briefly reviewed the basic state of research on Chinese information processing since the 1980s. As to how grammar research on Chinese for information processing should develop, our preliminary understanding is that, building on the research into categories, the systematic study of Chinese phrase structure rules for computer use should receive full attention. This can both test the existing category system and open the way for developing a complete Chinese grammar for information processing.
In this section, on the basis of the review above, we discuss two relatively macro-level questions about Chinese grammar research for information processing, in the hope of providing a reference for linguistic researchers in this area.
The first is how to view the influence on Chinese grammar research of the various grammar theories that have appeared since the 1980s and of the rise of the corpus method. On the grammar theories, we maintain the point made at the beginning of Section Three: from the perspective of extracting language knowledge, the theories developed since the 1980s, such as GPSG, LFG, HPSG and DCG, as well as Chomsky's constantly renewed theoretical system (including X-bar, GB, θ-theory and so on), can all be regarded as methods of formalizing language knowledge. Against the background of these theories, Chinese grammar research can re-examine its own syntactic research. For example, for a comprehensive description of the syntactic function attributes of Chinese words, these theories offer better means of organizing the syntactic function items of words and of determining the functional value of a particular word. At the same time, they deepen our understanding of the nature of word usage, much as the analysis of speech sounds moved from sounds to vowels and consonants, and then to distinctive features. In other words, these theories do not necessarily bring richer knowledge to Chinese grammar research in themselves, but the novelty of their perspective, and descriptive methods that tend toward rigorous formal expression, can promote the standardization of Chinese grammar research.
The above concerns what these theories have in common. If we stress their individual character, the question becomes: when these theories are applied to Chinese grammar research, which of them works better and which worse?
On this question the author has little practical experience, and there are very few domestic research results that have actually used one of these theories to organize a reasonably complete Chinese grammar, so it is still hard to answer. If some speculation based on limited experience is allowed, our view is that when these theories are used to organize Chinese grammatical knowledge, they should not differ very much. The differences lie only in technical details and in the specific computer implementation.
It is worth mentioning that the domestic scholar Huang Zengyang, drawing on many years of research, has proposed a theoretical framework for natural language understanding as a whole: the Hierarchical Network of Concepts (HNC) theory. It challenges the traditional representation and processing model based on syntactic knowledge and replaces it with one based on semantic expression. As far as the principle of the theoretical model is concerned, we believe HNC is still trying to master unlimited natural language with a limited set of formal symbols. In the HNC theory's own account, these limited formal symbols take the form of a finite set of so-called sentence types and a hierarchical network system of concepts, and the theory asserts that, because it is based on a simulation of how the human brain perceives language, it can fully express the semantic structure of any natural language sentence.
In fact, according to our analysis of natural language knowledge, starting from syntax and starting from semantics are not essentially different. The key point is this. Starting from syntax, one has to divide language units into syntactic classes large and small, state the possible relations among the classes, and give the criteria for telling like relations from unlike relations, bringing in semantic knowledge to help where that is not enough. Starting from semantics, the basic pattern is the same: one divides language units into semantic classes large and small, gives operational standards for assigning actual language constituents to these classes, and gives the basis on which constituents are correctly combined or decomposed according to their semantic classes. Thus, although the HNC theory takes a new road, and its spirit of exploration deserves our admiration, and although the direction of its exploration reflects the current trend in natural language understanding (namely the urgent need for natural language semantics), the work that actually has to be done remains difficult, because the problem of classifying meaning cannot be avoided. Looking forward optimistically to the prospects of this new idea is encouraging, but thinking pragmatically about its internal theoretical problems and the obstacles it may meet in practice should do more to advance research. We expect that along this road more practical rule knowledge about natural language will be uncovered, and that it will effectively serve applications of natural language understanding.
The other aspect is the rise of the corpus method. This shows mainly in the active construction of large-scale corpora in recent years, and in the continuous improvement of automatic processing tools for corpus retrieval, query and annotation. The experience-based approach to extracting Chinese language knowledge fundamentally cannot do without the support of prior language knowledge. For example, to build a Chinese treebank, which grammatical system to adopt and how deep the manual annotation should go have to be decided by people in advance. Likewise, to discover so-called analysis rules from a corpus, one has to design a certain knowledge template in advance and carry out the discovery with a definite purpose.
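The remark that rule discovery presupposes a template designed in advance can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from any of the systems reviewed: the toy tagged corpus, the tag names, and the function `count_template`. The "template" here is simply "a run of k adjacent tags".

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (word, tag) pairs.
# The tag set (n, v, a) is fixed by a person in advance, as the text notes.
CORPUS = [
    [("students", "n"), ("study", "v"), ("grammar", "n")],
    [("teachers", "n"), ("study", "v"), ("hard", "a"), ("problems", "n")],
    [("labor", "v"), ("glorious", "a")],
]

def count_template(corpus, k):
    """Count every run of k adjacent tags; k is the human-designed template."""
    counts = Counter()
    for sentence in corpus:
        tags = [tag for _, tag in sentence]
        for i in range(len(tags) - k + 1):
            counts[tuple(tags[i:i + k])] += 1
    return counts

bigrams = count_template(CORPUS, 2)   # candidate two-tag combinations
trigrams = count_template(CORPUS, 3)  # candidate three-tag combinations
```

What the program can "discover" is bounded by the choices made beforehand: a different tag set or a different value of `k` yields a different inventory of candidate rules, and nothing outside the template can be found at all.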
Of course, we must also see that, with the rapid development of computer hardware and software, the ability to process corpora has grown considerably. Given a good statistical model and the support of a reliably annotated corpus, using the computer to discover the structural rules of Chinese is a road well worth exploring. In short, high-performance computers are undoubtedly an effective tool for language research, but with nothing more than a raw corpus and computer programs, the language knowledge that statistics can yield is bound to be limited (word frequency, simple co-occurrence and the like). However good the tools, they still need people to drive them.
The second question is how to view, in the context of information processing, some disputed issues that have accumulated in Chinese research over the years, as well as the significance for Chinese information processing of grammar research in recent years. It is well known that Chinese research has many controversial issues, such as the word class problem, the question of whether there are words in Chinese, and whether the Chinese sentence follows the subject-predicate pattern or the topic-comment pattern. The application background of Chinese information processing gives us a very advantageous perspective on these issues.
Take the word class problem. The biggest controversy is between "words have no fixed class" and "words have fixed classes, but classes have no fixed function". In our opinion, the word class of a word is just one of its more important functional values. Assigning a word to a class is not the ultimate goal of natural language processing; word classification is only one means of analysis. Whether we say that "labor" belongs to one class only (say, verb) or to two classes (say, verb and noun) is not the essence of the problem. The substantive question is: for what purpose do we assign "labor" a word class? And for that purpose, is it better to keep "labor" in a single class, or to let it belong to two?
In the author's experience, when Chinese knowledge is organized as formal rules, the consequence of adhering to the principle "words have no fixed class" is that the burden falls on the description in the dictionary; the consequence of adhering to the principle "words have fixed classes, but classes have no fixed function" is that the burden falls on the rules, while the burden on the dictionary is light. On balance, it is better to admit some multi-class words. We will not go into this problem further here. We believe that if researchers look at the word class problem from this perspective, research will be better advanced.
The same goes for the question of whether there are words in Chinese. Simply comparing Chinese with English, and saying how unique the Chinese "character" is and how it differs from the "word", does not actually solve any problem. For computer processing, rather than asking "are there words in Chinese", it is better to ask "what are the rules by which units combine in Chinese". Whether one starts from the "character" or from the "word", and whether the work is pure grammar research or serves information processing, one common question has to be answered, namely how small units combine into large units.
Similarly, whether the sentence structure of Chinese follows the subject-predicate pattern or the topic-comment pattern is not the real problem. The real problem is this. If you hold to the subject-predicate pattern, you have to say which constituents in Chinese can serve as "subject", and under what conditions, and which can serve as "predicate". If you hold to the topic-comment pattern, you have to say which constituents can serve as "topic" and "comment", whether there are formal markers, what the rules of combination between the two are, and so on. If these questions cannot be answered, the debate between subject-predicate and topic-comment has little significance.
Related to the disputed issues above, several grammar systems have emerged in recent years, such as the clause-centered account and character-based grammar, alongside the earlier sentence-based grammatical system and the phrase-based grammatical system.
How should we treat these Chinese grammar systems from different historical periods? Again we apply the perspective and thinking above, linking them to the requirements that information processing places on Chinese language knowledge. Whichever the system, it should be measured by how much Chinese grammatical knowledge it uncovers and by how concise its expression of that knowledge is. From this point of view, the systems mentioned above each carry significant weight in exploring Chinese grammar, and each has its own potential. For example, following the research path of the sentence-based approach, there will be deeper summaries of sentence types in Chinese. Following the clause-centered approach, research on the complex sentence system of Chinese should bear fruit. Following the phrase-based approach, it is easier to explore the phrase structure rules of Chinese. The character-based view pays more attention to the rules by which Chinese characters combine into words. These theoretical systems are not a simple matter of either one or the other. Rather, they address different levels and aspects of Chinese grammatical knowledge with different emphases, and dig out that knowledge along different paths.
As for conciseness of expression, Chinese shows a roughly consistent tendency in construction at all levels (word, phrase and sentence), and the phrase-based system firmly grasps this point in organizing Chinese grammatical knowledge. In general, therefore, the phrase-based system should express Chinese syntactic knowledge more concisely. As for the character-based theory, a systematic representation of specific Chinese grammatical knowledge within it has not yet been seen, whereas the account given in "Chinese Law" is already comparatively systematic. In the mining of specific language knowledge, however, it does not differ in essence from the phrase-based system; only the focus differs. It is not yet possible to make an in-depth comparison and give a definite assessment.
To sum up, we tend to think about the above disputes and the different theoretical systems with a definite purpose and against a clear application background. Sometimes a dispute on the surface is not as sharp as it is made out to be, and what really needs serious thought is the problem behind it. Here we may as well offer an analogy. Suppose several bottles hold different liquids, and a person with no sense of smell or color has to use them. He can only decide how to use each one according to the label on the outside of the bottle. If the label reads "soy sauce", he uses it to add savory flavor; if the label reads "yellow wine", he uses it to remove an unpleasant smell; and so on. The way a computer handles natural language is roughly the same. The computer system is like that person, who has no sense of smell or color but can read. It can only decide what to do according to labels attached in advance. For example, we give "labor" and "glorious" the labels "verb" and "adjective" respectively, and let the computer know that a "verb" followed by an "adjective" forms a legitimate structure that expresses a complete meaning (equivalent to that person knowing that "soy sauce" adds savory flavor). The computer can then happily produce "labor glorious". But the problem is that we have also given "trick" the same "verb" label as "labor". Following the knowledge above, the computer will then just as "reasonably" produce a combination with "high-spirited". Obviously, on seeing this bad result, there are two places to check. The first is whether the labels are appropriate. The second is whether, if the labels are appropriate, we should tell the computer that some "verb" + "adjective" pairs cannot combine into a subject-predicate structure to express a meaning (equivalent to telling that person that not every "soy sauce" brings savory flavor, so as to deepen his knowledge of "soy sauce"). We end this article with this perhaps not very apt metaphor.
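The label metaphor can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the lexicon, the single rule, the feature set `CAN_BE_SUBJECT` and both functions are hypothetical, and the adjective label on "high-spirited" is assumed in order to reproduce the overgeneration the text describes.

```python
# Hypothetical lexicon: word -> label. "labor" and "glorious" follow the text;
# the labels on the second pair are assumed, to reproduce the bad result.
LEXICON = {
    "labor": "verb",
    "glorious": "adjective",
    "trick": "verb",
    "high-spirited": "adjective",
}

# One combination rule: verb + adjective -> subject-predicate structure.
RULES = {("verb", "adjective"): "subject-predicate"}

def combine(left, right):
    """Return the structure licensed by the labels alone, or None."""
    return RULES.get((LEXICON[left], LEXICON[right]))

# Second remedy from the text: keep the labels but restrict the rule.
# The restriction is a hypothetical feature listing verbs allowed as subject.
CAN_BE_SUBJECT = {"labor"}

def combine_restricted(left, right):
    """Apply the same rule only when the left word carries the extra feature."""
    if left not in CAN_BE_SUBJECT:
        return None
    return combine(left, right)
```

With labels alone, `combine("trick", "high-spirited")` is accepted exactly as `combine("labor", "glorious")` is. The restricted version rejects the former, at the cost of extra information that has to be stored somewhere, either in the dictionary or in the rules.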
We only want to emphasize that, for people engaged in Chinese grammar research for information processing, the real questions should always include: (1) how many labels have we prepared for Chinese? (2) what does it mean to attach a label to any constituent of Chinese (whatever its size: morpheme, word, phrase, sentence and so on)? (3) what are the combination relations among these labels? Judging from this paper's summary of the basic state of Chinese information processing since the 1980s, the next stage of research should focus above all on the question of combination: what are the rules by which Chinese words combine with words, and by which phrases of one type combine with phrases of another? In answering this question, one should always keep in mind that the answer is meant for a computer. As for the computer, one may as well imagine it to be as stupid as possible, so the answer given must be clear and explicit.
Finally, this paper's review of research in the field of Chinese information processing since the 1980s is drawn in extremely broad strokes. On the one hand, it is limited by space. On the other hand, because the author's knowledge and field of view are very limited, some important research work has inevitably been left out, and the positioning of some of the work discussed may not be accurate. Experts are therefore asked for their corrections. In addition, because this paper has many references, they are listed at the end in order of publication date and are not keyed one by one to the text.
References (sorted by date of publication; within the same year, books are listed before single papers)
[1] Feng Zhiwei, "New Progress in Translation of Foreign Machines", in "Foreign Language", No. 1, 1980
[2] Dong Zhendong, "Logic Semantics and Its Application in Motion", in "Intelligence Science", No. 2, 1980
[3] Zhu Dexi, "Syntax Lecture", Commercial Press, 1982
[4] "Language and Computer" (1), China Social Science Press, 1982
[5] Li Jiazhi, Chen Yongming, "Machine Understanding Chinese - Experiment I", in "Psychology", No. 1, 1982
[6] Fan Ji flooded, Xu Zhi Min, "Grammatical Analysis of RJD-80 Chinese Human Dialogue System", in "Chinese Language", No. 3, 1982
[7] Feng Zhiwei, "Introduction to Foreign Natural Language Comprehension System", in "Computer Science", No. 2, 1984
[8] "Language and Computer" (2), China Social Science Press, 1985
[9] Ning Chunyan, "Several Fundamental Problems in Natural Language Understanding", in "Language Research", No. 2, 1985
[10] "Language and Computer" (3), China Social Science Press, 1986
[11] Hubert L. Dreyfus, "What Computers Can't Do: The Limits of Artificial Intelligence", translated by Ning Chunyan, Maxi, Sanlian Bookstore, 1986
[12] Lu Zhi, "About General Phrase Structure Structural Structural Structure", in "Foreign Language", No. 4, 1986
[13] Luchuan, Liang Town, Han, "Regulations on Information Processing", in "Journal of Chinese Information", No. 4, 1987
[14] Zhang Chaoheng, "Gee Law and Natural Language Treatment", in "Chinese Information", No. 4, 1988
[15] Feng Zhiwei, "Structural Description and Potential Ambiguity of Chinese Science", in "Journal of Chinese Information", No. 2, 1989
[16] Feng Zhiwei, "The Ambiguity Structure of Chinese Science Techniques and Its Judgment Method", in "Chinese Information Science", No. 3, 1989
[17] Ma Xiwen, "Spectacular Research from Computational Linguistics", in "Foreign Language", No. 3, 1989
[18] Sun Massan, Huang Changning, "The Harmony in Chinese, Tongang Word Group and Its Treatment Strategy", in "Chinese Information Journal", No. 4, 1989
[19] Qioneng, "Computational Lingu", Xuelin Publishing House, 1990
[20] Lu Zhi, "Computational Linguistics", Shanghai Education Press, 1990
[21] Feng Zhiwei, "Complex Characteristics in Chinese Sentence Description", in "Chinese Information Journal", No. 3, 1990
[22] Liu Yuxi, Guo Bingyan, "Natural Language Treatment", Science Press, 1991
[23] Zhang Chaoheng, "Some Nature of Semantic Expression", in "Journal of Chinese Information", No. 1, 1991
[24] Yan Chengxiang, Wang Yanbing, Zhang Jiazhi, Xu Jiafu, "Chinese Combination Type Grammar", in "Chinese Information", No. 3, 1991
[25] Zhang Pu, "The Theory and Method of Semantic Analysis of Modern Chinese in Information Processing", in "Chinese Information", No. 3, 1991
[26] Chen Zhaoxiong, Zhang Yujie, Zhang Xiang, "SC Grand Disease Function System", in "National Artificial Intelligence and Intelligent Computer Academic Conference Papers", Electronic Industry Press, 1991
[27] Feng Zhiwei, "Chinese Information Processing and Chinese Research", Commercial Press, 1992
[28] Huang Changning, Yuan Chunfa, "Foreign Table Review", in "Progress in Machine Translation", Electronic Industry Press, 1992
[29] Feng Zhiwei, "Computational Linguistics Challenge to theoretical Linguistics", in "Language Text Application", No. 1, 1992
[30] Huang Changning et al., "Corpion, Knowledge Acquisition and Syntactic Analysis", in "Chinese Information Journal", No. 3, 1992
[31] Lu Shi Ming, "Study on China Grammar Research", Commercial Press, 1993
[32] Yuan Yulin, "Language Hypothesis of Natural Language Understanding", in "China Social Science", No. 1, 1993
[33] Huang Changning, "Talk about the Treatment of Large-scale True Texts", in "Language Text Application", No. 2, 1993
[34] Wu Wei Tian, Luo Jianlin, "Chinese Computational Linguistics - Chinese Form Grammar and Formal Analysis", Electronic Industry Press, 1994
[35] Lin Xu Lang, Wang Lingling, Sun Dejin (eds.), "Modern Chinese Verb Big Dictionary", Beijing Language College Press, 1994
[36] Lin Xu Lang (ed.), "The Verbal Dictionary", China Material Publishing House, 1994
[37] Sun Honglin, "Description Method for the Semantic Dictionary of Information Processing", in "Modern Linguistics · Third National Language Conference Papers", Language Publishing House, 1994
[38] Shen Jiayu, "R. W. Langacker's 'Cognitive Grammar'", in "Foreign Language", No. 1, 1994
[39] Luo Zhen, Zheng Bixia, "Research on Automatic Analysis and Distribution Statistics Algorithm and Strategy of Chinese Sentence", in "Chinese Information Journal", No. 2, 1994
[40] Xu Tong, "'Word' and Chinese Research Methodology - Betting the 'Indo-European Vision' in Chinese Studies", in "World Chinese Teaching", No. 3, 1994
[41] Feng Zhiwei, "Nature Language Machine Translation New Theology", Language Publishing House, 1995
[42] Yao Tianshun, "Natural Language Understanding - A Study Lets Machine Experience Human Language", Tsinghua University Press & Guangxi Science and Technology Press, 1995
[43] Bai Shuo, "Computer Assistance Discovery of Language Knowledge", Science Press, 1995
[44] Yu Shi Wen, "The Imagudients on the Limited Rules", in "Language Modernization From", 2nd, Shandong Education Press, 1995
[45] Zhou Qiang, "Introduction to Corpus and Faculative Natural Language Processing Technology", in "Computer Science", No. 2, 1995
[46] Feng Zhiwei, "The Potential of Revision Structure", in "Chinese Information Journal", No. 4, 1995
[47] Feng Zhiwei, "Computer Treatment of Natural Language", Shanghai Foreign Language Education Press, 1996
[48] Huang Changning, Xia Ying, "Specializing in Language Information Processing", Tsinghua University Press, Guangxi Science and Technology Press, 1996
[49] Xing Fuyi, "Chinese Law", Northeast Normal University Press, 1996
[50] Liu Weiquan, Wang Minghui, Zhong Zi-letter, "Hierarchical System for Establishing Relationship between Modern Chinese", in "Chinese Information", No. 2, 1996
[51] Yu Shi Wen, Zhu Xuefeng, Wang Hui, Zhang Wei, "Modern Chinese Grammatical Dictionary Specification", in "Chinese Information", No. 2, 1996
[52] Ji Donghong, Huang Changning, "Semantic Combination Model of Chinese Adjectives and Nouns", in "Chinese and Oriental Language Information Processing Society", Vol. 6, No. 1, June 1996
[53] Ma Zhen, Lu Yi, "'Noun' 'verb' word string", in "Chinese Language", No. 3, 1996
[54] Zhou Qiang, Yu Shi Wen, "The Determination of Chinese Philania Markers", in "Chinese Information", No. 4, 1996
[55] Lu Shi Ming, "About Semantic Points", in "Contemporary Chinese Linguistics", total first issue, 1996
[56] Sun Losong, Huang Changning, Shoujie, "Preliminary Study on the Quality Analysis of Chinese", in "Chinese Language", No. 1, 1997
[57] Sun Honglin, "Synchronized Syntai Rules from Narc Inscription" )
[58] Sun Jian, Zhang Wei, Wang Qixiang, "Design and Application of Chinese Language Language", in "Journal of Chinese Information", No. 3, 1997
[59] Zhou Qiang, Zhang Wei, Yu Shi Wen, "Construction of Chinese Tree Bank", in "Chinese Information", No. 4, 1997
[60] Huang Zengyang, "HNC Theory Summary", in "Chinese Information Journal", No. 4, 1997
[61] Zhan Weidong, "PP
[62] Zhu Jingbo, Zhang Yujie, Yao Tianshun, "Syntactic Analysis Technology for Data", in "Chinese Information", No. 1, 1998
[63] Zhang Shuwu, Huang Taiyi, "N-value Analysis of Chinese Statistical Language Model", in "Journal of Chinese Information", No. 1, 1998
[64] Liu Zhiwen, "The HNC of the natural language statement", in "Language Text Application", No. 2, 1998