As society becomes increasingly information-driven, people increasingly want to communicate with computers in natural language. Natural language understanding is a fascinating and challenging topic in computer science. From the standpoint of computer science, and of artificial intelligence in particular, the task of natural language understanding is to build a computer model that can understand, analyze, and answer natural language (that is, the language people use every day) and produce results the way a person would.
The intelligence of today's computers is still far from the level needed to understand natural language as people do, and that level will not be reached in the foreseeable future. Whether a computer understands natural language is therefore generally judged from a practical standpoint: if a computer can carry out human-machine dialogue, machine translation, or a similar automatic information-processing function, it is considered to have the ability to understand natural language.
Part 1: Natural language understanding
Natural language understanding, also called natural language processing, studies how to make computers understand and generate the languages people use every day (such as Chinese and English), so that computers grasp the meaning of natural language and can answer, in natural language and by way of dialogue, the questions people put to them. The purpose is to build a close and friendly relationship between people and machines, so that information transfer and cognitive activity between them can reach a high level. A natural language understanding system can serve as the natural language human-machine interface for expert systems, knowledge engineering, information retrieval, and office automation, and so has great practical value.
Natural language processing began at the dawn of the electronic computer, and machine translation experiments were carried out in the early 1950s. The research methods of that time could not yet be called "intelligent". By the 1960s, Chomsky's transformational-generative grammar had gained wide recognition. The core of generative grammar is the phrase structure rule; analyzing the structure of a sentence is a process of applying these rules top-down or bottom-up to derive its syntax tree.
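As a minimal sketch of how phrase structure rules can drive a top-down analysis, the Python fragment below expands the start symbol S against a toy grammar. The grammar, the lexicon, and the function names are all hypothetical and chosen only for illustration; they are not taken from any system discussed here.

```python
# Toy phrase structure rules (hypothetical, for illustration only).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}
# Toy lexicon mapping each word to its category.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "sees": "V", "sleeps": "V",
}


def parse(symbol, words, pos):
    """Expand `symbol` top-down starting at words[pos].

    Yields (tree, next_pos) for every way the rules can match.
    """
    if symbol not in RULES:
        # Word category: it must match the category of the next word.
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in RULES[symbol]:
        yield from expand(rhs, words, pos, (symbol,))


def expand(rhs, words, pos, tree):
    """Match the symbols of one rule body in order, growing `tree`."""
    if not rhs:
        yield tree, pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(rhs[1:], words, nxt, tree + (child,))


def parse_sentence(sentence):
    """Return the first syntax tree covering the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


print(parse_sentence("the dog sees a cat"))
```

A bottom-up analysis would apply the same rules in the opposite direction, combining word categories into phrases until S is reached.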
Because generative grammar lacked a way to represent semantic knowledge, and spurred by the flourishing of cognitive science, researchers in the 1970s proposed semantic representation methods such as semantic networks, conceptual dependency (CD) theory, and frames. These grammatical and semantic theories gradually began to be combined. By the 1980s a number of new grammatical theories had emerged, notably lexical functional grammar (LFG), functional unification grammar (FUG), and generalized phrase structure grammar (GPSG).
These rule-based analytical methods can be called the "rationalist" approach to natural language processing. Although existing techniques have largely mastered the analysis of single sentences, they struggle to cover the full range of language phenomena, especially when it comes to understanding whole paragraphs or texts.
In contrast to "rationalism" stands the "empiricist" line of research, which mainly means research based on large-scale corpora. A corpus is a large collection of texts. Once computers appeared, corpora could be stored conveniently and searched easily by machine. With the emergence of electronic publications, collecting corpora is no longer difficult. The two earliest computer corpora, compiled in the 1960s, were on the scale of 1 million words. By the 1990s dozens of corpora could easily be listed, such as DCI, ECI, ICAME, BNC, LDC, and CLR, with sizes reaching the order of 10^9 words.
Corpus research falls into three areas: the development of software tools, the annotation of corpora, and corpus-based methods of language analysis. A raw corpus cannot directly provide knowledge about language; that only becomes possible after multi-level processing at the lexical, syntactic, and semantic levels. The processing consists of adding annotations to the corpus, including each word's part of speech, semantic category, phrase structure, sentence pattern, and inter-sentence relations. As annotation matures, the corpus becomes a distributed, statistical source of knowledge that can support many kinds of language analysis. For example, frequency regularities summarized from the annotated corpus can be used to tag new text word by word and to divide sentences into constituents.
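To make the idea of frequency regularities concrete, here is a minimal sketch in Python that counts how often each word carries each label in a tiny annotated corpus and then labels new text with the most frequent choice. The corpus, the label set, and the function names are invented for illustration; real corpus-based taggers use far more data and context.

```python
from collections import Counter, defaultdict

# Tiny hand-made annotated corpus of (word, label) pairs; hypothetical data.
TAGGED_CORPUS = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("bark", "NOUN"), ("is", "VERB"), ("rough", "ADJ"),
    ("dogs", "NOUN"), ("bark", "VERB"),
    ("the", "DET"), ("bark", "NOUN"), ("peels", "VERB"),
]


def train(tagged):
    """Count how often each word appears with each label."""
    counts = defaultdict(Counter)
    for word, label in tagged:
        counts[word][label] += 1
    return counts


def tag(words, counts, default="NOUN"):
    """Give each word its most frequent label; unseen words get `default`."""
    result = []
    for word in words:
        if word in counts:
            result.append((word, counts[word].most_common(1)[0][0]))
        else:
            result.append((word, default))
    return result


COUNTS = train(TAGGED_CORPUS)
print(tag(["the", "bark", "peels"], COUNTS))
# Relative frequency of each label for the ambiguous word "bark".
total = sum(COUNTS["bark"].values())
print({label: n / total for label, n in COUNTS["bark"].items()})
```

The second print shows why such knowledge is a matter of statistical strength rather than certainty: "bark" is labelled NOUN only because that label is more frequent in this corpus.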
The knowledge a corpus provides is expressed as statistical strength rather than as certainty, and as corpora grow in scale the aim is to cover the full range of language phenomena. However, using statistical strength to judge even the basic deterministic rules of a language runs contrary to common sense. These shortcomings of "empiricist" research have to be made up for by the "rationalist" approach, and the fusion of the two kinds of methods is the current trend in natural language processing.
The development of natural language understanding systems can be divided into two phases: first-generation systems and second-generation systems. First-generation systems are built on the analysis of parts of speech and word order, and often use statistical methods. Second-generation systems begin to introduce semantic and even pragmatic and contextual factors, and almost completely abandon statistical techniques.
First-generation natural language understanding systems can be divided into four types:
(1) Special-format systems. Most early natural language understanding systems were special-format systems, which conduct human-machine dialogue in a special format suited to the content of the dialogue. In 1963, R. Lindsay designed the SAD-SAM system at the Carnegie Institute of Technology in the United States using the IPL-V list-processing language; it used a special format to conduct human-machine dialogue about kinship. The system built a database of kinship relations, accepted English questions about kinship, and answered in English. In 1968, D. Bobrow designed the STUDENT system at MIT. It reduced the English sentences of high-school algebra word problems to a few basic patterns, which allowed it to understand the English in these problems, set up the equations, solve them, and give the answer. In the early 1960s, B. Green built the BASEBALL system at Lincoln Laboratory in the United States, also using the IPL-V list-processing language. Its database stored the records of the 1959 American League baseball games, and it could answer some questions about those games. The system's syntactic analysis was weak and the input sentences had to be very simple, with no conjunctions and no comparative adjectives or adverbs. It relied mainly on a machine dictionary to recognize words, used 14 word categories, and expressed all questions in a special canonical form.
(2) Text-based systems. Some researchers were dissatisfied with the format restrictions of special-format systems, because within a specialized domain it is most convenient to use a system that can conduct human-machine dialogue without being bound by a special format. This led to text-based systems. The Protosynthex-I system, built in 1966 by R. F. Simmons, J. F. Burger, and R. E. Long, works by storing and retrieving text information.
(3) Limited logic systems. Limited logic systems further improve on text-based systems. In such a system, natural language sentences are replaced with a more formal notation, and these notations form a self-contained limited logic system in which some inference can be performed. In 1968, Raphael built the SIR system at MIT in the LISP language. It proposed 24 matching patterns for English, and input English sentences were matched against these patterns to identify their structure. While storing knowledge from the input in its database, it could handle some concepts commonly used in everyday conversation, such as set relations and spatial relations, and could make simple logical inferences. The machine could also learn, remember what it had learned, and engage in some rudimentary intelligent activity. In 1965, J. Slagle built the DEDUCOM system, which could perform deductive inference in information retrieval. In 1966, F. B. Thompson built the DEACON system, which managed a fictional military database through English and was designed around ring structures and approximate-English concepts. In 1968, C. Kellogg built the CONVERSE system on the IBM 360/67 computer; it could reason from a file of 1000 facts about 120 cities in the United States.
(4) General deductive systems. General deductive systems use standard mathematical notation (such as predicate calculus notation) to express information. This lets them draw on all the achievements of logicians in theorem proving to build an effective deductive system, so that any question can be expressed as a theorem-proving problem, the required information actually deduced, and the answer given in natural language. A general deductive system can express complex information that is hard to express in a limited logic system, which further improves the capability of the natural language understanding system.
In 1968-1969, Green and Raphael built the QA2 and QA3 systems, which used predicate calculus and formatted data to perform deduction and answer questions, responding in English. They are typical representatives of general deductive systems.
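The pattern-matching-plus-inference approach of a limited logic system can be sketched in a few lines of Python. The two sentence patterns, the FactBase class, and the canned replies below are hypothetical; they are not SIR's actual 24 patterns or its data structures, only an illustration of matching input against fixed patterns, storing a set relation, and answering by simple transitive inference.

```python
import re

# Two hypothetical sentence patterns (not SIR's real ones).
PATTERNS = [
    (re.compile(r"^every (\w+) is an? (\w+)$"), "tell_subset"),
    (re.compile(r"^is every (\w+) an? (\w+)\?$"), "ask_subset"),
]


class FactBase:
    """Stores 'every X is a Y' facts as links from a set to its supersets."""

    def __init__(self):
        self.supersets = {}

    def tell(self, sub, sup):
        self.supersets.setdefault(sub, set()).add(sup)

    def is_subset(self, sub, sup):
        """Follow superset links transitively (a simple logical inference)."""
        seen, stack = set(), [sub]
        while stack:
            current = stack.pop()
            if current == sup:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.supersets.get(current, ()))
        return False


def respond(kb, sentence):
    """Match the sentence against the patterns and act on the first hit."""
    text = sentence.strip().lower()
    if text.endswith("."):
        text = text[:-1]
    for pattern, action in PATTERNS:
        match = pattern.match(text)
        if not match:
            continue
        sub, sup = match.groups()
        if action == "tell_subset":
            kb.tell(sub, sup)
            return "I understand."
        return "Yes." if kb.is_subset(sub, sup) else "Insufficient information."
    return "I do not recognize that sentence form."


kb = FactBase()
for line in ["Every boy is a person.", "Every person is an animal.",
             "Is every boy an animal?"]:
    print(line, "->", respond(kb, line))
```

The answer to the last question is never stated directly; it follows from chaining the two stored facts, which is the kind of simple reasoning the paragraph on limited logic systems describes.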
Since 1970, a number of second-generation natural language understanding systems have emerged. Most of them are procedural deductive systems that make extensive use of semantic and contextual analysis. The best known are the LUNAR, SHRDLU, MARGIE, SAM, and PAM systems.
The LUNAR system is a natural language information retrieval system designed by Woods in 1972. It uses a formal query language to represent the semantics of a question: the question sentence is given a semantic interpretation, and the resulting formal query is then executed against the database to produce the answer.
The SHRDLU system, built by T. Winograd at MIT in 1972, lets a user direct a robot's actions with natural language commands. The system combines syntactic analysis, semantic analysis, and logical reasoning, which greatly strengthens its language analysis capability. The object of the dialogue is a toy robot with a simple "hand" and "eye" that operates on toy blocks of different colors, sizes, and shapes placed on a table, such as cubes, pyramids, and boxes. On the operator's command the robot can pick up these blocks and move them to build new block structures. During the human-machine dialogue, the operator gets visual feedback on the commands sent to the robot and can observe in real time how the robot understands the language and carries out the commands. A television screen also displays a simulated image of the robot, together with the vivid scene of its conversing in English with a real person at a teletype terminal.
The MARGIE system was developed by R. Schank in 1975 at the Stanford University artificial intelligence laboratory in the United States. Its purpose is to provide a model of natural language understanding. The system first converts an English sentence into a conceptual dependency representation, then reasons from the relevant information in the system, deriving a large number of facts from that representation. Because what people understand from a sentence always goes far beyond its surface expression, the system performs 16 types of inference, such as causes, effects, descriptions, and functions. Finally, it converts the results of the inference into English output.
The SAM system was built at Yale University in 1975. It uses "scripts" to understand stories written in natural language. A script is a standardized sequence of events describing a common human activity (such as seeing a doctor).
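A script can be pictured as an ordered list of events against which a story is matched, so that events the story leaves unstated can be filled in. The sketch below is a deliberately simplified illustration in Python; the event names and the fill_in function are hypothetical and do not reproduce SAM's actual representation.

```python
# Hypothetical "see a doctor" script: a standardized sequence of events.
DOCTOR_SCRIPT = ["enter clinic", "register", "wait", "see doctor", "pay", "leave"]


def fill_in(script, mentioned):
    """Return the script events up to the last one the story mentions.

    Each event is marked "stated" if the story mentions it, or "inferred"
    if it is supplied by the script alone.
    """
    positions = [script.index(event) for event in mentioned if event in script]
    if not positions:
        return []
    last = max(positions)
    stated = set(mentioned)
    return [(event, "stated" if event in stated else "inferred")
            for event in script[:last + 1]]


story = ["enter clinic", "see doctor"]
for event, status in fill_in(DOCTOR_SCRIPT, story):
    print(f"{event}: {status}")
```

Here the story only says that someone entered the clinic and saw the doctor; registering and waiting are supplied by the script, which is what allows a script-based system to answer questions about events the text never mentions.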
The PAM system was built by R. Wilensky at Yale University in 1978. PAM can likewise interpret stories, answer questions, make inferences, and produce summaries. In addition to the event sequences of "scripts", it introduces "plans" as a basis for understanding stories. A "plan" is the means the characters in a story adopt to achieve their goals. To understand a story through plans, the system must identify a character's goal and the actions taken to accomplish it. The system provides a "plan box" that stores information about various goals and various means. When understanding a story, the system only needs to match the relevant parts of the story against the information stored in the plan box to understand what goal the story is about. When matching a story's plot against scripts runs into trouble, the plan box can supply information about the general goal, so that understanding of the story does not fail. For example, for rescuing a kidnapped person, the plan box holds the sub-goals under the general goal "rescue", including reaching the kidnappers' hideout and the various ways of killing the kidnappers, so the next action can be anticipated. The system can also infer goals from themes. For example, given the story "John loves Mary. Mary was kidnapped by mobsters.", PAM expects John to take action to rescue Mary. Although the story contains no such statement, the "love theme" in the plan box lets the system infer that "John will take action to rescue Mary".
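The John and Mary example can be sketched as a lookup over a small plan box. Everything in the Python fragment below (the triple format for story facts, the THEMES and PLAN_BOX tables, and the infer function) is a hypothetical simplification for illustration and is not PAM's actual design.

```python
# Hypothetical plan box: each goal lists sub-goals or means of achieving it.
PLAN_BOX = {
    "rescue": [
        "reach the kidnappers' hideout",
        "overcome the kidnappers",
        "free the victim",
    ],
}

# Hypothetical themes: (relation, event befalling the other person) -> goal.
THEMES = {
    ("loves", "kidnapped"): "rescue",
}


def infer(facts):
    """Infer goals from themes.

    `facts` is a list of (subject, relation_or_event, object) triples.
    Returns (actor, goal, beneficiary, expected_steps) tuples.
    """
    inferences = []
    for actor, relation, other in facts:
        if relation != "loves":
            continue
        for subject, event, _ in facts:
            goal = THEMES.get((relation, event))
            if subject == other and goal:
                inferences.append((actor, goal, other, PLAN_BOX[goal]))
    return inferences


story = [("John", "loves", "Mary"), ("Mary", "kidnapped", None)]
for actor, goal, other, steps in infer(story):
    print(f"{actor} is expected to {goal} {other}; likely next steps: {steps}")
```

The story never says that John acts; the expectation comes entirely from combining the love theme with the kidnapping event, and the stored sub-goals indicate which actions to anticipate next.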
The systems above are all written natural language understanding systems, whose input and output are written text. Spoken natural language understanding systems additionally involve complex technologies such as speech recognition and speech synthesis, and are clearly a harder problem; research on spoken natural language understanding has nevertheless made progress in recent years.
Research on natural language understanding in China started late, 17 years behind other countries. Early natural language understanding systems had been built abroad by 1963, while in China it was not until 1980 that two Chinese natural language understanding models were built, both implemented as human-machine dialogue.
In the mid-1980s, under the influence of fierce international competition over new-generation computers, research on natural language understanding received more attention in China. "Natural language understanding and human-machine interfaces" was included in the development plan for new-generation computers, the number of institutions working on the topic increased, and research teams grew.
About HNC theory
HNC is short for "Hierarchical Network of Concepts", a theoretical system for natural language understanding. It rests on a conceptualized, hierarchical, and networked representation of semantics, hence the name. HNC theory divides the cognitive structure of the human brain into two types of associative context, local and global, and holds that representing these associative contexts is the fundamental problem of the deep level of language (that is, the semantic level).
The central goal of HNC theory is to establish representation and processing models for natural language so that a computer can simulate the human brain's language perception. The theory has brought breakthrough progress in natural language understanding. It has important theoretical and practical value for artificial intelligence, linguistics, computer science, and cognitive science, and is of particular practical significance for Chinese information processing and Chinese language research.
HNC theory broke completely free of the constraints of existing grammar studies in China and, starting from the deep level of language and taking semantic representation as its basis, opened up a new path for Chinese language understanding. It proposes a complete, engineering-oriented theoretical framework for natural language understanding. It is a powerful and complete semantic description system that addresses the whole of natural language understanding, including sentence processing, sentence-group processing, discourse processing, extension from short-term memory to long-term memory, and automatic learning from text.
The starting point of HNC theory is to use the two types of associative context to "help" the computer understand natural language. The vocabulary of a natural language is used to express concepts, so the word-level local associative context that HNC establishes takes the form of a concept representation system. Concepts are divided into abstract concepts and concrete concepts. HNC's concept representation system focuses on abstract concepts and uses an approximate representation for concrete concepts. HNC theory holds that a concept should be described in two respects: its diverse manifestations and its intension. It created a five-tuple of symbols to express the diverse manifestations of abstract concepts, and network hierarchy symbols to express their intension. The network hierarchy symbols cover three major semantic networks: the primitive-concept semantic network, the basic-concept semantic network, and the logical-concept semantic network. Together, the five-tuple symbols and the hierarchy symbols of the three semantic networks can fully express abstract concepts, giving the computer a powerful means of understanding the semantics of natural language.
Natural language understanding technology can be roughly divided into machine translation, semantic understanding, and human-machine dialogue. Machine translation (MT) is the process of converting text in one natural language into another natural language. An intelligent search engine that uses it lets users search non-native-language web pages in their native language and browse the results in their native language. Semantic understanding combines linguistics and computer technology to understand words at the semantic level. Human-machine dialogue technology can provide the next generation of human-computer interface, realizing the transition from the text interface and the graphical interface to the natural language interface, and has broad application prospects in the user-friendly design of household appliances. Its technical core consists mainly of two parts: speech recognition and speech synthesis.
Throughout the semantic understanding process, intelligent word segmentation is the first step: it extracts the core components of a sentence for use by the semantic analysis module. During segmentation, supplying the analysis program with sufficient words while filtering out redundant information is an important prerequisite for the quality and speed of the later semantic analysis. Eureka's intelligent word segmentation avoids the ambiguous combinations that can arise when a sentence is split, and so provides good raw material for semantic understanding. At the same time, during segmentation, synonyms in the knowledge base are matched one by one and passed along to the semantic understanding module, so that for each sentence the module receives not only the original sentence but also the concepts attached to it.
Part 2: Search technology
As is well known, with the rapid development and widespread adoption of the Internet, the amount of information online has exploded. How to obtain valuable information from the vast Internet has become a headache for users. Search technology, which collects, discovers, organizes, and processes information on the Internet and provides retrieval services so that users can quickly find what they need, has been a boon to them.
However, a 2001 Roper Starch survey found that 36% of Internet users spend more than 2 hours a week searching online; 71% run into trouble when using search engines; on average, users become frustrated after 12 minutes of searching; 46% of failed searches are due to broken links; and the large majority (86%) feel that more efficient and accurate search technology is needed. Another survey, by Keen, found that people have on average four questions a day for which they need answers from outside; 31% of them use search engines to find the answers; they spend an average of 8.75 hours a week looking for answers; 53.3% of that time goes to getting answers from other people and 29% to asking relatives and friends (a further share of 24.3% is also reported); more than half of them look for answers online; and they would spend $14.5 a week to get correct information.
It is not hard to see from these survey data that, although search service providers have spent a great deal of time and effort on developing search technology, many limitations remain, such as missing information, too many results, and results that have nothing to do with the query. Users therefore remain dissatisfied with existing search technology and look forward to something better.
The three capabilities of natural language understanding (machine translation, semantic understanding, and human-machine dialogue) can make search technology more user-friendly and easier to use, so in recent years they have been widely applied in search. Traces of semantic understanding and machine translation can be found in search engines both in China and abroad.
The natural language understanding technologies currently applied in search engines are machine translation and semantic understanding. A search engine that applies these technologies is called an intelligent search engine. Because it raises information retrieval from the current keyword level to the knowledge (or concept) level, and has some ability to understand and process knowledge, it is intelligent and user-friendly. It lets users search for information in natural language and gives them more convenient and more precise search services.
Compared with traditional directory and keyword query modes, natural language queries have two advantages: first, the interaction is more user-friendly; second, queries become more convenient, faster, and more accurate. More and more search engines now claim to support natural language search, for example Google and AskJeeves abroad, and NetEase, Eureka, Weniwen, 21ilink, Sun Wukong, and Youyou in China. Here we focus on Chinese intelligent search engines that apply semantic understanding technology.
First, we use the Eureka search engine to briefly describe how such an intelligent search works.
Intelligent search is achieved mainly in three parts: semantic understanding, knowledge management, and knowledge retrieval. The knowledge base is the foundation and core of intelligent search: it supports semantic understanding and holds the content that is ultimately provided to the user. Just as with the Internet, the structure and volume of human knowledge expand quickly, so the knowledge base also needs to be highly adaptable. Throughout the semantic understanding process, intelligent word segmentation is the first step: it extracts the core components of a sentence for use by the semantic analysis module. During segmentation, supplying the analysis program with sufficient words while filtering out redundant information is an important prerequisite for the quality and speed of the later semantic analysis. The intelligent word segmentation used in knowledge base processing avoids the ambiguous combinations that can arise when a sentence is split, and so provides good raw material for semantic understanding. Knowledge retrieval takes the results of semantic analysis, searches the knowledge base at the concept level, and returns the most accurate and most relevant results to the user. For example, take the query "I want to find a job in Beijing". Semantic understanding is performed first: in the knowledge base, "finding a job" belongs to the recruitment category, so the analysis concludes that the user wants to query "job seeking in Beijing". The concept "job seeking in Beijing" is then used to query the knowledge base and give the answer.
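The three steps in the Beijing example (segment the query, map the words to concepts, query the knowledge base by concept) can be sketched as follows. Forward maximum matching is used here only because it is a common dictionary-based segmentation technique; it is an assumption, not a description of Eureka's actual algorithm, and the dictionary, concept labels, and knowledge base entries are all hypothetical.

```python
# Hypothetical dictionary: word -> (concept kind, concept value), or None
# for words that carry no concept and are filtered out as redundant.
DICTIONARY = {
    "我": None, "想": None, "在": None, "找": None,
    "北京": ("place", "Beijing"),
    "上海": ("place", "Shanghai"),
    "工作": ("category", "recruitment"),
    "找工作": ("category", "recruitment"),
}

# Hypothetical knowledge base indexed by (category, place) concepts.
KNOWLEDGE_BASE = {
    ("recruitment", "Beijing"): ["Beijing job listings"],
    ("recruitment", "Shanghai"): ["Shanghai job listings"],
}


def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each
    position, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def understand(words, dictionary):
    """Keep only the words that map to a concept in the dictionary."""
    concepts = {}
    for word in words:
        entry = dictionary.get(word)
        if entry is not None:
            kind, value = entry
            concepts[kind] = value
    return concepts


def search(query):
    """Segment, extract concepts, then look up the knowledge base."""
    concepts = understand(segment(query, DICTIONARY), DICTIONARY)
    key = (concepts.get("category"), concepts.get("place"))
    return KNOWLEDGE_BASE.get(key, [])


query = "我想在北京找工作"  # "I want to find a job in Beijing"
print(segment(query, DICTIONARY))
print(search(query))
```

Preferring the longest match keeps "找工作" together as one unit instead of splitting it into "找" and "工作", which is one simple way of avoiding an ambiguous split.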
Next, let us look at the features of various search technologies that claim to apply natural language processing.
Weniwen (www.weniwen.com) natural language search
Weniwen is an intelligent search engine developed by Weniwen Technologies, Inc. It lets users issue queries in Chinese or English as natural, complete sentences. It can be extended quickly and relatively economically to other languages, and can also recognize words in both languages. It uses natural language processing (NLP) technology: compared with traditional keyword-matching search, NLP can "understand" the context and meaning of a request. By using NLP, Weniwen can retrieve appropriate information more accurately. It is aimed especially at the travel and leisure, financial, and consumer industries, which are seeking to improve consumers' access to information or automated transactions over the Internet.
21ilink (www.21ilink.com) Chinese intelligent search engine
The 21ilink search engine is based on natural language processing and artificial neural networks. It addresses some of the cross-category and borderline-category queries that are difficult under traditional information classification, and implements fuzzy queries. The query process emphasizes personalization and user-friendliness. Queries are made in units of natural language sentences, and local language characteristics can also be configured. The interface is friendlier and can meet the particular needs of customers at different levels and with different query purposes, guiding them to find the information they need quickly and accurately. This intelligent search technology uses domestically developed Chinese question-answering technology and is compatible with other traditional search engines. It supports concept-based information search, intelligent search by industry and profession, and customer-customized business models. It can also handle supply-chain functions such as management, tracking, and payment, as well as consulting projects such as industry research, achieving a seamless combination of multi-party collaboration and business development.
This intelligent search system is an integrated, comprehensive, high-quality service system that combines modern intelligent computing, switching, network, and database technologies. It features its own Smart Hit (intelligent semantics) and kengine (knowledge engine) components. In openness and technical sophistication it is well ahead of similar systems. The system offers complete intelligent network access services and supports voice, text, data, and images across network platforms, making it an intelligent multimedia platform. The platform runs without interruption and supports broadband networks to meet the database's requirement for full content coverage. It supports concept retrieval and dynamic page retrieval.
Sun Wukong (Search.chinaren.com) search engine
The Sun Wukong search engine is a product developed by ChinaRen, which owns its copyright; it can search sites in mainland China, Hong Kong, Macao, and Taiwan. Besides traditional keyword search, Sun Wukong was the first among Chinese search engines to offer question-based search, an intelligent Chinese processing technology researched and implemented by ChinaRen. With this technology, users can search directly by stating what they want to find, which not only matches everyday habits but is also more accurate. The Sun Wukong search engine has powerful search capabilities that improve the intelligence and accuracy of search, and an intelligent evaluation system that ensures the results are highly relevant.
Youyou (Www.goyoyo.com.cn) Chinese intelligent search engine
Beijing Youyou Technology Development Co., Ltd., established in October 1998, runs an Internet information consulting and technical service website (www.goyoyo.com) built on Chinese natural language processing technology. The Youyou Chinese intelligent search engine provides its main services through www.goyoyo.com. To stay close to people's language habits, and drawing on advanced natural language processing technology, the Youyou engine takes full account of the structure of Chinese sentences and their rich variety of forms. Following the principle of "ask in spoken language, get intelligent results", it lets users query with keywords, natural sentences, or even mixed Chinese and English sentences, phrased as they would speak. The user chooses whether to search websites or web pages and clicks "Search"; the engine then automatically analyzes the query sentence, extracts its topic, and finds satisfactory answers, meeting users' varied query needs so that they can browse the web at leisure.
The following examples show the advantages of intelligent search engines over traditional search engines.
1. Greater ease of use. Because an intelligent search engine has an intelligent word segmentation function, queries are simpler and easier to enter. Take NetEase as an example: to search for "Liu Dehua's latest personal album", you only need to type the whole phrase into the search box to find the relevant content. In a traditional search engine, you must follow the engine's basic query syntax rules when entering "Liu Dehua's latest personal album" in order to find it. The intelligent search engine clearly has a significant advantage in ease of use.
2. More precise result range. Because knowledge (concept) retrieval techniques are used, the search scope is made clear and narrowed, and useless information is reduced. Take Eureka as an example: to find the weather in Beijing you only need to enter "Beijing weather", and the engine returns the Beijing weather forecast and related weather content. A traditional search engine returns not only Beijing weather content but also all kinds of content that merely contains the words "Beijing" and "weather", which makes the desired result harder to find.
3. Smarter search results. Because an intelligent search engine has a comprehensive knowledge base, its information retrieval and navigation services are more intelligent. The knowledge in the knowledge base helps solve the problem of expression differences, that is, different users using different words for the same concept. The synonym definitions in the knowledge base remove the difficulties these differences cause.
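How synonym definitions remove expression differences can be illustrated with a small comparison between plain keyword matching and matching on canonical concepts. The synonym table, the documents, and the function names below are hypothetical and serve only as a sketch of the idea.

```python
# Hypothetical synonym definitions: each word maps to a canonical concept.
CANONICAL = {
    "movie": "film", "film": "film", "flick": "film",
    "buy": "purchase", "purchase": "purchase",
    "tickets": "ticket", "ticket": "ticket",
}

# Hypothetical document collection.
DOCUMENTS = {
    "d1": "buy movie tickets",
    "d2": "purchase film ticket",
    "d3": "film review",
}


def keywords(text):
    """Plain keyword view: the set of literal words."""
    return set(text.lower().split())


def concepts(text):
    """Concept view: each word replaced by its canonical concept."""
    return {CANONICAL.get(word, word) for word in text.lower().split()}


def keyword_search(query):
    """Documents containing every literal query word."""
    q = keywords(query)
    return sorted(doc for doc, text in DOCUMENTS.items() if q <= keywords(text))


def concept_search(query):
    """Documents covering every query concept, whatever words they use."""
    q = concepts(query)
    return sorted(doc for doc, text in DOCUMENTS.items() if q <= concepts(text))


print(keyword_search("purchase film ticket"))
print(concept_search("purchase film ticket"))
```

Keyword matching finds only the document that happens to use the same words as the query, while the concept-level search also finds the document that expresses the same request with different words.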