The role of semantics and conceptual systems in NLP *
Huang Zengyang
HNC Laboratory, Institute of Acoustics, Chinese Academy of Sciences, 100080
Abstract: This article first briefly reviews the historical state of research on semantics and conceptual systems, then introduces the semantic and conceptual system of HNC, and finally gives an example of sentence class analysis.
1. Introduction
The theme of this seminar, "Semantics and Conceptual Systems in NLP", is discussed below in three parts. The first part reviews the state of research on semantics and conceptual systems in the 1990s; the second introduces HNC's basic views on semantics and conceptual systems and its basic results; the third offers a brief glimpse of HNC through an example.
2. The historical state of research on semantics and conceptual systems
Four representative discussions are quoted below. Together they give a fairly accurate summary of the state of research on semantics and conceptual systems in the 1990s.
Discussion 1:
Assume that all expressions of a language L make up W = {e1, e2, ..., en, ...}.
How do we determine the referent of each ei in U = {m1, m2, ..., mn, ...}?
How do we determine the relationship between ei and mi,
that is, how do we determine the mapping (E) R (M)
that takes W to U and U to W?
......
However, the forms taken by the members of W are highly varied, and those of U even more so.
Because we do not know which units in U are basic
and do not know which are composite,
we do not know whether U can be enumerated.
We do not even know how to list each member of U,
or in what way U should be represented.
The "set of all expressions of language L" in Discussion 1 evidently refers to the language space in the mind, and "the referent of each ei" to the semantic space in the mind. The discussion states very clearly that these two spaces stand in a mapping relation. But its author holds that the space W (language space) is highly variable and that the space U (semantic space) is even more so. Why? Because of a series of "don't knows": a first (note: this refers to the first "do not know" in the quoted passage, and likewise below), a second, a third, and a fourth, plus one more that is implicit, five in all. The author is evidently pessimistic about the prospects for research on language space W and semantic space U, to say nothing of the mapping between the two.
Discussion 2:
Semantic field analysis and componential analysis have proposed analyses for some meanings but not all.
They can be applied only to a limited semantic space
and are still not adequate for the whole vocabulary.
Discussion 2 does not argue from a commanding height, but it is fairly pertinent and can be said to reflect a consensus among semanticists.
Discussion 3:
Modern grammar research follows the path of deriving language rules from speech:
Speech => Speech (traditional philology)
Speech => Language symbol system (structuralist linguistics)
Speech => Language competence (transformational-generative grammar)
Discussion 3 summarizes the mainstream of 20th-century linguistic research fairly accurately. Modern grammatical research has indeed followed the path from speech to language rules, and this is a basic consensus in the linguistics community. One point should be added, however: its characterization of "traditional philology" is not fair, especially with respect to Chinese exegetics (xungu), and its author seems to know very little about traditional Chinese linguistics. Although this issue is closely related to the present topic, its historical background is very complicated and it is not discussed here.
From Discussion 3 it is clear that the mainstream of 20th-century linguistic research did not involve semantics and conceptual systems. In the 1990s Chinese linguists felt this inadequacy: many scholars put forward the "three planes" of grammar, semantics, and pragmatics, and some put forward the "small triangle" of surface form, inner meaning, and pragmatic value. Both the three-planes view and the small-triangle view are a return to Morris's three dimensions of language, and they mark a major shift and advance in the goals of language research. In actual research practice in China, however, this shift has gone little further than studying grammar, semantics, and pragmatics as three separate dimensions, without breaking out of their original "self-sufficient independent kingdoms".
Discussion 4:
1. Language rules describe normative language,
while actual language material is not all normative.
2. Language rules in fact hold only in a statistical sense.
Almost no language rule applies to 100% of language facts.
3. Language rules often describe only the main aspects of language phenomena
and often fail to cover minor phenomena.
The three points of Discussion 4 enjoy wide currency in the linguistics and computational linguistics communities. Discussion 4 appears almost impeccable, and it arrives at the same effect as Discussion 1 by a different route.
Judging from these four discussions, the prospects for our topic look quite bleak. Discussion 1 amounts to saying that semantics and its conceptual system are basically in a state of not-knowing, and that this state is hard to change. Discussion 2 lends support to Discussion 1 from the side of semantic research itself. Discussion 3 simply places the research goal of linguistics on the form of language, adopting a prudent avoidance strategy toward semantics or treating it as a supplement to syntactic analysis. Discussion 4 clearly advocates a statistical turn; its central argument, that "rules must have exceptions, so their practical value is limited", amounts to saying that "semantics and its conceptual system", as part of the so-called language rules, cannot accomplish anything major.
This was the situation faced at the end of the last century, and the bleak prospect seemed inevitable.
The situation is indeed difficult, but scientific exploration always advances in the face of known difficulty. In fact, over the past few decades people have worked from five directions to change the difficulties described above. The first is the extension of language research to the three planes mentioned above, especially pragmatics. The second is the rise of cognitive linguistics. The third is progress in intensional logic. The fourth is the development of WordNet research. The last is the HNC exploration, which inherits and develops both Chinese exegetics and the line of exploration represented by Mr. Schank.
3. Semantic and conceptual system of HNC
The semantic and conceptual system of HNC is based on the following three points:
First, what is the ontology of language? This question is very large and has no ready-made answer. HNC adopts the hypothesis that the ontology of language is a language concept space in the human brain. The basic constituents of this space are the associative networks of concepts, which are the product of human evolution. This is in fact a famous thesis of Mr. Chomsky, although his name for it is universal grammar (UG). Humans currently have about 6,000 languages, yet they share a common language concept space. This is a hypothesis, but one that must be made; otherwise one slides into the nebulous position of Discussion 1. Hegel said that "the beginning of philosophy is a hypothesis", and HNC accepts this idea.
Second, in studying the language concept space, the first goal is to find the descriptive primitives of this space. If this goal is not reached, the study of language concept space can hardly escape the predicament pointed out in Discussion 1.
Third, the symbol system that expresses these descriptive primitives must be easy to operate on. It must completely replace the symbol system of natural language and must embody the principle of association. The greatest weakness of natural language is that it describes mutually associated concepts with mutually unrelated sounds and symbols. The descriptive symbol system of the language concept space must eliminate this weakness at the root.
HNC designs the descriptive symbol system for the structure of language concept space on the basis of these three points. The description clearly distinguishes three levels. The first is the concept primitive level, which corresponds to the words of natural language; its basic feature is "7-2-1". The second is the sentence class primitive level, which corresponds to the sentences of natural language; its basic feature is "57-3192". The third is the context primitive level, which corresponds to the sentence groups, paragraphs, and texts of natural language; its basic feature is "7-57-7". The meaning of these three sets of numbers is explained below.
"7-2-1" gives the basic types of concept primitives: "7" stands for seven types of abstract concepts, "2" for two types of concrete concepts, and "1" for one type of dual-natured concept that has both abstract and concrete features. Concrete concepts correspond directly to real things; abstract concepts do not. But not every concept is either abstract or concrete: some have both features, as does any concept that describes a physical property.
"7-2-1" expresses both this three-way division of concept primitives and the horizontal structure of each basic type: abstract concepts have seven horizontal divisions, concrete concepts have two, and dual-natured concepts have only one.
The first of the seven types of abstract concepts is named main primitive concepts. It has 6 root nodes, named action, process, transfer, effect, relation, and state, abbreviated as the action-effect chain. The action-effect chain is the core of the overall structure of the language concept space: it is the core not only of the concept primitive level but also of the sentence class primitive level, as discussed below.
The second and third of the seven types are designed specifically to describe human activities, which are the main subject matter of language. Why two basic types? Because human activities differ markedly in how they relate to historical era. Some activities have existed since ancient times and are perennial, although their forms and content change. Others lack this character: some existed only in ancient times and later disappeared entirely, and some arose in or after the industrial age. Perennial human activities are named first-type extended primitive concepts, with 5 root nodes, comprising the first type of spiritual life and the second type of labor. The former includes psychological activities, thinking activities, and philosophical behavior; the latter includes professional activities in a specific sense (such as political, economic, and cultural activities) and pursuit activities (such as reform and inheritance, competition and cooperation). Non-perennial activities are named second-type extended primitive concepts, with 3 root nodes. The first root node is named the first type of labor, roughly corresponding to so-called physical labor; the second is named the second type of spiritual life, roughly corresponding to so-called leisure activities; the third is named the third type of spiritual life, roughly corresponding to so-called belief activities. Of course, the first-type extended primitive concepts also show era-specific particularities, and likewise the second-type extended primitive concepts also show perennial particularities.
These particularities are not hard to mark on specific words. The overall era characteristics of the two types described above are the most basic and most important world knowledge, and they are the foundation of context generation.
The fourth of the seven types is named basic concepts, with 9 root nodes: order and generalized space, time, space, number, quantity and range, quality and class, degree, and the attribute description and evaluation description of judgments. Basic concepts can be regarded as the foundation of all concepts and the basic platform for conceptual operations; they are therefore also perennial topics of philosophy, especially the last item.
The fifth of the seven types comprises language logic concepts and basic logic concepts. Language logic concepts correspond to the so-called function words of linguistics, and function words are an essential tool for conceptual expression. Conceptual expression, which in linguistic terms is communication and includes language generation, requires function words, whereas thinking does not. Language logic concepts have 12 root nodes, which are not enumerated here. Basic logic concepts correspond to "comparison" and "being or not being", each with its own root node. These two basic judgments are basic subjective conditions for the survival of animals (including humans), and "being or not being" is a basic subject of philosophy; hence the name "basic logic".
The sixth of the seven types is named integrated concepts, with 4 root nodes, representing understanding and strategy, methods, conditions, and generalized tools. "Integrated" here refers to the integration of the six kinds of abstract concepts listed above. More precisely, these concepts cannot simply be assigned to any one of those kinds; they are manifestations of the interweaving of those kinds, and the integrated concepts are set up to describe exactly this interweaving.
The last of the seven types is named language habit concepts, with 11 root nodes. Language habits have pronounced particularities of language, era, and region, which may also be called social particularity. That is, language habit concepts depend strongly on social particularity, whereas the preceding six types of abstract concepts are for the most part common to nature and society. Some of them are independent of social particularity, such as main primitive concepts, basic concepts, and basic logic concepts; some depend on it weakly, such as first-type extended primitive concepts, language logic concepts, and integrated concepts; and the second-type extended primitive concepts are closely tied to it.
The 52 root nodes above are all the root nodes of the seven types of abstract concepts. Each root node extends in two directions, vertical and horizontal, and each level of extension represents a definite concept. Each root node is denoted by a letter (representing the concept type; main primitive concepts and the first- and second-type extended primitive concepts carry no type letter) and a digit (identifying the root node). Each level of extension uses only one digit, and extensions in the two directions may interleave. Extension from a root node, however, must be horizontal first and then vertical, and successive extensions interleave the two directions. The resulting digit string is called the high-level representation of the concept. Its extension is closed, and the total number of extension levels is fixed by convention separately for each type of abstract concept. Below the high level, vertical extension gives the middle-layer representation of the concept and horizontal extension the bottom-layer representation; successive bottom-layer extensions interleave, and their range is open. The middle and bottom layers use different digit symbols.
The middle-layer extension describes the local associative network of a concept. The local associative network refers to the duality, contrast, and inclusion relations among concepts: duality roughly corresponds to so-called antonymy, contrast roughly to so-called synonymy, and inclusion to whole-part-individual relations. Dual concepts have very rich connotations and internal structure, which neither antonymy nor Hegel's unity of opposites can fully capture. The bottom-layer extension describes the network character of concepts. It is essentially an abstraction of abstractions, and each digit represents a complex association among a set of concepts.
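As a quick check on the figure of 52 root nodes, the counts stated for each kind of abstract concept can simply be tallied. The sketch below does only that; the English labels used as dictionary keys are informal names chosen for this illustration, not HNC terminology.

```python
# Root-node counts per kind of abstract concept, as stated in the text.
# The labels are informal English names used only in this sketch.
ROOT_NODE_COUNTS = {
    "main primitive concepts": 6,
    "first-type extended primitive concepts": 5,
    "second-type extended primitive concepts": 3,
    "basic concepts": 9,
    "language logic concepts": 12,
    "basic logic concepts": 2,
    "integrated concepts": 4,
    "language habit concepts": 11,
}


def total_root_nodes(counts):
    """Sum the root-node counts over all kinds of abstract concept."""
    return sum(counts.values())


if __name__ == "__main__":
    for label, count in ROOT_NODE_COUNTS.items():
        print(f"{label:<42}{count:>3}")
    print(f"{'total':<42}{total_root_nodes(ROOT_NODE_COUNTS):>3}")
```

The language logic and basic logic concepts are listed separately here because the text gives them separate counts, even though together they form the fifth of the seven types.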
This vertical and horizontal representation is the concrete embodiment of the association principle for concept representation stated above, and its digit-based form makes the associations between concepts very simple to express. For example, the HNC symbol for the concept "festival" is J1099. The letter J denotes basic concepts, J1 denotes time, and J10 denotes the basic characteristics of time; J109 enters the bottom-layer extension and denotes a specific point in time; and the festival, J1099, is a further extension of the specific time point J109. This symbol is in fact a simplification of the bottom-layer symbol definition
J1099 ::= (L91 / WJ10-00 (672; 6804) {A00E2139} (103A8, L14, WJ10-)
and this is a concrete instance of the abstraction of abstractions. The right-hand side consists of 4 items: the first denotes a specific day, the second entertainment or commemoration, the third that the day may be a holiday, and the fourth that it recurs annually. This kind of symbolization captures the basic content of the concept "festival" itself and also reflects its links to related concepts (such as holidays and vacations). The book "HNC (Hierarchical Network of Concepts) Theory" gives only the high-level representation of the 52 root nodes of the seven types of abstract concepts; for the middle and bottom layers it gives only some examples, or even simplified schemes. The complete HNC concept primitive symbol system will be presented to readers in the form of a "manual".
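The layer-by-layer reading of J1099 can be reproduced mechanically: every leading prefix of the symbol names an ancestor concept. The following is a minimal sketch of that reading, assuming one character per extension level; the function names and the small gloss table (which contains only the prefixes explained in the text) are illustrative, not part of HNC.

```python
# Glosses for the prefixes of J1099 that the text explains.
GLOSSES = {
    "J": "basic concept",
    "J1": "time",
    "J10": "basic characteristics of time",
    "J109": "specific point in time",
    "J1099": "festival",
}


def prefix_chain(symbol):
    """Return every leading prefix of the symbol, shortest first."""
    return [symbol[:i] for i in range(1, len(symbol) + 1)]


def explain(symbol, glosses):
    """Pair each known prefix of the symbol with its gloss."""
    return [(p, glosses[p]) for p in prefix_chain(symbol) if p in glosses]


if __name__ == "__main__":
    for prefix, gloss in explain("J1099", GLOSSES):
        print(f"{prefix:<6}{gloss}")
```

Because the ancestors are recoverable from the string itself, no separate table of parent links is needed, which is the practical sense in which the digit string carries the association.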
The following briefly explains the two types of concrete concepts defined by HNC. The first is called basic things, and the second is called attached concrete concepts. Basic things are used to describe the universe and nature. Their type symbol is JW, with 7 root nodes representing heat, light, sound, electromagnetism, microscopic basic things, macroscopic basic things, and life. The second type of concrete concept has two basic type symbols, P and W: P stands for persons, W for things, and PW for human-made things. It has no independent digit extension symbols of its own; that is, it sets up no root nodes of its own and attaches to abstract concepts. The digit string after the type symbol is taken from an abstract concept, whence the name "attached". For example, the symbol A139 denotes envoy missions, and PA139 denotes diplomatic personnel; 1098A9 denotes flowing, and W1098A9 denotes the various "flows", such as W1098A99 for air flow, W1098A9A for water flow, and W1098A9B for mudflows. Attached concrete concepts of this kind are clearly conducive to computing the activation of concept associations, or semantic relatedness.
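Since an attached concrete concept is just a type symbol placed in front of an abstract concept's digit string, the abstract concept it attaches to can be recovered by stripping that type symbol. A minimal sketch, assuming the type symbol is always one of PW, P, or W and appears at the very start of the string (the function name is hypothetical):

```python
# Type symbols of attached concrete concepts, longest first so that
# "PW" is not mistaken for "P" followed by a digit string.
ATTACHED_TYPE_SYMBOLS = ("PW", "P", "W")


def split_attached(symbol):
    """Split a symbol into (type_symbol, abstract_base).

    Returns (None, symbol) when the symbol carries none of the
    attached-concept type symbols.
    """
    for type_symbol in ATTACHED_TYPE_SYMBOLS:
        if symbol.startswith(type_symbol):
            return type_symbol, symbol[len(type_symbol):]
    return None, symbol


if __name__ == "__main__":
    for sym in ("W1098A9", "W1098A9B", "1098A9"):
        print(sym, "->", split_attached(sym))
```

With the base recovered, the thing W1098A9 and the abstract concept 1098A9 can be related by a plain string comparison.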
The concept "flow" is a level-5 extension of the root node "process" (which belongs to the main primitive concepts and is denoted by the digit 1, without a type letter). The first-level extension 10 denotes the basic characteristics and types of processes; by convention the main primitive concepts have only this one level of high-level extension, and 1y (y = 1-4) denote the other characteristics of processes. The symbol 109 has therefore already entered the bottom-layer extension of "process" and denotes motion processes. The third-level extension 1098 denotes the motion of objects or matter, the fourth-level extension 1098A denotes the motion of matter, and the fifth-level extension 1098A9 denotes flow. Incidentally, the symbol for "fluctuation", the counterpart of "flow", is 1098AA, which is one of the senses of the word "fluctuate".
The concept "envoy mission" is a level-3 extension of the root node "professional activities" (which belongs to the second type of labor, carries no type letter, and is written directly with the symbol A). The first-level extension A1 denotes political activities, and the second-level extension A13 denotes diplomatic activities. Under the convention for high-level extension of the second type of labor, the symbol A139 has already entered the bottom layer and denotes envoy missions, one kind of diplomatic activity.
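Because related concepts share leading characters, one crude way to estimate how closely two concepts are associated is to count the extension levels they have in common. The sketch below is offered only as an illustration of why such symbols are convenient for computation; the shared-prefix measure is an assumption of this example, not HNC's actual relatedness calculation.

```python
def shared_depth(a, b):
    """Count the leading characters that two concept symbols share."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    pairs = [
        ("1098A9", "1098AA"),  # flow vs. fluctuation
        ("1098A9", "1098"),    # flow vs. motion of objects or matter
        ("1098A9", "A139"),    # flow vs. envoy mission
    ]
    for a, b in pairs:
        print(a, b, shared_depth(a, b))
```

"Flow" and "fluctuation" share the five levels up to 1098A and differ only in the last digit, while "flow" and "envoy mission" share nothing, which matches the intuition behind the symbols.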
Bottom-layer extension symbols have three basic types, named T extension, i extension, and /k extension. The digit domain of T extension is 9-B or 8-B, that of i extension is 3 or 7, and that of /k extension is /1-/b. These three bottom-layer extensions represent three different kinds of extension structure. T extension and /k extension are both group extensions, the former for small groups and the latter for large groups; i extension is a single-item extension. The two examples above are T extensions. T extension further comprises three types, α, β, and γ, which are not detailed here.
The middle-layer dual symbols are m, n, ekm, and ekn. The first two denote dual concepts that have the character of a unity of opposites, called Hegelian duals for short. The latter two denote non-Hegelian duals: dual concepts that do not have the character of a unity of opposites, or that involve more than a two-way opposition. For example, "beginning, end, continuation, and transition" of a "process" is a triple dual concept. "Beginning" and "end" are opposed, and between them there must be "continuation", but "continuation" is not the unity of the opposites "beginning" and "end". The unity of the two is "transition", which means that an old process ends and a new process begins. The digit domains of the Hegelian dual symbols m and n are 0-2 and 4-6 respectively; 1 and 2, and likewise 5 and 6, are opposed (antonymous), and 0 and 4 denote the unity of the respective opposites.
A non-Hegelian dual is written with 3 digits. The first digit, e (14), is the non-Hegelian dual marker. The second digit, k, indicates the specific type of non-Hegelian dual, with digit domain 0-b. The range of the third digit, m or n, depends on k; by convention the domain of m is 0-3 and that of n is 4-7. The non-Hegelian dual group "beginning, end, continuation, transition" is written 11ebm, where 11eb1 denotes "beginning", 11eb2 "end", 11eb3 "continuation", and 11eb0 "transition"; the four form a local associative network. Non-Hegelian dual concepts are more numerous than Hegelian ones in the language space, or language concept space.
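The three-digit layout of a non-Hegelian dual (marker e, type digit k, then m or n) is regular enough to be checked and split by a few lines of code. The following is a minimal sketch under the digit domains stated above (k in 0-b, m in 0-3, n in 4-7); the function name and the returned tuple layout are choices made for this illustration.

```python
K_DOMAIN = "0123456789ab"  # digit domain of the type digit k
M_DOMAIN = "0123"          # conventional domain of m
N_DOMAIN = "4567"          # conventional domain of n


def parse_non_hegelian(symbol):
    """Split a symbol ending in e-k-m/n into (stem, k, slot, value).

    slot is "m" or "n" according to which domain the last digit falls in.
    Raises ValueError if the last three characters do not fit the layout.
    """
    s = symbol.lower()
    if len(s) < 4 or s[-3] != "e":
        raise ValueError(f"no non-Hegelian dual marker in {symbol!r}")
    stem, k, last = s[:-3], s[-2], s[-1]
    if k not in K_DOMAIN:
        raise ValueError(f"type digit {k!r} outside 0-b")
    if last in M_DOMAIN:
        slot = "m"
    elif last in N_DOMAIN:
        slot = "n"
    else:
        raise ValueError(f"final digit {last!r} outside 0-7")
    return stem, k, slot, int(last)


# Members of the group 11ebm, as glossed in the text.
PROCESS_PHASES = {1: "beginning", 2: "end", 3: "continuation", 0: "transition"}

if __name__ == "__main__":
    for sym in ("11eb1", "11eb2", "11eb3", "11eb0"):
        stem, k, slot, value = parse_non_hegelian(sym)
        print(sym, stem, k, slot, PROCESS_PHASES[value])
```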
For the details of "7-2-1", see the summary table in [1]. The point to add here is that the arrival of "7-2-1" negates Discussion 1 and Discussion 2 above: the footing of both discussions has changed completely, and the five "don't knows" pointed out earlier are now all known.
We now turn to "57-3192": 57 is the total number of basic sentence classes, and 3192 is the total number of mixed sentence classes. A sentence class is a semantic-pragmatic type representation of a sentence. Three fundamental questions arise here. First, on what principles is the semantic type of a sentence determined? Second, how should sentence class representations be constructed so that computers can operate on them? Third, how are the language-specific characteristics of sentences to be reflected?
The semantic type of a sentence class is essentially the concept type of certain abstract concepts. Obviously not every type of abstract concept is qualified to determine the semantic type of a sentence class. Language logic concepts and language habit concepts are not, because they are merely tools of linguistic expression. Basic concepts and integrated concepts are not either, because both are basic conditions for conceptual operations rather than conceptual operations themselves. The abstract concepts qualified to determine sentence class semantic types are therefore only the main primitive concepts, the two types of extended primitive concepts, and the basic logic concepts; this was the original intent of the overall design of abstract concepts. These 4 types of abstract concepts have 6 + 5 + 3 + 2 = 16 root nodes in total, and the high-level concepts of these 16 root nodes divide into two basic kinds: those that form basic sentence classes and those that form mixed sentence classes. The former comprise the six root nodes of the main primitive concepts, the thinking activities among the first-type extended primitive concepts, and the two root nodes of the basic logic concepts; all the rest constitute mixed sentence classes. As for bottom-layer concepts, most of them also constitute mixed sentence classes.
Thinking activities and basic logic concepts are merged into judgment. The basic sentence classes therefore also fall into 7 major classes, the 7 being "the action-effect chain plus judgment", also called the generalized action-effect chain. A basic sentence class describes one link of the generalized action-effect chain, whereas a mixed sentence class describes two links of it.
The high-level concept nodes of the action-effect chain are designed first according to the associative network of the root-node concept, while also taking into account differences in how sentence semantic types are expressed. For example, the associative network of the root node "action" is "action - undergoing - reaction", and two special types must also be considered, one called "exemption" and the other "constraint". Exemption is an action applied to another action, and constraint is an action that makes the object "not do something or not change", the opposite of ordinary action, which makes the object "do something or undergo some change". Thus "action - undergoing - reaction - exemption - constraint" constitute the five high-level concepts of this root node, each corresponding to a basic sentence class. They are called the basic action sentence, the undergoing sentence, the reaction sentence, the exemption sentence, and the constraint sentence. The symbols of these five high-level action concepts are 00, 01, 02, 03, and 04, and the sentence class representations are:
XJ = A + X + B
X10J = X1B + X10 + XBC
X20J = X2B + X20 + XBC
X31J = X3A + X31 + XABC
X4J = A + X4 + X4B
These five representations are five of the 57 groups of basic sentence class representations. Two points deserve emphasis here: first, a sentence class representation embodies the associative network of the corresponding concept node; second, each sentence class has its own particular knowledge, called sentence class knowledge.
A sentence class representation (also called a sentence class code) consists of several units joined by "+" signs, and each unit is called a semantic block. Semantic blocks are functions of the sentence class: the semantic role of each block is determined by the sentence class, that is, by the concept that gives rise to the sentence class and by its associative network. For example, the reaction sentence X20J rests on the following idea: a reaction is necessarily a reaction to some stimulus, and a stimulus necessarily includes the stimulating entity and its manifestation. The accurate description of a reaction sentence is therefore: the reactor X2B makes some reaction X20 to the stimulus (the reaction trigger) and its manifestation XBC. Here the reactor is described by the block symbol X2B, the reaction by X20, and the reaction trigger and its manifestation by XBC. The second block in each of the 5 sentence classes above is called the feature semantic block EK, and the others are called generalized object semantic blocks JK. The sentence format refers to the ordering of JK and EK, and the ordering above is the basic format. For sentence classes that must have an EK (5 of the 57 groups of basic sentence classes have no EK), HNC follows SVO languages and places EK in the second position of the basic format. The ordering of the JKs is fixed in the description of the sentence class: the first (the subject) is numbered JK1 and precedes EK, and the subsequent ones are numbered JK2, JK3, and so on, and follow EK. A format that adjusts the order of the main blocks (the JKm numbering) and adds boundary markers between the JKm (these markers belong to the language logic concepts) is called a normative format. The normative formats of Chinese are especially well developed, which is a special asset for Chinese understanding.
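The basic format just described (units joined by "+", EK in second position, JK1 before it and JK2, JK3, ... after it) can be applied mechanically to a sentence class code. The sketch below labels the blocks of the five action-type representations under two assumptions: the units are separated by "+" signs, and the sentence class has an EK. The function name is hypothetical, and sentence classes without an EK are not handled.

```python
# The five action-type sentence class representations from the text.
ACTION_SENTENCE_CLASSES = {
    "XJ": "A + X + B",
    "X10J": "X1B + X10 + XBC",
    "X20J": "X2B + X20 + XBC",
    "X31J": "X3A + X31 + XABC",
    "X4J": "A + X4 + X4B",
}


def label_blocks(representation):
    """Label the blocks of a basic-format sentence class code.

    The second block is labelled EK; the remaining blocks are numbered
    JK1, JK2, ... in order of appearance.
    """
    blocks = [b.strip() for b in representation.split("+")]
    if len(blocks) < 2:
        raise ValueError("a sentence class with an EK needs at least 2 blocks")
    labelled = []
    jk_index = 0
    for position, block in enumerate(blocks):
        if position == 1:
            labelled.append(("EK", block))
        else:
            jk_index += 1
            labelled.append((f"JK{jk_index}", block))
    return labelled


if __name__ == "__main__":
    for code, representation in ACTION_SENTENCE_CLASSES.items():
        print(code, label_blocks(representation))
```

For the reaction sentence X20J this yields the reactor X2B as JK1, the reaction X20 as EK, and the trigger and its manifestation XBC as JK2.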
The sentence class representation X20J of the reaction sentence describes the associative network of the concept "reaction 02". The basic sentence class knowledge bound to this sentence class is as follows: the reactor X2B must correspond to a concrete concept of a living being, and the reaction trigger and its manifestation XBC necessarily correspond to a concrete concept XBCB (the reaction trigger) and an abstract concept XBCC (the trigger's manifestation). An actual reaction sentence may omit XBCB or XBCC, and XBC may even be omitted from the discourse entirely, but according to sentence class knowledge it cannot fail to exist, and the omitted part can be recovered from the context.
The 57 groups of basic sentence class representations contain about 200 kinds of semantic blocks in total, and the 3192 groups of mixed sentence classes about 10,000, but the semantic block primitives number only 10. Seven of them, action X, process P, transfer T, effect Y, relation R, state S, and judgment D, are collectively called the generalized action-effect primitives E. The other 3, actor A, object B, and content C, are collectively called the generalized object primitives. A feature semantic block EK is composed only of E primitives, whereas a generalized object semantic block JK is in general a combination of E primitives and generalized object primitives. The semantic blocks in the XJ and X4J representations above that consist of a single generalized object primitive are a simplified notation found only in the 57 groups of basic sentence classes. EK and JK are called the main semantic blocks of a sentence. Besides main blocks, actual sentences also have auxiliary blocks, which do not enter the sentence class representation but do enter sentence class knowledge. Sentence class representations can be classified by the number of main blocks into two-block, three-block, and four-block sentences; by whether a normative format exists among their sentence formats, into generalized action sentences and generalized effect sentences; and by their conceptual association characteristics, into sentences with and without EK, block-expansion sentences and ordinary sentences, and sentences whose JK does or does not have the characteristics of an embedded sentence (sentence degeneration). These are basic items of sentence class knowledge, and they are extremely important guides for the understanding of sentences. For example, the XBC and XABC in the sentence class representations above have embedded-sentence characteristics.
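The distinction between EK and JK can also be read off the composition of a block symbol: a block built only from E primitives is an EK, and a block that contains any generalized object primitive is a JK. The following sketch applies that rule to the block symbols used above. It assumes that letters in a block symbol are always primitives and that digits are node indices to be ignored; the function name is hypothetical.

```python
E_PRIMITIVES = set("XPTYRSD")  # action, process, transfer, effect, relation, state, judgment
OBJECT_PRIMITIVES = set("ABC")  # actor, object, content


def block_kind(block):
    """Classify a semantic block symbol as "EK" or "JK" by its primitives."""
    letters = {ch for ch in block if ch.isalpha()}
    if not letters:
        raise ValueError(f"{block!r} contains no primitive")
    unknown = letters - E_PRIMITIVES - OBJECT_PRIMITIVES
    if unknown:
        raise ValueError(f"{block!r} contains unknown letters {sorted(unknown)}")
    # Only E primitives: feature block. Any object primitive: object block.
    return "EK" if letters <= E_PRIMITIVES else "JK"


if __name__ == "__main__":
    for block in ("X2B", "X20", "XBC", "A", "X4", "XABC"):
        print(block, block_kind(block))
```

On the five action-type representations this composition rule agrees with the positional rule of the basic format: the second block is the only one made purely of E primitives.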
An embedded sentence (sentence degeneration) arises when the content expressed by a JK is in fact equivalent to, or contains, one (or even several) sentences. An embedded sentence also has its own sentence class, whose representation is written ELJ to distinguish it from the global sentence class representation EGJ; the subscript symbols G and L stand for the English words "global" and "local". ELJ also has its own JK and ELK (if the ELJ has an EK). Morphologically rich languages (such as English) usually, though not strictly, use non-finite verb forms for ELK and finite verb forms for EGK, so EGJ and ELJ are relatively easy to tell apart. In morphologically poor languages (such as Chinese), EGK and ELK show no morphological difference, and telling EGJ from ELJ is relatively difficult. One of the fundamental difficulties in Chinese understanding is the identification of EG and EL. But as long as sentence class knowledge is fully exploited and words are given more accurate semantic descriptions, this difficulty can be overcome. In fact, HNC sentence class analysis technology has already developed a complete set of effective processing strategies [2].
The identification of EG and EL is a major difficulty in natural language understanding, but the identification of the sentence class EJ itself is often a major difficulty as well. Because the semantics of natural language words is uncertain, a verb often corresponds to several sentence classes, and choosing among them is a problem of the same nature as identifying EG versus EL. For example, the Chinese verb "breaking" corresponds to 8 senses and 7 sentence classes [3], and choosing one out of the seven is not easy. The uncertainty of English words is even more serious, however, and Chinese verbs of more than one character correspond to fewer sentence classes than English verbs do, so the advantages of Chinese should also be recognized.
That a verb corresponds to several sentence classes is not purely a semantic phenomenon; it is also a pragmatic one, and pragmatics is closely tied to context. Research on pragmatics and context was a mainstream of philosophical research in the second half of the 20th century and achieved great results, known as "the language of language" [5]. However, these studies revolve around the relationship among syntax, semantics, and pragmatics, and around the definition and use of context. Their premise is that the context already exists in the brains of the communicators. For a computer no such context exists, and when the skin is gone, what can the hair attach to? Natural language understanding therefore faces an urgent need for context generation.
For this reason HNC sentence-class analysis technology is equipped with a context generation module. "7-57-7" is a description of context generation. The three numbers correspond to the three elements of context: domain DOM, situation SIT, and event background BACT (or author background BACA). The first "7" stands for the 7 domain primitives, "57" for the 57 situation primitives, and the second "7" for the 7 primitives of the event background BACT. The seven domain primitives are the two types of labor and three types of spiritual life defined by HNC, as described earlier, plus the physical activity of life and natural phenomena; the higher-level concepts of the two types of labor and three types of spiritual life provide the subclass division of each domain. The 57 situations are the 57 groups of basic sentence classes; together with the 3192 groups of mixed sentence classes they form the foundation for the dynamic composition of situations. The 7 event background primitives are the seven auxiliary block types defined by HNC: means MS, tool IN, way WY, condition CN, reference RE, cause PR, and result RT. The subclasses of these auxiliary blocks are the foundation for the dynamic composition of the event background. HNC sentence-class analysis is thus at the same time a context generation process, that is, a process of acquiring the specific information of the three context elements. Because this information is directly symbolized in the HNC mapping symbols of words and in the sentence-class expressions of sentences, and is contained in the corresponding sentence-class knowledge, acquiring it is not complicated. Of course, the technical realization of context generation still has to solve two key problems: first, the identification of domain sentence classes; second, the assembly of situation information. These are not discussed in detail here.
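The three context elements can be pictured as a small record that sentence-class analysis fills in. The sketch below is hypothetical: the class name `Context`, its field names, and the method `add_background` are assumptions made for illustration, and only the seven auxiliary block codes and the values S04J and "in learning" (from the example analysed later in the paper) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The seven auxiliary block type codes named in the text.
AUX_BLOCK_TYPES = ("MS", "IN", "WY", "CN", "RE", "PR", "RT")


@dataclass
class Context:
    """Hypothetical container for the three context elements."""
    dom: Optional[str] = None   # domain (DOM), left unset in this sketch
    sit: Optional[str] = None   # situation (SIT): a sentence-class code
    bact: Dict[str, str] = field(default_factory=dict)  # event background (BACT)

    def add_background(self, block_type: str, text: str) -> None:
        # Only the seven auxiliary block types may contribute to BACT.
        if block_type not in AUX_BLOCK_TYPES:
            raise ValueError(f"not an auxiliary block type: {block_type!r}")
        self.bact[block_type] = text


ctx = Context(sit="S04J")
ctx.add_background("RE", "in learning")
print(ctx)
```

Keeping BACT as a mapping keyed by auxiliary block type is one possible design; it makes a misassigned block type fail early rather than silently distorting the event background.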
To close this section, a few words are needed about the discussion quoted above. The rules in that discussion are description rules of the language space, and the inherent uncertainty of the language space inevitably makes those rules uncertain. In the language concept space, however, the situation changes fundamentally: conceptual association is determinate, and sentence-class knowledge is determinate. In the corresponding passages above we used the word "inevitable" several times. That is to say, when the language space is viewed from the language concept space, the rules are not maximum-likelihood estimates in the statistical sense; they are chiefly the rational judgments advocated by Kant. We are convinced that rational judgment will play an increasingly significant part in resolving the 20 difficulties of NLP.
4. A "glimpse of the leopard"
Here a single example sentence is used to show concretely how sentence-class knowledge is applied during sentence-class analysis. Sentence-class knowledge is the essence of world knowledge: world knowledge is unbounded, but sentence-class knowledge is finite. The most effective way to let a computer grasp world knowledge is to start from sentence-class knowledge.
The prototype of the example sentence is:
Children who can skillfully operate a computer are not necessarily excellent in learning.
The sentence class analysis is as follows:
This is a concise state sentence S04J, one of the 57 groups of basic sentence classes. Its sentence-class expression is
S04J = SB SC
It is one of the sentence classes without an EK that are commonly used in Chinese. The basic sentence-class knowledge of the concise state sentence has two main points. (1) The content (C) implied by the state object SB can be placed either in JK1 = SB or in JK2 = SC. In the former placement, SB = SBB SBC, where SBB denotes the object of the state description and SBC denotes the performance of that object. In the latter placement, SC = SCC SCU, where SCC denotes the performance of the state object and SCU denotes the attribute of that performance or its value. (2) The description center of SC can only be a class-u concept phrase (u is the attribute symbol of the HNC five-tuple; "very good" and "good" are class-u concept phrases) or a quantity phrase (in which case SCU takes the attribute value, whereas in the former case it takes the attribute itself).
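The two points of S04J knowledge can be written down as a small data structure. This is a minimal sketch under assumptions: the class name `ConciseStateSentence`, its fields, and the kind labels "u" and "quantity" are invented for illustration, and the English strings are glosses, not the original Chinese text.

```python
from dataclasses import dataclass
from typing import Optional

# Point (2): the description center of SC is a class-u phrase or a quantity phrase.
ALLOWED_SC_CENTERS = {"u", "quantity"}


@dataclass
class ConciseStateSentence:
    """Toy model of S04J = SB SC (field names are assumptions)."""
    sbb: str                    # object of the state description
    scu: str                    # attribute (class-u phrase) or attribute value
    scu_kind: str               # "u" or "quantity"
    sbc: Optional[str] = None   # content C placed in SB (SB = SBB SBC)
    scc: Optional[str] = None   # content C placed in SC (SC = SCC SCU)

    def content_placement(self) -> str:
        # Point (1): the content C may sit in JK1 = SB or in JK2 = SC.
        if self.sbc is not None:
            return "SB"
        if self.scc is not None:
            return "SC"
        return "none"

    def sc_center_ok(self) -> bool:
        return self.scu_kind in ALLOWED_SC_CENTERS


s = ConciseStateSentence(
    sbb="children who can skillfully operate a computer",
    scc="learning",
    scu="not necessarily excellent",
    scu_kind="u",
)
print(s.content_placement(), s.sc_center_ok())
```

Here the content "learning" is placed in SC, which is one of the two placements allowed by point (1).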
The wording of example 1 is special: it transforms the state description object concerning the "child" into the reference auxiliary block RE, "in learning". Example 1 therefore has three semantic blocks, two main and one auxiliary. If this transformation is undone, the sentence becomes the following two sentences:
/
All three examples are concise state sentences, but their semantic block compositions differ considerably. The annotation of the HNC corpus indicates these differences explicitly. The symbols "||" and "|" mark the semantic block boundaries of EGJ and ELJ respectively, the symbol "~" marks an auxiliary block, and the paired symbols {...}, <...>, and /.../ denote prototype, element, and packaging embedded sentences respectively.
The example sentences contain two verbs, "operate" and "learn", but neither constitutes EG. The former serves as EL, and the prototype sentence of its ELJ is "children | can skillfully operate | computers". The latter serves as the content C of fK, SB, and SC respectively.
Can HNC sentence-class analysis technology handle this pragmatic variability of verbs? Examples 1 and 3 are not difficult: the word "" and the pattern "in ..." provide the necessary information. Example 2 is slightly harder. A program must first hypothesize that "learn" is EG, but the subsequent EK-JK2 test rejects this hypothesis. The program then falls back to the hypothesis that the whole string is an S04J sentence, and this hypothesis passes the test based on the sentence-class knowledge of the concise state sentence given above. A program of higher intelligence, however, need not take this detour, because the class-u phrase "not necessarily excellent" already provides sufficient information for the S04J sentence class; applying the knowledge of SC semantic block composition (point 1 of the S04J sentence-class knowledge above) then solves the problem. It should also be explained why the detour can still reach the other shore. This involves the sentence-class knowledge of the mixed sentence class T19YA0 * 21J of "learn": "excellent", as the element of JK2 in this sentence class, does not meet the expectation imposed by the conceptual correlation between EK and JK2. This expectation is not knowledge specific to the word "learn"; both traditional syntactic knowledge and HNC knowledge of JK composition can supply it. The detour can therefore also reach the other shore, only less efficiently.
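The difference between the detour and the direct route can be illustrated with a toy hypothesize-and-test loop. Everything in this sketch is an assumption made for illustration: the dictionary representation of a sentence, the tag `tail_class`, and both test functions are simplified stand-ins for the EK-JK2 check and the S04J knowledge described above, not HNC's actual procedure.

```python
from typing import Callable, Dict, List, Tuple

Sentence = Dict[str, str]


def eg_test(s: Sentence) -> bool:
    # Toy stand-in for the EK-JK2 check: with the verb taken as EG, a class-u
    # phrase is assumed not to satisfy the expectation for JK2.
    return s["tail_class"] != "u"


def s04j_test(s: Sentence) -> bool:
    # Point (2) of the S04J knowledge: the SC center is a class-u or quantity phrase.
    return s["tail_class"] in ("u", "quantity")


TESTS: Dict[str, Callable[[Sentence], bool]] = {"EG": eg_test, "S04J": s04j_test}


def analyze(s: Sentence, order: List[str]) -> Tuple[str, int]:
    """Try hypotheses in order; return the accepted one and the number of tests run."""
    for n, name in enumerate(order, start=1):
        if TESTS[name](s):
            return name, n
    return "unresolved", len(order)


example = {"verb": "learning", "tail": "not necessarily excellent", "tail_class": "u"}
print(analyze(example, ["EG", "S04J"]))  # detour: EG is tried and rejected first
print(analyze(example, ["S04J", "EG"]))  # direct: the class-u phrase suggests S04J
```

Both orders accept the S04J hypothesis, but the detour spends one extra test on the rejected EG hypothesis, which mirrors the efficiency point made in the text.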
Example 1 raises one more important problem: determining that the reference auxiliary block RE has been transformed from the state description object SB. This belongs to the main-auxiliary transformation problem among the "20 difficulties". If this transformation is not processed, the event background BACT will be wrong. The basis for making this main-auxiliary transformation is point 1 of the sentence-class knowledge of the concise state sentence given above. Whether sentence-class knowledge is used well is one of the main criteria for measuring the intelligence of NLP understanding.
It is self-evident that building a conceptual knowledge base [4] that describes the sentence-class knowledge of the 57 groups of basic sentence classes and the 3192 groups of mixed sentence classes is an extraordinary undertaking in both theory and engineering. But if reason, or rules, are to play a greater role in NLP, so that NLP can leave its difficult position at the "snow" edge, this construction is a key foundation that urgently needs to be strengthened.
5. Conclusion
Views on the role of semantics and its symbolization in NLP have had a relatively pessimistic tone, and the four discussions quoted in the first section are typical of it. But HNC's successful construction of the three concept levels of the language concept space shows that the actual situation is not so pessimistic. The key idea is to rise from the language space to the language concept space and to look down on the language space from there. This article introduces a series of basic HNC concepts and specialized terms, which could not all be explained one by one. The 20 difficulties of natural language processing and the "7-57-7" account of context mentioned in the article are discussed in monographs and papers written by the author in 1999 and 2001, which have not been publicly published. Interested readers can consult the material at http://www.hncncnlp.com/.
References
1 Huang Zengyang. The Development and Future of HNC. Chinese Journal of Chinese, 2001, (3): 46-64
2 Research and Implementation of the Difficulties of Multi-Mobile Words in Chinese Academy of Sciences [Academy of Sciences thesis]. Institute of Acoustics, Chinese Academy of Sciences, 2003
3 Li Ying. From 'break' 'Breaking' word to see the knowledge of HNC. In: Zhang Quan, Xiao Guoguan (eds.), "HNC and Linguistics Research". Wuhan University of Technology Press, 2001, pp. 187-190
4 Miao Chuanjiang. Research on the Knowledge of HNC [thesis]. Academy of Sciences, 2001
5 Sheng Xiaoming. "Discourse Rule and Knowledge Basis". School Lin Publishing House, 2000
6 Xu Jiayu. Present Situation and Vision - On Chinese Information Processing and Modern Chinese Research. Chinese Language, 2000, 6
7 Xing Fuyi. From the basic flow of modern Chinese grammar research for forty years. Chinese language, 1992, 6
8 Yang Chengkai. Methodology analysis of syntax, semantic, pragmatic three-level, language research, 1993, 1
* This work was supported by 973 Project G199803050.