4.3 Statement representation and its formats
The term "statement representation" is used in "HNC Theory", and the previous section prefixed it with the modifier "HNC", which implies that there are various kinds of statement representation. Yet we have never given the term a definition. The following definition is proposed here: a statement representation arranges the statement units in a definite order so as to form a statement with a definite meaning.
By this definition, Chomsky's S = NP VP does not qualify as a statement representation: it specifies the order of the statement units but does not guarantee a definite meaning. Its generative capacity is well known, as Chomsky's own self-mocking famous sentence illustrates. Why does Chomsky's formula not qualify as a statement representation? Simply put, because its statement units are improperly chosen: it directly adopts the phrase, the level immediately below the sentence.
Formula HNC 21 does qualify as a statement representation, because it specifies a minimal necessary and sufficient condition for a statement with a definite meaning: the type and number of the main semantic blocks. This sentence seems to contain one "superfluous" phrase and one "improper" phrase. The "superfluous" one is "with a definite meaning"; the "improper" one is "minimal necessary and sufficient condition".
Why "superfluous"? Since it is a statement, it of course has a meaning; is adding this modifier not like drawing legs on a snake? The question has to be answered from two sides. First, "definite" here does not mean "some or other" but a series of definite meanings, each of which is a specific sentence class. Second, meaningless statements can be "generated" by Chomsky's formula, and masters of literature and linguistics often play games with meaningless language; from a "computer-oriented" standpoint, "meaning" needs to be appropriately constrained. Adding the modifier "with a definite meaning" is therefore necessary.
Why "improper"? Since it is a necessary and sufficient condition, there can be no "minimal" one. This question also has to be answered from two sides. First, the meaning of a statement depends not only on the type and number of the semantic blocks but also on their ordering. Second, the meaning of a statement is mainly determined by the main semantic blocks, but the contribution of the auxiliary blocks cannot be ignored and sometimes even exceeds that of the main blocks. For these two reasons, "the type and number of the main semantic blocks" is only a "minimal" condition for a statement to have a definite meaning. A necessary and sufficient condition in strict mathematics has no "minimal" version, but when linguistics borrows mathematical concepts it need not follow them to the letter; this is one more example.
So, in what order must the semantic blocks in formula HNC 21 be arranged in order to constitute a statement representation? Do not underestimate this question: it touches the roots of many basic linguistic phenomena, such as case in Western languages, the syntactic function of prepositions, and the "ba" (把) construction in Chinese.
To study the statement representation, it is best to return to the form of HNC 20 and make two notational simplifications: drop the variable x, and reduce the number of feature blocks EK to one. The statement representation can then be written in a relatively concise form:
J = [JK] EK [{FK}]    (J-0)
This statement representation has only one feature block. However, once the concepts of block expansion and sentence degeneration are introduced into JK, its adaptability increases: it can accommodate multiple feature blocks and covers all 57 groups of basic sentence classes given above. The physical meaning of this expression can be described as follows: it represents a statement that takes EK as its core, is equipped with one or several generalized object blocks JKN, and may additionally be equipped with one or several auxiliary blocks.
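The shape of formula J-0 can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `Statement`, its field names, and the sample words are hypothetical and do not come from HNC; the sketch simply records one required EK plus optional lists of JK and FK blocks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """Hypothetical container mirroring J = [JK] EK [{FK}] (formula J-0)."""
    ek: str                                       # the single feature block EK (the core)
    jk: List[str] = field(default_factory=list)   # generalized object blocks, possibly none
    fk: List[str] = field(default_factory=list)   # auxiliary blocks, possibly none

    def block_labels(self) -> List[str]:
        """List block labels in the order they are written in J-0."""
        labels = [f"JK{i}" for i in range(1, len(self.jk) + 1)]
        labels.append("EK")
        labels.extend(["FK"] * len(self.fk))
        return labels


# Illustrative content only; the words are placeholders, not an HNC analysis.
s = Statement(ek="notified", jk=["Zhang San", "Li Si"], fk=["yesterday"])
print(s.block_labels())  # ['JK1', 'JK2', 'EK', 'FK']
```

The sketch deliberately says nothing about surface order; that question is taken up by forms J-1 and J-2 in the text that follows.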
On this account, perceiving or understanding such a statement amounts to finding these semantic blocks. In other words, the process of perceiving or understanding a statement is the process of identifying each semantic block. This identification should proceed in two steps. The first step is to tell JK, EK, and FK apart; the second step is to identify the internal differences, or roles, within each of the three. Imagine an ideal natural language that marks both kinds of difference explicitly on every semantic block. The two-step identification would then be solved outright, and neither the two steps nor the ordering of the semantic blocks would matter.
However, such an ideal language does not actually exist, nor would it be realistic, because marking the internal differences is very complicated. What natural languages do in practice is merge the two kinds of difference into grammatical features. Languages each have their own merits in their ways or techniques of marking, but by comparison Japanese probably does this "best" and Chinese "worst".
The morphological changes of inflectional languages and the affixed elements of agglutinative languages mainly serve two functions: one is to mark the differences between semantic blocks, and the other is to provide indicator marks inside a semantic block. Other functions are secondary, even superfluous.
Besides marking, the ordering of the semantic blocks can itself provide information for identifying them. For example, the statement representation HNC 20 can be rewritten in the following two forms.
J = JK1 EK [JKM] [FK], M = 2, 4 (J-1)
J = [JKM] [FK] EK    (J-2)
These forms reduce the burden of semantic block identification, because the former gives explicit position information for JK1 and EK and the latter gives explicit position information for EK; they correspond to the so-called SVO and SOV languages. The SVO and SOV terminology, if supplemented with the concepts of "double O" (double object) and "double S" (double subject), has fairly wide applicability. HNC nevertheless holds that using the symbol JKM in place of S and O is more convenient for studying the various changes of statement format. The necessary premise, of course, is that a definite semantic role can be assigned to the subscript M of JK; this poses no obstacle, because the representations of the basic and hybrid sentence classes already provide it. The question of statement format is therefore discussed on the basis of statement representation J-1.
The statement format is defined as the arrangement order of the main semantic blocks of a statement; each arrangement is one format or one class of formats. Why the distinction between "one" and "one class"? This seminar has a dedicated paper on statement formats, where readers can find the answer. Setting the auxiliary blocks aside, J-1 is rewritten into the following five forms:
J = JK1 EK [JKM], M = 2, 3
J3 = JK1 EK JK2
J4 = JK1 EK JK2 JK3
J21 = JK EK
J22 = JK1 JK2
Apart from complex sentences, the coverage of these five forms is sufficient for the study of formats. HNC collectively calls statements of the above forms the basic format. It should be stressed again that JKM here has been assigned a definite semantic role on the basis of the sentence class, and is completely detached from the notions of subject and object in the syntactic sense. The pros and cons of this complete detachment deserve study, but they are not discussed in this paper.
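As a rough illustration of how the basic format relies on position alone, the four concrete forms listed above can be stored in a lookup table keyed by the sequence of main-block labels. This is a minimal sketch: the names `BASIC_FORMATS` and `basic_format` are hypothetical, and a real system would first have to identify the blocks, which is exactly the hard part discussed in the text.

```python
# Lookup table for the four concrete basic formats listed in the text.
# Keys are sequences of main-block labels; values are the format names.
BASIC_FORMATS = {
    ("JK1", "EK", "JK2"): "J3",
    ("JK1", "EK", "JK2", "JK3"): "J4",
    ("JK", "EK"): "J21",
    ("JK1", "JK2"): "J22",
}


def basic_format(blocks):
    """Return the basic-format name for a label sequence, or None if it is not basic."""
    return BASIC_FORMATS.get(tuple(blocks))


print(basic_format(["JK1", "EK", "JK2", "JK3"]))  # J4
print(basic_format(["JK2", "JK1", "EK"]))         # None: the blocks have changed places
```

A sequence that returns `None` here is not in the basic format; the labels alone do not say which non-basic format it belongs to.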
The basic format makes full use of the ordering information of the semantic blocks. Its basic feature is that it gives no marker to any semantic block, because it holds that position already carries all the information required, namely the semantic role of each block. On the surface this claim sounds "bold as a bull", but a problem remains: do the boundaries between the semantic blocks not need to be distinguished? Placing EK in second position does create two natural boundaries, but the boundary between JK2 and JK3 in a J4 statement, or between JK1 and JK2 in a J22 statement, has no such natural mark. The basic format takes a "quite unreasonable" attitude to this and simply ignores it. A language that uses the basic format can therefore be called a "quite unreasonable" language. Is there such a language in the world? Yes! How many there are I do not know, but I know of at least one: Chinese. For "P400E31-0", English uses both I and me, whereas Chinese uses a single "我" (I) for both and even extends it to stand for my and our. Chinese is so "unreasonable" that it borders on "barbaric", is it not?!
However, as many philosophers have said, truth and fallacy are often only one step apart, and the "barbarism" above is in fact a display of high intelligence. The boundary "between JK2 and JK3 of J4" or "between JK1 and JK2 of J22" is always a boundary between "object B" and "content C"; it can never be a boundary between two "objects B" or two "contents C". If "object B" and "content C" can be told apart, this boundary problem effectively does not exist. The human brain tells "object B" from "content C" with ease, so Chinese can safely use the basic format. Is this not a clever move? For the computer, however, this blurred boundary is a formidable obstacle, called "BC 佯" in "HNC Theory", p. 165.
It is worth noting that Chinese, besides using the basic format in the strict sense, also makes rich and varied use of non-basic formats, including what HNC defines as the standard format, the violation format, and various elliptical formats.
The standard format is one in which the main semantic blocks exchange positions relative to the basic format. In this case a boundary marker is added between two generalized object blocks, indicating the M value of the generalized object block that follows the boundary, i.e., the code of that block's semantic role. The violation format is a violation of the standard format: it omits some or all of the semantic block boundary markers. Consider the following examples:
1 Zhang San || already notified || Li Si # participate || tomorrow's bridge competition #.
2 Zhang San || [| already] ^ba /{participate || tomorrow afternoon's bridge competition} matter/ || notified || Li Si.
3 /{participate || tomorrow afternoon's bridge competition} matter/ || ^by || Zhang San || notified || Li Si.
4 /{Li Si || participate || tomorrow afternoon's bridge competition} matter/ || ^by || Zhang San || notified.
5 Tomorrow afternoon's bridge competition || ^by || Zhang San || notified || Li Si || (participate).
6 Tomorrow afternoon's bridge competition || Zhang San || must quickly notify || Li Si || (participate).
7 Tomorrow afternoon's bridge competition, Li Si || not participating, Zhang San || must quickly notify.
8 Tomorrow afternoon's bridge competition, Li Si || not participating, [| must] ^let Zhang San || quickly notify || him.
This is a set of information transfer sentences with the same content. The sentence class is:
T3J = TA T3 TB T3C, T3C = ERJ
Sentence 1 is in the basic format, sentences 2-5 are in the standard format, and sentences 6-8 are in the violation format. Some brief explanations follow. First, the marker symbols:
||    Semantic block boundary marker
{ }   Prototype sentence-degeneration marker
/ /   Packaged sentence-degeneration marker
# #   Block expansion marker
[| ]  Front-separation marker of the feature block
^     Semantic block indicator marker
The T3C of an information transfer sentence undergoes block expansion; the block-expanded statement here is an effect sentence:
Y90J = YB Y90 YC
YB is omitted in this effect sentence. According to the sentence-class knowledge of the information transfer sentence and the effect sentence, the object omitted in the effect sentence is TB, here Li Si. If the personal pronoun "he" appears in the YB position, "he" must refer to TA rather than TB, here Zhang San. It should be noted, however, that this "rule" excludes the case where a comma stands between TB and T3C.
In the standard format, the total number of semantic blocks stays constant. However, the block-expanded statement of the basic format becomes a packaged prototype sentence degeneration; this is not inevitable, but there are regularities to be found, and it is worth studying. The "participate" in sentence 5 seems merely to be omitted; in fact the omission blurs the distinction among taking part in, watching, and joining the competition, so the boundary of who counts as a competitor is unclear.
In the violation format, the semantic block boundaries exhibit the "BC 佯" in sentence 6. In sentences 7-8, a comma replaces the semantic block boundary marker; this is a syntactic device commonly used in the violation format. But the comma has too many syntactic functions, and whether it is acting as a semantic block boundary marker can only be judged with the sentence-class knowledge of EG or EP.
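The annotation used in the examples above is regular enough to be split mechanically. The sketch below is a hypothetical helper, not part of HNC: `split_blocks` cuts an annotated sentence at every `||` marker, and `indicators` lists the resulting blocks that begin with the `^` indicator marker. The sample strings are English glosses written for this illustration.

```python
import re


def split_blocks(annotated):
    """Split an annotated sentence on the '||' semantic block boundary marker."""
    text = re.sub(r"^\s*\d+\s+", "", annotated)      # drop a leading example number
    parts = [p.strip() for p in text.split("||")]
    return [p.rstrip(".").strip() for p in parts if p.strip()]


def indicators(blocks):
    """Return the blocks that start with the '^' semantic block indicator marker."""
    return [b for b in blocks if b.startswith("^")]


s5 = "5 Tomorrow afternoon's bridge competition || ^by || Zhang San || notified || Li Si || (participate)."
s7 = "7 Tomorrow afternoon's bridge competition, Li Si || not participating, Zhang San || must quickly notify."

print(split_blocks(s5))
print(indicators(split_blocks(s5)))  # ['^by']
print(len(split_blocks(s7)))         # 3: boundaries written as commas are not recovered
```

The last line shows the difficulty raised in the text: where a comma stands in for `||`, a purely mechanical split merges neighbouring blocks, and deciding whether the comma is a block boundary needs sentence-class knowledge.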
It can be seen from the three formats above that the different formats of Chinese are intertwined with the grammatical categories of "case, tense, aspect, voice, and mood". Chinese grammarians have long recognized the grammatical function of word order in Chinese and have studied it systematically, and HNC needs to learn from them humbly. However, would it not be clearer to make the unit of "order" explicit, dividing it into two major types, order between semantic blocks and order within semantic blocks, and would this not be more conducive to research on "tense, aspect, voice, and mood" in Chinese? Taking this opportunity to raise the question, I would like to suggest that for any linguistic phenomenon involving "order" among semantic blocks, the concept of format is very useful. At this point I cannot help recalling the masters of Chinese studies who suffered all kinds of unfair evaluation in the 20th century (the period here being reckoned in terms of Chinese studies). Their work may not all stand up to scientific scrutiny in the 21st century, and their excessive fondness for phonology went too far, but it can be remedied.
The only key point in this section that still needs explanation is "3192 hybrid sentence classes". This number comes from
57 × (57 - 1) = 3192
There is nothing mysterious about it. Hybrid sentence classes are derived from the basic sentence classes. A basic sentence class is defined as one whose main feature block expresses only one link of the action-effect chain; a hybrid sentence class is defined as one whose main feature block expresses two (or more) links of the action-effect chain. About hybrid sentence classes I want to say only this: they provide an enormous space for designing, assigning, and configuring sentence-class knowledge. We need more doctoral students to study this highly significant topic.
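The arithmetic can be checked by brute force. The sketch below assumes one reading of the formula, namely that a hybrid class is counted as an ordered pair of two distinct basic sentence classes; the integer IDs are placeholders, not HNC class codes, and combinations of more than two links are not counted.

```python
from itertools import permutations

N_BASIC = 57  # number of basic sentence classes stated in the text

# Ordered pairs of two distinct basic classes (assumed reading of 57 x 56).
hybrid_pairs = list(permutations(range(N_BASIC), 2))

print(len(hybrid_pairs))        # 3192
print(N_BASIC * (N_BASIC - 1))  # 3192
```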
5 Concluding remarks
I am most afraid of looking into the future. First, in my view, futurology has produced many celebrities but is not a science. Second, I am merely a fairly diligent "hedgehog" who will never acquire the wisdom of the "fox", and looking ahead requires precisely that wisdom.
The outlook for HNC cannot be discussed in isolation. It has to be tied to the second campaign of Chinese information processing proposed by Mr. Xu Jialu.
The second campaign has begun, but its course is far from over. Abroad this campaign has long been under way (they never needed to fight the campaign of character processing), and "breakthrough" progress has been reported repeatedly, yet every instance proved to be a flash in the pan. Natural language processing is a branch of artificial intelligence, that "hodgepodge" discipline; H. Simon, the only Nobel laureate among the masters of the discipline, twice predicted a glorious future for artificial intelligence, and both predictions came to nothing. When we look ahead to the prospects of the second campaign, we cannot fail to consider the root causes behind this historical background and these events.
The application fields of the second campaign involve the core technology of language knowledge processing, which is absolutely inseparable from computer understanding of natural language. Bypassing semantics or understanding and seeking another path is the international mainstream among current technical routes in natural language processing, but Mr. Xu has clearly pointed out that the road that detours around semantics leads nowhere. I fully support this basic judgment of Mr. Xu. HNC theory formulates a standard for computer understanding, namely sentence class analysis technology, together with a series of corresponding knowledge representations for this technology. The technology has now been preliminarily implemented on computers, and there are many reports on it at this seminar.
The current basic situation is this: sentence class analysis technology, its two wings (software and knowledge base), and its theoretical foundation all need a new leap in order to win the decisive victory of the second campaign.
The main signs of the new leap in software are: achieving comprehensive "compilation" into the HNC symbol system or a similar symbol system (which must not be the natural language symbol system itself); and upgrading sentence class analysis technology to extended sentence class analysis technology, so as to overcome the 20 difficulties of understanding.
The main signs of the new leap in the knowledge base are: further improving the sentence-class knowledge system or a similar statement knowledge system (which must not be merely a syntactic knowledge system); and raising the coverage of word-level expectation knowledge until it approaches that of the human brain.
The main signs of the new leap in the theoretical foundation are: forming a framework of contextual knowledge, which cannot rely on "fox"-style research alone and calls for a shift to the "hedgehog" mode; and establishing representational frameworks for sentence groups, paragraphs, and texts, while advancing the same shift in research method.
The key to achieving these new leaps is the emergence of a team of top talents. This team needs not only theoretical and technical talents but also talents in market operation; it needs the close cooperation of a Zhang Liang, a Han Xin, and a Xiao He, and it also needs a Liu Bang.
I am willing to put myself forward and play half of one such part.
The core of this team is taking shape, but it needs to grow tenfold.
The basic formation of this expanded team will be a reliable sign that the decisive victory of the second campaign is approaching.
This is my outlook for the second campaign of Chinese information processing. The future of HNC is of course bound up with this prospect.
* This work was supported by 973 Project "G199803050".
** This paper benefited from many important suggestions by the postdoctoral researcher Li Yaoyong, to whom thanks are due.