ICTCLAS word segmentation system research (3) - atomic segmentation

xiaoxiao2021-03-19  189

The first step in ICTCLAS word segmentation is atomic segmentation. Before atomic segmentation, however, the input must first be split into sentences. Sentence splitting uses the separation marks of a statement, such as punctuation separators and carriage return/line feed characters, to break the source string into a number of slightly simpler short sentences. Each short sentence is then segmented on its own, and the individual results are finally combined to form the final segmentation result.
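The sentence-splitting step can be sketched roughly as follows. This is only an illustration in Python, not the ICTCLAS source: the function name `split_sentences` and the exact delimiter set are assumptions, since the post only says that separators and line breaks are used.

```python
import re

# Assumed delimiter set: a few full-width and ASCII sentence-ending marks
# plus CR/LF. The real ICTCLAS delimiter list may differ.
_SENTENCE_RE = re.compile(r"[^。！？；!?;\r\n]+[。！？；!?;\r\n]*")


def split_sentences(text):
    """Split text into short sentences, keeping trailing punctuation."""
    sentences = []
    for match in _SENTENCE_RE.finditer(text):
        # Line breaks only mark a boundary; they are not kept in the sentence.
        sentence = match.group(0).strip("\r\n")
        if sentence:
            sentences.append(sentence)
    return sentences


if __name__ == "__main__":
    for s in split_sentences("今天天气好。我们去公园，好吗？\r\n明天再说"):
        print(s)
```

Each returned short sentence would then be passed to atomic segmentation independently.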

Once the text has been divided into short sentences, atomic segmentation can be performed. An atom is the smallest indivisible morpheme unit in a short sentence. An atom can be a single Chinese character, the start or end marker field placed before or after the short sentence, a full-width punctuation mark, or a run of consecutive single-byte digits and letters. The last case is best shown by example: in the Chinese sentence for "Samsung SHX-132 model of mobile phone 1 yuan", SHX-132 and 1 are each a single atom, and each of the remaining Chinese characters is an atom of its own.
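The rule above can be illustrated with a short Python sketch. It is not the ICTCLAS implementation. The name `atomize` is hypothetical, and treating "-" and "." as part of a single-byte run is an assumption inferred from the SHX-132 example. The Chinese test sentence is a reconstruction of the example above.

```python
def _in_single_byte_run(ch):
    """True for ASCII letters and digits, plus joiners assumed to stay inside a run."""
    return ord(ch) < 128 and (ch.isalnum() or ch in "-.")


def atomize(sentence):
    """Split a short sentence into atoms.

    A run of consecutive single-byte letters/digits forms one atom;
    every other character (Chinese character, full-width punctuation,
    and so on) is an atom by itself.
    """
    atoms = []
    i = 0
    while i < len(sentence):
        if _in_single_byte_run(sentence[i]):
            j = i
            while j < len(sentence) and _in_single_byte_run(sentence[j]):
                j += 1
            atoms.append(sentence[i:j])
            i = j
        else:
            atoms.append(sentence[i])
            i += 1
    return atoms


if __name__ == "__main__":
    print(atomize("三星SHX-132型号的手机1元钱"))
```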

In this way, simple character-level splitting produces the result of atomic segmentation, and each atomic unit is given a tag. NPOS = 1 indicates the start tag, NPOS = 4 indicates the end tag, and NPOS = 0 indicates an unrecognized word. The data structure after atomic segmentation is shown in Figure 1:

Figure 1
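One way to picture such a structure is the Python sketch below. The NPOS values 1, 4, and 0 come from the text. Everything else is an assumption made for illustration: the `Atom` class, the `tag_atoms` helper, the marker strings, and the idea that every ordinary atom starts out as unrecognized (NPOS = 0).

```python
from dataclasses import dataclass

# Marker strings are assumptions; the post only mentions start/end marker fields.
SENTENCE_BEGIN = "始##始"
SENTENCE_END = "末##末"

NPOS_UNKNOWN = 0  # unrecognized word
NPOS_BEGIN = 1    # start tag
NPOS_END = 4      # end tag


@dataclass
class Atom:
    word: str
    npos: int


def tag_atoms(atoms):
    """Wrap a list of atom strings with start/end markers and NPOS tags."""
    tagged = [Atom(SENTENCE_BEGIN, NPOS_BEGIN)]
    # Assumption: ordinary atoms are not yet recognized at this stage.
    tagged.extend(Atom(word, NPOS_UNKNOWN) for word in atoms)
    tagged.append(Atom(SENTENCE_END, NPOS_END))
    return tagged


if __name__ == "__main__":
    for atom in tag_atoms(["三", "星", "SHX-132"]):
        print(atom.word, atom.npos)
```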

An example of atomic segmentation is shown in Figure 2:

Figure 2

After atomic segmentation, the initial segmentation pass can be performed. See ICTCLAS word segmentation system research (4).

When reposting, please cite the original address: https://www.9cbs.com/read-130174.html
