Conversion requirement of the input method: a number of four digits or fewer -> a character or word
Requirement for generating word library codes: character -> code
As described in the previous article, the requirements above call for the following libraries:
Character library: common, rare
Word library: common, rare
Considering that only 16-bit values are available (anything wider is not supported), and that there are only 11110 possible codes (worked out by counting arrangements: 10 + 100 + 1000 + 10000 = 11110 :)),
I use data files together with index files.
Here, a data file stores the actual content, and an index file uses the code or the character as an index that points to the content in the data file (this design runs through the whole input method).
Character index:
First, I built a code -> character index: convert the code to a four-digit number, use that number as the position in the index file, and read from there the start position of the characters that correspond to the code. That may be hard to follow, so here is an example:
Code 3584: at offset 3584 * 2 in the index file (times 2 because each index value is 16 bits, i.e. two bytes) is stored the start position, in the data file, of the characters for code 3584. Read forward from that start position until an end-of-line marker is found; everything read up to that point is the set of characters for 3584. How to display them is only a minor matter.
If the content of the data file is:
3584 resignation of the dumpling 觪辤 皎 皎 皎 鲛 鲛鱆
Then those are the characters that correspond to code 3584. This takes a number of file read operations, so it is not especially fast, but for an input method the cost is negligible.
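The lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the original implementation: the helper names `build_files` and `lookup`, the little-endian byte order, the UTF-8 text encoding, the `0xFFFF` "no entry" marker and the sample table are all assumptions made for the example.

```python
import struct

NO_ENTRY = 0xFFFF  # assumed marker for "this code has no characters"


def build_files(table):
    """Build (index_bytes, data_bytes) from {code: "characters"} (hypothetical helper)."""
    index = bytearray(b"\xff\xff" * 10000)  # one 16-bit slot per code 0..9999
    data = bytearray()
    for code, chars in sorted(table.items()):
        # The slot at offset code * 2 holds the start position in the data file.
        struct.pack_into("<H", index, code * 2, len(data))
        data += chars.encode("utf-8") + b"\n"  # the line end marks the end of the entry
    return bytes(index), bytes(data)


def lookup(index, data, code):
    """Return the characters stored for a code, or "" if there are none."""
    (start,) = struct.unpack_from("<H", index, code * 2)
    if start == NO_ENTRY:
        return ""
    end = data.index(b"\n", start)  # read forward until the end-of-line marker
    return data[start:end].decode("utf-8")


# Made-up sample table, for illustration only.
index, data = build_files({3584: "觪辤鲛", 12: "一二"})
print(lookup(index, data, 3584))
```

Because each start position is a 16-bit value, a layout like this can only address the first 64K of a data file, which fits the 16-bit constraint mentioned earlier.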
Generating word library codes:
As mentioned, the code of a word can be generated from the codes of its characters, and the generation rule was also introduced in the previous article. To use the character codes, a character -> code lookup is needed.
I analyzed the characters and found that their values only range from 0x0000 to 0xFFFF, and that one code fits in a 16-bit word (two bytes). So an index file can be built with the value of the Chinese character as the index, and the content at that position in the index file is the numeric code, which gives the code of the character directly. The whole file takes 64K * 2 = 128K, that is, a 128K index file. Most of it is empty, but trading the space for a fast lookup is worth it.
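As a rough sketch of that character -> code table, assuming the "value" of a character is its Unicode code point (the original may well have used a different character encoding), that codes are stored little-endian, and that a stored 0 means "no code". The helper names and the sample code value are made up for illustration.

```python
import struct

ENTRY_SIZE = 2                      # one 16-bit code per character value
TABLE_SIZE = 0x10000 * ENTRY_SIZE   # 65536 entries * 2 bytes = 131072 bytes (128K)


def build_char_index(codes):
    """Build the index bytes from {character: numeric code} (hypothetical helper)."""
    index = bytearray(TABLE_SIZE)   # zero-filled; 0 is assumed to mean "no code"
    for ch, code in codes.items():
        struct.pack_into("<H", index, ord(ch) * ENTRY_SIZE, code)
    return bytes(index)


def code_of(index, ch):
    """Look up the code of one character with a single indexed read."""
    (code,) = struct.unpack_from("<H", index, ord(ch) * ENTRY_SIZE)
    return code


# Made-up sample entry, for illustration only.
char_index = build_char_index({"一": 0x1124})
print(len(char_index), hex(code_of(char_index, "一")))
```

The lookup needs no searching at all: the character value alone determines where to read, which is the space-for-speed trade described above.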
Word index:
The word index works in much the same way as the character index: the code is turned into an index, which is then used to fetch a line from the data file. The only difference is that words must be separated by spaces or TABs, while characters need no separator.
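The difference between the two kinds of data line could be handled along these lines. This is a minimal Python sketch; the function name `parse_line` and the sample lines are assumptions for illustration only.

```python
def parse_line(line, is_word_line):
    """Split one data-file line into its candidates.

    Word lines are separated by spaces or TABs; character lines need no
    separator, so every non-blank character is a candidate on its own.
    """
    if is_word_line:
        return line.split()  # splits on any run of spaces or TABs
    return [ch for ch in line if not ch.isspace()]


# Made-up sample lines, for illustration only.
print(parse_line("你好\t你们 他们", True))
print(parse_line("觪辤鲛", False))
```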
Uh-oh, there is a problem:
Can you guess what it is? Some characters, such as "one" (一), have a code of only one digit, and others have only two or three. In that case the numeric value of the code "99" is ninety-nine, and the code "099" is also ninety-nine. What to do?
So I add 1 to every digit and treat the result as hexadecimal digits: the code 0013 becomes 0x1124 and 0099 becomes 0x11AA, so there is no longer a problem. To be safe (I did run into problems later), an empty digit position is encoded as 0xF, so the code 042 corresponds to 0xF153, which leaves no ambiguity.
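The digit-plus-one scheme can be sketched as below. This is an illustrative Python version rather than the original code; the function name `encode` is made up, and padding on the left is inferred from the 042 -> 0xF153 example.

```python
def encode(code):
    """Pack a 1-4 digit code string into a 16-bit value.

    Each digit d is stored as the hex digit d + 1 (so 0..9 become 0x1..0xA),
    and every missing leading position is stored as 0xF.
    """
    if not (1 <= len(code) <= 4 and code.isascii() and code.isdigit()):
        raise ValueError("code must be 1 to 4 decimal digits")
    value = 0
    for _ in range(4 - len(code)):
        value = (value << 4) | 0xF           # empty position
    for digit in code:
        value = (value << 4) | (int(digit) + 1)
    return value


print(hex(encode("0013")), hex(encode("0099")), hex(encode("042")))
# "99" and "099" no longer collide:
print(hex(encode("99")), hex(encode("099")))
```

Since no digit ever maps to the hex digit 0, codes of different lengths always produce different values.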
OK, that's all for today. It's a bit of a mess; I'll tidy it up and write more later.