1, Chinese character encoding principle
What should you do if you need to generate Chinese characters at random? Where do the characters come from? One approach is to keep a table in a back-end database that stores every character you need and have the program pick a few at random. That works, but there are a great many Chinese characters, so building such a table is laborious. In fact, no back-end database is needed: the program can generate the characters itself. To see how, you first need to understand how Chinese characters are encoded.
In 1980, to give every Chinese character a nationally unified code, China promulgated its first Chinese character encoding standard: GB2312-80, "Chinese Character Coded Character Set for Information Interchange", GB2312 for short. This character set is the foundation on which Chinese information processing technology in China developed, and it is the unified standard for all Chinese character systems in China. The national standard GB18030-2000, "Extension of the Chinese Character Coded Character Set", was published later. It is the most important Chinese character encoding standard since GB2312-1980 and GB13000-1993, and one of the basic standards that computer systems must follow in the future.
Currently, on the Chinese edition of Windows, the default code page in .NET programming is GB18030 Simplified Chinese. For a Chinese character verification code, however, the GB2312 character set is all that is needed. Besides the characters everyone knows, there are many characters that most of us do not recognize. If a verification code contains many unfamiliar characters, that is bad news for users of a pinyin input method, who cannot type a character they cannot pronounce; Wubi users can at least type it from its shape. For the same reason, not even all of the characters in GB2312 are used.
Chinese characters can be represented by zone-position codes; see
Chinese character zone-position code table: http://navicy2005.home4u.china.com/resource/gb2312tbl.htm
Chinese character zone-position code value table: http://navicy2005.home4u.china.com/resource/gb2312tbm.htm
In fact, the two tables are the same thing: one gives the zone and position in hexadecimal, the other gives them as decimal numbers. For example, the hexadecimal code of "好" is BA C3. The first two digits give the zone and the last two give the position within the zone. BA is zone 26, and "好" sits at position 35 of that zone, which is C3, so its numeric zone-position code is 2635. Each byte is simply the zone or position number plus hexadecimal A0 (decimal 160). This is the principle of GB2312 Chinese character encoding. From the zone-position code table we can see that the zones up to and including zone 15 (AF) contain no Chinese characters, only a small number of symbols. Chinese characters start at zone 16 (B0), which is why the Chinese characters of the GB2312 character set begin at zone 16.
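The relationship between a numeric zone-position code and its two bytes can be sketched in a few lines of C#. This is only an illustration: the class QuweiDemo and the helper ToGb2312Bytes are hypothetical names, not part of .NET or of the original program, and the sketch assumes the A0 offset described above.

```csharp
using System;

static class QuweiDemo
{
    // Hypothetical helper (not a .NET API): converts a zone number and a
    // position number (each 1..94) into the two GB2312 bytes by adding 0xA0.
    public static byte[] ToGb2312Bytes(int zone, int position)
    {
        if (zone < 1 || zone > 94 || position < 1 || position > 94)
            throw new ArgumentOutOfRangeException();
        return new[] { (byte)(0xA0 + zone), (byte)(0xA0 + position) };
    }

    public static void Main()
    {
        // Zone-position code 2635 is zone 26, position 35.
        byte[] code = ToGb2312Bytes(26, 35);
        Console.WriteLine(BitConverter.ToString(code)); // prints BA-C3
    }
}
```

Calling ToGb2312Bytes(16, 1) gives B0 A1, the first Chinese character slot in the table.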
2, How .NET programs process Chinese character encoding
In .NET, the System.Text namespace handles the encodings of all languages. It contains many encoding classes that you can use to work with and convert between encodings. The Encoding class is the one that matters most for Chinese character encoding. Looking up the Encoding class in the .NET documentation, we find that every text encoding works in terms of byte arrays, and two methods are especially useful:
Encoding.GetBytes() encodes all or part of a specified String or character array into a byte array.
Encoding.GetString() decodes a specified byte array into a string.
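A minimal round trip through both methods might look like the sketch below. The class RoundTripDemo and the helper GetGb2312 are hypothetical names used only for illustration. The sketch assumes a runtime where CodePagesEncodingProvider is available: on .NET Core and .NET 5 or later, legacy code pages such as GB2312 must be registered before GetEncoding can find them, while on .NET Framework the registration line can simply be removed.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: returns the GB2312 encoding.
    public static Encoding GetGb2312()
    {
        // Needed on .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("GB2312");
    }

    public static void Main()
    {
        Encoding gb = GetGb2312();

        // Encode: string -> byte array
        byte[] encoded = gb.GetBytes("好");
        Console.WriteLine(BitConverter.ToString(encoded)); // prints BA-C3

        // Decode: byte array -> string
        string decoded = gb.GetString(new byte[] { 0xBA, 0xC3 });
        Console.WriteLine(decoded == "好"); // prints True
    }
}
```

The comparison is printed instead of the character itself because a console that is not set up for Chinese output may not display it correctly.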
With these two methods we can encode a Chinese character into a byte array and, once we know a character's GB2312 byte encoding, decode the byte array back into the character. For example, encoding the character "好" produces a byte array:
Encoding gb = System.Text.Encoding.GetEncoding("GB2312");
byte[] bytes = gb.GetBytes("好");
This yields a byte array bytes of length 2. Using
string lowCode = System.Convert.ToString(bytes[0], 16); // take element 1 as two hexadecimal digits
string hightCode = System.Convert.ToString(bytes[1], 16); // take element 2 as two hexadecimal digits
we find that, in hexadecimal, the contents of bytes are {ba, c3}, which is exactly the hexadecimal zone-position code of "好" (see the zone-position code table). So we can randomly generate a two-element byte array and decode it with GetString() to obtain a Chinese character. For a Chinese character verification code, however, some ranges must be excluded. The zones up to and including zone 15 (AF) contain no Chinese characters, only a small number of symbols, and Chinese characters start at zone 16 (B0). The characters from zone D7 onward are complicated characters that are rarely seen, so they are excluded as well. The first hexadecimal digit of a randomly generated code is therefore B, C, or D, and if the first digit is D, the second digit cannot be 7 or higher. The code table also shows that the first and last positions of each zone are empty, so if the third digit of the generated code is A, the fourth cannot be 0, and if the third digit is F, the fourth cannot be F. With the principle clear, the procedure for randomly generating Chinese characters follows directly. The C# console program in the next section generates 4 random Chinese characters.
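Before the full program, the range rules above can be condensed into a small check on the two bytes. This is a sketch under stated assumptions, not the author's code: the class Gb2312Range and the method IsAllowedCode are hypothetical names, and the sketch assumes that zone D7 itself is excluded and that positions run from A1 to FE, since each zone holds positions 1 to 94 offset by A0.

```csharp
using System;

static class Gb2312Range
{
    // Hypothetical helper: true when the byte pair lies in the range the
    // text allows for a verification code.
    public static bool IsAllowedCode(byte first, byte second)
    {
        // Zone byte: B0..D6 (assumes D7 and later zones are excluded).
        bool zoneOk = first >= 0xB0 && first <= 0xD6;
        // Position byte: A1..FE (A0 and FF are the empty slots).
        bool positionOk = second >= 0xA1 && second <= 0xFE;
        return zoneOk && positionOk;
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowedCode(0xBA, 0xC3)); // True: the code of "好"
        Console.WriteLine(IsAllowedCode(0xAF, 0xA1)); // False: zone 15 holds symbols
        Console.WriteLine(IsAllowedCode(0xD7, 0xA1)); // False: zone D7 is excluded
        Console.WriteLine(IsAllowedCode(0xB0, 0xA0)); // False: empty first slot
    }
}
```

A check like this is also handy for testing a random generator: every pair it produces should pass.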
3, program code:
using System;
using System.Text;

namespace ConsoleApplication
{
    class ChineseCode
    {
        public static void Main()
        {
            // Get the GB2312 code page (table)
            Encoding gb = Encoding.GetEncoding("GB2312");
            // Call the function to generate the encodings of 4 random Chinese characters
            object[] bytes = CreateRegionCode(4);
            // Decode each Chinese character from the byte array holding its encoding
            string str1 = gb.GetString((byte[])Convert.ChangeType(bytes[0], typeof(byte[])));
            string str2 = gb.GetString((byte[])Convert.ChangeType(bytes[1], typeof(byte[])));
            string str3 = gb.GetString((byte[])Convert.ChangeType(bytes[2], typeof(byte[])));
            string str4 = gb.GetString((byte[])Convert.ChangeType(bytes[3], typeof(byte[])));
            // Output to the console
            Console.WriteLine(str1 + str2 + str3 + str4);
        }

        /*
         * This function randomly creates hexadecimal byte arrays of two elements each,
         * within the Chinese character encoding range. Each byte array represents one
         * Chinese character, and the byte arrays are stored in an object array.
         * Parameter: strLength, the number of Chinese characters to generate
         */
        public static object[] CreateRegionCode(int strLength)
        {
            // Define a string array holding the hex digits that make up a Chinese character code
            string[] rBase = new string[16] { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", "e", "f" };
            Random rnd = new Random();
            // Define an object array to hold the generated byte arrays
            object[] bytes = new object[strLength];
            /*
             * Each iteration produces a hexadecimal byte array of two elements
             * and puts it in the object array.
             * The zone-position code of each Chinese character has four digits:
             * digits 1 and 2 form the first element of the byte array,
             * digits 3 and 4 form the second element.
             */
            for (int i = 0; i