Java and Chinese Characters Explained in Detail (A Must-Read)

zhaozj2021-02-08  377

First, let's look at how Tomcat implements JSP. Some background knowledge comes first.

1. Bytes and Unicode

Internally Java is Unicode throughout, even in class files, but many media, including files and streams, store data as byte streams. Java therefore has to convert these byte streams. A char is a Unicode character, while a byte is just a byte. In Java the byte/char conversion routines live in the sun.io package. The ByteToCharConverter class acts as the dispatcher and can tell you which converter you are using. Two commonly used static methods are:

    public static ByteToCharConverter getDefault();
    public static ByteToCharConverter getConverter(String encoding);

If you don't specify a converter, the system automatically uses the current default encoding: GBK on a Chinese (GB) platform, 8859_1 on an English platform.

A simple example: the GB code for "你" ("you") is 0xC4E3, and its Unicode value is 0x4F60.

    String encoding = "gb2312";
    byte b[] = {(byte) '\u00c4', (byte) '\u00e3'};
    ByteToCharConverter converter = ByteToCharConverter.getConverter(encoding);
    char c[] = converter.convertAll(b);
    for (int i = 0; i
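The sun.io converter classes above are internal to the old Sun JDK rather than part of the public API, and current JDKs no longer ship them. A minimal sketch of the same experiment using the public java.nio.charset API might look like this. DecodeDemo and its decode helper are hypothetical names, and since GB2312 may not be installed in every Java runtime, the code checks for it before using it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class DecodeDemo {
    // Decode raw bytes with an explicitly named charset (never the platform default).
    // Charset.decode replaces malformed input with U+FFFD instead of throwing.
    static char[] decode(byte[] bytes, Charset charset) {
        CharBuffer chars = charset.decode(ByteBuffer.wrap(bytes));
        char[] out = new char[chars.remaining()];
        chars.get(out);
        return out;
    }

    public static void main(String[] args) {
        // The GB2312 bytes for "你": 0xC4 0xE3
        byte[] b = {(byte) 0xC4, (byte) 0xE3};
        for (String name : new String[] {"GB2312", "ISO-8859-1", "UTF-8"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            StringBuilder hex = new StringBuilder();
            for (char c : decode(b, Charset.forName(name))) {
                hex.append(Integer.toHexString(c)).append(' ');
            }
            System.out.println(name + ": " + hex.toString().trim());
        }
    }
}
```

Run as-is, GB2312 yields the single char 4f60, ISO-8859-1 yields the two chars c4 and e3, and UTF-8 yields replacement characters (fffd) because 0xC4 0xE3 is not valid UTF-8.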

Many programs rarely specify an encoding and simply rely on the default one, which makes porting them very difficult.

2. UTF-8

UTF-8 maps onto Unicode in a simple way:

    Unicode up to 7 bits:  0xxxxxxx
    Unicode up to 11 bits: 110xxxxx 10xxxxxx
    Unicode up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
    Unicode up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

In practice most characters need only 16 bits or fewer. Let's keep using the same example: the GB code for "你" is 0xC4E3, and its Unicode value is 0x4F60.

Example 1: reading the GB bytes 0xC4E3 as UTF-8. In binary:

    11000100 11100011

There are only two bytes, so we try to read them with the two-byte pattern (110xxxxx 10xxxxxx). That doesn't work: the second byte must begin with 10, but its 7th bit (counting from the right) is 1 rather than 0. The decoder therefore returns "?".

Example 2: encoding the Unicode value 0x4F60 as UTF-8. In binary:

    01001111 01100000

This needs more than 11 bits, so we fill the bits into the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx, which gives:

    11100100 10111101 10100000
    E4       BD       A0

So the result is 0xE4, 0xBD, 0xA0.

3. String and byte[]

At its core a String is a char[], so converting bytes into a String always requires an encoding. String.length() is the length of that char array; if you decode the same bytes with a different encoding, characters can be split apart, producing garbled text. For example:

    byte[] b = {(byte) '\u00c4', (byte) '\u00e3'};
    String str = new String(b, encoding);

With encoding = "8859_1" this yields two characters, but with encoding = "GB2312" only one. This problem comes up frequently when paginating text.

4. Reader, Writer / InputStream, OutputStream

Reader and Writer are built around char, whereas InputStream and OutputStream are built around byte.
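The UTF-8 bit layouts from point 2 can be turned directly into an encoder. The sketch below is illustrative only: Utf8Layout, encode, and isContinuation are hypothetical names, and it does not reject surrogate code points the way a real encoder must. It reproduces Example 2 and checks the rule that makes Example 1 fail.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Layout {
    // Encode one code point using the UTF-8 bit layouts:
    //   up to 7 bits  -> 0xxxxxxx
    //   up to 11 bits -> 110xxxxx 10xxxxxx
    //   up to 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
    //   up to 21 bits -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    // Sketch only: surrogates (U+D800..U+DFFF) are not rejected.
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a code point: " + cp);
        }
        if (cp < 0x80) {
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    // Every byte after the first in a multi-byte sequence must match 10xxxxxx
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        byte[] you = encode(0x4F60); // "你"
        for (byte x : you) {
            System.out.printf("%02X ", x & 0xFF);
        }
        System.out.println(); // E4 BD A0
        // Cross-check against the JDK's own UTF-8 encoder
        System.out.println(Arrays.equals(you, "\u4F60".getBytes(StandardCharsets.UTF_8)));
        // Example 1: 0xE3 cannot follow the lead byte 0xC4 because it is not 10xxxxxx
        System.out.println(isContinuation((byte) 0xE3)); // false
    }
}
```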

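Point 3 can be checked with the public String(byte[], Charset) constructor. In this sketch StringLengthDemo and lengthAs are hypothetical names. It decodes the same two bytes with two charsets and compares the resulting lengths, then shows that String.length() counts chars rather than bytes, which is presumably why code that paginates by byte count runs into trouble.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringLengthDemo {
    // Decode the bytes with the given charset and report the resulting char count
    static int lengthAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // The GB2312 bytes for "你"
        byte[] b = {(byte) 0xC4, (byte) 0xE3};
        System.out.println("ISO-8859-1 length: " + lengthAs(b, StandardCharsets.ISO_8859_1));
        if (Charset.isSupported("GB2312")) {
            System.out.println("GB2312 length: " + lengthAs(b, Charset.forName("GB2312")));
        }
        // One char, but more than one byte once it is encoded
        String you = "\u4F60";
        System.out.println(you.length() + " char, "
                + you.getBytes(StandardCharsets.UTF_8).length + " UTF-8 bytes");
    }
}
```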
The main job of Reader and Writer, however, is to read and write chars from and to an InputStream or OutputStream. An example of reading: the file test.txt contains a single character, "你", stored as the bytes 0xC4, 0xE3.

    String encoding = ;
    InputStreamReader reader = new InputStreamReader(
        new FileInputStream("test.txt"), encoding);
    char[] c = new char[10];
    int length = reader.read(c);
    for (int i = 0; i
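A self-contained sketch of point 4 that reads from an in-memory ByteArrayInputStream instead of test.txt, so it runs without the file. ReaderWriterDemo, read, and write are hypothetical names. InputStreamReader decodes bytes into chars and OutputStreamWriter encodes chars back into bytes, each given an explicit Charset instead of the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderWriterDemo {
    // Encode text into bytes through a Writer with an explicit charset
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above flushed its buffered output into 'out'
        return out.toByteArray();
    }

    // Decode bytes into a String through a Reader with an explicit charset
    static String read(byte[] data, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buf = new char[10];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same bytes as test.txt: "你" in GB2312
        byte[] gb = {(byte) 0xC4, (byte) 0xE3};
        System.out.println("as ISO-8859-1: " + read(gb, StandardCharsets.ISO_8859_1).length() + " chars");
        if (Charset.isSupported("GB2312")) {
            String s = read(gb, Charset.forName("GB2312"));
            System.out.println("as GB2312: " + Integer.toHexString(s.charAt(0)));
        }
        byte[] utf8 = write("\u4F60", StandardCharsets.UTF_8);
        System.out.println("written as UTF-8: " + utf8.length + " bytes");
    }
}
```

Reading in a loop until read returns -1, rather than relying on a single read call as the fragment above does, avoids assuming that one call fills the buffer.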

When reposting, please credit the original address: https://www.9cbs.com/read-949.html
