Reprinted: Java Chinese-character problems in detail, an anatomy of the underlying encoding

xiaoxiao2021-03-06 139

Java Chinese-character issues in detail:

1. Bytes and Unicode

Internally Java works in Unicode, and even class files are Unicode, but many media, including files and streams, store data as byte streams. Java therefore has to convert these byte streams. A char is Unicode, and a byte is a byte. The byte/char conversion functions in Java live in the sun.io package. The ByteToCharConverter class is the dispatcher: it can tell you which converter you are using. Two of its most commonly used static functions are:

    public static ByteToCharConverter getDefault();
    public static ByteToCharConverter getConverter(String encoding);

If you do not specify a converter, the system automatically uses the current platform encoding: GBK on a GB (Chinese) platform, 8859_1 on an EN (English) platform.

Let's look at a simple example. The GB code of "你" is 0xC4E3 and its Unicode is 0x4F60. You use:

    encoding = "gb2312";
    byte b[] = {(byte) '\u00c4', (byte) '\u00E3'};
    converter = ByteToCharConverter.getConverter(encoding);
    char[] c = converter.convertAll(b);
    for (int i = 0; i
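The example above relies on sun.io.ByteToCharConverter, an internal class that, as far as we know, current JDKs no longer ship. As a rough sketch of the same byte-to-char step, the snippet below uses the public java.nio.charset API instead. The class name ByteToCharDemo and the helper decode are made up for illustration, and the sketch assumes the JRE provides the GB2312 charset (it belongs to the extended charsets, so a trimmed-down runtime may lack it).

```java
import java.nio.charset.Charset;

class ByteToCharDemo {
    // Hypothetical helper: decodes raw bytes into chars with an explicitly
    // named charset instead of relying on the platform default.
    static char[] decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName)).toCharArray();
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xC4, (byte) 0xE3};
        for (String name : new String[] {"GB2312", "ISO-8859-1"}) {
            StringBuilder hex = new StringBuilder();
            for (char ch : decode(b, name)) {
                hex.append(Integer.toHexString(ch)).append(' ');
            }
            // GB2312 yields the single char 4f60; ISO-8859-1 yields c4 e3.
            System.out.println(name + ": " + hex.toString().trim());
        }
        // What you get when no encoding is specified differs per machine.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Naming the charset explicitly, as decode does, is what keeps the result the same on a Chinese and an English platform.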

Many programs rarely specify an encoding and simply use the default one, which makes porting them very difficult.

2. UTF-8

UTF-8 maps directly onto Unicode, and the scheme is very simple:

    7-bit Unicode:  0 _ _ _ _ _ _ _
    11-bit Unicode: 1 1 0 _ _ _ _ _  1 0 _ _ _ _ _ _
    16-bit Unicode: 1 1 1 0 _ _ _ _  1 0 _ _ _ _ _ _  1 0 _ _ _ _ _ _
    21-bit Unicode: 1 1 1 1 0 _ _ _  1 0 _ _ _ _ _ _  1 0 _ _ _ _ _ _  1 0 _ _ _ _ _ _

In most cases only Unicode values of 16 bits or fewer are involved.

We keep using the same example: the GB code of "你" is 0xC4E3 and its Unicode is 0x4F60.

Example 1: 0xC4E3 in binary is

    1100 0100 1110 0011

Since there are only two bytes, we try to read them with the two-byte layout, but we find that this does not work: a continuation byte must start with 10, and the 7th bit of the second byte is not 0. Therefore "?" is returned.

Example 2: 0x4F60 in binary is

    0100 1111 0110 0000

We fill this into the 16-bit UTF-8 layout, and it becomes:

    11100100 10111101 10100000
    E4       BD       A0

So the bytes 0xE4, 0xBD, 0xA0 are returned.

3. String and byte[]

The core of a String is actually a char[]. However, to convert bytes into a String, the bytes must go through an encoding. String.length() is really the length of the char array, so if a different encoding is used, the bytes may be split at the wrong boundaries, producing broken and garbled characters.

Example:

    byte[] b = {(byte) '\u00c4', (byte) '\u00e3'};
    String str = new String(b, encoding);

If encoding = 8859_1, the result has two characters, but with encoding = gb2312 it has only one. This problem comes up frequently when handling pagination.

4. Reader, Writer / InputStream, OutputStream

Reader and Writer are built around char; InputStream and OutputStream are built around byte.
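Stepping back to sections 2 and 3 for a moment: the UTF-8 bit layouts and the String.length() point can both be checked with a small sketch. The class Utf8Demo and its helpers encodeUtf8 and decodedLength are hypothetical names, not part of any library. The hand-written encoder only illustrates the bit packing and does not validate its input (it does not reject surrogates, for example). GB2312 is again assumed to be available in the JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Demo {
    // Packs one Unicode code point into UTF-8 bytes by hand,
    // following the 7/11/16/21-bit layouts. No input validation.
    static byte[] encodeUtf8(int cp) {
        if (cp < 0x80) {
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    // Number of chars produced when the same bytes are decoded
    // with the named charset.
    static int decodedLength(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName)).length();
    }

    public static void main(String[] args) {
        byte[] manual = encodeUtf8(0x4F60);
        byte[] library = "\u4f60".getBytes(StandardCharsets.UTF_8);
        for (byte x : manual) {
            System.out.printf("%02X ", x & 0xFF); // E4 BD A0
        }
        System.out.println();
        System.out.println(Arrays.equals(manual, library)); // true

        byte[] gb = {(byte) 0xC4, (byte) 0xE3};
        System.out.println(decodedLength(gb, "ISO-8859-1")); // 2
        System.out.println(decodedLength(gb, "GB2312"));     // 1
    }
}
```

The two lengths at the end are why code that cuts text at a fixed length needs to know which encoding produced the String.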

However, the main purpose of Reader and Writer is to read chars from, and write chars to, an InputStream / OutputStream.

A reader example: the file test.txt contains only the single character "你", that is, the bytes 0xC4, 0xE3.

    String encoding = ;
    InputStreamReader reader = new InputStreamReader(
        new FileInputStream("text.txt"), encoding);
    char[] c = new char[10];
    int length = reader.read(c);
    for (int i = 0; i
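To make the reader example concrete, here is a minimal, self-contained sketch. It writes the two bytes 0xC4 0xE3 to a temporary file (standing in for the text file above) and reads them back through an InputStreamReader with an explicit charset. ReaderDemo and readSample are illustrative names only, the temporary file is our own assumption, and GB2312 is again assumed to be present in the JRE.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class ReaderDemo {
    // Hypothetical helper: stores the bytes 0xC4 0xE3 in a temporary file,
    // then reads them back as chars using the named charset.
    static String readSample(String charsetName) {
        try {
            Path file = Files.createTempFile("text", ".txt");
            try {
                Files.write(file, new byte[] {(byte) 0xC4, (byte) 0xE3});
                try (Reader reader = new InputStreamReader(
                        new FileInputStream(file.toFile()),
                        Charset.forName(charsetName))) {
                    char[] c = new char[10];
                    int length = reader.read(c);
                    return length < 0 ? "" : new String(c, 0, length);
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"GB2312", "ISO-8859-1"}) {
            StringBuilder hex = new StringBuilder();
            for (char ch : readSample(name).toCharArray()) {
                hex.append(Integer.toHexString(ch)).append(' ');
            }
            // GB2312 gives one char (4f60); ISO-8859-1 gives two (c4 e3).
            System.out.println(name + ": " + hex.toString().trim());
        }
    }
}
```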

But on an English platform, the default CharToByteConverter is 8859_1. FileWriter automatically uses 8859_1 to convert str, but it cannot represent the character, so it outputs "?".

2. When the program is compiled on an English platform, str at run time is the char[] 0x00c4 0x00e3. Run on a Chinese platform, these chars cannot be recognized, so "??" appears. Run on an English platform, 0x00c4 -> 0xc4 and 0x00e3 -> 0xe3, so 0xc4, 0xe3 are written to the file.

1. How the text of a JSP is interpreted

Tomcat first checks whether your page contains a "<%@ page" directive. If it does, Tomcat sets response.setContentType(..) at that same place and reads the file according to that encoding; if not, it reads the file as 8859_1. It then writes the result out as a .java file in UTF-8 and uses sun.tools.Main to read this file (reading it as UTF-8, of course) and compile it into a class file. setContentType changes the properties of out; the default encoding of the out variable is 8859_1.

2. How parameters are interpreted

Unfortunately, parameters are only ever interpreted as ISO8859_1. You can confirm this in the server's implementation code.

3. How include is interpreted

Very unfortunately, whoever wrote "org.apache.jasper.compiler.Parser" forgot to add one parameter, encoding, to the array JspUtil.ValidAttribute[], so this is not supported. You can recompile the source code with support for encoding added.

If you are on NT, the easiest method is to trick Java by not adding any encoding settings at all:

    Hello <%= request.getParameter("value") %>

    http://localhost/test/test.jsp?value=你

    Result: Hello 你

But this method is limited. For example, when an uploaded article has to be split into pages, this approach fails. The best solution is to use this scheme:

    <%@ page contentType="text/html; charset=gb2312" %>
    Hello <%= new String(request.getParameter("value").getBytes("8859_1"), "gb2312") %>

Blog:

http://blog.9cbs.net/feng_sundy/
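The parameter workaround and the "?" output described above can be tried outside a servlet container. The sketch below simulates a container that decoded GB2312 bytes as ISO-8859-1 and then undoes the damage the same way the JSP expression does. ParamRecodeDemo, recode and encodeLatin1 are hypothetical names for illustration, and GB2312 is assumed to be available in the JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ParamRecodeDemo {
    // Hypothetical helper: recovers the original bytes of a string that was
    // wrongly decoded as ISO-8859-1, then decodes them with the real charset.
    static String recode(String misdecoded, String realCharsetName) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(realCharsetName));
    }

    // Encodes text as ISO-8859-1; chars it cannot represent become '?'.
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // What the page sees when the browser sent 0xC4 0xE3 and the
        // container decoded the bytes as ISO-8859-1.
        String seen = new String(
            new byte[] {(byte) 0xC4, (byte) 0xE3}, StandardCharsets.ISO_8859_1);
        System.out.println(seen.length());                   // 2
        System.out.println(recode(seen, "GB2312").length()); // 1

        // Writing a Chinese char through an 8859_1 encoder loses it.
        byte[] lost = encodeLatin1("\u4f60");
        System.out.println((char) lost[0]);                  // ?
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one char, so the original bytes can always be recovered.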

When reprinting, please cite the original address: https://www.9cbs.com/read-99362.html
