Some solutions and experiences with garbled Chinese text

xiaoxiao2021-03-06  79

1. byte and Unicode

Java is Unicode at its core, and even class files are Unicode, but many media, including files and streams, store data as byte streams. Java therefore has to convert these byte streams. A char is Unicode, and a byte is a byte. The byte/char conversion functions in Java live in the sun.io package. The ByteToCharConverter class does the dispatching and can tell you which converter you are using. Two of its most commonly used static functions are:

    public static ByteToCharConverter getDefault();
    public static ByteToCharConverter getConverter(String encoding);

If you do not specify a converter, the system automatically uses the current platform encoding: GBK on a GB (Chinese) platform, 8859_1 on an English platform.

byte -> char: the GB code of "你" ("you") is 0xC4E3 and its Unicode value is 0x4F60. A demonstration follows:

    import sun.io.*;
    import java.io.*;

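    public class ByteToChar {

        public static void main(String[] args) {
            String encoding = "gb2312";
            byte b[] = {(byte) '\u00c4', (byte) '\u00e3'};
            try {
                ByteToCharConverter converter = ByteToCharConverter.getConverter(encoding);
                char c[] = converter.convertAll(b);
                // Print each converted char in hex. The loop body is cut off in the
                // source listing; it is completed here to match the results quoted below.
                for (int i = 0; i < c.length; i++) {
                    System.out.println(Integer.toHexString(c[i]));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }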

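What is the result? 0x4f60.

If encoding = "8859_1", what is the result? 0x00c4, 0x00e3.

If the code is changed to use the default converter:

    byte b[] = {(byte) '\u00c4', (byte) '\u00e3'};
    ByteToCharConverter converter = ByteToCharConverter.getDefault();
    char c[] = converter.convertAll(b);
    for (int i = 0; i < c.length; i++) {
        System.out.println(Integer.toHexString(c[i]));
    }

then the result follows the platform's current encoding.

char -> byte:

    String encoding = "gb2312";
    char c[] = {'\u4f60'};
    CharToByteConverter converter = CharToByteConverter.getConverter(encoding);
    byte b[] = converter.convertAll(c);
    for (int i = 0; i < b.length; i++) {
        // Mask with 0xff so that negative bytes print as two hex digits.
        System.out.println(Integer.toHexString(b[i] & 0xff));
    }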
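The sun.io converters are internal classes and are not available in current JDKs. As a rough equivalent, the same byte -> char and char -> byte conversions can be sketched with the public java.nio.charset.Charset API. The class and method names below (ByteCharDemo, toChars, toBytes, hex) are made up for this illustration, and the sketch assumes a JDK that ships the GB2312 charset.

```java
import java.nio.charset.Charset;

/** Sketch: the byte <-> char conversions of section 1 with the public Charset API. */
class ByteCharDemo {

    /** byte -> char: decode raw bytes with the named encoding. */
    static char[] toChars(byte[] bytes, String encoding) {
        return new String(bytes, Charset.forName(encoding)).toCharArray();
    }

    /** char -> byte: encode chars with the named encoding. */
    static byte[] toBytes(char[] chars, String encoding) {
        return new String(chars).getBytes(Charset.forName(encoding));
    }

    /** Format chars as space-separated 16-bit hex values. */
    static String hex(char[] chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("0x%04x", (int) c));
        }
        return sb.toString();
    }

    /** Format bytes as space-separated 8-bit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xc4, (byte) 0xe3};
        System.out.println(hex(toChars(b, "GB2312")));      // 0x4f60
        System.out.println(hex(toChars(b, "ISO-8859-1")));  // 0x00c4 0x00e3
        System.out.println(hex(toBytes(new char[] {'\u4f60'}, "GB2312")));  // 0xc4 0xe3
    }
}
```

Passing the encoding explicitly, rather than relying on the platform default, is what keeps the result the same on Chinese and English platforms.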

2. UTF-8

UTF-8 corresponds directly to Unicode, and the mapping is very simple:

    7-bit Unicode:  0_______
    11-bit Unicode: 110_____ 10______
    16-bit Unicode: 1110____ 10______ 10______
    21-bit Unicode: 11110___ 10______ 10______ 10______

In most cases only Unicode values of 16 bits or fewer are used.

The GB code of "你" is 0xC4E3 and its Unicode value is 0x4F60.

Binary of 0xC4E3: 1100 0100, 1110 0011. Since there are only two bytes, we try to read them as the two-byte form, but this does not work: the second byte would have to begin with 10, and its 7th bit is not 0. Therefore "?" is returned.

Binary of 0x4F60: 0100 1111, 0110 0000. Padded into the UTF-8 form it becomes 1110 0100, 1011 1101, 1010 0000, that is E4 BD A0, so the result is 0xE4, 0xBD, 0xA0.

3. String and byte[]

The core of a String is actually a char[]. To convert bytes into a String, however, they must be decoded with some encoding. String.length() is actually the length of the char array, so if a different encoding is used, one character may be split into several, producing broken, garbled text. For example:

    String encoding = "";
    byte[] b = {(byte) '\u00c4', (byte) '\u00e3'};
    String str = new String(b, encoding);

If encoding = 8859_1, there will be two characters, but with encoding = GB2312 there is only one. This problem occurs frequently when handling pagination.

4. Reader, Writer / InputStream, OutputStream

The core of Reader and Writer is char; the core of InputStream and OutputStream is byte. The main purpose of Reader and Writer, however, is to read chars from an InputStream and write chars to an OutputStream. For example, suppose the file text.txt contains only the single character "你", that is 0xC4, 0xE3:

    String encoding = "gb2312";
    InputStreamReader reader = new InputStreamReader(new FileInputStream("text.txt"), encoding);
    char c[] = new char[10];
    int length = reader.read(c);
    // Print each char that was read, in hex (the loop body is cut off in the source listing).
    for (int i = 0; i < length; i++) {
        System.out.println(Integer.toHexString(c[i]));
    }
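The points in sections 2 to 4 can be checked with a small sketch. It encodes a code point by hand using the UTF-8 bit layouts, compares String lengths under two encodings, and reads bytes through an InputStreamReader with an explicit encoding. The names (Utf8LayoutDemo, encodeUtf8, decodedLength, readAll) are invented for this illustration, an in-memory byte array stands in for the file, and the GB2312 charset is assumed to be available in the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8LayoutDemo {

    /** Encode one code point by hand, following the bit layouts of section 2.
     *  Surrogates and out-of-range values are not rejected in this sketch. */
    static byte[] encodeUtf8(int cp) {
        if (cp < 0x80) {            // up to 7 bits:  0_______
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {    // up to 11 bits: 110_____ 10______
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {  // up to 16 bits: 1110____ 10______ 10______
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {                    // up to 21 bits: 11110___ 10______ 10______ 10______
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }

    /** Section 3: number of chars obtained when the bytes are decoded with the given encoding. */
    static int decodedLength(byte[] bytes, String encoding) {
        return new String(bytes, Charset.forName(encoding)).length();
    }

    /** Section 4: read chars through an InputStreamReader with an explicit encoding. */
    static String readAll(byte[] bytes, String encoding) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), encoding)) {
            char[] buf = new char[10];
            int length = reader.read(buf);
            return length < 0 ? "" : new String(buf, 0, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gb = {(byte) 0xc4, (byte) 0xe3};
        // The hand-made encoding agrees with the JDK's UTF-8 encoder for U+4F60.
        System.out.println(Arrays.equals(encodeUtf8(0x4f60),
                "\u4f60".getBytes(StandardCharsets.UTF_8)));  // true
        System.out.println(decodedLength(gb, "ISO-8859-1"));  // 2
        System.out.println(decodedLength(gb, "GB2312"));      // 1
        System.out.println(Integer.toHexString(readAll(gb, "gb2312").charAt(0)));  // 4f60
    }
}
```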

5. What we need to know about the Java compiler: javac -encoding

We often leave out the -encoding parameter. In fact, the encoding matters a great deal for cross-platform work. If you do not specify an encoding, the system default is used: GB2312 on a GB (Chinese) platform and ISO8859_1 on an English platform. The Java compiler actually calls the sun.tools.javac.Main class to compile files. That class has an encoding variable in its compile function, and the -encoding parameter is passed straight to that variable. The compiler reads the Java source file according to this variable and then compiles it into a class file in UTF-8 form.

Example code:

    String str = "你";
    FileWriter writer = new FileWriter("text.txt");
    writer.write(str);
    writer.close();

If you compile with GB2312, you will find the bytes E4 BD A0 in the class file.

If you compile with 8859_1, the two source bytes become two characters, in binary 0000 0000 1100 0100 and 0000 0000 1110 0011. Each character needs more than 7 bits, so the 11-bit form is used: 1100 0011, 1000 0100, 1100 0011, 1010 0011, that is C3 84 C3 A3. You will find C3 84 C3 A3 in the class file.

But we tend to ignore this parameter, which often causes cross-platform problems:

- The sample code compiled on a Chinese platform produces ZhClass.
- The sample code compiled on an English platform produces EnClass.

(1) ZhClass runs correctly on the Chinese platform, but not on the English platform.
(2) EnClass runs correctly on the English platform, but not on the Chinese platform.

Reasons:

(1) After compiling on the Chinese platform, the char[] of str at run time is 0x4F60. On the Chinese platform the default encoding of FileWriter is GB2312, so CharToByteConverter automatically calls the GB2312 converter to turn str into bytes for the FileOutputStream, and 0xC4, 0xE3 is written to the file. On the English platform, however, the default of CharToByteConverter is 8859_1. FileWriter automatically calls the 8859_1 converter to convert str, but that encoding cannot represent the character, so "?" is output.

(2) After compiling on the English platform, the char[] of str at run time is 0x00C4 0x00E3. When run on the Chinese platform these characters cannot be recognized, so "?" appears. On the English platform, 0x00C4 -> 0xC4 and 0x00E3 -> 0xE3, so 0xC4, 0xE3 is written to the file.

6. Other causes

    <%@ page contentType="text/html; charset=GBK" %>

This sets the encoding the browser uses for display. If the response is actually UTF-8 encoded, the display will be garbled, but this kind of garbling has a different cause from the ones above.
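The compile-time and run-time effects described in section 5 can be imitated without invoking javac. In the sketch below, decoding the source bytes stands in for what the -encoding value does, and encoding with a named charset stands in for the platform default used by FileWriter. The names (CompileEncodingDemo, compiledLiteral, written) are hypothetical, and GBK is used as the assumed default of the Chinese platform. Class files store strings in a modified UTF-8, which is identical to standard UTF-8 for the characters used here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CompileEncodingDemo {

    // The literal "你" as it is stored in a source file saved in GB2312.
    static final byte[] SOURCE_BYTES = {(byte) 0xc4, (byte) 0xe3};

    /** What the compiler sees: the source bytes decoded with the -encoding value. */
    static String compiledLiteral(String sourceEncoding) {
        return new String(SOURCE_BYTES, Charset.forName(sourceEncoding));
    }

    /** What a writer using the given platform encoding would put into the file.
     *  Characters the encoding cannot represent are replaced with '?'. */
    static byte[] written(String literal, String platformEncoding) {
        return literal.getBytes(Charset.forName(platformEncoding));
    }

    /** Format bytes as space-separated hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zh = compiledLiteral("GB2312");      // compiled on the Chinese platform
        String en = compiledLiteral("ISO-8859-1");  // compiled on the English platform

        // Bytes stored in the class file.
        System.out.println(hex(zh.getBytes(StandardCharsets.UTF_8)));  // e4 bd a0
        System.out.println(hex(en.getBytes(StandardCharsets.UTF_8)));  // c3 84 c3 a3

        // Bytes written at run time under each platform default.
        System.out.println(hex(written(zh, "GBK")));         // c4 e3
        System.out.println(hex(written(zh, "ISO-8859-1")));  // 3f, that is "?"
        System.out.println(hex(written(en, "ISO-8859-1")));  // c4 e3
        System.out.println(hex(written(en, "GBK")));
    }
}
```

Compiling with an explicit -encoding and opening files with an explicit charset removes both dependencies on the platform default.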

7. Places where encoding conversion happens:

- From the database to the Java program: byte -> char
- From the Java program to the database: char -> byte
- From a file to the Java program: byte -> char
- From the Java program to a file: char -> byte
- From the Java program to the page display: char -> byte
- From a submitted page form to the Java program: byte -> char
- From a stream to the Java program: byte -> char
- From the Java program to a stream: char -> byte

Xie Zhigang's workaround:

I solve the garbled Chinese problem by configuring a filter:

    <filter>
        <filter-name>RequestFilter</filter-name>
        <filter-class>net.golden.uirs.util.RequestFilter</filter-class>
        <init-param>
            <param-name>charset</param-name>
            <param-value>gb2312</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>RequestFilter</filter-name>
        <url-pattern>*.jsp</url-pattern>
    </filter-mapping>

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain fChain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        HttpSession session = request.getSession();
        String userId = (String) session.getAttribute("UserID");
        // Set the character set; in effect this sets the encoding used for byte -> char.
        req.setCharacterEncoding(this.filterConfig.getInitParameter("charset"));
        try {
            if (userId == null || userId.equals("")) {
                if (!request.getRequestURL().toString().matches(
                        ".*/uirs/logon/logon(Controller){0,1}\\x2Ejsp$")) {
                    session.invalidate();
                    response.sendRedirect(request.getContextPath() + "/uirs/logon/logon.jsp");
                    // (the listing is cut off at this point in the source)
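To see why the request encoding matters for the "page form to Java program" step, here is a minimal sketch that uses only the standard library instead of the servlet API. It decodes a percent-encoded form value with two different encodings and shows a common repair for text that was decoded as 8859_1 by mistake. The class and method names (FormDecodingDemo, decodeForm, redecode) are invented for this example, and a servlet container's own parameter parsing is not shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormDecodingDemo {

    /** byte -> char for a form value: percent-decode, then interpret the bytes with the given encoding. */
    static String decodeForm(String raw, String encoding) {
        try {
            return URLDecoder.decode(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported encoding: " + encoding, e);
        }
    }

    /** Repair text that was decoded as ISO-8859-1 although its bytes were in another encoding.
     *  This works because ISO-8859-1 maps every byte to exactly one char and back. */
    static String redecode(String misdecoded, String actualEncoding) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, Charset.forName(actualEncoding));
    }

    public static void main(String[] args) {
        String raw = "%C4%E3";  // "你" submitted by a GB2312 page

        String right = decodeForm(raw, "GB2312");
        String wrong = decodeForm(raw, "ISO-8859-1");

        System.out.println(right.length());  // 1
        System.out.println(wrong.length());  // 2
        System.out.println(redecode(wrong, "GB2312").equals(right));  // true
    }
}
```

Setting the encoding once in a filter, before any parameter is read, avoids having to repair each value afterwards.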

When reprinting, please cite the original address: https://www.9cbs.com/read-120236.html
