Solutions and experience for garbled Chinese text in Java

xiaoxiao2021-03-06  14

I. Byte and Unicode

Inside Java everything is Unicode, including class files. Many media, however, including files and streams, store data as bytes. Java therefore has to convert these byte streams: a char is Unicode, while a byte is just a byte. In Java, byte/char conversion is handled by the sun.io package. The ByteToCharConverter class does the dispatching and can tell you which converter is in use. Its two most commonly used static methods are:

    public static ByteToCharConverter getDefault();
    public static ByteToCharConverter getConverter(String encoding);

If you do not specify a converter, the system uses the platform's current encoding. That is GBK on a Chinese (GB) platform and ISO 8859-1 ("8859_1") on an English platform.

Byte -> char: the GB code of "你" ("you") is 0xC4E3, and its Unicode code point is 0x4F60.

    String encoding = "GB2312";
    byte b[] = {(byte) '\u00c4', (byte) '\u00e3'};
    ByteToCharConverter converter = ByteToCharConverter.getConverter(encoding);
    char c[] = converter.convertAll(b);
    for (int i = 0; i < c.length; i++) {
        System.out.println(Integer.toHexString(c[i]));
    }

The result is 0x4F60. If encoding = "8859_1", what is the result? 0x00C4, 0x00E3.
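The same byte -> char step can be sketched with the public java.nio.charset API. sun.io.ByteToCharConverter is an internal class that current JDKs no longer ship. This is a minimal sketch, not the article's code. The class name ByteToCharDemo and the helper decodeToHex are hypothetical, and StandardCharsets.ISO_8859_1 stands in for the "8859_1" name used above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only: byte -> char with java.nio.charset instead of the internal
// sun.io.ByteToCharConverter. Class and method names are hypothetical.
class ByteToCharDemo {

    // Decodes the bytes with the given charset and returns each resulting
    // char as a lowercase hex string, e.g. "4f60".
    static String[] decodeToHex(byte[] bytes, Charset charset) {
        String s = new String(bytes, charset);
        String[] hex = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            hex[i] = Integer.toHexString(s.charAt(i));
        }
        return hex;
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xC4, (byte) 0xE3}; // GB2312 bytes of U+4F60
        // GB2312: the two bytes form one char
        System.out.println(String.join(",", decodeToHex(b, Charset.forName("GB2312"))));  // 4f60
        // ISO-8859-1: each byte becomes its own char
        System.out.println(String.join(",", decodeToHex(b, StandardCharsets.ISO_8859_1))); // c4,e3
    }
}
```

ISO-8859-1 maps every byte 0xNN to the char U+00NN. That is why the 8859_1 result is simply the input bytes widened to two chars.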

If the code is changed to:

    byte b[] = {(byte) '\u00c4', (byte) '\u00e3'};
    ByteToCharConverter converter = ByteToCharConverter.getDefault();
    char c[] = converter.convertAll(b);
    for (int i = 0; i < c.length; i++) {
        System.out.println(Integer.toHexString(c[i]));
    }

what will the result be? That depends on the platform's encoding.
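The closest public counterpart of getDefault() is Charset.defaultCharset(). Below is a minimal sketch; DefaultCharsetDemo and decodeWithDefault are hypothetical names. What it prints depends on the JVM. On older JDKs the default follows the platform locale. Since JDK 18 it is UTF-8 unless overridden.

```java
import java.nio.charset.Charset;

// Sketch: decoding with the JVM's default charset, the public analogue of
// ByteToCharConverter.getDefault(). Output differs between platforms/JDKs.
class DefaultCharsetDemo {

    // Decodes with whatever charset this JVM uses by default.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xC4, (byte) 0xE3};
        System.out.println("default charset: " + Charset.defaultCharset().name());
        String s = decodeWithDefault(b);
        for (int i = 0; i < s.length(); i++) {
            System.out.println(Integer.toHexString(s.charAt(i)));
        }
    }
}
```

Passing an explicit Charset everywhere removes this platform dependence. That is the general remedy the rest of the article argues for.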

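Char -> byte:

    String encoding = "GB2312";
    char c[] = {'\u4f60'};
    CharToByteConverter converter = CharToByteConverter.getConverter(encoding);
    byte b[] = converter.convertAll(c);
    for (int i = 0; i < b.length; i++) {
        System.out.println(Integer.toHexString(b[i] & 0xFF));
    }

The result is 0xC4, 0xE3. If encoding = "8859_1", what is the result? 0x3F, that is, "?".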
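This sketch shows the char -> byte direction with String.getBytes(Charset); CharToByteDemo and encodeToHex are hypothetical names. getBytes replaces unmappable characters with the charset's replacement bytes. For ISO-8859-1 that replacement is "?" (0x3F), which matches the 8859_1 result above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: char -> byte with String.getBytes(Charset). Names are hypothetical.
class CharToByteDemo {

    // Encodes the string and returns each byte as two hex digits. Masking
    // with 0xFF avoids sign extension (0xC4 would otherwise print as ffffffc4).
    static String[] encodeToHex(String s, Charset charset) {
        byte[] bytes = s.getBytes(charset);
        String[] hex = new String[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            hex[i] = String.format("%02x", bytes[i] & 0xFF);
        }
        return hex;
    }

    public static void main(String[] args) {
        String s = "\u4f60";
        System.out.println(String.join(",", encodeToHex(s, Charset.forName("GB2312"))));   // c4,e3
        System.out.println(String.join(",", encodeToHex(s, StandardCharsets.ISO_8859_1))); // 3f
    }
}
```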

If the code is changed to:

    char c[] = {'\u4f60'};
    CharToByteConverter converter = CharToByteConverter.getDefault();
    byte b[] = converter.convertAll(c);
    for (int i = 0; i < b.length; i++) {
        System.out.println(Integer.toHexString(b[i] & 0xFF));
    }

what will the result be? Again, it depends on the platform's encoding.

Many Chinese-text problems stem from these two simplest classes. Yet many classes do not accept an encoding argument directly, which makes things less convenient. Many programs also rarely specify an encoding at all and simply use the default, which causes a lot of trouble.

II. UTF-8

UTF-8 maps to Unicode in a very simple way:

    7-bit Unicode:  0xxxxxxx
    11-bit Unicode: 110xxxxx 10xxxxxx
    16-bit Unicode: 1110xxxx 10xxxxxx 10xxxxxx
    21-bit Unicode: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

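In most cases only Unicode characters of 16 bits or fewer occur.

The GB code of "你" is 0xC4E3, and its Unicode code point is 0x4F60.

0xC4E3 in binary: 1100 0100, 1110 0011

Suppose these two bytes are read as UTF-8. The first byte, 1100 0100, matches the two-byte pattern 110xxxxx, so the second byte must match 10xxxxxx. It does not match: the seventh bit of 1110 0011 is 1 rather than 0. The decoder therefore returns "?".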
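The failed decoding of 0xC4E3 can be checked with a strict UTF-8 decoder. This sketch configures CodingErrorAction.REPORT so that malformed input throws instead of being replaced; Utf8ValidityDemo and isValidUtf8 are hypothetical names. The article's converter returned "?". A current JDK's lenient String constructor substitutes U+FFFD instead, and the last line of main checks that.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: strict UTF-8 validation. Names are hypothetical.
class Utf8ValidityDemo {

    // Returns true if the bytes are well-formed UTF-8. With REPORT, the
    // decoder throws on the first malformed sequence instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gb = {(byte) 0xC4, (byte) 0xE3};               // GB2312 bytes of U+4F60
        byte[] utf8 = {(byte) 0xE4, (byte) 0xBD, (byte) 0xA0}; // UTF-8 bytes of U+4F60
        System.out.println(isValidUtf8(gb));   // false: 0xE3 is not of the form 10xxxxxx
        System.out.println(isValidUtf8(utf8)); // true
        // The lenient String constructor inserts U+FFFD rather than throwing.
        System.out.println(new String(gb, StandardCharsets.UTF_8).indexOf('\uFFFD') >= 0); // true
    }
}
```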

0x4F60 in binary: 0100 1111 0110 0000

Filled into the three-byte UTF-8 pattern, this gives 1110 0100, 1011 1101, 1010 0000, that is, E4 BD A0. So the result is 0xE4, 0xBD, 0xA0.
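The hand encoding of 0x4F60 can be written out as code. This sketch handles only the 7-, 11- and 16-bit rows of the table above and rejects surrogate code points. It then cross-checks the result against the JDK's encoder. Utf8EncodeDemo and encodeBmp are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: UTF-8 encoding by hand for BMP code points. Names are hypothetical.
class Utf8EncodeDemo {

    // Applies the 7-, 11- and 16-bit bit patterns; 21-bit forms are out of scope.
    static byte[] encodeBmp(int cp) {
        if (cp < 0 || cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("sketch handles non-surrogate BMP code points only");
        }
        if (cp < 0x80) {                          // 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {                         // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))
            };
        }
        return new byte[] {                       // 1110xxxx 10xxxxxx 10xxxxxx
            (byte) (0xE0 | (cp >> 12)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    public static void main(String[] args) {
        byte[] manual = encodeBmp(0x4F60);
        StringBuilder sb = new StringBuilder();
        for (byte x : manual) {
            sb.append(String.format("%02X ", x & 0xFF));
        }
        System.out.println(sb.toString().trim()); // E4 BD A0
        // Cross-check against the JDK's own UTF-8 encoder.
        System.out.println(Arrays.equals(manual, "\u4f60".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```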

III. String and byte[]

At its core a String is a char[], but turning bytes into a String requires an encoding. String.length() is the length of that char array. Decoding the same bytes with different encodings can therefore split characters apart or merge them, producing garbled text. For example:

    String encoding = "8859_1"; // or "GB2312"
    byte[] b = {(byte) '\u00c4', (byte) '\u00e3'};
    String str = new String(b, encoding);
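Here is a runnable version of the String example. It assumes the JDK provides the GB2312 charset, which is one of the extended charsets rather than a StandardCharsets constant. StringLengthDemo and decodedLength are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same two bytes give Strings of different lengths.
class StringLengthDemo {

    // Number of chars produced when the bytes are decoded with the charset.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xC4, (byte) 0xE3};
        System.out.println(decodedLength(b, StandardCharsets.ISO_8859_1)); // 2
        System.out.println(decodedLength(b, Charset.forName("GB2312")));   // 1
    }
}
```

Code that pages or truncates text by counting bytes or chars must therefore use the same encoding the text was decoded with.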

With encoding = "8859_1" the string contains two characters, but with encoding = "GB2312" it contains only one. This becomes a problem when paginating text.

IV. Reader, Writer / InputStream, OutputStream

Reader and Writer work on chars; InputStream and OutputStream work on bytes. The main purpose of Reader and Writer is to read and write chars on top of an InputStream or OutputStream. For example, suppose the file test.txt contains only the character "你", stored as 0xC4, 0xE3:

    String encoding = "gb2312";
    InputStreamReader reader = new InputStreamReader(new FileInputStream("test.txt"), encoding);
    char c[] = new char[10];
    int length = reader.read(c);
    for (int i = 0; i < length; i++) {
        System.out.println(c[i]);
    }

What is the result? "你". If encoding = "8859_1", what is the result? "??", that is, two characters, which shows that the character was not recognized. I will not give the reverse (writing) example here.
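The Reader example can be tried without a file by wrapping the two bytes in a ByteArrayInputStream. That stream is a substitute for FileInputStream("test.txt"), chosen so the sketch is self-contained. ReaderDemo and readAll are hypothetical names. readAll keeps reading until read returns -1 rather than trusting a single read call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: an InputStreamReader decodes bytes to chars with the given charset.
class ReaderDemo {

    // Reads every char the reader produces; loops until read() returns -1.
    static String readAll(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[10];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] fileBytes = {(byte) 0xC4, (byte) 0xE3}; // stands in for test.txt
        System.out.println(readAll(fileBytes, Charset.forName("GB2312")).equals("\u4f60")); // true
        System.out.println(readAll(fileBytes, StandardCharsets.ISO_8859_1).length());      // 2
    }
}
```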

V. The Java compiler

We also need to know about the Java compiler's encoding option:

    javac -encoding

We usually leave out the encoding parameter, yet the encoding matters a great deal for cross-platform work. If you do not specify one, javac uses the system default encoding: GB2312 on a Chinese platform and ISO8859_1 on an English one. The Java compiler actually calls the sun.tools.javac.Main class to compile files. Its compile method has an encoding variable, and the -encoding argument is passed into that variable. The compiler reads the source file according to this encoding and then writes the characters into the class file in UTF-8 form. Example code:

    String str = "你";
    FileWriter writer = new FileWriter("text.txt");
    writer.write(str);
    writer.close();

If you compile with GB2312, you will find the bytes E4 BD A0 in the class file. If you compile with 8859_1, the source bytes 0xC4 0xE3 are read as the two characters 0x00C4 and 0x00E3:

    0000 0000 1100 0100, 0000 0000 1110 0011

Because each character needs more than 7 bits, the 11-bit (two-byte) form is used:

    1100 0011, 1000 0100, 1100 0011, 1010 0011

You will therefore find C3 84 C3 A3 in the class file.
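The effect of -encoding on the literal can be simulated at run time. The sketch encodes "你" as the GB2312 bytes a Chinese editor would save. It then decodes those bytes as javac would under each -encoding value and encodes the resulting chars as UTF-8, as the class file does. This is a simulation, not javac itself. CompileEncodingDemo, literalInClassFile and hex are hypothetical names. Class files actually use modified UTF-8, which differs from standard UTF-8 only for U+0000 and supplementary characters, so the difference does not matter here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: simulates how the source encoding chosen for javac changes the
// string constant stored in the class file. Names are hypothetical.
class CompileEncodingDemo {

    // Bytes of the literal as stored in the class file, assuming the source
    // bytes are read with sourceEncoding and stored as UTF-8.
    static byte[] literalInClassFile(byte[] sourceBytes, Charset sourceEncoding) {
        String literal = new String(sourceBytes, sourceEncoding); // what javac reads
        return literal.getBytes(StandardCharsets.UTF_8);          // what gets stored
    }

    // Formats bytes as space-separated uppercase hex, e.g. "C4 E3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] source = "\u4f60".getBytes(Charset.forName("GB2312")); // C4 E3 in the .java file
        System.out.println(hex(literalInClassFile(source, Charset.forName("GB2312"))));   // E4 BD A0
        System.out.println(hex(literalInClassFile(source, StandardCharsets.ISO_8859_1))); // C3 84 C3 A3
    }
}
```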

But we tend to ignore this parameter, and that often causes cross-platform problems. Suppose the sample code compiled on a Chinese platform produces ZhClass, and the same code compiled on an English platform produces EnClass:

(1) ZhClass runs correctly on the Chinese platform but not on the English platform.
(2) EnClass runs correctly on the English platform but not on the Chinese platform.

The reasons are:

(1) After compiling on the Chinese platform, the char[] of str at run time is 0x4F60. On the Chinese platform, FileWriter's default encoding is GB2312, so CharToByteConverter automatically uses the GB2312 converter. It converts str into bytes and writes them to the FileOutputStream, so 0xC4, 0xE3 end up in the file. On an English platform, however, the default CharToByteConverter is 8859_1. FileWriter automatically uses 8859_1 to convert str, but 8859_1 cannot represent the character, so it writes "?".
(2) After compiling on the English platform, the char[] of str at run time is 0x00C4, 0x00E3. On the Chinese platform the GB2312 converter cannot map these characters, so "??" appears. On the English platform, 0x00C4 -> 0xC4 and 0x00E3 -> 0xE3, so 0xC4, 0xE3 are written to the file.

VI. Other causes

    <%@ page contentType="text/html; charset=GBK" %>

This sets the encoding the browser uses to display the page. If the response data is actually UTF-8 encoded, the page will display garbled text, but this garbling has a different cause from the ones above.

VII. Where encoding conversions occur

1. From the database to the Java program: byte -> char
2. From the Java program to the database: char -> byte
3. From a file to the Java program: byte -> char
4. From the Java program to a file: char -> byte
5. From the Java program to the page display: char -> byte
6. From a submitted page form to the Java program: byte -> char
7. From a stream to the Java program: byte -> char
8. From the Java program to a stream: char -> byte

A configured filter can be used to fix garbled Chinese text:

    <filter>
        <filter-name>RequestFilter</filter-name>
        <filter-class>net.golden.uirs.util.RequestFilter</filter-class>
        <init-param>
            <param-name>charset</param-name>
            <param-value>GB2312</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>RequestFilter</filter-name>
        <url-pattern>*.jsp</url-pattern>
    </filter-mapping>

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain fChain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        HttpSession session = request.getSession();
        String userId = (String) session.getAttribute("userid");
        // Sets the encoding used for the byte -> char conversion of the request
        req.setCharacterEncoding(this.filterConfig.getInitParameter("charset"));
        try {
            if (userId == null || userId.equals("")) {
                // Not logged in: redirect everything except the logon pages
                if (!request.getRequestURL().toString().matches(".*/uirs/logon/logon(Controller){0,1}\\x2ejsp$")) {
                    session.invalidate();
                    response.sendRedirect(request.getContextPath() + "/uirs/logon/logon.jsp");
                    return; // response is committed; do not continue the chain
                }
            } else {
                // Check whether the user has permission
                if (!net.golden.uirs.util.UirsChecker.check(userId, "information reporting system",
                        net.golden.uirs.util.UirsChecker.ACTION_DO)) {
                    if (!request.getRequestURL().toString().matches(".*/uirs/logon/logon(Controller){0,1}\\x2ejsp$")) {
                        response.sendRedirect(request.getContextPath() + "/uirs/logon/logonController.jsp");
                        return;
                    }
                }
            }
        } catch (Exception ex) {
            response.sendRedirect(request.getContextPath() + "/uirs/logon/logon.jsp");
            return;
        }
        fChain.doFilter(req, res);
    }

Reposted from: http://www.javaresearch.org
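The filter's req.setCharacterEncoding(...) call chooses the charset for the byte -> char step on submitted form data (case 6 in the list of conversions). The sketch below shows the effect. It assumes a GB2312 page that submits "你" percent-encoded as %C4%E3. FormDecodeDemo and decodeField are hypothetical names, and URLDecoder.decode(String, Charset) requires Java 10 or later. In a servlet container, setCharacterEncoding governs the request body. How query strings are decoded is usually a separate container setting.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same submitted bytes decoded with two different charsets.
class FormDecodeDemo {

    // Decodes a percent-encoded form value with the given charset.
    static String decodeField(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset); // Charset overload: Java 10+
    }

    public static void main(String[] args) {
        String submitted = "%C4%E3";
        // Correct charset: one char, U+4F60
        System.out.println(decodeField(submitted, Charset.forName("GB2312")).equals("\u4f60")); // true
        // Wrong charset: two Latin-1 chars, U+00C4 and U+00E3
        System.out.println(decodeField(submitted, StandardCharsets.ISO_8859_1).length());      // 2
    }
}
```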

