Solutions and lessons learned for garbled Chinese text in Java

xiaoxiao2021-03-06  80

1. Bytes and Unicode

Java is Unicode internally, and even class files are, but many media, including files and streams, store their data as bytes. Java therefore has to convert these byte streams. A char is Unicode, while a byte is a raw byte. The functions that convert between byte and char live in the sun.io package. The ByteToCharConverter class is the dispatcher there, and it can tell you which converter you are using. Two of its most commonly used static functions are

public static ByteToCharConverter getDefault();

public static ByteToCharConverter getConverter(String encoding);

If you do not specify a converter, the system automatically uses the platform's current encoding: GBK on a GB (Chinese) platform, 8859_1 on an English platform.

byte -> char:

The character "你" ("you") has the GB code 0xC4E3 and the Unicode value 0x4F60.

String encoding = "GB2312";

byte b[] = {(byte) '\u00c4', (byte) '\u00E3'};

ByteToCharConverter converter = ByteToCharConverter.getConverter(encoding);

char c[] = converter.convertAll(b);

for (int i = 0; i < c.length; i++) {

    System.out.println(Integer.toHexString(c[i]));

}

What is the result? 0x4f60

If encoding = "8859_1", what is the result? 0x00c4, 0x00e3

If the code is changed to

byte b[] = {(byte) '\u00c4', (byte) '\u00E3'};

ByteToCharConverter converter = ByteToCharConverter.getDefault();

char c[] = converter.convertAll(b);

for (int i = 0; i < c.length; i++) {

    System.out.println(Integer.toHexString(c[i]));

}

What will the result be? It depends on the platform's encoding.
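
The sun.io converters above are internal classes and are no longer present in modern JDKs. As a minimal sketch of the same byte -> char experiment, the following uses the public java.nio.charset API instead. The class name DecodeDemo and the helper decode are made up for illustration, and the sketch assumes the JDK in use ships the GB2312 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode raw bytes into chars using an explicitly named charset
    // (hypothetical helper, not part of any library).
    static char[] decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset).toCharArray();
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xC4, (byte) 0xE3};

        // GB2312: the two bytes together form one character.
        for (char c : decode(b, Charset.forName("GB2312"))) {
            System.out.println(Integer.toHexString(c)); // 4f60
        }

        // ISO-8859-1: every byte becomes its own character.
        for (char c : decode(b, StandardCharsets.ISO_8859_1)) {
            System.out.println(Integer.toHexString(c)); // c4, then e3
        }
    }
}
```

Passing the charset explicitly removes the dependence on the platform default that the last example shows.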

char -> byte:

String encoding = "GB2312";

char c[] = {'\u4f60'};

CharToByteConverter converter = CharToByteConverter.getConverter(encoding);

byte b[] = converter.convertAll(c);

for (int i = 0; i < b.length; i++) {

    // Mask with 0xFF so a negative byte is not printed sign-extended (ffffffc4)

    System.out.println(Integer.toHexString(b[i] & 0xFF));

}

What is the result? 0xc4, 0xe3

If encoding = "8859_1", what is the result? 0x3f

If the code is changed to

String encoding = "GB2312";

char c[] = {'\u4f60'};

CharToByteConverter converter = CharToByteConverter.getDefault();

byte b[] = converter.convertAll(c);

for (int i = 0; i < b.length; i++) {

    System.out.println(Integer.toHexString(b[i] & 0xFF));

}

What will the result be? It depends on the platform's encoding.
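
For the char -> byte direction, a comparable sketch with public APIs is String.getBytes with an explicit charset. The names EncodeDemo, encode and hex are hypothetical, and the GB2312 charset is again assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDemo {
    // Encode text with an explicit charset. Characters the charset cannot
    // represent are replaced by '?' (0x3f).
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Format bytes as unsigned hex; the 0xFF mask avoids sign extension.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String you = "\u4f60";
        System.out.println(hex(encode(you, Charset.forName("GB2312"))));   // c4 e3
        System.out.println(hex(encode(you, StandardCharsets.ISO_8859_1))); // 3f
    }
}
```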

Many Chinese-text problems stem from these two simple classes. Many other classes, however, do not accept an encoding argument directly, which is inconvenient. And many programs rarely pass an encoding at all and simply rely on the default one, which makes them hard to port.

2. UTF-8

UTF-8 maps directly onto Unicode, and the scheme is simple:

7-bit Unicode: 0 _ _ _ _ _ _ _

11-bit Unicode: 1 1 0 _ _ _ _ _   1 0 _ _ _ _ _ _

16-bit Unicode: 1 1 1 0 _ _ _ _   1 0 _ _ _ _ _ _   1 0 _ _ _ _ _ _

21-bit Unicode: 1 1 1 1 0 _ _ _   1 0 _ _ _ _ _ _   1 0 _ _ _ _ _ _   1 0 _ _ _ _ _ _

In most cases only Unicode values of 16 bits or fewer are used:

The character "你" ("you") has the GB code 0xC4E3 and the Unicode value 0x4F60.

0xC4E3 in binary:

1100 0100 1110 0011

Because there are only two bytes, we try to read them with the two-byte pattern, but that does not work: the second byte (1110 0011) would have to start with 10, and its 7th bit is not 0. So "?" is returned.

0x4F60 in binary:

0100 1111 0110 0000

Filling these bits into the three-byte UTF-8 pattern gives:

1110 0100 1011 1101 1010 0000

E4 - BD - A0

So the result is: 0xE4, 0xBD, 0xA0.
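
The bit patterns above can be turned into code directly. The following is a minimal sketch of a hand-written encoder for a single code point, compared against the JDK's own UTF-8 encoder. Utf8Manual and encode are illustrative names, and the sketch does no validation (for example, it does not reject surrogate values).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Manual {
    // Encode one code point (up to 21 bits) by filling the bit patterns:
    // 7 bits -> 1 byte, 11 bits -> 2 bytes, 16 bits -> 3 bytes, 21 bits -> 4 bytes.
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }

    public static void main(String[] args) {
        byte[] manual = encode(0x4F60);
        byte[] library = "\u4f60".getBytes(StandardCharsets.UTF_8);
        // Both arrays hold E4 BD A0, so this prints true.
        System.out.println(Arrays.equals(manual, library));
    }
}
```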

3. String and byte[]

At its core a String is a char[], but turning bytes into a String requires a decoding step. String.length() is simply the length of the char array, so if a different encoding is used, the bytes are likely to be split in the wrong places, producing broken characters and garbled text.

E.g.:

String encoding = "";

byte[] b = {(byte) '\u00c4', (byte) '\u00E3'};

String str = new String(b, encoding);

If encoding = 8859_1 the string has two characters, but with encoding = GB2312 it has only one. This problem shows up when handling pagination.
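
As a small sketch of the point about String.length(), the same two bytes can be decoded with both encodings and the lengths compared. LengthDemo and decodedLength are illustrative names, and the GB2312 charset is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LengthDemo {
    // Number of chars produced when the bytes are decoded with the given charset.
    static int decodedLength(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xC4, (byte) 0xE3};
        System.out.println(decodedLength(b, StandardCharsets.ISO_8859_1));   // 2
        System.out.println(decodedLength(b, Charset.forName("GB2312")));     // 1
    }
}
```

Code that cuts text into pages by counting bytes, or by counting chars decoded with the wrong encoding, can therefore end a page in the middle of a two-byte character.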

4. Reader, Writer / InputStream, OutputStream

Reader and Writer work on chars; InputStream and OutputStream work on bytes. The main purpose of Reader and Writer, however, is to read chars from an InputStream and write chars to an OutputStream.

E.g.:

The file text.txt contains only the single character "你" ("you"): 0xC4, 0xE3

String encoding = "GB2312";

InputStreamReader reader = new InputStreamReader(new FileInputStream("text.txt"), encoding);

char c[] = new char[10];

int length = reader.read(c);

for (int i = 0; i < length; i++) {

    System.out.println(c[i]);

}

What is the result? 你

If encoding = "8859_1", what is the result? ?? Two characters, meaning they were not recognized.

Work out the reverse example (writing) yourself.
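
One possible version of the reverse example, given as a sketch rather than the author's own code: it writes the character through an OutputStreamWriter with an explicit encoding and reads the file back through an InputStreamReader. The class ReaderWriterDemo and the helper roundTrip are hypothetical, a temporary file stands in for text.txt, and GB2312 is assumed to be available.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReaderWriterDemo {
    // Write text to a temporary file with one charset, read it back with another.
    static String roundTrip(String text, Charset writeWith, Charset readWith) {
        try {
            Path file = Files.createTempFile("text", ".txt");
            try {
                try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), writeWith)) {
                    w.write(text); // char -> byte happens here
                }
                StringBuilder sb = new StringBuilder();
                try (Reader r = new InputStreamReader(Files.newInputStream(file), readWith)) {
                    char[] buf = new char[10];
                    int n;
                    while ((n = r.read(buf)) != -1) { // byte -> char happens here
                        sb.append(buf, 0, n);
                    }
                }
                return sb.toString();
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gb = Charset.forName("GB2312");
        // Same charset on both sides: the character survives.
        System.out.println(roundTrip("\u4f60", gb, gb).equals("\u4f60")); // true
        // Reading the two GB2312 bytes as ISO-8859-1 yields two chars.
        System.out.println(roundTrip("\u4f60", gb, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```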

5. What we need to know about the Java compiler:

javac -encoding

We often leave out the -encoding parameter. In fact, the encoding matters a great deal for cross-platform work. If you do not specify it, the system's default encoding is used: GB2312 on a GB platform, ISO8859_1 on an English platform.

The Java compiler actually calls the sun.tools.javac.Main class to compile files. Inside its compile function this class has an encoding variable, and the -encoding parameter is passed straight to that variable. The compiler reads the Java source file according to this variable and then compiles it into a class file in UTF-8 form.

Example code:

String str = "你";

FileWriter writer = new FileWriter("text.txt");

writer.write(str);

writer.close();

If you compile with GB2312, you will find the bytes E4 BD A0 in the class file;

If you compile with 8859_1, the string is read as 00C4 00E3, in binary:

0000 0000 1100 0100 0000 0000 1110 0011

Because each character needs more than 7 bits, the 11-bit encoding is used:

1100 0011 1000 0100 1100 0011 1010 0011

C3 - 84 - C3 - A3

You will find C3 84 C3 A3 in the class file.
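
The two results can be reproduced without running javac. The sketch below only simulates the compiler's two steps (decode the source bytes with the chosen encoding, then store the constant as UTF-8); it does not invoke the compiler, and SourceEncodingDemo and constantBytes are hypothetical names. Class files actually use a modified UTF-8, which gives the same bytes as standard UTF-8 for these characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingDemo {
    // Step 1: decode the source bytes the way the compiler would for a given -encoding.
    // Step 2: encode the resulting string constant as UTF-8.
    static byte[] constantBytes(byte[] sourceBytes, Charset sourceEncoding) {
        String literal = new String(sourceBytes, sourceEncoding);
        return literal.getBytes(StandardCharsets.UTF_8);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal as it is stored in a GB2312-encoded source file.
        byte[] source = {(byte) 0xC4, (byte) 0xE3};
        System.out.println(hex(constantBytes(source, Charset.forName("GB2312"))));   // e4 bd a0
        System.out.println(hex(constantBytes(source, StandardCharsets.ISO_8859_1))); // c3 84 c3 a3
    }
}
```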

But we tend to ignore this parameter, which often leads to cross-platform problems:

The sample code compiled on a Chinese platform produces ZhClass

The sample code compiled on an English platform produces EnClass

(1). ZhClass runs correctly on the Chinese platform, but not on the English platform

(2). EnClass runs correctly on the English platform, but not on the Chinese platform

The reason:

(1). After compiling on the Chinese platform, str at run time holds the char 0x4F60. When it runs on the Chinese platform, FileWriter's default encoding is GB2312, so CharToByteConverter automatically uses the GB2312 converter to turn str into bytes for the FileOutputStream, and 0xC4, 0xE3 are written to the file.

On the English platform, however, the default CharToByteConverter is 8859_1. FileWriter automatically uses 8859_1 to convert str, but 8859_1 cannot represent the character, so "?" is written.

(2). After compiling on the English platform, str at run time holds the chars 0x00C4 0x00E3. On the Chinese platform these cannot be converted, so ?? is written. On the English platform, 0x00C4 -> 0xC4 and 0x00E3 -> 0xE3, so 0xC4, 0xE3 are written to the file.
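
The four combinations above can be sketched by encoding the two run-time strings with each platform's default charset passed in explicitly. This is only a simulation: PlatformMatrixDemo and fileBytes are hypothetical names, no FileWriter is involved, and GB2312 is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PlatformMatrixDemo {
    // Bytes a FileWriter-style char -> byte conversion would produce
    // if the platform default were the given charset.
    static String fileBytes(String runtimeString, Charset platformDefault) {
        StringBuilder sb = new StringBuilder();
        for (byte x : runtimeString.getBytes(platformDefault)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zh = "\u4f60";       // string constant as compiled with GB2312
        String en = "\u00c4\u00e3"; // string constant as compiled with 8859_1
        Charset gb = Charset.forName("GB2312");
        Charset latin = StandardCharsets.ISO_8859_1;

        System.out.println(fileBytes(zh, gb));    // c4 e3  (correct)
        System.out.println(fileBytes(zh, latin)); // 3f     ("?")
        System.out.println(fileBytes(en, gb));    // 3f 3f  ("??")
        System.out.println(fileBytes(en, latin)); // c4 e3  (correct)
    }
}
```

One way to avoid depending on either default is to name the charset explicitly when writing (for example with OutputStreamWriter) and to compile with an explicit -encoding.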

6. Other causes: <%@ page contentType="text/html; charset=GBK" %>

This sets the encoding the browser uses to display the page. If the response data is actually UTF-8 encoded, the page will show garbled text, but this kind of garbling is different from the causes above.

7. Places where encoding conversion happens:

• From the database to the Java program: byte -> char

• From the Java program to the database: char -> byte

• From a file to the Java program: byte -> char

• From the Java program to a file: char -> byte

• From the Java program to the displayed page: char -> byte

• From form data submitted by a page to the Java program: byte -> char

• From a stream to the Java program: byte -> char

• From the Java program to a stream: char -> byte

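Xie Zhikang's solution:

I use a configured filter to solve the garbled Chinese problem:

<filter>

<filter-name>RequestFilter</filter-name>

<filter-class>net.golden.uirs.util.RequestFilter</filter-class>

<init-param>

<param-name>charset</param-name>

<param-value>GB2312</param-value>

</init-param>

</filter>

<filter-mapping>

<filter-name>RequestFilter</filter-name>

<url-pattern>*.jsp</url-pattern>

</filter-mapping>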

public void doFilter(ServletRequest req, ServletResponse res,

        FilterChain fchain) throws IOException, ServletException {

    HttpServletRequest request = (HttpServletRequest) req;

    HttpServletResponse response = (HttpServletResponse) res;

    HttpSession session = request.getSession();

    String userid = (String) session.getAttribute("UserID");

    // Set the character set; in effect this sets the encoding used for byte -> char

    req.setCharacterEncoding(this.filterConfig.getInitParameter("charset"));

    try {

        if (userid == null || userid.equals("")) {

            // Not logged in: anything other than the logon pages is redirected to the logon page

            if (!request.getRequestURL().toString().matches(

                    ".*/uirs/logon/logon(Controller){0,1}\\x2Ejsp$")) {

                session.invalidate();

                response.sendRedirect(request.getContextPath() +

                        "/uirs/logon/logon.jsp");

                return; // the response is already committed, so do not continue the chain

            }

        }

        else { // Check the permissions for the information reporting system

            if (!net.golden.uirs.util.UirsChecker.check(userid, "information reporting system",

                    net.golden.uirs.util.UirsChecker.ACTION_DO)) {

                if (!request.getRequestURL().toString().matches(

                        ".*/uirs/logon/logon(Controller){0,1}\\x2Ejsp$")) {

                    response.sendRedirect(request.getContextPath() +

                            "/uirs/logon/logonController.jsp");

                    return;

                }

            }

        }

    }

    catch (Exception ex) {

        response.sendRedirect(request.getContextPath() +

                "/uirs/logon/logon.jsp");

        return;

    }

    fchain.doFilter(req, res);

}
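
The effect of the setCharacterEncoding call in the filter can be illustrated outside a servlet container. The sketch below is only an approximation: it uses java.net.URLDecoder to decode a percent-encoded form value, which is not the container's own code, and FormDecodeDemo and decodeParam are hypothetical names. It assumes the browser submitted "你" as the GB2312 bytes C4 E3.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodeDemo {
    // Decode a percent-encoded form value using the named encoding (byte -> char).
    static String decodeParam(String raw, String encoding) {
        try {
            return URLDecoder.decode(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%C4%E3"; // the form value as it travels over the wire

        // Decoded with the encoding the page was submitted in: one character.
        System.out.println(decodeParam(raw, "GB2312").equals("\u4f60")); // true

        // Decoded with ISO-8859-1 instead: two unrelated characters.
        System.out.println(decodeParam(raw, "ISO-8859-1").length()); // 2
    }
}
```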

When reposting, please credit the original address: https://www.9cbs.com/read-107134.html
