Java Chinese character issues in depth

xiaoxiao2021-03-06 93

I. Topic: Chinese character issues in Java

Java's Chinese character problems are quite prominent, mainly in console output, JSP page output, and database access.

This article sets the font problem aside and discusses only encoding. From it you can learn where Java's Chinese character problems come from and how to solve them, including an approach for accessing a database through JDBC.

II. Problem description:

1) Compiled and run in a Chinese console window on Chinese Windows 2000, using the international version of the JDK, connected to a SQL Server database encoded in CP936 on Chinese Windows 2000:

J:/supercise/demo/encode/helloworld> make
Wed May 30 02:54:45 CST 2001
J:/supercise/demo/encode/helloworld> RUN
Wed May 30 02:51:33 CST 2001
中文
[B@7bc8b569
[B@7b08b569
[B@7860b569
中文
????
中文
????

2) If the program is compiled in a Western console window (code page 437) on Chinese Windows 2000, running it there with java cannot display the text properly because no Chinese font is available. If it is then run in a Chinese console window on Chinese Windows 2000, the output is:

J:/supercise/demo/encode/helloworld> RUN
Wed May 30 02:51:33 CST 2001
????
[B@7bc0b66a
[B@7b04b66a
[B@7818b66a
????
中文
????

III. Analysis

1) Garbled output (that is, ?) appears. Because only ? appears and no small square boxes, the problem lies in the encoding, not in the font. When text is converted from one character set to a smaller one, typically from GB2312 to ISO8859_1 (Latin-1, a superset of ASCII), many Chinese characters (or halves of Chinese characters) cannot be mapped to Western characters, and the system substitutes ? for them. Similarly, there are cases in which a small character set cannot be converted to a larger one; they are not described here.
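
The substitution described above can be reproduced with a minimal sketch. The class and method names are hypothetical, and the test string is written with Unicode escapes so that the example does not depend on the source file encoding. Each character that ISO8859_1 cannot represent comes back as one ?.

```java
import java.nio.charset.StandardCharsets;

final class UnmappableDemo {
    // Encodes the text as ISO8859_1 and decodes it again. Characters that
    // ISO8859_1 cannot represent are replaced by '?' during encoding.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // the two Chinese characters of the test string
        System.out.println(throughLatin1(chinese)); // both characters are lost
        System.out.println(throughLatin1("abc"));   // ASCII text survives unchanged
    }
}
```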

2) When the program is compiled and run in a Chinese environment, some Chinese strings display correctly and others do not; the same happens when it is compiled in a Western environment and run in a Chinese one. This is the result of automatic (default) or manual transcoding, the latter through new String(bytes[, encode]) and getBytes([encode]).

2.1) In the chain Java source file -> javac -> class -> java -> getBytes() -> new String() -> display, every step involves an encoding conversion. The conversion always takes place, but sometimes it is carried out with default parameters. Let's analyze step by step why the situations above occur.

2.2) Here is the source code, HelloWorld.java:

---------------------------------------------------------------------------------------------------------------------------------------

public class HelloWorld
{
    public static void main(String[] argv) {
        try {
            System.out.println("中文"); // 1
            System.out.println("中文".getBytes()); // 2
            System.out.println("中文".getBytes("GB2312")); // 3
            System.out.println("中文".getBytes("ISO8859_1")); // 4
            System.out.println(new String("中文".getBytes())); // 5
            System.out.println(new String("中文".getBytes(), "GB2312")); // 6
            System.out.println(new String("中文".getBytes(), "ISO8859_1")); // 7
            System.out.println(new String("中文".getBytes("GB2312"))); // 8
            System.out.println(new String("中文".getBytes("GB2312"), "GB2312")); // 9
            System.out.println(new String("中文".getBytes("GB2312"), "ISO8859_1")); // 10
            System.out.println(new String("中文".getBytes("ISO8859_1"))); // 11
            System.out.println(new String("中文".getBytes("ISO8859_1"), "GB2312")); // 12
            System.out.println(new String("中文".getBytes("ISO8859_1"), "ISO8859_1")); // 13
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

---------------------------------------------------------------------------------------------------------------------------------------

For convenience, each conversion is followed by a sequence number in a comment: 1, 2, ..., 13.

2.3) Note that javac reads the source file using the system default encoding and then stores the text as Unicode. At run time Java also works in Unicode, and by default its input and output use the operating system's default encoding. In new String(bytes[, encode]), the system treats the input as a byte stream encoded in encode; in other words, the bytes give the correct result only if they are interpreted according to encode. Because the result is stored inside Java, it must still be converted from encode to Unicode, so the conversion is bytes -> encode characters -> Unicode characters. In String.getBytes([encode]), the system performs the reverse conversion: Unicode characters -> encode characters -> bytes.
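
The two directions can be made visible with a small sketch. It assumes the JVM ships the GBK charset (true for a full JDK, not guaranteed for every trimmed runtime); the class and helper names are hypothetical. It prints the bytes that getBytes produces for the character U+4E2D and then decodes them back.

```java
import java.nio.charset.Charset;

final class DirectionDemo {
    // Assumption: the GBK charset is available in this runtime.
    static final Charset GBK = Charset.forName("GBK");

    // Unicode characters -> bytes in the named encoding.
    static byte[] encode(String text) {
        return text.getBytes(GBK);
    }

    // Bytes in the named encoding -> Unicode characters.
    static String decode(byte[] bytes) {
        return new String(bytes, GBK);
    }

    // Renders a byte array as upper-case hexadecimal digits.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d";
        byte[] bytes = encode(zhong);
        System.out.println(hex(bytes));                  // prints D6D0
        System.out.println(decode(bytes).equals(zhong)); // prints true
    }
}
```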

In this example, except when compiling in the English window, the default encoding is GBK in every case (for this example we treat GBK and GB2312 as equivalent).

2.4) Because some conversions in the code above do not specify encode, the system uses the default encoding (GBK here). We therefore regard 5, 6, 7 as equivalent to 8, 9, 10; 8 as equivalent to 9; and 11 as equivalent to 12. The discussion below covers only 1, 9, 10, 12, and 13. Items 2, 3, and 4 are only for testing and fall outside the scope of the discussion.
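
That the calls without an encode argument fall back to the default encoding can be checked directly. The sketch below uses a hypothetical class name and compares the implicit calls with calls that pass Charset.defaultCharset() explicitly. The default depends on the platform and JDK version (recent JDKs default to UTF-8 rather than GBK), so the printed charset name will vary.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class DefaultCharsetDemo {
    // Returns true when getBytes() and new String(bytes) behave exactly like
    // the variants that name the default charset explicitly.
    static boolean sameAsExplicitDefault(String text) {
        Charset def = Charset.defaultCharset();
        byte[] implicit = text.getBytes();
        byte[] explicit = text.getBytes(def);
        return Arrays.equals(implicit, explicit)
                && new String(implicit).equals(new String(explicit, def));
    }

    public static void main(String[] args) {
        System.out.println(Charset.defaultCharset());
        System.out.println(sameAsExplicitDefault("\u4e2d\u6587"));
    }
}
```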

2.5) Take the conversion of the character "中" in the program as an example. We first describe compiling and running in the Chinese window. Note that in the symbols below I deliberately use numbers to indicate which values are the same, different, or related.

2.5.1) First take statement 9 from the code above as an example:

Step  Content   Location                    Description
01    C1        HelloWorld.java             C1 stands for a GBK character
02    U1        javac reads the file        U1 stands for a Unicode character
03    C1        getBytes(), first step      Java first communicates with the operating system
04    B1,B2     getBytes(), second step     and then returns the byte array
05    C1        new String(), first step    Java first communicates with the operating system
06    U1        new String(), second step   and then returns the character
07    C1        println(String)             the character is displayed, identical to the original

2.5.2) Next, take statement 10 as an example:

Step  Content   Location                    Description
01    C1        HelloWorld.java             C1 stands for a GBK character
02    U1        javac reads the file        U1 stands for a Unicode character
03    C1        getBytes(), first step      Java first communicates with the operating system
04    B1,B2     getBytes(), second step     and then returns the byte array
05    C3,C4     new String(), first step    Java first communicates with the operating system, and the bytes are parsed incorrectly
06    U5,U6     new String(), second step   and then returns the characters
07    C3,C4     println(String)             "中" has been split into two halves, and the resulting
                                            ISO8859_1 characters cannot be mapped for display, so
                                            it is shown as "??". In the example above, "中文" is
                                            shown as "????".
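
The trace for statement 10 can be imitated in a short sketch. Two assumptions are made here: GBK stands in for GB2312 (the article treats them as equivalent), and the Chinese console is modeled by encoding the string to GBK, with anything GBK cannot encode shown as ?. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Statement10Demo {
    // Assumption: the GBK charset is available in this runtime.
    static final Charset GBK = Charset.forName("GBK");

    // Statement 10: encode as GBK, then decode the bytes as ISO8859_1.
    // Every byte becomes its own character, so each Chinese character
    // turns into two Western characters.
    static String misdecoded(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // Rough model of a GBK console: characters that GBK cannot encode
    // are replaced by '?' when the text is written out.
    static String asShownOnGbkConsole(String text) {
        return new String(text.getBytes(GBK), GBK);
    }

    public static void main(String[] args) {
        String s = misdecoded("\u4e2d\u6587");
        System.out.println(s.length());             // prints 4
        System.out.println(asShownOnGbkConsole(s)); // prints ????
    }
}
```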

2.5.3) The other cases in the all-Chinese setup are similar, so I will not go through them.

2.6) Next, let's look at why a class compiled in the Western DOS window, when run in the Chinese window, likewise fails to display Chinese characters correctly in some cases.

2.6.1) Again, first take statement 9 as an example:

Step  Content    Location                    Description
01    C1C2       HelloWorld.java             C1C2 stand for ISO8859_1 characters; "中" has been split apart
02    U3U4       javac reads the file        U3U4 stand for Unicode characters
03    C5C6       getBytes(), first step      Java first communicates with the operating system, and the characters are parsed incorrectly
04    B5B6B7B8   getBytes(), second step     and then returns the byte array
05    C5C6       new String(), first step    Java first communicates with the operating system
06    U3U4       new String(), second step   and then returns the characters
07    C5C6       println(String)             Although there are two characters, they are no longer the
                                             original "two ISO8859_1 characters" but "two GBK
                                             characters", so "中" is shown as "??" and "中文" is
                                             shown as "????".

2.6.2) Now take statement 12 as an example, because it does display the Chinese characters correctly:

Step  Content   Location                    Description
01    C1C2      HelloWorld.java             C1C2 stand for ISO8859_1 characters; "中" has been split apart
02    U3U4      javac reads the file        U3U4 stand for Unicode characters
03    C1C2      getBytes(), first step      Java first communicates with the operating system (note: still correct!)
04    B5B6      getBytes(), second step     and then returns the byte array (this is a key step!)
05    C12       new String(), first step    Java first communicates with the operating system (an even more
                                            important step: Java knows that B5B6 must be parsed as one
                                            Chinese character!)
06    U7        new String(), second step   and then returns the character (truly two merged into one! U7
                                            carries the information of U3U4)
07    C12       println(String)             This is the original "中" character. javac split it apart
                                            wrongly, but the later steps put everything right! Naturally,
                                            "中文" is displayed correctly.
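
Both Western-window cases can be imitated without switching consoles. The sketch below models javac's misreading as decoding the GBK source bytes with ISO8859_1, which is how the tables above describe it; an actual code page 437 window maps the bytes to different characters. GBK again stands in for GB2312, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WesternCompileDemo {
    // Assumption: the GBK charset is available in this runtime.
    static final Charset GBK = Charset.forName("GBK");

    // Models javac reading GBK source bytes as ISO8859_1:
    // each byte of the literal becomes one character.
    static String asCompiledInWesternWindow(String literal) {
        return new String(literal.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // Statement 9: the Western characters cannot be encoded in GBK,
    // so they are already replaced by '?' inside getBytes.
    static String statement9(String s) {
        return new String(s.getBytes(GBK), GBK);
    }

    // Statement 12: ISO8859_1 gives back the original source bytes,
    // which are then decoded as the Chinese characters they really are.
    static String statement12(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String literal = asCompiledInWesternWindow("\u4e2d\u6587");
        System.out.println(statement9(literal));                          // prints ????
        System.out.println(statement12(literal).equals("\u4e2d\u6587")); // prints true
    }
}
```

Statement 12 works because ISO8859_1 maps every byte value to exactly one character and back, so no information is lost while the text sits in the wrong form.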

3) Why, then, does garbled text sometimes appear with JDBC when using

new String(recordset.getBytes(int)[, encode])

recordset.getString(int)

recordset.setBytes(string.getBytes([encode]))

and

recordset.setString(String)?

In fact, the problem lies in how the JDBC driver was written. When reading data from the database, the driver may take it upon itself to convert from GB2312 (the default) to Unicode; my WebLogic JDBC driver for SQL Server behaves this way. When I read a string I did not get the correct Chinese characters, yet, annoyingly, a Chinese string can be written directly, which is a bit hard to accept!

In other words, we have to transcode when reading or writing, although the transcoding is sometimes not obvious because the default encoding is used. As for what the JDBC driver itself does, only reading its source code would make that clear, wouldn't it?
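
One way to transcode around a driver is sketched below. It is only an illustration under an assumption the article does not confirm: that the driver applied some known charset (driverCharset, hypothetical) while the data is really in another one (dataCharset). The helpers would wrap the strings passed to or returned from the driver; they do not touch JDBC themselves, and their names are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class JdbcTranscode {
    // Undoes a wrong decoding applied by the driver on read: turn the string
    // back into the bytes the driver saw, then decode them properly.
    static String afterRead(String fromDriver, Charset driverCharset, Charset dataCharset) {
        if (fromDriver == null) {
            return null;
        }
        return new String(fromDriver.getBytes(driverCharset), dataCharset);
    }

    // Prepares a string for a driver that will encode it with driverCharset,
    // so that the bytes reaching the database are in dataCharset.
    static String beforeWrite(String value, Charset dataCharset, Charset driverCharset) {
        if (value == null) {
            return null;
        }
        return new String(value.getBytes(dataCharset), driverCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        String original = "\u4e2d\u6587";
        String sent = beforeWrite(original, gbk, StandardCharsets.ISO_8859_1);
        String back = afterRead(sent, StandardCharsets.ISO_8859_1, gbk);
        System.out.println(back.equals(original)); // prints true
    }
}
```

The round trip is lossless only when driverCharset can represent every byte value, as ISO8859_1 does; with other charsets some bytes may be replaced along the way.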

When reposting, please cite the original address: https://www.9cbs.com/read-90429.html
