I. Topic: Chinese-language issues in Java
Java's problems with Chinese text are prominent, mainly in console output, JSP page output, and database access.
This article sets font problems aside and discusses only encoding. From it you can learn where Java's Chinese-text problems come from
and how to solve them, along with a way of handling Chinese text when accessing a database through JDBC.
II. Problem description:
1) Compiled and run in the Chinese window of Chinese W2000, using the international version of the JDK, connected to a
SQL Server database encoded in CP936 on Chinese W2000. The test string is 中文 ("Chinese"):
J:/supercise/demo/encode/helloworld> make
Created by xcompiler. Philosoft All Rights Reserved.
Wed May 30 02:54:45 CST 2001
J:/supercise/demo/encode/helloworld> run
Created by xrunner. Philosoft All Rights Reserved.
Wed May 30 02:51:33 CST 2001
中文
[B@7bc8b569
[B@7b08b569
[B@7860b569
中文
中文
????
中文
中文
????
??
??
??
2) If compiled under the Western window of Chinese W2000 (code page 437), running with java in that window cannot display
the text properly because there are no fonts. If run in the Chinese window of Chinese W2000, the output is
(中文, "Chinese", is the test string):
J:/supercise/demo/encode/helloworld> run
Created by xrunner. Philosoft All Rights Reserved.
Wed May 30 02:51:33 CST 2001
????
[B@7bc0b66a
[B@7b04b66a
[B@7818b66a
????
????
????
????
????
????
中文
中文
????
III. Analysis
1) Garbled output appears (that is, ?). Because only ? appears and no small square boxes, this indicates an encoding problem,
not a font problem. When text is converted from a large character set to a small one, typically from GB2312
to ISO8859_1 (Latin-1, a superset of ASCII), many Chinese characters (in practice, halves of Chinese characters) cannot be mapped to Western characters.
In that case the unmappable characters are replaced with ?. Similarly, a conversion from a small character set to a large one can also fail;
that situation is not described here.
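The substitution described in 1) can be reproduced in a few lines. This is a minimal sketch, not part of the original test program: the class name QuestionMarkDemo and the method throughLatin1 are made up for illustration, and the test string is written with Unicode escapes so that the result does not depend on how javac reads the source file.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // "\u4e2d\u6587" is the two-character string 中文 ("Chinese").
    public static final String TEXT = "\u4e2d\u6587";

    /**
     * Encodes the text into ISO-8859-1 and decodes it again.
     * Characters with no ISO-8859-1 mapping are replaced by '?' (0x3F)
     * during encoding, so the original characters are lost.
     */
    public static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(throughLatin1(TEXT));  // prints "??"
        System.out.println(throughLatin1("abc")); // prints "abc"
    }
}
```

Each Chinese character collapses to a single ?, which matches the "??" lines in the first listing.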
2) When the program is compiled and run in a Chinese environment, Chinese characters are correct in some places and incorrect in others.
Compiling in a Western environment and running in a Chinese environment gives a similar picture. This is the result of automatic (default)
or manual transcoding, that is, of calls to new String(bytes[, encode]) and byte[] getBytes([encode]).
2.1) In the chain Java source file -> javac -> class -> java -> getBytes() -> new String() -> display,
every step involves an encoding conversion. The conversion always takes place, although sometimes it is carried out
with default parameters. Let's analyze step by step why the results above occur.
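Since several of these steps silently fall back to a default, it helps to know what that default is on a given machine. The following sketch (the class name DefaultEncodingDemo is hypothetical) only reports it. No expected output is shown because the value depends on the platform and on the JDK version; recent JDK releases default to UTF-8.

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {
    /** Returns the name of the charset used when no encoding argument is given. */
    public static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Used by getBytes() and new String(bytes) when called without an encoding.
        System.out.println("default charset: " + defaultCharsetName());
        // The system property the default is traditionally derived from.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```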
2.2) Here is the source code: HelloWorld.java:
---------------------------------------------------------------------------------------------------------------------------------------
public class HelloWorld
{
    // The test string 中文 ("Chinese") consists of two Chinese characters.
    public static void main(String[] argv) {
        try {
            System.out.println("中文"); //1
            // 2-4 print the byte array reference ([B@...), not its contents.
            System.out.println("中文".getBytes()); //2
            System.out.println("中文".getBytes("GB2312")); //3
            System.out.println("中文".getBytes("ISO8859_1")); //4
            // 5-7: encode with the default encoding, decode three ways.
            System.out.println(new String("中文".getBytes())); //5
            System.out.println(new String("中文".getBytes(), "GB2312")); //6
            System.out.println(new String("中文".getBytes(), "ISO8859_1")); //7
            // 8-10: encode as GB2312, decode three ways.
            System.out.println(new String("中文".getBytes("GB2312"))); //8
            System.out.println(new String("中文".getBytes("GB2312"), "GB2312")); //9
            System.out.println(new String("中文".getBytes("GB2312"), "ISO8859_1")); //10
            // 11-13: encode as ISO8859_1, decode three ways.
            System.out.println(new String("中文".getBytes("ISO8859_1"))); //11
            System.out.println(new String("中文".getBytes("ISO8859_1"), "GB2312")); //12
            System.out.println(new String("中文".getBytes("ISO8859_1"), "ISO8859_1")); //13
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
For convenience, a sequence number is appended after each conversion: 1, 2, ..., 13.
2.3) Note that javac reads the source file using the system default encoding and then encodes it as Unicode.
At run time Java also works in Unicode internally, and by default its input and output use the operating system's
default encoding. That is, in new String(bytes[, encode]) the system treats the input as a byte stream encoded in encode.
In other words, interpreting bytes according to encode yields the correct characters, and to store the result inside
Java they are then converted from encode to Unicode. So there is a bytes -> encode characters -> Unicode characters
conversion. In String.getBytes([encode]) the system performs the reverse conversion,
Unicode characters -> encode characters -> bytes.
In this example the default encoding is GBK, except when compiling in the Western window (code page 437). Here we treat GBK and GB2312 as equivalent.
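The two directions described in 2.3 are easier to follow when the bytes are visible. Printing a byte array directly only shows a reference such as [B@7bc8b569, so the sketch below formats the bytes as hex instead. The class name ByteDump and the helper hex are illustrative names, and the code assumes that the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;

public class ByteDump {
    /** Formats a byte array as space-separated uppercase hex pairs. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";            // 中文, written as Unicode escapes
        Charset gbk = Charset.forName("GBK");    // assumed to be available
        byte[] bytes = text.getBytes(gbk);       // Unicode characters -> GBK bytes
        System.out.println(hex(bytes));          // prints "D6 D0 CE C4"
        String back = new String(bytes, gbk);    // GBK bytes -> Unicode characters
        System.out.println(back.equals(text));   // prints "true"
    }
}
```

Each Chinese character occupies two bytes in GBK, which is why the tables that follow write B1, B2 for a single character.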
2.4) As described above, when encode is not specified the system uses the default encoding (here GBK).
We therefore consider 5, 6, 7 equivalent to 8, 9, 10 respectively, 8 the same as 9, and 11 the same as 12.
So we only discuss 1, 9, 10, 12, and 13. Items 2, 3, and 4 are only there for testing and are outside
the scope of this discussion.
2.5) Let's take the conversion of the character 中, the first character of the test string 中文 ("Chinese"), as an example,
starting with the program compiled and run under the Chinese window. Pay attention to the letters below: I deliberately use
the numbers to show which values are the same, different, or related.
2.5.1) Let's first take code segment 9 above as an example:
Step  Content  Location                Description
01:   C1       HelloWorld.java         C1 denotes a GBK character
02:   U1       javac reads             U1 denotes a Unicode character
03:   C1       getBytes() step 1       Java first exchanges with the operating system
04:   B1, B2   getBytes() step 2       then returns the byte array
05:   C1       new String() step 1     Java first exchanges with the operating system
06:   U1       new String() step 2     then returns the character
07:   C1       println(String)         the character can be displayed, and its content is the same as the original
2.5.2) Now take code segment 10 as an example. Note that it differs only in the encoding passed to new String():
Step  Content  Location                Description
01:   C1       HelloWorld.java         C1 denotes a GBK character
02:   U1       javac reads             U1 denotes a Unicode character
03:   C1       getBytes() step 1       Java first exchanges with the operating system
04:   B1, B2   getBytes() step 2       then returns the byte array
05:   C3, C4   new String() step 1     Java first exchanges with the operating system, and the bytes are parsed wrongly
06:   U5, U6   new String() step 2     then returns the characters
07:   C3, C4   println(String)         the character 中 has been split into two halves, and the resulting
                                       ISO8859_1 characters cannot be mapped for display, so it is shown as "??".
                                       In the example above, 中文 ("Chinese") is displayed as "????"
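The wrong parse in steps 05 and 06 can be observed directly. This is a sketch under the assumption that the JDK provides the GBK charset; WrongDecodeDemo and misdecode are hypothetical names. It shows that decoding GBK bytes as ISO-8859-1 turns every byte into its own character, so the two-character string becomes four characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongDecodeDemo {
    private static final Charset GBK = Charset.forName("GBK"); // assumed available

    /** Encodes text as GBK, then decodes the bytes as ISO-8859-1 (as in case 10). */
    public static String misdecode(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String wrong = misdecode("\u4e2d\u6587"); // 中文
        System.out.println(wrong.length());       // prints 4: one character per byte
        for (char c : wrong.toCharArray()) {
            // prints U+00D6, U+00D0, U+00CE, U+00C4, one per line
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

None of these four Latin-1 letters belongs to GBK, which is consistent with the "????" that a GBK console shows for this case.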
2.5.3) The other cases in all-Chinese mode are similar, so I won't go through them.
2.6) Next, let's look at why classes compiled under the Western DOS window behave similarly when run in the Chinese window,
failing to display Chinese characters correctly in some cases.
2.6.1) Again we start with code segment 9 as an example:
Step  Content    Location                Description
01:   C1C2       HelloWorld.java         C1C2 denote ISO8859_1 characters; the character 中 has been split apart
02:   U3U4       javac reads             U3U4 denote Unicode characters
03:   C5C6       getBytes() step 1       Java first exchanges with the operating system, and the parse goes wrong
04:   B5B6B7B8   getBytes() step 2       then returns the byte array
05:   C5C6       new String() step 1     Java first exchanges with the operating system
06:   U3U4       new String() step 2     then returns the characters
07:   C5C6       println(String)         there are still two characters, but they are no longer the original
                                         "two ISO8859_1 characters"; they are "two GBK characters", so 中 is shown
                                         as "??" and 中文 ("Chinese") as "????"
2.6.2) Now take code segment 12 as an example, because it does display the Chinese characters correctly:
Step  Content  Location                Description
01:   C1C2     HelloWorld.java         C1C2 denote ISO8859_1 characters; the character 中 has been split apart
02:   U3U4     javac reads             U3U4 denote Unicode characters
03:   C1C2     getBytes() step 1       Java first exchanges with the operating system (note: this is still right!)
04:   B5B6     getBytes() step 2       then returns the byte array (this is a key step!)
05:   C12      new String() step 1     Java first exchanges with the operating system (this is an even more
                                       important step: Java knows that B5B6 is to be parsed into one Chinese character!)
06:   U7       new String() step 2     then returns the character (truly two in one! U7 contains the information of U3U4)
07:   C12      println(String)         this is the original character 中; it was badly wronged by javac, but the
                                       later conversions put everything right. Of course 中文 ("Chinese") is displayed correctly!
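The repair in steps 03 to 06 works because ISO8859_1 maps every byte value to exactly one character and back, so no information is lost. A minimal sketch of that round trip follows. RecoverDemo and recover are illustrative names, the raw bytes stand in for the two halves of each character as the compiler saw them, and the GBK charset is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RecoverDemo {
    private static final Charset GBK = Charset.forName("GBK"); // assumed available

    /**
     * Repairs a string whose characters are really raw GBK bytes that were
     * decoded as ISO-8859-1: take the bytes back out, then decode as GBK.
     */
    public static String recover(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        // GBK bytes of 中文, read one byte per character as ISO-8859-1.
        byte[] raw = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        String misdecoded = new String(raw, StandardCharsets.ISO_8859_1);
        String fixed = recover(misdecoded);
        System.out.println(misdecoded.length());           // prints 4
        System.out.println(fixed.equals("\u4e2d\u6587"));  // prints "true"
    }
}
```

The repair only works while the intermediate string still carries the original byte values. Once a character has been replaced by ?, it cannot be recovered.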
3) With JDBC, among the calls
new String(recordset.getBytes(int)[, encode])
recordset.getString(int)
recordset.setBytes(String.getBytes([encode]))
and
recordset.setString(String)
why do only some produce garbled characters, and when?
In fact, the problem lies in how the JDBC driver is written. When reading data from the database, the driver may,
on its own initiative, convert from GB2312 (the default encoding) to Unicode. The WebLogic for SQL Server
JDBC driver that I use behaves this way. When I read a string back, I did not get the correct Chinese characters, which was infuriating,
yet a Chinese string could be written directly, which is hard to accept!
In other words, we have to transcode when reading or writing, although the transcoding is sometimes not obvious
because it is done with the default encoding. What the JDBC driver actually does can only be made clear by reading
its source code.
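One way to make the transcoding explicit is to move the bytes yourself and name the encoding on both sides. The sketch below is an assumption-laden illustration, not the driver's method: JdbcTranscode and its methods are hypothetical helpers, they use the standard java.sql.ResultSet and java.sql.PreparedStatement interfaces in place of the recordset calls listed above, and whether getBytes returns the raw stored bytes for a character column depends on the driver.

```java
import java.nio.charset.Charset;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcTranscode {
    /** Decodes raw column bytes with an explicit charset; null stays null. */
    public static String decode(byte[] raw, Charset charset) {
        return raw == null ? null : new String(raw, charset);
    }

    /** Encodes a string with an explicit charset; null stays null. */
    public static byte[] encode(String text, Charset charset) {
        return text == null ? null : text.getBytes(charset);
    }

    /** Reads a column as bytes and decodes it explicitly instead of calling getString. */
    public static String readString(ResultSet rs, int column, Charset charset)
            throws SQLException {
        return decode(rs.getBytes(column), charset);
    }

    /** Encodes a string explicitly and writes it as bytes instead of calling setString. */
    public static void writeString(PreparedStatement ps, int index, String text,
                                   Charset charset) throws SQLException {
        ps.setBytes(index, encode(text, charset));
    }
}
```

A call such as JdbcTranscode.readString(rs, 1, Charset.forName("GBK")) then states the encoding in the code itself and does not rely on what the driver or the platform default happens to be.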