Chinese characters
I. Topic: Java's Chinese-language problems

Java's problems with Chinese text are fairly prominent, mainly in console output, JSP page output, and database access. This article sets font problems aside as far as possible and discusses only encoding. From it you can learn where Java's Chinese problems come from and how to solve them, including how to access a database through JDBC.

II. Problem description

1) Compiled and run in the Chinese window of Chinese Windows 2000, using the international version of the JDK and connecting to a CP936-encoded SQL Server database on Chinese Windows 2000. The test string is the two-character word 中文 ("Chinese"):

    J:/exercise/demo/encode/HelloWorld>make
    Created by XCompiler. PhiloSoft All Rights Reserved.
    Wed May 30 02:54:45 CST 2001

    J:/exercise/demo/encode/HelloWorld>run
    Created by XRunner. PhiloSoft All Rights Reserved.
    Wed May 30 02:51:33 CST 2001
    中文
    [B@7bc8b569
    [B@7b08b569
    [B@7860b569
    中文 中文 中文 中文

2) Compiled in the Western window (code page 437) of Chinese Windows 2000, where Java runs normally because no Chinese font is involved, and then run in the Chinese window of Chinese Windows 2000, the output is:

    J:/exercise/demo/encode/HelloWorld>run
    Created by XRunner. PhiloSoft All Rights Reserved.
    Wed May 30 02:51:33 CST 2001
    ????
    [B@7bc0b66a
    [B@7b04b66a
    [B@7818b66a
    ????????????? 中文 中文? ?????

III. Analysis

1) Garbled output appears (the "?" characters). Because we see only "?" and never a small box, this is an encoding problem, not a font problem. When text is converted from one character set to another, most typically from GB2312 to ISO8859_1 (Latin-1, a superset of ASCII), many Chinese characters, or rather the two halves of each Chinese character, cannot be mapped to Western characters. In that case the system substitutes "?" for them. Similarly, there are cases where a small character set cannot be converted into a larger one; the specific reasons are not described here.

2) Even when the program is compiled and run in the Chinese environment, Chinese characters are displayed correctly in some places and incorrectly in others. The same happens when it is compiled in the Western environment and run in the Chinese environment. This is caused by encoding conversions, whether automatic (default) or manual (through new String(bytes[, encode]) and byte[] getBytes([encode])).

2.1) In the chain Java source file -> javac -> class -> java -> getBytes() -> new String(), every step involves an encoding conversion. The conversion is always there; sometimes it simply takes place with default parameters. Let us analyze step by step why the situations above arise.
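The "?" substitution described in analysis 1) can be reproduced with a small sketch. The class name UnmappableDemo and the helper toLatin1 are made up for illustration, and the sample string is written with Unicode escapes so that the result does not depend on the encoding of the source file:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original HelloWorld program.
class UnmappableDemo {
    // The two-character sample string, written as Unicode escapes.
    static final String SAMPLE = "\u4e2d\u6587";

    // Unicode characters -> ISO-8859-1 bytes. Characters that ISO-8859-1
    // cannot represent are replaced by the byte 0x3F, which is '?'.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = toLatin1(SAMPLE);
        System.out.println(Arrays.toString(bytes)); // [63, 63]
        // Decoding the bytes again cannot bring the original text back.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

Once a character has been replaced in this way the information is gone, which is why the "?" in the output cannot be repaired by changing fonts.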
2.2) This is the source code, HelloWorld.java:

------------------------
    public class HelloWorld {
        public static void main(String[] argv) {
            try {
                System.out.println("中文");                                                   // 1
                System.out.println("中文".getBytes());                                        // 2
                System.out.println("中文".getBytes("GB2312"));                                // 3
                System.out.println("中文".getBytes("ISO8859_1"));                             // 4
                System.out.println(new String("中文".getBytes()));                            // 5
                System.out.println(new String("中文".getBytes(), "GB2312"));                  // 6
                System.out.println(new String("中文".getBytes(), "ISO8859_1"));               // 7
                System.out.println(new String("中文".getBytes("GB2312")));                    // 8
                System.out.println(new String("中文".getBytes("GB2312"), "GB2312"));          // 9
                System.out.println(new String("中文".getBytes("GB2312"), "ISO8859_1"));       // 10
                System.out.println(new String("中文".getBytes("ISO8859_1")));                 // 11
                System.out.println(new String("中文".getBytes("ISO8859_1"), "GB2312"));       // 12
                System.out.println(new String("中文".getBytes("ISO8859_1"), "ISO8859_1"));    // 13
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

For convenience, each conversion is followed by a serial number: 1, 2, ..., 13. The literal 中文 is the two-character Chinese word for "Chinese".

2.3) Note that javac reads the source file using the system default encoding and then encodes the text as Unicode. At run time Java also works in Unicode internally, and by default it assumes that input and output use the operating system's default encoding. That is, in new String(bytes[, encode]) the system treats the input as a byte stream encoded in encode. In other words, translating the bytes according to encode gives the correct characters, and to store the result in Java it must still be converted from encode to Unicode: bytes -> encode characters -> Unicode characters. In String.getBytes([encode]) the system performs the reverse conversion: Unicode characters -> encode characters -> bytes.
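The two directions described in 2.3) can be sketched as follows. This is only an illustration: the class RoundTripDemo and its helpers encode, decode and hex are hypothetical names, and the sketch assumes the runtime provides the GBK charset, as standard JDK distributions do.

```java
import java.nio.charset.Charset;

// Hypothetical demo class illustrating the conversions in 2.3).
class RoundTripDemo {
    // The two-character sample string, written as Unicode escapes.
    static final String SAMPLE = "\u4e2d\u6587";

    // Unicode characters -> encoded bytes (what getBytes(encode) does).
    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    // Encoded bytes -> Unicode characters (what new String(bytes, encode) does).
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] gbk = encode(SAMPLE, "GBK");
        System.out.println(hex(gbk)); // D6 D0 CE C4
        // Decoding with the same encoding restores the original string.
        System.out.println(decode(gbk, "GBK").equals(SAMPLE)); // true
        // Printing a byte[] directly shows its type and hash, such as
        // "[B@...", not its content; this is what statements 2 to 4 print.
        System.out.println(gbk);
    }
}
```

Naming the encoding explicitly on both sides removes the dependence on the operating system default that the rest of the analysis has to untangle.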
In this example the default encoding is GBK, except in the Western window (for this discussion we treat GBK and GB2312 as the same).

2.4) Whenever encode is not specified in the code above, the system uses the default encoding (here GBK). Statements 5, 6, and 7 are therefore equivalent to 8, 9, and 10 respectively; 8 and 9 are also the same, as are 11 and 12. So we only need to discuss 1, 9, 10, 12, and 13. Statements 2, 3, and 4 are there only for testing and are outside our discussion.

2.5) Let us trace the conversion history of one character through the program, starting with compiling and running in the Chinese window. Pay attention to the letters below: I deliberately use the digits after them to show which items are the same, different, or related.

2.5.1) First, take statement 9 from the code above:

Step  Content  Location                   Description
01    C1       HelloWorld.java            C1 denotes a GBK character
02    U1       javac reads the source     U1 denotes a Unicode character
03    C1       getBytes(), first step     Java first consults the operating system
04    B1, B2   getBytes(), second step    returns the bytes
05    C1       new String(), first step   Java first consults the operating system
06    U1       new String(), second step  returns the character
07    C1       println(String)            the character is displayed, identical to the original

2.5.2) Next, take statement 10 as an example and note what changes:

Step  Content  Location                   Description
01    C1       HelloWorld.java            C1 denotes a GBK character
02    U1       javac reads the source     U1 denotes a Unicode character
03    C1       getBytes(), first step     Java first consults the operating system
04    B1, B2   getBytes(), second step    returns the bytes
05    C3, C4   new String(), first step   Java first consults the operating system, then parses the bytes wrongly
06    U5, U6   new String(), second step  returns the characters
07    C3, C4   println(String)            displayed as "??"

In step 07 the Chinese character has been split into two halves, and the resulting ISO8859_1 characters have no mapping in the console encoding (GBK), so each half is displayed as "?". In the example above, the two-character string 中文 ("Chinese") is therefore displayed as "????".

2.5.3) The other cases in the all-Chinese environment are similar, so I will not go into them.

2.6) We can now look at why a class compiled in the Western DOS window also produces such output when run in the Chinese window, and in particular why Chinese characters are still displayed correctly in some cases.
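As a rough sketch of the trace in 2.5.2), the following reproduces statement 10 and then simulates the console by re-encoding the result as GBK. The class MisdecodeDemo and its helpers are hypothetical names. Simulating the console with a GBK round trip is my own assumption about what println does on a GBK system, and the sketch also assumes the runtime provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class tracing statement 10.
class MisdecodeDemo {
    // The two-character sample string, written as Unicode escapes.
    static final String SAMPLE = "\u4e2d\u6587";
    static final Charset GBK = Charset.forName("GBK");

    // Steps 03 to 06: encode as GBK, then wrongly decode as ISO-8859-1.
    // Every byte becomes its own character, so 2 characters turn into 4.
    static String misdecode(String s) {
        return new String(s.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // Step 07, simulated: convert to GBK bytes as a GBK console would
    // receive them. Characters GBK cannot represent become '?'.
    static String asSeenOnGbkConsole(String s) {
        return new String(s.getBytes(GBK), GBK);
    }

    public static void main(String[] args) {
        String wrong = misdecode(SAMPLE);
        System.out.println(wrong.length());            // 4
        System.out.println(asSeenOnGbkConsole(wrong)); // ????
        // The correctly decoded string survives the same trip unchanged.
        System.out.println(asSeenOnGbkConsole(SAMPLE).equals(SAMPLE)); // true
    }
}
```

The four intermediate characters are ordinary ISO8859_1 letters; the "?" only appears at the last step, when they are converted to an encoding that does not contain them.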