Abstract: Unlike most discussions of how Java applications handle Chinese, this article examines the problems from the perspective of how the Java language reads and outputs Chinese characters.
Java has no shortage of problems in handling Chinese characters. Java technology covers a wide range (J2EE alone contains more than a dozen related technologies), there are many vendors, and there are no official standards for Java-oriented web servers, application servers and JDBC database drivers. As a result, beyond the inherent problems Java applications have in handling Chinese, there are platform-related problems that vary from server to server. In other words, where Chinese is concerned, the portability of Java code is compromised.
In general, Java's Chinese processing issues are concentrated in JSP applications and Java database access. Both JSP applications and JDBC-based database access involve interaction between a Java program and another application system. This interaction inevitably requires passing data and parameters between the systems, and reading and output are exactly where Java's handling of Chinese most often goes wrong.
Chinese issues to watch for in JSP programs
Take a JSP application running on Tomcat 3.2.1 as an example. In general, Chinese display problems can be addressed with a forced encoding conversion function such as the following, which converts the internal character code.
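// Re-decode a string whose GBK bytes were wrongly read as ISO-8859-1.
public static String toChinese(String strValue)
{
    try {
        if (strValue == null)
            return null;
        else
        {
            // Recover the original bytes, then decode them as GBK.
            strValue = new String(strValue.getBytes("ISO8859_1"), "GBK");
            return strValue;
        }
    } catch (Exception e) {
        // An unsupported encoding name ends up here.
        return null;
    }
}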
Note that before using this function, we need to analyze why the Chinese text is not displayed correctly; the function cannot be applied blindly to every such problem. For example, it cannot help if we simply forgot to declare the page's output character set as GB2312 or GBK. A good habit is to declare the output character set on the first line of every JSP page we write, such as
<%@ page contentType="text/html; charset=GBK" %> or <%@ page contentType="text/html; charset=GB2312" %>
For JSP versions that do not support declaring the output character set this way, other settings can be used instead.
It is also important to note that this function repairs text whose Chinese characters were decoded with the wrong encoding; it is not a general-purpose function that guarantees correct output of Chinese characters. Chinese characters fail to be output or read correctly when the encoding of the characters differs from the system default character set (or from the character set we want to output; the two are generally the same). So before applying this function, we must determine whether the encoding of the characters we read or output is the same as the system default character set.
The following example shows both correct and incorrect use of the function. The JSP engine is Tomcat 3.2.1, and both the client and the server run Chinese Windows 2000.
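The check described above can be sketched in a few lines of Java. This is only an illustration, not part of the original example: the class name `CharsetCheck` and the method `needsConversion` are hypothetical, and the check is deliberately conservative (it treats GB2312 and GBK as different character sets even though GBK extends GB2312).

```java
import java.nio.charset.Charset;

public class CharsetCheck {

    // Hypothetical helper: true when the two names refer to different charsets.
    // Charset.forName resolves aliases, so "ISO8859_1" and "ISO-8859-1" are
    // treated as the same character set.
    public static boolean needsConversion(String sourceCharset, String targetCharset) {
        return !Charset.forName(sourceCharset).equals(Charset.forName(targetCharset));
    }

    public static void main(String[] args) {
        // The platform default is what new String(byte[]) and getBytes() use.
        Charset systemDefault = Charset.defaultCharset();
        System.out.println("System default charset: " + systemDefault);
        System.out.println("Conversion needed for GBK output: "
                + needsConversion(systemDefault.name(), "GBK"));
    }
}
```

Only when such a check reports a mismatch is a conversion function like the one above worth considering.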
Example 1
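<%@ page contentType="text/html; charset=GBK" %>
<html>
<head>
<title>TestJSP</title>
</head>
<body>
<h1>
<%
class testChina extends Object {
    // Re-decode a string whose GBK bytes were wrongly read as ISO-8859-1.
    public String toChinese(String strValue)
    {
        try {
            if (strValue == null)
                return null;
            else
            {
                strValue = new String(strValue.getBytes("ISO8859_1"), "GBK");
                return strValue;
            }
        } catch (Exception e) {
            return null;
        }
    }
    public void test() {
    }
}
testChina testc = new testChina();
// The string literal stands in for a Chinese test sentence.
// str1: decoded with the system default character set (GBK here).
String str1 = new String("This is a test for Chinese support".getBytes("GBK"));
// str2: GBK bytes deliberately decoded as ISO-8859-1, so Chinese would be garbled.
String str2 = new String("This is a test for Chinese support".getBytes("GBK"), "ISO-8859-1");
// str3: str2 repaired with toChinese.
String str3 = new String(testc.toChinese(str2));
out.println("Begin<br>");
out.println("str1");
out.println(str1 + "<br>");
out.println("str2");
out.println(str2 + "<br>");
out.println("str3");
out.println(str3 + "<br>");
out.println("End<br>");
// Dump the system properties (including file.encoding) to the server console.
System.getProperties().list(System.out);
%>
</h1>
</body>
</html>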
We know that the Java language uses Unicode internally, but the character set used by the Java compiler is the default character set of the operating system: GBK on Chinese Windows and ISO-8859-1 on English systems. In Example 1, the system default character set is GBK and the JSP output character set is also GBK, so the two are consistent. str1 uses the system default encoding. For str2 we deliberately decode the bytes as ISO-8859-1 to produce a string whose Chinese cannot be displayed. str3 is a correct use of the toChinese function of the testChina class: it repairs the output error in str2. Applying toChinese to str1 would be an incorrect use: those characters are already encoded in the system character set, so converting them again corrupts the Chinese output. So we must analyze why the characters are not displayed correctly before using the toChinese function.
How, then, do we tell which characters may have problems? There are several main principles to note:
1) Pay particular attention to variables that hold state. A variable's character encoding is not visible, so changes to and operations on several variables can change the character set in use; in operations that combine variables with submitted data, characters in different encodings may end up being compared with each other.
2) Pay attention to where characters are read and output. Most mismatches between the actual encoding and the target encoding occur while characters are being read or output, for example when a form is submitted, when a URL is read, or when the content of a control (such as a list control) is displayed.
3) Test whenever necessary. Because Java's Chinese issues vary with the web server, browser, runtime environment and development tools, we must run targeted tests to avoid them.
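The difference between correct and incorrect use of the conversion can be reproduced outside a JSP container. The following is a minimal standalone sketch, not the article's own test: the class name `EncodingRoundTrip` is hypothetical, the Chinese sample is written with Unicode escapes so the result does not depend on the source file encoding, and it assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Assumption: the JDK in use ships the GBK charset.
    private static final Charset GBK = Charset.forName("GBK");

    // Same conversion as toChinese, written with Charset objects so that
    // no checked exception has to be handled.
    public static String toChinese(String value) {
        if (value == null) {
            return null;
        }
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters

        // Simulate a component that decoded GBK bytes as ISO-8859-1.
        String misread = new String(original.getBytes(GBK), StandardCharsets.ISO_8859_1);

        // Correct use: the input really is GBK misread as ISO-8859-1.
        String repaired = toChinese(misread);
        // Incorrect use: the input was fine; its characters cannot be encoded
        // in ISO-8859-1, so each one is replaced by '?'.
        String damaged = toChinese(original);

        System.out.println(repaired.equals(original)); // prints "true"
        System.out.println(damaged.equals(original));  // prints "false"
    }
}
```

The second call shows why the cause of the garbling has to be identified first: converting text that was already correct destroys it, and the loss cannot be undone.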
Of course, forced encoding conversion is not the only way to solve Java's Chinese issues. We can also use the following methods:
1) Compile the source program with javac -encoding Big5 SourceFile.java or javac -encoding GB2312 SourceFile.java.
2) Use the Chinese localized version of the Java 2 JDK (http://java.sun.com/products/jdk/1.2/chinesejdk.html). This is an unofficial version, however, and Sun does not guarantee upgrades for it.
Chinese issues during database access
After the discussion above, the Chinese issues that arise during database access are not hard to understand.
At present, most JDBC drivers are not designed for Chinese systems (Chinese data is mostly handled in ISO-8859-1 encoding), so character encoding often has to be converted when data is read and written.
If the system runs under the Chinese operating system platform, then:
1) Reading Chinese. Chinese characters can be read as follows:
strChinese = new String(rs.getObject(j).toString().getBytes("ISO-8859-1"));
On the Win2000 platform with the JDBC driver provided by WebLogic 6.0, code that reads Chinese can be written as follows (see the string operations in the example):
Driver myDriver = (Driver) Class.forName("weblogic.jdbc.mssqlserver4.Driver").newInstance();
conn = myDriver.connect("jdbc:weblogic:mssqlserver4", props);
conn.setCatalog("LabManager");
Statement st = conn.createStatement();
// Execute a query
String testStr;
String testTempStr = new String();
testStr = new String(testTempStr.getBytes()); // code transformation
DatabaseMetaData dbMetaData = conn.getMetaData();
ResultSet rs = dbMetaData.getTables(null, null, null, new String[] {"TABLE"});
while (rs.next()) {
    for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {
        Object value = rs.getObject(j);
        if (value == null) continue; // metadata columns may be null
        // Recover the raw bytes and decode them with the system default charset.
        testStr = testStr + new String(value.toString().getBytes("ISO-8859-1"));
    }
}
2) Output of Chinese. Writing Chinese is simply the reverse of reading it: we need to convert the characters from the system default encoding to the ISO-8859-1 encoding supported by JDBC. The code can be written as follows:
tempBytes = strInput.getText().getBytes();
sqlStr = new String(tempBytes, "ISO-8859-1");
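That reading and writing are inverse operations can be made explicit with a pair of helpers. This is a sketch under stated assumptions rather than the article's code: the class `JdbcChineseCodec` and its methods `toDriver` and `fromDriver` are hypothetical names, and GBK is named explicitly where the fragments above rely on the system default character set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JdbcChineseCodec {

    // Assumption: the application's native encoding is GBK.
    private static final Charset GBK = Charset.forName("GBK");

    // Before writing: wrap the GBK bytes in an ISO-8859-1 string,
    // one char per byte, so an ISO-8859-1-only driver passes them through.
    public static String toDriver(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // After reading: recover the bytes and decode them as GBK again.
    public static String fromDriver(String text) {
        return new String(text.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587\u6d4b\u8bd5"; // four Chinese characters
        String wire = toDriver(original);
        System.out.println(wire.length());                      // prints "8"
        System.out.println(fromDriver(wire).equals(original));  // prints "true"
    }
}
```

The round trip is lossless because Java's ISO-8859-1 decoder maps every byte value to exactly one character, so no GBK byte is lost while the text is in its wrapped form.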
Note that different JDBC drivers support the same database differently, and the same kind of JDBC driver behaves differently with different databases. Whenever the JDBC driver changes, our character conversion code must therefore be tested to confirm that it still works properly; otherwise the conversion may do more harm than good. For example, with I-NET's Una 2000 Driver Version 2.03 for MS SQL Server, no encoding conversion is needed at all and Chinese is handled normally. However, since JDBC drivers do not clearly state their support for Chinese characters, it is recommended to test whenever JDBC is used.
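One way to structure such a test is to send a known Chinese sample through the driver and see which form survives. The sketch below is illustrative only: `DriverRoundTripCheck`, `probe` and the `writeThenRead` hook are hypothetical, and the two "drivers" in `main` are simulations. In a real test the hook would insert the string through the JDBC driver under test and select it back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class DriverRoundTripCheck {

    public enum Result { NO_CONVERSION_NEEDED, CONVERSION_NEEDED, UNKNOWN }

    // Assumption: the application's native encoding is GBK.
    private static final Charset GBK = Charset.forName("GBK");
    private static final String SAMPLE = "\u4e2d\u6587"; // two Chinese characters

    // writeThenRead is a hypothetical hook: store the string through the
    // driver under test, read it back, and return what came out.
    public static Result probe(UnaryOperator<String> writeThenRead) {
        // Case 1: the driver handles Chinese text unchanged.
        if (SAMPLE.equals(writeThenRead.apply(SAMPLE))) {
            return Result.NO_CONVERSION_NEEDED;
        }
        // Case 2: the driver only survives ISO-8859-1, so wrap and unwrap.
        String wrapped = new String(SAMPLE.getBytes(GBK), StandardCharsets.ISO_8859_1);
        String readBack = writeThenRead.apply(wrapped);
        if (readBack != null) {
            String unwrapped = new String(readBack.getBytes(StandardCharsets.ISO_8859_1), GBK);
            if (SAMPLE.equals(unwrapped)) {
                return Result.CONVERSION_NEEDED;
            }
        }
        return Result.UNKNOWN;
    }

    public static void main(String[] args) {
        // Simulated drivers, standing in for real database round trips.
        UnaryOperator<String> transparent = s -> s;
        UnaryOperator<String> latin1Only = s -> new String(
                s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);

        System.out.println(probe(transparent)); // prints "NO_CONVERSION_NEEDED"
        System.out.println(probe(latin1Only));  // prints "CONVERSION_NEEDED"
    }
}
```

Running such a probe again after every driver change confirms whether the conversion code is still required.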
Conclusion
In fact, the problems in Java's Chinese processing share one root cause: a mismatch in the encoding of the Chinese characters (variables). All of these problems arise while characters are being read or output, so as long as we keep a firm grasp on this step, we can better understand and deal with Java's Chinese issues.