Several Principles for Analyzing Java Chinese-Character Issues
Wang Mingwei (Wangmv@hotmail.com), CAD/CAM National Professional Laboratory, Northwestern University of Technology, December 2002
Java's Chinese-character issues have been discussed extensively. Even so, the related technical specifications leave many details open, and there is no official standard for how Java web servers, application servers, and JDBC database drivers handle character encoding. As a result, Chinese-handling problems in Java applications have not gone away. Instead, they change with the server, the driver, and the runtime environment. How, then, do we find the real problem among so many symptoms, analyze it, and solve it? Most discussions focus on fixes. This paper instead gives recommendations on how to anticipate, discover, and check for problems. It helps developers locate the various sources that can cause them and reach better solutions in Java.
Introduction
There is no shortage of discussion about how Java handles Chinese characters. However, Java technology covers a broad area (J2EE alone comprises more than a dozen related technologies), and many vendors supply implementations. There is also no official standard for how Java web servers, application servers, and JDBC database drivers handle encodings. Java applications therefore suffer from two kinds of trouble. The first is the problems inherent in how Java processes Chinese. The second is variations of those problems that differ from one server or driver to another, which make the situation harder. So how do we find the real problem in such a situation?
General Solutions to Java Chinese-Character Issues
In essence, Java's Chinese-character problems arise from a mismatch between two encodings. One is the default encoding used by the Java application. The other is the encoding expected by the target or application that reads its output. There are usually four ways to solve Chinese-character problems in Java.
1) Use the Chinese localized version of the JDK. The Chinese localized version of the Java 2 JDK (http://java.sun.com/products/jdk/1.2/CHINESEJDK.HTML) is not an official release, and Sun has made no promise to keep it updated. Even so, it is still one way to solve Java's Chinese-character problems.
2) Choose appropriate compilation parameters. With the international version of Java, we can also make the compiled program handle Chinese by naming the encoding explicitly at compile time. For example, compiling the source with javac -encoding Big5 SourceFile.java produces a Traditional Chinese application, and javac -encoding GB2312 SourceFile.java produces a Simplified Chinese one.
3) Convert character encodings programmatically. Solving Java's Chinese-character problems in code has become a common approach. Below is one of the most commonly used conversion functions. It takes a string whose bytes were decoded as ISO-8859-1 and decodes those bytes again as GBK, the encoding used by Chinese Windows systems.
public static String toChinese(String strValue) {
    try {
        if (strValue == null) {
            return null;
        } else {
            // Recover the raw bytes, assuming the string was decoded as
            // ISO-8859-1, and decode them again as GBK
            strValue = new String(strValue.getBytes("ISO8859_1"), "GBK");
            return strValue;
        }
    } catch (Exception e) {
        return null;
    }
}
4) Declare the output character set. For JSP applications, we can set the output character set of a JSP page with a page directive such as <%@ page contentType="text/html; charset=GBK" %>. We can also declare the character set in the HTML itself with a meta tag, for example <meta http-equiv="Content-Type" content="text/html; charset=GBK">.
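To see what the toChinese function in method 3) actually repairs, the following minimal Java sketch first simulates the typical fault, GBK bytes wrongly decoded as ISO-8859-1, and then applies the same byte-level repair. The class name ToChineseDemo and the helper misdecode are hypothetical and exist only for this illustration. The sketch also assumes the JVM provides the GBK charset, as full JDK installations normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the mis-decoding that method 3) repairs.
// Assumption: the runtime provides the "GBK" charset.
class ToChineseDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Same logic as toChinese above, but using Charset objects, so there is
    // no checked UnsupportedEncodingException to catch.
    static String toChinese(String strValue) {
        if (strValue == null) {
            return null;
        }
        // ISO-8859-1 maps each char U+0000..U+00FF back to exactly one byte,
        // so this recovers the original bytes unchanged.
        byte[] rawBytes = strValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, GBK);
    }

    // Hypothetical helper simulating a component that decodes GBK bytes
    // as ISO-8859-1 (the fault toChinese is meant to undo).
    static String misdecode(String original) {
        return new String(original.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";       // two Chinese characters
        String garbled = misdecode(original);    // four Latin-1 characters
        String repaired = toChinese(garbled);
        System.out.println(garbled.length() + " " + original.equals(repaired));
    }
}
```

Running main prints "4 true". The two GBK-encoded characters occupy four bytes, so they become four Latin-1 characters, and the repair restores the original string. The repair only works if nothing between the original bytes and toChinese has altered them. ISO-8859-1 is used precisely because it maps every byte to one char and back without loss.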
Existing Problems
By how they are implemented, the four methods above fall into two categories. The first consists of methods that apply a standard or rule; methods 1), 2), and 4) belong here. The second consists of methods implemented through targeted programming; method 3) belongs here.
Because methods 1), 2), and 4) are rule-based, they are relatively simple to apply. The solutions they give are not tailored to particular cases but are fairly general. For example, we can use method 2) to compile Java source files with a preset internal encoding. We then do not have to deal with problems such as garbled output in the source code itself.
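The effect of method 2) comes from the fact that javac -encoding tells the compiler which charset to use when turning the bytes of a .java file into characters. The minimal sketch below imitates that step. It decodes the same "source file" bytes with two charsets and does not invoke javac. It uses GBK rather than the GB2312 or Big5 mentioned in method 2). The class name SourceEncodingDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: simulates what the -encoding choice does to the bytes
// of a string literal in a source file saved as GBK.
class SourceEncodingDemo {
    // GBK bytes of the two characters U+4E2D U+6587
    static final byte[] SOURCE_BYTES = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

    // Decode the "source" bytes the way the compiler would with the given encoding
    static String decodeSource(Charset compilerEncoding) {
        return new String(SOURCE_BYTES, compilerEncoding);
    }

    public static void main(String[] args) {
        String right = decodeSource(Charset.forName("GBK"));      // like javac -encoding GBK
        String wrong = decodeSource(StandardCharsets.ISO_8859_1); // like javac -encoding ISO-8859-1
        System.out.println(right.equals("\u4e2d\u6587"));
        System.out.println(wrong.length());
    }
}
```

With the matching encoding, the literal becomes the intended two characters and the first line prints "true". With the wrong one, it silently becomes four unrelated characters and the second line prints "4". That is why every source file compiled in one javac run must share the declared encoding.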
However, because these methods are not targeted and solve problems in too uniform a way, they cannot fully solve Java's Chinese-character problems in some cases. Take a very common example. A Java application often needs to interact with the interfaces of other Java applications, for example when it accesses a database through some version of JDBC. The encodings supported by JDBC drivers differ between vendors and even between versions. If Chinese text cannot be handled correctly when it is written to or read from the database, we need to perform two encoding conversions, one on input and one on output. Methods 1), 2), and 4) usually cannot do this. Method 2) can be made to cope with some tricks. One of the most effective is to split the Java application into smaller parts. For example, the database read code and write code can go into different source files, so that each is compiled with the encoding it requires. In ordinary programming, however, this is rarely acceptable, because the resulting program structure is likely to be unreasonable. Encapsulating the database read and write methods in one class is a sensible design, but splitting that class's two methods into separate files would be very unreasonable. So although methods 1), 2), and 4) are relatively simple, they have disadvantages that cannot be overcome. This is why the more complex programming approach is popular.
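The "two conversions" described above can be kept together in one class, which method 2) cannot offer. The minimal sketch below assumes a hypothetical JDBC driver that treats every string as ISO-8859-1 while the application works with GBK text. The class and method names are illustrative only. Real drivers differ, so test yours before relying on this.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical codec for a driver assumed to handle strings as ISO-8859-1.
class Iso8859DriverCodec {
    private static final Charset APP = Charset.forName("GBK");
    private static final Charset DRIVER = StandardCharsets.ISO_8859_1;

    // Input conversion, applied before handing text to the driver:
    // one ISO-8859-1 char per GBK byte, so the bytes pass through intact.
    static String toDriver(String appText) {
        return appText == null ? null : new String(appText.getBytes(APP), DRIVER);
    }

    // Output conversion, applied to text read back from the driver.
    static String fromDriver(String driverText) {
        return driverText == null ? null : new String(driverText.getBytes(DRIVER), APP);
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587";
        String stored = toDriver(name);      // what the driver would see
        String loaded = fromDriver(stored);  // what the application gets back
        System.out.println(name.equals(loaded));
    }
}
```

Because both directions live in one small class, read and write code can stay together. If the driver is later replaced by one that handles GBK natively, only this class has to change.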
Compared with methods 1), 2), and 4), method 3) is more targeted and flexible. The program can handle each situation differently and perform encoding conversion wherever it is needed. This flexibility also places higher demands on developers. They must be able to spot the places where Chinese-handling problems may arise, and judge and handle them correctly.
Principles of Analysis
In general, none of the methods for solving Java's Chinese-handling problems is very complicated. The difficulty lies elsewhere. Java technology, J2EE in particular, spans many areas, and the various web servers, application servers, and JDBC database drivers are not uniform. As a result, discovering an application's problems correctly and in time has become much more complex. So how do we discover these problems?
Typically, Chinese-handling problems in Java stem from differences between the default encoding used by a Java application and the encodings used by other applications. These differences make the data exchanged between them mismatch, whether the input and output are direct or indirect. To find problems in time, we can start from this observation and analyze an application according to the following principles:
1. Pay attention to character variables. The encoding of a variable's characters is well hidden, and assignments and operations across multiple variables can change the character set. During the various operations on variables and on submitted data, characters in different encodings may end up being compared or combined.
2. Pay attention to character input and output in any form. We stress "any form" because most Java applications are developed as network applications. Compared with programs in other languages, they must therefore handle character data exchanged in all sorts of forms. Examples include form submissions, data read from URLs, character data exchanged through encryption operations, input from selections in web controls, and the display of control contents (such as a List control).
3. Use third-party components and applications carefully. Their implementations are not transparent, so it is generally hard to determine their default encoding, and we cannot control it. We therefore need to pay special attention when using the interface functions they provide. If Chinese text really cannot be handled correctly, we should first check our own code and adjust it to accommodate these interfaces. Such components and applications basically offer no interface for adjusting their encoding mechanism. If necessary, we may need to switch to alternative components or applications.
4. Pay attention to data input and output inside the objects we use. This situation is very well hidden. Our application may interact with other code through objects (such as serialized objects). Such an object may process character data internally, perform its own input or output, or even throw exceptions with Chinese messages. In any of these cases, Chinese text may fail to display correctly. Because these behaviors are usually encapsulated inside the object, they are easy to overlook when writing programs. The situation is also somewhat unpredictable; for example, we may not know which exceptions the object will throw. Some testing is therefore required.
5. Pay attention to database access. Java connects to databases through JDBC. Most JDBC drivers are not designed for Chinese systems and mostly handle character data as ISO-8859-1, so character encoding conversion is often needed when reading and writing data. Even so, we recommend reading a driver's documentation carefully before using it. If you cannot determine how a driver encodes character data, we suggest running some tests. For example, the following is a test of the WebLogic JDBC driver for MS SQL Server 2000 on the Simplified Chinese Windows 2000 platform. The example concatenates the converted character data into one string:
        ...
        // conn and props (connection properties) are declared in the elided code
        Driver myDriver = (Driver) Class.forName("weblogic.jdbc.mssqlserver4.Driver").newInstance();
        conn = myDriver.connect("jdbc:weblogic:mssqlserver4", props);
        conn.setCatalog("labmanager");
        Statement st = conn.createStatement(); // for executing queries
        String testStr;
        String testTempStr = new String();
        testStr = new String(testTempStr.getBytes("ISO-8859-1")); // code conversion
        DatabaseMetaData dbMetaData = conn.getMetaData();
        ResultSet rs = dbMetaData.getTables(null, null, null, new String[] {"TABLE"});
        while (rs.next()) {
            for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {
                Object value = rs.getObject(j);
                if (value != null) {
                    // Recover the ISO-8859-1 bytes and decode them with the platform
                    // default charset (GBK on Simplified Chinese Windows)
                    testStr = testStr + new String(value.toString().getBytes("ISO-8859-1"));
                }
            }
        }
Note, however, that different JDBC drivers behave differently with the same database, and the same JDBC driver behaves differently with different databases. Our character conversion code may therefore stop working when the JDBC driver changes, or even when its version changes. For example, in the same environment, the code above cannot process Chinese correctly when I-NET's UNA 2000 Driver Version 2.03 for MS SQL Server is used. The reason is simple: this driver itself supports the GBK encoding, so no conversion is needed at all.
6. Perform the necessary tests. Java's Chinese-handling problems vary with the web server, browser, runtime environment, and development tools, so we must run targeted tests to avoid them. Testing is also essential for deciding whether such problems may occur at all. When a problem does occur, it also shows which link causes it: the web server, the browser, the JDBC driver, and so on. We may also need more comprehensive tests, such as combined tests of web servers, browsers, and JDBC drivers, to uncover problems hidden in how several links work together.
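For principles 1 and 6, a small diagnostic helps decide at which link text was corrupted. Two strings that look alike on screen can differ completely in their code units. The minimal sketch below prints the UTF-16 code units of a string and the bytes it produces in a given charset. The class EncodingProbe and its methods are hypothetical helpers, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helpers for locating where text stops matching.
class EncodingProbe {
    // Each UTF-16 code unit as U+XXXX, separated by spaces
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // The bytes of s in the given charset, as two-digit hex values
    static String hexBytes(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String expected = "\u4e2d\u6587";
        // Simulate a link that decoded GBK bytes as ISO-8859-1
        String received = new String(expected.getBytes(gbk), StandardCharsets.ISO_8859_1);
        System.out.println("expected: " + codeUnits(expected));
        System.out.println("received: " + codeUnits(received));
        System.out.println("GBK bytes: " + hexBytes(expected, gbk));
        System.out.println("equal? " + expected.equals(received));
    }
}
```

Logging codeUnits at each link (after the form is read, before the JDBC call, after the query) narrows the fault to the first link where the output no longer matches. Here the expected text shows U+4E2D U+6587. The received text shows U+00D6 U+00D0 U+00CE U+00C4, which reveals GBK bytes read as Latin-1.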
Conclusion
In fact, the root cause of Java's Chinese-handling problems is a difference between the encodings of Chinese characters (variables). All of these problems actually occur when characters are read or output. As long as we keep firm control of that step, we can better discover, analyze, handle, and prevent Java's Chinese-character problems.
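One practical way to keep control of the read and output step is to name the charset explicitly at every I/O boundary. The platform default differs between machines and should not be relied on. The minimal sketch below shows this with the standard InputStreamReader and OutputStreamWriter constructors that take a Charset. The class ExplicitCharsetIO is hypothetical, and GBK is just an example choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helpers: every read and write states its charset explicitly.
class ExplicitCharsetIO {
    // Reads the whole stream as text in charset cs (closes the stream)
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Writes text in charset cs; closing the writer flushes it and closes out
    static void writeAll(OutputStream out, String text, Charset cs) {
        try (Writer writer = new OutputStreamWriter(out, cs)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeAll(sink, "\u4e2d\u6587", gbk);
        String back = readAll(new ByteArrayInputStream(sink.toByteArray()), gbk);
        System.out.println(sink.size() + " " + back.equals("\u4e2d\u6587"));
    }
}
```

Running main prints "4 true". Both helpers close the stream they are given; if the caller owns the stream, wrap it or drop the try-with-resources. The essential point is that the charset appears in the call rather than being taken from the environment.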