Chinese text processing problems and solutions in Java development
During application development there are always some "hard to understand" system defects and "hard to solve" problems. In fact, with careful analysis, most of them can still be solved without buying expensive products.
■ The JDBC-ODBC Bridge bug and its solution
While writing a database management program, it was found that the JDBC-ODBC Bridge has a bug that is not easy to spot. When data is inserted into a table, content consisting entirely of English characters is stored correctly. When Chinese characters are stored, however, some databases keep only the first seven or eight Chinese characters and truncate the rest, so the stored content is incomplete (some databases, such as Sybase SQL Anywhere 5.0, do not have this problem). The JDBC-ODBC Bridge also has a bug in which tables cannot be created.
This is bad news for Java programmers who need to store Chinese information: they must either switch to another programming language or choose a more expensive database product, and "write once, run anywhere" is compromised as well. Can the Chinese information be transformed in some way before it is stored, so that the problem is avoided? The answer is yes.
The idea and method for solving the problem are as follows. Java uses Unicode encoding, in which Chinese and English characters are both stored in 16 bits. Since English information is stored correctly, the Chinese information can be converted into English (ASCII) information according to a fixed rule, and no truncation then occurs. When the information is read back, the reverse operation restores the English information to Chinese. As the GB2312 encoding rules show, a Chinese character generally consists of two bytes whose high bit is 1: on conversion, the high bit of both bytes is cleared, and on restoration, the high bit of both bytes is set back to 1. To handle Chinese strings that also contain English characters, each English character must be preceded by a 0 byte as a marker. The two general-purpose static methods given below can be added to any class.
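To make the rule concrete before the full methods, here is a minimal sketch of the bit manipulation on raw bytes. The class name `HighBitDemo` and its helper methods are illustrative and not part of the original article; the sample bytes 0xD6 0xD0 are the GB2312 encoding of the character U+4E2D.

```java
import java.util.Arrays;

class HighBitDemo {
    // Conversion step: clear the high bit of every byte.
    static byte[] clearHighBits(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] & 0x7f);
        }
        return out;
    }

    // Restoration step: set the high bit of every byte back to 1.
    static byte[] setHighBits(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] | 0x80);
        }
        return out;
    }

    public static void main(String[] args) {
        // GB2312 bytes of one Chinese character (U+4E2D)
        byte[] gb = { (byte) 0xD6, (byte) 0xD0 };
        byte[] ascii = clearHighBits(gb);
        // 0xD6 & 0x7F = 0x56 ('V'), 0xD0 & 0x7F = 0x50 ('P')
        System.out.println((char) ascii[0] + "" + (char) ascii[1]);
        // Setting the high bit again gives back the original bytes
        System.out.println(Arrays.equals(gb, setHighBits(ascii)));
    }
}
```

The sketch leaves out the 0-byte marker for English characters; without that marker, a stored 'V' could not be told apart from a masked 0xD6, which is why the full methods need it.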
Converting a mixed Chinese and English string into a pure English (ASCII) string:

    public static String toTrueAsciiStr(String str) {
        StringBuffer sb = new StringBuffer();
        // getBytes() uses the platform default encoding (assumed to be GB2312)
        byte[] bt = str.getBytes();
        for (int i = 0; i < bt.length; i++) {
            if (bt[i] < 0) {
                // Byte of a Chinese character: clear the high bit
                sb.append((char) (bt[i] & 0x7f));
            } else {
                // English character: prepend a 0 byte as a marker
                sb.append((char) 0);
                sb.append((char) bt[i]);
            }
        }
        return sb.toString();
    }

Restoring the converted string:

    public static String unToTrueAsciiStr(String str) {
        byte[] bt = str.getBytes();
        int i, l = 0, length = bt.length, j = 0;
        // Count the 0 markers to find the length of the restored byte array
        for (i = 0; i < length; i++) {
            if (bt[i] == 0) {
                l++;
            }
        }
        byte[] bt2 = new byte[length - l];
        for (i = 0; i < length; i++) {
            if (bt[i] == 0) {
                // Marker: skip it and copy the English character unchanged
                i++;
                bt2[j] = bt[i];
            } else {
                // Byte of a Chinese character: set the high bit back to 1
                bt2[j] = (byte) (bt[i] | 0x80);
            }
            j++;
        }
        String tt = new String(bt2);
        return tt;
    }

The example above works well in actual programming. Its only drawbacks are that the stored Chinese information must be converted back before other systems can use it, and that English characters appearing in a Chinese string take up extra storage space.

■ Chinese problems in Solaris servlet programming and their solutions
When developing an Internet application in Java, it was found that servlets that ran perfectly normally under Windows failed after being uploaded to a Solaris server. The returned page could not display Chinese, and text that should have been Chinese was garbled. When Chinese text was used as a search keyword, the database could not be searched correctly. By adding checking code, the causes of the failure were found to be as follows:

The garbled display arises mainly because the setContentType method of the class HttpServletResponse does not change the encoding of the data returned to the client. The correct encoding should be GB2312 or GBK, but in practice the default, ISO8859-1, is used.

The failed search arises because the Chinese information submitted by the client, after being encoded by the browser, is not decoded correctly by the servlet.
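The first cause can be reproduced outside a servlet container. The following is a minimal sketch, assuming only that the response writer encodes with ISO8859-1 as described above; the class name `WriterEncodingDemo` and its helpers are illustrative, and an in-memory stream stands in for the HTTP response.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class WriterEncodingDemo {
    // Push text through a Writer that uses the named charset and return
    // the bytes that would be sent to the client.
    static byte[] encode(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            Writer w = new OutputStreamWriter(sink, charsetName);
            w.write(text);
            w.close();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters
        // ISO8859-1 cannot represent them, so each becomes '?' (0x3F)
        System.out.println(hex(encode(chinese, "ISO-8859-1"))); // 3F 3F
        // GB2312 produces the two-byte codes the browser expects
        System.out.println(hex(encode(chinese, "GB2312")));     // D6 D0 CE C4
    }
}
```

Once the characters have been replaced by '?', the information is lost, so the encoding has to be corrected on the writer itself rather than in the browser.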
Solution to the garbled display
A servlet generally produces its response in the following form:

    import java.io.*;
    import javax.servlet.*;
    import javax.servlet.http.*;

    public class zldteServlet extends HttpServlet {
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            // Set the Content-Type header before returning data with the Writer;
            // the character set is set to GB2312 here
            response.setContentType("text/html; charset=GB2312");
            PrintWriter out = response.getWriter(); // *
            // Return the actual data
            out.println("<HTML><BODY>");
            out.println("This is a test page!");
            out.println("</BODY></HTML>");
            out.close();
        }
        ...
    }

To solve the garbled display, the line marked * must be changed to the following:

    PrintWriter out = new PrintWriter(
            new OutputStreamWriter(response.getOutputStream(), "GB2312"));

Solution to the Chinese information retrieval problem on Solaris
When a browser submits information to the server through a form, the data is generally encoded in the MIME format x-www-form-urlencoded. If the GET method is used, the encoded parameter names and values are appended to the URL; in Java this part is called the query string. In a servlet program, if a parameter value is obtained with the getParameter method of ServletRequest, Chinese characters are not decoded correctly in the Solaris environment, and the database therefore cannot be searched correctly. The java.net package of Java 1.2 provides the classes URLEncoder and URLDecoder. URLEncoder provides a method that converts a given string into x-www-form-urlencoded format, and URLDecoder provides the inverse method.
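The encoding and decoding steps can be sketched as follows. This is only an illustration under stated assumptions: it uses the two-argument `URLEncoder.encode` and `URLDecoder.decode` overloads, which were added in JDK releases later than the Java 1.2 mentioned above; it assumes the browser encoded the form in GB2312 and that the servlet container decoded it as ISO8859-1; and the class name `QueryStringDemo` with its helpers `encode`, `decode` and `recover` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryStringDemo {
    // Percent-encode a value the way a browser would for a GB2312 form.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "GB2312");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decode a percent-encoded value with an explicit charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Repair a value that was already decoded as ISO8859-1 (assumed here to
    // be what getParameter returned): recover the raw bytes, then decode
    // them as GB2312.
    static String recover(String misdecoded) {
        try {
            return new String(misdecoded.getBytes("ISO-8859-1"), "GB2312");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String keyword = "\u4e2d\u6587"; // two Chinese characters
        String wire = encode(keyword);
        System.out.println(wire); // %D6%D0%CE%C4

        // Decoding with the wrong charset gives a different string
        String wrong = decode(wire, "ISO-8859-1");
        System.out.println(wrong.equals(keyword)); // false

        // Decoding with GB2312, or repairing the wrong result, restores it
        System.out.println(decode(wire, "GB2312").equals(keyword)); // true
        System.out.println(recover(wrong).equals(keyword));         // true
    }
}
```

The `recover` step works because ISO8859-1 maps every byte value to exactly one character, so the original bytes survive the wrong decoding and can be read again with the correct charset.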