The code finally used is:
<%= new String(rst.getString(2).getBytes("ISO-8859-1"), "GB2312") %>
Query:
new String(rst.getString(2).getBytes("ISO-8859-1"), "GB2312");
Submit:
sqlstr = new String(sqlstr.getBytes("GB2312"), "ISO-8859-1");
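The query and submit conversions are inverses of each other, which a small standalone program can show. This is only a sketch: the class and method names (RoundTripDemo, toDriver, fromDriver) are made up for illustration, and it assumes the JDK provides the GB2312 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    static final Charset GB2312 = Charset.forName("GB2312");

    // "Submit" direction: encode as GB2312, then view each byte as one ISO-8859-1 character.
    static String toDriver(String text) {
        return new String(text.getBytes(GB2312), StandardCharsets.ISO_8859_1);
    }

    // "Query" direction: recover the raw bytes and decode them as GB2312.
    static String fromDriver(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), GB2312);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two Chinese characters for "Chinese"
        String wire = toDriver(original);
        System.out.println(wire.length());                      // one char per GB2312 byte
        System.out.println(fromDriver(wire).equals(original));  // the round trip restores the text
    }
}
```

The trick works because ISO-8859-1 maps every byte value to exactly one character, so no byte is lost on the way through.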
Garbled Chinese text is a problem that comes up often in JSP development and may well have troubled you. Below I describe the common causes of garbled Chinese in JSP development and the solution for each.
First, garbled output on a JSP page. The following display page (display.jsp) shows garbled text:
<% out.print("JSP Chinese Processing"); %>
The result differs across web servers and JDK versions. Cause: the server and the browser use different encodings, so the same bytes are displayed as different characters. Solution: specify the encoding (GB2312) in the JSP page by adding the following as the first line of the page, which eliminates the garbling:
<%@ page contentType="text/html; charset=GB2312" %>
The full page is as follows:
<%@ page contentType="text/html; charset=GB2312" %>
<% out.print("JSP Chinese Processing"); %>
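The cause can be reproduced outside a web server. The sketch below is a simulation, not container code: render stands in for the server encoding the page, display stands in for the browser decoding it, and both names are hypothetical.

```java
import java.nio.charset.Charset;

class PageEncodingDemo {
    // Server side: turn the page text into bytes using the declared charset.
    static byte[] render(String text, String charset) {
        return text.getBytes(Charset.forName(charset));
    }

    // Browser side: decode the received bytes with the charset it believes applies.
    static String display(byte[] body, String charset) {
        return new String(body, Charset.forName(charset));
    }

    public static void main(String[] args) {
        String text = "JSP \u4e2d\u6587"; // "JSP" followed by two Chinese characters
        byte[] body = render(text, "GB2312");
        System.out.println(display(body, "GB2312").equals(text));     // both sides agree
        System.out.println(display(body, "ISO-8859-1").equals(text)); // mismatch garbles the Chinese part
    }
}
```

Declaring the charset in the page directive is what lets both sides agree on the same encoding.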
Second, Chinese submitted through a form. There is a submission page (submit.jsp) whose form submits a field called name to the processing page.
Here is the processing page (process.jsp) code:
<%@ page contentType="text/html; charset=GB2312" %>
<%= request.getParameter("name") %>
If submit.jsp submits English characters they display correctly, but submitted Chinese is garbled. Cause: the browser sends requests in UTF-8 by default, and UTF-8 and GB2312 represent the same characters with different bytes, so the characters cannot be recognized. Workaround: set the request encoding with request.setCharacterEncoding("GB2312") so that both sides agree, and Chinese displays normally. The modified process.jsp code is as follows:
<%@ page contentType="text/html; charset=GB2312" %>
<% request.setCharacterEncoding("GB2312"); %>
<%= request.getParameter("name") %>
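What the request encoding setting changes can be imitated with the JDK's own URL encoder and decoder. This is a simulation under assumptions, not servlet container code: encodeField plays the browser, decodeField plays the container, and both names are invented for this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class FormDecodingDemo {
    // Browser side: percent-encode a form value using the charset of the page.
    static String encodeField(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Server side: decode the value using the charset the container assumes.
    static String decodeField(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u5f20\u4e09"; // a two-character Chinese name
        String sent = encodeField(name, "GB2312");
        System.out.println(decodeField(sent, "ISO-8859-1").equals(name)); // wrong assumption: garbled
        System.out.println(decodeField(sent, "GB2312").equals(name));     // matching charset: correct
    }
}
```

The bytes on the wire are the same in both cases; only the charset used to decode them differs.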
Third, garbled text over the database connection. If every Chinese value that passes through the database comes back garbled, the solution is to add useUnicode=true&characterEncoding=GBK to the database connection URL.
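As an illustration of where these parameters go, the sketch below builds such a URL. It assumes a MySQL JDBC driver; the host, port, and database name are placeholders, and mysqlUrl is a hypothetical helper rather than part of any driver API.

```java
class JdbcUrlDemo {
    // Builds a MySQL-style JDBC URL with the two encoding parameters appended.
    static String mysqlUrl(String host, int port, String database, String encoding) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=" + encoding;
    }

    public static void main(String[] args) {
        // Placeholder connection details, for illustration only.
        System.out.println(mysqlUrl("localhost", 3306, "test", "GBK"));
    }
}
```

If the URL is stored in an XML configuration file, the & between the two parameters has to be written as &amp;.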
Fourth, garbled display from the database. In MySQL 4.1.0, varchar and text columns show garbled Chinese. For varchar columns, setting the binary attribute solves the problem; for text columns, use an encoding conversion class. The implementation is as follows:
public class Convert {
    /** Converts ISO-8859-1 encoded text to GB2312. */
    public static String isoToGB(String iso) {
        String gb;
        try {
            // Check for null first so that equals() is never called on null.
            if (iso == null || iso.equals("")) {
                return "";
            } else {
                iso = iso.trim();
                gb = new String(iso.getBytes("ISO-8859-1"), "GB2312");
                return gb;
            }
        } catch (Exception e) {
            System.err.print("Encoding conversion error: " + e.getMessage());
            return "";
        }
    }
}
Save it as a class, and then call the static method Convert.isoToGB() wherever the encoding needs to be converted.
JSP, like Java, is currently a popular topic. It is a web development language compiled and executed on the server side; because its scripting language is Java, JSP inherits all of Java's advantages. However, people using JSP often run into garbled Chinese. Many find it a headache, and the author has suffered from it too. The solution also differs from platform to platform, which adds to the difficulty of learning JSP. In fact, once the underlying causes are understood, the problem is fairly easy to solve. Drawing on practical work, the author has studied the Chinese display problem in different environments and worked out solutions. They are presented below in the hope that readers will find them a useful reference.
Character sets
Every country (or region) specifies a character encoding set for computer information interchange, such as extended ASCII in the US, GB2312-80 in China, and JIS in Japan. As the foundation of information processing in that country or region, such a set plays an important role in unifying encoding.
Because the code ranges of local character sets overlap, exchanging information between them is difficult, and maintaining separate localized versions of software is costly. It is therefore necessary to extract what the localization work has in common and handle it consistently, reducing locale-specific processing to a minimum; this is called internationalization (I18N). Language-specific information is treated as locale information, while the underlying character set is Unicode, which contains all characters.
A character code (internal code) is the code used internally to represent a character. We use internal codes when entering and storing documents. Internal codes are divided into single-byte and double-byte codes. Single-byte codes are called Single-Byte Character Sets (SBCS) and can support 256 character codes; double-byte codes are called Double-Byte Character Sets (DBCS), can support about 65,000 character codes, and are mainly used to encode the large character sets of East Asian scripts.
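The difference between single-byte and double-byte codes is easy to observe by counting encoded bytes. A minimal sketch, assuming the JDK provides the GBK charset; byteCount is a hypothetical helper name.

```java
import java.nio.charset.Charset;

class ByteWidthDemo {
    // Returns how many bytes the text occupies in the given charset.
    static int byteCount(String text, String charset) {
        return text.getBytes(Charset.forName(charset)).length;
    }

    public static void main(String[] args) {
        System.out.println(byteCount("A", "ISO-8859-1")); // a single-byte code
        System.out.println(byteCount("A", "GBK"));        // ASCII stays one byte in GBK
        System.out.println(byteCount("\u4e2d", "GBK"));   // a Chinese character takes two bytes
    }
}
```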
A code page is a list of character codes selected and arranged in a specific order. For early single-byte languages, the order of codes in the code page lets the system look up the corresponding internal code in this list from the keyboard input value. For double-byte codes, the code page provides a multibyte-to-Unicode mapping table, so that characters can be converted between their Unicode form and the corresponding internal code. Code page support was introduced mainly to access multilingual file names: file systems under NTFS and FAT32/VFAT currently use Unicode, which requires the system to convert file names dynamically to the corresponding language encoding when reading them.
Readers who have worked with JSP code will be familiar with ISO8859-1. It is one of the code pages we use most often, and it belongs to the Western European family.
GB2312-80 was developed in the early stage of Chinese character information technology in China. It contains most of the commonly used level-1 and level-2 Chinese characters, plus the symbols in 9 zones. Almost all Chinese systems and internationalized software support this character set, which makes it the most basic Chinese character set.
GBK is an extension of GB2312-80 and is upward compatible with it. It contains 20,902 Chinese characters in the code range 0x8140 to 0xFEFE, with the 0x80 code positions removed. All of its characters can be mapped one-to-one to Unicode 2.0, which means that Java in effect provides support for the GBK character set.
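The relationship between the two sets can be checked with the JDK's charset encoders. The sketch below assumes the JDK's GB2312 and GBK charsets follow the standards; the sample character U+9555 is one commonly cited as present in GBK but absent from GB2312-80, and canEncode here is a small hypothetical wrapper.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class GbkCoverageDemo {
    // True if the charset has a code for the given character.
    static boolean canEncode(String charset, char c) {
        return Charset.forName(charset).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char common = '\u4e2d'; // in both GB2312 and GBK
        char rare = '\u9555';   // assumed to be in GBK only
        System.out.println(canEncode("GB2312", common) + " " + canEncode("GBK", common));
        System.out.println(canEncode("GB2312", rare) + " " + canEncode("GBK", rare));
        // Compatibility: a GB2312 character keeps the same bytes in GBK.
        byte[] a = String.valueOf(common).getBytes(Charset.forName("GB2312"));
        byte[] b = String.valueOf(common).getBytes(Charset.forName("GBK"));
        System.out.println(Arrays.equals(a, b));
    }
}
```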
GB18030-2000 (GBK2K) further extends the Chinese character repertoire on the basis of GBK and adds the scripts of ethnic minorities such as Tibetan and Mongolian. GBK2K fundamentally solves the problems of insufficient code positions and missing glyphs.
Differences between development platforms
1. Tomcat 4 development platform
Chinese problems appear with Tomcat 4 and above on Windows 98/2000 (there is no such problem on Linux or with Tomcat 3.x). The main symptom is garbled page display. Switching the character set in IE to GB2312 makes the page display normally.
To solve this problem, add <%@ page language="java" contentType="text/html; charset=GB2312" %> to the JSP pages. This is not enough, however: Chinese in the page now displays, but fields read from the database turn out to be garbled. Analysis shows that the Chinese characters stored in the database are intact. The database accesses data using the ISO8859-1 character set, and the Java program also uses ISO8859-1 uniformly when processing characters (which reflects Java's approach to internationalization), so when data is added, Java and the database both work in ISO8859-1 and nothing goes wrong. The problem arises when reading: the data is read out as ISO8859-1, while the JSP header declares <%@ page language="java" contentType="text/html; charset=GB2312" %>, meaning that the page is displayed in GB2312, which does not match the data that was read. The characters read from the database are therefore garbled on the page. The solution is to convert these characters from ISO8859-1 to GB2312, after which they display normally. This solution applies to many platforms, and readers can adapt it as needed.
2. Tomcat 3.x, Resin and Linux platforms
With Tomcat 3.x, Resin, or on Linux, pages display normally without the statement <%@ page language="java" contentType="text/html; charset=GB2312" %>, because the encoding statement in the page's own HTML takes effect. On the contrary, if <%@ page language="java" contentType="text/html; charset=GB2312" %> is added, the system reports an error, which shows that the engines in Tomcat 4 and later still process JSP differently.
In addition, the choice of character set matters for different databases such as SQL Server, Oracle, MySQL, and Sybase. If multi-language versions are a consideration, the database character set should uniformly be ISO8859-1, with conversion between character sets performed on output as needed.
The following is a summary for the different platforms:
(1) JSWDK is only suitable for ordinary development; in stability and other respects it may not match commercial software. Since JDK 1.3 performs much better than JDK 1.2.2 and also supports Chinese well, it should be used wherever possible.
(2) As free commercial software, Resin is not only fast and stable and compiles automatically, it also points out the line where an error occurs, supports JavaScript on the server side, and supports Chinese well.
(3) Tomcat is just an implementation of the JSP 1.1 and Servlet 2.2 standards, and we should not demand perfection in detail and performance from this free software. It is designed mainly with English-speaking users in mind, which is why, without special conversion, passing Chinese characters through the URL causes problems. Most IE browsers always send in UTF-8 by default, and this appears to be a shortcoming of Tomcat. In addition, Tomcat handles JSP as ISO8859 regardless of the language of the current operating system, which also seems inappropriate.
Chinese processing in JSP code
Chinese handling is often needed in the following places in JSP code:
1. Chinese contained in the URL as a parameter. Here the Chinese parameter can usually be read directly, for example: <%= request.getParameter("ShowWord") %>
2. Reading a Chinese value submitted by an HTML form in JSWDK. The more concise way to write this is:
String name1 = new String(request.getParameter("user_id").getBytes("ISO8859_1"));
In addition, with JDK 1.3 there is no need to add <%@ page contentType="text/html; charset=GB2312" %>, whereas with JDK 1.2.2 and below, even using both of the methods above together is not stable. On the Resin platform the situation is better: adding <%@ page contentType="text/html; charset=GB2312" %> as the first line of the page is enough to handle Chinese correctly, and adding the encoding conversion on top of that actually produces incorrect results.
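One likely reason the one-argument form new String(bytes) behaves differently from platform to platform is that it decodes with the platform's default charset. The sketch below makes that dependency visible; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class DefaultCharsetDemo {
    // Decoding without naming a charset: the result depends on the platform default.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Decoding with an explicit charset: the result is the same on every platform.
    static String decodeExplicit(byte[] bytes, String charset) {
        return new String(bytes, Charset.forName(charset));
    }

    public static void main(String[] args) {
        byte[] gb = "\u4e2d\u6587".getBytes(Charset.forName("GB2312"));
        System.out.println("default charset: " + Charset.defaultCharset());
        // The one-argument constructor is equivalent to passing the default charset.
        System.out.println(decodeWithDefault(gb)
                .equals(decodeExplicit(gb, Charset.defaultCharset().name())));
        // Naming GB2312 always recovers the original text.
        System.out.println(decodeExplicit(gb, "GB2312").equals("\u4e2d\u6587"));
    }
}
```

Naming the charset explicitly in both getBytes and the String constructor removes this dependency on the server's locale.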
3. Chinese values in JSWDK. A value read from a form may display correctly, but assigning a Chinese value directly in the code does not work; the Resin platform handles both cases well.
4. Adding the encoding option when compiling servlets and JSP. When compiling a servlet, use javac -encoding ISO8859-1 MyServlet.java; for JSP, modify the compilation parameter in the JSP zone configuration file: compiler=builtin-javac -encoding ISO8859-1. With this method, nothing else needs to be done to display Chinese.
In addition, popular relational database systems support a database encoding: a character set can be specified when a database is created, and data is stored in that encoding. When an application accesses the data, encoding conversion takes place on the way in and on the way out. For Chinese data, the database character encoding should be set so as to preserve the integrity of the data. GB2312, GBK, and UTF-8 are all possible database encodings. ISO8859-1 (8-bit) can also be used, but it increases programming complexity, so ISO8859-1 is not a recommended database encoding. When programming in JSP/Servlet, you can use the management tools provided by the database management system to check whether the Chinese data is stored correctly.
Processing examples
The following are two specific solutions to garbled Chinese; readers may benefit from studying them carefully.
1. Common character conversion method
A value from a form is stored in the database, and when it is read back every character has turned into "?". The form is submitted by POST, the code uses the statement String st = new String(request.getParameter("name").getBytes("ISO8859_1")), and charset=GB2312 is also declared.
To handle Chinese parameters passed in from the form, add the following code to the JSP: define a getStr method specifically for this problem, and then convert each received parameter:
String keyword1 = request.getParameter("keyword1");
keyword1 = getStr(keyword1);
This solves the problem. The code is as follows:
<%@ page contentType="text/html; charset=GB2312" %>
<%!
// Re-decodes a parameter that the container decoded as ISO8859-1.
public String getStr(String str) {
    try {
        String temp_p = str;
        byte[] temp_t = temp_p.getBytes("ISO8859-1");
        // Decodes with the platform default encoding.
        String temp = new String(temp_t);
        return temp;
    } catch (Exception e) {
    }
    return "null";
}
%>
<%-- http://www.cndes.com test --%>
<%
String keyword = "Chuanglian Network Technology Center Welcome to";
String keyword1 = request.getParameter("keyword1");
keyword1 = getStr(keyword1);
out.print(keyword);
out.print(keyword1);
%>
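The "?" symptom described above usually means that at some step the text was encoded into a charset that has no code for Chinese characters; Java then substitutes a question mark and the original character is lost for good. A minimal sketch of that lossy step, with hypothetical names:

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encodes real Unicode text as ISO-8859-1 and decodes it again.
    // Characters outside ISO-8859-1 are replaced by '?' during encoding.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(throughLatin1("A\u4e2d\u6587")); // prints "A??"
    }
}
```

This is different from ordinary garbling, where the bytes are intact and only decoded wrongly: once a character has become "?", no later conversion can bring it back.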
2. JDBC Driver Character Conversion
At present, most JDBC drivers transmit Chinese characters in the local encoding; for example, the Chinese character 0x4175 is transmitted as 0x41 and 0x75. Characters returned by the JDBC driver and characters to be sent to the JDBC driver therefore both need to be converted. When inserting data into the database through the JDBC driver, Unicode must first be converted to native code; when querying data from the database, native code must be converted to Unicode. An implementation of these two conversions is given below:
String native2Unicode(String s) {
    if (s == null || s.length() == 0) {
        return null;
    }
    byte[] buffer = new byte[s.length()];
    for (int i = 0; i < s.length(); i++) {
        // Each char holds one native byte.
        buffer[i] = (byte) s.charAt(i);
    }
    // Decodes with the platform default encoding.
    return new String(buffer);
}
String unicode2Native(String s) {
    if (s == null || s.length() == 0) {
        return null;
    }
    char[] buffer = new char[s.length() * 2];
    char c;
    int j = 0;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) >= 0x100) {
            c = s.charAt(i);
            // Assumes the platform default encoding is double-byte, such as GBK.
            byte[] buf = ("" + c).getBytes();
            buffer[j++] = (char) buf[0];
            buffer[j++] = (char) buf[1];
        } else {
            buffer[j++] = s.charAt(i);
        }
    }
    return new String(buffer, 0, j);
}
Note that for some JDBC drivers the methods above are not needed if the correct character set property is set through the JDBC driver manager. Refer to the relevant JDBC documentation for details.
Related information
1. Organizations and standards
The Unicode standards organization (http://www.unicode.org) provides the following conversion tables:
GB and Unicode conversion table: ftp://ftp.unicode.org/public/mappings/eastasia/gb
BIG5 and Unicode conversion table: ftp://ftp.unicode.org/public/mappings/eastasia/other
JIS and Unicode conversion table: ftp://ftp.unicode.org/public/mappings/eastasia/jis
KSC and Unicode conversion table: ftp://ftp.unicode.org/public/mappings/eastasia/ksc
Because GBK is not a national standard, Unicode does not provide a GBK to Unicode conversion table; only Microsoft's code page version is available: ftp://ftp.unicode.org/public/mappings/vendors/MICSFT/Windows/cp{936,950}.TXT
2. Related software downloads
JDK 1.3: http://java.sun.com
Resin 2.0.1: http://www.caucho.com
Apache 1.3.20: http://www.apache.org
MySQL 3.23: http://www.mysql.org
DBTools 1.0.12: http://www.dbtools.com.dr