Like Java itself, JSP is currently a popular topic. It is a web technology compiled on the server side, and because its scripting language is Java, JSP inherits Java's advantages. When working with JSP, however, you often run into garbled Chinese characters, which gives many people a headache. I suffered badly from it when I was starting out, and because the symptoms differ from platform to platform, the problem quietly makes JSP harder to learn. In fact, once the underlying causes are understood, the problem is fairly easy to solve. Below are the solutions I have summarized, which I hope readers will find useful as a reference. (Because I have mostly worked with Tomcat, the examples mainly use Tomcat; other environments are mentioned only briefly, but the solutions are almost the same.)
Each country (or region) defines its own coded character set for computer information interchange, such as the extended ASCII used in the United States, China's GB2312-80, and Japan's JIS. These standards serve as the foundation of information processing in that country or region and play an important role in unifying encodings. However, because the code ranges of these local character sets overlap, exchanging information between them is difficult, and maintaining separate localized versions of software is costly. It is therefore necessary to factor out what the localizations have in common, handle it consistently, and reduce the locale-specific processing to a minimum; this is called internationalization (I18N). Language-specific information is treated as locale data, while the underlying character set is Unicode, which covers all characters.
Readers who write JSP code will be no strangers to ISO8859-1. It is the code page we use most often and belongs to the Western European family. GB2312-80 was developed in the early days of Chinese character information technology in China. It contains the most commonly used level-1 and level-2 Chinese characters together with symbols in 9 zones. Almost all Chinese systems and internationalized software support this character set, which makes it the most basic Chinese character set.
GBK is an extension of GB2312-80 and is upward compatible with it. It contains 20902 Chinese characters in the code range 0x8140 to 0xFEFE, excluding code positions with a high byte of 0x80. All of its characters map one-to-one to Unicode 2.0, which means that Java in effect supports the GBK character set.
GB18030-2000 (GBK2K) further extends the set of Chinese characters on the basis of GBK and adds the scripts of ethnic minorities such as Tibetan and Mongolian. GBK2K fundamentally solves the problems of insufficient code positions and missing glyphs.
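To make the differences between these character sets concrete, here is a minimal Java sketch that prints the bytes of a single Chinese character in several encodings. The class name CharsetBytesDemo and the helper hex are made up for this illustration, and the sketch assumes a JDK that ships the GB2312 and GBK charsets (standard full JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBytesDemo {
    // Render the bytes of a string in the given charset as upper-case hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D is the character "zhong"; the escape keeps this file pure ASCII.
        String zhong = "\u4E2D";
        System.out.println("GB2312:    " + hex(zhong, Charset.forName("GB2312")));
        System.out.println("GBK:       " + hex(zhong, Charset.forName("GBK")));
        System.out.println("UTF-8:     " + hex(zhong, StandardCharsets.UTF_8));
        // ISO8859-1 cannot represent the character, so Java substitutes '?' (0x3F).
        System.out.println("ISO8859-1: " + hex(zhong, StandardCharsets.ISO_8859_1));
    }
}
```

GB2312 and GBK both give D6 D0 (GBK is compatible with GB2312), UTF-8 gives E4 B8 AD, and ISO8859-1 gives 3F. That last case is where a literal "?" in stored or displayed data comes from.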
1. Tomcat 4 Development Platform
This is probably the version most of us use, so it is discussed in more detail.
With Tomcat 4 or later on Windows 98/2000, Chinese text causes problems (there is no problem on Linux or with Tomcat 3.x). The main symptom is that pages display garbled characters.
To solve this problem, the easiest way is to add <%@ page language="java" contentType="text/html; charset=GB2312" %> to the page. This alone is not enough, however: Chinese text in the page now displays correctly, but fields read from the database turn out garbled.
Analysis shows that the Chinese characters saved in the database are intact. The database uses the ISO8859-1 character set to store and retrieve data, and in this setup the Java program also handles the character data uniformly as ISO8859-1 (which also reflects Java's internationalization approach), so Java and the database agree when data is written and nothing goes wrong. The problem appears when reading: the data comes back as ISO8859-1, while the JSP file header declares <%@ page language="java" contentType="text/html; charset=GB2312" %>, meaning that the page is displayed as GB2312, which does not match the data that was read. The characters read from the database therefore appear garbled on the page. The solution is to convert these characters from ISO8859-1 to GB2312, after which they display normally. This approach works on many platforms, and readers can adapt it as needed; the specific method is explained in detail later in this article.
In addition, the choice of character set matters for every database, such as SQL Server, Oracle, MySQL, and Sybase. If a multi-language version is planned, the database character set should be standardized on ISO8859-1, with conversion between character sets done at output time. The following is a summary for different platforms:
(1) JSWDK is suitable only for ordinary development; in stability and other respects it may not match commercial software. JDK 1.3 performs considerably better than JDK 1.2.2 and supports Chinese well, so use it where possible. JDK 1.4 has since been released, so it is best to upgrade to the latest version, which handles Chinese better and is more widely supported.
(2) Tomcat is only an implementation of the JSP 1.1 and Servlet 2.2 standards, and we should not expect this free software to be perfect in every detail or in performance. It is designed mainly with English-speaking users in mind, which is why, without special conversion, Chinese characters passed in a URL cause problems. By default most IE browsers always send in UTF-8, and Tomcat's handling of this looks like a shortcoming. Tomcat also always uses ISO8859 regardless of the language of the current operating system, which does not seem appropriate either.
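The URL problem in item (2) can be reproduced outside a servlet container. The following is a minimal sketch, not Tomcat's actual code: it assumes the browser percent-encodes the value as UTF-8 and the server decodes it as ISO8859-1. The class and method names (UrlParamDemo, browserEncode, serverDecode, repair) are invented for the illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlParamDemo {
    // What a browser that sends UTF-8 puts into the query string.
    static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decode the percent-encoded value with the charset the server assumes.
    static String serverDecode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Undo a wrong ISO8859-1 decode: recover the raw bytes, then decode as UTF-8.
    static String repair(String wronglyDecoded) {
        byte[] raw = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // two Chinese characters, written as escapes
        String onTheWire = browserEncode(original);
        System.out.println(onTheWire); // %E4%B8%AD%E6%96%87

        String wrong = serverDecode(onTheWire, "ISO-8859-1");
        System.out.println(wrong.length()); // 6: one char per UTF-8 byte

        System.out.println(repair(wrong).equals(original)); // true
        System.out.println(serverDecode(onTheWire, "UTF-8").equals(original)); // true
    }
}
```

Because ISO8859-1 maps every byte value to exactly one character, the wrong decode loses no information, which is why the repair step can recover the original text.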
2. Handling Chinese in JSP Code
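(1) If the page does not involve a database, you can add the directive <%@ page contentType="text/html; charset=GB2312" %> at the top of the page.
(2) Values passed from a form to the database turn into "?". When the form data is submitted with POST, use the following statement in the code:
String st = new String(request.getParameter("name").getBytes("ISO8859_1"), "GB2312");
The charset is named explicitly so that the result does not depend on the platform default. The page must also declare charset=GB2312.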
To handle Chinese parameters passed in from a form, add the following code to the JSP. It defines a getStr method specifically for this problem, which is then applied to each received parameter:
String keyword1 = request.getParameter("keyword1");
keyword1 = getStr(keyword1);
This solves the problem. The complete code is as follows:
<%@ page contentType="text/html; charset=GB2312" %>
<%!
// Re-decode a parameter that the container decoded as ISO8859-1.
public String getStr(String str) {
    try {
        String temp_p = str;
        // Recover the raw bytes the browser actually sent.
        byte[] temp_t = temp_p.getBytes("ISO8859-1");
        // Decode them with the charset the page declares.
        String temp = new String(temp_t, "GB2312");
        return temp;
    } catch (Exception e) {
        // A missing parameter (null) or an unsupported charset ends up here.
    }
    return "NULL";
}
%>
<%-- http://www.cndes.com Test --%>
<%
String keyword = "The Chuanglian Network Technology Center welcomes you";
String keyword1 = request.getParameter("keyword1");
keyword1 = getStr(keyword1);
out.print(keyword);
out.print(keyword1);
%>
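The JSP above needs a servlet container to run, so here is a standalone sketch of the same conversion that can be tried from the command line. It is a simulation under stated assumptions: containerDecode is a hypothetical stand-in for a container that decodes GB2312 form bytes as ISO8859-1, and getStr follows the JSP method of the same name, with the charset passed explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetStrDemo {
    static final Charset GB2312 = Charset.forName("GB2312");

    // Simulated container: the browser posts GB2312 bytes,
    // and the container turns each byte into one ISO8859-1 character.
    static String containerDecode(String typedByUser) {
        byte[] wire = typedByUser.getBytes(GB2312);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Same idea as the JSP getStr: recover the raw bytes, then decode as GB2312.
    static String getStr(String str) {
        if (str == null) {
            return "NULL";
        }
        return new String(str.getBytes(StandardCharsets.ISO_8859_1), GB2312);
    }

    public static void main(String[] args) {
        String typed = "\u4E2D\u6587"; // two Chinese characters, written as escapes
        String received = containerDecode(typed);

        System.out.println(received.length());             // 4: two bytes per character
        System.out.println(received.equals(typed));        // false: the text is garbled
        System.out.println(getStr(received).equals(typed)); // true: the text is restored
    }
}
```

The conversion only works when the second charset matches the one the browser really used, which is why the page must declare charset=GB2312 as well.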
In addition, popular relational database systems support a database encoding: when a database is created, you can specify its character set, and data is then stored in that encoding. When an application accesses the data, encoding conversion takes place on the way in and on the way out. For Chinese data, the database character encoding should be chosen so that the integrity of the data is preserved. GB2312, GBK, UTF-8, and so on are all possible database encodings. ISO8859-1 (8-bit) can also be chosen, but it increases programming complexity, so it is not recommended as a database encoding. When programming with JSP/Servlet, you can use the management tools provided by the database management system to check whether the Chinese data is stored correctly.
(3) JDBC driver character conversion. At present, most JDBC drivers transmit Chinese characters in the local (native) encoding; for example, the Chinese character "0x4175" is transmitted as the two values "0x41" and "0x75". The characters returned by a JDBC driver, and the characters sent to it, therefore have to be converted. When inserting data into the database through the JDBC driver, convert Unicode to native code; when querying data from the database, convert native code to Unicode. Implementations of these two conversions are given below:
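// Native code -> Unicode: each char of s holds one native byte (as returned by the driver).
String native2Unicode(String s) {
    if (s == null || s.length() == 0) {
        return null;
    }
    byte[] buffer = new byte[s.length()];
    for (int i = 0; i < s.length(); i++) {
        buffer[i] = (byte) s.charAt(i);
    }
    return new String(buffer); // decodes with the platform default charset
}
// Unicode -> native code; assumes the platform default charset is a double-byte Chinese encoding.
String unicode2Native(String s) {
    if (s == null || s.length() == 0) {
        return null;
    }
    char[] buffer = new char[s.length() * 2];
    int j = 0;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c >= 0x100) {
            byte[] buf = ("" + c).getBytes();
            buffer[j++] = (char) (buf[0] & 0xFF); // mask to avoid sign extension
            buffer[j++] = (char) (buf[1] & 0xFF);
        } else {
            buffer[j++] = c;
        }
    }
    return new String(buffer, 0, j);
}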
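The two conversion directions can also be written without depending on the platform default charset. The following is a minimal sketch that assumes the driver's native encoding is GBK; the class name NativeCodeDemo is invented here, and unicode2Native is simply the name chosen for the reverse of native2Unicode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NativeCodeDemo {
    // Assumption: the driver's "native code" is GBK.
    static final Charset NATIVE = Charset.forName("GBK");

    // Unicode -> native: encode to native bytes, then carry each byte as one char (0x00..0xFF).
    static String unicode2Native(String s) {
        if (s == null || s.length() == 0) {
            return null;
        }
        return new String(s.getBytes(NATIVE), StandardCharsets.ISO_8859_1);
    }

    // Native -> Unicode: take the low byte of each char, then decode the bytes as native text.
    static String native2Unicode(String s) {
        if (s == null || s.length() == 0) {
            return null;
        }
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), NATIVE);
    }

    public static void main(String[] args) {
        String text = "A\u4E2D"; // ASCII 'A' followed by one Chinese character
        String wire = unicode2Native(text);

        // 'A' stays one char; the Chinese character becomes the two chars 0xD6 and 0xD0.
        System.out.println(wire.length());                      // 3
        System.out.println(native2Unicode(wire).equals(text));  // true
    }
}
```

ASCII characters pass through unchanged in both directions, so the same methods can be applied to mixed Chinese and English strings.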
Note that for some JDBC drivers, if the correct character set property is set through the JDBC Driver Manager, the methods above are not needed. Refer to the documentation of the JDBC driver in question for details. That is really all there is to garbled Chinese text, and with repeated practice you will get the hang of it. If you truly understand the three methods above and try them when you run into a Chinese encoding problem, I am confident the problem will no longer trouble you.
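As an illustration of setting a character set property through the Driver Manager, here is a hedged sketch. Property names are driver-specific: useUnicode and characterEncoding are the ones used by MySQL Connector/J and are an assumption here, so check your own driver's documentation. The class name, user name, and password are placeholders, and the sketch only builds the properties without opening a connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class JdbcCharsetConfig {
    // Build connection properties that ask the driver to do the encoding conversion itself.
    // "useUnicode" and "characterEncoding" are MySQL Connector/J names (an assumption here).
    static Properties connectionProperties(String user, String password, String encoding) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", encoding);
        return props;
    }

    // Open a connection with those properties; needs a real JDBC URL and the driver on the classpath.
    static Connection open(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }

    public static void main(String[] args) {
        Properties props = connectionProperties("demo", "secret", "GBK");
        System.out.println(props.getProperty("characterEncoding")); // GBK
    }
}
```

With a driver that honors such a property, strings passed to and returned from JDBC are already proper Unicode, and the manual conversion methods become unnecessary.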
The above is just my own experience. If anything is wrong, please point it out so that we can learn together.