Charset in J2EE Web Application

xiaoxiao2021-03-06  64

Operating environment: Win2k Pro Japanese version, IE 6.0 SP1 Japanese version, Sun J2SDK 1.4.2_04, Tomcat 4.1.27, JSPs

Because of how Tomcat works, data obtained from the request (for example through the request.getParameter(String) method) is a Unicode string produced by decoding the received bytes as "ISO8859_1".

(I guess the whole process is as follows.

Suppose the HTML page's encoding is "Shift_JIS". IE encodes the value of each input control in the form as "Shift_JIS", URL-encodes it, and sends it to Tomcat. Tomcat URL-decodes the received data and converts the resulting bytes to Unicode as if they were "ISO8859_1". For example, the full-width wave dash "～" in Japanese is 0x8160 in "Shift_JIS" (two bytes, high byte first); after IE URL-encodes it and sends it to Tomcat, Tomcat's URL decoding yields "\u0081\u0060", which is the Unicode data corresponding to "ISO8859_1". After the JSP completes its own processing, it passes the Unicode data corresponding to "Shift_JIS" to Tomcat through the response object. Tomcat then encodes this data using the encoding set by a call such as response.setContentType("text/html; charset=Shift_JIS"); if the JSP does not set the content type, Tomcat uses the OS default encoding. The result is sent back to IE on the client, which interprets it as "Shift_JIS"-encoded data and finally displays it as an HTML page.)
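The byte-level effect described above can be reproduced outside a container. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes a JDK newer than the 1.4.2 listed above (it uses StandardCharsets). It decodes the two "Shift_JIS" bytes 0x81 0x60 as ISO-8859-1 and prints the resulting code points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Latin1View {
    // Decode raw bytes as ISO-8859-1: every byte becomes the char with the same numeric value.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The two Shift_JIS bytes from the example in the text.
        byte[] raw = {(byte) 0x81, (byte) 0x60};
        String s = decodeAsLatin1(raw);
        for (char c : s.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // prints U+0081, then U+0060
        }
    }
}
```

No information is lost in this step: each byte maps to exactly one char, which is why the original bytes can be recovered later.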

So after getting the request data in a JSP, we generally need to convert it before using it. The conversion in the JSP looks like this:

String reqParamA = new String(request.getParameter("paramA").getBytes("ISO8859_1"), "Shift_JIS");
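To see why this conversion recovers the original text, here is a small round-trip sketch. It is not container code: misdecode() only simulates what the text says Tomcat does, and the class and method names are hypothetical. It assumes a JDK with the Charset overloads of String (Java 6 or later) and with the "Shift_JIS" charset installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ParamRepair {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Simulates the container: the browser's Shift_JIS bytes are decoded as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(SHIFT_JIS), StandardCharsets.ISO_8859_1);
    }

    // The conversion from the text: recover the raw bytes, then decode them as Shift_JIS.
    static String repair(String fromRequest) {
        return new String(fromRequest.getBytes(StandardCharsets.ISO_8859_1), SHIFT_JIS);
    }

    public static void main(String[] args) {
        String original = "\u85E4\u539F"; // two kanji, written as escapes to keep the source ASCII
        String garbled = misdecode(original);
        System.out.println(garbled.length());                  // 4 chars, one per Shift_JIS byte
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```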

When data is output to the client, however, no further conversion is needed. First call response.setContentType("text/html; charset=Shift_JIS"); then write the Unicode string corresponding to "Shift_JIS" directly to the client through HttpServletResponse.
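What the container does with the declared charset can be approximated with a plain Writer. This sketch does not use the servlet API; it only assumes that the response writer behaves like an OutputStreamWriter bound to the charset named in the content type. All names are hypothetical, and try-with-resources requires Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical stand-in for a response body; not part of the servlet API.
class ResponseEncodingSketch {
    // Writes text through a writer bound to the given charset and returns the bytes produced.
    static byte[] writeAs(String text, String charsetName) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(body, Charset.forName(charsetName))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toByteArray();
    }

    // Formats bytes as uppercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded Unicode string goes in; Shift_JIS bytes come out.
        System.out.println(hex(writeAs("\u85E4\u539F", "Shift_JIS"))); // 93A18CB4
    }
}
```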

The above describes the typical flow: the client sends a request to the server, the server sends a response back, and the client receives the response and finally displays the HTML page. There are, however, some exceptions to watch for:

Calling response.sendRedirect(str_URL) redirects directly on the server side to another URL. If str_URL contains a query string (such as someURL?paramA=[string containing double-byte characters]&paramB=[string containing double-byte characters]...), then str_URL must be converted as shown in the following line; otherwise the request data received by the JSP mapped to the destination URL is garbled (because the request data it expects should be an "ISO8859_1" Unicode string):

response.sendRedirect(new String(str_URL.getBytes("Shift_JIS"), "ISO8859_1")); // This approach may cause garbled text under Linux; not yet verified.

File upload. The data stream obtained by the server-side JSP is the Unicode data stream corresponding to the character set of the file. For example, on Win2k Pro Japanese version a .CSV file is encoded as "MS932", so when it is uploaded to the server, the data stream obtained by the JSP is the Unicode data stream corresponding to "MS932".

Assume the operating environment is Win2k Pro Japanese version with Vobsenhydra 5.1se. When client JavaScript calls showModalDialog() to send a request to the server (the encoding of the page in the modal dialog is "Shift_JIS"), the request data received on the server side is a Unicode string corresponding to "MS932"; with Tomcat 4.1.27, the server side receives the usual Unicode string corresponding to "ISO8859_1". In addition, when the server side redirects by calling comms.response.getHttpServletResponse().sendRedirect(targetURL) and targetURL is a Unicode string corresponding to "ISO8859_1", the request data received by the target PO is actually a Unicode string corresponding to "MS932"; under Tomcat 4.1.27, if targetURL is a Unicode string corresponding to "ISO8859_1", the request data received by the target JSP is also a Unicode string corresponding to "ISO8859_1". It seems that Vobsenhydra 5.1se still differs somewhat from Tomcat 4.1.27.

Now assume the running environment is Miracle Linux 2.1 (the OS default character set is "EUC-JP-LINUX") with Vobsenhydra 5.1se. When redirecting through comms.response.getHttpServletResponse().sendRedirect(targetURL), you need to simulate the HTTP client and URL-encode the query string in targetURL. (A URL-encoded query string looks like "http://localhost:8002/test.po?txt_firstname=%81%60%8bi%8D%81%81%60&txt_lastname=%93%A1%8C%B4&submit=Submit", where txt_lastname is "Fujira" on the web page and txt_firstname is "Ji Xiang". You have certainly seen strange strings like this in the address bar of IE. In a Java program you can perform URL encoding by calling the java.net.URLEncoder.encode(String, String) method; whether JavaScript has a similar method is unknown to me.) This is necessary because if the query string in targetURL contains double-byte characters, the target PO will not get the correct query string:

// This program runs normally on both Miracle Linux 2.1 with Vobsenhydra 5.1se and Win2k Pro Japanese version with Vobsenhydra 5.1se

0) String pageEncoding = "Shift_JIS"; // Assume the HTML page uses "Shift_JIS" encoding

1) String param = comms.request.getParameter("param"); // Get the value of param from the request

2) param = new String(param.getBytes("ISO8859_1"), "Shift_JIS");

3) param = URLEncoder.encode(param, pageEncoding); // Perform URL encoding (most important step!)

4) String str_url = "xxx.po?param=" + param;

5) comms.response.getHttpServletResponse().sendRedirect(str_url); // Redirect
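Steps 2 to 4 of the listing can be tried in isolation, without a server. The sketch below is only an approximation: the class and method names are hypothetical, and the raw parameter is simulated by hand as the "ISO8859_1" view of the Shift_JIS bytes 93 A1 8C B4 (the txt_lastname value from the sample URL earlier). It uses only the String-name overloads, which also exist in J2SDK 1.4.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class RedirectUrlSketch {
    // rawParam is the value as the container hands it over (bytes decoded as ISO8859_1).
    static String buildRedirectUrl(String rawParam, String pageEncoding) {
        try {
            // Step 2: reinterpret the bytes in the page encoding.
            String text = new String(rawParam.getBytes("ISO8859_1"), pageEncoding);
            // Step 3: URL-encode the value in the page encoding.
            String encoded = URLEncoder.encode(text, pageEncoding);
            // Step 4: append the encoded value to the target URL.
            return "xxx.po?param=" + encoded;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + pageEncoding, e);
        }
    }

    public static void main(String[] args) {
        String raw = "\u0093\u00A1\u008C\u00B4"; // simulated request value
        System.out.println(buildRedirectUrl(raw, "Shift_JIS")); // xxx.po?param=%93%A1%8C%B4
    }
}
```

The resulting URL contains only ASCII characters, so it no longer depends on how the redirect target's server or OS interprets raw double-byte data.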

However, on Win2k Pro Japanese version with Vobsenhydra 5.1se, if you comment out lines 2 and 3 (that is, redirect directly with the Unicode string corresponding to "ISO8859_1"), the program still runs, but the target PO obtains from the request a Unicode string corresponding to "Shift_JIS", not the usual Unicode string corresponding to "ISO8859_1". The same practice produces garbled text on Miracle Linux with Vobsenhydra 5.1se; the cause is still under investigation.

Conclusion: whenever the application issues a GET request to target_url (for example, client JavaScript calling window.showModalDialog(target_url, ...) or the server side calling response.sendRedirect(target_url)), the values in the query string of target_url must be URL-encoded; otherwise the target program on the server side may not get the correct query string (especially when the query string contains double-byte characters). This holds whether the server is Tomcat, Vobsenhydra, or any other web server, and whether it runs on Windows or Unix/Linux.
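One way to apply this conclusion consistently is to build every query string through a single helper that encodes each name and value. The following is a sketch under assumptions: the class and method names are hypothetical, and it needs Java 5 or later for generics. The sample parameters are taken from the URL shown earlier.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class QueryStringSketch {
    // URL-encodes every name and value, then joins the pairs with '&'.
    static String toQueryString(Map<String, String> params, String encoding) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), encoding))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), encoding));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("txt_lastname", "\u85E4\u539F"); // double-byte value, written as escapes
        params.put("submit", "Submit");
        System.out.println(toQueryString(params, "Shift_JIS"));
        // txt_lastname=%93%A1%8C%B4&submit=Submit
    }
}
```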

I have two suggestions. The first: when performing a character set conversion, it is best to state both the source character set and the destination character set explicitly. If one is omitted, Java uses the system default character set, and different OSes usually have different defaults, so the same J2EE web app may work on WinNT/2K/XP but produce garbled text when it is ported to Unix/Linux.

Recommended: str = new String(str.getBytes("GB2312"), "ISO8859_1"); // Assumes the original encoding of str is GB2312; independent of the OS

Not recommended: str = new String(str.getBytes(), "ISO8859_1"); // Assumes the original encoding of str is GB2312; depends on the OS default character set
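The difference between the two forms can be made visible by comparing the bytes produced under different charsets. This is only a sketch with hypothetical names; it assumes the "Shift_JIS" and "EUC-JP" charsets are installed in the JDK, and it uses Japanese charsets rather than GB2312 so that the sample matches the earlier sections.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class ExplicitCharsetSketch {
    // Explicit form: the result is the same on every OS.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "\u85E4\u539F";
        byte[] sjis = encode(text, "Shift_JIS");
        byte[] eucJp = encode(text, "EUC-JP");
        // The same string yields different bytes under different charsets.
        System.out.println(Arrays.equals(sjis, eucJp)); // false
        // The no-argument form silently follows the platform default.
        System.out.println(Charset.defaultCharset());
        System.out.println(Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()))); // true
    }
}
```

Since getBytes() with no argument tracks Charset.defaultCharset(), a program that happens to work on a Shift_JIS-family Windows default can break on an EUC-JP Linux default, which is the porting problem described above.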

