-- Unicode Inside, Locale Outside
http://www.chedong.com/tech/unicode_java.html
Author: Che Dong chedong@bigfoot.com
Last updated: 2002-12-21 23:40:30
Copyright notice: You may reprint any of this content; when you do, please credit the original source and the author.
Abstract:
1 Following Java's internationalization design framework: how to make Java applications support Chinese through the locale settings of a Linux system
2 Following the Java web-app design framework specification: solving, through settings in web.xml, the dependence of the URLEncoder.encode() method on the system default encoding
3 Taking Google's search engine as an example: how to apply internationalization and localization in our own application design (Unicode Inside, Locale Outside)
Making Java applications support Chinese through the locale settings of the Linux system
The article "Analysis and Solution of Chinese Character Problems in Java Programming Technology" is very good, and until recently it was still often reposted by some websites. One example in it shows how many Chinese programmers deal with garbled Chinese characters: "GB2312 it" (localize it to Chinese).
The original text reads as follows:
>>>>>>>>>>>
...... Not long ago, a technical friend of mine wrote to tell me that he had finally found the root of the Java servlet Chinese problem. For two weeks he had been troubled by it, because every string containing Chinese characters had to be forcibly converted to get the correct result (this was apparently the only solution). Eventually he did not want to put up with it any longer, because this sort of thing should not be work that a senior programmer has to do, so he dug out the servlet decoding source code and analyzed it, because he suspected the problem lay in the decoding. After four hours of struggle, he finally found the root of the problem. His suspicion turned out to be correct: the decoding part of the servlet does not consider double-byte characters at all and simply treats each %XX as one character. (So even JavaSoft makes such a low-level mistake!)
If you are interested in this problem or have the same trouble, you can modify servlet.jar by following his steps:
Find static private String parseName in the HttpUtils source code. Before returning, copy sb (the StringBuffer) into byte bs[], and then return new String(bs, "GB2312"). After making this modification, you need to do the decoding yourself:
Hashtable form = HttpUtils.parseQueryString(request.getQueryString()) or
form = HttpUtils.parsePostData(...)
Don't forget to compile it back into servlet.jar.
......
<<<<<<<<<<
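To make the quoted diagnosis concrete, here is a minimal sketch of the two decoding behaviours it describes: treating each %XX as one character, versus collecting the bytes first and then decoding them as GB2312. This is not the actual HttpUtils source. The class and method names are hypothetical, the input is assumed to be plain ASCII with %XX escapes, and the JRE is assumed to ship the GB2312 charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the HttpUtils code from servlet.jar.
final class PercentDecodeDemo {

    // Behaviour described in the quote: every %XX becomes one character.
    static String decodeByteAsChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                sb.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                sb.append(c == '+' ? ' ' : c);
            }
        }
        return sb.toString();
    }

    // Collect raw bytes first, then decode them with an explicit charset.
    static String decodeAsBytes(String s, String charsetName) {
        ByteArrayOutputStream bs = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                bs.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                bs.write(c == '+' ? ' ' : c);
            }
        }
        return new String(bs.toByteArray(), Charset.forName(charsetName));
    }

    // The "forced conversion" workaround: recover the bytes, then re-decode.
    static String forceConvert(String wronglyDecoded, String charsetName) {
        byte[] raw = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String query = "%D6%D0%CE%C4"; // GB2312 bytes of U+4E2D U+6587
        String naive = decodeByteAsChar(query);
        String fixed = decodeAsBytes(query, "GB2312");
        System.out.println("naive length: " + naive.length());
        System.out.println("fixed length: " + fixed.length());
        System.out.println("workaround matches: "
                + forceConvert(naive, "GB2312").equals(fixed));
    }
}
```

Both the patched decoder and the per-string workaround hard-code one charset, which is exactly what the questions below take issue with.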
I would like to ask this "senior" programmer a few questions:
1 If this were a commercial product, would customers need your hacked servlet.jar in order to run the application?
2 Can the product then only be used with Chinese GB2312? What about a Japanese application: hack it the same way again?
Maybe I am wrong, but my feeling is that the low-level mistake is not JavaSoft's, because the localization of a Java application is not implemented inside the web application; it is achieved by the JVM changing its system default encoding according to the operating system environment (the locale). At the end of 2000, Linux still had limited support for Chinese locales, so the system default encoding could not be changed to GB2312 through the locale setting, and therefore neither could the JVM's default encoding. For Linux localization (L10N), see: "Required Reading for Linux Programmers: Chinese Localization and the GB18030 Standard".
How do you configure Linux at the system level to support Chinese encoding (so that the system default file.encoding follows the Chinese encoding for encoding and decoding)?
Under RedHat 6.x, no matter how you set the locale, the system default file.encoding is always ISO_8859_1, because RedHat 6.2 is based on glibc-2.1.x. glibc-2.2.x on RedHat 7.x has more complete support, so there you can set:
LC_ALL=zh_CN.GB2312; export LC_ALL
LANG=zh_CN.GB2312; export LANG
This makes the system default encoding GB2312, GBK, ... and so changes the default encoding of the JVM; from then on, the JVM converts every incoming byte stream according to the system default encoding.
On Linux based on glibc 2.2, the system default encoding can be changed through the locale settings, which in turn changes the application's default encoding and decoding.
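One way to check what the JVM actually picked up from the environment is to print its default encoding. The sketch below uses a hypothetical class name, and its output depends entirely on the machine it runs on. On the JDKs this article discusses, the default follows the locale; JDK 18 and later default to UTF-8 regardless of the locale, so the dependence is only visible on older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for inspecting the JVM's view of the environment.
final class DefaultEncodingDemo {

    // Summarizes the encoding-related settings the JVM started with.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", locale=" + Locale.getDefault();
    }

    // Number of bytes the text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(describe());
        // Depends on the platform default, so the result varies by machine.
        System.out.println("default bytes: "
                + byteLength(text, Charset.defaultCharset()));
        // An explicit charset gives the same answer everywhere.
        System.out.println("UTF-8 bytes: "
                + byteLength(text, StandardCharsets.UTF_8));
    }
}
```

Running it once with the locale variables above exported and once without them shows whether the JVM in use follows the locale.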
There are 2 points I want to make here:
1 Compared with commercial operating systems such as Windows and Solaris, Linux's internationalization support lags behind by 2 years or even more.
2 Linux was developed with GNU tools: without GNU there would be no Linux. So Linux's localization support also developed gradually, and only after the core glibc-2.2.x arrived did it gain better support for Chinese locales.
Solving the URLEncoder.encode() method's dependence on the system default encoding through web.xml
As far as I understand, the place where JDK 1.3 is hardest to control with respect to Java's internationalization conventions is the use of URLEncoder:
For example, on Chinese Win98, using URLEncoder.encode(String s) on the two-character word for "Chinese" (中文) directly gives "%3F%3F" => "??". The reason is simple: inside encode(), the two characters must first be encoded as GBK into 4 bytes, and only then URL-encoded, for the result to be correct. This was revised in JDK 1.4: the method encode(String s) is deprecated, and the new encode(String s, String enc) takes, in addition to the string to be URL-encoded, the character encoding to use. In this way, URLEncoder no longer depends on the system default encoding.
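A small sketch of the two-argument encode(String s, String enc) makes the difference visible. The wrapper class and method name are hypothetical. ISO-8859-1 stands in here for a default encoding that cannot represent Chinese, and the GBK line assumes the JRE ships that charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical wrapper around the JDK 1.4+ two-argument URLEncoder.encode.
final class UrlEncoderDemo {

    // Encodes with an explicit charset, independent of the system default.
    static String encode(String s, String enc) {
        try {
            return URLEncoder.encode(s, enc);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + enc, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the two characters from the example

        // A charset that cannot represent Chinese: each character becomes '?'.
        System.out.println(encode(text, "ISO-8859-1")); // %3F%3F

        // Unicode-based encoding: 3 bytes per character.
        System.out.println(encode(text, "UTF-8")); // %E4%B8%AD%E6%96%87

        // GBK: 2 bytes per character, 4 bytes in total.
        System.out.println(encode(text, "GBK")); // %D6%D0%CE%C4
    }
}
```

Because the charset is passed in explicitly, the same call gives the same result on Chinese Win98, on an English Linux box, or anywhere else.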
Under JDK 1.3, an application based on the web-app framework can resolve this through settings in WEB-INF/web.xml:
...
If the product runs on Chinese Windows 98, where the default character set is GBK, then this application's web.xml needs to be set to:
...
Unicode Inside, Locale Outside
The 2 methods above only make an application easier to localize; the application itself is still not a truly international application. Imagine how you would design a global forum system: can Chinese and Japanese users post in the same forum at the same time? In what form should data be stored and processed in the intermediate stages? The answer is simple: Unicode. Many past articles have explained how to design an internationalized interface, but that is only the output side of an international application; they rarely mention the input and storage stages of the data. If an application is to be internationalized, Unicode should be used for processing and storage. Finally, I use Google's multilingual search engine to show how an international application can be built: Google is a very good example of an international application (though I am not saying Google is written in Java). Google users often wonder:
Why was the interface already in Chinese the first time I went to Google?
Why, when I search all websites for a Chinese term, do results from Japanese websites sometimes show up? For example: "Google Secret"
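The forum scenario described above can be sketched as follows. This is only an illustration of the "Unicode inside, locale outside" idea, not how Google or any real forum is implemented. All class and method names are hypothetical, UTF-8 is assumed as the storage encoding, and the JRE is assumed to ship the GBK and Shift_JIS charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: locale-specific bytes at the edges, Unicode in between.
final class UnicodeInsideDemo {

    // Input boundary: decode the client's bytes using the client's charset.
    static String decodeInput(byte[] raw, String clientCharset) {
        return new String(raw, Charset.forName(clientCharset));
    }

    // Storage: a single Unicode encoding for every user's posts.
    static byte[] toStorage(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String fromStorage(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    // Output boundary: encode for whatever charset the reader's client expects.
    static byte[] encodeOutput(String text, String clientCharset) {
        return text.getBytes(Charset.forName(clientCharset));
    }

    public static void main(String[] args) {
        // A post from a Chinese client (GBK bytes of U+4E2D U+6587).
        byte[] zhPost = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        // A post from a Japanese client (Shift_JIS bytes of U+65E5 U+672C).
        byte[] jaPost = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};

        // Both posts become ordinary Java strings and can share one thread.
        String thread = decodeInput(zhPost, "GBK") + " / "
                + decodeInput(jaPost, "Shift_JIS");

        byte[] stored = toStorage(thread);
        String restored = fromStorage(stored);
        System.out.println("round trip intact: " + restored.equals(thread));
        System.out.println("stored bytes: " + stored.length);
    }
}
```

The locale-specific charsets appear only in decodeInput and encodeOutput; everything between those two calls is Unicode, so adding another language does not touch the storage or processing code.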
Take "Google Secret" as an example: We enter "Google Secret" in the input box.
http://www.google.com/search?hl=zh-cn&newwindow=1&q=google