String.getBytes() and Chinese Encoding Problems

xiaoxiao, 2021-03-06

Original article by SRX81. Published: 2003-08-21, 3:05 PM. Version 1.0.

The String.getBytes() problem

As is well known, String's getBytes() method returns the contents of a string as a byte array. What is easy to overlook is that this method encodes the string with the operating system's default encoding. If you ignore this, a program that works correctly on one platform can behave unexpectedly on another machine. Take the following program:

    class TestCharset {
        public static void main(String[] args) {
            new TestCharset().execute();
        }

        private void execute() {
            String s = "Hello!你好！";
            byte[] bytes = s.getBytes();
            System.out.println("bytes length is: " + bytes.length);
        }
    }

Run on a Chinese Windows XP system, it prints:

    bytes length is: 12

But run in an English Unix environment, it prints:

    $ java TestCharset
    bytes length is: 9

If your program depends on this value, later operations will go wrong. Why is the result 12 on one system and 9 on the other? As mentioned above, the answer is the platform's default encoding. On a Chinese operating system, getBytes returns the bytes of a Chinese encoding such as GBK or GB2312, in which each Chinese character occupies two bytes. On an English platform the default encoding is usually ISO-8859-1, in which every character occupies exactly one byte, including non-Latin characters, which ISO-8859-1 cannot represent and which are therefore replaced with '?'.

Java's encoding support

Java was designed with multilingual support. Internally, Java stores characters in Unicode; for example, the Unicode code of the character 你 ("you") is 4F60. The following experiment verifies this:

    class TestCharset {
        public static void main(String[] args) {
            char c = '你';
            int i = c;
            System.out.println(c);
            System.out.println(i);
        }
    }

Whatever platform you run it on, the integer printed is the same (how the character itself is displayed depends on the console's encoding):

    ---------------- Output ------------------
    你
    20320

20320 is the integer value of Unicode 4F60.
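The byte counts above can be reproduced on a single machine by naming each charset instead of relying on the default. The sketch below is illustrative: the class `CharsetLengths` and its helper `byteLength` are hypothetical names, and it assumes the test string was "Hello!" followed by 你好！ (three characters that take two bytes each in GBK, which matches the 12 versus 9 result). GBK is not among the charsets every Java platform is required to support, so it is checked with `Charset.isSupported` first. Note also that since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, so on a current JVM the first line will usually report UTF-8.

```java
import java.nio.charset.Charset;

class CharsetLengths {
    // Hypothetical test string: "Hello!" followed by three CJK characters
    static final String S = "Hello!\u4f60\u597d\uff01";

    // Number of bytes S occupies when encoded with the named charset
    static int byteLength(String charsetName) {
        return S.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // The charset a plain S.getBytes() would use on this JVM
        System.out.println("default charset: " + Charset.defaultCharset());
        // One byte per character; unmappable characters become '?'
        System.out.println("ISO-8859-1: " + byteLength("ISO-8859-1"));
        // GBK is optional on a Java platform, so check before using it
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK: " + byteLength("GBK"));
        }
        // Each of the three CJK characters takes three bytes in UTF-8
        System.out.println("UTF-8: " + byteLength("UTF-8"));
    }
}
```

The lengths should be 9 for ISO-8859-1, 15 for UTF-8, and 12 for GBK where it is available, regardless of the machine. The Chinese characters are written as Unicode escapes so that the sketch does not itself depend on the `javac -encoding` setting.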
In fact, if you decompile the class above, you will find that the character 你 (like any other Chinese string) is stored in Unicode in the generated .class file:

    char c = '\u4f60';
    ...

Even if you specify the encoding of the source file at compile time, for example:

    javac -encoding GBK TestCharset.java

the generated .class file still stores the character or string in Unicode. The platform dependence comes in when getBytes() converts that Unicode text to bytes using the default encoding. To avoid this problem, I recommend always naming the encoding explicitly with String.getBytes(String charsetName).
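Following the recommendation to name the encoding explicitly, the same charset should also be named when the bytes are turned back into a string. A minimal sketch, assuming Java 7 or later for `java.nio.charset.StandardCharsets` (newer than this article); the class `ExplicitCharset` and its methods are hypothetical names. The `getBytes(Charset)` overload used here, unlike `getBytes(String)`, does not throw the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharset {
    // Encode and decode with the same charset (UTF-8): the text survives unchanged
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // ISO-8859-1 cannot represent Chinese characters, so they are encoded
    // as '?' and the original characters cannot be recovered
    static String lossyTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical test string: "Hello!" followed by three CJK characters
        String s = "Hello!\u4f60\u597d\uff01";
        System.out.println(roundTrip(s).equals(s)); // true
        System.out.println(lossyTrip(s));           // Hello!???
    }
}
```

Passing a `Charset` object rather than a name also moves charset lookup errors out of the call site, since the standard charsets are guaranteed to exist.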

Below, we extract the ISO-8859-1 and GBK byte arrays from the string and compare the results:

    class TestCharset {
        public static void main(String[] args) {
            new TestCharset().execute();
        }

        private void execute() {
            String s = "Hello!你好！";
            byte[] bytesIso8859 = null;
            byte[] bytesGbk = null;
            try {
                bytesIso8859 = s.getBytes("ISO-8859-1");
                bytesGbk = s.getBytes("GBK");
            } catch (java.io.UnsupportedEncodingException e) {
                e.printStackTrace();
                return; // nothing to print if a charset is unavailable
            }
            System.out.println("--------------\n 8859 bytes:");
            System.out.println("bytes is:      " + arrayToString(bytesIso8859));
            System.out.println("hex format is: " + encodeHex(bytesIso8859));
            System.out.println();
            System.out.println("--------------\n GBK bytes:");
            System.out.println("bytes is:      " + arrayToString(bytesGbk));
            System.out.println("hex format is: " + encodeHex(bytesGbk));
        }

        // Formats each byte as two hex digits followed by a space
        public static final String encodeHex(byte[] bytes) {
            StringBuffer buff = new StringBuffer(bytes.length * 3);
            for (int i = 0; i < bytes.length; i++) {
                // Mask to 0..255 so negative bytes are not sign-extended to "ffffffxx"
                String b = Integer.toHexString(bytes[i] & 0xff);
                if (b.length() < 2) {
                    buff.append('0'); // pad single-digit values such as 0x0a
                }
                buff.append(b);
                buff.append(' ');
            }
            return buff.toString();
        }

        // Reconstructed: the original listing is cut off at this point;
        // assumed to list each byte as a signed decimal value followed by a space
        public static final String arrayToString(byte[] bytes) {
            StringBuffer buff = new StringBuffer();
            for (int i = 0; i < bytes.length; i++) {
                buff.append(bytes[i]);
                buff.append(' ');
            }
            return buff.toString();
        }
    }
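To see what the two hex dumps should contain, here is a compact, self-contained variant. It is a sketch under the assumption that the test string is "Hello!" followed by 你好！; the class `HexCompare` and method `toHex` are hypothetical names, and GBK is guarded with `Charset.isSupported` because it is an optional charset. Masking each byte with `& 0xff` before formatting avoids the sign extension that makes `Integer.toHexString` return eight digits for negative bytes.

```java
import java.nio.charset.Charset;

class HexCompare {
    // Formats each byte as two lowercase hex digits separated by spaces
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical test string: "Hello!" followed by three CJK characters
        String s = "Hello!\u4f60\u597d\uff01";
        // Expected: 48 65 6c 6c 6f 21 3f 3f 3f
        System.out.println(toHex(s.getBytes(Charset.forName("ISO-8859-1"))));
        if (Charset.isSupported("GBK")) {
            // Expected: 48 65 6c 6c 6f 21 c4 e3 ba c3 a3 a1
            System.out.println(toHex(s.getBytes(Charset.forName("GBK"))));
        }
    }
}
```

In the ISO-8859-1 dump each Chinese character collapses to 3f ('?'), while GBK keeps two bytes per character: c4 e3 for 你, ba c3 for 好, and a3 a1 for the full-width ！.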

When reposting, please credit the original address: https://www.9cbs.com/read-115922.html

