First, bytes and Unicode
Java is Unicode at its core; even the string constants in class files are stored as Unicode (in a modified UTF-8 form). Many media, however, including files and streams, work in bytes, so Java has to convert these byte streams. A char is a Unicode character; a byte is just a byte. In Sun's JDK the byte/char conversion functions live in the sun.io package. The ByteToCharConverter class acts as a dispatcher and can tell you which converter you are using. Its two most commonly used static methods are:
    public static ByteToCharConverter getDefault();
    public static ByteToCharConverter getConverter(String encoding);
If you do not specify a converter, the system automatically uses the platform's current encoding: GBK on a Chinese (GB) platform, 8859_1 on an English platform.
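sun.io is an internal, unsupported package and is not present in current JDKs. As a sketch under that assumption, java.nio.charset offers rough equivalents: Charset.defaultCharset() plays the role of getDefault(), and Charset.forName(...) the role of getConverter(...). The class and method names (ConvertSketch, decodeHex, encodeHex) are hypothetical; the printed values should match the byte -> char and char -> byte examples below.

```java
import java.nio.charset.Charset;

public class ConvertSketch {
    // bytes -> chars, each char as hex, like the Integer.toHexString loops in the text
    static String decodeHex(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (char ch : new String(bytes, cs).toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(ch));
        }
        return sb.toString();
    }

    // chars -> bytes; getBytes replaces unmappable chars with the charset's
    // replacement bytes ("?" = 0x3f for ISO-8859-1)
    static String encodeHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(b & 0xff)); // mask so 0xC4 prints as c4, not ffffffc4
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ni = {(byte) 0xC4, (byte) 0xE3};     // GB2312 bytes of U+4F60
        Charset gb2312 = Charset.forName("GB2312"); // roughly getConverter("GB2312")
        Charset latin1 = Charset.forName("8859_1"); // legacy alias of ISO-8859-1
        System.out.println(decodeHex(ni, gb2312));       // 4f60
        System.out.println(decodeHex(ni, latin1));       // c4 e3
        System.out.println(encodeHex("\u4f60", gb2312)); // c4 e3
        System.out.println(encodeHex("\u4f60", latin1)); // 3f
        // roughly getDefault(): the result depends on the JVM's default charset
        // (platform-dependent before JDK 18, UTF-8 by default from JDK 18 on)
        System.out.println(decodeHex(ni, Charset.defaultCharset()));
    }
}
```

Using the Charset overloads of the String constructor and getBytes avoids the checked UnsupportedEncodingException that the String-name variants declare.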
Byte -> char:
The GB code for 你 ("you") is 0xC4E3; its Unicode value is 0x4F60.
    import sun.io.ByteToCharConverter;

    String encoding = "GB2312";
    byte b[] = {(byte) '\u00c4', (byte) '\u00e3'};   // GB2312 bytes of U+4F60
    ByteToCharConverter converter = ByteToCharConverter.getConverter(encoding);
    char c[] = converter.convertAll(b);
    for (int i = 0; i < c.length; i++) {
        System.out.println(Integer.toHexString(c[i]));
    }

What is the result? 4f60, i.e. U+4F60.
If encoding = "8859_1", what is the result? c4 and e3 (U+00C4 and U+00E3): each byte becomes one char.
If the code is changed to:

    byte b[] = {(byte) '\u00c4', (byte) '\u00e3'};
    ByteToCharConverter converter = ByteToCharConverter.getDefault();
    char c[] = converter.convertAll(b);
    for (int i = 0; i < c.length; i++) {
        System.out.println(Integer.toHexString(c[i]));
    }

what will the result be? That depends on the platform's encoding.

Char -> byte:

    import sun.io.CharToByteConverter;

    String encoding = "GB2312";
    char c[] = {'\u4f60'};
    CharToByteConverter converter = CharToByteConverter.getConverter(encoding);
    byte b[] = converter.convertAll(c);
    for (int i = 0; i < b.length; i++) {
        // mask with 0xff: a negative byte such as 0xC4 would otherwise print as ffffffc4
        System.out.println(Integer.toHexString(b[i] & 0xff));
    }

What is the result? c4 and e3.
If encoding = "8859_1", what is the result? 3f, i.e. "?", because 你 has no mapping in ISO-8859-1.
If the code is changed to:

    char c[] = {'\u4f60'};
    CharToByteConverter converter = CharToByteConverter.getDefault();
    byte b[] = converter.convertAll(c);
    for (int i = 0; i < b.length; i++) {
        System.out.println(Integer.toHexString(b[i] & 0xff));
    }

what will the result be? Again, it depends on the platform's encoding.

Many problems with Chinese text stem from these two simple classes. Many classes, however, do not let you specify an encoding directly, which is inconvenient, and many programs rarely specify one at all and simply use the default encoding, which causes a great deal of trouble.

Second, UTF-8

UTF-8 maps to Unicode in a very simple way:

    7-bit Unicode:   0_______
    11-bit Unicode:  110_____ 10______
    16-bit Unicode:  1110____ 10______ 10______
    21-bit Unicode:  11110___ 10______ 10______ 10______

In most cases only characters of up to 16 bits are needed.

Take 你 again: GB code 0xC4E3, Unicode 0x4F60.
Binary of 0xC4E3: 1100 0100 1110 0011
Read as UTF-8, the first byte 1100 0100 starts with 110, so it matches the two-byte row. The second byte would then have to be 10______, but 1110 0011 has a 1 in its second-highest bit, where a 0 is required. The sequence is invalid, so the decoder returns "?".
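The table translates directly into a small encoder. This is a hand-written sketch for illustration (the class Utf8Sketch and its methods are hypothetical), assuming a valid code point and doing no checks for surrogates or values above 0x10FFFF; real code would call String.getBytes(StandardCharsets.UTF_8). It also tests the GB bytes of 你 against the two-byte row. Note that current JDK decoders substitute U+FFFD rather than "?" for malformed input.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Sketch {
    // Encode one code point with the bit patterns from the table:
    //   0xxxxxxx | 110xxxxx 10xxxxxx | 1110xxxx 10xxxxxx 10xxxxxx | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    // Assumes a valid code point: no checks for surrogates or values above 0x10FFFF.
    static byte[] encode(int cp) {
        if (cp < 0x80) {                  // up to 7 bits
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {          // up to 11 bits
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {        // up to 16 bits
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {                          // up to 21 bits
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }

    // True if b1 b2 match the two-byte row: 110xxxxx 10xxxxxx
    static boolean isTwoByteSequence(byte b1, byte b2) {
        return (b1 & 0xE0) == 0xC0 && (b2 & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        byte[] gb = {(byte) 0xC4, (byte) 0xE3};
        // 0xC4 = 110 00100 fits the lead byte, but 0xE3 = 1110 0011 is not 10xxxxxx
        System.out.println(isTwoByteSequence(gb[0], gb[1])); // false
        // the JDK's UTF-8 decoder replaces malformed input with U+FFFD
        String decoded = new String(gb, StandardCharsets.UTF_8);
        System.out.println(decoded.indexOf('\uFFFD') >= 0);  // true

        StringBuilder sb = new StringBuilder();
        for (byte b : encode(0x4F60)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(sb.toString().trim()); // E4 BD A0
    }
}
```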
0x4F60 in binary: 0100 1111 0110 0000. Filling these 16 bits into the three-byte UTF-8 pattern (1110____ 10______ 10______) gives 1110 0100, 1011 1101, 1010 0000, i.e. E4 BD A0. The result is therefore 0xE4, 0xBD, 0xA0.

Third, String and byte[]

At its core a String is a char[], so converting bytes into a String always involves an encoding. String.length() is the length of that char array, so the same bytes decoded with different encodings can yield strings of different lengths, which throws off character positions and produces garbled text. For example:

    byte[] b = {(byte) '\u00c4', (byte) '\u00e3'};
    String encoding = "8859_1";               // or "GB2312"
    String str = new String(b, encoding);     // throws UnsupportedEncodingException for an unknown name

With encoding = "8859_1" the string has two characters, but with encoding = "GB2312" it has only one. This problem shows up, for example, when splitting text into pages.

Fourth, Reader, Writer / InputStream, OutputStream

Reader and Writer work with chars at their core; InputStream and OutputStream work with bytes. The main purpose of Reader and Writer is to read and write chars on top of an InputStream / OutputStream. For example, suppose the file test.txt contains only the character 你, i.e. the bytes 0xC4 0xE3:

    import java.io.FileInputStream;
    import java.io.InputStreamReader;

    String encoding = "GB2312";
    InputStreamReader reader = new InputStreamReader(new FileInputStream("test.txt"), encoding);
    char c[] = new char[10];
    int length = reader.read(c);   // number of chars read, or -1 at end of stream
    for (int i = 0; i < length; i++) {
        System.out.println(c[i]);
    }
    reader.close();

What is the result? 你.
If encoding = "8859_1", what is the result? Two characters (U+00C4 and U+00E3) instead of one, printed as "??": the encoding does not recognize 你.
Try the reverse direction (Writer / OutputStream) yourself.
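A self-contained sketch of the last two sections, assuming a ByteArrayInputStream holding 0xC4 0xE3 in place of test.txt so that no file is needed. The class ReaderSketch, its constant GB_NI and its methods are hypothetical. It compares strings rather than printing 你, since what a console shows depends on the console's own encoding, and, unlike the single read() call above, it loops until read() returns -1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class ReaderSketch {
    // GB2312 bytes of U+4F60, standing in for the contents of test.txt
    static final byte[] GB_NI = {(byte) 0xC4, (byte) 0xE3};

    // Same bytes, different encoding -> different String.length()
    static int lengthAs(byte[] bytes, String encoding) {
        return new String(bytes, Charset.forName(encoding)).length();
    }

    // Read all chars through an InputStreamReader; a ByteArrayInputStream
    // replaces new FileInputStream("test.txt") so no file is needed
    static String readAll(byte[] bytes, String encoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), encoding)) {
            char[] buf = new char[10];
            int n;
            while ((n = reader.read(buf)) != -1) { // -1 marks end of stream
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lengthAs(GB_NI, "8859_1")); // 2
        System.out.println(lengthAs(GB_NI, "GB2312")); // 1
        // compare instead of printing, since console output depends on its encoding
        System.out.println(readAll(GB_NI, "GB2312").equals("\u4f60"));       // true
        System.out.println(readAll(GB_NI, "8859_1").equals("\u00c4\u00e3")); // true
    }
}
```

The try-with-resources block closes the reader even if decoding fails, which the original snippet does not guarantee.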