Some of my understanding here may not be entirely correct; corrections from more experienced people are welcome.
The relevant factors are: the encoding of the source file, the character set used when compiling, and the character set at run time.
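1. A String is stored as a char[], and each char holds one Unicode code unit of two bytes (16 bits), regardless of whether the text is Chinese, English, or any other language. The numeric value of the char is the character's number in Unicode. (Characters outside the Basic Multilingual Plane do not fit in a single char and take two, a surrogate pair. Since Java 9 the JVM may store the data in a byte[] internally, but the String API still exposes char values.)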
2. After the source code is compiled into a class file, String literals are stored in the class file as UTF-8 (strictly speaking, Java's modified UTF-8). So if a string in the source code contains Chinese, the compiler produces the correct UTF-8 only when it is told the character set the source file was actually saved in (javac -encoding). In other words, the character set of the source file must match the character set used when compiling.
3. When the program is running, the default character set used by String.getBytes() is the character set of the current runtime environment (the one returned by Charset.defaultCharset()), not the character set used when compiling.
4. String.getBytes(charset) works by mapping each char in the string to the same character in the target character set and taking that character's code in that character set; the byte[] returned by String.getBytes(charset) is the byte representation of those codes. When the character set does not include a given char, the mapping fails. For example, if the String consists of Chinese characters and you call String.getBytes("ISO-8859-1"), the resulting byte[] contains 63 for every character, because ISO-8859-1 has no Chinese characters. 63 is the ISO-8859-1 code for '?' (a character the target character set does not know is replaced with '?' by default).
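Points 1, 3 and 4 can be checked with a small program. This is only a sketch under my own assumptions: the class name StringEncodingDemo, the constant CHINESE and the helper encode are made up for illustration, and the Chinese text is written with Unicode escapes (\u4e2d\u6587, i.e. "中文") so that the result does not depend on the javac -encoding setting from point 2.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringEncodingDemo {
    // The two Chinese characters U+4E2D and U+6587, written as escapes so the
    // source file itself stays pure ASCII (hypothetical constant name).
    public static final String CHINESE = "\u4e2d\u6587";

    // Hypothetical helper: a thin wrapper around String.getBytes(Charset).
    public static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        // Point 1: each of these characters occupies exactly one char.
        System.out.println(CHINESE.length());          // 2
        System.out.println((int) CHINESE.charAt(0));   // 20013, i.e. 0x4E2D

        // Point 3: getBytes() without arguments uses the runtime default charset.
        System.out.println(Charset.defaultCharset());
        System.out.println(Arrays.equals(
                CHINESE.getBytes(),
                CHINESE.getBytes(Charset.defaultCharset()))); // true

        // Point 4: UTF-8 can represent both characters (three bytes each).
        System.out.println(Arrays.toString(
                encode(CHINESE, StandardCharsets.UTF_8)));
        // [-28, -72, -83, -26, -106, -121]

        // ISO-8859-1 cannot, so each character becomes '?' (byte value 63).
        System.out.println(Arrays.toString(
                encode(CHINESE, StandardCharsets.ISO_8859_1)));
        // [63, 63]
    }
}
```

The line that prints Charset.defaultCharset() will differ from machine to machine, which is exactly the point of item 3: the same class file can produce different bytes from getBytes() depending on where it runs, so passing the charset explicitly is the safer habit.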