---- Abstract: This article discusses the special way characters are represented in the Java language, in particular the representation of Chinese text. The key to character processing is converting 16-bit Unicode characters into a character form that the underlying platform, that is, the platform running the Java virtual machine, can understand.
---- Keywords: Java, Character, 8-bit, 16-bit, Unicode character set
---- Java is a programming language, a runtime system, a set of development tools, and an application programming interface (API). Java builds on C++: it keeps the useful features and removes the complex, dangerous, and redundant elements of C++. It is a safer, simpler, and easier-to-use language.
1, Representation of characters in Java
---- Java describes characters differently from C. Java uses the 16-bit Unicode character set (a standard that describes the characters of many languages), so a Java character is a 16-bit unsigned integer. A character variable stores a single character, not a complete string.
---- A character is a single letter; letters make up words, words make up sentences, and so on. For characters that carry more information, such as Chinese characters, things are not that simple.
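---- The 16-bit unsigned nature of char can be checked directly. The following is a minimal sketch; the class name CharWidthDemo and the helper codeOf are made up for illustration and are not part of the original article.

```java
public class CharWidthDemo {
    // Returns the numeric Unicode value stored in a char.
    // char is unsigned, so the widened int is never negative.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char latin = 'A';
        char chinese = '\u4E2D'; // the Chinese character at code point U+4E2D

        System.out.println(codeOf(latin));                         // 65
        System.out.println(Integer.toHexString(codeOf(chinese)));  // 4e2d
        System.out.println(Character.SIZE);                        // 16 (bits per char)
        System.out.println((int) Character.MAX_VALUE);             // 65535
    }
}
```

A Latin letter and a Chinese character each occupy exactly one char here, which is the property the rest of the article relies on.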
---- Java's basic char type is defined as a 16-bit unsigned value, and it is the only unsigned type in Java. The main reason for using 16 bits per character is to let Java support Unicode characters, so that Java can describe or display any language Unicode supports, which improves portability. However, being able to hold a string of some language in memory and being able to print that string correctly are often two different problems. The Oak development group (Oak was Java's original code name) worked on UNIX and UNIX-derived systems, where the most convenient and practical character set for the developers was ISO Latin-1. This UNIX heritage also shaped the Java I/O system, which is largely modeled on the UNIX stream concept: in UNIX, every I/O device is represented as a stream of 8-bit bytes. As a result, Java has 16-bit characters but only 8-bit input and output devices, which is a weakness. Whenever a Java string is read or written 8 bits at a time, a short piece of code, a "hack", has to map the 8-bit characters to 16-bit Unicode or squeeze the 16-bit Unicode characters into 8 bits.
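---- The mapping between 16-bit characters and 8-bit bytes described above can be observed with String.getBytes. This is only a sketch: the class name ByteMappingDemo and its two helpers are hypothetical, and UTF-8 is chosen here as an example multi-byte encoding, not because the article names it.

```java
import java.nio.charset.StandardCharsets;

public class ByteMappingDemo {
    // Encodes the text as UTF-8 and reports how many 8-bit bytes it needs.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Squeezes the text into ISO Latin-1 bytes and decodes it again.
    // Characters that Latin-1 cannot represent are replaced by '?'.
    static String throughLatin1(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // two Chinese characters

        System.out.println(utf8Length("abc"));      // 3
        System.out.println(utf8Length(chinese));    // 6
        System.out.println(throughLatin1("abc"));   // abc
        System.out.println(throughLatin1(chinese)); // ??
    }
}
```

The Latin text survives the trip through an 8-bit character set, while the Chinese text is lost, which is why the choice of byte encoding matters for Chinese information.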
2, Problem and solution
---- Suppose we need to read information from a file, in particular a file containing Chinese text, and display it on the screen. A typical approach is to open the file with FileInputStream and read characters with the readChar method of DataInputStream, as follows:
import java.io.*;

public class rf {
    public static void main(String[] args) {
        FileInputStream fis;
        DataInputStream dis;
        char c;
        try {
            fis = new FileInputStream("xinxi.txt");
            dis = new DataInputStream(fis);
            while (true) {
                // readChar() glues two raw bytes into one 16-bit char
                c = dis.readChar();
                System.out.print(c);
                System.out.flush();
                if (c == '\n') break;
            }
            fis.close();
        } catch (Exception e) {
            // The EOFException thrown at end of file also ends up here
        }
        System.exit(0);
    }
}
---- In practice, however, running this program outputs nothing but useless garbled text. The content of xinxi.txt is not displayed correctly because readChar reads the input as 16-bit Unicode characters, two raw bytes at a time, whereas System.out.print writes the characters out as 8-bit ISO Latin-1 characters.
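---- The effect of readChar on encoded text can be reproduced without a file. The sketch below is an illustration under assumptions: the class ReadCharDemo and its helper are hypothetical, an in-memory byte array stands in for xinxi.txt, and the text is assumed to be stored as UTF-8 (the article does not say which encoding the file uses).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadCharDemo {
    // Reads the first char the way the readChar program above does:
    // the first two raw bytes are combined into one 16-bit value.
    static char firstCharViaReadChar(byte[] fileBytes) {
        try (DataInputStream dis =
                 new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            return dis.readChar();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed file content: two Chinese characters, U+4E2D and U+6587.
        // UTF-8 stores them as the bytes E4 B8 AD E6 96 87.
        byte[] fileBytes = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8);

        char c = firstCharViaReadChar(fileBytes);
        System.out.println(Integer.toHexString(c)); // e4b8, not 4e2d
    }
}
```

The value 0xE4B8 is just the first two bytes of the file placed side by side, so the character that comes out has nothing to do with the one that was stored.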
---- Java 1.1 introduced the new Reader and Writer classes for handling characters. We can use the InputStreamReader class instead of DataInputStream to process the file. The program above is modified as follows:
import java.io.*;

public class rf {
    public static void main(String[] args) {
        FileInputStream fis;
        InputStreamReader irs;
        int r;
        char ch;
        try {
            fis = new FileInputStream("xinxi.txt");
            // Decodes the bytes using the platform's default charset
            irs = new InputStreamReader(fis);
            while ((r = irs.read()) != -1) { // -1 marks end of file
                ch = (char) r;
                System.out.print(ch);
                System.out.flush();
                if (ch == '\n') break;
            }
            fis.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}
---- This correctly outputs the text (Chinese text in particular) in xinxi.txt. However, the file may come from a different machine, that is, a machine running a different platform (or using a different Chinese character encoding). For example, the file may be uploaded from a client to a server, and the read operation is performed on the server. In that case the program above may still not produce the correct result, because the input encoding conversion fails. We then need to make the following changes:
......
int c1;
int j = 0;
StringBuffer str = new StringBuffer();
char lll[][] = new char[20][500];
String ll = "";
try {
    fis = new FileInputStream("FName.txt");
    irs = new InputStreamReader(fis);
    // Read up to 50 decoded characters into the buffer
    c1 = irs.read(lll[1], 0, 50);
    // Unused slots of a new char array hold '\0'
    while (lll[1][j] != '\0') {
        str.append(lll[1][j]);
        j = j + 1;
    }
    ll = str.toString();
    System.out.println(ll);
} catch (IOException e) {
    System.out.println(e.toString());
......
---- With this change, the output is correct. Of course, the program above is incomplete; it only illustrates the approach.
---- In short, character processing in Java, and the handling of Chinese text in particular, needs special care. The key to character processing in Java is converting 16-bit Unicode characters into a character form that the underlying platform, the one running the Java virtual machine, can understand.
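---- When the file's encoding is known to differ from the reading machine's default, one common option is to pass the charset to InputStreamReader explicitly. This is a different technique from the fragment above, shown only as a sketch: the class ExplicitCharsetDemo and the helper decode are hypothetical, and UTF-8 is an assumed encoding used for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Decodes the whole stream with the given charset, reading into a
    // char buffer in chunks and appending only the chars actually read.
    static String decode(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, cs)) {
            char[] buf = new char[50];
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for an uploaded file: two Chinese characters as UTF-8 bytes.
        byte[] uploaded = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8);

        String text = decode(new ByteArrayInputStream(uploaded), StandardCharsets.UTF_8);
        System.out.println(text.equals("\u4E2D\u6587")); // true
    }
}
```

Using the count returned by read, rather than scanning for an empty slot in the buffer, also keeps the loop correct when the text fills the buffer exactly.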