Character sets, part four: summary

xiaoxiao2021-03-06 28

I had long been rather vague about character sets in Java, and I finally have a reasonably clear picture.

1. Inside the Java virtual machine, strings are Unicode by default. This applies to how strings, Chinese ones included, are stored in Java class files, which is why class files are portable across platforms, and it also applies to how the virtual machine represents them at run time. A String object itself has no notion of a character set; it is just an array of Unicode chars. A byte stream, by contrast, is always in some specific character set, and if you do not tell the JVM explicitly which one, it uses its default. getBytes can be seen as the conversion from Unicode chars to bytes, and constructing a String from bytes as the conversion from bytes to Unicode chars.
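The two conversions described above can be sketched as follows. This is only an illustration: the class name StringBytesDemo and the helpers encode and decode are made up for this example, and the charsets are passed explicitly so the result does not depend on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class StringBytesDemo {
    /** Unicode chars -> bytes in the given charset (what getBytes does). */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    /** Bytes in the given charset -> Unicode chars (what the String constructor does). */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as Unicode escapes

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);

        System.out.println(text.length()); // 2 chars, whatever the byte encoding
        System.out.println(utf8.length);   // 6 bytes (3 per character in UTF-8)
        System.out.println(utf16.length);  // 4 bytes (2 per character in UTF-16BE)

        // Decoding with the charset used for encoding restores the string...
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));    // true
        // ...decoding with a different one does not.
        System.out.println(decode(utf8, StandardCharsets.UTF_16BE).equals(text)); // false
    }
}
```

The same String yields different byte arrays depending on the charset, which is the sense in which the character set belongs to the bytes rather than to the String.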

2. Decoding the strings in an HttpServletRequest is the web container's job. According to RFC 2616, if the HTTP request does not specify a character set, ISO8859_1 is used; Tomcat, for example, does this, whereas Resin 2.1.0 decodes with the local encoding.
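A minimal sketch of what this means for an application, assuming the browser actually sent UTF-8 bytes and the container decoded them as ISO8859_1. The class Latin1RecoveryDemo and its methods are hypothetical stand-ins and do not use the Servlet API; the point is only that the ISO8859_1 step can be undone because it maps each byte to exactly one char.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only simulates the container's decoding step.
final class Latin1RecoveryDemo {
    /** Simulates a container that sees no charset in the request and decodes as ISO-8859-1. */
    static String containerDecode(byte[] requestBytes) {
        return new String(requestBytes, StandardCharsets.ISO_8859_1);
    }

    /** Application-side repair: recover the raw bytes, then decode with the real charset. */
    static String recover(String misdecoded, Charset actual) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        // Assumption for this example: the browser encoded the parameter as UTF-8.
        byte[] onTheWire = original.getBytes(StandardCharsets.UTF_8);

        String seenByApp = containerDecode(onTheWire);
        System.out.println(seenByApp.length());        // 6: one char per byte, not the original text
        System.out.println(seenByApp.equals(original)); // false

        String repaired = recover(seenByApp, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original));  // true
    }
}
```

The repair only works when the application knows which charset the browser really used; that information is not carried by the String it receives.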

In fact, Resin performs a conversion that other servers do not: it converts the ISO8859_1 character stream coming from the browser into the local character set, and then passes the result to the application through the HttpRequest.

The Java Servlet 2.3 draft adds a method, setCharacterEncoding(String enc), to the ServletRequest interface. It lets you supply the charset information that is missing from the HTTP request, so the conversion can be done inside the servlet engine. Resin does not seem to implement this, however, because the value returned by getCharacterEncoding is null.

3. JDBC: the database and the JDBC driver talk to each other in ISO8859_1. From the application's point of view, when data is stored, a correct driver should take the string held in memory as Unicode, convert it to ISO8859_1, and send it to the database; once the database has received it, it stores the data according to the database character set.

(I wrote this yesterday; rereading it today I felt it was still not quite right, so I have made some changes and corrected a few conceptual errors. Thank you.)

4. Deeper understanding and conjecture: I think the byte stream transmitted over the network looks just like what Java produces when it handles ISO8859_1 characters (one byte corresponds to one character; I guess any other 8-bit character set covering 00-FF would do as well). That is why we tend to think there is a conversion from Unicode to ISO8859_1, when in fact what travels over the network is simply a byte stream.
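The one-byte-per-character property can be checked directly. The sketch below uses a hypothetical class, ByteTransparencyDemo, to push all 256 byte values through a decode and re-encode cycle. With Java's ISO_8859_1 charset every byte survives. US_ASCII is included as a contrast, since it defines only the values 00-7F; the conjecture about other 8-bit character sets would hold only for those that map all 256 values.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
final class ByteTransparencyDemo {
    /** Returns the 256 possible byte values, 0x00 through 0xFF, in order. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    /** True if decoding the bytes and encoding the result again gives the same bytes. */
    static boolean roundTrips(byte[] data, Charset charset) {
        String decoded = new String(data, charset);
        return Arrays.equals(data, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(roundTrips(all, StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(all, StandardCharsets.US_ASCII));   // false: 0x80-0xFF are not ASCII
    }
}
```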

So when the JDBC client sends data, it calls a getBytes() method to obtain a byte stream in the local encoding and passes that to the server; the server side then assembles the bytes back into a string using the default database character set.

HTTP works in a similar way, except that the client environment is a browser. The data is encoded with whatever encoding the browser is using at the moment it sends the request.
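As a rough illustration of the browser side, the sketch below uses java.net.URLEncoder and java.net.URLDecoder to mimic a form value being percent-encoded in the page's charset and then decoded on the server. The class FormEncodingDemo and the helper names are hypothetical, and the choice of UTF-8 as the page charset is an assumption made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it mimics, but is not, a browser or a servlet container.
final class FormEncodingDemo {
    /** Percent-encodes a form value the way a browser would for a page in the given charset. */
    static String browserEncode(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + pageCharset, e);
        }
    }

    /** Decodes the value using the charset the server assumes. */
    static String serverDecode(String encoded, String assumedCharset) {
        try {
            return URLDecoder.decode(encoded, assumedCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + assumedCharset, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        String sent = browserEncode(original, "UTF-8");
        System.out.println(sent); // %E4%B8%AD%E6%96%87

        // Same charset on both sides: the text survives.
        System.out.println(serverDecode(sent, "UTF-8").equals(original));      // true
        // Server assumes ISO-8859-1: six unrelated chars instead of two.
        System.out.println(serverDecode(sent, "ISO-8859-1").length());          // 6
    }
}
```

Nothing in the percent-encoded value says which charset produced it, so the server has to learn that from elsewhere or assume a default.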

When reposting, please cite the original address: https://www.9cbs.com/read-46476.html
