Some basic issues in Chinese character transcoding

xiaoxiao2021-03-06 45

The following summarizes my recent experience. It is written for beginners who run into Chinese encoding problems in J2EE web development; experts can skip it. The examples come from a project I worked on, in which I developed a Traditional Chinese web application on a Simplified Chinese system.

How a Chinese parameter travels from a web page to a servlet

1. The user types a Chinese string into an input box on the web page and clicks Submit.

2. The browser encodes the typed Chinese string (string A) into byte stream A and submits it to the server. (The browser uses the encoding selected for the page, for example the one chosen in IE6 under View -> Encoding; MS950 is used as the example here.)

3. The server receives byte stream A and decodes it as ISO8859_1 (the default), producing string B. This happens in two steps: (1) the incoming byte stream A is interpreted according to the ISO8859_1 encoding rules, giving string B; (2) string B is stored in memory as Unicode. This is the string returned by request.getParameter("paraName") in the servlet. At this point the server does not know which encoding the client browser used, so it interprets the MS950-encoded byte stream A as ISO8859_1. String B therefore differs from string A as typed into the web page; it is wrong, and it appears garbled no matter how it is output.
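The mis-decoding in step 3 can be reproduced without a servlet container. The following is a minimal sketch, assuming the JDK in use ships the MS950 charset (it lives in the extended charsets of a standard JDK install); the class and method names are hypothetical and only simulate what the server does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {
    // Simulates steps 2 and 3: the browser encodes the typed text, and the
    // server decodes those bytes as ISO8859_1 regardless of the real encoding.
    static String decodeAsServerDefault(String typed, Charset browserCharset) {
        byte[] streamA = typed.getBytes(browserCharset);          // byte stream A
        return new String(streamA, StandardCharsets.ISO_8859_1);  // string B
    }

    public static void main(String[] args) {
        Charset ms950 = Charset.forName("MS950");
        String typed = "\u4e2d\u6587"; // the two characters of "Chinese" (zhong wen)
        String b = decodeAsServerDefault(typed, ms950);
        System.out.println(b.equals(typed)); // false: string B is not what was typed
        System.out.println(b.length());      // 4: each MS950 byte became one character
    }
}
```

Each of the two Chinese characters occupies two bytes in MS950, and ISO8859_1 turns every byte into its own character, which is why string B is twice as long as the input.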

4. Once the process above is understood, giving the servlet the correct string A is a matter of reversing it. The steps are:
(1) String sPara = request.getParameter("paraName"); gets the "wrong" string B described above.
(2) byte[] bPara = sPara.getBytes("ISO8859_1"); recovers byte stream A, reversing step 3.
(3) sPara = new String(bPara, "MS950"); yields string A, reversing step 2.
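The reversal in step 4 can be wrapped in a small helper. This is a sketch under the same assumption that the browser submitted MS950; the names ParamRecovery and recover are hypothetical, and the "wrong" string is simulated here rather than read from a real request.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ParamRecovery {
    // Re-encodes the mis-decoded parameter as ISO8859_1 to get the original
    // bytes back, then decodes those bytes with the browser's real encoding.
    static String recover(String wrong, Charset browserCharset) {
        byte[] streamA = wrong.getBytes(StandardCharsets.ISO_8859_1);
        return new String(streamA, browserCharset);
    }

    public static void main(String[] args) {
        Charset ms950 = Charset.forName("MS950");
        String typed = "\u4e2d\u6587";
        // Simulate what request.getParameter would return in step 3.
        String wrong = new String(typed.getBytes(ms950), StandardCharsets.ISO_8859_1);
        System.out.println(recover(wrong, ms950).equals(typed)); // true
    }
}
```

This works because ISO8859_1 maps every byte value to exactly one character and back, so no information is lost in the wrong decoding; the same trick does not work if the server's default decoding is a lossy one.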

The string held in sPara at this point is the correct one, stored as Unicode. A servlet generally does one of two things with such a string: read or write it in the database, or output it to the client.

5. A Java program interacts with the database through the JDBC connector, and in the setup described here that exchange is a byte stream encoded as ISO8859_1. The following example inserts sPara into the database.
(1) Execute an SQL statement such as UPDATE tbName SET colName = :sPara.
(2) Java encodes the sPara string in this SQL statement into a byte stream using ISO8859_1 and sends it to the JDBC connector, which passes the byte stream on to the database. The database restores the string by decoding as ISO8859_1 and writes it to storage. How the string is stored depends on the database's settings (UTF8 is used as the example here).
As this shows, the JDBC byte-stream transfer is transparent to the user. Whatever encoding the database uses for storage, the write succeeds as long as the string in the SQL statement is the "correct" one. No transcoding is needed.
When Java reads from the database, there are two ways. One is to read the string directly, with sMsg = resultSet.getString("colName");. The other is to read a byte array, with bMsg = resultSet.getBytes("colName");. The former returns a string directly; the latter requires knowing the storage encoding of the database, for example sMsg = new String(bMsg, "UTF-8");.

6. When a string is output to the client, the server encodes it into a byte stream according to the settings in the servlet or JSP and sends it to the browser. In a JSP the setting is <%@ page contentType="text/html; charset=MS950" %>; in a servlet it is response.setContentType("text/html; charset=MS950");. For the client to see the text displayed properly, the browser's encoding setting only needs to match the setting above, or be compatible with it.
Without such a setting, the server encodes output as ISO8859_1 by default. Because ISO8859_1 is a single-byte encoding that cannot represent Chinese characters, the text is corrupted during encoding and cannot be displayed correctly in the browser.
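The loss described for the default output encoding can be seen by encoding the same string both ways. The sketch below does not use the servlet API; encodeForResponse is a hypothetical stand-in for what the container does when it writes the response body in the declared charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OutputEncodingDemo {
    // Stand-in for the container encoding the response body.
    static byte[] encodeForResponse(String text, Charset responseCharset) {
        return text.getBytes(responseCharset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        Charset ms950 = Charset.forName("MS950");

        // ISO8859_1 has no mapping for Chinese characters.
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode(text)); // false

        // Each unmappable character is replaced by '?', so the text is lost.
        byte[] lossy = encodeForResponse(text, StandardCharsets.ISO_8859_1);
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // ??

        // With the charset declared as MS950, the bytes decode back intact.
        byte[] ok = encodeForResponse(text, ms950);
        System.out.println(new String(ok, ms950).equals(text)); // true
    }
}
```

Unlike the mis-decoding of request parameters, this corruption cannot be undone on the client, because the original characters were already replaced before the bytes left the server.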

Byte count of text: to get the correct byte count, that is, the storage size of a string, two things matter. (1) The string must be correct: the Unicode string in Java must be the one actually intended. (2) The encoding must be specified: under different encodings, the same Chinese character may occupy a different number of bytes. For example, to get the number of bytes sPara occupies in MS950: (1) obtain the correct sPara string, as in step 4; (2) bPara = sPara.getBytes("MS950"); gets the MS950-encoded bytes; (3) bPara.length gives the byte count. Without a specified encoding, it is meaningless to talk about the number of bytes in a string.
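To illustrate that the byte count depends on the encoding, the following sketch measures the same two-character string under MS950 and UTF-8. The helper name byteLength is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteCountDemo {
    // Number of bytes the string occupies when encoded with the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String sPara = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(byteLength(sPara, Charset.forName("MS950"))); // 4
        System.out.println(byteLength(sPara, StandardCharsets.UTF_8));   // 6
    }
}
```

Passing the charset explicitly also avoids depending on the platform default, which is what getBytes() with no argument would use.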

Supplement: in Java, String objects are stored in memory as Unicode, whereas databases, files, network transfers, and browsers each use their own encodings and storage formats. When dealing with Chinese encoding problems, the most convenient approach is to treat the Unicode string as the center: first make sure the string itself is correct, then convert it to or from byte arrays in whatever encoding is required.
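One way to picture the "Unicode string in the center" approach is a transcoding helper that always goes bytes -> String -> bytes. This is a sketch with hypothetical names; MS950 and UTF-8 are chosen only because they are the encodings used as examples in this article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TranscodeDemo {
    // Converts bytes from one encoding to another by way of a Unicode String.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String unicode = new String(input, from); // decode at the input boundary
        return unicode.getBytes(to);              // encode at the output boundary
    }

    public static void main(String[] args) {
        Charset ms950 = Charset.forName("MS950");
        String text = "\u4e2d\u6587";
        byte[] utf8 = transcode(text.getBytes(ms950), ms950, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, text.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```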

When reprinting, please credit the original URL: https://www.9cbs.com/read-60426.html
