Some Notes on Chinese Character Encoding Problems [Reprint]

xiaoxiao2021-03-06 46

Author: zhaofei

Source:

Http://zhaofei8009.blogchina.com/blog/Article_55041.502959.html

On December 22, 2004, many posts on the forum were discussing Chinese character encoding problems on mobile phones. I have been troubled by these problems too, and I got help from many enthusiastic friends. After a period of study and testing I have some understanding of and ideas about this question, and rather than keep them to myself I'd like to share them with everyone. My level is limited: I am an amateur development enthusiast with no formal theoretical training, so please forgive and point out any mistakes in this article. It is only meant to get the discussion started; criticism is very welcome, and let's pool our efforts to solve this problem together. :)

Strings inside the phone are basically handled in UTF-8. On the PC, the common encodings are basically ASCII and Unicode. ASCII is a single-byte encoding: one byte can distinguish at most 256 values (standard ASCII itself defines 128 characters), which is enough for English letters but cannot represent Chinese characters. Unicode is a double-byte encoding that can represent Chinese characters, but it wastes too much space on ordinary English letters (at least given a phone's storage). UTF-8 is a variable-length encoding that suits embedded devices such as mobile phones.
Its key property is that traditional ASCII characters are represented with one byte, while characters outside the ASCII character set use two or three bytes.
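The byte-length rule can be sketched in Java. This is an illustration, not code from the article: it follows the "modified UTF-8" variant that DataOutputStream.writeUTF produces (U+0000 takes two bytes, and each 16-bit char, including each half of a surrogate pair, is encoded separately), and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: encode a String with the modified UTF-8 rules used by writeUTF. */
public class ModifiedUtf8Sketch {

    /** Encodes each char (a UTF-16 code unit) into 1, 2 or 3 bytes; no length prefix. */
    public static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // One byte: 0 | bits 0-6 (plain ASCII)
                out.write(c);
            } else if (c <= 0x07FF) {
                // Two bytes, also used for U+0000: 110 | bits 6-10, then 10 | bits 0-5
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Three bytes: 1110 | bits 12-15, 10 | bits 6-11, 10 | bits 0-5
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] b = encode("A\u4E2D"); // 'A' followed by the CJK character U+4E2D
        StringBuilder hex = new StringBuilder();
        for (byte x : b) {
            hex.append(String.format("%02X ", x & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "41 E4 B8 AD"
    }
}
```

Comparing encode(s) with the output of writeUTF(s), after skipping its two-byte length prefix, is an easy way to check the rules on a desktop JVM.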

The encoding works as follows:

Characters from 0x0001 to 0x007F (traditional ASCII characters) use one byte: 0 | bits 0-6.
The character 0x0000 and characters from 0x0080 to 0x07FF use two bytes: 1 | 1 | 0 | bits 6-10, then 1 | 0 | bits 0-5. When the virtual machine sees such a character, it strips the leading 110 from the first byte and the leading 10 from the second byte, and recombines the remaining bits into a two-byte value: 00000 | bits 6-10 | bits 0-5.
Characters from 0x0800 to 0xFFFF use three bytes: 1 | 1 | 1 | 0 | bits 12-15, then 1 | 0 | bits 6-11, then 1 | 0 | bits 0-5. These, too, are recombined into a two-byte value: bits 12-15 | bits 6-11 | bits 0-5.

Note that the NULL character in KJava is also represented with two bytes, not one :)

Of course, English strings have no problems under UTF-8 (they are encoded exactly as in standard ASCII). The main problems involve Chinese. In my experience, Chinese-character problems in KJava phone development fall into the following categories:

1. reading and writing the RMS database;
2. writing a Chinese game name in the JAD file;
3. Chinese in network transmission (decoding with KXML);
4. some emulators do not support Chinese at all.

In all of these areas Chinese often goes wrong in mobile development, usually in the form of garbled text :)

1. RMS database reads and writes
Understanding the basic principle of UTF-8 helps a great deal with encoding-conversion problems. The approach here is to convert to UTF-8 when writing Chinese strings to the database:

    // Write a Chinese string to the database
    String appt3 = "中文字符"; // sample Chinese text ("Chinese characters")
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(bos);
    dos.writeUTF(appt3);
    byte[] bytes3 = bos.toByteArray();
    rs.addRecord(bytes3, 0, bytes3.length);

    // Read the Chinese string back from the database
    byte[] b3 = rs.getRecord(dbid);
    DataInputStream dis = new DataInputStream(new ByteArrayInputStream(b3));
    String chinaString = dis.readUTF();

(Here rs is an open javax.microedition.rms.RecordStore and dbid a record ID; exception handling is omitted.)

writeUTF() and readUTF() are methods of DataOutputStream and DataInputStream that convert between Unicode strings and UTF-8 bytes.
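To try the same round trip off-device, the sketch below replaces the RecordStore with a plain byte[] (an assumption made so it runs on a desktop JVM; the class and method names are hypothetical). For two Chinese characters, writeUTF produces 2 + 3*2 = 8 bytes: a two-byte length prefix plus three bytes per character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch of the writeUTF/readUTF round trip; a byte[] stands in for the RMS record. */
public class RecordUtfSketch {

    /** Serialises a string the way the RMS example does before rs.addRecord(...). */
    public static byte[] toRecord(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF(text); // two-byte length prefix, then modified UTF-8 bytes
        dos.flush();
        return bos.toByteArray();
    }

    /** Reverses toRecord(), as the example does after rs.getRecord(...). */
    public static String fromRecord(byte[] record) throws IOException {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(record));
        return dis.readUTF();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4E2D\u6587"; // two CJK characters
        byte[] record = toRecord(text);
        System.out.println(record.length);                   // prints 8
        System.out.println(fromRecord(record).equals(text)); // prints true
    }
}
```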

A closer look at the MIDP documentation shows the following for writeUTF():

First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the UTF-8 encoding for the character. If no exception is thrown, the counter written is incremented by the total number of bytes written to the output stream. This will be at least two plus the length of str, and at most two plus thrice the length of str.

Of course, we can also write the conversion by hand: turn the Chinese string into a byte[] before putting it into RMS, and turn it back into a String when reading it out. Here I borrow bingo_guan's method (bingo_guan, I hope you don't mind :)); the code is nicely designed, too :) This class can also be used for text file operations.
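A minimal sketch of such a hand-written conversion follows. It is not bingo_guan's code: it assumes a fixed layout of two bytes per char, low byte first (the same bytes as UTF-16LE without a byte-order mark), and the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of a hand-written String <-> byte[] converter.
 * Assumption: each char is stored as two bytes, low byte first.
 */
public class UnicodeBytesSketch {

    public static byte[] stringToBytes(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) (c & 0xFF);            // low byte
            out[2 * i + 1] = (byte) ((c >> 8) & 0xFF); // high byte
        }
        return out;
    }

    /** Rebuilds a string from the first len bytes; a trailing odd byte is ignored. */
    public static String bytesToString(byte[] data, int len) {
        StringBuffer sb = new StringBuffer(len / 2);
        for (int j = 0; j + 1 < len; j += 2) {
            int lo = data[j] & 0xFF;     // mask away sign extension
            int hi = data[j + 1] & 0xFF;
            sb.append((char) (lo | (hi << 8)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hi\u4E2D";
        byte[] b = stringToBytes(text);
        System.out.println(b.length);                                // prints 6
        System.out.println(bytesToString(b, b.length).equals(text)); // prints true
    }
}
```

StringBuffer is used rather than StringBuilder because CLDC does not provide the latter. Since no charset name is involved, the result does not depend on which encodings the device supports; the trade-off is that ASCII text takes two bytes per character instead of one.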

    /**
     * Title:
     * Description: Unicode string conversion tool
     * Company: CC Studio
     * @author Bingo
     * @version 1.0
     */
    public class UnicodeString {

        public UnicodeString() {
        }

        public static String byteArrayToString(byte abyte0[], int i) {
            StringBuffer stringbuffer = new StringBuffer("");
            for (int j = 0; j
            // [... the rest of this class is missing from this copy ...]
            > 8);
            }
            return abyte0;
        }
    }

2. In the JAD file and the manifest, Chinese text (such as the name of the game) is actually UTF-8 encoded. This is another area where problems often arise. I suggest converting such text to UTF-8 by hand before writing it in; otherwise, if you write the Chinese as Unicode, there is a risk that the program will not run on the emulator or the actual device. So be as careful as possible when editing JAD files :) Note in particular that the WTK tool that automatically generates the JAD does not support entering UTF-8 directly into the JAD and manifest, so this step cannot be done with it :(

3. Different phones use different default encodings and support different encodings, and this is also a key source of problems. The CLDC system property "microedition.encoding" defines the device's default character encoding; its value can be obtained with the System.getProperty method. We can also convert text into an encoding the device supports so that our program actually runs correctly. We usually use this approach for Chinese in network transmission, because the encoding on the phone side is uncertain. Here is some example code to explore the problem.
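Before that example, here is a minimal sketch of the property lookup and conversion just described. The class and method names are hypothetical, and falling back to a caller-chosen encoding (UTF-8 below) when the property is unset or the preferred encoding is unsupported is an assumption, not something the article prescribes.

```java
import java.io.UnsupportedEncodingException;

/** Hypothetical sketch: choose an encoding from "microedition.encoding", with a fallback. */
public class EncodingProbeSketch {

    /** Returns the device default encoding, or the fallback when the property is unset. */
    public static String defaultEncoding(String fallback) {
        String enc = System.getProperty("microedition.encoding");
        return (enc == null || enc.length() == 0) ? fallback : enc;
    }

    /** Encodes text with the preferred encoding if supported, otherwise with the fallback. */
    public static byte[] encode(String text, String preferred, String fallback)
            throws UnsupportedEncodingException {
        try {
            return text.getBytes(preferred);
        } catch (UnsupportedEncodingException e) {
            return text.getBytes(fallback); // preferred encoding not available here
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // On a desktop JVM the property is normally unset, so the fallback is used.
        String enc = defaultEncoding("UTF-8");
        System.out.println("encoding: " + enc);
        System.out.println(encode("\u4E2D", "NO-SUCH-ENCODING", "UTF-8").length); // prints 3
    }
}
```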

Server to client:

The following code is what the server writes to the client side. After the gbEncoding() method, every character is encoded as \uXXXX.

    /**
     * Write the string data.
     *
     * @param out
     * @param value
     */
    public static void writeUnicode(final DataOutputStream out, final String value) throws ActionException {
        try {
            final String unicode = StringFormatter.gbEncoding(value);
            final byte[] data = unicode.getBytes();
            final int dataLength = data.length;
            System.out.println("Data length is: " + dataLength);
            System.out.println("Data is: " + value);
            out.writeInt(dataLength);       // write the length of the string first
            out.write(data, 0, dataLength); // then write the converted string
        } catch (IOException e) {
            throw new ActionException(IMDefaultAction.class.getName(), e.getMessage());
        }
    }

The following gbEncoding() method converts each double-byte character into \uXXXX; ASCII characters are padded with 00 in front.

    /**
     * This method will encode the string to Unicode.
     *
     * @param gbString
     * @return
     */
    public static String gbEncoding(final String gbString) {
        char[] utfBytes = gbString.toCharArray();
        String unicodeBytes = "";
        for (int byteIndex = 0; byteIndex
        // [... the rest of gbEncoding() and the start of the decoding method are missing from this copy ...]
        -1) {
            end = dataStr.indexOf("\\u", start + 2);
            String charStr = "";
            if (end == -1) {
                charStr = dataStr.substring(start + 2, dataStr.length());
            } else {
                charStr = dataStr.substring(start + 2, end);
            }
            char letter = (char) Integer.parseInt(charStr, 16); // parse the hexadecimal string as an integer
            buffer.append(new Character(letter).toString());
            start = end;
        }
        return buffer;
    }

Client to server:

The client uses the following method to encode the phone's characters as ISO-8859-1 and pass them to the server.

    /**
     * Write the string data.
     *
     * @param outData
     * @param value
     */
    private void writeSJIS(DataOutputStream outData, String value) {
        try {
            byte[] data = null;
            // data = value.getBytes("UTF-8");
            data = value.getBytes("ISO8859_1");
            outData.writeInt(data.length);
            outData.write(data, 0, data.length);
            System.out.println("data.length: " + data.length);
            System.out.println("data.value: " + value);
        } catch (Exception ex) {
            System.out.println("write error");
            ex.printStackTrace();
        }
    }

The server side receives the client's character stream and converts it to UTF-8 with the following method; all later processing is based on UTF-8. SQL Server may apply a different conversion, so database access also needs handling that matches the specific database's internal encoding.
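For reference, here is a self-contained sketch of the \uXXXX escaping idea used on the server-to-client path, in the spirit of gbEncoding() and its decoding counterpart. It is not the author's code: the names escape and unescape are hypothetical, escape always pads to exactly four hex digits, and unescape assumes well-formed input. Because the escaped text is pure ASCII, its bytes are the same in any ASCII-compatible charset, which is presumably what lets the phone decode it regardless of its default encoding.

```java
/**
 * Hypothetical sketch of a backslash-u escape codec: every char becomes a
 * six-character ASCII escape (backslash, 'u', four lowercase hex digits).
 */
public class UnicodeEscapeSketch {

    public static String escape(String s) {
        StringBuffer sb = new StringBuffer(s.length() * 6);
        for (int i = 0; i < s.length(); i++) {
            String hex = Integer.toHexString(s.charAt(i));
            sb.append("\\u");
            for (int pad = hex.length(); pad < 4; pad++) {
                sb.append('0'); // left-pad to exactly four hex digits
            }
            sb.append(hex);
        }
        return sb.toString();
    }

    /** Assumes well-formed input: each escape has exactly four hex digits. */
    public static String unescape(String escaped) {
        StringBuffer sb = new StringBuffer(escaped.length() / 6);
        int start = escaped.indexOf("\\u");
        while (start != -1) {
            char c = (char) Integer.parseInt(escaped.substring(start + 2, start + 6), 16);
            sb.append(c);
            start = escaped.indexOf("\\u", start + 6);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wire = escape("A\u4E2D");
        System.out.println(wire.length());                     // prints 12
        System.out.println(unescape(wire).equals("A\u4E2D"));  // prints true
    }
}
```

On the wire, the escaped string's ASCII bytes can be sent with the same writeInt-then-write framing that writeUnicode() uses.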

When reprinting, please credit the original address: https://www.9cbs.com/read-70127.html
