Notes on Unicode and encodings from experience

xiaoxiao2021-03-06 31

Your explanation may not be entirely clear: without mentioning decode, many people still won't understand how to convert between the two. Let me add a few points; some of them may repeat what you said.

A Unicode string is stored in memory in Unicode form. On some platforms each character is represented by 2 bytes (English letters and Chinese characters alike), which is called UCS-2; on other platforms each character is represented by 4 bytes, which is called UCS-4. The 2-byte (or 4-byte) unit is then handled as a whole. Splitting it apart is meaningless, just like taking only 4 bits out of the byte of an ASCII character. Python uses UCS-2 here (this depends on how the interpreter was built). Because a Unicode string can hold almost all text, we should use Unicode strings wherever possible.

(More on UCS-2 and UCS-4 can be found online.)
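To make the 2-byte versus 4-byte unit concrete, here is a minimal sketch. It assumes Python 3, where `str` plays the role of the Unicode string, and it uses the `utf-16-le` and `utf-32-le` codecs as stand-ins for UCS-2 and UCS-4 (for the characters shown, the byte counts are the same). The helper `unit_size` is a name made up for this illustration.

```python
# Python 3 sketch; unit_size is a hypothetical helper, not a library function.
def unit_size(ch, encoding):
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))

# An English letter and a Chinese character (U+4E2D)
samples = ["A", "\u4e2d"]

# The '-le' variants encode without a byte order mark,
# so the length is exactly the size of the character's unit.
ucs2_sizes = [unit_size(ch, "utf-16-le") for ch in samples]
ucs4_sizes = [unit_size(ch, "utf-32-le") for ch in samples]

print(ucs2_sizes)  # [2, 2]
print(ucs4_sizes)  # [4, 4]
```

Both the letter and the Chinese character take the same number of bytes, which is the point made above: the unit, not the single byte, is what carries a character.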

The above is the internal representation. If you want to send a Unicode string over the network or write it to a file, you are doing stream I/O, which works in bytes (a byte stream). The 2-byte or 4-byte units then have to be turned into single bytes, and this is called encoding. Going the other way, converting a non-Unicode string into a Unicode string is called decoding.
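The two directions can be sketched as a round trip. This is an illustration under the assumption of Python 3, where `str` is the Unicode string and `bytes` is the byte string; the `cp936` codec is the one used elsewhere in this post.

```python
# Python 3 sketch of the encode/decode round trip.
text = "\u4e2d\u56fd"             # a Unicode string of two Chinese characters

data = text.encode("cp936")       # encoding: Unicode string -> bytes
restored = data.decode("cp936")   # decoding: bytes -> Unicode string

print(type(data).__name__)        # bytes
print(len(text), len(data))       # 2 characters, 4 bytes
print(restored == text)           # True
```

The byte string is what goes to the file or the socket; the Unicode string is what the program should work with in between.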

Also, you have to distinguish UCS-2 from UTF-16. A single byte taken out of a UCS-2 unit is meaningless and does not represent a character, so even to store a Unicode string in a file as UTF-16 you still have to encode it:

f = open('Test.txt', 'wb')  # note: open the file in binary mode

a = unicode('China', 'cp936')
# Creates a Unicode string. You can write a = 'China'.decode('cp936') instead,
# which may be clearer because it is symmetrical with encode.

f.write(a.encode('utf-16'))
# You can try other encodings, such as UTF-8. Encoding as UTF-16 writes a
# Unicode file header (a byte order mark), and Windows XP's Notepad recognizes
# this format: open the file and choose Save As, and it shows the current encoding.

f.close()

Also, calling unicode('China', 'cp936') is itself a decoding operation.
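The file-writing example above can also be checked from code rather than from Notepad. The following is a sketch that assumes Python 3 (so there is no `unicode()` call and the string literal is already a Unicode string) and writes to a temporary directory instead of the current one. It shows that the `utf-16` codec puts a byte order mark at the start of the output, while plain `utf-8` does not and `utf-8-sig` does.

```python
# Python 3 sketch: inspect the file header written by different codecs.
import codecs
import os
import tempfile

text = "China"

# Write the UTF-16 encoded bytes to a file opened in binary mode.
path = os.path.join(tempfile.mkdtemp(), "Test.txt")
with open(path, "wb") as f:
    f.write(text.encode("utf-16"))

# Read the raw bytes back to look at the header.
with open(path, "rb") as f:
    raw = f.read()

has_utf16_bom = raw.startswith(codecs.BOM_UTF16)
has_utf8_bom = text.encode("utf-8").startswith(codecs.BOM_UTF8)
has_utf8_sig_bom = text.encode("utf-8-sig").startswith(codecs.BOM_UTF8)

# Decoding with 'utf-16' consumes the byte order mark again.
decoded = raw.decode("utf-16")

print(has_utf16_bom, has_utf8_bom, has_utf8_sig_bom)
print(decoded == text)
```

Opening the file in binary mode matters for the same reason as in the original code: the encoded bytes should reach the disk unchanged.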

When reposting, please cite the original address: https://www.9cbs.com/read-46281.html
