A preliminary understanding of UTF-8 encoding

xiaoxiao, 2021-03-06

UTF-8 encoding is used in many places on the network. I needed to write programs for a mail server, and some of its users use UTF-8, so I worked out a preliminary understanding of it.

UTF-8 represents the same characters as Unicode; what differs from the two-byte Unicode encoding (UCS-2/UTF-16) is the encoding method.

First, a character encoded in UTF-8 does not have a fixed size, unlike the two-byte Unicode encoding.

Look at the two-byte Unicode encoding first: the English letter "a" and the Chinese character for "good" (好) occupy the same amount of space, two bytes each.

In UTF-8, the English letter "a" and the Chinese character for "good" (好) occupy different amounts of space after encoding: the former takes one byte, the latter three.
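The size difference can be checked with Python's built-in codecs. This is a small sketch, assuming the Chinese character meant above is 好 (U+597D) and that "Unicode encoding" refers to 16-bit UTF-16; the helper name `encoded_sizes` is made up for illustration.

```python
def encoded_sizes(ch):
    """Return (bytes in UTF-16, bytes in UTF-8) for a single character."""
    # "utf-16-le" is used so that no byte-order mark is counted.
    return len(ch.encode("utf-16-le")), len(ch.encode("utf-8"))

print(encoded_sizes("a"))   # (2, 1)
print(encoded_sizes("好"))  # (2, 3)
```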

Let's take a look at the principle of UTF8 coding:

English letters and the symbols found on a keyboard need only seven binary bits, and a byte has eight, so UTF-8 uses a single byte for letters and those keyboard symbols. But once we have an encoded byte, how do we know what it belongs to? It might be a one-byte English letter, or it might be one of the three bytes of a Chinese character. So UTF-8 uses flag bits.

When the content to be represented fits in 7 bits, one byte is used: 0*******. The leading 0 is the flag, and the remaining bits can represent ASCII 0-127.

When the content needs 8 to 11 bits, two bytes are used: 110***** 10******. The 110 at the start of the first byte and the 10 at the start of the second byte are flags.

When the content needs 12 to 16 bits, three bytes are used: 1110**** 10****** 10******. As before, the 1110 of the first byte and the 10 of the second and third bytes are flag bits, and the remaining bits can represent Chinese characters.
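The flag bits answer the question of what a given byte belongs to. Here is a minimal sketch of that check for the one-, two- and three-byte forms; the function name `classify_byte` and its return labels are invented for this example.

```python
def classify_byte(b):
    """Describe a UTF-8 byte (an int 0..255) by its leading flag bits."""
    if (b & 0b10000000) == 0b00000000:
        return "single"        # 0*******
    if (b & 0b11000000) == 0b10000000:
        return "continuation"  # 10******
    if (b & 0b11100000) == 0b11000000:
        return "lead-2"        # 110*****
    if (b & 0b11110000) == 0b11100000:
        return "lead-3"        # 1110****
    return "other"             # longer forms are not handled in this sketch

# "a" is one byte; 好 (assumed example character) is a lead byte plus two continuation bytes.
print([classify_byte(b) for b in "a好".encode("utf-8")])
```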

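The pattern continues in the same way:

Four bytes: 11110*** 10****** 10****** 10******

Five bytes: 111110** 10****** 10****** 10****** 10******

Six bytes: 1111110* 10****** 10****** 10****** 10****** 10******

The five- and six-byte forms belong to the original UTF-8 design. The current standard (RFC 3629) stops at four bytes, because Unicode ends at U+10FFFF.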
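The byte patterns can be turned into a hand-written encoder. The following is only a sketch, not a replacement for a real codec: it covers the one- to four-byte forms, does not reject surrogate code points, and `utf8_encode` is a name chosen for this example.

```python
def utf8_encode(code_point):
    """Encode one code point (0..0x10FFFF) into UTF-8 bytes by hand."""
    if code_point < 0x80:             # fits in 7 bits: 0*******
        return bytes([code_point])
    if code_point < 0x800:            # fits in 11 bits
        count, lead_flag = 2, 0b11000000
    elif code_point < 0x10000:        # fits in 16 bits
        count, lead_flag = 3, 0b11100000
    elif code_point < 0x110000:       # fits in 21 bits
        count, lead_flag = 4, 0b11110000
    else:
        raise ValueError("code point out of range")

    out = []
    # Take six bits at a time from the low end for the 10****** bytes.
    for _ in range(count - 1):
        out.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6
    # Whatever is left goes into the first byte, after its flag.
    out.append(lead_flag | code_point)
    return bytes(reversed(out))

print(utf8_encode(0xA89E).hex())  # eaa29e
```

Comparing the result with `chr(code_point).encode("utf-8")` is a quick way to check the sketch against Python's own codec.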

Understand?

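The bits are filled in from low to high: the lowest six bits of the character go into the last byte, the next six into the byte before it, and whatever remains goes into the first byte after its flag.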
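Reading a sequence back reverses this: strip the flag bits and join what remains, first byte first. A minimal sketch for the one- to four-byte forms follows; `utf8_decode_one` is a hypothetical helper, and it skips some checks a real decoder makes (overlong forms, surrogates).

```python
def utf8_decode_one(data):
    """Decode the first character of a UTF-8 byte string.

    Returns (code_point, number_of_bytes_used).
    """
    first = data[0]
    if first < 0x80:                      # 0*******
        return first, 1
    if first >> 5 == 0b110:               # 110*****
        length, value = 2, first & 0b00011111
    elif first >> 4 == 0b1110:            # 1110****
        length, value = 3, first & 0b00001111
    elif first >> 3 == 0b11110:           # 11110***
        length, value = 4, first & 0b00000111
    else:
        raise ValueError("not a valid first byte")
    if len(data) < length:
        raise ValueError("sequence is cut short")
    for b in data[1:length]:
        if b >> 6 != 0b10:                # every following byte must be 10******
            raise ValueError("expected a continuation byte")
        value = (value << 6) | (b & 0b00111111)
    return value, length

print(utf8_decode_one(bytes([0xEA, 0xA2, 0x9E])))  # (43166, 3), i.e. 0xA89E
```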

Now let's look at an example.

In the UTF-8 binary column, the flag bits are enclosed in square brackets; the remaining bits are the bits of the Unicode value, in order. (The middle cells of the first row are missing from the source.)

Unicode (hex)  Unicode (binary)   UTF-8 (binary)                     UTF-8 (hex)  UTF-8 bytes
B              ?                  ?                                  ?            1
9D             00010011101        [110]00010 [10]011101              C2 9D        2
A89E           10101000 10011110  [1110]1010 [10]100010 [10]011110   EA A2 9E     3
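The two complete rows of the example can be reproduced with Python's built-in UTF-8 codec. The helper names `utf8_hex` and `utf8_bits` are made up for this sketch.

```python
def utf8_hex(code_point):
    """UTF-8 bytes of a code point as space-separated hexadecimal."""
    return " ".join(f"{b:02X}" for b in chr(code_point).encode("utf-8"))

def utf8_bits(code_point):
    """UTF-8 bytes of a code point as space-separated binary."""
    return " ".join(f"{b:08b}" for b in chr(code_point).encode("utf-8"))

for cp in (0x9D, 0xA89E):
    print(f"{cp:X}", utf8_bits(cp), utf8_hex(cp))
```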

When reposting, please cite the original address: https://www.9cbs.com/read-111654.html
