ANSI and Unicode

zhaozj2021-02-16  106

Peter Qian 07-31-04

ANSI - American National Standards Institute

In the early days there was only one character set on the Internet, ASCII. It uses 7 bits per character and can therefore represent 128 characters, including common ones such as English letters, digits, and punctuation. It was later extended to 8 bits per character, which allows 256 characters, and special symbols such as box-drawing (table-drawing) characters were added to the original 7-bit character set.
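The arithmetic behind those sizes can be checked in a few lines. This is only an illustrative sketch; the sample string and variable names are made up for the example.

```python
# Number of distinct values that fit in 7 and 8 bits.
ascii_size = 2 ** 7      # 7-bit ASCII
extended_size = 2 ** 8   # 8-bit extended sets

# Hypothetical sample text made only of English letters, digits, punctuation.
sample = "Hello, 123!"
codes = [ord(c) for c in sample]

# Every code is below 128, so each character fits in 7 bits.
fits_in_7_bits = all(code < ascii_size for code in codes)

# Encoding as ASCII therefore uses exactly one byte per character.
raw = sample.encode("ascii")

print(ascii_size, extended_size, fits_in_7_bits, len(raw))
```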

Later, as more languages came into use, ASCII could no longer meet the needs of information exchange. To represent their own text, countries developed their own character sets based on ASCII. These character sets grew out of the ANSI standard and are customarily referred to collectively as ANSI character sets; their proper name is MBCS (Multi-Byte Character System). What these derived character sets have in common is that they are compatible with 7-bit ASCII: codes 0 through 127 mean the same thing as in ASCII. They use byte values above 127 as a lead byte, and the lead byte together with the second (or even third) byte that follows it forms the actual code of the character. There are many such character sets, and the familiar GB-2312 is one of them.
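A minimal sketch of the lead-byte idea, assuming Python's built-in "gb2312" codec as the MBCS character set and a two-byte form for every non-ASCII character (which holds for GB-2312). The helper name split_mbcs is hypothetical, chosen only for this illustration.

```python
def split_mbcs(data: bytes) -> list:
    """Split a GB-2312 byte string into one byte group per character."""
    groups = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Below 128: plain ASCII, one byte is the whole character.
            groups.append(data[i:i + 1])
            i += 1
        else:
            # 128 or above: a lead byte; it and the next byte form the code.
            groups.append(data[i:i + 2])
            i += 2
    return groups


encoded = "A中".encode("gb2312")
groups = split_mbcs(encoded)
print(groups)
```

The ASCII letter keeps its single-byte code, while the Chinese character occupies two bytes that both lie above 127, which is what lets a decoder tell the two cases apart.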

Because each language had established its own character set, international exchange required frequent and inconvenient conversion between character sets. The Unicode character set was therefore proposed. It was originally designed to use a fixed 16 bits (two bytes, one word) per character, which gives 65,536 code values, enough to cover the commonly used characters of almost all the world's languages and so convenient for information exchange. The standard 16-bit encoding of Unicode is called UTF-16. Later, so that double-byte Unicode could pass correctly through existing systems built to handle single-byte characters, UTF-8 appeared, which encodes Unicode in a variable-length way similar to MBCS. Note that UTF-8 is an encoding, and the characters it encodes belong to the Unicode character set. The Unicode character set has several encoding forms, whereas ASCII has only one, and most MBCS character sets (including GB-2312) also have only one.
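The difference between a character set and its encoding forms can be seen by encoding one Unicode character in several ways. This is a small sketch using Python's built-in codecs; the sample character and variable names are chosen only for illustration.

```python
ch = "中"

# The character has one position in the Unicode character set.
code_point = ord(ch)            # 0x4E2D

# The same character under two Unicode encoding forms.
utf16 = ch.encode("utf-16-be")  # two bytes, big-endian, no byte-order mark
utf8 = ch.encode("utf-8")       # three bytes, every one of them above 127

# GB-2312 is a different character set with a single encoding of its own.
gb = ch.encode("gb2312")

# ASCII characters keep their single-byte ASCII code in UTF-8.
ascii_in_utf8 = "A".encode("utf-8")

print(hex(code_point), utf16.hex(), utf8.hex(), gb.hex(), ascii_in_utf8)
```

UTF-16 and UTF-8 produce different bytes for the same code point, while the GB-2312 bytes come from an unrelated code table, so converting between GB-2312 and Unicode needs a lookup rather than a simple re-encoding.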

When reprinting, please cite the original address: https://www.9cbs.com/read-11354.html
