Preliminary discussion of character sets (1)

xiaoxiao2021-03-06  58

Originally published in the ITPUB technology series book "Oracle Database DBA Special Technology Essentials".

Reprinting this article without permission is strictly prohibited.

Original link: http://www.eygle.com/special/nls_character_set_01.htm

Oracle Globalization Support lets us store and retrieve data in local languages and formats. With it, a single Oracle database can support multiple languages and character sets, which is one of the database's real strengths.

Because different languages and character sets are stored side by side, character sets were once a problem that troubled nearly everyone. This article discusses some of the common problems and shares some practical experience.

1. Basic knowledge of the character set

To start from the beginning: the earliest character encoding scheme was ASCII.

It is also the encoding we encounter most often. The scheme originated in the early 1960s. It was initially developed as a common standard for the US Library of Congress, for use by the American library community, and was eventually refined into the national standard ASCII (American Standard Code for Information Interchange). It later evolved into the international computer character coding standard ISO 646 (titled "7-bit coded character set for information interchange") and became the basis for later computer encoding schemes.

The earliest encoding scheme supported by Oracle Database was US7ASCII. English characters are generally stored in one byte, but a 7-bit encoding can represent only 128 characters, and an extended 8-bit encoding only 256. That is far from enough for the needs of modern computing, and the complex scripts of Asian countries require many more codes, so a variety of encoding schemes came into being.
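The capacity figures above follow directly from the bit widths. As a minimal sketch (Python is used here purely for illustration, and the sample strings are assumptions rather than anything from the original article), the following shows the arithmetic and what happens when a 7-bit ASCII codec meets Chinese text:

```python
# Number of distinct codes available at each width.
seven_bit = 2 ** 7   # 128 codes
eight_bit = 2 ** 8   # 256 codes


def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(seven_bit, eight_bit)
print(fits_in_ascii("Oracle"))   # plain English text
print(fits_in_ascii("数据库"))    # Chinese text has no ASCII codes
```

The Chinese sample cannot be encoded at all, which is the practical reason wider encodings were needed.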

To accommodate all the characters and symbols of the world and to solve compatibility and conversion problems between different encodings, more than ten companies jointly founded the Unicode Consortium in January 1991, and the Unicode encoding was born. The consortium's slogan is: Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Initially Unicode used 2 bytes (16 bits) per character, but that accommodates only 65,536 characters, which was still not enough. The Unicode 3.1 standard later added supplementary character definitions, and the Unicode 4.0 standard has now been released. For details, see the official Unicode site:

www.unicode.org

The Unicode encoding scheme has three implementations: UTF-8, UCS-2, and UTF-16. Oracle has supported UTF-8 encoding since version 7.2, providing Unicode support.
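To make the difference between these encoding forms concrete, the sketch below encodes a few sample characters (chosen for illustration) and reports how many bytes each one takes. Python has no separate UCS-2 codec, so the sketch treats UCS-2 as "fixed two bytes, code points up to U+FFFF only" and simply checks the code point. That simplification is an assumption of this example and says nothing about how Oracle implements its character sets.

```python
def encoded_sizes(ch: str) -> dict:
    """Byte length of a single character in UTF-8 and UTF-16, plus
    whether a fixed-width 2-byte UCS-2 code could represent it."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        # "utf-16-be" avoids the 2-byte byte-order mark that "utf-16" prepends.
        "utf16_bytes": len(ch.encode("utf-16-be")),
        "fits_ucs2": ord(ch) <= 0xFFFF,
    }


samples = [
    "A",           # basic Latin letter
    "\u00e9",      # Latin small letter e with acute
    "\u4e2d",      # common Chinese character
    "\U00020000",  # supplementary character beyond U+FFFF
]
for sample in samples:
    print(encoded_sizes(sample))
```

UTF-8 is variable-width (1 to 4 bytes here), while UTF-16 uses 2 bytes for most characters and 4 bytes (a surrogate pair) for supplementary characters that a 2-byte scheme cannot hold.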

In line with these standards, Oracle recommends using a Unicode character set if your database needs to store symbols and characters from different languages. Unicode can indeed represent more characters, but because characters are stored in multiple bytes it requires additional storage space and network bandwidth, so the most suitable database character set still has to be chosen with care.
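The storage cost mentioned above can be estimated before settling on a character set. The following is a rough sketch that uses Python's built-in codecs as stand-ins for database character sets (`gbk` for a legacy Chinese character set, `utf-8` for a Unicode one). The helper name `storage_bytes` and the sample text are hypothetical, and real database storage also depends on column definitions and other overhead not modeled here.

```python
def storage_bytes(text: str, encodings=("ascii", "gbk", "utf-8")) -> dict:
    """Map each encoding name to the number of bytes needed for text,
    or None when that encoding cannot represent the text."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


print(storage_bytes("Oracle"))   # English text: same size everywhere
print(storage_bytes("字符集"))    # Chinese text: UTF-8 is larger than GBK
```

For pure English data the choice makes no difference in size, while for Chinese text UTF-8 typically needs three bytes per character against two in GBK. That trade-off is what has to be weighed against the ability to store every language in one database.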

