Chinese Character Encoding Standards and Identification (1): Code Pages

xiaoxiao 2021-03-05

SMTH BBS (Shuimu Tsinghua Station): Digest
Sender: Yanlc (Soul Yan Yuan ~~ Don't care about me, annoying), Board: Linux
Title: Chinese Character Encoding Standards and Identification (1)
Sent from: SMTH BBS (Sat Apr 29 17:19:05 2000)

Original post: http://www.linuxforum.net/cgi-bin/perl/showpost.pl?board=Chinese&number=766&page=2&view=expanded&sb=5
Subject: Chinese Character Encoding Standards and Identification (1): Code Pages
Posted by ShuYong on 4/16/2000 9:05 PM

Chinese Character Encoding Standards and Identification (1): Code Pages

This section is based on the following articles; I recommend studying these experts' insightful writing carefully.

Reference 1: (title not preserved), Weekly, 1997-01-17
Reference 2: (title garbled in the source; legible fragments: 张轴材 (Zhang Zhoucai), 历程 ("course of development")), by reporters Huang Weimin and Xiao Chunjiang, 1999-08-30
Reference 3: (title not preserved), by Wu Jian, published 1998-12-21, issue no. 348 (no. 51 of the year)
Reference 4: (title not preserved), by Sun Yufang, published 1998-07-06, issue no. 323 (no. 26 of the year)
Reference 5: cjk.inf, ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

I am only an amateur, not an expert. I don't understand many of the terms in the reference materials, and I have never seen the formal text of any of the standards, so errors and vague points are inevitable. At the same time, because the relevant government departments have done too little to publicize, promote, and implement the national standards, beginners and small companies in this field are at a competitive disadvantage for lack of information.

When ASCII was developed, no provision was made for multiple languages, least of all for ideographic scripts such as Chinese characters. Many solutions have since been proposed: the code page system (ISO 2022) is the widely implemented one, while ISO 10646/GB13000/Unicode is the future direction.

China's Chinese character encoding standard GB2312 is a 7-bit standard; more precisely, each character is coded as two 7-bit bytes. ASCII is a single 7-bit byte standard, so how does the computer tell them apart? One way is to set the eighth bit to 1, signaling to the computer that the byte belongs to a double-byte code. This is the most common implementation and is called EUC (Extended Unix Code). The other way is to use special markers that tell the computer to switch into double-byte mode: HZ encoding, for example, starts a double-byte block with ~{ and ends it with ~}. Both are implementations of GB2312.

For ideographic writing systems such as Chinese, code pages are based on the standards of individual countries, regions, or industries and are encoded in the EUC scheme. Code pages are compatible with ASCII but are variable-length (one byte for ASCII, two bytes for a Chinese character). This makes programs more complex and also causes garbled text when the code page is switched. Unicode, by contrast, is a fixed-length multi-byte encoding. ISO 10646, GB13000, and Unicode now agree on UCS-2, a double-byte encoding standard that has already been implemented.
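A minimal Python sketch of the two GB2312 transfer forms described above: EUC (set the eighth bit of each byte) and HZ (bracket each double-byte run with ~{ and ~}). It assumes Python 3's built-in `gb2312` codec, which produces EUC-CN bytes. The helper names are hypothetical, and `encode_hz` is a simplified illustration rather than a complete RFC 1843 encoder (it ignores line-length limits and the `~` line-continuation escape).

```python
def gb2312_to_euc(seven_bit: bytes) -> bytes:
    """Turn raw 7-bit GB2312 bytes (0x21-0x7E) into EUC-CN by setting bit 8."""
    return bytes(b | 0x80 for b in seven_bit)


def euc_to_gb2312(euc: bytes) -> bytes:
    """Clear bit 8 again to recover the 7-bit GB2312 code."""
    return bytes(b & 0x7F for b in euc)


def encode_hz(text: str) -> bytes:
    """Simplified HZ encoder (hypothetical helper, not full RFC 1843)."""
    out = bytearray()
    in_gb = False  # are we inside a ~{ ... ~} double-byte block?
    for ch in text:
        if ord(ch) < 0x80:
            if in_gb:
                out += b"~}"  # switch back to ASCII mode
                in_gb = False
            # a literal '~' must be written as '~~' in HZ
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            if not in_gb:
                out += b"~{"  # switch into double-byte mode
                in_gb = True
            # the codec gives EUC-CN; clearing bit 8 yields 7-bit GB2312
            out += euc_to_gb2312(ch.encode("gb2312"))
    if in_gb:
        out += b"~}"  # close a block left open at end of text
    return bytes(out)


euc = "中".encode("gb2312")     # EUC-CN form, bit 8 set on both bytes
print(euc.hex())                # d6d0
print(euc_to_gb2312(euc))       # b'VP' (0x56 0x50, the 7-bit GB2312 code)
print(encode_hz("A中B"))         # b'A~{VP~}B'
```

The same character therefore travels as 0xD6 0xD0 in EUC and as `~{VP~}` in HZ; only the way the double-byte mode is signaled differs.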
The ISO 10646/GB13000/Unicode discussed below refers to UCS-2 only. Unicode stays compatible with ASCII by prefixing each ASCII code with a zero byte: the ASCII code of "A" is 0x41, so its Unicode code is 0x00, 0x41. Here I approach Unicode mainly from the perspective of the national standard (GB) series. Had I not read Reference 5 (in English), I still would not know which national standards for Chinese character encoding exist. That Chinese people have to learn about Chinese character encoding standards from English-language material is truly frustrating.

Common Chinese encoding standards (source: cjk.inf). Each line below is one character-set series, listing the standards developed within that series:

GB2312 series: GB2312-1980 (GB0, simplified), GB6345.1-1986 (amendment to GB0), GB8565.2-1988 (GB8, extension of GB0), GB/T12345-90 (GB1, traditional)
GB7589 series: GB7589-1987 (GB2, simplified), GB/T13131-9X (GB3, traditional)
GB7590 series: GB7590-1987 (GB4, simplified), GB/T13132-9X (GB5, traditional)
GB13000-1993

GB2312 is the base set and the most widely used standard. GB7589 and GB7590 are extensions; in use they may not be able to coexist with GB2312, so switching is required. GB7589/GB7590 are ordered by radical and stroke count, but I don't know which characters they contain, exactly how they are arranged, or in what fields they are used. The GB2312 series itself has been amended and extended, so it now differs from the original GB2312-1980 standard (see Reference 5). Without the standard texts, I cannot tell which version the fonts in use follow. According to the latest Unicode 3.0, the newest national standard is GB16500-95, but I don't know which series it belongs to.
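The zero-byte rule for ASCII, and the contrast between fixed-length UCS-2 and variable-length code pages, can be checked with a short Python sketch. It assumes Python's `utf-16-be` codec as a stand-in for UCS-2: the two produce identical bytes for characters in the Basic Multilingual Plane, which covers every character discussed here. The `ucs2` helper is hypothetical.

```python
def ucs2(text: str) -> bytes:
    """UCS-2, big-endian: same bytes as UTF-16-BE for BMP characters."""
    data = text.encode("utf-16-be")
    # a character outside the BMP needs a 4-byte surrogate pair in UTF-16,
    # which UCS-2 cannot express
    if len(data) != 2 * len(text):
        raise ValueError("character outside the BMP; not representable in UCS-2")
    return data


sample = "A中"
print(ucs2("A").hex())                # 0041: ASCII 0x41 preceded by a zero byte
print(len(ucs2(sample)))              # 4: two characters, two bytes each
print(len(sample.encode("gb2312")))   # 3: 'A' takes one byte, '中' takes two
```

The fixed width makes character counting trivial in UCS-2, while the GB2312 (EUC) byte count depends on the mix of ASCII and Chinese text.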

ISO/IEC 10646 is equivalent to GB13000-1993, JIS X 0221-1995, and KS C 5700-1995. Its goal is to cover the scripts of all languages, with Chinese characters making up the largest share (Unicode 2.0 contains 20,902 Chinese characters). For the history of the standard, see Reference 1, which describes the ups and downs of the process. In short, it is an international standard in which China took part and played a leading role.

GBK is an intermediate product of the transition from GB2312 to GB13000. It is a large extension of GB2312: its encoding is backward compatible with the EUC encoding of GB2312, its repertoire (character set) is the same as GB13000's, and it is about three times the size of GB2312. GBK therefore also contains the characters of Big5, Shift-JIS, and KSC. Note that only the repertoire is included; the codes differ from those in the original standards. In practice, strings in GB2312, Big5, Shift-JIS, or KSC can all be displayed with a GBK font, but every string other than GB2312 must first be converted.

Because the sources are vague, it is unclear who led the development of GBK. Some English-language sources say Microsoft developed it, while Chinese sources give no account. All one can tell from these references is that ISO/IEC 10646 was released in 1994, that Microsoft was developing the Chinese edition of Windows 95, and that an extended Chinese encoding was needed. The Chinese Internal Code Extension Specification (GBK) was released in 1996 (see References 1 to 3); judging from how standards are usually published, it was probably completed the year before, in 1995. The Chinese editions of Windows 95 and later support GBK.

The EUC range of GB2312 is 0xA1 to 0xFE for the first byte (in practice only up to 0xF7) and 0xA1 to 0xFE for the second byte. GBK extends this: the first byte ranges over 0x81 to 0xFE, and the second byte has two ranges, 0x40 to 0x7E and 0x80 to 0xFE. Within the region it shares with GB2312, GBK's characters are exactly the same as GB2312's.
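The byte ranges above are what a program relies on to identify characters in a GBK byte stream. The following Python sketch, with hypothetical helper names, splits GBK bytes into characters and classifies each double-byte code by range only. It assumes well-formed input apart from a possibly truncated final byte, and a code that falls inside the ranges is not necessarily assigned to a character.

```python
def classify_gbk_pair(lead: int, trail: int) -> str:
    """Classify a double-byte code using only the byte ranges from the text."""
    if not 0x81 <= lead <= 0xFE:
        return "invalid lead byte"
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        return "invalid trail byte"
    # GB2312's EUC range: lead 0xA1-0xF7 (in practice), trail 0xA1-0xFE
    if 0xA1 <= lead <= 0xF7 and 0xA1 <= trail <= 0xFE:
        return "GB2312 range"
    return "GBK extension range"


def split_gbk(data: bytes) -> list:
    """Split GBK bytes into one chunk per character (simplified sketch)."""
    chars, i = [], 0
    while i < len(data):
        if data[i] < 0x80:  # ASCII: a single byte
            chars.append(data[i:i + 1])
            i += 1
        else:  # simplification: any byte >= 0x80 is taken as a lead byte
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte character")
            chars.append(data[i:i + 2])
            i += 2
    return chars


# 'A', then 0xD6D0 (中, inside GB2312's range), then 0x8140 (a GBK-only code)
data = b"A\xd6\xd0\x81\x40"
for ch in split_gbk(data):
    if len(ch) == 2:
        print(ch.hex(), classify_gbk_pair(ch[0], ch[1]))
# d6d0 GB2312 range
# 8140 GBK extension range
```

Because the trail byte may fall in 0x40 to 0x7E, a GBK parser cannot resynchronize by looking for high-bit bytes alone the way an EUC parser can; it has to walk the stream from a known character boundary.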
The extended part was probably taken from GB13000 in radical-and-stroke order. So GBK is not GB13000: although the repertoire is the same, the encoding systems differ. One is a variable-length code of the ISO 2022 family, the other a fixed-length code, and their code ranges differ as well. Note that GBK is not actually a national standard. Before it came the GB2312 base set and after it the more advanced GB13000; GBK is only a transitional extension specification. That is why Unicode provides GB2312->Unicode and GB12345->Unicode mapping tables but no GBK->Unicode table. Only Microsoft's Code Page 936 table (cp936.txt) can be regarded as a GBK->Unicode mapping, but keep in mind that it was produced by a commercial company rather than a national or international standards body, so it may well be inconsistent with the standards.

Recently I found some useful standard files on the Founder (Fang Zheng) fonts website; download them if you are interested. Note, however, that the two files GBK-BIG5.TAB and GB-BIG5.TAB have some flaws.

http://www.founderpku.com/fontweb/download/gbk-big5.tab
http://www.founderpku.com/fontweb/download/gb-big5.tab
http://www.founderpku.com/fontweb/gb2312.htm
http://www.founderpku.com/fontweb/gbk.htm

Conversion tables between other standards that are built from these tables will differ from the traditional conversion tables.
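The point about derived conversion tables can be made concrete with a small Python sketch. It assumes mapping files in the two-column hexadecimal layout used by the Unicode Consortium's mapping tables such as cp936.txt (`0x8140<TAB>0x4E02<TAB>#comment`); the layout of the Founder .TAB files is not described here, and the function names and the two-line sample tables are hypothetical illustrations. Composing a GBK->Unicode table with the inverse of a Big5->Unicode table yields a GBK->Big5 table, but only for characters that share a Unicode code point. Codes with no match simply drop out, which is one plausible reason tables derived this way differ from traditional hand-made tables that map simplified characters to their traditional forms.

```python
def parse_mapping(text: str) -> dict:
    """Parse 'source_hex unicode_hex #comment' lines into {source: unicode}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # e.g. a code listed without a Unicode value
        table[int(fields[0], 16)] = int(fields[1], 16)  # int() accepts "0x"
    return table


def compose(src_to_u: dict, dst_to_u: dict) -> dict:
    """Build a src->dst table through Unicode; unmatched codes are dropped."""
    u_to_dst = {}
    for dst, u in dst_to_u.items():
        u_to_dst.setdefault(u, dst)  # keep the first dst code per Unicode value
    return {src: u_to_dst[u] for src, u in src_to_u.items() if u in u_to_dst}


# Hypothetical two-line excerpts in the assumed layout
GBK_SAMPLE = "0x8140\t0x4E02\t#CJK UNIFIED IDEOGRAPH\n0xD6D0\t0x4E2D\t#CJK UNIFIED IDEOGRAPH\n"
BIG5_SAMPLE = "0xA4A4\t0x4E2D\t#CJK UNIFIED IDEOGRAPH\n"

gbk_to_big5 = compose(parse_mapping(GBK_SAMPLE), parse_mapping(BIG5_SAMPLE))
print({hex(k): hex(v) for k, v in gbk_to_big5.items()})  # {'0xd6d0': '0xa4a4'}
```

Here 0xD6D0 (中) maps to Big5 0xA4A4 because both tables send it to U+4E2D, while 0x8140 has no counterpart in the sample Big5 table and is silently lost.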

