Extended Chinese Character Internal Code Specification - GBK
The purpose of the Extended Chinese Character Internal Code Specification (GBK) is to resolve bottlenecks in Chinese character information exchange: insufficient character coverage, the coexistence of simplified and traditional characters, and the need to simplify conversion between code systems. It does so while preserving compatibility with existing application software and moving toward the international unified double-byte character set standard ISO 10646.1.
1 Principles for drafting the extended Chinese character internal code specification
Full compatibility with the internal code system corresponding to the national standard GB 2312-80, "Chinese Character Coded Character Set for Information Interchange - Basic Set".
Support for all CJK Chinese characters in ISO 10646.1 / "CJK Unified Chinese Character Coded Character Set", that is, the national standard GB 13000.1.
The non-Chinese symbols also cover most of the commonly used "BIG5" non-Chinese symbols.
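The compatibility principle can be checked informally with Python's built-in `gb2312` and `gbk` codecs. This is a minimal sketch that assumes those codecs follow the respective standards for the characters shown; the sample strings are chosen only for illustration.

```python
# Sample text chosen for illustration; both characters are in GB 2312-80.
simplified = "汉字"

# A GB 2312-80 string should produce the same bytes under GBK.
gb2312_bytes = simplified.encode("gb2312")
gbk_bytes = simplified.encode("gbk")
same_bytes = gb2312_bytes == gbk_bytes

# A traditional character that lies outside GB 2312-80 but inside the CJK set.
traditional = "漢"
try:
    traditional.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

# GBK is expected to encode it and decode it back unchanged.
round_trip = traditional.encode("gbk").decode("gbk")

print(same_bytes, in_gb2312, round_trip == traditional)
```

The first check illustrates compatibility with GB 2312-80; the second illustrates the wider CJK coverage that GBK adds.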
2 Specification name and abbreviation
Chinese name: 汉字内码扩展规范 (Chinese Character Internal Code Extension Specification)
English name: Chinese Internal Code Specification
Abbreviation: GBK (K is the first letter of the Hanyu Pinyin for "extension")
3 Specification content
Scope of application:
As a code page outside the UCS (ISO 10646) system, it applies to the processing, exchange, storage, display, input, and output of Chinese information.
Character repertoire:
All Chinese characters and non-Chinese symbols of GB 2312-80.
The other CJK Chinese characters in GB 13000.1.
The two items above total 20,902 GB-standardized Chinese characters.
52 Chinese characters from the "Complete List of Simplified Characters" that are not yet included in GB 13000.1; that is, GBK can contain not only all seven thousand characters of the "List of Generally Used Characters in Modern Chinese" but also all simplified characters in the "Complete List of Simplified Characters" and their corresponding traditional characters.
28 important radicals and components from the "Kangxi Dictionary" and "Cihai" that are not yet included in GB 13000.1.
13 Chinese character structure symbols.
139 graphic symbols that are in "BIG5" but not in GB 2312-80, and that exist in ISO 10646.1.
30 pinyin letters with tone marks, now formally included, plus ɑ and ɡ (printed according to GB 12345-90).
The Chinese character "〇" (GB 13000.1 code 0x3007, "zero").
The 19 vertical punctuation symbols added in GB 12345-90 but not yet included in the UCS.
21 Chinese characters selected from the CJK compatibility area of ISO 10646.1 / GB 13000.1, to ensure that no information is lost in two-way round-trip conversion with BIG5 (TCA-CNS 11643) files, JIS files, and IBM files.
31 symbols specific to IBM OS/2, all of which ISO 10646.1 / GB 13000.1 has already included or recognized.
Arrangement of Chinese characters
The Chinese characters of GB 2312-80 remain in the original Level 1 and Level 2 groups, arranged by pinyin and by radical/stroke count respectively.
The other CJK Chinese characters of GB 13000.1 are arranged in order of UCS code value.
The 80 additional Chinese characters, radicals, and components are kept separate from the two groups above and are arranged on their own, following the Kangxi Dictionary.
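The UCS-order rule for the extended characters can be spot-checked with Python's built-in `gbk` codec. The sketch below assumes that codec reflects the GBK table, and it inspects only the row with lead byte 0x81; `row_code_points` is a name made up for this example.

```python
def row_code_points(lead):
    """Decode every trail byte 0x40-0xFE (skipping 0x7F) for one lead byte."""
    points = []
    for trail in range(0x40, 0xFF):
        if trail == 0x7F:
            continue  # the xx7F line is not part of GBK
        char = bytes([lead, trail]).decode("gbk")
        points.append(ord(char))
    return points

points = row_code_points(0x81)
# Characters in this row should appear in ascending UCS order.
is_ascending = all(a < b for a, b in zip(points, points[1:]))
print(len(points), is_ascending)
```

The same check does not apply to the GB 2312-80 rows, which keep their pinyin and radical/stroke ordering.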
Code allocation (omitted)
Overall, the code space is the rectangular area 8140-FEFE with the xx7F line removed, a total of 23,940 code positions.
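The overall range can be expressed as a simple byte check. The function below is an illustrative sketch (the name `is_gbk_double_byte` is not from the specification); it tests only whether a byte pair falls inside the 8140-FEFE rectangle with the xx7F line removed, not whether a character is actually assigned there.

```python
def is_gbk_double_byte(lead, trail):
    """Return True if (lead, trail) lies in 8140-FEFE, excluding xx7F."""
    return 0x81 <= lead <= 0xFE and 0x40 <= trail <= 0xFE and trail != 0x7F

# Count every position in the rectangle to cross-check the stated total:
# 126 possible lead bytes times 190 possible trail bytes.
total = sum(
    is_gbk_double_byte(lead, trail)
    for lead in range(0x100)
    for trail in range(0x100)
)
print(total)
```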
Chinese character area: 21,008 code positions. The GB 2312-80 Chinese character area is B0A1-F7FE, 6,768 code positions holding 6,763 Chinese characters. The GB 13000.1 extended Chinese character area consists of the rectangular area 8140-A0FE with xx7F removed, 6,080 code positions, and AA40-FEA0 with xx7F removed, 8,160 code positions, of which the 21 CJK compatibility characters are encoded in FD9C-FE4F and the 80 additional Chinese characters/radicals/components are placed in FE50-FEA0.
Graphic symbol area: 1,038 code positions. The GB 2312-80 non-Chinese area is A1A1-A9FE, 846 code positions. In addition to the original standard characters, it also holds 10 lowercase Roman numerals added in A2A1-A2AA, the 30 pinyin letters with tone marks plus ɑ and ɡ in A8A1-A8C0, and the 19 vertical symbols between A6E0 and A6F5. The GB 13000.1 extended non-Chinese area is A840-A9A0 with xx7F removed, 192 code positions; the BIG5 non-Chinese symbols, the structure symbols, and "〇" are arranged in this area.
User-defined area: 1,894 code positions, made up of the rectangular area AAA1-AFFE, 564 code positions; the rectangular area F8A1-FEFE, 658 code positions; and the rectangular area A140-A7A0, 672 code positions (xx7F excluded).
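The area sizes quoted above can be cross-checked arithmetically. The sketch below lists each rectangle from the text, counts its code positions, and classifies a byte pair by area; the `ZONES` table, its labels, and the function names are assumptions made for this example rather than part of the specification.

```python
# (label, first lead, last lead, first trail, last trail), taken from the text.
ZONES = [
    ("hanzi: GB 2312-80", 0xB0, 0xF7, 0xA1, 0xFE),
    ("hanzi: extension 1", 0x81, 0xA0, 0x40, 0xFE),
    ("hanzi: extension 2", 0xAA, 0xFE, 0x40, 0xA0),
    ("symbols: GB 2312-80", 0xA1, 0xA9, 0xA1, 0xFE),
    ("symbols: extension", 0xA8, 0xA9, 0x40, 0xA0),
    ("user-defined 1", 0xAA, 0xAF, 0xA1, 0xFE),
    ("user-defined 2", 0xF8, 0xFE, 0xA1, 0xFE),
    ("user-defined 3", 0xA1, 0xA7, 0x40, 0xA0),
]

def zone_size(l1, l2, t1, t2):
    """Count code positions in a rectangle, skipping the xx7F line."""
    trails = [t for t in range(t1, t2 + 1) if t != 0x7F]
    return (l2 - l1 + 1) * len(trails)

def classify(lead, trail):
    """Return the area label for a byte pair, or None if it is outside GBK."""
    if trail == 0x7F:
        return None
    for name, l1, l2, t1, t2 in ZONES:
        if l1 <= lead <= l2 and t1 <= trail <= t2:
            return name
    return None

sizes = {name: zone_size(l1, l2, t1, t2) for name, l1, l2, t1, t2 in ZONES}
for name, size in sizes.items():
    print(f"{name}: {size}")
print("total:", sum(sizes.values()))
```

Because the eight rectangles do not overlap and together fill the whole 8140-FEFE area, their sizes should add up to the overall total of 23,940.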
Correspondence between GBK and GB 13000.1
All characters in the Chinese character area and the graphic symbol area correspond to characters already encoded in GB 13000.1.
The 52 additional Chinese characters, 28 radicals/components, and 13 structure symbols temporarily correspond to the private use area of GB 13000.1 (Private Use Area, E000-F8FE); if these characters are formally incorporated into ISO 10646 / GB 13000, this specification will be revised accordingly.
The pinyin letters with tone marks correspond to the Latin characters encoded in the A-zone of GB 13000.1; for the two letters that GB 13000.1 does not include, code positions will be requested from SC2/WG2.
GBK glyphs
GBK glyph shapes are consistent with ISO 10646.1 / GB 13000.1.
Within the overall framework of CJK Chinese characters, the "non-duplicate" character shapes that remain after duplicate codes are eliminated are used.