1. CodePage definition and history
A character code is the internal code used to represent a character. Users rely on internal codes whenever they enter and store documents. Internal codes fall into two classes:
- Single-Byte Character Sets (SBCS): single-byte internal codes, which can represent up to 256 characters.
- Double-Byte Character Sets (DBCS): double-byte internal codes, which can represent up to about 65,000 characters. They are mainly used to encode the large character sets of East Asian scripts.
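The difference between the two classes can be seen by encoding a few characters. The sketch below is only an illustration: it uses Python's built-in `cp437` and `gbk` codecs as stand-ins for a single-byte and a double-byte code page, and the sample characters are arbitrary choices.

```python
# One character in a single-byte code page versus a double-byte one.
sbcs_bytes = "\u00e9".encode("cp437")    # "é" in CodePage 437: one byte
dbcs_bytes = "\u4e2d".encode("gbk")      # "中" in GBK (CodePage 936): two bytes
ascii_in_dbcs = "A".encode("gbk")        # ASCII stays a single byte in GBK

# Upper bounds on the number of distinct codes for each width.
sbcs_limit = 2 ** 8     # 256
dbcs_limit = 2 ** 16    # 65536, the "about 65,000" quoted above

print(len(sbcs_bytes), len(dbcs_bytes), len(ascii_in_dbcs))
```

A double-byte code page is therefore really a mixed-width encoding: ASCII text keeps its one-byte form and only the extended characters take two bytes.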
A CodePage is a list of selected characters arranged in a specific order. For the early single-byte languages, the order of the internal codes in the CodePage lets the system find, from the value entered at the keyboard, the corresponding internal code in this list. For double-byte internal codes, the CodePage provides a multibyte-to-Unicode mapping table, so that characters stored in Unicode form can be converted to the corresponding internal code and back. In the Linux kernel the related functions are utf8_mbtowc and utf8_wctomb.
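Because a CodePage is just an ordered list, the same byte value selects a different character in each list. The following sketch shows this with three of Python's built-in single-byte codecs; the byte 0x82 and the choice of code pages are arbitrary.

```python
# Decode the same byte under several single-byte code pages.
raw = bytes([0x82])
decoded = {cp: raw.decode(cp) for cp in ("cp437", "cp737", "cp866")}

for codepage, char in decoded.items():
    # Print the code page, the character and its Unicode code point.
    print(codepage, char, "U+%04X" % ord(char))
```

The byte only has a meaning once the code page is known, which is why a file system driver has to be told which one to use.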
Before 1980 there were no international standards such as ISO-8859 or Unicode that defined how to extend the US-ASCII encoding for non-English-speaking countries. Many IT vendors therefore invented their own encodings and identified them by hard-to-remember numbers.
For example, 936 denotes Simplified Chinese and 950 denotes Traditional Chinese.
1.1 CJK CodePage
Unlike the Extended Unix Code (EUC) encodings, all of the Far East CodePages use the C1 control codes {=80..=9F} as first bytes and the ASCII values {=40..=7E} as second bytes. This lets them hold up to tens of thousands of double-byte characters, but it also means that a byte value above =3F in text encoded this way does not necessarily represent an ASCII character.
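The consequence of reusing ASCII values as second bytes can be checked directly. The sketch below assumes Python's `cp932` codec and uses the kanji U+8868 as an example; any character whose second byte falls in {=40..=7E} would do.

```python
# Encode one kanji in CodePage 932 and inspect its two bytes.
encoded = "\u8868".encode("cp932")
lead, trail = encoded[0], encoded[1]

is_c1_lead = 0x80 <= lead <= 0x9F        # first byte lies in the C1 range
is_ascii_trail = 0x40 <= trail <= 0x7E   # second byte looks like ASCII

# The second byte is 0x5C, which read on its own is a backslash.
print(hex(lead), hex(trail), repr(chr(trail)))
```

Software that scans such text byte by byte for a path separator or an escape character will misread the second half of this character, which is why the driver has to decode with the right CodePage first.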
CP932
Shift-JIS contains the Japanese character sets JIS X 0201 (one byte per character) and JIS X 0208 (two bytes per character). The JIS X 0201 katakana are therefore included as one-byte half-width characters, and the remaining 60 byte values are used as first bytes for 7076 kanji and 648 other full-width characters. Unlike the EUC-JP encoding, Shift-JIS does not include the 5802 kanji defined in JIS X 0212.
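The one-byte and two-byte forms can be compared with a short sketch. It assumes Python's `shift_jis` codec and uses the katakana letter A in its half-width (U+FF71) and full-width (U+30A2) forms as sample input.

```python
# Half-width katakana come from JIS X 0201 and take one byte;
# full-width katakana come from JIS X 0208 and take two bytes.
halfwidth = "\uff71".encode("shift_jis")
fullwidth = "\u30a2".encode("shift_jis")

print(halfwidth.hex(), fullwidth.hex())
```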
CP936
GBK extends the EUC-CN encoding (the GB 2312-80 encoding, containing 6763 Chinese characters) to the 20902 Chinese characters defined in Unicode (GB 13000.1-93). It is used in mainland China for Simplified Chinese (zh_CN).
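The extension can be observed with Python's `gb2312` and `gbk` codecs. This is a sketch under assumptions: the two sample characters are my own choices, U+4E2D as a character present in GB 2312-80 and the traditional form U+9F8D as one that only GBK covers.

```python
common = "\u4e2d"      # present in GB 2312-80
gbk_only = "\u9f8d"    # traditional form, absent from GB 2312-80

# GBK keeps the GB 2312 byte values unchanged for the shared characters.
same_bytes = common.encode("gb2312") == common.encode("gbk")

# The strict GB 2312 codec rejects a character that GBK accepts.
try:
    gbk_only.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

gbk_bytes = gbk_only.encode("gbk")
print(same_bytes, in_gb2312, len(gbk_bytes))
```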
CP949
Unified Hangul Code (UHC) is a superset of the Korean EUC-KR encoding (the KS C 5601-1992 encoding, containing 2350 Hangul syllables and 4888 hanja), with 8822 additional Hangul syllables (in the C1 range).
CP950
The Big5 encoding (13072 Traditional Chinese characters, zh_TW) is used for Traditional Chinese in place of EUC-TW (CNS 11643-1992).
The definitions of these CodePages can be found in Ken Lunde's cjk.inf or in the Unicode mapping tables.
Note: Microsoft uses these four CodePages, so they must be used when accessing Microsoft file systems.
1.2 IBM Far East Language CodePage
IBM's CodePages are divided into SBCS and DBCS:

IBM SBCS CodePage
37 (English) *
290 (Japanese) *
836 (Simplified Chinese) *
891 (Korean)
897 (Japanese)
903 (Simplified Chinese)
904 (Traditional Chinese)

IBM DBCS CodePage
300 (Japanese) *
301 (Japanese) *
835 (Traditional Chinese) *
837 (Simplified Chinese) *
926 (Korean)
927 (Traditional Chinese)
928 (Simplified Chinese)

The CodePages that mix SBCS and DBCS are:

IBM MBCS CodePage
930 (Japanese) (CodePage 300 plus 290) *
932 (CodePage 301 plus 897)
933 (Korean) (CodePage 834) *
934 (Korean) (CodePage 926 plus 891)
938 (CodePage 927 plus 904)
936 (Simplified Chinese) (CodePage 928 plus 903)
5031 (Simplified Chinese) (CodePage 837 plus 836) *
5033 (CodePage 835 plus 37) *

* marks a CodePage that uses the EBCDIC encoding format. Microsoft's CJK CodePages are derived from IBM's CodePages.
2. The role of CodePage under Linux
CodePage support was introduced in Linux mainly to access multilingual file names on FAT / VFAT / FAT32 / NTFS / NCPFS. The NTFS and FAT32 / VFAT file systems store file names in Unicode, so the system has to convert them dynamically to the encoding of the corresponding language when it reads them. NLS support was introduced for this purpose. The corresponding source files are under /usr/src/linux/fs/nls:

Config.in Makefile nls_base.c
nls_cp437.c nls_cp737.c nls_cp775.c nls_cp850.c nls_cp852.c nls_cp855.c
nls_cp857.c nls_cp860.c nls_cp861.c nls_cp862.c nls_cp863.c nls_cp864.c
nls_cp865.c nls_cp866.c nls_cp869.c nls_cp874.c nls_cp936.c nls_cp950.c
nls_iso8859-1.c nls_iso8859-15.c nls_iso8859-2.c nls_iso8859-3.c
nls_iso8859-4.c nls_iso8859-5.c nls_iso8859-6.c nls_iso8859-8.c
nls_iso8859-9.c nls_koi8-r.c
The following functions are actually implemented:

extern int utf8_mbtowc(__u16 *, const __u8 *, int);
extern int utf8_mbstowcs(__u16 *, const __u8 *, int);
extern int utf8_wctomb(__u8 *, __u16, int);
extern int utf8_wcstombs(__u8 *, const __u16 *, int);
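What such a pair of conversion routines has to do can be sketched in a few lines. The functions below are not the kernel code: `wctomb_sketch` and `mbtowc_sketch` are hypothetical names, the return conventions are my own, and the sketch only covers 16-bit values (one to three UTF-8 bytes) to match the __u16 type in the prototypes.

```python
def wctomb_sketch(wc):
    """Encode one 16-bit Unicode value as UTF-8 bytes (illustrative only)."""
    if wc < 0x80:
        return bytes([wc])
    if wc < 0x800:
        return bytes([0xC0 | (wc >> 6), 0x80 | (wc & 0x3F)])
    return bytes([0xE0 | (wc >> 12),
                  0x80 | ((wc >> 6) & 0x3F),
                  0x80 | (wc & 0x3F)])


def mbtowc_sketch(data):
    """Decode the first UTF-8 sequence in data; return (value, bytes used)."""
    first = data[0]
    if first < 0x80:
        return first, 1
    if first & 0xE0 == 0xC0:
        length, wc = 2, first & 0x1F
    elif first & 0xF0 == 0xE0:
        length, wc = 3, first & 0x0F
    else:
        raise ValueError("unsupported or invalid lead byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    for byte in data[1:length]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        wc = (wc << 6) | (byte & 0x3F)
    return wc, length


utf8 = wctomb_sketch(0x4E2D)      # U+4E2D
value, used = mbtowc_sketch(utf8)
print(utf8.hex(), hex(value), used)
```

The string variants in the prototypes would then amount to calling the single-character routine in a loop until the input or the output buffer is exhausted.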
The CodePage is then selected with the following parameters when the corresponding file system is mounted.
For CodePage 437:
mount -t vfat /dev/hda1 /mnt/1 -o codepage=437,iocharset=cp437
This way, long file names in different languages can be accessed under Linux.
3. CodePages supported under Linux
NLS CodePage 437 - United States / Canadian English
NLS CodePage 737 - Greek
NLS CodePage 775 - Baltic
NLS CodePage 850 - Some Western European characters (German, Spanish, Italian)
NLS CodePage 852 - Latin 2, covering Central and Eastern Europe (Albanian, Croatian, Czech, English, Finnish, Hungarian, Irish, German, Polish, Romanian, Serbian, Slovak, Slovenian, Sorbian)
NLS CodePage 855 - Cyrillic
NLS CodePage 857 - Turkish
NLS CodePage 860 - Portuguese
NLS CodePage 861 - Icelandic
NLS CodePage 862 - Hebrew
NLS CodePage 863 - Canadian
NLS CodePage 864 - Arabic
NLS CodePage 865 - Nordic (Norwegian, Danish)
NLS CodePage 866 - Cyrillic / Russian
NLS CodePage 869 - Greek (2)
NLS CodePage 874 - Thai
NLS CodePage 936 - Simplified Chinese GBK
NLS CodePage 950 - Traditional Chinese Big5
NLS ISO8859-1 - Western Europe (Albanian, Spanish, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Galician, Irish, Icelandic, Italian, Norwegian, Portuguese, Swedish). It also applies to American English.
NLS ISO8859-2 - Latin 2 character set, Slavic and Central European languages (Czech, German, Hungarian, Polish, Romanian, Croatian, Slovak, Slovenian)
NLS ISO8859-3 - Latin 3 character set (Esperanto, Galician, Maltese, Turkish)
NLS ISO8859-4 - Latin 4 character set (Estonian, Latvian, Lithuanian), a predecessor of the Latin 6 character set
NLS ISO8859-5 - Cyrillic (Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian). The KOI8-R CodePage is generally recommended instead.
NLS ISO8859-6 - Arabic
NLS ISO8859-7 - Modern Greek
NLS ISO8859-8 - Hebrew
NLS ISO8859-9 - Latin 5 character set (replaces some Icelandic characters of Latin 1 with Turkish characters)
NLS ISO8859-10 - Latin 6 character set (adds Inuit (Greenlandic), Sami and similar letters)
NLS ISO8859-15 - Latin 9 character set, an updated version of the Latin 1 character set (removes some unused characters, adds support for Estonian, completes the French and Finnish support, and adds the Euro character)
NLS KOI8-R - Russian
4. Simplified Chinese GBK / Traditional Chinese Big5 CodePage
How are the Simplified Chinese GBK and Traditional Chinese Big5 CodePages made?
The Unicode definitions of GBK and Big5 can be obtained from the Unicode organization. GBK is based on the ISO 10646-1:1993 standard; the corresponding Japanese standard is JIS X 0221-1995 and the Korean one is KS C 5700-1995. They line up with the versions of the Unicode standard as follows:

Unicode version 1.0
Unicode version 1.1 <-> ISO 10646-1:1993, JIS X 0221-1995, GB 13000.1-93
Unicode version 2.0 <-> KS C 5700-1995

The GBK encoding has been in use since Windows 95.
You need cp936.txt and big5.txt, which are then converted into the Unicode <-> GBK and Unicode <-> Big5 code tables for the Linux kernel with the following programs:

./genmap big5.txt | perl uni2big5.pl
./genmap cp936.txt | perl uni2gbk.pl

Modifying the related functions of FAT / VFAT / NTFS then completes the kernel changes. The result can be used with the following commands:

Simplified Chinese: mount -t vfat /dev/hda1 /mnt/1 -o codepage=936,iocharset=cp936
Traditional Chinese: mount -t vfat /dev/hda1 /mnt/1 -o codepage=950,iocharset=cp936

Interestingly, because GBK covers all the characters of GB2312, Big5 and JIS, Big5 file names can still be displayed when the 936 CodePage is used.
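The effect of combining CodePage 950 with the cp936 character set can be modelled in user space. This is only a rough sketch of the idea, not of the kernel code path: it assumes that a name stored in Big5 is decoded to Unicode with one table and re-encoded for display with the other, and the sample bytes are the Big5 form of a two-character name.

```python
# A name as stored on disk in Big5 (CodePage 950).
on_disk = b"\xa4\xa4\xa4\xe5"

# Step 1: the on-disk code page turns the bytes into Unicode.
name = on_disk.decode("cp950")

# Step 2: the I/O character set turns Unicode into what user space sees.
shown = name.encode("cp936")

print(name, shown.hex())
```

The round trip works because GBK contains the characters used in the Big5 name; a character missing from the target table could not be shown this way.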
5. Appendix
5.1 Author and related documents
The CodePage 950 support was made by Mr. Cosmos in Taiwan; homepage: http://www.cis.nctu.edu.tw:8080/~is84086/project/kernel_cp950/
The GBK CP936 support was made by Fang Han and Chen Xiangyang of TurboLinux's Chinese R&D team.
5.2 genmap
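#!/bin/sh
# Drop comment lines, strip the "0x" prefixes, and keep only the pairs whose
# charset code and Unicode code have the same length (the double-byte entries).
cat $1 |
awk '{if (index($1, "#") == 0) print $0}' |
awk 'BEGIN {FS = "0x"} {print $2 $3}' |
awk '{if (length($1) == length($2)) print $1, $2}'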
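The filtering done by genmap can also be written as a short function, which makes each step easier to follow. This is a simplified re-implementation, not a drop-in replacement: the name `genmap_sketch` is hypothetical, it skips blank lines and lines without two "0x" fields outright, and the sample lines only imitate the layout of the Unicode mapping tables.

```python
def genmap_sketch(lines):
    """Yield (charset_code, unicode_code) hex strings for double-byte entries."""
    for line in lines:
        fields = line.split()
        # Skip blank lines and comment lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        # Both columns must be hexadecimal values written with a 0x prefix.
        if not (fields[0].startswith("0x") and fields[1].startswith("0x")):
            continue
        code, uni = fields[0][2:], fields[1][2:]
        # Equal lengths select the four-digit (double-byte) entries only.
        if len(code) == len(uni):
            yield code, uni


sample = [
    "# illustrative lines in the layout of a mapping table",
    "0x20\t0x0020\t# SPACE",
    "0xA140\t0x3000\t# IDEOGRAPHIC SPACE",
    "0xA4A4\t0x4E2D\t# <CJK>",
]
pairs = list(genmap_sketch(sample))
print(pairs)
```

The single-byte entry is dropped because its two-digit code and four-digit Unicode value differ in length, which is the same test the last awk stage applies.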
5.3 uni2big5.pl
#! / usr / bin / perl @ code = ("00", "01", "02", "03", "04", "05", "06", "07", "08", "09 "" 0a "," 0b "," 0c "," 0d "," 0e "," 0f "," 10 "," 11 "," 12 "," 13 "," 14 "," 15 ", "16", "17", "18", "19", "1a", "1b", "1c", "1d", "1e", "1f", "20", "21", "22 "" 23 "," 24 "," 25 "," 29 "," 2A "," 2b "," 2C "," 2D "," 2E ", "2F", "30", "31", "32", "33", "37", "38", "39", "3A", "3B "," 3C "," 3D "," 3e "," 3F "," 40 "," 44 "," 45 "," 46 "," 47 ", "48", "49", "4a", "4b", "4c", "4D", "4E", "4F", "50", "51", "52", "53", "54 "" 55 "," 56 "," 57 "," 58 "," 59 "," 5A "," 5b "," 5C "," 5D "," 5E "," 5F "," 60 ", "61", "62", "63", "64", "65", "66", "67", "68", "69", "6A", "6b", "6C", "6D "," 6e "," 6f "," 70 "," 71 "," 72 "
"73", "74", "75", "76", "77", "78", "79", "7A", "7b", "7C", "7D", "7e", " 7f "," 80 "," 81 "," 82 "," 83 "," 84 "," 85 "," 86 "," 87 "," 88 "," 89 "," 8A "," 8B " "8C", "8D", "8e", "8f", "90", "91", "92", "93", "94", "95", "96", "97", " 98 "," 99 "," 9a "," 9b "," 9c "," 9d "," 9e "," 9f "," A0 "," A1 "," A2 "," A3 "," A4 " , "A5", "A6", "A7", "A8", "A9", "AA", "AB", "AC", "AD", "AE", "AF", "B0", " B1 "," B2 "," B3 "," B4 "," B5 "," B6 "," BA "," BB "," BC "," BD " "BE", "BF", "C0", "C1", "C2", "C3", "C7", "C8", "C9", " Ca "," CB "," CC "," CD "," CE "," CF "," D0 "," D1 "," D2 "," D6 "," D5 "," D6 " , "D7", "D8", "D9", "DA", "DB", "DC", "DD", "DE", "DF", "E0", "E1", "E2", " E3 "," E4 "," E5 "," E6 "," E7 ","
E8 "," E9 "," EA "," EB "," EC "," ED "," EE "," EF "," F0 "," F1 "," F2 "," F3 "," F4 " , "F5", "F6", "F7", "F8", "F9", "FB", "FC", "FD", "FE", "FF"); while (
$ unicode = "0000"); Print "/ N / t" IF ($ low% 4 == 0); Print "/ * $ code [$ high] $ code [$]] * // n / t "IF ($ low% 0x10 == 0); ($ uhigh, $ ulow) = $ unicode = ~ /(..) (..) /; printf (" {0x% 2S, 0x % 2S}, "$ ULOW, $ uhigh);}} print" / n}; / n / n "; for ($ high = 1; $ high <= 255; $ high ) {if (Defined $ table { $ code [$ high]}) {print "static unsigned char points [$ high] / [512 /] = {/ n / t"; for ($ low = 0; $ low <= 255; $ low ) {$ BIG5 = $ table {$ code [$ low]}; $ big5 = "3f3f" if (! (defined $ big5)); if ($ low> 0 && $ low% 4 == 0) {Printf ("/ * 0x% 02x-0x% 02x * // n / t", $ low-4, $ low-1);} print "/ n / t" if ($ low == 0x80); ($ BHigh, $ blow) = $ BIG5 = ~ /(..) (..) ("0x% 2S, 0x% 2S,", $ BHIGH, $ blow);} Print "/ * 0xFC-0xFF * // n}; / n / n";}}} print "static unsigned char * page_uni2charset [256] = {"; for ($ high = 0; $ high <= 255; $ HIGH ) {Print "/ n / t" IF ($ high% 8 == 0); if ($ high> 0 && defined $ table {$ code [$ high]}) {print "Page $ code [$ high], "} else {print" null, ";}} print << EOF;}; static unsigned char charset2upper [256] =
{0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, / * 0x00-0x07 * / 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, / * 0x08-0x0f * / 0x10, 0x11 , 0x12, 0x13, 0x14, 0x15, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, / * 0x18-0x1f * / 0x20, 0x21, 0x22, 0x23 , 0x24, 0x25, 0x26, 0x27 * / 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, / * 0x28-0x2f * / 0x30, 0x31, 0x32, 0x33, 0x34, 0x35 0x36, 0x37, / 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, / * 0x38-0x3f * / 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47 , / * 0x40-0X47 * / 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, / * 0x48-0x4f * / 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, / * 0x50 -0x57 * / 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5E, 0x5f, / * 0x58-0x5f * / 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0x60-0x67 * / 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0x68-0X6F * / 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0x70-0x77 * / 0x00, 0x00, 0x7d, 0x7E, 0x7f, / * 0x78-0x7f * / 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, / * 0x80-0X87 * / 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, / * 0x88-0x8f * / 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, / * 0x90-0x97 * / 0x98, 0x99, 0x9a, 0x00, 0x9c, 0x00, 0x00, 0x00, / * 0x98-0x9f * / 0x00, 0x00, 0x00, 0x00, 0xa4, 0xa5, 0xA6, 0xA7, / * 0xa0-0xa7 * / 0xa8, 0xa9, 0xaa, 0xAb, 0xAc, 0xAD, 0xAE, 0xAF, / * 0xA8-0XAF * / 0XB0, 0XB1, 0XB2, 0XB3, 0XB4, 0XB5, 0XB6, 0XB7, / * 0xB0-0XB7 * / 0xB8, 0XB9, 0XBA, 0XBB, 0XBC, 0XBD, 0XBE, 0XBF, / * 0xB8-0XBF * / 0XC0, 0XC1, 0XC2, 0XC3, 0XC4, 0XC5,
0xc6, 0xc7, / * 0xc0-0xc7 * / 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, / * 0xc8-0XCF * / 0xD0, 0xD1, 0xD2, 0x00, 0x00, / * 0xD0-0XD7 * / 0x00, 0xD9, 0x00, 0x00, 0xDF, / * 0xD8-0XDF * / 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0xe0 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0xE8-0XEF * / 0xF0, 0xF1, 0x00, 0x00, 0x00, 0xF5, 0x00, 0xF7, / * 0xF0-0xf7 * / 0xF8 , 0xf9, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, / * 0xf8-0xff * /}; static void inc_use_count (void) {MOD_INC_USE_COUNT;} static void dec_use_count (void) {MOD_DEC_USE_COUNT;} static struct nls_table table = { "cp950", page_uni2charset, charset2uni, inc_use_count, dec_use_count, NULL}; int init_nls_cp950 (void) {return register_nls ();} # ifdef MODULEint init_module (void) {return init_nls_cp950 ();} void cleanup_module (void) {unregister_nls () Return;} # ENDIF / * * OV errides for Emacs so that we follow Linus's tabbing style. * Emacs will notice this stuff at the end of the file and automatically * adjust the settings for this buffer only. This must remain at the end * of the file. * ---- -------------------------------------------------- --------------------- * local variables: * c-indent-level: 8 * c-brace-imaginary-offset: 0 * c-brace-offset: -8 * c-argDecl-indecd: 8 * c-label-offset: -8 * c-continued-statement-offset: 8 * c-continued-brace-offset: 0 * end: * / eof5.4 uni2gbk.pl
#! / usr / bin / perl @ code = ("00", "01", "02", "03", "04", "05", "06", "07", "08", "09 "" 0a "," 0b "," 0c "," 0d "," 0e "," 0f "," 10 "," 11 "," 12 "," 13 "," 14 "," 15 ", "16", "17", "18", "19", "1a", "1b", "1c", "1d", "1e", "1f", "20", "21", "22 "" 23 "," 24 "," 25 "," 29 "," 2A "," 2b "," 2C "," 2D "," 2E ", "2F", "30", "31", "32", "33", "37", "38", "39", "3A", "3B "," 3C "," 3D "," 3e "," 3F "," 40 "," 44 "," 45 "," 46 "," 47 ", "48", "49", "4a", "4b", "4c", "4D", "4E", "4F", "50", "51", "52", "53", "54 "" 55 "," 56 "," 57 "," 58 "," 59 "," 5A "," 5b "," 5C "," 5D "," 5E "," 5F "," 60 ", "61", "62", "63", "64", "65", "66", "67", "68", "69", "6A", "6b", "6C", "6D "," 6e "," 6f "," 70 "," 71 "," 72 "
"73", "74", "75", "76", "77", "78", "79", "7A", "7b", "7C", "7D", "7e", " 7f "," 80 "," 81 "," 82 "," 83 "," 84 "," 85 "," 86 "," 87 "," 88 "," 89 "," 8A "," 8B " "8C", "8D", "8e", "8f", "90", "91", "92", "93", "94", "95", "96", "97", " 98 "," 99 "," 9a "," 9b "," 9c "," 9d "," 9e "," 9f "," A0 "," A1 "," A2 "," A3 "," A4 " , "A5", "A6", "A7", "A8", "A9", "AA", "AB", "AC", "AD", "AE", "AF", "B0", " B1 "," B2 "," B3 "," B4 "," B5 "," B6 "," BA "," BB "," BC "," BD " "BE", "BF", "C0", "C1", "C2", "C3", "C7", "C8", "C9", " Ca "," CB "," CC "," CD "," CE "," CF "," D0 "," D1 "," D2 "," D6 "," D5 "," D6 " , "D7", "D8", "D9", "DA", "DB", "DC", "DD", "DE", "DF", "E0", "E1", "E2", " E3 "," E4 "," E5 "," E6 "," E7 ","
E8 "," E9 "," EA "," EB "," EC "," ED "," EE "," EF "," F0 "," F1 "," F2 "," F3 "," F4 " , "F5", "F6", "F7", "F8", "F9", "FB", "FC", "FD", "FE", "FF"); while (
$ unicode = "0000"); Print "/ N / t" IF ($ low% 4 == 0); Print "/ * $ code [$ high] $ code [$]] * // n / t "IF ($ low% 0x10 == 0); ($ uhigh, $ ulow) = $ unicode = ~ /(..) (..) /; printf (" {0x% 2S, 0x % 2S}, "$ ULOW, $ uhigh);}} print" / n}; / n / n "; for ($ high = 1; $ high <= 255; $ high ) {if (Defined $ table { $ code [$ high]}) {print "static unsigned char points [$ high] / [512 /] = {/ n / t"; for ($ low = 0; $ low <= 255; $ low ) {$ BIG5 = $ table {$ code [$ low]}; $ big5 = "3f3f" if (! (defined $ big5)); if ($ low> 0 && $ low% 4 == 0) {Printf ("/ * 0x% 02x-0x% 02x * // n / t", $ low-4, $ low-1);} print "/ n / t" if ($ low == 0x80); ($ BHigh, $ blow) = $ BIG5 = ~ /(..) (..) ("0x% 2S, 0x% 2S,", $ BHIGH, $ blow);} Print "/ * 0xFC-0xFF * // n}; / n / n";}}} print "static unsigned char * page_uni2charset [256] = {"; for ($ high = 0; $ high <= 255; $ HIGH ) {Print "/ n / t" IF ($ high% 8 == 0); if ($ high> 0 && defined $ table {$ code [$ high]}) {print "Page $ code [$ high], "} else {print" null, ";}} print << EOF;}; static unsigned char charset2upper [256] =
{0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, / * 0x00-0x07 * / 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, / * 0x08-0x0f * / 0x10, 0x11 , 0x12, 0x13, 0x14, 0x15, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, / * 0x18-0x1f * / 0x20, 0x21, 0x22, 0x23 , 0x24, 0x25, 0x26, 0x27 * / 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, / * 0x28-0x2f * / 0x30, 0x31, 0x32, 0x33, 0x34, 0x35 0x36, 0x37, / 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, / * 0x38-0x3f * / 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47 , / * 0x40-0X47 * / 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, / * 0x48-0x4f * / 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, / * 0x50 -0x57 * / 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5E, 0x5f, / * 0x58-0x5f * / 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0x60-0x67 * / 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0x68-0X6F * / 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0x70-0x77 * / 0x00, 0x00, 0x7d, 0x7E, 0x7f, / * 0x78-0x7f * / 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, / * 0x80-0X87 * / 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, / * 0x88-0x8f * / 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, / * 0x90-0x97 * / 0x98, 0x99, 0x9a, 0x00, 0x9c, 0x00, 0x00, 0x00, / * 0x98-0x9f * / 0x00, 0x00, 0x00, 0x00, 0xa4, 0xa5, 0xA6, 0xA7, / * 0xa0-0xa7 * / 0xa8, 0xa9, 0xaa, 0xAb, 0xAc, 0xAD, 0xAE, 0xAF, / * 0xA8-0XAF * / 0XB0, 0XB1, 0XB2, 0XB3, 0XB4, 0XB5, 0XB6, 0XB7, / * 0xB0-0XB7 * / 0xB8, 0XB9, 0XBA, 0XBB, 0XBC, 0XBD, 0XBE, 0XBF, / * 0xB8-0XBF * / 0XC0, 0XC1, 0XC2, 0XC3, 0XC4, 0XC5,
0xc6, 0xc7, / * 0xc0-0xc7 * / 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, / * 0xc8-0XCF * / 0xD0, 0xD1, 0xD2, 0x00, 0x00, / * 0xD0-0XD7 * / 0x00, 0xD9, 0x00, 0x00, 0xDF, / * 0xD8-0XDF * / 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0xe0 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, / * 0xE8-0XEF * / 0xF0, 0xF1, 0x00, 0x00, 0x00, 0xF5, 0x00, 0xF7, / * 0xF0-0xf7 * / 0xF8 , 0xf9, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, / * 0xf8-0xff * /}; static void inc_use_count (void) {MOD_INC_USE_COUNT;} static void dec_use_count (void) {MOD_DEC_USE_COUNT;} static struct nls_table table = { "cp936", page_uni2charset, charset2uni, inc_use_count, dec_use_count, NULL}; int init_nls_cp936 (void) {return register_nls ();} # ifdef MODULEint init_module (void) {return init_nls_cp936 ();} void cleanup_module (void) {unregister_nls () Return;} # ENDIF / * * OV errides for Emacs so that we follow Linus's tabbing style. * Emacs will notice this stuff at the end of the file and automatically * adjust the settings for this buffer only. This must remain at the end * of the file. * ---- -------------------------------------------------- --------------------- * local variables: * c-indent-level: 8 * c-brace-imaginary-offset: 0 * c-brace-offset: -8 * c-argDecl-indent: 8 * c-label-offset: -8 * c-Continued-statement-offset: 8 * c-continued-brace-offset: 0 * end: * / EOF5.5 Convert CodePage tool
/*
 * cpi.c: A program to examine MSDOS codepage files (*.cpi)
 * and extract specific codepages.
 * Compiles under Linux & DOS (using BC 3.1).
 *
 * Compile: gcc -o cpi cpi.c
 * Call: codepage file.cpi [-a|-l|nnn]
 *
 * Author: Ahmed M. Naas (ahmed@oea.xs4all.nl)
 * Many changes: aeb@cwi.nl [changed until it would handle all
 * *.cpi files people have sent me; I have no documentation,
 * so all this is experimental]
 * Remains to do: DRDOS fonts.
 *
 * Copyright: Public domain.
 */
#include
unsigned short p2 PACKED;} PrinterFontHeader; FILE * in, * out; void usage (void); int opta, optc, optl, optL, optx; extern int optind; extern char * optarg; unsigned short codepage; int main (int argc CHAR * argv []) {if (argc <2) usage (); if ((in = fopen (argv [1], "r"))) == null) {printf ("/ NUNABLE TO OPEN FILE% S ./N ", Argv [1]); exit (0);} Opta = OPTC = OPTL = OPTL = OPTX = 0; optIND = 2; if (argc == 2) OPTL = 1; Else While (1) { Switch (ARGC, Argv, "ALLC")) {CASE 'A': OPTA = 1; Continue; Case 'C': OPTC = 1; Continue; Case 'L': OPTL = 1; Continue; Case 'L ': OPTL = 1; Continue; Case'? ': Default: usage (); case -1: break; } Break;} if (Optind! = argc) {if (OptinD! = argc-1 || OPTA) USAGE () Usage (); CODEPAGE = ATOI (Argv [Optind]); Optx = 1;} if (OPTC) Handle_CodePage (0 ); else handle_fontfile (); if (optX) {Printf ("NO PAGE% D FOUND / N", CODEPAGE); EXIT (1);} fclose (in); return (0);} voidhandle_fontfile () {INT i , J; J = FREAD (, 1, Sizeof (FontfileHeader), IN); if (j! = sizeof (fontfileHeader) {Printf ("
Error Reading FontfileHeader - Got% D Chars / N ", J); EXIT (1);} if (! strcmp (fontfileHeader.ID 1," drfont ")) {Printf (" this Program Cannot Handle Drdos Font Files / N "); exit (1);} if (opTl) Printf (" FontfileHeader: ID =% 8.8s Res =% 8.8s Num =% d type =% C Offset =% ld / n / n ", fontfileHeader.ID, FontFileHeader.res, FontFileHeader.num_pointers, FontFileHeader.p_type, FontFileHeader.offset); j = fread (, 1, sizeof (FontInfoHeader), in); if (j = sizeof (FontInfoHeader!)) {printf ( "error reading FontInfoHeader - Got% D Chars / N ", J); EXIT (1);} IF (OPTL) Printf (" FontInfoHeader: Num_CODEPAGES =% D / N / N ", FontInfoHeader.Num_Codepages); for (i = fontinfoHeader.Num_codepages; i ; i -} INTHANDE_CODEPAGE (INT MORE_TO_COME) {Int J; char Outfile [20]; unsigned char * fonts; long INPOS, NEXTHDR; J = FREAD (, 1, SizeOf (CpenTryHeader), IN); if (j! = sizeof (cpensryHeader) {Printf ("Error Reading CpenTryHeader - GOTINT% D charS / n ", j); exit (1);} if (opTl) {Int T = cpensryHeader.device_type; printf (" cpenTryHeader: size =% D dev =% d [% s] name =% 8.8s / CODEPAGE =% D / N / T / TRES =% 6.6s nxt =% ld off_font =% ld / n / n ", cpensryHeader.Size, T, (t == 1)?" screen ": (t == 2 "Printer": "?"
, CPEntryHeader.device_name, CPEntryHeader.codepage, CPEntryHeader.res, CPEntryHeader.off_nexthdr, CPEntryHeader.off_font);} else if (optl) {printf ( "/ nCodepage =% d / n", CPEntryHeader.codepage); printf ( "Device =% .8s / n ", cpensryHeader.Device_name);} #if 0 if (cpensryHeader.Size! = Sizeof (cpensryhead)) {/ * seen 26 and 28, so That the Difference Below is -2 or 0 * / if (optl) printf ( "Skipping% d bytes of garbage / n", CPEntryHeader.size - sizeof (CPEntryHeader)); fseek (in, CPEntryHeader.size - sizeof (CPEntryHeader), SEEK_CUR);!} #endif if (opta && (! OPTX || CpenTryHeader.codePage! = CODEPAGE) &&! OPTC) Goto next; INPOS = FTELL (IN); if (INPOS! = CpenTryHeader.off_Font &&! OPTC) {IF (OPTL) Printf ("POS =% LD Font At% LD / N ", INPOS, Cpentr YHeader.off_Font; FSeek (in, cpensryHeader.off_font, seek_set);} j = fread (, 1, sizeof (cpinfoHeader), IF (j! = sizeof (cpinfoHeader) {Printf ("Error Reading CpinfoHeader) Got% D Chars / N ", J); EXIT (1);} if (OPTL) {Printf (" Number of Fonts =% D / N ", cpinfoHeader.Num_FONTS); Printf (" Size of Bitmap =% D / n ", cpinfoheader.size);}} (cpinfoHeader.Num_fonts == 0) goto next; if (optc) return 0; sprintf (Outfile,"% D.cp ", cpensryHeader.codepage); if ((out =
FOPEN (OUTFILE, "W")) == NULL) {Printf ("/ NUNABLE TO OPEN FILE% s. / N", OUTFILE); EXIT (1);} else printf ("/ nwriting% s / n", Outfile; fonts = (unsigned char *) malloc (cpinfoheader.size); FREAD (FONTS, CPINFOHEADER.SIZE, 1, IN); FWRITE (, sizeof (cpensryHeader), 1, out); fwrite (, sizeof (cpinfoHeader) , 1, out; j = fwrite (fonts, 1, cpinfoheader.size, out); if (j! = CpinfoHeader.size) {Printf ("Error Writing% S - Wrote% D Chars / N", Outfile, J EXIT (1);} fclose (out); FREE (FONTS); if (OPTX) EXIT (0); Next: / * * It see That if entry headers and fonts are interspers, * the nextdr Will Point Past T Font, Regardless of * WHether More Entries, First All Entry Headers Are Given, And Then * All fonts; in this case, * / nexthdr = CpenTryHeader.off_nexthdr; if (NEXTHDR == 0 || NEXTHDR == -1) {IF (more_to_come) {Printf ("Mode CodePages Expected, But Nextdr =% LD / N", NextDR); EXIT (1);} Else Return 1;} INPOS = FTELL (IN); if (INPOS! = cpensryHeader.off_nexthdr) {if (OPTL) Printf ("POS =% LD NEXTHDR AT% LD / N", INPOS, NEXTHDR); IF (Opta &&! More_to_come) {Printf ("No more code pages, but nextdr! = 0 / n"); Return 1;