[Repost] CodePage Introduction

xiaoxiao 2021-03-06

1. CodePage definition and history

A character code is the internal code used to represent characters. The computer uses it when entering and storing documents. Internal codes fall into two kinds:

- Single-byte internal codes, Single-Byte Character Sets (SBCS), which can support 256 characters.
- Double-byte internal codes, Double-Byte Character Sets (DBCS), which can support about 65,000 character codes and are mainly used to encode the large character sets of East Asian scripts.

A CodePage is a list of characters arranged in a specific order. For languages that used the early single-byte internal codes, the order of the internal codes in the CodePage lets the system find the internal code that corresponds to each keyboard input value. For double-byte internal codes, the CodePage provides a Multibyte-to-Unicode mapping table, so that characters stored in Unicode form can be converted to the corresponding character codes, and vice versa. In the Linux kernel the corresponding functions are utf8_mbtowc and utf8_wctomb.
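For the single-byte case, the lookup described above is just an array indexed by the byte value. The following is a minimal sketch in C, assuming a 16-bit (UCS-2) Unicode result as in the kernel prototypes; the table `cp437_high` and both function names are hypothetical, and only CP437 bytes 0x80-0x87 are filled in:

```c
#include <stdint.h>

/* Hypothetical, partial CP437 table: only bytes 0x80-0x87 are filled in.
 * Index = byte - 0x80, value = Unicode (UCS-2) code point. */
static const uint16_t cp437_high[8] = {
    0x00C7, /* 0x80 LATIN CAPITAL LETTER C WITH CEDILLA */
    0x00FC, /* 0x81 LATIN SMALL LETTER U WITH DIAERESIS */
    0x00E9, /* 0x82 LATIN SMALL LETTER E WITH ACUTE */
    0x00E2, /* 0x83 LATIN SMALL LETTER A WITH CIRCUMFLEX */
    0x00E4, /* 0x84 LATIN SMALL LETTER A WITH DIAERESIS */
    0x00E0, /* 0x85 LATIN SMALL LETTER A WITH GRAVE */
    0x00E5, /* 0x86 LATIN SMALL LETTER A WITH RING ABOVE */
    0x00E7, /* 0x87 LATIN SMALL LETTER C WITH CEDILLA */
};

/* Byte -> Unicode: the byte value itself selects the table entry. */
static uint16_t cp437_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                     /* ASCII half maps to itself */
    if (b < 0x88)
        return cp437_high[b - 0x80];  /* position in the table = code */
    return 0xFFFD;                    /* outside this partial table */
}

/* Unicode -> byte: search the same table; -1 if there is no mapping. */
static int unicode_to_cp437(uint16_t u)
{
    int i;
    if (u < 0x80)
        return u;
    for (i = 0; i < 8; i++)
        if (cp437_high[i] == u)
            return 0x80 + i;
    return -1;
}
```

The forward direction is a constant-time array index; the reverse direction needs a search or a second table, which is why the kernel NLS modules carry a separate Unicode-to-charset table.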

Before 1980 there were still no international standards, such as ISO-8859 or Unicode, defining how to extend the US-ASCII encoding for non-English-speaking countries. Many IT vendors invented their own encodings and identified them with hard-to-remember numbers:

For example, 936 stands for Simplified Chinese and 950 for Traditional Chinese.

1.1 CJK CodePage

Unlike Extended Unix Coding (EUC), all of the Far East CodePages below use the C1 control codes {0x80..0x9F} as lead bytes and ASCII values {0x40..0x7E} as trail bytes. This lets them hold tens of thousands of double-byte characters, but it also means that in these encodings a byte with an ASCII value above 0x3F does not necessarily represent an ASCII character.
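The practical consequence is that a plain byte search can find an "ASCII" character in the middle of a double-byte character. The sketch below assumes the CP932 (Shift-JIS) lead-byte ranges 0x81-0x9F and 0xE0-0xFC and uses the character 表 (bytes 0x95 0x5C), whose trail byte equals the backslash; the function names are hypothetical:

```c
#include <stddef.h>

/* Assumption: CP932 (Shift-JIS) lead bytes are 0x81-0x9F and 0xE0-0xFC. */
static int is_cp932_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Find the first ASCII character c in s[0..len), stepping over each
 * lead byte + trail byte pair as one unit so that a trail byte in the
 * 0x40-0x7E range is never mistaken for ASCII.
 * Returns the byte offset, or -1 if c does not occur as a character. */
static long dbcs_find_ascii(const unsigned char *s, size_t len, unsigned char c)
{
    size_t i = 0;
    while (i < len) {
        if (is_cp932_lead(s[i]) && i + 1 < len) {
            i += 2;               /* skip the whole double-byte character */
            continue;
        }
        if (s[i] == c)
            return (long)i;
        i++;
    }
    return -1;
}

/* Naive byte scan for comparison: ignores character boundaries. */
static long naive_find(const unsigned char *s, size_t len, unsigned char c)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (s[i] == c)
            return (long)i;
    return -1;
}
```

For the bytes `95 5C 61 5C 62` (表, "a", "\", "b"), the naive scan reports a backslash at offset 1, inside 表, while the DBCS-aware scan correctly reports offset 3.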

CP932

Shift-JIS includes the Japanese character sets JIS X 0201 (one byte per character) and JIS X 0208 (two bytes per character). The JIS X 0201 katakana are therefore one-byte half-width characters, and the remaining 60 lead-byte values are used for 7,076 kanji and 648 other full-width characters. Unlike EUC-JP, Shift-JIS does not include the 5,802 kanji defined in JIS X 0212.

CP936

GBK extends EUC-CN (the GB 2312-80 encoding, with 6,763 Chinese characters) to the 20,902 Chinese characters defined in Unicode (GB 13000.1-93). It is the Simplified Chinese (zh_CN) encoding used in mainland China.

CP949

Unified Hangul Code (UHC) is a superset of Korean EUC-KR (the KS C 5601-1992 encoding, with 2,350 Hangul syllables and 4,888 Hanja) that adds 8,822 more Hangul syllables (in the C1 range).

CP950

Unlike EUC-TW (CNS 11643-1992), CP950 is the Big5 encoding of Traditional Chinese (13,072 Traditional Chinese characters, zh_TW). These definitions can be found in Ken Lunde's CJK.INF or in the Unicode mapping tables.

Note: Microsoft uses these four CodePages, so the corresponding CodePage above must be used when accessing Microsoft file systems.
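To walk a string in any of these four CodePages, a program mainly needs to know which byte values are lead bytes. A sketch, assuming the lead-byte ranges commonly reported for these codepages on Windows (0x81-0x9F and 0xE0-0xFC for 932; 0x81-0xFE for 936, 949 and 950); the table and function names are made up for illustration:

```c
#include <stddef.h>

/* Lead-byte ranges per codepage (assumed values, see text above).
 * A second range with lo2 > hi2 is empty. */
struct cp_lead_ranges {
    int codepage;
    unsigned char lo1, hi1, lo2, hi2;
};

static const struct cp_lead_ranges cjk_cps[] = {
    { 932, 0x81, 0x9F, 0xE0, 0xFC },  /* Shift-JIS      */
    { 936, 0x81, 0xFE, 0x01, 0x00 },  /* GBK            */
    { 949, 0x81, 0xFE, 0x01, 0x00 },  /* Unified Hangul */
    { 950, 0x81, 0xFE, 0x01, 0x00 },  /* Big5           */
};

static int is_lead_byte(int codepage, unsigned char b)
{
    size_t i;
    for (i = 0; i < sizeof cjk_cps / sizeof cjk_cps[0]; i++) {
        const struct cp_lead_ranges *r = &cjk_cps[i];
        if (r->codepage != codepage)
            continue;
        return (b >= r->lo1 && b <= r->hi1) || (b >= r->lo2 && b <= r->hi2);
    }
    return 0;  /* unknown codepage: treat every byte as single-byte */
}

/* Count characters (not bytes) in a byte string in the given codepage. */
static size_t count_chars(int codepage, const unsigned char *s, size_t len)
{
    size_t i = 0, n = 0;
    while (i < len) {
        i += (is_lead_byte(codepage, s[i]) && i + 1 < len) ? 2 : 1;
        n++;
    }
    return n;
}
```

For the GBK bytes `D6 D0 41` ("中A"), `count_chars` returns 2 under codepage 936 but 3 under 932, where 0xD6 and 0xD0 are single-byte half-width katakana. This is why the same bytes must be read with the CodePage they were written in.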

1.2 IBM Far East Language CodePage

IBM's CodePage is divided into SBCS and DBCS:

IBM SBCS CodePage

- 37 (English) *
- 290 (Japanese) *
- 836 (Simplified Chinese) *
- 891 (Korean)
- 897 (Japanese)
- 903 (Simplified Chinese)
- 904 (Traditional Chinese)

IBM DBCS CodePage

- 300 (Japanese) *
- 301 (Japanese)
- 835 (Traditional Chinese) *
- 837 (Simplified Chinese) *
- 926 (Korean)
- 927 (Traditional Chinese)
- 928 (Simplified Chinese)

CodePages that mix SBCS and DBCS are the IBM MBCS CodePages:

- 930 (Japanese) (CodePage 300 plus 290) *
- 932 (CodePage 301 plus 897)
- 933 (Korean) (CodePage 834) *
- 934 (Korean) (CodePage 926 plus 891)
- 938 (CodePage 927 plus 904)
- 936 (Simplified Chinese) (CodePage 928 plus 903)
- 5031 (Simplified Chinese) (CodePage 837 plus 836) *
- 5033 (CodePage 835 plus 37) *

* indicates the EBCDIC encoding format. Microsoft's CJK CodePages come from IBM's CodePages.
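The MBCS list above can be captured as data: each mixed CodePage pairs one SBCS CodePage with one DBCS CodePage. A small C sketch with a hypothetical struct and lookup helper, populated only from the list in this section (933 is left out because only its DBCS part, 834, is given):

```c
#include <stddef.h>

/* One IBM mixed (MBCS) codepage and its components, from the list above. */
struct ibm_mbcs {
    int mbcs;    /* mixed codepage number            */
    int sbcs;    /* single-byte component            */
    int dbcs;    /* double-byte component            */
    int ebcdic;  /* 1 if marked with * (EBCDIC-based) */
};

static const struct ibm_mbcs ibm_mbcs_table[] = {
    {  930, 290, 300, 1 },
    {  932, 897, 301, 0 },
    {  934, 891, 926, 0 },
    {  936, 903, 928, 0 },
    {  938, 904, 927, 0 },
    { 5031, 836, 837, 1 },
    { 5033,  37, 835, 1 },
};

/* Return the components of an IBM mixed codepage, or NULL if unknown. */
static const struct ibm_mbcs *ibm_mbcs_lookup(int cp)
{
    size_t i;
    for (i = 0; i < sizeof ibm_mbcs_table / sizeof ibm_mbcs_table[0]; i++)
        if (ibm_mbcs_table[i].mbcs == cp)
            return &ibm_mbcs_table[i];
    return NULL;
}
```

For example, `ibm_mbcs_lookup(936)` yields SBCS 903 and DBCS 928, matching the entry "936 (Simplified Chinese) (CodePage 928 plus 903)".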

2. The role of CodePage under Linux

CodePage support was introduced in Linux mainly to access multilingual file names on FAT/VFAT/FAT32/NTFS/NCPFS. NTFS and FAT32/VFAT store file names in Unicode, so when reading these names the system must convert them dynamically into the corresponding language encoding. NLS (Native Language Support) was introduced for this purpose. The corresponding source files are in /usr/src/linux/fs/nls:

Config.in, Makefile, nls_base.c, nls_cp437.c, nls_cp737.c, nls_cp775.c, nls_cp850.c, nls_cp852.c, nls_cp855.c, nls_cp857.c, nls_cp860.c, nls_cp861.c, nls_cp862.c, nls_cp863.c, nls_cp864.c, nls_cp865.c, nls_cp866.c, nls_cp869.c, nls_cp874.c, nls_cp936.c, nls_cp950.c, nls_iso8859-1.c, nls_iso8859-15.c, nls_iso8859-2.c, nls_iso8859-3.c, nls_iso8859-4.c, nls_iso8859-5.c, nls_iso8859-6.c, nls_iso8859-8.c, nls_iso8859-9.c, nls_koi8-r.c

The following conversion functions are implemented there:

```c
extern int utf8_mbtowc(__u16 *, const __u8 *, int);
extern int utf8_mbstowcs(__u16 *, const __u8 *, int);
extern int utf8_wctomb(__u8 *, __u16, int);
extern int utf8_wcstombs(__u8 *, const __u16 *, int);
```
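What `utf8_wctomb` and `utf8_mbtowc` do for a single character can be sketched as follows. These are hypothetical re-implementations named `sketch_*`, not the kernel source; they assume 16-bit code points as the `__u16` prototypes suggest, so only 1- to 3-byte UTF-8 sequences are handled, and overlong forms are not rejected:

```c
#include <stdint.h>

/* Encode one 16-bit code point as UTF-8 into s (room for maxlen bytes).
 * Returns the number of bytes written, or -1 if it does not fit. */
static int sketch_wctomb(uint8_t *s, uint16_t wc, int maxlen)
{
    if (wc < 0x80) {                       /* 0xxxxxxx */
        if (maxlen < 1) return -1;
        s[0] = (uint8_t)wc;
        return 1;
    }
    if (wc < 0x800) {                      /* 110xxxxx 10xxxxxx */
        if (maxlen < 2) return -1;
        s[0] = (uint8_t)(0xC0 | (wc >> 6));
        s[1] = (uint8_t)(0x80 | (wc & 0x3F));
        return 2;
    }
    if (maxlen < 3) return -1;             /* 1110xxxx 10xxxxxx 10xxxxxx */
    s[0] = (uint8_t)(0xE0 | (wc >> 12));
    s[1] = (uint8_t)(0x80 | ((wc >> 6) & 0x3F));
    s[2] = (uint8_t)(0x80 | (wc & 0x3F));
    return 3;
}

/* Decode one UTF-8 sequence (at most 3 bytes) from s, n bytes available.
 * Stores the code point in *pwc and returns the bytes consumed, or -1
 * for a malformed, truncated or 4-byte sequence. */
static int sketch_mbtowc(uint16_t *pwc, const uint8_t *s, int n)
{
    if (n < 1) return -1;
    if (s[0] < 0x80) {
        *pwc = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        if (n < 2 || (s[1] & 0xC0) != 0x80) return -1;
        *pwc = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return -1;
        *pwc = (uint16_t)(((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        return 3;
    }
    return -1;  /* 4-byte sequences do not fit in 16 bits */
}
```

For example, U+4E2D (中) encodes to the three bytes E4 B8 AD and decodes back to 0x4E2D.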

When mounting the corresponding file system, set the CodePage with the following parameters.

For CodePage 437:

```sh
mount -t vfat /dev/hda1 /mnt/1 -o codepage=437,iocharset=cp437
```

This way, you can access long file names in different languages under Linux.
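If a program needs to build the same option string, for example to pass as the `data` argument of mount(2), a small helper is enough. The function name is hypothetical; the option names are the ones used in the mount command above:

```c
#include <stdio.h>
#include <string.h>

/* Build "codepage=<n>,iocharset=<name>" into buf.
 * Returns 0 on success, -1 if iocharset is empty or buf is too small. */
static int vfat_nls_options(char *buf, size_t size, int codepage, const char *iocharset)
{
    int n;

    if (iocharset == NULL || strlen(iocharset) == 0)
        return -1;
    n = snprintf(buf, size, "codepage=%d,iocharset=%s", codepage, iocharset);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;  /* detect truncation */
}
```

With `codepage = 437` and `iocharset = "cp437"` this produces exactly the string given after `-o` in the example above.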

3. CodePages supported under Linux

- NLS CodePage 437 - United States / Canadian English
- NLS CodePage 737 - Greek
- NLS CodePage 775 - Baltic
- NLS CodePage 850 - Western European languages (German, Spanish, Italian)
- NLS CodePage 852 - Latin 2, Central and Eastern European languages (Albanian, Croatian, Czech, English, Finnish, Hungarian, Irish, German, Polish, Romanian, Serbian, Slovak, Slovenian, Sorbian)
- NLS CodePage 855 - Cyrillic (Slavic)
- NLS CodePage 857 - Turkish
- NLS CodePage 860 - Portuguese
- NLS CodePage 861 - Icelandic
- NLS CodePage 862 - Hebrew
- NLS CodePage 863 - Canadian French
- NLS CodePage 864 - Arabic
- NLS CodePage 865 - Nordic (Danish, Norwegian)
- NLS CodePage 866 - Cyrillic / Russian
- NLS CodePage 869 - Greek (2)
- NLS CodePage 874 - Thai
- NLS CodePage 936 - Simplified Chinese GBK
- NLS CodePage 950 - Traditional Chinese Big5
- NLS ISO8859-1 - Latin 1, Western European languages (Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Galician, Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish, Swedish); also suitable for American English
- NLS ISO8859-2 - Latin 2, Slavic and Central European languages (Czech, German, Hungarian, Polish, Romanian, Croatian, Slovak, Slovenian)
- NLS ISO8859-3 - Latin 3 (Esperanto, Galician, Maltese, Turkish)
- NLS ISO8859-4 - Latin 4 (Estonian, Latvian, Lithuanian); the predecessor of the Latin 6 character set
- NLS ISO8859-5 - Cyrillic (Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian); the KOI8-R CodePage is generally recommended instead
- NLS ISO8859-6 - Arabic
- NLS ISO8859-7 - Modern Greek
- NLS ISO8859-8 - Hebrew
- NLS ISO8859-9 - Latin 5, which replaces some rarely used Icelandic characters of Latin 1 with Turkish characters
- NLS ISO8859-10 - Latin 6, which adds letters for Inuit (Greenlandic), Sami and other languages
- NLS ISO8859-15 - Latin 9, an updated Latin 1 that removes some rarely used characters, adds support for Estonian, improves support for French and Finnish, and adds the Euro sign
- NLS KOI8-R - Russian

4. Simplified Chinese GBK / Traditional Chinese Big5 CodePage

How do you build the Simplified Chinese GBK / Traditional Chinese Big5 CodePages?

The Unicode mapping tables for GBK and Big5 are available from the Unicode organization. GBK is based on the ISO 10646-1:1993 standard; the corresponding Japanese standard is JIS X 0221-1995 and the Korean one is KS C 5700-1995. They line up with the Unicode versions as follows:

- Unicode version 1.0
- Unicode version 1.1 <-> ISO 10646-1:1993, JIS X 0221-1995, GB 13000.1-93
- Unicode version 2.0 <-> KS C 5700-1995

GBK encoding has been in use since Windows 95. You need cp936.txt and big5.txt; convert them into the Unicode <-> GBK and Unicode <-> Big5 tables for the Linux kernel with the following programs (see the appendix):

```sh
./genmap big5.txt | perl uni2big5.pl
./genmap cp936.txt | perl uni2gbk.pl
```

Then modify the related FAT/VFAT/NTFS functions to complete the kernel changes. The file systems can then be mounted with:

```sh
# Simplified Chinese
mount -t vfat /dev/hda1 /mnt/1 -o codepage=936,iocharset=cp936
# Traditional Chinese
mount -t vfat /dev/hda1 /mnt/1 -o codepage=950,iocharset=cp950
```

Interestingly, because GBK covers all the Chinese characters of GB2312, Big5 and JIS, the 936 CodePage can also display Big5 file names.
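The tables emitted by the generator scripts in the appendix form a two-level structure: `page_uni2charset` is indexed by the high byte of the Unicode value and points to a 512-byte page holding two charset bytes per low byte, while `charset2uni` is a flat array with one row per lead byte. A sketch of both lookups in C, using a hypothetical two-entry table (U+4E00 to GBK D2BB and U+4E2D to GBK D6D0) instead of the full generated data:

```c
#include <stddef.h>

/* Hypothetical page for Unicode high byte 0x4E: two GBK bytes per low
 * byte. Entries not listed are zero here; the generated tables instead
 * write 0x3f, 0x3f for unmapped characters. */
static unsigned char page4E[512] = {
    [0x00 * 2] = 0xD2, [0x00 * 2 + 1] = 0xBB,   /* U+4E00 -> D2BB */
    [0x2D * 2] = 0xD6, [0x2D * 2 + 1] = 0xD0,   /* U+4E2D -> D6D0 */
};

static unsigned char *page_uni2charset[256] = {
    [0x4E] = page4E,
};

/* Unicode (16-bit) -> GBK; unmapped characters give 0x3F3F ("??"). */
static unsigned int uni2gbk(unsigned int uni)
{
    const unsigned char *page = page_uni2charset[(uni >> 8) & 0xFF];
    unsigned int lo = uni & 0xFF;

    if (page == NULL || (page[lo * 2] == 0 && page[lo * 2 + 1] == 0))
        return 0x3F3F;
    return ((unsigned int)page[lo * 2] << 8) | page[lo * 2 + 1];
}

/* GBK bytes -> position in the flat cp936 charset2uni array:
 * one row of 0x100 - 0x40 = 192 entries per lead byte 0x81..0xFE. */
static long charset2uni_index(unsigned char hi, unsigned char lo)
{
    if (hi < 0x81 || hi > 0xFE || lo < 0x40)
        return -1;
    return (long)(hi - 0x81) * (0x100 - 0x40) + (lo - 0x40);
}
```

The two-level layout keeps the Unicode-to-charset direction compact: only the 256-entry pointer array is always present, and a 512-byte page exists only for high bytes that actually occur in the mapping file.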

5. Appendix

5.1 Author and related documents

CodePage 950 support was made by Mr. Cosmos in Taiwan; homepage: http://www.cis.nctu.edu.tw:8080/~is84086/project/kernel_cp950/

GBK (CP936) support was made by Fang Han and Chen Xiangyang of the TurboLinux Chinese R&D team.

5.2 genmap

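```sh
#!/bin/sh
# 1) drop comment lines, 2) strip the "0x" prefixes,
# 3) keep only pairs whose hex fields have equal length (double-byte codes)
cat "$1" | awk '{ if (index($1, "#") == 0) print $0 }' \
    | awk 'BEGIN { FS = "0x" } { print $2 $3 }' \
    | awk '{ if (length($1) == length($2)) print $1, $2 }'
```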
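The genmap pipeline keeps only lines whose two hex fields have the same number of digits, which drops comments and single-byte entries such as `0x41 0x0041`. The same filter for one line, sketched in C with a hypothetical function name (the comment test is a simplification of the awk `index($1, "#")` check):

```c
#include <stdio.h>

/* Apply genmap's filter to one line such as "0x8140\t0x4E02\t#CJK ...".
 * Returns 1 and fills *code / *ucs when both hex fields have the same
 * number of digits; returns 0 for comments, blank lines and single-byte
 * entries like "0x41\t0x0041". */
static int genmap_line(const char *line, unsigned *code, unsigned *ucs)
{
    int s1 = -1, e1 = -1, s2 = -1, e2 = -1;

    if (line[0] == '#')                      /* stage 1: skip comments */
        return 0;
    /* stage 2: read both fields after "0x"; %n records where each hex
     * field starts and ends (it does not count as a conversion) */
    if (sscanf(line, "0x%n%x%n 0x%n%x%n", &s1, code, &e1, &s2, ucs, &e2) != 2)
        return 0;
    return (e1 - s1) == (e2 - s2);           /* stage 3: equal lengths */
}
```

Lines that pass correspond to the "code unicode" pairs that genmap prints and the Perl scripts below read with `split`.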

5.3 uni2big5.pl

```perl
#!/usr/bin/perl
# Reads "charset unicode" hex pairs (genmap output) and prints nls_cp950.c.

@code = map { sprintf("%02X", $_) } 0 .. 255;     # "00" .. "FF"

while (<>) {
    ($charset, $unicode) = split;
    ($high, $low) = $charset =~ /(..)(..)/;
    $table2{$high}{$low} = $unicode;              # Big5 -> Unicode
    ($high, $low) = $unicode =~ /(..)(..)/;
    $table{$high}{$low} = $charset;               # Unicode -> Big5
}

print <<EOF;
/*
 * linux/fs/nls_cp950.c
 *
 * Charset cp950 translation tables.
 * Generated automatically from the Unicode and charset
 * tables from the Unicode Organization (www.unicode.org).
 * The Unicode to charset table has only exact mappings.
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/nls.h>

/* a1 - f9 */
static struct nls_unicode charset2uni[(0xf9-0xa1+1)*(0x100-0x60)] = {
EOF

# Trail bytes 0x40-0x7f and 0xa0-0xff: 64 + 96 = 0x100-0x60 entries per row
for ($high = 0xa1; $high <= 0xf9; $high++) {
    foreach $low (0x40 .. 0x7f, 0xa0 .. 0xff) {
        $unicode = $table2{$code[$high]}{$code[$low]};
        $unicode = "0000" if (!defined $unicode);
        print "\n\t" if ($low % 4 == 0);
        print "/* $code[$high]$code[$low] */\n\t" if ($low % 0x10 == 0);
        ($uhigh, $ulow) = $unicode =~ /(..)(..)/;
        printf("{0x%2s, 0x%2s}, ", $ulow, $uhigh);
    }
}
print "\n};\n\n";

# One 512-byte page (two Big5 bytes per entry) per Unicode high byte
for ($high = 1; $high <= 255; $high++) {
    next unless defined $table{$code[$high]};
    print "static unsigned char page$code[$high]\[512\] = {\n\t";
    for ($low = 0; $low <= 255; $low++) {
        $charset = $table{$code[$high]}{$code[$low]};
        $charset = "3f3f" if (!defined $charset);     # "??" when unmapped
        printf("/* 0x%02x-0x%02x */\n\t", $low - 4, $low - 1)
            if ($low > 0 && $low % 4 == 0);
        print "\n\t" if ($low == 0x80);
        ($chigh, $clow) = $charset =~ /(..)(..)/;
        printf("0x%2s, 0x%2s, ", $chigh, $clow);
    }
    print "/* 0xfc-0xff */\n};\n\n";
}

print "static unsigned char *page_uni2charset[256] = {";
for ($high = 0; $high <= 255; $high++) {
    print "\n\t" if ($high % 8 == 0);
    if ($high > 0 && defined $table{$code[$high]}) {
        print "page$code[$high], ";
    } else {
        print "NULL, ";
    }
}

print <<EOF;
};

static unsigned char charset2upper[256] = {
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* 0x00-0x07 */
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, /* 0x08-0x0f */
	0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, /* 0x10-0x17 */
	0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, /* 0x18-0x1f */
	0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, /* 0x20-0x27 */
	0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, /* 0x28-0x2f */
	0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, /* 0x30-0x37 */
	0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, /* 0x38-0x3f */
	0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, /* 0x40-0x47 */
	0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, /* 0x48-0x4f */
	0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, /* 0x50-0x57 */
	0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, /* 0x58-0x5f */
	0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */
	0x00, 0x00, 0x00, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, /* 0x78-0x7f */
	0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, /* 0x80-0x87 */
	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, /* 0x88-0x8f */
	0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, /* 0x90-0x97 */
	0x98, 0x99, 0x9a, 0x00, 0x9c, 0x00, 0x00, 0x00, /* 0x98-0x9f */
	0x00, 0x00, 0x00, 0x00, 0xa4, 0xa5, 0xa6, 0xa7, /* 0xa0-0xa7 */
	0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, /* 0xa8-0xaf */
	0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, /* 0xb0-0xb7 */
	0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, /* 0xb8-0xbf */
	0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, /* 0xc0-0xc7 */
	0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, /* 0xc8-0xcf */
	0xd0, 0xd1, 0xd2, 0x00, 0x00, /* 0xd0-0xd7 (incomplete in the source listing) */
	0x00, 0xd9, 0x00, 0x00, 0xdf, /* 0xd8-0xdf (incomplete in the source listing) */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */
	0xf0, 0xf1, 0x00, 0x00, 0x00, 0xf5, 0x00, 0xf7, /* 0xf0-0xf7 */
	0xf8, 0xf9, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, /* 0xf8-0xff */
};

static void inc_use_count(void)
{
	MOD_INC_USE_COUNT;
}

static void dec_use_count(void)
{
	MOD_DEC_USE_COUNT;
}

static struct nls_table table = {
	"cp950",
	page_uni2charset,
	charset2uni,
	inc_use_count,
	dec_use_count,
	NULL
};

int init_nls_cp950(void)
{
	return register_nls(&table);
}

#ifdef MODULE
int init_module(void)
{
	return init_nls_cp950();
}

void cleanup_module(void)
{
	unregister_nls(&table);
	return;
}
#endif

/*
 * Overrides for Emacs so that we follow Linus's tabbing style.
 * Emacs will notice this stuff at the end of the file and automatically
 * adjust the settings for this buffer only.  This must remain at the end
 * of the file.
 * ---------------------------------------------------------------------------
 * Local variables:
 * c-indent-level: 8
 * c-brace-imaginary-offset: 0
 * c-brace-offset: -8
 * c-argdecl-indent: 8
 * c-label-offset: -8
 * c-continued-statement-offset: 8
 * c-continued-brace-offset: 0
 * End:
 */
EOF
```

5.4 uni2gbk.pl

```perl
#!/usr/bin/perl
# Reads "charset unicode" hex pairs (genmap output) and prints nls_cp936.c.

@code = map { sprintf("%02X", $_) } 0 .. 255;     # "00" .. "FF"

while (<>) {
    ($charset, $unicode) = split;
    ($high, $low) = $charset =~ /(..)(..)/;
    $table2{$high}{$low} = $unicode;              # GBK -> Unicode
    ($high, $low) = $unicode =~ /(..)(..)/;
    $table{$high}{$low} = $charset;               # Unicode -> GBK
}

print <<EOF;
/*
 * linux/fs/nls_cp936.c
 *
 * Charset cp936 translation tables.
 * Generated automatically from the Unicode and charset
 * tables from the Unicode Organization (www.unicode.org).
 * The Unicode to charset table has only exact mappings.
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/nls.h>

/* 81 - fe */
static struct nls_unicode charset2uni[(0xfe-0x81+1)*(0x100-0x40)] = {
EOF

# Trail bytes 0x40-0xff: 0x100-0x40 = 192 entries per row
for ($high = 0x81; $high <= 0xfe; $high++) {
    foreach $low (0x40 .. 0xff) {
        $unicode = $table2{$code[$high]}{$code[$low]};
        $unicode = "0000" if (!defined $unicode);
        print "\n\t" if ($low % 4 == 0);
        print "/* $code[$high]$code[$low] */\n\t" if ($low % 0x10 == 0);
        ($uhigh, $ulow) = $unicode =~ /(..)(..)/;
        printf("{0x%2s, 0x%2s}, ", $ulow, $uhigh);
    }
}
print "\n};\n\n";

# One 512-byte page (two GBK bytes per entry) per Unicode high byte
for ($high = 1; $high <= 255; $high++) {
    next unless defined $table{$code[$high]};
    print "static unsigned char page$code[$high]\[512\] = {\n\t";
    for ($low = 0; $low <= 255; $low++) {
        $charset = $table{$code[$high]}{$code[$low]};
        $charset = "3f3f" if (!defined $charset);     # "??" when unmapped
        printf("/* 0x%02x-0x%02x */\n\t", $low - 4, $low - 1)
            if ($low > 0 && $low % 4 == 0);
        print "\n\t" if ($low == 0x80);
        ($chigh, $clow) = $charset =~ /(..)(..)/;
        printf("0x%2s, 0x%2s, ", $chigh, $clow);
    }
    print "/* 0xfc-0xff */\n};\n\n";
}

print "static unsigned char *page_uni2charset[256] = {";
for ($high = 0; $high <= 255; $high++) {
    print "\n\t" if ($high % 8 == 0);
    if ($high > 0 && defined $table{$code[$high]}) {
        print "page$code[$high], ";
    } else {
        print "NULL, ";
    }
}

print <<EOF;
};

static unsigned char charset2upper[256] = {
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* 0x00-0x07 */
	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, /* 0x08-0x0f */
	0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, /* 0x10-0x17 */
	0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, /* 0x18-0x1f */
	0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, /* 0x20-0x27 */
	0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, /* 0x28-0x2f */
	0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, /* 0x30-0x37 */
	0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, /* 0x38-0x3f */
	0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, /* 0x40-0x47 */
	0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, /* 0x48-0x4f */
	0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, /* 0x50-0x57 */
	0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, /* 0x58-0x5f */
	0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x60-0x67 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x68-0x6f */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0x70-0x77 */
	0x00, 0x00, 0x00, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, /* 0x78-0x7f */
	0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, /* 0x80-0x87 */
	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, /* 0x88-0x8f */
	0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, /* 0x90-0x97 */
	0x98, 0x99, 0x9a, 0x00, 0x9c, 0x00, 0x00, 0x00, /* 0x98-0x9f */
	0x00, 0x00, 0x00, 0x00, 0xa4, 0xa5, 0xa6, 0xa7, /* 0xa0-0xa7 */
	0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, /* 0xa8-0xaf */
	0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, /* 0xb0-0xb7 */
	0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, /* 0xb8-0xbf */
	0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, /* 0xc0-0xc7 */
	0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, /* 0xc8-0xcf */
	0xd0, 0xd1, 0xd2, 0x00, 0x00, /* 0xd0-0xd7 (incomplete in the source listing) */
	0x00, 0xd9, 0x00, 0x00, 0xdf, /* 0xd8-0xdf (incomplete in the source listing) */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe0-0xe7 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0xe8-0xef */
	0xf0, 0xf1, 0x00, 0x00, 0x00, 0xf5, 0x00, 0xf7, /* 0xf0-0xf7 */
	0xf8, 0xf9, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, /* 0xf8-0xff */
};

static void inc_use_count(void)
{
	MOD_INC_USE_COUNT;
}

static void dec_use_count(void)
{
	MOD_DEC_USE_COUNT;
}

static struct nls_table table = {
	"cp936",
	page_uni2charset,
	charset2uni,
	inc_use_count,
	dec_use_count,
	NULL
};

int init_nls_cp936(void)
{
	return register_nls(&table);
}

#ifdef MODULE
int init_module(void)
{
	return init_nls_cp936();
}

void cleanup_module(void)
{
	unregister_nls(&table);
	return;
}
#endif

/*
 * Overrides for Emacs so that we follow Linus's tabbing style.
 * Emacs will notice this stuff at the end of the file and automatically
 * adjust the settings for this buffer only.  This must remain at the end
 * of the file.
 * ---------------------------------------------------------------------------
 * Local variables:
 * c-indent-level: 8
 * c-brace-imaginary-offset: 0
 * c-brace-offset: -8
 * c-argdecl-indent: 8
 * c-label-offset: -8
 * c-continued-statement-offset: 8
 * c-continued-brace-offset: 0
 * End:
 */
EOF
```

5.5 CodePage conversion tool

```c
/*
 * cpi.c: a program to examine MS-DOS codepage files (*.cpi)
 * and extract specific codepages.
 * Compiles under Linux & DOS (using BC 3.1).
 *
 * Compile: gcc -o cpi cpi.c
 * Call:    codepage file.cpi [-a|-l|nnn]
 *
 * Author:  Ahmed M. Naas (ahmed@oea.xs4all.nl)
 * Many changes: aeb@cwi.nl [changed until it would handle all
 *    *.cpi files people have sent me; I have no documentation,
 *    so all this is experimental]
 * Remains to do: DR-DOS fonts.
 *
 * Copyright: public domain.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>

int handle_codepage(int);
void handle_fontfile(void);

#define PACKED __attribute__ ((packed))
/* Use this (instead of the above) to compile under MSDOS */
/* #define PACKED */

/* 4-byte on-disk fields use uint32_t: unsigned long is 8 bytes on
 * 64-bit Linux and would break the header layout. */
struct {
	unsigned char id[8] PACKED;
	unsigned char res[8] PACKED;
	unsigned short num_pointers PACKED;
	unsigned char p_type PACKED;
	uint32_t offset PACKED;
} FontFileHeader;

struct {
	unsigned short num_codepages PACKED;
} FontInfoHeader;

struct {
	unsigned short size PACKED;
	uint32_t off_nexthdr PACKED;
	unsigned short device_type PACKED;	/* screen=1; printer=2 */
	unsigned char device_name[8] PACKED;
	unsigned short codepage PACKED;
	unsigned char res[6] PACKED;
	uint32_t off_font PACKED;
} CPEntryHeader;

struct {
	unsigned short reserved PACKED;
	unsigned short num_fonts PACKED;
	unsigned short size PACKED;
} CPInfoHeader;

struct {
	unsigned char height PACKED;
	unsigned char width PACKED;
	unsigned short reserved PACKED;
	unsigned short num_chars PACKED;
} ScreenFontHeader;

struct {
	unsigned short p1 PACKED;
	unsigned short p2 PACKED;
} PrinterFontHeader;

FILE *in, *out;
void usage(void);

int opta, optc, optl, optL, optx;
extern int optind;
extern char *optarg;

unsigned short codepage;

int main(int argc, char *argv[])
{
	if (argc < 2)
		usage();

	if ((in = fopen(argv[1], "rb")) == NULL) {
		printf("\nUnable to open file %s.\n", argv[1]);
		exit(0);
	}

	opta = optc = optl = optL = optx = 0;
	optind = 2;
	if (argc == 2)
		optl = 1;
	else
		while (1) {
			switch (getopt(argc, argv, "alLc")) {
			case 'a': opta = 1; continue;
			case 'c': optc = 1; continue;
			case 'L': optL = 1; continue;
			case 'l': optl = 1; continue;
			case '?':
			default:  usage();
			case -1:  break;
			}
			break;
		}

	if (optind != argc) {
		if (optind != argc - 1 || opta)
			usage();
		codepage = atoi(argv[optind]);
		optx = 1;
	}

	if (optc)
		handle_codepage(0);
	else
		handle_fontfile();

	if (optx) {
		printf("no page %d found\n", codepage);
		exit(1);
	}

	fclose(in);
	return 0;
}

void handle_fontfile(void)
{
	int i, j;

	j = fread(&FontFileHeader, 1, sizeof(FontFileHeader), in);
	if (j != sizeof(FontFileHeader)) {
		printf("error reading FontFileHeader - got %d chars\n", j);
		exit(1);
	}
	/* id is not NUL-terminated, so compare a fixed number of bytes */
	if (!memcmp(FontFileHeader.id + 1, "DRFONT", 6)) {
		printf("this program cannot handle DRDOS font files\n");
		exit(1);
	}
	if (optL)
		printf("FontFileHeader: id=%8.8s res=%8.8s num=%d typ=%c offset=%ld\n\n",
		       FontFileHeader.id, FontFileHeader.res,
		       FontFileHeader.num_pointers, FontFileHeader.p_type,
		       (long)FontFileHeader.offset);

	j = fread(&FontInfoHeader, 1, sizeof(FontInfoHeader), in);
	if (j != sizeof(FontInfoHeader)) {
		printf("error reading FontInfoHeader - got %d chars\n", j);
		exit(1);
	}
	if (optL)
		printf("FontInfoHeader: num_codepages=%d\n\n",
		       FontInfoHeader.num_codepages);

	for (i = FontInfoHeader.num_codepages; i; i--)
		if (handle_codepage(i - 1))
			break;
}

int handle_codepage(int more_to_come)
{
	int j;
	char outfile[20];
	unsigned char *fonts;
	long inpos, nexthdr;

	j = fread(&CPEntryHeader, 1, sizeof(CPEntryHeader), in);
	if (j != sizeof(CPEntryHeader)) {
		printf("error reading CPEntryHeader - got %d chars\n", j);
		exit(1);
	}
	if (optL) {
		int t = CPEntryHeader.device_type;
		printf("CPEntryHeader: size=%d dev=%d [%s] name=%8.8s codepage=%d\n"
		       "\t\tres=%6.6s nxt=%ld off_font=%ld\n\n",
		       CPEntryHeader.size, t,
		       (t == 1) ? "screen" : (t == 2) ? "printer" : "?",
		       CPEntryHeader.device_name, CPEntryHeader.codepage,
		       CPEntryHeader.res,
		       (long)CPEntryHeader.off_nexthdr, (long)CPEntryHeader.off_font);
	} else if (optl) {
		printf("\nCodepage = %d\n", CPEntryHeader.codepage);
		printf("Device = %.8s\n", CPEntryHeader.device_name);
	}
#if 0
	if (CPEntryHeader.size != sizeof(CPEntryHeader)) {
		/* seen 26 and 28, so that the difference below is -2 or 0 */
		if (optl)
			printf("Skipping %d bytes of garbage\n",
			       (int)(CPEntryHeader.size - sizeof(CPEntryHeader)));
		fseek(in, CPEntryHeader.size - sizeof(CPEntryHeader), SEEK_CUR);
	}
#endif
	if (!opta && (!optx || CPEntryHeader.codepage != codepage) && !optc)
		goto next;

	inpos = ftell(in);
	if (inpos != CPEntryHeader.off_font && !optc) {
		if (optL)
			printf("pos=%ld font at %ld\n", inpos, (long)CPEntryHeader.off_font);
		fseek(in, CPEntryHeader.off_font, SEEK_SET);
	}

	j = fread(&CPInfoHeader, 1, sizeof(CPInfoHeader), in);
	if (j != sizeof(CPInfoHeader)) {
		printf("error reading CPInfoHeader - got %d chars\n", j);
		exit(1);
	}
	if (optl) {
		printf("Number of Fonts = %d\n", CPInfoHeader.num_fonts);
		printf("Size of Bitmap = %d\n", CPInfoHeader.size);
	}
	if (CPInfoHeader.num_fonts == 0)
		goto next;
	if (optc)
		return 0;

	sprintf(outfile, "%d.cp", CPEntryHeader.codepage);
	if ((out = fopen(outfile, "wb")) == NULL) {
		printf("\nUnable to open file %s.\n", outfile);
		exit(1);
	} else
		printf("\nWriting %s\n", outfile);

	fonts = (unsigned char *)malloc(CPInfoHeader.size);
	fread(fonts, CPInfoHeader.size, 1, in);
	fwrite(&CPEntryHeader, sizeof(CPEntryHeader), 1, out);
	fwrite(&CPInfoHeader, sizeof(CPInfoHeader), 1, out);
	j = fwrite(fonts, 1, CPInfoHeader.size, out);
	if (j != CPInfoHeader.size) {
		printf("error writing %s - wrote %d chars\n", outfile, j);
		exit(1);
	}
	fclose(out);
	free(fonts);
	if (optx)
		exit(0);

next:
	/*
	 * It seems that if entry headers and fonts are interspersed,
	 * then nexthdr will point past the font, regardless of
	 * whether more entries follow; [...] first all entry headers
	 * are given, and then all fonts; in this case, [...]
	 */
	nexthdr = CPEntryHeader.off_nexthdr;
	if (CPEntryHeader.off_nexthdr == 0 || CPEntryHeader.off_nexthdr == 0xFFFFFFFFu) {
		if (more_to_come) {
			printf("more codepages expected, but nexthdr=%ld\n", nexthdr);
			exit(1);
		} else
			return 1;
	}

	inpos = ftell(in);
	if (inpos != nexthdr) {
		if (optL)
			printf("pos=%ld nexthdr at %ld\n", inpos, nexthdr);
		if (opta && !more_to_come) {
			printf("no more code pages, but nexthdr != 0\n");
			return 1;
		}
		/* ... */
```
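Reading the CPI header field by field from a byte buffer, instead of using fread into packed structs, avoids depending on compiler packing and on the width of integer types. The following sketch assumes the 23-byte on-disk layout implied by the FontFileHeader struct in cpi.c (id[8], res[8], a 16-bit count, a type byte and a 32-bit offset, little-endian); the struct and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* In-memory copy of the FontFileHeader fields (no packing needed). */
struct cpi_file_header {
    unsigned char id[8];
    unsigned char res[8];
    uint16_t num_pointers;
    uint8_t p_type;
    uint32_t offset;
};

#define CPI_FILE_HEADER_SIZE 23   /* 8 + 8 + 2 + 1 + 4 bytes on disk */

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if fewer than 23 bytes are available. */
static int parse_cpi_header(const unsigned char *buf, size_t len,
                            struct cpi_file_header *h)
{
    if (len < CPI_FILE_HEADER_SIZE)
        return -1;
    memcpy(h->id, buf, 8);
    memcpy(h->res, buf + 8, 8);
    h->num_pointers = le16(buf + 16);
    h->p_type = buf[18];
    h->offset = le32(buf + 19);
    return 0;
}

/* DR-DOS font files: id[1..6] is "DRFONT" (id is not NUL-terminated). */
static int is_drdos_font(const struct cpi_file_header *h)
{
    return memcmp(h->id + 1, "DRFONT", 6) == 0;
}
```

A caller would fread the first 23 bytes of the .cpi file into a buffer and pass it to `parse_cpi_header`; the same byte-wise approach extends to the CPEntryHeader and CPInfoHeader records.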

Please credit the original address when reposting: https://www.9cbs.com/read-121010.html
