Component-free ASP function for converting Chinese text between Simplified and Traditional

xiaoxiao  2021-03-06

As ASP developers, we often meet customers who need a Traditional Chinese version of their website. Building two websites is troublesome, and customers are unwilling to update both, so they ask for an ASP function that converts automatically between GB2312 and BIG5. Please contact me if you need one.

Test URL for Simplified-to-Traditional conversion

http://www.abkk.com/cn/online_tools/gb2312_1.asp

Test URL for Traditional-to-Simplified conversion

http://www.abkk.com/cn/online_tools/big5_1.asp

---------------------------------------------------------------

In fact, the conversion can be done without any component library.

Converting between GB code and BIG5 code

English text uses ASCII, with one byte per character, whereas Chinese uses two bytes per character. What a text file actually stores is the two-byte code of each Chinese character; displaying those codes as characters is handled automatically by the Chinese operating system.

Chinese character encodings are not uniform: we use GB code, while Taiwan uses BIG5 code. A BIG5 file stores the BIG5 code of each Chinese character, and a GB file stores the GB code. The key to the conversion is therefore a code table file that records which code in one encoding corresponds to each code in the other.
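The difference in byte length and in code assignment can be seen with a short Python sketch. It assumes Python's standard `gb2312` and `big5` codecs and uses the character 中 purely as an illustration; neither appears in the original article.

```python
# ASCII uses one byte per character; GB2312 and BIG5 use two,
# and the two Chinese encodings assign different bytes to the same character.
ascii_bytes = "A".encode("ascii")
gb_bytes = "中".encode("gb2312")
big5_bytes = "中".encode("big5")

# Byte lengths of the three encoded forms.
print(len(ascii_bytes), len(gb_bytes), len(big5_bytes))
# Hexadecimal view of the GB code and the BIG5 code of the same character.
print(gb_bytes.hex(), big5_bytes.hex())
```

Because the two codes differ for the same character, a lookup table is needed to go from one to the other.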

The GB coding rule is as follows: each Chinese character consists of two bytes. The first byte ranges over 0xA1-0xFE, 94 values in total, and the second byte also ranges over 0xA1-0xFE, 94 values in total. These two bytes can therefore define 94 * 94 = 8836 codes, of which 6763 are Chinese characters.
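The range arithmetic can be checked with a few lines of Python. This is only a sketch; the helper name `is_gb_pair` is made up for illustration.

```python
# The byte range 0xA1..0xFE (inclusive) holds 94 values, so two such
# bytes give 94 * 94 = 8836 possible codes.
first_bytes = range(0xA1, 0xFE + 1)
second_bytes = range(0xA1, 0xFE + 1)

code_points = len(first_bytes) * len(second_bytes)
print(len(first_bytes), len(second_bytes), code_points)  # 94 94 8836


def is_gb_pair(b1: int, b2: int) -> bool:
    """Return True if (b1, b2) lies inside the two-byte GB range."""
    return b1 in first_bytes and b2 in second_bytes
```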

The BIG5 coding rule is as follows: each Chinese character consists of two bytes. The first byte ranges over 0x81-0xFE, 126 values in total. The second byte ranges over 0x40-0x7E and 0xA1-0xFE, 157 values in total. These two bytes can therefore define 126 * 157 = 19782 codes. Some of these characters are in frequent use, such as "one" and "Dan"; they are called common characters, and their BIG5 codes run from 0xA440 to 0xC67E, 5401 in total. Less frequently used characters, such as "abuse" and "adjust", are called less common characters; they run from 0xC940 to 0xF9D5, 7652 in total. The rest are special characters.
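The counts can be reproduced by enumerating the valid byte pairs. The sketch below assumes the block boundaries 0xA440-0xC67E and 0xC940-0xF9D5, which are the end points that yield the stated totals of 5401 and 7652; `count_codes` is a hypothetical helper.

```python
lead_bytes = range(0x81, 0xFE + 1)                                # 126 values
trail_bytes = list(range(0x40, 0x7E + 1)) + list(range(0xA1, 0xFE + 1))  # 157 values

total_codes = len(lead_bytes) * len(trail_bytes)                  # 19782


def count_codes(start: int, end: int) -> int:
    """Count valid BIG5 codes between start and end, inclusive."""
    count = 0
    for hi in range(start >> 8, (end >> 8) + 1):
        for lo in trail_bytes:
            if start <= ((hi << 8) | lo) <= end:
                count += 1
    return count


common = count_codes(0xA440, 0xC67E)        # frequently used characters
less_common = count_codes(0xC940, 0xF9D5)   # less frequently used characters
print(total_codes, common, less_common)     # 19782 5401 7652
```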

The code table file is made as follows: first write all GB codes into a file, then convert that file to BIG5 with software that supports GB-to-BIG5 conversion, such as Convert.exe under UCDOS. The resulting BIG5 file is the code table file.

The following program writes all GB codes into the file gb.txt. (All programs below are written in FoxPro and can easily be converted to other languages.)

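fp = FCREATE("gb.txt")              && create the output file and open it
FOR i = 161 TO 247                  && first byte, 0xA1-0xF7
  FOR j = 161 TO 254                && second byte, 0xA1-0xFE
    = FWRITE(fp, CHR(i) + CHR(j))
  NEXT
  = FWRITE(fp, CHR(13) + CHR(10))   && end each row with CR LF
NEXT
= FWRITE(fp, CHR(26))               && DOS end-of-file marker
= FCLOSE(fp)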
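For readers who do not use FoxPro, the same table could be generated roughly like this in Python. This is a sketch, not the author's code: the function names are made up, and the trailing CHR(26) end-of-file marker is left out because Python does not need it.

```python
def build_gb_table() -> bytes:
    """Return every GB code from 0xA1A1 to 0xF7FE, one first byte per row."""
    rows = []
    for i in range(161, 248):            # first byte, 0xA1..0xF7
        row = bytearray()
        for j in range(161, 255):        # second byte, 0xA1..0xFE
            row += bytes((i, j))
        rows.append(bytes(row) + b"\r\n")  # each row ends with CR LF
    return b"".join(rows)


def write_gb_table(path: str = "gb.txt") -> None:
    """Write the table to disk in binary mode so no bytes are translated."""
    with open(path, "wb") as f:
        f.write(build_gb_table())
```

Each row holds 94 two-byte codes plus CR LF, so every row is 190 bytes long and the table has 87 rows.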

How the file is organized: each row corresponds to the first byte of the code, and each column position to the second byte. Mind the offsets. For example, the Chinese character "ah" (啊) has GB code 0xB0A1, with first byte 0xB0 (176) and second byte 0xA1 (161), so, counting from zero, it sits in row 176 - 161 = 15 (the 16th line of the file) at column (161 - 161) * 2 = 0.
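The offset calculation can be written as a small function. The sketch below counts rows and columns from zero and uses Python's `gb2312` codec to obtain the bytes of 啊; the name `table_position` is hypothetical.

```python
def table_position(gb_code: bytes):
    """Return the zero-based (row, column) of a two-byte GB code in the table."""
    first, second = gb_code
    row = first - 0xA1             # one row per first byte, starting at 0xA1
    column = (second - 0xA1) * 2   # two bytes per entry within a row
    return row, column


# The character 啊 is the first Chinese character in GB2312.
print(table_position("啊".encode("gb2312")))  # (15, 0)
```

Row 15 counted from zero is the 16th line of the file, which is why the FoxPro code, where arrays start at 1, adds 1 to the row index.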

Run Convert.exe to convert gb.txt into a BIG5 file; the result is the BIG5 code table file big5.txt, organized by GB code. A GB code table file organized by BIG5 code can be obtained just as easily. The conversion works as follows (written in FoxPro):
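One possible way to get the reverse mapping without a second table file is to invert the GB-organized table in memory. The sketch below is an assumption about how that could be done, not the article's method: it builds a Python dictionary rather than a BIG5-organized file, and `invert_table` is a made-up name.

```python
def invert_table(rows):
    """Build a BIG5 -> GB lookup from a table organized by GB code.

    rows[r] holds the BIG5 codes for GB first byte 0xA1 + r,
    two bytes per entry, in order of the GB second byte.
    """
    reverse = {}
    for r, row in enumerate(rows):
        for c in range(0, len(row) - 1, 2):
            big5 = row[c:c + 2]
            gb = bytes((0xA1 + r, 0xA1 + c // 2))
            # Keep the first GB code if several map to the same BIG5 code.
            reverse.setdefault(big5, gb)
    return reverse
```

For instance, with a one-row toy table `[b"\xa4\x40\xa4\x41"]`, the BIG5 code `b"\xa4\x41"` maps back to the GB code `b"\xa1\xa2"`.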

First load the code table file into an array

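fp = FOPEN("big5.txt")
i = 0
DO WHILE !FEOF(fp)        && read until end of file
  i = i + 1
  DIMENSION dict[i]       && grow the array by one element
  dict[i] = FGETS(fp)     && one row of the table per array element
ENDDO
= FCLOSE(fp)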
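A rough Python counterpart of this loading step is shown below. It assumes the table file uses CR LF row endings and may end with a 0x1A end-of-file byte, as the generating program writes it; `parse_table` and `load_table` are hypothetical names.

```python
def parse_table(data: bytes):
    """Split raw table bytes into rows, one row per GB first byte."""
    rows = data.rstrip(b"\x1a").split(b"\r\n")
    if rows and rows[-1] == b"":
        rows.pop()                 # drop the empty piece after the last CR LF
    return rows


def load_table(path: str = "big5.txt"):
    """Read the code table file in binary mode and return its rows."""
    with open(path, "rb") as f:
        return parse_table(f.read())
```

Splitting on CR LF is safe here because neither byte of a BIG5 code can be 0x0D or 0x0A. Unlike the FoxPro array, the returned list is indexed from 0.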

Second, read the text to be converted into a variable:

CREATE CURSOR temp (mm M)
APPEND BLANK
APPEND MEMO mm FROM TextFileName   && TextFileName stands for the file to convert
text = mm

Then scan the text and replace every GB code with its BIG5 code:

temp = ""
i = 1
DO WHILE i <= LEN(text)
  ch = SUBSTR(text, i, 1)
  IF ASC(ch) < 128          && ASCII character: copy it unchanged
    temp = temp + ch
    i = i + 1
  ELSE                      && first byte of a two-byte GB code
    ch1 = SUBSTR(text, i + 1, 1)
    big5 = SUBSTR(dict[ASC(ch) - 161 + 1], (ASC(ch1) - 161) * 2 + 1, 2)
    temp = temp + big5
    i = i + 2
  ENDIF
ENDDO

Finally, the converted text is in temp.
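The scanning loop translates to Python along these lines. This is a sketch under assumptions: the table is a list of byte rows indexed from 0, the function name `gb_to_big5` is made up, and bytes that fall outside the table are copied unchanged, a guard the FoxPro version does not have.

```python
def gb_to_big5(text: bytes, table) -> bytes:
    """Replace each two-byte GB code in text using the row/column table."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        row = ch - 0xA1
        if ch < 0x80 or i + 1 >= len(text) or not 0 <= row < len(table):
            out.append(ch)               # ASCII or unmappable byte: copy it
            i += 1
            continue
        col = (text[i + 1] - 0xA1) * 2
        if col >= 0 and col + 2 <= len(table[row]):
            out += table[row][col:col + 2]   # look up the BIG5 code
        else:
            out += text[i:i + 2]         # outside the table: keep the pair
        i += 2
    return bytes(out)


# Toy table with a single row of two invented entries, for illustration only.
toy_table = [b"\xa4\x40\xa4\x41"]
print(gb_to_big5(b"A\xa1\xa2B", toy_table))  # b'A\xa4AB'
```

With a real table loaded from big5.txt, the same function would be called on the raw bytes of the GB text file.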

Note that in FoxPro array indexes start at 1, and the start position passed to the SUBSTR function must be >= 1.

The FoxPro code should be easy enough to follow.

When reposting, please cite the original address: https://www.9cbs.com/read-106030.html
