Internal code conversion techniques

zhaozj2021-02-17

Loosely speaking, GB (GB2312) is the simplified-Chinese part of GBK.

3. "Traditional" is not the same as BIG5. GBK also contains traditional characters, and the GB12345 set is a traditional-character set. The three sets encode Chinese characters differently, however. Windows 95/98/NT/2000 (Simplified Chinese editions) use the GBK character set, and the Traditional Chinese editions use BIG5. The Simplified edition has no BIG5 character set, so it cannot display BIG5 characters, and the Traditional edition cannot display GB characters.

4. When IE opens a BIG5-encoded website (for example, a Taiwanese site) and BIG5 support is installed, IE converts the BIG5 page into GBK traditional characters for display, with no garbling. While IE is displaying in GBK, the characters you type into the page should be GBK traditional. While it is displaying BIG5 (which appears garbled), entering BIG5 characters directly produces garbage; instead, type the text in GBK simplified (GB code), convert it to BIG5 with a small utility, then copy and paste it.

5. Common small utilities can convert BIG5 to GBK, but few convert between simplified and traditional within GBK. The reason is that these utilities are built on a mapping between the GB2312 and BIG5 character sets.

Third, the principle of internal code conversion

Internal code conversion works by establishing a correspondence between different character sets. Take GBK2BIG5 as an example. The character 让 is encoded as C8C3 in GBK. If every character in the GBK code table is converted into BIG5 format, the slot at C8C3 should hold the BIG5 code of 讓, the traditional form of 让. (Viewed in a GBK environment, those BIG5 bytes show up as unrelated, garbled characters.) To convert a text, we read it, look up each character's position in the GBK code table (already converted into BIG5 format), take the bytes stored at that position, and replace the original character with them. Reading and writing are not a problem. The key questions are how to locate a character in the code-table file, and how to turn the pure GBK code table into a GBK code table expressed in BIG5.

Problem 1: locating a character.

[GBK code table arranged in code order, lead bytes 81 through FE; it begins 丂 丄 丅 丆 丏 丒 丗 ... The table is not reproduced here.]
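The pure GBK code table in code order does not have to be typed in by hand. Below is a minimal C++ sketch of generating it, assuming the table file holds nothing but two bytes per code position (no header, no line breaks) and that every position in the 126 × 190 grid is listed, including unassigned ones, so that slot n always sits at byte offset 2 * n. The function names `build_gbk_order_table` and `write_gbk_order_table` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the pure GBK code table in code order: for every lead byte
// 0x81-0xFE, every trail byte 0x40-0xFE except 0x7F, two bytes per slot.
// Assumption: unassigned positions are listed too, so that slot n is
// always at byte offset 2 * n.
std::string build_gbk_order_table() {
    std::string table;
    table.reserve(126 * 190 * 2);
    for (int ch1 = 0x81; ch1 <= 0xFE; ++ch1) {
        for (int ch2 = 0x40; ch2 <= 0xFE; ++ch2) {
            if (ch2 == 0x7F) continue;  // 0x7F is not a GBK trail byte
            table.push_back(static_cast<char>(ch1));
            table.push_back(static_cast<char>(ch2));
        }
    }
    assert(table.size() == 126 * 190 * 2);  // 23940 slots, 47880 bytes
    return table;
}

// Writes the table to a file (hypothetical helper); returns false on error.
bool write_gbk_order_table(const char* path) {
    const std::string table = build_gbk_order_table();
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) return false;
    const bool written = std::fwrite(table.data(), 1, table.size(), f) == table.size();
    const bool closed = std::fclose(f) == 0;
    return written && closed;
}
```

The resulting file is the kind of input a converter such as Oriental Express would then turn into BIG5 format. Listing unassigned slots keeps the layout regular, at the cost of a few meaningless entries.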

The table above arranges the GBK codes in code order: 126 rows (lead bytes 0x81-0xFE) of 190 characters each (trail bytes 0x40-0x7E and 0x80-0xFE; 0x7F is not used). The position of a character with lead byte ch1 and trail byte ch2 is calculated as follows:

    posit = (ch1 - 129) * 190 + (ch2 - 64) - (ch2 / 128);   // index of the character (Nth character)
    posit = posit * 2;                                        // offset of its first byte (Nth byte)

The term ch2 / 128 (integer division) is 1 for trail bytes of 0x80 and above, which closes the gap left by the unused 0x7F. This solves the first problem.

Problem 2: obtaining the GBK code table expressed in BIG5. We can use an existing tool, such as Oriental Express 3000, to convert the GBK code table into BIG5 format. In practice there is a catch: GBK contains more characters than BIG5, so characters that exist in GBK but not in BIG5 may be dropped during conversion. The converted table then no longer matches the layout above, and positioning fails. However, I found a text file of the GBK code table expressed in BIG5 (possibly official) with no missing characters, so this problem is solved as well.
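As a check on the positioning formula, here is a small C++ sketch that wraps it with range validation. The names `gbk_index` and `gbk_offset` are hypothetical; returning -1 for byte pairs outside the GBK double-byte range is a design choice of this sketch, in place of the bounds check the article's program performs after computing the offset.

```cpp
#include <cassert>

// 0-based slot of a GBK double-byte code in a table arranged in code
// order (126 rows x 190 slots), or -1 if (ch1, ch2) is not a GBK pair.
long gbk_index(unsigned char ch1, unsigned char ch2) {
    if (ch1 < 0x81 || ch1 > 0xFE) return -1;                 // lead byte 0x81-0xFE
    if (ch2 < 0x40 || ch2 > 0xFE || ch2 == 0x7F) return -1;   // trail byte 0x40-0xFE, not 0x7F
    // ch2 / 128 is 1 for trail bytes >= 0x80 and closes the gap at 0x7F.
    return (ch1 - 129) * 190L + (ch2 - 64) - (ch2 / 128);
}

// Byte offset of that slot in a table file with two bytes per slot.
long gbk_offset(unsigned char ch1, unsigned char ch2) {
    const long n = gbk_index(ch1, ch2);
    return n < 0 ? -1 : n * 2;
}
```

For 让 (GBK C8C3), `gbk_index(0xC8, 0xC3)` is 13620 and `gbk_offset(0xC8, 0xC3)` is 27240. The last code, FEFE, maps to index 23939, the 23940th and final slot.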

In the same way we can perform BIG52GBKT (BIG5 to GBK traditional), BIG52GBKS (BIG5 to GBK simplified), GBKS2GBKT (GBK simplified to traditional), GBKT2GBKS (GBK traditional to simplified) and GBK2BIG5 conversions. The BIG5 code format and its positioning algorithm are given here.

[BIG5 code table, already converted into GBK and arranged in code order by lead byte, A0 through FE. The table is not reproduced here.]

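Positioning method: each BIG5 row has 157 positions, 63 for trail bytes 0x40-0x7E (64-126) followed by 94 for trail bytes 0xA1-0xFE (161-254), and rows are counted from lead byte 0xA0 (160):

    if ((ch2 >= 64) && (ch2 <= 126)) {
        posit = (ch1 - 160) * 157 + (ch2 - 64);
        posit = posit * 2 - 1;
    } else if ((ch2 >= 161) && (ch2 <= 254)) {
        posit = (ch1 - 160) * 157 + 62 + (ch2 - 160);
        posit = posit * 2 - 1;
    }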
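The BIG5 formula can be wrapped the same way. This C++ sketch (hypothetical names `big5_index` and `big5_article_posit`) keeps the article's numbering, with rows counted from 0xA0, so valid codes with lead bytes 0xA1-0xFE start at index 157. The final `* 2 - 1` is reproduced unchanged; the article does not explain the `- 1`, so it should be checked against the actual table file before relying on it.

```cpp
#include <cassert>

// Slot of a BIG5 code, numbered as in the article: rows counted from lead
// byte 0xA0 (160), 157 slots per row (63 for trail 0x40-0x7E, then 94 for
// trail 0xA1-0xFE). Returns -1 outside those ranges.
long big5_index(unsigned char ch1, unsigned char ch2) {
    if (ch1 < 0xA1 || ch1 > 0xFE) return -1;   // BIG5 lead bytes are 0xA1-0xFE
    const long row = (ch1 - 160) * 157L;
    if (ch2 >= 64 && ch2 <= 126) return row + (ch2 - 64);          // slots 0-62
    if (ch2 >= 161 && ch2 <= 254) return row + 62 + (ch2 - 160);   // slots 63-156
    return -1;
}

// The article's final step, reproduced unchanged: posit * 2 - 1.
long big5_article_posit(unsigned char ch1, unsigned char ch2) {
    const long n = big5_index(ch1, ch2);
    return n < 0 ? -1 : n * 2 - 1;
}
```

For example, `big5_index(0xA1, 0x40)` is 157 and `big5_index(0xA1, 0xA1)` is 220, directly after the 63 slots for trail bytes 0x40-0x7E; the next row starts at 314.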

Here is a GBK2BIG5 program for C++ Builder. It reads the text from Memo1, converts it, and writes it back:

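    // Fragment from a form event handler; the unit needs #include <stdio.h>.
    // Table file: every GBK position in code order, holding the BIG5 bytes.
    FILE *fGBK2BIG5 = fopen("PureGBK2BIG5byOrder.txt", "rb");
    if (fGBK2BIG5 == NULL)
        return;                                    // table file not found

    long posit;                                    // signed, so posit < 0 is meaningful
    int i;
    unsigned char ch1, ch2;
    AnsiString sContext;                           // byte string, indexed from 1
    char chr;

    sContext = Memo1->Lines->Text;
    i = 1;
    while (i < sContext.Length()) {                // both sContext[i] and sContext[i + 1] must exist
        ch1 = sContext[i];
        ch2 = sContext[i + 1];
        if ((ch1 >= 129) && (ch1 <= 254)) {                        // GBK lead byte 0x81-0xFE
            if (((ch2 >= 64) && (ch2 < 127)) || ((ch2 > 127) && (ch2 <= 254))) {   // trail byte
                posit = (ch1 - 129) * 190 + (ch2 - 64) - (ch2 / 128);
                posit = posit * 2;                                 // byte offset in the table
                if ((posit >= 23940 * 2) || (posit < 0)) { i++; continue; }
                fseek(fGBK2BIG5, posit, SEEK_SET);
                fread(&chr, sizeof(char), 1, fGBK2BIG5);           // replace both bytes
                sContext[i] = chr;
                fread(&chr, sizeof(char), 1, fGBK2BIG5);
                sContext[i + 1] = chr;
                i += 2;
            } else {
                i++;
            }
        } else {
            i++;                                                   // ASCII byte
        }
    }
    fclose(fGBK2BIG5);

    Memo1->Lines->Text = sContext;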
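The program above depends on VCL (Memo1, AnsiString). For experimenting outside C++ Builder, here is a portable C++ sketch of the same loop. It is a variation, not the author's code: it loads the whole table into memory once (about 47 KB for 23940 two-byte entries) instead of calling fseek and fread for every character, and it assumes the table has the same layout as PureGBK2BIG5byOrder.txt, two bytes per GBK position in code order. The names `gbk_to_big5` and `read_file` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Portable sketch of the GBK2BIG5 loop. `table` holds two bytes per GBK
// slot in code order (assumed to match PureGBK2BIG5byOrder.txt).
std::string gbk_to_big5(const std::string& text, const std::string& table) {
    std::string out = text;
    std::size_t i = 0;
    while (i + 1 < out.size()) {                       // need out[i] and out[i + 1]
        const unsigned char ch1 = static_cast<unsigned char>(out[i]);
        const unsigned char ch2 = static_cast<unsigned char>(out[i + 1]);
        const bool lead = ch1 >= 0x81 && ch1 <= 0xFE;
        const bool trail = (ch2 >= 0x40 && ch2 < 0x7F) || (ch2 > 0x7F && ch2 <= 0xFE);
        if (lead && trail) {
            const std::size_t posit = static_cast<std::size_t>(
                (ch1 - 129) * 190 + (ch2 - 64) - (ch2 / 128)) * 2;
            if (posit + 1 < table.size()) {            // leave unchanged if no entry
                out[i] = table[posit];
                out[i + 1] = table[posit + 1];
            }
            i += 2;                                    // skip the whole character
        } else {
            i += 1;                                    // ASCII or stray byte
        }
    }
    return out;
}

// Reads a whole file in binary mode; returns an empty string on failure.
std::string read_file(const char* path) {
    std::string data;
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return data;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) data.append(buf, n);
    std::fclose(f);
    return data;
}
```

Typical use: `std::string big5 = gbk_to_big5(text, read_file("PureGBK2BIG5byOrder.txt"));`. If the table is missing or shorter than expected, characters without a table entry are left unchanged rather than read past the end.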

When reposting, please credit the original address: https://www.9cbs.com/read-29857.html
