A Little Knowledge of Unicode and DBCS

zhaozj 2021-02-16

First of all, DBCS (double-byte character set) is the encoding scheme used for Asian languages. A DBCS code page also includes the single-byte "ANSI" characters, which are mainly the ASCII characters; most byte values above 127 serve as lead bytes of two-byte characters. An ANSI character is stored in a file as one byte, and any other character takes two bytes. VB's Asc function conveniently returns the DBCS value (or ANSI value) of a character.
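
The one-byte/two-byte split can be checked with a short sketch, written in Python rather than VB purely for illustration. It assumes GBK (code page 936) as the Asian DBCS code page. `dbcs_value` is a hypothetical helper that plays roughly the role of VB's Asc: it reads the encoded bytes as one unsigned number.

```python
def dbcs_bytes(text, codepage="gbk"):
    """Pair each character with its bytes under a DBCS code page (GBK assumed)."""
    return [(ch, ch.encode(codepage)) for ch in text]


def dbcs_value(ch, codepage="gbk"):
    """Hypothetical analogue of VB's Asc: the character's code as one unsigned number."""
    raw = ch.encode(codepage)
    return int.from_bytes(raw, "big")  # the lead byte is the high-order byte


for ch, raw in dbcs_bytes("A中"):
    # "A" is a single-byte (ASCII) character; "中" needs two bytes
    print(len(raw), hex(dbcs_value(ch)))
```

Under GBK this reports 1 byte (0x41) for "A" and 2 bytes (0xd6d0) for "中".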

Suppose the DBCS value of a character is &H1234. The value is written in hexadecimal: storage on disk is ultimately bits (binary), but hexadecimal is a much more intuitive way to display bytes. The file then contains "12 34" (as seen in a hex editor), with the lead byte first.
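
A hex editor shows each byte as two hexadecimal digits. The sketch below formats bytes that way and writes a double-byte value high byte first, matching the "12 34" layout. `hex_view` and `dbcs_value_to_bytes` are hypothetical helper names, and GBK is again assumed for the real character.

```python
def hex_view(data):
    """Format bytes as a hex editor shows them: two hex digits per byte."""
    return " ".join(f"{b:02X}" for b in data)


def dbcs_value_to_bytes(value):
    """Write a double-byte DBCS value to bytes, high (lead) byte first."""
    return value.to_bytes(2, "big")


print(hex_view(dbcs_value_to_bytes(0x1234)))  # 12 34
print(hex_view("中".encode("gbk")))  # D6 D0 (GBK assumed)
```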

Unicode is a worldwide character set that covers almost every character in the world, and each character has a unique Unicode value. In the form Windows uses, a character in common use occupies two bytes. Unlike DBCS, Unicode also gives the standard ANSI characters two bytes: it appends a 0 byte after the ANSI value. For example, the ANSI character &H45 ("E") is stored in Unicode as "45 00". To get a character's Unicode value in VB, use the AscW function. Non-ANSI characters follow the same rule, so their bytes appear reversed compared with the value. For example, a character whose Unicode value is &H1234 is stored as "34 12". This is little-endian byte order (low byte first). Note also that a character's Unicode value is generally not the same as its DBCS value.
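
The byte layout described above is UTF-16 in little-endian order. In this sketch Python's `ord` stands in for VB's AscW, and the `utf-16-le` codec produces the stored bytes. It covers only characters that fit in two bytes. The last character also shows that "中" has the Unicode value 0x4E2D, which is unrelated to its GBK value 0xD6D0.

```python
def unicode_bytes(ch):
    """Bytes of a two-byte character in UTF-16LE, the 'Unicode' format Windows uses."""
    return ch.encode("utf-16-le")


def hex_view(data):
    """Two hex digits per byte, as a hex editor displays them."""
    return " ".join(f"{b:02X}" for b in data)


for ch in ("E", "\u1234", "中"):
    # ord() plays the role of AscW; the stored bytes come out low byte first
    print(hex(ord(ch)), hex_view(unicode_bytes(ch)))
```

This prints `0x45 45 00`, `0x1234 34 12` and `0x4e2d 2D 4E`.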

So what is it good for? Unicode lets text be recognized by operating systems in different language environments. Suppose you write a text file on a Chinese operating system and then open it on Win2K somewhere else. (Win2K is chosen because it supports Unicode; on other systems you could only view the file through an add-on Chinese platform.) Take, for example, an American computer with an English operating system whose code page is also set to US (Win2K lets you choose the code page). Even if Chinese fonts are installed on that computer, editors such as Word will show nothing but garbled characters. Why? English Win2K reads non-Unicode text with its own Western code page, so it cannot decode Asian DBCS codes. Only Unicode text is independent of the code page.
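
The garbled display can be reproduced by decoding DBCS bytes with the wrong code page. This sketch assumes GBK (936) on the Chinese machine and Windows-1252 on the English one. `view_on` is a hypothetical helper.

```python
def view_on(text, writer_cp, reader_cp):
    """Encode text with the writer's code page, then decode it with the reader's."""
    return text.encode(writer_cp).decode(reader_cp, errors="replace")


original = "中文"
# The DBCS bytes D6 D0 CE C4, read as Windows-1252, turn into four Latin letters
garbled = view_on(original, "gbk", "cp1252")
# Unicode text does not depend on the system code page, so it round-trips intact
intact = view_on(original, "utf-16-le", "utf-16-le")
print(ascii(garbled), intact == original)
```

The garbled result is "ÖÐÎÄ", while the Unicode version comes back unchanged.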

The workaround is simply to convert the file to Unicode. UltraEdit can do the conversion. Win2K also includes a code converter, and Notepad under Win2K can convert the file with "Save As" and the Unicode encoding. On Win9x you can do the conversion with VB. A Unicode text file differs from an ordinary text file in only two ways: "FF FE" (the byte order mark for little-endian UTF-16) is added at the start of the file, and the character codes themselves are different. So convert each character's code and put "FF FE" at the front, and the Chinese article is saved in Unicode format. Once the file is converted, a machine with an English operating system can view it in Word. Word works because it can pick a font that contains the characters. Plain Notepad only links the text to the system font, and the default font of an English system contains no Chinese characters, so of course Notepad cannot display them.
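
The conversion can be sketched in Python: decode the DBCS bytes, re-encode them as UTF-16LE, and put the FF FE marker in front. The sketch assumes GBK as the source code page. `dbcs_to_unicode_bytes` and `convert_file` are hypothetical names.

```python
from pathlib import Path

BOM_UTF16LE = b"\xff\xfe"  # the "FF FE" file header (byte order mark)


def dbcs_to_unicode_bytes(data, codepage="gbk"):
    """Re-encode DBCS text as UTF-16LE and prepend the FF FE header."""
    text = data.decode(codepage)
    return BOM_UTF16LE + text.encode("utf-16-le")


def convert_file(src, dst, codepage="gbk"):
    """Convert a DBCS text file into a UTF-16LE text file with a byte order mark."""
    Path(dst).write_bytes(dbcs_to_unicode_bytes(Path(src).read_bytes(), codepage))
```

For example, `convert_file("article.txt", "article_unicode.txt")` would write the Unicode copy. Both file names are placeholders.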

Strings stored in EXE files use much the same formats. Programs written in VB are a little special, though: some strings are stored as DBCS and some as Unicode, whereas most other tools store strings only as DBCS. Strings stored as DBCS are easy to handle. Open the file in a hex editor such as UltraEdit (UE) and search for the string with "Find ASCII" checked. Unicode strings are harder, but VB can help. As mentioned above, a character with the value &H1234 is stored as "34 12". So get each character's hexadecimal value with VB's AscW function and reverse its two bytes. Then search for the result in UE with "Find ASCII" unchecked, and you will find the text you want to modify. Finally, use VB to convert the replacement text into the same form and overwrite the corresponding bytes in UE.
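
The AscW-and-reverse procedure can be sketched as follows, with Python's `ord` in place of AscW. `utf16_pattern` and `patch_unicode_string` are hypothetical helpers, and the sketch only handles characters whose code fits in two bytes. It also requires the replacement to be the same length as the original text. That rule is an assumption of this sketch, not something stated above: overwriting bytes in place keeps everything else in the file at its old position.

```python
def utf16_pattern(text):
    """Build the search bytes: the AscW-style code of each character, low byte first."""
    out = bytearray()
    for ch in text:
        code = ord(ch)  # what AscW would return
        if code > 0xFFFF:
            raise ValueError("this sketch handles two-byte characters only")
        out += bytes([code & 0xFF, code >> 8])  # &H1234 -> 34 12
    return bytes(out)


def patch_unicode_string(data, old, new):
    """Overwrite every Unicode occurrence of `old` in a binary image with `new`."""
    old_b, new_b = utf16_pattern(old), utf16_pattern(new)
    if len(old_b) != len(new_b):
        raise ValueError("replacement must have the same number of characters")
    # bytes.replace matches at any offset, so check each hit in a real EXE first
    return data.replace(old_b, new_b)
```

For example, `patch_unicode_string(exe_bytes, "中文", "英文")` returns a patched copy of the bytes, where `exe_bytes` is a placeholder for the file's contents.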

In general, text entered at design time in the VB development environment, such as a Label's caption, is stored as DBCS. Text hardcoded in the program's code, on the other hand, such as:

"This is the text compiled in the code"

is stored as Unicode. The same is true of string resources in resource files.
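
A VB EXE can therefore hold text in two forms, so a search has to try both. This sketch reports where a string occurs as DBCS bytes, the form used for design-time captions. It also reports where the string occurs as UTF-16LE bytes, the form used for hardcoded literals and string resources. `find_all` and `locate_string` are hypothetical names, and GBK is assumed for the DBCS form.

```python
def find_all(data, pattern):
    """Offsets of every (possibly overlapping) occurrence of pattern in data."""
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits


def locate_string(data, text, codepage="gbk"):
    """Find text stored either as DBCS bytes or as UTF-16LE bytes."""
    return {
        "dbcs": find_all(data, text.encode(codepage)),
        "unicode": find_all(data, text.encode("utf-16-le")),
    }
```

A hit under `"dbcs"` can be edited with a plain ASCII search in the hex editor. A hit under `"unicode"` needs the reversed-byte form.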

So even if the software author hardcodes text into the EXE, we can still modify it. Hoho.

I hope this article helps you.

Note: In theory, a program that uses Unicode is unaffected by the operating system environment, and Windows Help says so too. In practice, however, environmental factors such as font linking can still prevent text from displaying correctly. To internationalize software, I still recommend a resource file containing string tables for several code pages. First, this is easy to modify. Second, it also reduces the resources the EXE takes up.
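
A per-language string table can be pictured with a small Python sketch. The string IDs, the texts, and the language-to-code-page mapping (936 for Simplified Chinese, 950 for Traditional Chinese, 1252 for English) are illustrative assumptions. A real VB project would keep such tables in its resource file rather than in code.

```python
# Hypothetical string table: string ID -> text for each language
STRING_TABLE = {
    101: {"zh-CN": "确定", "zh-TW": "確定", "en-US": "OK"},
    102: {"zh-CN": "取消", "zh-TW": "取消", "en-US": "Cancel"},
}

# Assumed code page for each language's table
CODE_PAGES = {"zh-CN": "cp936", "zh-TW": "cp950", "en-US": "cp1252"}


def load_string(string_id, lang, fallback="en-US"):
    """Look up a string for lang, falling back to English when it is missing."""
    entry = STRING_TABLE[string_id]
    return entry.get(lang, entry[fallback])


def encoded_string(string_id, lang):
    """The bytes the string occupies in that language's code page."""
    return load_string(string_id, lang).encode(CODE_PAGES.get(lang, "cp1252"))
```

Falling back to English keeps the program usable when a translation is missing.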

When reposting, please credit the original address: https://www.9cbs.com/read-28125.html
