Win32 Learning Notes Chapter 2 Unicode

zhaozj2021-02-16  59

Win32 learning notes

Author: Jiang Xuezhe (netsail0@163.net)

Textbook: Windows Program Design (Fifth Edition), Peking University Press, by Charles Petzold (United States), translated by Beijing Boan Technology Development Co., Ltd. Xu Kexi ¥: 42

Environment: Windows 2000 Server, Internet Explorer 6.0, DirectX 7.0, Visual C++ 6.0

All rights reserved. If you reprint this, please credit the source.

------------------------------------------------------------

[Chapter 2 Unicode]

The Buddha said that all beings are equal. But humans, with their higher intelligence, stand above many other creatures. At breakfast I looked at the fish that had been made into a dish and thought that it, too, had been a life. So there are still ranks among living things. I learned of the existence of Unicode while studying Win32. Unicode will eventually replace ASCII as the standard, but there is still a long way to go. Although the advantages of Unicode are obvious, ASCII will be around for a long time because of historical legacy issues.

You should have come across ASCII when you learned the C language. This chapter assumes some knowledge of ASCII.

To learn Unicode, it helps to understand the history of character sets. Starting from the earliest pictographs, people have been using written characters for nearly 6,000 years. In the 19th century several inventors developed the telegraph, and the code used for telegrams was Morse code. Each letter of the alphabet corresponds to a series of short and long pulses.

Data in a computer is really just a series of 1s and 0s. Each number represents a character. This is the ASCII code. The 7-bit ASCII code works very well for the US character set. Unfortunately, there are more than 100 countries and regions on earth and more than 2,000 ethnic groups. Users outside the US have trouble displaying the scripts of their own countries.

This is especially true for China, Japan, and Korea. Take China as an example: the number of Chinese characters is too large to count. But there is always a way. People introduced the concepts of the "code page" and the "double-byte character set". This kind of encoding is large and complicated, and hard to maintain. That is when Unicode came into being.

Unicode's solution is very simple. Since 7 or 8 bits are not enough to represent every character, we should try a wider value. With 16 bits, for example, 65,536 characters can be represented. Unicode is compatible with ASCII: the first 128 characters have the same values.

The greatest benefit of Unicode is that there is only one character set. Of course, Unicode also has a disadvantage: a Unicode string occupies twice as much memory as an ASCII string. And people are not yet used to Unicode.

For programmers, 8-bit ASCII and 16-bit Unicode are both problems we have to face. To handle wide (16-bit) characters, a "new" data type is defined in a header file:

typedef unsigned short wchar_t;

As you can see, wchar_t is really a 16-bit unsigned short integer. This is how Windows stores 16-bit characters.

wchar_t * p = L"Hello!";

There is a capital letter L (for "long") before "Hello!". It tells the compiler that the string is to be stored as wide characters, that is, each character takes up 2 bytes. Storing this string requires 14 bytes: 12 bytes for the six characters and two more for the \0 at the end of the string. We all know how to get the length of an ordinary string:

int iLength;
char * pc = "Hello!";
iLength = strlen (pc);

The function strlen() returns the length of the string, not counting the \0 at the end. The variable iLength will be equal to 6, which is the number of characters in the string.

Next, let's try to measure a wide-character string with strlen().

wchar_t * pw = L"Hello!";
iLength = strlen (pw);

The parameter of strlen() should be a pointer to char, but here it receives a pointer to unsigned short. If you compile and run this, you will find that iLength is equal to 1.

The 6 wide-character codes in "Hello!" are as follows:

0x0048 0x0065 0x006c 0x006c 0x006f 0x0021

Intel processors store them in memory with the low byte first:

48 00 65 00 6C 00 6C 00 6F 00 21 00

strlen() works by counting bytes until it reaches a 0, because 0 marks the end of a string. The byte that strlen() reads right after "48" is 0, so strlen() returns 1.

The example above shows that to handle wide characters correctly, the functions that take string parameters have to be rewritten. The wide-character version of strlen() is wcslen(), and it is declared in both STRING.H and WCHAR.H.

Now we know that to get the length of a wide string, we can call

iLength = wcslen (pw);

This function returns 6, the number of characters in the string. Keep in mind that the character length of a string does not change when you switch to wide characters; only the byte length changes. Don't confuse the two.

Because Unicode strings occupy twice the memory, and the wide-character functions in the runtime library are larger than the regular ones, it is best to build two versions of a program: one that handles ASCII strings and another that handles Unicode strings. But this raises another small problem. Every string function comes in two versions with different names, which is awkward. How can the same source code use one name whether it is built as the ASCII version or the Unicode version? Fortunately, this problem has been solved.

The solution is to use the TCHAR.H header file included with Visual C++. This header file is not part of standard C. To set them apart from the names in the standard C header files, every function and macro defined in it begins with an underscore.

TCHAR.H provides a set of alternative names for the standard runtime functions that take string parameters. These names are sometimes called "generic" function names because they can refer to either the Unicode version or the ASCII version of a function. Take _tcslen(): if the identifier _UNICODE is defined and the program includes TCHAR.H, then _tcslen is defined as wcslen:

#define _tcslen wcslen

If _UNICODE is not defined, _tcslen is defined as strlen:

#define _tcslen strlen

TCHAR.H also uses a new data type, TCHAR, to solve the problem of having two character data types. If the _UNICODE identifier is defined, TCHAR is wchar_t:

typedef wchar_t TCHAR;

Otherwise TCHAR is char:

typedef char TCHAR;

Remember the TEXT() macro that appeared in the first chapter? It is there for compatibility with the Unicode character set. Let's take a look at how this kind of macro is defined in the header file.

#define __T(x) L ## x

You may not understand L ## x. Few books mention it, but it really is part of the standard C preprocessor. The "##" pair is called the token-paste operator. It seems our understanding of standard C is not good enough; it is time to buy a copy of "The C Programming Language". The operator pastes the letter L onto the macro argument: __T("Hello!") becomes L ## "Hello!", which becomes L"Hello!".

Two more macros are defined in the same way as __T:

#define _T(x) __T(x)
#define _TEXT(x) __T(x)

The WINNT.H header file also defines a macro that, like __T, puts an L in front of the string:

#ifdef UNICODE
#define __TEXT(quote) L ## quote
#else
#define __TEXT(quote) quote
#endif

#define TEXT(quote) __TEXT(quote)

The book uses TEXT(quote). Now you know why character strings are wrapped in it. What? You still don't understand? Find a standard C book and read it a few more times; then it will naturally stop being a problem. If even a country bumpkin like me can understand it, you have no excuse!

◎ No. 34

SCRNSIZE is a small program that detects your current display resolution. Mine, for example, is 1024 * 768.

As for this program, run it once; if you don't feel like understanding it, to hell with it. What we must understand is HELLOWIN in Chapter 3. If you still don't understand much of Chapter 2 after reading it a few times, it doesn't matter: strange the first time, familiar the second. Reading it more will get you there.

When reprinting, please cite the original address: https://www.9cbs.com/read-27630.html
