Programming Windows, Fifth Edition study notes
An Introduction to Unicode
In the first chapter, I promised to elaborate on any aspects of C that you might not have encountered in conventional character-mode programming but that play a part in Microsoft Windows. The subject of wide-character sets and Unicode almost certainly qualifies in that respect.
Unicode is a 16-bit character set that can be ported to all major computer platforms and covers almost all of the world's written languages. It is also a single, uniform set: it does not involve code pages or other complications that make software difficult to write and test. No other multi-platform character set can reasonably compete with it.
Wide Characters and C
Few programmers are aware that ANSI/ISO 9899-1990, the "American National Standard for Programming Languages-C" (also known as "ANSI C"), supports character sets that require more than one byte per character through a concept called "wide characters." These wide characters coexist nicely with normal and familiar characters.
ANSI C also supports multibyte character sets, such as those used by the Chinese, Japanese, and Korean versions of Windows. However, these multibyte character sets are treated as strings of single-byte values in which some bytes alter the meaning of the bytes that follow. In contrast, wide characters are uniformly wider than normal characters and involve some compiler issues.
Wide characters are not necessarily Unicode. Unicode is one possible wide-character encoding. However, because the focus in this book is Windows rather than an abstract implementation of C, I will tend to speak of wide characters and Unicode synonymously.
The char Data Type
We should all be familiar with using the char data type in our C programs to define and store characters and strings. But to make it easier to understand how C handles wide characters, let's first review normal character definitions as they might appear in a Win32 program.
The following statement defines and initializes a variable containing a single character:
char c = 'A';
The variable c requires 1 byte of storage and is initialized with the hexadecimal value 0x41, which is the ASCII code for the letter A.
You can also define a pointer to a character string:
char * p;
On 32-bit Windows, the pointer variable p requires 4 bytes of storage. You can also initialize the pointer to a character string:
char * p = "Hello!";
The variable p still requires 4 bytes of storage. The string itself is stored in static memory and uses 7 bytes: the 6 characters of the string plus a terminating 0.
You can also define an array of characters, like this:
char a[10];
In this case, the compiler reserves 10 bytes of storage for the array, and the expression sizeof (a) returns 10. If the array is global (that is, defined outside any function), you can initialize it by using a statement like so:
char a[] = "Hello!";
If you define this array as a local variable to a function, it must be defined as a static variable, as follows:
static char a[] = "Hello!";
In either case, the string is stored in static program memory with a 0 appended at the end, thus requiring 7 bytes of storage.
Wider Characters
Nothing about Unicode or wide characters alters the meaning of the char data type in C. The char continues to indicate 1 byte of storage, and sizeof (char) continues to return 1. In theory, a byte in C can be greater than 8 bits, but for most of us, a byte (and hence a char) is 8 bits wide.
Wide characters in C are based on the wchar_t data type, which is defined in several header files, including WCHAR.H, like so:
typedef unsigned short wchar_t;
Thus, the wchar_t data type is the same as an unsigned short integer: 16 bits wide.
To define a variable containing a single wide character, use the following statement:
wchar_t c = 'A';
The variable c is the 2-byte value 0x0041, which is the Unicode representation of the letter A.
Similarly, you can define an initialized pointer to a wide-character string:
wchar_t * p = L"Hello!";
The capital letter L (for "long") immediately before the opening quotation mark tells the compiler that the string is to be stored with wide characters. The pointer itself requires 4 bytes of storage. Each character of the string occupies 2 bytes, and the string ends with a 2-byte 0, so the string takes up 14 bytes in all.
Similarly, you can define an array of wide characters:
static wchar_t a[] = L"Hello!";
This string also requires 14 bytes of storage, and sizeof (a) returns 14.
You can also use the L prefix in front of a single character literal:
wchar_t c = L'A';
But it's usually not necessary. The C compiler will zero-extend the character anyway.
Wide-Character Library Functions
We all know how to find the length of a string. For example, if we have defined a pointer to a character string like so:
char * pc = "Hello!";
we can call
iLength = strlen (pc);
The variable iLength will be set to 6, the number of characters in the string.
That was easy. Now let's define a pointer to a wide-character string:
wchar_t * pw = L"Hello!";
and call strlen again:
iLength = strlen (pw);
Now the trouble begins. First, the C compiler gives you a warning message, probably something along these lines:
'function' : incompatible types - from 'unsigned short *' to 'const char *'
It is telling you that the strlen function is declared as accepting a pointer to a char, and it is getting a pointer to an unsigned short. You can still compile and run the program, but you will find that iLength is set to 1. What happened?
The 6 characters of the string "Hello!" have the 16-bit values
0x0048 0x0065 0x006C 0x006C 0x006F 0x0021
which Intel processors store in memory like so:
48 00 65 00 6C 00 6C 00 6F 00 21 00
The strlen function, assuming that it is trying to find the length of a string of single-byte characters, counts the first byte (48) as a character but then takes the second byte, which is 0, as the end of the string.
This little exercise clearly illustrates the difference between the C language itself and the run-time library functions. The compiler interprets the string L"Hello!" as a collection of 16-bit short integers and stores them in the wchar_t array. The compiler also handles other operations, such as array indexing and the sizeof operator, so these work properly. But run-time library functions such as strlen are added at link time. These functions expect strings made up of single-byte characters. When they are confronted with wide-character strings, they do not perform as we would like.
Oh, great, you say. Now every C library function has to be rewritten to accept wide characters. Well, not every C library function. Only the ones that have string arguments. And you do not have to rewrite them. It's already been done.
The wide-character version of the strlen function is called wcslen ("wide-character string length"), and it is declared both in STRING.H (where the declaration for strlen resides) and in WCHAR.H. The strlen function is declared like this:
size_t __cdecl strlen (const char *);
And the wcslen function looks like this:
size_t __cdecl wcslen (const wchar_t *);
So now we know that when we need to find the length of a wide-character string we can call
iLength = wcslen (pw);
and the function returns 6, the number of characters in the string.
All your favorite C run-time library functions that take string arguments have wide-character versions. For example, wprintf is the wide-character version of printf. These functions are declared both in WCHAR.H and in the header file where the normal function is declared.
Maintaining a Single Source
Of course, using Unicode has certain disadvantages. First and foremost, every string in your program will occupy twice as much space. You will also find that the wide-character run-time library functions are larger than the usual ones. For these reasons, you may want to ship two versions of your program, one with ASCII strings and one with Unicode strings. The best solution is to maintain a single source code file that you can compile for either ASCII or Unicode.
That is a bit of a problem, though, because the run-time library functions have different names, the characters are defined with different types, and there is the nuisance of putting the L prefix in front of string literals.
One answer is to use the TCHAR.H header file included with Microsoft Visual C++. This header file is not part of the ANSI C standard, so every function and macro defined in it is preceded by an underscore. TCHAR.H provides a set of alternative names for the normal run-time library functions that take string parameters (for example, _tprintf and _tcslen). These are sometimes called "generic" function names because they can refer to either the Unicode or the non-Unicode versions of the functions.
If an identifier named _UNICODE is defined and the TCHAR.H header file is included in your program, _tcslen is defined to be wcslen:
#define _tcslen wcslen
If _UNICODE is not defined, _tcslen is defined to be strlen:
#define _tcslen strlen
And so on. TCHAR.H also solves the problem of the two character data types by defining a new data type named TCHAR. If the _UNICODE identifier is defined, TCHAR is wchar_t:
typedef wchar_t TCHAR;
Otherwise, TCHAR is simply a char:
typedef char TCHAR;
Now it's time to address that sticky L problem with string literals. If the _UNICODE identifier is defined, a macro called __T is defined like this:
#define __T(x) L##x
This is fairly obscure syntax, but it is part of the ANSI C standard for the C preprocessor. The pair of number signs is called a "token paste," and it causes the letter L to be joined to the macro parameter. Thus, if the macro parameter is "Hello!", then L##x is L"Hello!".
If the _UNICODE identifier is not defined, the __T macro is simply defined as follows:
#define __T(x) x
Either way, two other macros are defined to be the same as __T:
#define _T(x) __T(x)
#define _TEXT(x) __T(x)
Which one you use in your Win32 console programs depends on how concise or verbose you like to be. Basically, you must define your string literals inside the _T or _TEXT macro, as follows:
_TEXT ("Hello!")
Doing so causes the string to be interpreted as wide characters if the _UNICODE identifier is defined and as 8-bit characters if it is not.
(When I was in school, I don't think I paid enough attention to these things, and the textbooks did not cover them much either. My foundation is still not solid enough. This is the first part of the second chapter; I still have to read the second part.)
Rumor in Xi'an 2005-1-27