"Windows program design" reading notes


Introduction to Unicode


2005-3-1

Why do we need Unicode? Because ASCII cannot represent enough characters. In ASCII, each character is represented by 7 bits, allowing 128 different characters; in Unicode, each character is represented by 16 bits, allowing 2^16 = 65,536 different characters.

Why doesn't the double-byte character set (DBCS: Double-Byte Character Set) fully solve the problem? In a DBCS, the first 128 codes (0-127) are ASCII; codes from 128 (0x80) upward can act as lead bytes, so a character occupies either one or two bytes. This creates extra programming problems: the number of characters can no longer be determined from the number of bytes, and string pointer arithmetic becomes difficult. In Unicode, all characters are two bytes, so these problems disappear in programming; the only drawback is that text takes up more disk space.

In a C program, the char type is used to store a character 1 byte in length, for example:

char c = 'A'; // variable c is 1 byte in size; its contents are 0x41

For Unicode characters, the wchar_t type is used to store a character 2 bytes in length, for example:

wchar_t d = L'A'; // variable d is 2 bytes in size; its contents are 0x0041

In fact, wchar_t is the same as unsigned short: both are 16 bits wide:

typedef unsigned short wchar_t;

The uppercase letter L before the literal is a modifier that tells the compiler to allocate 2 bytes of memory for each character of the string.

For wide characters, the ANSI C standard provides a parallel set of string-handling library functions; the narrow versions are declared in string.h and the wide versions in wchar.h. For example, the wide-character counterpart of the familiar strlen function is declared as follows:

size_t __cdecl wcslen (const wchar_t *);

But this raises a problem: how do we handle both character types in the same source code?

One way is to use the TCHAR.H header file included with Visual C++. It is not part of the ANSI C standard, so every function and macro it defines is prefixed with an underscore. If the _UNICODE identifier is defined, the following macro definition takes effect:

#define _tcslen wcslen

Otherwise, it is defined as:

#define _tcslen strlen

TCHAR.H also defines a TCHAR type to solve the problem of the two character types. If the _UNICODE identifier is defined, the following declarations take effect:

typedef wchar_t TCHAR;
#define __T(x)    L ## x
#define _T(x)     L ## x
#define _TEXT(x)  L ## x

Otherwise:

typedef char TCHAR;
#define __T(x)    x
#define _T(x)     x
#define _TEXT(x)  x

These are all declarations and definitions based on the C run-time library. For Windows itself, Microsoft defines new data types and functions; in the WINNT.H header file we can see:

typedef char CHAR;
typedef wchar_t WCHAR;

typedef CHAR *PCHAR, *LPCH, *PCH, *NPSTR, *LPSTR, *PSTR;
typedef CONST CHAR *LPCCH, *PCCH, *LPCSTR, *PCSTR;

typedef WCHAR *PWCHAR, *LPWCH, *PWCH, *NPWSTR, *LPWSTR, *PWSTR;
typedef CONST WCHAR *LPCWCH, *PCWCH, *LPCWSTR, *PCWSTR;

If the UNICODE identifier (no underscore) is defined, TCHAR and the pointer-to-TCHAR types are defined as WCHAR and pointers to WCHAR; otherwise they are defined as char and pointers to char. In addition, WINNT.H defines a macro that prepends L to the opening quotation mark of a string literal:

#ifdef UNICODE

#define __TEXT(quote) L ## quote

#else

#define __TEXT(quote) quote

#endif

Windows NT fully supports Unicode, so most API functions are declared and defined in two versions; which function entry point is used is usually determined by whether the UNICODE identifier is defined.

The C run-time library already provides a complete set of character-handling functions, but to make program development under Windows more convenient, Windows redefines a number of similar character-processing functions.

Unfortunately, printf cannot be used in a Windows program, but we can still use sprintf and vsprintf. The printf function is declared as follows:

int printf (const char * pszFormat, ...);

The first parameter is a format string, followed by a variable number of arguments corresponding to the fields in the format string.

int sprintf (char * pszBuffer, const char * pszFormat, ...);

The first parameter is a character buffer; the remaining parameters have the same meaning as those of printf. Instead of printing, the sprintf function writes its result into the character buffer.

int __cdecl vsprintf (char * pszBuffer, const char * pszFormat, va_list argList);

The first parameter is a character buffer, the second is a format string, and the last is a pointer to an array of formatting arguments; in practice, it points to the variable arguments that the caller placed on the stack.

The vsprintf function is used as follows:

int __cdecl ShowMessage (char * pszCaption, char * pszFormat, ...)

{

    char szBuffer[256];

    va_list pArgList;

    va_start (pArgList, pszFormat);

    vsprintf (szBuffer, pszFormat, pArgList);

    va_end (pArgList);

    return MessageBox (NULL, szBuffer, pszCaption, 0);

}

The problem with the sprintf and vsprintf functions is that they cannot guarantee that the character buffer will not be overrun.

The _snprintf function solves this problem by adding a parameter that specifies how many characters the buffer can accommodate.

