"Windows programming" learning notes (2)

zhaozj2021-02-16  60

Chapter 2 Unicode Introduction

This chapter briefly introduces the history of Unicode and how it is used in Windows programming. Understanding and using Unicode is very important. (Especially when developing shareware :)

The Microsoft computer encyclopedia explains Unicode as follows: a 16-bit character coding standard. It represents each character with two bytes, so Unicode can use a single character set to represent almost all written languages in the world. In contrast, the 8-bit ASCII code cannot represent even all the combinations of letters and diacritics in the Roman alphabet.

The emergence of Unicode is an inevitable result of the worldwide spread of computers. Because it is 16-bit, it can represent 65,536 (2^16) characters, enough for the characters of the written languages used in the world as well as collections of mathematical symbols, currency symbols, and more. (Unicode has since grown beyond 65,536 code points; the 16-bit model described here is the one that Windows wide characters are built on.) The first 128 Unicode characters are ASCII, the next 128 are the ASCII extensions (the ISO 8859-1 Latin-1 set), and the remaining characters are used for other languages and symbols. Unicode unifies the world's text and symbols into a single character set, but a Unicode string takes twice the memory of an ASCII string. (I think current hardware development makes this problem much less noticeable.)

You can define a Unicode character with wchar_t:

wchar_t c = 'A';

In VC6.0, wchar_t is defined in wchar.h:

typedef unsigned short wchar_t;

That is, it is a 16-bit unsigned short. Note that 'A' is stored in memory as the bytes 0x41, 0x00, low byte first, because x86 processors are little-endian. You can also define pointers to and arrays of Unicode characters, as in the following program:

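#include <iostream>
#include <cwchar>
using namespace std;

int main()
{
    wchar_t a = L'a';
    cout << sizeof(a) << endl;      // size of one wide character
    const wchar_t *p = L"Hello!";
    cout << sizeof(p) << endl;      // size of a pointer, not of the string
    static wchar_t b[] = L"Hello!";
    cout << sizeof(b) << endl;      // whole array, including the terminating L'\0'
    return 0;
}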

The result of the above program in VC6.0 is:

2

4

14

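Here p is a pointer, which occupies 4 bytes in a 32-bit build. The array b holds seven wide characters (six letters plus the terminating null), so it occupies 7 x 2 = 14 bytes.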

The C operator sizeof is evaluated at compile time, where a Unicode character is treated simply as 16-bit short integer data, so it works normally. Most C runtime library functions that take strings as parameters, however, assume that a string consists of single-byte characters, so new versions of those functions with Unicode support are needed to handle Unicode characters properly.

Note: the above is my initial understanding, and I am not sure it is right. If you understand this better, please let me know.

The following example illustrates the impact of Unicode characters on the C runtime library function:

#include <iostream>
#include <cstring>
#include <cwchar>
using namespace std;

int main()
{
    const char *pc = "Hello!";
    cout << strlen(pc) << endl;     // single-byte characters
    const wchar_t *pw = L"Hello!";
    // cout << strlen(pw) << endl;  // compile error under VC6.0: strlen expects const char *
    cout << wcslen(pw) << endl;     // suitable for Unicode characters
    return 0;
}

Output:

6

6

To make our programs work with both single-byte characters (is Windows 98 still in use?) and Unicode characters, the function that is actually called can be selected with a macro. C runtime library functions are controlled by defining _UNICODE; functions outside the ANSI C standard, such as the Windows API, are controlled by defining UNICODE. For example, the following definition:

#ifdef UNICODE
#define MessageBox MessageBoxW   // suitable for Unicode characters
#else
#define MessageBox MessageBoxA   // suitable for single-byte characters
#endif

Depending on the setting, the same name MessageBox is defined as one of two different functions (MessageBoxW or MessageBoxA). To define UNICODE in VC6.0, choose Settings in the Project menu and add it to the preprocessor definitions on the C/C++ tab.

Note: ASCII stands for American Standard Code for Information Interchange.

When reposting, please credit the original address: https://www.9cbs.com/read-28251.html
