How VC 6.0/7.0 and GNU g++ implement the wide-character prefix L"xx" differently

xiaoxiao2021-03-06 57

Prologue: This article grew out of a discussion with Zhou Xingxing on the VCKBase C++ forum, which prompted me to trace the problem back to its source and look for both its theoretical basis and practical proof. (Some of the material and test code in this article were provided by Zhou Xingxing.)

"The C++ Programming Language" (3rd edition) contains these two passages:

From §4.3: "A type wchar_t is provided to hold characters of a larger character set such as Unicode. It is a distinct type. The size of wchar_t is implementation-defined and large enough to hold the largest character set supported by the implementation's locale (see §21.7, §C.3.3). The strange name is a leftover from C. In C, wchar_t is a typedef (§4.9.7) rather than a built-in type. The suffix _t was added to distinguish standard typedefs."

From §4.3.1: "Wide character literals are of the form L'ab', where the number of characters between the quotes and their meanings is implementation-defined to match the wchar_t type. A wide character literal has type wchar_t."

Two points in these passages concern us: 1> the size of wchar_t is implementation-defined; 2> the meaning of L'ab' is implementation-defined.

So how do GNU g++ and VC 6.0/7.0 behave? Look at the following code:

// author: **. zhou
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

// Print the n bytes starting at padd as two-digit hexadecimal values.
void prt(const void* padd, size_t n)
{
    const unsigned char* p = static_cast<const unsigned char*>(padd);
    const unsigned char* pe = p + n;
    for (; p < pe; ++p) printf("%02X ", *p);
    printf("\n");
}

int main()
{
    // The source file is assumed to be saved in code page 936 (GBK),
    // as the byte dumps below imply.
    char a[] = "VC知识库";
    wchar_t b[] = L"VC知识库";
    prt(a, sizeof(a));
    prt(b, sizeof(b));
    system("pause");

    // Result:
    // Dev-C++ 4.9.9.0 prints:
    //   56 43 D6 AA CA B6 BF E2 00
    //   56 00 43 00 D6 00 AA 00 CA 00 B6 00 BF 00 E2 00 00 00
    // VC 6.0 and VC.NET 2003 print:
    //   56 43 D6 AA CA B6 BF E2 00
    //   56 00 43 00 E5 77 C6 8B 93 5E 00 00

    // (The rest of the original comment and the code that obtains the
    // window handle h are missing from the source.)
    SetWindowTextA(h, a);
    system("pause");
    SetWindowTextW(h, b);
    system("pause");

    // Result:
    // VC 6.0 and VC.NET 2003 both change the title to "VC知识库" successfully.
    // With Dev-C++ 4.9.9.0 only SetWindowTextA displays correctly;
    // SetWindowTextW shows garbled text.
}

This code shows that g++ (Dev-C++ uses the MinGW compiler) interprets L"xx" by widening each char of the non-wide string "xx" into a wchar_t, padding the high-order byte with 0. VC 6.0, by contrast, interprets L"xx" by converting "xx" from MBCS into Unicode WCHARs. MBCS here uses char as its storage unit, and WCHAR is defined in winnt.h as typedef wchar_t WCHAR. On Windows, any char string containing values outside the range 0 to 127 is treated as MBCS, in which each character occupies 1 or 2 bytes, and the MBCS character set depends on the code page of the current locale. On a given Windows system, the default code page can be set under Control Panel -> Regional Options.

This conclusion can be verified with the following program:

// Author: smileonce
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <windows.h>

// Print the n bytes starting at padd as two-digit hexadecimal values.
void prt(const void* padd, size_t n)
{
    const unsigned char* p = static_cast<const unsigned char*>(padd);
    const unsigned char* pe = p + n;
    for (; p < pe; ++p) printf("%02X ", *p);
    printf("\n");
}

int main()
{
    // The source file is assumed to be saved in code page 936 (GBK).
    char a[] = "VC知识库";
    wchar_t b[] = L"VC知识库";
    prt(a, sizeof(a));
    prt(b, sizeof(b));

    PSTR pMultiByteStr = (PSTR)a;
    PWSTR pWideCharStr;
    int nLenOfWideCharStr;

    // Use the API function MultiByteToWideChar() to convert a into Unicode.
    // The first call only returns the required length in WCHARs.
    nLenOfWideCharStr = MultiByteToWideChar(CP_ACP, 0, pMultiByteStr, -1, NULL, 0);
    pWideCharStr = (PWSTR)HeapAlloc(GetProcessHeap(), 0, nLenOfWideCharStr * sizeof(WCHAR));
    assert(pWideCharStr);
    MultiByteToWideChar(CP_ACP, 0, pMultiByteStr, -1, pWideCharStr, nLenOfWideCharStr);
    prt(pWideCharStr, nLenOfWideCharStr * sizeof(WCHAR));
    HeapFree(GetProcessHeap(), 0, pWideCharStr);  // release the conversion buffer
    system("pause");

    // Result:
    //   56 43 D6 AA CA B6 BF E2 00           // char a[] = "VC知识库";
    //   56 00 43 00 E5 77 C6 8B 93 5E 00 00  // wchar_t b[] = L"VC知识库";
    //   56 00 43 00 E5 77 C6 8B 93 5E 00 00  // a converted to Unicode with MultiByteToWideChar()
    // As can be seen, the character codes in b[] are Unicode.
    return 0;
}

Now the problem is clear. To sum up:

1> In ISO C, wchar_t is a typedef; in ISO C++, wchar_t is a type built into the language; L"xx" is the syntax for wchar_t literals in ISO C/C++.
2> The size of wchar_t is implementation-defined.
3> The meaning of L"xx" is implementation-defined.
4> A plain "xx" is non-wide-char, and each of its elements has type char; the corresponding L"xx" is wide-char, and each of its elements has type wchar_t.

Why do C and C++ leave the meaning of L"xx" implementation-defined? Evidently for the sake of the generality and portability of C/C++. Bjarne's view is that the C++ approach is to let programmers use any character set as the character type of a string. In addition, Unicode has already gone through several versions, and it is not known whether it will remain suitable permanently. For a detailed discussion of Unicode and a comparison with other character sets, I recommend "No Nonsense XML".

When reposting, please cite the original address: https://www.9cbs.com/read-62184.html
