Supplement to "The Complete Guide to C++ Strings"

xiaoxiao 2021-03-06

After N years of writing programs, I recently tripped over strings :( so I studied some articles on strings carefully and took notes.

Many string questions are described more thoroughly and accurately in the articles referenced at the end. On the subject of Chinese text, however, I want to add some observations of my own.

Background: a Win32 console program that outputs strings with printf, which I believe many people have used.

Platform: Visual Studio .NET 2003 (MFC 7.1).

| Character | MBCS (GBK) | Unicode (UTF-16LE) |
|---|---|---|
| 蔡 (Cai) | B2 CC | 21 85 |
| A | 41 | 41 00 |
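The sketch below reproduces the table's byte values without a debugger. It uses two hypothetical helpers:

- `hex_bytes` formats raw bytes as hex.
- `utf16le_bytes` lays out UTF-16 code units low byte first, which is how they sit in memory on x86 Windows.

The GBK bytes of 蔡 are written out by hand, assuming code page 936 is the system's MBCS code page. `char16_t` stands in for the 16-bit Windows `wchar_t`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format raw bytes as space-separated uppercase hex, e.g. "B2 CC".
std::string hex_bytes(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char b : bytes) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}

// Serialize UTF-16 code units little-endian (low byte first),
// the in-memory layout of a Unicode string on x86 Windows.
std::string utf16le_bytes(const std::u16string& units) {
    std::string out;
    for (char16_t u : units) {
        out += static_cast<char>(u & 0xFF);
        out += static_cast<char>((u >> 8) & 0xFF);
    }
    return out;
}
```

For example, `hex_bytes(utf16le_bytes(u"\u8521"))` yields `"21 85"`, the Unicode column of the first row. `hex_bytes("\xB2\xCC")` yields the MBCS column.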

Block 1: using std::string

```cpp
#include <string>

{
    // Note: inspect s1._Bx._Buf in the debugger to see the raw memory
    std::string s1("蔡");      // B2 CC 00
}
{
    std::wstring s1(L"蔡");    // B2 00 CC 00 00 00
}
```

The code above gives the same results whether it is built with _MBCS or _UNICODE. The reason is that std::string (really std::basic_string) never converts between MBCS and Unicode on its own. Each string can therefore be printed with printf or wprintf respectively, provided your system supports Chinese.
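The `std::wstring` above holds B2 00 CC 00. That is exactly what you get by widening each MBCS byte into its own 16-bit unit, with no decoding at all.

The sketch below models that observed result with a hypothetical `widen_bytes` helper, using `char16_t` as a portable stand-in for the 16-bit Windows `wchar_t`. It is a model of the observation, not a description of how the compiler actually translates `L"..."` literals.

```cpp
#include <cassert>
#include <string>

// Model of the observed result: every MBCS byte becomes one 16-bit unit
// with a zero high byte (B2 CC -> 0x00B2 0x00CC). Nothing is decoded, so
// the result has the "wide" type but is NOT the Unicode form of the text.
std::u16string widen_bytes(const std::string& mbcs) {
    std::u16string out;
    for (unsigned char b : mbcs) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}
```

Comparing `widen_bytes("\xB2\xCC")` with `u"\u8521"` (the real Unicode 蔡) shows two different strings of the same type.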

Let us modify the code to output the contents of the strings:

```cpp
#include <windows.h>   // OutputDebugStringA, OutputDebugStringW
#include <cstdio>      // printf
#include <cwchar>      // wprintf
#include <string>

{
    std::string s1("蔡");          // B2 CC 00
    OutputDebugStringA(s1.c_str());
    printf("%s", s1.c_str());
}
{
    std::wstring s1(L"蔡");        // B2 00 CC 00 00 00
    OutputDebugStringW(s1.c_str());
    wprintf(L"%ls", s1.c_str());
}
```

OutputDebugString is the function that ATLTRACE() ultimately calls. It writes to the Visual Studio Output window, whereas printf and wprintf write to the console window.

What is the result? OutputDebugStringW prints strange characters! Why? Isn't s1.c_str() an LPCWSTR in both the OutputDebugStringW call and the wprintf call?

1) OutputDebugStringW requires a string that is genuinely Unicode-encoded; not every const wchar_t* (that is, LPCWSTR) produces correct output. Here s1 has the wchar_t type, but its content is actually MBCS-encoded.

2) Conversely, the CRT's wprintf can only handle MBCS content stored in a wchar_t string, not a genuinely Unicode-encoded string. This holds at least under the default "C" locale, as in this test. Block 2 shows a genuinely Unicode-encoded string.
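Both behaviours can be modelled portably. As before, `char16_t` stands in for the 16-bit Windows `wchar_t`. The sketch uses two hypothetical helpers:

- `bmp_to_utf8` shows how a Unicode-aware consumer such as OutputDebugStringW reads the units 0x00B2 0x00CC: as U+00B2 (superscript two) and U+00CC (capital I with grave), not as 蔡. UTF-8 is used only so that interpretation can be printed portably.
- `narrow_c_locale` models narrowing under the CRT's "C" locale. It assumes each unit below 256 becomes one byte and anything else fails. It turns the same units back into B2 CC, which a Chinese console displays as 蔡, but it fails on the real U+8521.

```cpp
#include <cassert>
#include <string>

// Encode UTF-16 (BMP only; surrogate pairs are not handled in this sketch)
// as UTF-8, i.e. interpret every unit as a Unicode code point.
std::string bmp_to_utf8(const std::u16string& units) {
    std::string out;
    for (char16_t u : units) {
        if (u < 0x80) {
            out += static_cast<char>(u);
        } else if (u < 0x800) {
            out += static_cast<char>(0xC0 | (u >> 6));
            out += static_cast<char>(0x80 | (u & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (u >> 12));
            out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (u & 0x3F));
        }
    }
    return out;
}

// Assumed model of narrowing in the "C" locale: a unit below 256 maps to the
// byte with the same value; any other unit cannot be converted.
// Returns false on the first unit that cannot be narrowed.
bool narrow_c_locale(const std::u16string& units, std::string& out) {
    out.clear();
    for (char16_t u : units) {
        if (u > 0xFF) return false;
        out += static_cast<char>(u);
    }
    return true;
}
```

With the fake-wide string from Block 1, `bmp_to_utf8` yields the UTF-8 for "²Ì", the garbage OutputDebugStringW shows. `narrow_c_locale` yields B2 CC, which is why wprintf appears to work.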

This is a long-standing misconception: that a wchar_t string is a Unicode string. The correct view is that Windows stores Unicode text as UTF-16, whose 16-bit code units fit in wchar_t. The wchar_t type by itself says nothing about how the content is encoded.

Block 2: using CString (see the comments in the program).

```cpp
 1  // Start: compile with _UNICODE
 2  {
 3      CString s1("a");          // 41 00 00 00
 4  }
 5  {
 6      CString s1(L"a");         // 41 00 00 00
 7  }
 8  {
 9      CString s1(_T("a"));      // 41 00 00 00
10  }
11  {
12      CString s1("蔡");         // 21 85 00 00
13  }
14  {
15      CString s1(L"蔡");        // B2 00 CC 00 00 00
16  }
17  {
18      CString s1(_T("蔡"));     // B2 00 CC 00 00 00
19  }
20  // End: compile with _UNICODE
21  // Start: compile with _MBCS
22  {
23      CString s1("a");          // 41 00
24  }
25  {
26      CString s1(L"a");         // 41 00
27  }
28  {
29      CString s1(_T("a"));      // 41 00
30  }
31  {
32      CString s1("蔡");         // B2 CC 00
33  }
34  {
35      CString s1(L"蔡");        // 32 A8 AC 00
36  }
37  {
38      CString s1(_T("蔡"));     // B2 CC 00
39  }
40  // End: compile with _MBCS
```

1) For the English letter 'a', all three forms give the correct result in both the MBCS and the Unicode build.

2) Line 15:

```cpp
CString s1(L"蔡");   // B2 00 CC 00 00 00
```

What we get is still the MBCS string, with a 0 byte added after each byte as a trail byte. Yet I had always taken it for a Unicode string!

This is another long-standing misconception: that under _UNICODE, _T becomes L, and that a string prefixed with L is Unicode-encoded. You can find descriptions like this in "Generic-Text Mappings in TCHAR.H" (and in many other places in MSDN):

Generic-text data type mappings

| Generic-text data type name | _UNICODE and _MBCS not defined | _MBCS defined | _UNICODE defined |
|---|---|---|---|
| _TCHAR | char | char | wchar_t |
| _TINT | int | int | wint_t |
| _TSCHAR | signed char | signed char | wchar_t |
| _TUCHAR | unsigned char | unsigned char | wchar_t |
| _TXCHAR | char | unsigned char | wchar_t |
| _T or _TEXT | No effect (removed by the preprocessor) | No effect (removed by the preprocessor) | L (converts the following character or string to its Unicode counterpart) |

In fact, the L in L"xxx" only tells the compiler that we want a wchar_t string. In this test it did not change how the text was encoded.
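The mapping table boils down to a typedef and a token-pasting macro. The sketch below re-creates the idea with made-up names: MY_UNICODE, MY_TCHAR, MY_T and literal_length are all hypothetical, chosen so they do not clash with the real tchar.h. The real header routes _T through an extra helper macro, __T.

The point it shows is that the macro only pastes an L onto the literal, which changes the element type. Which values end up in the literal is decided by the compiler.

```cpp
#include <cassert>
#include <cstddef>

// Defining MY_UNICODE plays the role of _UNICODE in this sketch.
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(x) L##x   // pastes the L prefix: only the element type changes
#else
typedef char MY_TCHAR;
#define MY_T(x) x      // no effect: the literal stays a narrow char string
#endif

// Generic-text code: works on MY_TCHAR whichever way it is mapped.
template <std::size_t N>
constexpr std::size_t literal_length(const MY_TCHAR (&)[N]) {
    return N - 1;  // exclude the terminating NUL
}
```

Built without MY_UNICODE, `MY_T("abc")` is a plain `char` literal and `literal_length(MY_T("abc"))` is 3. Defining MY_UNICODE turns the same line into `L"abc"` without touching the generic code.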

So where is the real Unicode string?

3) Line 12:

This is equivalent to CStringW s1("蔡"); // 21 85 00 00

Here we get a genuine Unicode string. In MFC 7.1 there is no separate MFC CString any more; CString is a typedef of ATL::CStringT. In a _UNICODE build it is therefore CStringW, whose constructor looks at the type of its argument. Given a char string, it calls MultiByteToWideChar, which treats the input as an MBCS string and converts it to Unicode.

4) By analogy with line 12, why does line 35 give 32 A8 AC 00? The explanation mirrors line 12:

In the _MBCS build, CString is really CStringA. Given a wchar_t string, its constructor calls WideCharToMultiByte to convert what it assumes is a Unicode string into MBCS. But as 2) showed, the argument is not a Unicode string, only MBCS content stored in wchar_t, so the result is garbled.
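Lines 12 and 35 both come down to a code-page table lookup. The sketch below models the two directions with a hypothetical one-entry table, kToyDbcs, which maps GBK B2 CC to U+8521:

- toy_mb_to_wide stands in for MultiByteToWideChar.
- toy_wide_to_mb stands in for WideCharToMultiByte.

Fed the fake-wide units 0x00B2 0x00CC, the reverse lookup finds nothing and emits the default character '?'. The real conversion on the author's machine produced 32 A8 AC instead. That is probably because WideCharToMultiByte applies best-fit mapping unless WC_NO_BEST_FIT_CHARS is passed, and the toy table has no best-fit data. Either way the result is not 蔡.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical toy "code page" with a single double-byte entry: B2 CC <-> U+8521.
const std::map<std::uint16_t, char16_t> kToyDbcs = {{0xB2CC, u'\u8521'}};

// MBCS -> UTF-16: ASCII bytes map to themselves; a byte >= 0x81 is a lead byte
// that combines with the following trail byte into one table key.
std::u16string toy_mb_to_wide(const std::string& mb) {
    std::u16string out;
    for (std::size_t i = 0; i < mb.size(); ++i) {
        const unsigned char lead = static_cast<unsigned char>(mb[i]);
        if (lead < 0x80) {
            out.push_back(static_cast<char16_t>(lead));
        } else if (lead >= 0x81 && i + 1 < mb.size()) {
            const std::uint16_t code = static_cast<std::uint16_t>(
                (lead << 8) | static_cast<unsigned char>(mb[i + 1]));
            const auto it = kToyDbcs.find(code);
            out.push_back(it != kToyDbcs.end() ? it->second : u'?');
            ++i;  // the trail byte is consumed as well
        } else {
            out.push_back(u'?');  // byte the toy table cannot decode
        }
    }
    return out;
}

// UTF-16 -> MBCS: units without a table entry become the default character '?'
// (no best-fit mapping in this sketch).
std::string toy_wide_to_mb(const std::u16string& wide) {
    std::string out;
    for (char16_t u : wide) {
        if (u < 0x80) {
            out += static_cast<char>(u);
            continue;
        }
        bool found = false;
        for (const auto& entry : kToyDbcs) {  // reverse lookup
            if (entry.second == u) {
                out += static_cast<char>(entry.first >> 8);
                out += static_cast<char>(entry.first & 0xFF);
                found = true;
                break;
            }
        }
        if (!found) out += '?';
    }
    return out;
}
```

On Windows the real calls would be MultiByteToWideChar(CP_ACP, ...) and WideCharToMultiByte(CP_ACP, ...). A real code page 936 table has over 20,000 double-byte entries instead of one.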

From all of the above we can conclude:

1) The CRT cannot produce or handle Unicode-encoded strings, at least under the default "C" locale used in these tests. For wchar_t strings it can only handle MBCS content;

2) Functions with a W suffix (such as OutputDebugStringW) and all COM functions can only handle Unicode-encoded content in wchar_t strings;

3) If the strings are not output but only copied, compared, and so on, …

4) Compiling with _UNICODE or _MBCS only affects the character type of strings: _T, CString, etc. are mapped to the matching type automatically, and functions are mapped to xxxA() or xxxW(). It does not affect how the strings are encoded. The results in the table below do not depend on the compile option, because the compile option only matters for how the compiler resolves _T, CString, and so on:

| Code | Equivalent to | Resulting encoding | printf / wprintf | OutputDebugStringA / W |
|---|---|---|---|---|
| CStringA s = "xxx"; | CA2A("xxx") | Unchanged (SBCS/MBCS) | printf(): correct | OutputDebugStringA(): correct |
| CStringW s = "xxx"; | CA2W("xxx") | Unicode | wprintf(): wrong | OutputDebugStringW(): correct |
| CStringA s = L"xxx"; | CW2A(L"xxx") | MBCS (possibly wrong) | printf(): wrong | OutputDebugStringA(): wrong |
| CStringW s = L"xxx"; | CW2W(L"xxx") | Unchanged, still MBCS | wprintf(): correct | OutputDebugStringW(): wrong |

Question: I think the result of CW2W is determined by the system code page. Can it produce Unicode directly?

As for CW2A: if its argument really is Unicode-encoded, it yields the correct corresponding MBCS string. In fact, that is exactly what we have to do to output Unicode:

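```cpp
#include <atlconv.h>   // CW2A, CA2W
#include <atlstr.h>    // CStringW
#include <cstdio>      // printf

CStringW s = "蔡";                       // s now holds Unicode (21 85)

// wprintf(s) does not print it correctly

CW2A psz(s);                             // psz now holds the correct MBCS encoding
printf("%s", static_cast<LPSTR>(psz));   // correct

// All is well; one more thing worth mentioning:

CA2W wsz(psz);   // wsz is now a wrong Unicode encoding of psz, i.e. 32 A8 AC 00
```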

Recommended reading

The Complete Guide to C++ Strings: I personally think it is very good and very comprehensive.

The Complete Guide to C++ Strings, Part I - Win32 Character Encodings

http://www.codeproject.com/string/cppstringguide1.asp

The Complete Guide to C++ Strings, Part II - String Wrapper Classes

http://www.codeproject.com/string/cppstringguide2.asp

These two have good Chinese translations :)

Complete Guide to C++ Strings, Part 1: Win32 Character Encodings (Chinese translation)
http://www.vckbase.com/document/viewdoc/?id=1082

Complete Guide to C++ Strings, Part 2: String Wrapper Classes (Chinese translation)
http://www.vckbase.com/document/viewdoc/?id=1096

Other: STL string classes and Unicode
http://www.vckbase.com/vckbase/default.aspx

And of course MSDN cannot be left out: Generic-Text Mappings in TCHAR.H
http://msdn.microsoft.com/library/chs/default.asp?url=/library/chs/vccore/html/_core_generic.2d.text_mappings_in_tchar..h.asp

Suggestion: it is best to read the entire "International Programming" section (though you may still be confused afterwards :))
http://msdn.microsoft.com/library/chs/vccore/html/_core_international_programming_topics.asp

Original address (please credit it when reposting): https://www.9cbs.com/read-83864.html
