C++ string full guide

xiaoxiao2021-03-06  40

Http://blog.9cbs.net/thuszhou

C++ string full guide - Win32 character encoding (1)

Foreword

Strings come in many different flavors, such as TCHAR, std::string, BSTR, and so on, and sometimes you run into a mob of macros starting with _tcs. The purpose of this guide is to explain the various string types and what they are for, and to show how to convert between them when necessary.

The first part of the guide introduces three character encodings. It is important to understand how these encodings work. Even if you already know that a string is an array of characters, please read this part, because it will help you understand how the various string classes relate to each other.

The second part of the guide covers the string classes themselves: when to use which class, and how to convert between them.

String Basics - ASCII, DBCS, Unicode

All string classes originate from the C-style string, and a C-style string is an array of characters, so we start with character types. There are three encoding schemes and three character types.

The first encoding scheme is the single-byte character set, or SBCS, in which every character is exactly one byte long. ASCII is an SBCS. An SBCS string is terminated by a single zero byte.

The second scheme is the multi-byte character set, or MBCS, which contains both single-byte and multi-byte characters. Windows uses only two kinds of MBCS characters, single-byte and double-byte. The MBCS encodings most used on Windows are therefore double-byte character sets, or DBCS, and the term DBCS is often used in place of MBCS.

In a DBCS encoding, certain reserved values indicate that a byte is part of a double-byte character. For example, in the Shift-JIS encoding (commonly used for Japanese), the values 0x81-0x9F and 0xE0-0xFC mean: "This is a double-byte character, and the next byte is part of it." Such values are called lead bytes, and they are always greater than 0x7F. The byte following a lead byte is called the trail byte. In DBCS, the trail byte can be any non-zero value. Like SBCS strings, DBCS strings are terminated by a single zero byte.

The third scheme is Unicode. As Windows uses it, each Unicode character (strictly speaking, each UTF-16 code unit) is two bytes long. Unicode characters are sometimes called wide characters because they are wider (use more memory) than single-byte characters. Note that Unicode is not considered an MBCS: what distinguishes an MBCS is that its characters vary in length. A Unicode string is terminated by two zero bytes (the encoding of the wide character with value zero).

Single-byte characters cover the Latin alphabet, accented characters, and the graphics characters defined by the ASCII standard and the DOS operating system. Double-byte character sets are used for East Asian and Middle Eastern languages. Unicode is used by COM and internally by Windows NT.

You are no doubt familiar with single-byte characters, whose data type is char. Double-byte characters are also manipulated using the char type (one of the many oddities of double-byte character sets). Unicode characters use the wchar_t type. Unicode character and string literals are written with an L prefix, for example:

wchar_t        wch = L'1';      // 2 bytes (Windows wchar_t), 0x0031

const wchar_t* wsz = L"Hello";  // 12 bytes, 6 wide characters

String storage

A single-byte string stores its characters one after another and marks the end of the string with a zero byte. For example, the string "Bob" is stored as:

42 6F 62 00
B  o  b  EOS

In Unicode, "Bob" is stored as:

0042 006F 0062 0000
B    o    b    EOS

with the string terminated by 0x0000 (the Unicode encoding of zero).

DBCS looks a bit like SBCS at first glance, but we will see later that there are subtle differences in string handling and pointer use. In the storage layout of the Japanese string "nihongo", each double-byte character consists of a lead byte (LB) followed by a trail byte (TB). Note that the value for "ni" is not interpreted as the WORD value 0xFA93. The two bytes 93 and FA, in that order, together encode the character "ni". (So even on a big-endian CPU, the bytes would be stored in that same order.)
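The storage layouts described above can be inspected with a small byte dump. The following is only a sketch: hex_dump is a hypothetical helper, and char16_t is used as a portable stand-in for the 2-byte wchar_t found on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format `size` bytes starting at `data` as "42 6F ..." text.
std::string hex_dump(const void* data, std::size_t size)
{
    assert(data != nullptr);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0)
            out += ' ';
        out += buf;
    }
    return out;
}

// Single-byte string: 3 characters plus the zero terminator.
const char sbcs_bob[] = "Bob";

// 16-bit string (assumption: char16_t stands in for the Windows wchar_t):
// 3 characters plus a 16-bit zero terminator.
const char16_t wide_bob[] = u"Bob";
```

Calling hex_dump(sbcs_bob, sizeof sbcs_bob) shows the four bytes of the single-byte string. Dumping wide_bob shows eight bytes, whose order within each pair depends on the endianness of the CPU.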

String handling functions

The C string functions such as strcpy(), sprintf(), atol(), and so on can only be used with single-byte strings. The standard library also has counterparts that work only with Unicode strings, such as wcscpy(), swprintf(), and _wtol().

Microsoft also added DBCS support to its C runtime library (CRT). For each strxxx() function there is a corresponding _mbsxxx() function for DBCS. Whenever you handle strings that may be DBCS (Japanese, Chinese, or any other DBCS language), use the _mbsxxx() functions. They also handle SBCS strings correctly, since a DBCS string may happen to contain only single-byte characters.

Here is an example of what goes wrong when the wrong family of functions is used. Take the Unicode string L"Bob", whose bytes in memory on x86 are:

42 00 6F 00 62 00 00 00

Because x86 CPUs are little-endian, the value 0x0042 is stored in memory as 42 00. If you pass this string to strlen(), it sees the first byte 42, then 00, which it takes as the end of the string, and returns 1. The converse, passing the single-byte string "Bob" to wcslen(), is worse. wcslen() first reads 0x6F42, then 0x0062, and then keeps reading past the end of the buffer until it happens to hit a 00 00 sequence or causes a general protection fault (GPF).
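The strlen() half of this mismatch is safe to try out. The sketch below assumes char16_t as a portable stand-in for the 2-byte Windows wchar_t, and both function names are hypothetical. The reverse case (a wide-string function reading a byte string) runs off the end of the buffer, so it is deliberately not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length of a 16-bit string measured the right way: in 16-bit units.
std::size_t wide_length(const char16_t* s)
{
    assert(s != nullptr);
    return std::char_traits<char16_t>::length(s);
}

// What strlen() reports when the same memory is wrongly treated as a
// single-byte string: it stops at the first zero byte it meets.
std::size_t byte_string_length(const char16_t* s)
{
    assert(s != nullptr);
    return std::strlen(reinterpret_cast<const char*>(s));
}
```

For u"Bob", wide_length() counts three characters, while byte_string_length() stops at the first zero byte, which on a little-endian machine is the second byte of the string.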

So how do the strxxx() functions and their _mbsxxx() counterparts actually differ? The difference matters a great deal, because it determines how to traverse a DBCS string correctly. We will look at string traversal first and then come back to strxxx() versus _mbsxxx().

String traversal

Most of us grew up with SBCS strings and are used to traversing them with the ++ and -- pointer operators, or with array indexing. Both methods work for SBCS and Unicode strings, because all characters in those encodings are the same size and the compiler can compute the position of any character.

This does not work for DBCS strings. There are two rules for accessing a DBCS string through a pointer, and breaking either of them causes bugs.

1. Do not use the ++ operator unless you check for a lead byte each time.

2. Never use the -- operator to traverse the string.

Rule 2 is illustrated first, because it is easy to find a non-contrived example of it. Suppose a program has a configuration file and reads its location from the installation path at startup, for example C:\Program Files\MyCoolApp\config.bin. The file itself is fine.

Suppose the following code is used to build the file name:

bool GetConfigFileName(char* pszName, size_t nBuffSize)
{
    char szConfigFilename[MAX_PATH];

    // Read the installation directory from the registry here (assume this succeeds).

    // If the path does not end with a backslash, add one.
    // First, get a pointer to the terminating zero:
    char* pLastChar = strchr(szConfigFilename, '\0');

    // Now move back one character:
    pLastChar--;

    if (*pLastChar != '\\')
        strcat(szConfigFilename, "\\");

    // Append the file name:
    strcat(szConfigFilename, "config.bin");

    // If the caller's buffer is big enough, return the file name:
    if (strlen(szConfigFilename) >= nBuffSize)
        return false;
    else
    {
        strcpy(pszName, szConfigFilename);
        return true;
    }
}

This is very defensive code, yet it breaks on certain DBCS strings. Suppose the installation path is in Japanese: C:\ヨウヨウソ. In memory, the string ends with the two bytes 83 5C, followed by the terminating zero.

If GetConfigFileName() is run on this path to check for a trailing backslash, it produces the wrong file name. What went wrong? The string contains two bytes with the value 0x5C. The first 0x5C is the character '\'. The second is the trail byte of 83 5C, which encodes the character 'ソ'. The function, however, mistakes that trail byte for a backslash.
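The false match can be reproduced without any Windows API. This is a minimal sketch: ends_with_backslash_naive is a hypothetical function that mirrors the last-byte check in GetConfigFileName(), and the test path is reduced to "C:\" followed by the bytes 83 5C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Naive check in the style of GetConfigFileName(): look only at the last byte.
bool ends_with_backslash_naive(const char* path)
{
    assert(path != nullptr);
    const std::size_t len = std::strlen(path);
    return len > 0 && path[len - 1] == '\\';
}

// "C:\" followed by the Shift-JIS bytes 83 5C, which together encode the
// single character 'ソ'. The path does NOT end with a backslash character.
const char sjis_path[] = "C:\\\x83\x5C";
```

ends_with_backslash_naive(sjis_path) reports a trailing backslash even though the last character is 'ソ', so a function written this way would skip appending the separator.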

The correct way is to use the DBCS-aware functions to move the pointer to the right character position, as shown below:

bool FixedGetConfigFileName(char* pszName, size_t nBuffSize)
{
    char szConfigFilename[MAX_PATH];

    // Read the installation directory from the registry here (assume this succeeds).
    // Note: the _mbsxxx() functions are declared with unsigned char* parameters;
    // the casts are omitted to keep the example readable.

    // If the path does not end with a backslash, add one.
    // First, get a pointer to the terminating zero:
    char* pLastChar = _mbschr(szConfigFilename, '\0');

    // Now move back one (possibly double-byte) character:
    pLastChar = CharPrev(szConfigFilename, pLastChar);

    if (*pLastChar != '\\')
        _mbscat(szConfigFilename, "\\");

    // Append the file name:
    _mbscat(szConfigFilename, "config.bin");

    // If the caller's buffer is big enough, return the file name.
    // strlen() is used because the buffer size is measured in bytes.
    if (strlen(szConfigFilename) >= nBuffSize)
        return false;
    else
    {
        _mbscpy(pszName, szConfigFilename);
        return true;
    }
}

The fixed function moves pLastChar back by one character using the CharPrev() API. If the character at the end of the string is a double-byte character, the pointer moves back 2 bytes. The result is now correct, because the trail byte is no longer misread as a backslash.
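To see why moving backwards needs help from an API, here is a portable sketch of one way a CharPrev-like function can work: since a byte cannot be classified by looking at it in isolation, the sketch walks forward from the start of the string and remembers where the previous character began. The names sjis_next and sjis_prev are hypothetical, the lead-byte ranges are the Shift-JIS ranges quoted earlier, and nothing here claims to be how Windows actually implements CharPrev().

```cpp
#include <cassert>

// Lead-byte test for Shift-JIS (0x81-0x9F and 0xE0-0xFC).
bool is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical stand-in for CharNext(): step over one whole character.
const char* sjis_next(const char* p)
{
    assert(p != nullptr);
    if (*p == '\0')
        return p;                      // never step past the terminator
    if (is_sjis_lead_byte(static_cast<unsigned char>(p[0])) && p[1] != '\0')
        return p + 2;                  // lead byte plus trail byte
    return p + 1;                      // single-byte character
}

// Hypothetical stand-in for CharPrev(): return the start of the character
// that precedes `current`, found by scanning forward from `start`.
const char* sjis_prev(const char* start, const char* current)
{
    assert(start != nullptr && current != nullptr && start <= current);
    const char* prev = start;
    for (const char* p = start; p < current && *p != '\0'; p = sjis_next(p))
        prev = p;
    return prev;
}
```

For the path "C:\" followed by the bytes 83 5C, sjis_prev() called with a pointer to the terminator returns the position of the lead byte 83, so the subsequent comparison with '\\' correctly fails.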

You can now see how breaking rule 1 causes trouble as well. For example, if you scan a string for the character ':' by advancing with ++ instead of CharNext(), you can get a false match when a trail byte happens to have the same value as ':'.

Related to rule 2 is this rule about array indexes:

2a. Never compute an index into a string by subtraction.

The cause of the error is the same as for rule 2. For example, setting pLastChar like this:

char* pLastChar = &szConfigFilename[strlen(szConfigFilename) - 1];

has exactly the same effect as breaking rule 2. Subtracting 1 from the index moves back one byte, not one character.

Back to strxxx() versus _mbsxxx()

It should now be clear why the _mbsxxx() functions are needed. The strxxx() functions know nothing about DBCS characters, while the _mbsxxx() functions do. Calling strrchr() with '\\' on the Japanese path above may return the wrong position, whereas _mbsrchr() recognizes the double-byte character at the end and returns a pointer to the last real backslash.
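The difference between the two families can be sketched in portable code. sjis_find_last below is a hypothetical function showing what a DBCS-aware reverse search has to do, assuming Shift-JIS lead-byte ranges; it is not the CRT's implementation of _mbsrchr().

```cpp
#include <cassert>
#include <cstring>

// Lead-byte test for Shift-JIS (0x81-0x9F and 0xE0-0xFC).
bool is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Find the last occurrence of the single-byte character `ch`, skipping
// trail bytes so they can never be mistaken for `ch`.
const char* sjis_find_last(const char* s, char ch)
{
    assert(s != nullptr);
    const char* found = nullptr;
    while (*s != '\0') {
        if (is_sjis_lead_byte(static_cast<unsigned char>(*s)) && s[1] != '\0') {
            s += 2;                // skip lead byte and trail byte together
        } else {
            if (*s == ch)
                found = s;         // remember the latest real match
            ++s;
        }
    }
    return found;
}
```

On the path "C:\" followed by the bytes 83 5C, std::strrchr() returns a pointer to the trail byte 5C at the very end, while sjis_find_last() returns the real backslash after the colon.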

A final point about the string length functions in the strxxx() and _mbsxxx() families. strlen() returns the length of the string in bytes, so for a string containing 3 double-byte characters it returns 6, whereas _mbslen() counts multibyte characters and returns 3. The Unicode functions return the number of wchar_t units, so wcslen(L"Bob") returns 3.

C++ string full guide - Win32 character encoding (2)

Translation: Connection 15/11/2002 URL: http://www.zdnet.com.cn/west/story/0,2000081602,39098306,00.htm

MBCS and Unicode in Win32 API

The two sets of APIs

You may not have noticed, but every Win32 API function and message that handles strings comes in two versions, one for MBCS strings and one for Unicode strings. For example, there is no actual function called SetWindowText() in Win32. Instead there are SetWindowTextA() and SetWindowTextW(). The A suffix (for ANSI) marks the MBCS function, and the W suffix (for wide) marks the Unicode function.

When you build a Windows program, you choose whether to use the MBCS or the Unicode APIs. If you use the VC AppWizard and never touch the preprocessor settings, you get the MBCS functions by default. But if there is no function called SetWindowText(), what are you calling? The answer is in the winuser.h header, which contains declarations like these:

BOOL WINAPI SetWindowTextA(HWND hWnd, LPCSTR lpString);
BOOL WINAPI SetWindowTextW(HWND hWnd, LPCWSTR lpString);

#ifdef UNICODE
#define SetWindowText SetWindowTextW
#else
#define SetWindowText SetWindowTextA
#endif

When you build an MBCS application, UNICODE is not defined, so the preprocessor sees:

#define SetWindowText SetWindowTextA

Every call to SetWindowText() is then turned into a call to the real API function SetWindowTextA(). (You can call SetWindowTextA() or SetWindowTextW() directly if you like, but that is rarely necessary.)
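The same name-switching trick can be tried in isolation. In this sketch, ShowTitleA, ShowTitleW, g_last_call, and the DEMO_UNICODE symbol are all hypothetical stand-ins, chosen so that the example does not collide with the real Windows headers.

```cpp
#include <cassert>
#include <string>

// Records which variant was called, so the effect of the macro is visible.
std::string g_last_call;

// Hypothetical pair of functions modelled on SetWindowTextA/SetWindowTextW.
void ShowTitleA(const char* text)
{
    assert(text != nullptr);
    g_last_call = "A";
}

void ShowTitleW(const wchar_t* text)
{
    assert(text != nullptr);
    g_last_call = "W";
}

// The generic name is only a macro; the preprocessor picks the real function.
#ifdef DEMO_UNICODE
#define ShowTitle ShowTitleW
#else
#define ShowTitle ShowTitleA
#endif
```

With DEMO_UNICODE undefined, ShowTitle("We love Bob!") compiles to a call to ShowTitleA(). Defining DEMO_UNICODE would redirect the same source line to ShowTitleW(), at which point the narrow literal would no longer compile.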

To switch the default APIs to Unicode, remove the _MBCS symbol from the preprocessor settings and add UNICODE and _UNICODE. (Both are needed, because different headers use different symbols.) You then run into a problem with ordinary string literals. Consider this code:

HWND hwnd = GetSomeWindowHandle();
char szNewText[] = "We love Bob!";
SetWindowText(hwnd, szNewText);

After the preprocessor replaces SetWindowText with SetWindowTextW, the code becomes:

HWND hwnd = GetSomeWindowHandle();
char szNewText[] = "We love Bob!";
SetWindowTextW(hwnd, szNewText);

This is a problem: a single-byte string is being passed to a function that expects a Unicode string.

The first solution is to use #ifdef around every string:

HWND hwnd = GetSomeWindowHandle();
#ifdef UNICODE
wchar_t szNewText[] = L"We love Bob!";
#else
char szNewText[] = "We love Bob!";
#endif
SetWindowText(hwnd, szNewText);

Doing this for every string in a program is obviously a headache. TCHAR solves the problem.

TCHAR to the rescue

TCHAR is a character type that lets the same code be built for either MBCS or Unicode, without #ifdef blocks everywhere.

TCHAR is defined like this:

#ifdef UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

So TCHAR is char in MBCS builds and wchar_t in Unicode builds.

There is also a _T() macro that takes care of the L prefix needed for Unicode literals:

#ifdef UNICODE
#define _T(x) L##x
#else
#define _T(x) x
#endif

## is the preprocessor's token-pasting operator, which joins its two operands together. Wrap every string literal in _T(), and it gets the L prefix in Unicode builds, for example:

TCHAR szNewText[] = _T("We love Bob!");
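The whole generic-text mechanism (character typedef, literal macro, and function mapping) fits in a few lines. The sketch below uses hypothetical names (demo_tchar, DEMO_T, demo_tcslen, DEMO_UNICODE) rather than the real tchar.h ones, so that it builds on any compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;           // wide build
#define DEMO_T(x)   L##x              // paste an L prefix onto the literal
#define demo_tcslen std::wcslen       // map the generic name to the wide function
#else
typedef char demo_tchar;              // narrow build
#define DEMO_T(x)   x                 // leave the literal unchanged
#define demo_tcslen std::strlen       // map the generic name to the narrow function
#endif

// The same source line works in both builds.
std::size_t greeting_length()
{
    const demo_tchar text[] = DEMO_T("We love Bob!");
    assert(sizeof(text) / sizeof(text[0]) == 13);   // 12 characters + terminator
    return demo_tcslen(text);
}
```

Because the character type, the literal prefix, and the length function all switch together, greeting_length() gives the same character count whether or not DEMO_UNICODE is defined.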

Just as macros hide the SetWindowTextA/W pair, there are macros that stand in for the strxxx() and _mbsxxx() string functions. For example, the _tcsrchr macro can be used in place of strrchr(), _mbsrchr(), or wcsrchr(). It expands to the right function depending on whether _MBCS or _UNICODE is defined, in the same way that SetWindowText does.

It is not just the strxxx() functions that have TCHAR macros. There are also, for example, _stprintf (replacing sprintf() and swprintf()) and _tfopen (replacing fopen() and _wfopen()). The full list is in MSDN under "Generic-Text Routine Mappings".

String and TCHAR typedefs

The function names listed in the Win32 API documentation are the generic names (such as "SetWindowText"), and all strings are described in terms of TCHAR. (The exception is the Unicode-only APIs introduced in Windows XP.) Here are the common typedefs you will see in MSDN:

Type | Meaning in MBCS builds | Meaning in Unicode builds
WCHAR | wchar_t | wchar_t
LPSTR | zero-terminated string of char (char*) | zero-terminated string of char (char*)
LPCSTR | constant zero-terminated string of char (const char*) | constant zero-terminated string of char (const char*)
LPWSTR | zero-terminated Unicode string (wchar_t*) | zero-terminated Unicode string (wchar_t*)
LPCWSTR | constant zero-terminated Unicode string (const wchar_t*) | constant zero-terminated Unicode string (const wchar_t*)
TCHAR | char | wchar_t
LPTSTR | zero-terminated string of TCHAR (TCHAR*) | zero-terminated string of TCHAR (TCHAR*)
LPCTSTR | constant zero-terminated string of TCHAR (const TCHAR*) | constant zero-terminated string of TCHAR (const TCHAR*)

When to use TCHAR and Unicode

You may be wondering: "Why would I use Unicode? I have always gotten by with ordinary strings."

There are three cases where Unicode is worth using:

1. The program will run only on Windows NT.

2. The program needs to handle file names longer than MAX_PATH characters.

3. The program uses newer APIs introduced in Windows XP that have no separate A/W versions.

Most Unicode APIs are not implemented on Windows 9x, so if the program must run on 9x, you are forced to use the MBCS APIs. (Microsoft has released a library for Windows 9x called the Microsoft Layer for Unicode, but I have not tried it, so I cannot say how well it works.) NT, on the other hand, uses Unicode internally, so using the Unicode APIs speeds up the program. Every time you pass a string to an MBCS API, the operating system converts the string to Unicode and calls the corresponding Unicode API. If a string is returned, the operating system has to convert it back. Although these conversions are highly optimized to keep their cost as small as possible, they still slow the program down.

NT allows very long file names (longer than the 260 characters defined by MAX_PATH), but only through the Unicode APIs. Another advantage of the Unicode APIs is that the program automatically handles any language the user enters. A user can mix English, Chinese, and Japanese in a file name, and the program handles it as Unicode without any extra code.

Finally, with the end of Windows 9x, Microsoft seems to be abandoning the MBCS APIs. For example, the two string parameters of SetWindowTheme() are Unicode only. Using Unicode throughout also saves you from converting between MBCS and Unicode.

Even if you are not using Unicode in your program now, you should always use TCHAR and the associated macros. This keeps your code DBCS-safe, and it makes a later move to Unicode easy: you only have to change the preprocessor settings!

C++ string full guide (2) - Various string classes (1)

Translation: Connection 19/11/2002 URL: http://www.zdnet.com.cn/developer/tech/story/0,2000081602,39098621,00.htm

Preface

C-style strings are error-prone, hard to manage, and a frequent target for hackers looking for exploitable bugs, so many string wrapper classes have appeared. Unfortunately, it is not always clear which class to use in a given situation, or how to convert a C-style string to a wrapper class.

This part covers all the string types used in the Win32 API, MFC, STL, WTL, and the Visual C++ runtime library. For each class it describes its purpose, how to construct objects, and how to convert to other classes. Nish contributed the material on the managed String class in Visual C++ 7.

Before reading this part, you should fully understand the character types and encodings covered in the first part of this guide.

The first rule of string classes:

Do not use a cast unless the conversion is explicitly documented.

This guide was written because people keep asking how to convert a string of type X to type Z. The askers have usually tried a cast and cannot understand why it does not work. The various string types, BSTR in particular, are not concisely documented in any one place, so I suspect these people throw in a cast and hope it will take care of everything.

A cast does nothing to the string data itself, unless the source is a wrapper class with an explicitly defined conversion operator. Casting a string literal does not convert it to another string type. For example:

void SomeFunc(LPCWSTR widestr);

int main()
{
    SomeFunc((LPCWSTR) "C:\\foo.txt");  // WRONG!
}

This code is 100% wrong. It compiles, because the cast overrides the compiler's type checking, but the fact that it compiles does not make it correct.
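What the cast skips is the actual conversion of the data. As a minimal sketch of what a real conversion involves, the hypothetical widen_ascii() below builds a new wide string character by character. It assumes ASCII-only input; real MBCS text needs a code-page-aware conversion, which this sketch does not attempt. SomeFuncLength is likewise a hypothetical stand-in for a function that takes a wide string.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Convert an ASCII-only narrow string to a wide string by widening each
// character. (Assumption: no bytes above 0x7F; this is not a general
// MBCS-to-Unicode conversion.)
std::wstring widen_ascii(const char* narrow)
{
    assert(narrow != nullptr);
    std::wstring wide;
    for (const char* p = narrow; *p != '\0'; ++p) {
        assert(static_cast<unsigned char>(*p) < 0x80);  // ASCII only in this sketch
        wide += static_cast<wchar_t>(*p);
    }
    return wide;
}

// Hypothetical function that requires a wide string, like SomeFunc() above.
std::size_t SomeFuncLength(const wchar_t* widestr)
{
    assert(widestr != nullptr);
    return std::wcslen(widestr);
}
```

A call such as SomeFuncLength(widen_ascii("C:\\foo.txt").c_str()) passes a genuinely wide string, with no cast anywhere in sight.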

Below, I will point out the cases where a cast is legitimate.

C-style strings and typedefs

As covered in the first part of this guide, the Windows API is defined in terms of TCHAR, which can be an MBCS or a Unicode character depending on whether the _MBCS or _UNICODE symbol is defined in the preprocessor settings. See the first part for a full description of TCHAR. For convenience, the character typedefs are listed again here:

Type | Meaning
WCHAR | Unicode character (wchar_t)
TCHAR | MBCS or Unicode character, depending on preprocessor settings
LPSTR | string of char (char*)
LPCSTR | constant string of char (const char*)
LPWSTR | string of WCHAR (WCHAR*)
LPCWSTR | constant string of WCHAR (const WCHAR*)
LPTSTR | string of TCHAR (TCHAR*)
LPCTSTR | constant string of TCHAR (const TCHAR*)

There is one more character type, OLECHAR, which represents the character type used in Object Linking and Embedding (for example, when embedding a Word document). It is normally defined as wchar_t. If the preprocessor symbol OLE2ANSI is defined, OLECHAR is defined as char instead. OLE2ANSI is hardly ever defined nowadays (it was only used in versions before MFC 3), so I will treat OLECHAR as a Unicode character. Here are the typedefs related to OLECHAR:

Type | Meaning
OLECHAR | Unicode character (wchar_t)
LPOLESTR | string of OLECHAR (OLECHAR*)
LPCOLESTR | constant string of OLECHAR (const OLECHAR*)

There are also the following two macros, used around string and character literals so that the same code works in both MBCS and Unicode builds:

Macro | Meaning
_T(x) | Prepends L to the literal in Unicode builds
OLESTR(x) | Prepends L to the literal to make it an LPCOLESTR

The _T macro has several variants that all do the same thing: TEXT, _TEXT, __TEXT, and __T.

Strings in COM - BSTR and VARIANT

Many COM interfaces use BSTR for strings. BSTRs have a few pitfalls, so they get their own section here.

A BSTR is a hybrid of a Pascal-style string (where the length is stored together with the data) and a C-style string (where the length must be found by looking for the terminating zero). A BSTR is a Unicode string with its length prepended and a zero character at the end. Here is "Bob" as a BSTR:

06 00 00 00    0042 006F 0062    0000
length         B    o    b       EOS

Note that the length is a DWORD holding the number of bytes in the string, not counting the terminating zero. In this example, "Bob" contains 3 Unicode characters (excluding the terminator), for a total of 6 bytes. Because the length is stored explicitly, the COM library knows how much data to transfer when a BSTR is marshaled between processes or computers.

Incidentally, a BSTR can hold any block of data, not just characters, and can even contain embedded zero characters. Those uses are not discussed here.

A BSTR variable in C++ is actually a pointer to the first character of the string. BSTR is defined like this:

typedef OLECHAR* BSTR;

This is unfortunate, because a BSTR is in fact not the same thing as a Unicode string. The typedef defeats type checking and lets you freely mix LPOLESTR and BSTR. Passing a BSTR to a function that expects an LPCOLESTR (or LPCWSTR) is safe, but the reverse is not. So you must know exactly which string type a function expects and pass it the right one.

To see why passing an LPCWSTR to a function that expects a BSTR is unsafe, remember that the four bytes just before the start of a BSTR must hold its length. An LPCWSTR has no such value. When the BSTR is handed to another process (such as Word), the code that looks for the length finds garbage, stack contents, or other random data instead. This makes the method fail, or crash if the bogus length is too large.
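The length-prefixed layout is easy to model in portable code. The sketch below is only an illustration of the layout described above: demo_alloc_bstr, demo_bstr_byte_len, and demo_free_bstr are hypothetical functions, char16_t stands in for OLECHAR, and none of this is how SysAllocString() is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Allocate [4-byte length in bytes][16-bit characters][16-bit terminator]
// and return a pointer to the first character, not to the length prefix.
char16_t* demo_alloc_bstr(const char16_t* src)
{
    assert(src != nullptr);
    const std::size_t chars = std::char_traits<char16_t>::length(src);
    const std::uint32_t byte_len =
        static_cast<std::uint32_t>(chars * sizeof(char16_t));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof byte_len + byte_len + sizeof(char16_t)));
    if (block == nullptr)
        return nullptr;
    std::memcpy(block, &byte_len, sizeof byte_len);
    // Copy the characters together with the terminating zero.
    std::memcpy(block + sizeof byte_len, src, byte_len + sizeof(char16_t));
    return reinterpret_cast<char16_t*>(block + sizeof byte_len);
}

// Read the length stored in the four bytes just before the first character.
// Calling this on an ordinary wide string would read unrelated memory.
std::uint32_t demo_bstr_byte_len(const char16_t* bstr)
{
    assert(bstr != nullptr);
    std::uint32_t byte_len = 0;
    std::memcpy(&byte_len,
                reinterpret_cast<const unsigned char*>(bstr) - sizeof byte_len,
                sizeof byte_len);
    return byte_len;
}

// Free the whole block, starting at the hidden length prefix.
void demo_free_bstr(char16_t* bstr)
{
    if (bstr != nullptr)
        std::free(reinterpret_cast<unsigned char*>(bstr) - sizeof(std::uint32_t));
}
```

The pointer returned by demo_alloc_bstr() can be read like any zero-terminated 16-bit string, which is why passing a BSTR where a plain wide string is expected works. The reverse fails because demo_bstr_byte_len() has nothing valid to read in front of an ordinary string.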

Many APIs work with BSTRs, but the two most important are the functions that create and destroy a BSTR: SysAllocString() and SysFreeString(). SysAllocString() copies a Unicode string into a new BSTR, and SysFreeString() frees a BSTR. For example:

BSTR bstr = NULL;

bstr = SysAllocString(L"Hi Bob!");
if (NULL == bstr)
{
    // out of memory: handle the error here
}

// Use bstr here...

SysFreeString(bstr);

Naturally, the various BSTR wrapper classes take care of this memory management for you.

The other data type used in Automation interfaces is VARIANT. It is used to pass data between typeless languages such as JScript, VBScript, and Visual Basic. A VARIANT can hold many different types of data, such as long and IDispatch*. When a VARIANT contains a string, the string is a BSTR. More about VARIANT follows in the discussion of the VARIANT wrapper classes.

C++ string full guide (2) - Various string classes - CRT classes

Translation: Connection 20/11/2002 URL: http://www.zdnet.com.cn/developer/tech/story/0,2000081602,39098682,00.htm


String wrapper classes

Now that the various string types have been covered, it is time for the wrapper classes. For each class, I will show how to construct an object and how to convert it to a C-style string pointer. C-style pointers are usually what you need for API calls or for constructing an object of a different string class. Other operations, such as sorting and comparison, are not covered.

Once again: do not use a cast unless you understand exactly what the resulting conversion does.

CRT classes

_bstr_t

_bstr_t is a complete wrapper around a BSTR; in fact, it hides the underlying BSTR. It provides a variety of constructors, as well as operators for accessing the underlying C-style string. However, it has no operator for accessing the BSTR itself, so a _bstr_t cannot be passed as an [out] parameter to a COM method. If you need a BSTR* to use as a parameter, the ATL class CComBSTR is more convenient.

A _bstr_t can be passed to a function that takes a BSTR, but only because of three coincidences:

First, _bstr_t has a conversion function to wchar_t*.

Second, because of the BSTR typedef, wchar_t* and BSTR are the same type to the compiler.

Third, the wchar_t* that _bstr_t keeps internally points to a block of memory that follows the BSTR format.

Because all three hold, a _bstr_t works correctly in this role even though there is no documented conversion to BSTR. For example:

// Constructing
_bstr_t bs1 = "char string";        // construct from an LPCSTR
_bstr_t bs2 = L"wide char string";  // construct from an LPCWSTR
_bstr_t bs3 = bs1;                  // copy another _bstr_t
_variant_t v = "Bob";
_bstr_t bs4 = v;                    // construct from a _variant_t that holds a string

// Extracting data
LPCSTR  psz1  = bs1;            // automatically converts to an MBCS string
LPCSTR  psz2  = (LPCSTR) bs1;   // cast OK, same as the previous line
LPCWSTR pwsz1 = bs1;            // returns the internal Unicode string
LPCWSTR pwsz2 = (LPCWSTR) bs1;  // cast OK, same as the previous line
BSTR    bstr  = bs1.copy();     // makes a copy of bs1 and returns it as a BSTR
// ...
SysFreeString(bstr);

Note that _bstr_t also has conversion operators to the non-const types char* and wchar_t*. This is a questionable design: even though the returned pointers are not const, you must not use them to modify the string, because doing so could corrupt the internal BSTR structure.

_variant_t

_variant_t is a complete wrapper around a VARIANT. It provides many constructors and conversion functions; only the string-related operations are covered here.

// Constructing
_variant_t v1 = "char string";        // construct from an LPCSTR
_variant_t v2 = L"wide char string";  // construct from an LPCWSTR
_bstr_t bs1 = "Bob";
_variant_t v3 = bs1;                  // copy from a _bstr_t object

// Extracting data
_bstr_t bs2 = v1;            // extract the BSTR from the VARIANT
_bstr_t bs3 = (_bstr_t) v1;  // cast OK, same as the previous line

Note that the _variant_t methods throw an exception if a conversion fails, so be prepared to catch _com_error.

Also note that a _variant_t cannot be converted directly to an MBCS string. You have to create a temporary _bstr_t, use another string class that provides Unicode-to-MBCS conversion, or use an ATL conversion macro.

Unlike _bstr_t, a _variant_t can be passed directly as a parameter to a COM method. _variant_t derives from the VARIANT type, so the C++ language rules allow a _variant_t to be used wherever a VARIANT is expected.

C++ string full guide (2) - STL and ATL classes

Translation: Connection 21/11/2002 URL: http://www.zdnet.com.cn/developer/tech/story/0,2000081602,390,8845,00.HTM

STL classes

The STL has only one string class, basic_string. A basic_string manages a zero-terminated array of characters, with the character type given by a template parameter. In general, a basic_string should be treated as an opaque object. You can get a read-only pointer to the internal buffer, but all write operations must go through basic_string member functions.

There are two predefined specializations of basic_string: string, which contains char characters, and wstring, which contains wchar_t characters. There is no built-in TCHAR specialization, but you can define one as shown in the code below:

// Specialization
typedef basic_string<TCHAR> tstring;  // string of TCHARs

// Constructing
string  str  = "char string";        // construct from an LPCSTR
wstring wstr = L"wide char string";  // construct from an LPCWSTR
tstring tstr = _T("TCHAR string");   // construct from an LPCTSTR

// Extracting data
LPCSTR  psz  = str.c_str();   // read-only pointer to str's buffer
LPCWSTR pwsz = wstr.c_str();  // read-only pointer to wstr's buffer
LPCTSTR ptsz = tstr.c_str();  // read-only pointer to tstr's buffer

Unlike _bstr_t, a basic_string cannot convert between character sets. However, you can pass the pointer returned by c_str() to another class's constructor if that constructor accepts the character type. For example:

// Constructing a _bstr_t from a basic_string
_bstr_t bs1 = str.c_str();   // construct a _bstr_t from an LPCSTR
_bstr_t bs2 = wstr.c_str();  // construct a _bstr_t from an LPCWSTR

ATL classes

CComBSTR

CComBSTR is ATL's BSTR wrapper, and in some situations it is more useful than _bstr_t. Most importantly, CComBSTR gives access to the underlying BSTR, which means you can pass a CComBSTR object to COM methods and have the object manage the BSTR memory automatically. For example, suppose you want to call the methods of this interface:

// Sample interface:
struct IStuff : public IUnknown
{
    // Boilerplate COM methods omitted...
    STDMETHOD(SetText)(BSTR bsText);
    STDMETHOD(GetText)(BSTR* pbsText);
};

CComBSTR has an operator BSTR method, so it can be passed directly to SetText(). It also has an operator & method that returns a BSTR*, so you can use & on a CComBSTR object to pass it to a function that takes a BSTR*.

CComBSTR bs1;
CComBSTR bs2 = "new text";

pStuff->GetText(&bs1);        // OK, takes the address of the internal BSTR
pStuff->SetText(bs2);         // OK, calls the BSTR conversion operator
pStuff->SetText((BSTR) bs2);  // cast OK, same as the previous line

CComBSTR has constructors similar to those of _bstr_t, but it has no built-in conversion to an MBCS string. For that, you can use an ATL conversion macro.

// Constructing
CComBSTR bs1 = "char string";        // construct from an LPCSTR
CComBSTR bs2 = L"wide char string";  // construct from an LPCWSTR
CComBSTR bs3 = bs1;                  // copy another CComBSTR
CComBSTR bs4;
bs4.LoadString(IDS_SOME_STR);        // load from the string table

// Extracting data
BSTR bstr1 = bs1;          // returns the internal BSTR, which must NOT be modified!
BSTR bstr2 = (BSTR) bs1;   // cast OK, same as the previous line
BSTR bstr3 = bs1.Copy();   // makes a copy of bs1 and returns it as a BSTR
BSTR bstr4;
bstr4 = bs1.Detach();      // bs1 no longer manages its BSTR
// ...
SysFreeString(bstr3);
SysFreeString(bstr4);

The last lines of the example use the Detach() method. After Detach() is called, the CComBSTR object no longer manages its BSTR or the associated memory, which is why SysFreeString() must be called on bstr4.

One final note about the address-of operator (operator &). Because it is overridden, CComBSTR cannot be used directly in some STL collections, such as list. The collections require that taking the address of an element yield a pointer to the contained class, but applying & to a CComBSTR returns a BSTR*, not a CComBSTR*. ATL's CAdapt class solves this problem. For example, to create a list of CComBSTR, declare it like this:

std::list< CAdapt<CComBSTR> > bstr_list;

CAdapt provides the operators the collection requires, but this is invisible to your code. You can use bstr_list just as if it were a list of CComBSTR.
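The operator & problem and the adapter idea can be shown without ATL. In this sketch, Handle is a hypothetical class that overloads operator & the way CComBSTR does, and Adapt is a hypothetical adapter written in the spirit of CAdapt; it is not ATL's actual implementation. Note also that modern standard libraries obtain element addresses in a way that bypasses an overloaded operator &, so the problem described above mainly affects older library implementations.

```cpp
#include <cassert>
#include <list>

// Hypothetical wrapper that, like CComBSTR, overloads operator& so that it
// yields the address of the wrapped value instead of the wrapper itself.
class Handle
{
public:
    explicit Handle(int v = 0) : value_(v) {}
    int* operator&() { return &value_; }   // an int*, not a Handle*
    int value() const { return value_; }
private:
    int value_;
};

// Hypothetical adapter: stores a T and does not overload operator&, so
// taking the address of an Adapt<T> behaves normally.
template <typename T>
class Adapt
{
public:
    Adapt(const T& obj) : m_T(obj) {}
    operator T&() { return m_T; }
    operator const T&() const { return m_T; }
    T m_T;
};

// Read the first element of a list of adapted handles.
int first_value(const std::list< Adapt<Handle> >& items)
{
    assert(!items.empty());
    return items.front().m_T.value();
}
```

Taking the address of a Handle yields an int*, which is what confuses code expecting a Handle*. Taking the address of an Adapt<Handle> yields an ordinary Adapt<Handle>*, and the wrapped object is still reachable through m_T or the conversion operators.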

CComVariant

CComVariant is a wrapper around a VARIANT. Unlike _variant_t, however, the VARIANT is not hidden, and you access the VARIANT members directly. CComVariant offers many constructors for the different VARIANT types; only the string-related operations are covered here.

// Constructing
CComVariant v1 = "char string";        // construct from an LPCSTR
CComVariant v2 = L"wide char string";  // construct from an LPCWSTR
CComBSTR bs1 = "BSTR bob";
CComVariant v3 = (BSTR) bs1;           // copy from a BSTR

// Extracting data
CComBSTR bs2 = v1.bstrVal;             // extract the BSTR from the VARIANT

Unlike _variant_t, CComVariant has no conversion operators to the various VARIANT types. You must access the VARIANT members directly and make sure the VARIANT holds the type you expect. To convert a CComVariant to a BSTR, call the ChangeType() method:

CComVariant v4 = ... // initialize v4 from some type
CComBSTR bs3;

if (SUCCEEDED(v4.ChangeType(VT_BSTR)))
    bs3 = v4.bstrVal;

Like _variant_t, a CComVariant cannot be converted directly to an MBCS string. You have to create a temporary _bstr_t, use another string class that provides Unicode-to-MBCS conversion, or use an ATL conversion macro.

ATL conversion macro

ATL's string conversion macros make it easy to convert between character encodings and are especially handy in function calls. The macros are named [source type]2[new type] or [source type]2C[new type]. The second form converts to a constant pointer (hence the "C" in the name). The type abbreviations are:

A: MBCS string, char* (A for ANSI)

W: Unicode string, wchar_t* (W for wide)

T: TCHAR string, TCHAR*

OLE: OLECHAR string, OLECHAR* (in practice, the same as W)

BSTR: BSTR (valid as the destination type only)

For example, W2A() converts a Unicode string to an MBCS string, and T2CW() converts a TCHAR string to a constant Unicode string.

To use the macros, include the atlconv.h header file. The conversion macros can be used in non-ATL programs, because the header does not depend on other parts of ATL and does not require a _Module global variable. When you use a conversion macro in a function, put the USES_CONVERSION macro at the start of the function. It declares some local variables that the macros use. The converted string, unless it is a BSTR, is stored on the stack. If you want to use such a string outside the function, you must copy it into another string class. If the result is a BSTR, the memory is not freed automatically, so you must assign the return value to a BSTR variable or a BSTR wrapper class to avoid a memory leak.
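The advice to copy a converted string into a string class before it leaves the function can be sketched in portable C++. The helper below is not an ATL macro: to_mbcs() is a hypothetical name, and it uses the standard std::wcstombs() function with the current C locale instead of W2CA(). It converts into a temporary buffer and returns an owning std::string copy, much as you would wrap the result of W2CA() in a std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper (not an ATL macro): convert a wide string to a
// multibyte string in the current C locale and return an owning copy.
// An empty result means either empty input or a character that could
// not be converted.
std::string to_mbcs(const wchar_t* wide) {
    assert(wide != nullptr);
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus the terminator.
    std::vector<char> buf(std::wcslen(wide) * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide, buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::string();               // conversion failed
    return std::string(buf.data(), n);      // the copy outlives the temporary buffer
}
```

A caller can keep the returned std::string for as long as it likes, whereas a pointer into the temporary buffer would dangle once the function returns.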

Here are some examples of the conversion macros:

// Functions taking a string:

void Foo ( LPCWSTR wstr );

void Bar ( BSTR bstr );

// Function returning a string:

void Baz ( BSTR* pbstr );

#include <atlconv.h>

int main()

{

using std::string;

USES_CONVERSION; // declares the local variables used by the macros

// Example 1: Send an MBCS string to Foo()

LPCSTR psz1 = "Bob";

string str1 = "Bob";

Foo ( A2CW(psz1) );

Foo ( A2CW(str1.c_str()) );

// Example 2: Send an MBCS string and a Unicode string to Bar()

LPCSTR psz2 = "Bob";

LPCWSTR wsz = L"Bob";

BSTR bs1;

CComBSTR bs2;

bs1 = A2BSTR(psz2); // create a BSTR

bs2.Attach ( W2BSTR(wsz) ); // same, but hand it to a CComBSTR

Bar ( bs1 );

Bar ( bs2 );

SysFreeString ( bs1 ); // free bs1

// No need to free bs2; CComBSTR frees it.

// Example 3: Convert the BSTR returned by Baz()

BSTR bs3 = NULL;

string str2;

Baz ( &bs3 ); // Baz() fills in bs3

str2 = W2CA(bs3); // convert to an MBCS string

SysFreeString ( bs3 ); // free bs3

}

As you can see, the macros are very convenient when you have a string of one type and need to pass it to a function that expects another.

C++ string full guide (2) - MFC classes. Translation: Connection, 22/11/2002. URL: http://www.zdnet.com.cn/developer/tech/story/0,2000081602,39098983,00.htm

MFC class

CString

An MFC CString holds TCHARs, so its actual character type depends on the preprocessor symbols you have defined. Like an STL string, a CString should generally be treated as an opaque object and modified only through CString methods. One advantage CString has over the STL string is that its constructors accept both MBCS and Unicode strings. It also has a conversion operator to LPCTSTR, so a CString object can be passed directly to a function that takes an LPCTSTR, with no need to call a c_str() method.

// Construction

CString s1 = "char string"; // construct from an LPCSTR

CString s2 = L"wide char string"; // construct from an LPCWSTR

CString s3 ( ' ', 100 ); // 100 characters, all spaces

CString s4 = "New window text";

// You can pass a CString in place of an LPCTSTR:

SetWindowText ( hwndSomeWindow, s4 );

// Or, equivalently, cast explicitly:

SetWindowText ( hwndSomeWindow, (LPCTSTR) s4 );

You can also load strings from the string table. One CString constructor does this through LoadString(), and the Format() method can optionally take its format string from the string table.

// Construction / loading from the string table

CString s5 ( (LPCTSTR) IDS_SOME_STR ); // load from the string table

CString s6, s7;

// Load from the string table

s6.LoadString ( IDS_SOME_STR );

// Load a printf-style format string from the string table

s7.Format ( IDS_SOME_FORMAT, "bob", nSomeStuff, ... );

The first constructor looks a bit odd, but it really is a way of loading a string.

Note that the only legal cast for a CString is the cast to LPCTSTR. Casting to LPTSTR (a non-const pointer) is wrong. Getting into the habit of casting a CString to LPTSTR will only hurt you, because the error often goes unnoticed: the code just happens to work. The correct way to obtain a non-const pointer is to call the GetBuffer() method.

The following example, which sets the text of an item in a list control, shows how to use CString correctly:

CString str = _T("new text");

LVITEM item = {0};

item.mask = LVIF_TEXT;

item.iItem = 1;

item.pszText = (LPTSTR)(LPCTSTR) str; // WRONG!

item.pszText = str.GetBuffer(0); // correct

ListView_SetItem ( hwndList, &item ); // hwndList: handle of the list control (assumed to exist)

str.ReleaseBuffer(); // return control of the buffer to str

The pszText member is an LPTSTR, a non-const pointer, so you call GetBuffer() on str. The parameter to GetBuffer() is the minimum length of the buffer that CString should allocate. If you want a modifiable buffer large enough for 1K TCHARs, call GetBuffer(1024). Passing 0 simply returns a pointer to the current contents of the string.

The statement marked as wrong in the example above will compile, and in this case it will even work. That does not make the code correct. A non-const cast breaks object-oriented encapsulation and bypasses CString's internals. If you make a habit of casting this way, you will eventually hit a case where it fails, and you may not understand why, because you do the same thing everywhere else and the code appears to run.

Know why people always complain about buggy software? Incorrect code is a breeding ground for bugs. Do you really want to write code you know to be wrong and let the bugs multiply? It is better to spend some time learning the correct use of CString so that your code is 100% correct.
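Why the GetBuffer()/ReleaseBuffer() pair matters can be sketched with a small portable class. MiniString below is a hypothetical toy, not the MFC implementation: it caches its length, so writing through a raw pointer leaves that length stale until ReleaseBuffer() recomputes it. The real CString has more bookkeeping, but the sketch assumes the same basic contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy string with a cached length, for illustration only.
class MiniString {
public:
    MiniString(const char* s = "")
        : buf_(s, s + std::strlen(s) + 1), len_(std::strlen(s)) {}

    operator const char*() const { return buf_.data(); }   // read-only access
    std::size_t GetLength() const { return len_; }

    // Returns a writable buffer holding at least minLength characters
    // (plus the terminator), growing the storage if necessary.
    char* GetBuffer(std::size_t minLength) {
        if (buf_.size() < minLength + 1)
            buf_.resize(minLength + 1, '\0');
        return buf_.data();
    }

    // Re-synchronises the cached length after the caller wrote to the buffer.
    void ReleaseBuffer() { len_ = std::strlen(buf_.data()); }

private:
    std::vector<char> buf_;   // always NUL-terminated
    std::size_t len_;         // cached length, excluding the terminator
};

std::size_t get_buffer_demo() {
    MiniString str("new text");
    char* p = str.GetBuffer(32);              // ask for room for 32 characters
    std::strcpy(p, "a longer replacement");   // write through the raw pointer
    assert(str.GetLength() == 8);             // cached length is now stale
    str.ReleaseBuffer();                      // hand the buffer back
    assert(std::strcmp(str, "a longer replacement") == 0);
    return str.GetLength();
}
```

Casting away const instead of calling GetBuffer() would skip both the growth step and the length update, which is the kind of silent inconsistency the text warns about.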

CString has two functions that create a BSTR from the CString contents, converting to Unicode if necessary: AllocSysString() and SetSysString(). Apart from the BSTR* parameter that SetSysString() takes, the two work identically.

// Convert to BSTR

CString s5 = "Bob!";

BSTR bs1 = NULL, bs2 = NULL;

bs1 = s5.AllocSysString();

s5.SetSysString ( &bs2 );

// ...

SysFreeString ( bs1 );

SysFreeString ( bs2 );

COleVariant

COleVariant is very similar to CComVariant. COleVariant derives from VARIANT, so it can be passed to functions that take a VARIANT. Unlike CComVariant, however, COleVariant has only an LPCTSTR constructor; it does not provide separate LPCSTR and LPCWSTR constructors. In most cases this is not a problem, since your strings will usually be LPCTSTRs anyway, but you should be aware of it. COleVariant also has a constructor that accepts a CString.

// Construction

CString s1 = _T("tchar string");

COleVariant v1 = _T("Bob"); // construct from an LPCTSTR

COleVariant v2 = s1; // copy from a CString

As with CComVariant, you must access the VARIANT members directly and use the ChangeType() method to convert to a string when necessary. However, COleVariant::ChangeType() throws an exception when the conversion fails, rather than returning an HRESULT error code.

// Data extraction

COleVariant v3 = ...; // construct v3 from some type

BSTR bs = NULL;

try

{

    v3.ChangeType ( VT_BSTR );

    bs = v3.bstrVal;

}

catch ( COleException* e )

{

    // error, unable to convert

    e->Delete(); // MFC exceptions caught by pointer must be deleted

}

// bs still points at the BSTR owned by v3, so do not call SysFreeString() on it;

// v3 frees it when it is destroyed.

WTL class

CString

WTL's CString behaves exactly like MFC's CString; see the description of the MFC CString above.

CLR and VC 7

System::String is the .NET string class. Internally, a String object is an immutable sequence of characters. Any String method that appears to modify the String actually returns a new String object, because the original String never changes. One peculiarity of the String class is that when several Strings contain the same sequence of characters, they all actually refer to the same object. The Managed Extensions for C++ add a new string literal prefix, S, which denotes a managed string literal.

// Construction

String* ms = S"This is a nice managed string";

You can construct a String object from an unmanaged string literal, but it is better to construct it from a managed string literal. The reason is that all identical string literals with the S prefix refer to the same object, whereas unmanaged literals do not. The following example makes this clearer:

String* ms1 = S"this is nice";

String* ms2 = S"this is nice";

String* ms3 = L"this is nice";

Console::WriteLine ( ms1 == ms2 ); // prints True

Console::WriteLine ( ms1 == ms3 ); // prints False

To compare against a string that was not created with the S prefix, use the String::CompareTo() method, for example:

Console::WriteLine ( ms1->CompareTo(ms2) );

Console::WriteLine ( ms1->CompareTo(ms3) );

Both lines print 0, which means the strings are equal.
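The difference between comparing object identity and comparing contents can be sketched in standard C++ with a small intern pool. This is only an analogy to the behaviour of S-prefixed literals, not .NET code: intern() and interning_demo() are hypothetical names, and pointer comparison plays the role of the == comparison on String* above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical intern pool: equal texts always map to the same stored object.
const std::string* intern(const std::string& text) {
    static std::set<std::string> pool;       // node-based, so addresses stay stable
    return &*pool.insert(text).first;        // existing entry or newly inserted one
}

int interning_demo() {
    const std::string* ms1 = intern("this is nice");
    const std::string* ms2 = intern("this is nice");
    std::string ms3("this is nice");         // separate object, same characters

    assert(ms1 == ms2);                      // same object: identity comparison succeeds
    assert(ms1 != &ms3);                     // different objects: identity comparison fails
    return ms1->compare(ms3);                // content comparison, like CompareTo()
}
```

The value returned by interning_demo() is 0, the same "equal" result that CompareTo() reports for strings with identical contents.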

Converting between String and the MFC 7 CString is easy. CString converts to LPCTSTR, and String has two constructors that accept char* and wchar_t*. You can therefore pass a CString directly to the String constructor:

CString s1 ( "hello world" );

String* s2 ( s1 ); // copy from a CString

Converting in the other direction works in a similar way:

String* s1 = S"Three cats";

CString s2 ( s1 );

This may look a bit confusing. It works because, starting with VS .NET, CString has a constructor that accepts a String object:

CStringT ( System::String* pString );

To speed up some operations, you can sometimes access the underlying string directly:

String* s1 = S"Three cats";

Console::WriteLine ( s1 );

const __wchar_t __pin* pstr = PtrToStringChars(s1);

for ( int i = 0; i < wcslen(pstr); i++ )

    (*const_cast<__wchar_t*>(pstr+i))++;

Console::WriteLine ( s1 );

PtrToStringChars() returns a const __wchar_t* that points to the underlying string. The pointer must be pinned so that the garbage collector does not move the string while it is being manipulated.

C++ string full guide (2) - Summary. Translation: Connection, 23/11/2002. URL: http://www.zdnet.com.cn/developer/tech/story/0,2000081602,39099061,00.htm

Using string classes with printf-style formatting functions

Take special care when using string wrapper classes with printf() or any function that works like it. This includes sprintf() and its variants, as well as the TRACE and ATLTRACE macros. Their extra arguments are not type-checked, so you must pass a C-style string pointer, not the whole string object.

For example, to pass the string held in a _bstr_t to ATLTRACE(), you must explicitly cast it with (LPCSTR) or (LPCWSTR):

_bstr_t bs = L"Bob!";

ATLTRACE("The string is: %s in line %d\n", (LPCSTR) bs, nLine);

If you forget the cast and pass the whole _bstr_t object to ATLTRACE, the trace message will contain meaningless output, because what gets pushed onto the stack is the entire internal data of the _bstr_t variable.
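The same rule applies to the standard library classes. A minimal portable sketch, using std::string and std::snprintf() rather than _bstr_t and ATLTRACE, is shown below; format_line() is a hypothetical helper name. The point is that the variadic call receives text.c_str(), a plain C string, and never the std::string object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a message the way the ATLTRACE example does.
std::string format_line(const std::string& text, int line) {
    char buf[128];
    // %s expects a const char*. Passing `text` itself would not be type-checked
    // by the language and is not portable, so pass the C string explicitly.
    int n = std::snprintf(buf, sizeof buf, "The string is: %s in line %d\n",
                          text.c_str(), line);
    assert(n >= 0);                                   // encoding errors are not expected here
    if (n >= static_cast<int>(sizeof buf))
        n = static_cast<int>(sizeof buf) - 1;         // output was truncated
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Many compilers can warn about a class object passed to a printf-style function, but the explicit conversion is what makes the call correct.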

Summary of all classes

The usual way to convert between two string classes is to convert the source string to a C-style string pointer, and then pass that pointer to a constructor of the destination class. The table below shows how each class converts to a C-style pointer, and which classes have constructors that accept C-style pointers. Numbers in parentheses refer to the notes after the table.

Class | String type | Convert to char*? | Convert to const char*? | Convert to wchar_t*? | Convert to const wchar_t*? | Convert to BSTR? | Construct from char*? | Construct from wchar_t*?
--- | --- | --- | --- | --- | --- | --- | --- | ---
_bstr_t | BSTR | yes, cast (1) | yes, cast | yes, cast (1) | yes, cast | yes (2) | yes | yes
_variant_t | BSTR | no | no | no | cast to _bstr_t (3) | cast to _bstr_t (3) | yes | yes
string | MBCS | no | yes, c_str() method | no | no | no | yes | no
wstring | Unicode | no | no | no | yes, c_str() method | no | no | yes
CComBSTR | BSTR | no | no | no | yes, cast to BSTR | yes, cast | yes | yes
CComVariant | BSTR | no | no | no | yes (4) | yes (4) | yes | yes
CString | TCHAR | no (6) | in MBCS builds, cast | no (6) | in Unicode builds, cast | no (5) | yes | yes
COleVariant | BSTR | no | no | no | yes (4) | yes (4) | in MBCS builds | in Unicode builds

Notes:

1. Although _bstr_t can be converted to a non-const pointer, modifying the internal buffer may overrun it, or cause a memory leak when the BSTR is freed.

2. A _bstr_t holds its BSTR in a wchar_t* variable, so the const wchar_t* conversion can be used to obtain the BSTR. This is an implementation detail that may change in the future, so use it with care.

3. An exception is thrown if the conversion to BSTR fails.

4. Use ChangeType() and then access the bstrVal member of the VARIANT. In MFC, a failed conversion throws an exception.

5. There is no BSTR conversion function, but AllocSysString() returns a new BSTR.

6. Use the GetBuffer() method to temporarily obtain a non-const TCHAR pointer.

When reposting, please credit the original address: https://www.9cbs.com/read-53388.html
