A Summary of CString
Author: wangshengxiang
Foreword: String handling is one of the most common tasks in programming. Every VC programmer, rookie or master, has used CString, and it is hard to do without it in real work (even though it is not part of the standard C++ library). The class MFC provides makes string handling very convenient: CString offers a rich set of operations and overloaded operators, so strings feel as easy to use as they do in Basic, and it allocates memory dynamically, which greatly reduces the risk of buffer overruns. However, we have all also found that CString is sometimes not so easy to use, and that it occasionally behaves in baffling ways. So quite a few experts have stood up and recommended abandoning it.
Here, I personally think the CString implementation is really very good. It has many advantages: it is easy to use, powerful, allocates memory dynamically, saves memory and runs efficiently when strings are copied in bulk, is fully compatible with standard C, supports both multi-byte and wide characters, and, because it has an exception mechanism, is safe and convenient to use. In fact, when it seems hard to use, that is because we do not understand it well enough, especially its implementation. Most of us do not like reading documentation at work, let alone documentation in English.
A few days ago at work I ran into a problem that was not really a problem, yet it was tricky, hard to solve, and baffling. In the end I found that it was caused by CString. With no other way out, I read through the entire CString implementation, had a sudden moment of clarity, and fully understood the cause of the problem (I have already posted this problem on 9CBS). Here I would like to summarize what I know about CString so that others can learn from it. There may be mistakes in my understanding; if you find any, please let me know. Thank you.
1. CString implementation mechanism
CString manages its string data by reference counting. Reference counting should be familiar to everyone: Windows kernel objects, COM objects, and so on are all managed this way, and CString uses the same mechanism to manage the memory blocks it allocates. In fact, a CString object has only one member variable, a pointer, so every CString instance is just 4 bytes long (in a 32-bit build).
That is: int len = sizeof(CString); // len is 4
This pointer points to the string data of a reference memory block, as shown here for CString str("ABCD");
str holds 0x40404040, the address of the characters:
    'A' 'B' 'C' 'D' 0
0x04040404: the header, holding information about the reference memory block
Because of this, one such memory block can be referenced by several CString objects, as in the following code:
CString str("abcd");
CString a = str;
CString b(str);
CString c;
c = b;
After this code runs, all four objects (str, a, b, c) hold the same member value, 0x40404040. But how does the block know how many CString objects reference it? The memory block itself records some information: the number of references, the string length, and the allocated length.
The header of this reference memory block is defined as follows:
struct CStringData
{
    long nRefs;       // how many CString objects reference this block (4 bytes)
    int nDataLength;  // actual length of the string (4 bytes)
    int nAllocLength; // total allocated length, not counting the 12 bytes of this header (4 bytes)
};
With this information, CString can correctly allocate, manage, and release the reference memory block.
If you want to see this information while debugging, you can type one of the following expressions in the Watch window:
(CStringData*)((CStringData*)(this->m_pchData) - 1) or
(CStringData*)((CStringData*)(str.m_pchData) - 1) // str is a CString instance
Thanks to this mechanism, CString is very efficient when strings are copied in bulk, and it also allocates less memory.
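To make the layout concrete, here is a minimal sketch in standard C++ of the same idea: a header stored immediately in front of the characters, and objects that hold nothing but a pointer to the characters. ToyString and ToyStringData are hypothetical names made up for this illustration. This is not the MFC source, and it leaves out the empty-string case, locking, and thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Toy stand-in for the header described above (hypothetical, not the MFC type).
struct ToyStringData {
    long nRefs;        // how many ToyString objects share this block
    int  nDataLength;  // characters in use, not counting the terminator
    int  nAllocLength; // characters allocated, not counting the terminator
    // The characters are stored directly after the header.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

class ToyString {
public:
    explicit ToyString(const char* s) {
        std::size_t len = std::strlen(s);
        void* raw = std::malloc(sizeof(ToyStringData) + len + 1);
        if (!raw) throw std::bad_alloc();
        ToyStringData* d = static_cast<ToyStringData*>(raw);
        d->nRefs = 1;
        d->nDataLength = static_cast<int>(len);
        d->nAllocLength = static_cast<int>(len);
        std::memcpy(d->data(), s, len + 1);
        m_pch = d->data();
    }
    // Copying only bumps the reference count; no characters are copied.
    ToyString(const ToyString& other) : m_pch(other.m_pch) { ++header()->nRefs; }
    ToyString& operator=(const ToyString& other) {
        if (m_pch != other.m_pch) {
            release();
            m_pch = other.m_pch;
            ++header()->nRefs;
        }
        return *this;
    }
    ~ToyString() { release(); }

    const char* c_str() const { return m_pch; }
    // Step back one header from the character pointer, as in the Watch expression.
    ToyStringData* header() const {
        return reinterpret_cast<ToyStringData*>(m_pch) - 1;
    }

private:
    void release() {
        if (--header()->nRefs == 0) std::free(header());
    }
    char* m_pch; // the only data member, so sizeof(ToyString) == sizeof(char*)
};
```

With this toy, copying one string three ways as in the code above leaves all four objects holding the same pointer, and the shared header's nRefs at 4.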
2. LPCTSTR and GetBuffer(int nMinBufLength)
These two functions provide conversions compatible with standard C. They are used very often in practice, and they are also the easiest to get wrong. Both simply return a pointer, so what is the difference between them, and what happens behind the scenes when you call them?
(1) LPCTSTR: what it does is very simple. It just returns the address of the string inside the reference memory block. It is provided as a conversion operator, so in code it is sometimes applied implicitly and sometimes has to be written as an explicit cast. For example:
CString str;
const char* p = (LPCTSTR)str;
// Suppose there is a function test(const char* p); you can call it like this:
test(str); // str is implicitly converted to LPCTSTR
(2) GetBuffer(int nMinBufLength) is similar in that it also returns a pointer, but with one difference: it returns an LPTSTR.
(3) What is the difference between the two? They are different in nature. In general, the pointer obtained from the LPCTSTR conversion should only be treated as a constant or passed as an input parameter to a function, whereas the pointer obtained from GetBuffer(...) may be used to modify the contents, or be passed as an output parameter of a function. Why? You may have seen code like this:
CString str("abcd");
char* p = (char*)(const char*)str;
p[2] = 'z';
You may even have code like this yourself, and your program may show no errors and run well. But it is very dangerous. Look again:
CString str("abcd");
CString test = str;
....
char* p = (char*)(const char*)str;
p[2] = 'z';
strcpy(p, "akfjaksjfakfakfakj"); // now you are in real trouble
Do you know the value of test after p[2] = 'z'? The answer is "abzd". It changed too, which is not what you expected. Why? Think about it and you will see: a CString points to a reference memory block, and str and test point to the same one, so when you execute p[2] = 'z', test changes as well. The strcpy is worse still: it writes 19 bytes into a block allocated for a 4-character string and overruns it. So a pointer obtained through the LPCTSTR conversion may only be used to read the data; never change the contents through it.
What if I do want to change the data directly through a pointer? Then use GetBuffer(...). See the following code:
CString str("abcd");
CString test = str;
....
char* p = str.GetBuffer(20);
p[2] = 'z'; // after this line, test is still "abcd"
strcpy(p, "akfjaksjfakfakfakj"); // after this line, test is still "abcd"
Why is that? When GetBuffer(20) is called, it actually creates a new memory block with a buffer 20 characters long, and decrements the reference count of the original memory block by 1. So after this code runs, str and test point to two different places and do not affect each other.
(4) A few more things to watch: after str.GetBuffer(20), the allocated length of str is 20, that is, the buffer that p points to is only 20 bytes long, so you must not write past it when assigning through the pointer, or disaster is not far away. If the requested length is smaller than the length of the original string, as in GetBuffer(1), the buffer actually allocated is 4 bytes long (the original string length). Also, after calling GetBuffer(...) and changing the contents, be sure to call ReleaseBuffer(), which updates the header of the reference memory block to match the string contents.
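The detach-on-write behavior described in (3) and (4) can be sketched in standard C++ as follows. CowString, get_buffer and release_buffer are hypothetical names for this illustration only; the sketch shares a std::vector through std::shared_ptr instead of using a hand-written header, and it keeps the length in each object rather than in the shared block, so it is a model of the behavior and not of the MFC code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical copy-on-write string, used only to illustrate the text above.
class CowString {
public:
    explicit CowString(const char* s)
        : buf_(std::make_shared<std::vector<char>>(s, s + std::strlen(s) + 1)),
          len_(std::strlen(s)) {}
    // The implicit copy constructor shares buf_, like copying a CString.

    const char* c_str() const { return buf_->data(); }
    std::size_t length() const { return len_; }
    long use_count() const { return buf_.use_count(); }

    // Like GetBuffer: make sure this object owns a private buffer with room
    // for at least minLen characters (never less than the current length).
    char* get_buffer(std::size_t minLen) {
        std::size_t cap = std::max(minLen, len_);
        if (buf_.use_count() > 1 || buf_->size() < cap + 1) {
            auto fresh = std::make_shared<std::vector<char>>(cap + 1, '\0');
            std::memcpy(fresh->data(), buf_->data(), len_ + 1);
            buf_ = fresh; // drops one reference to the old shared buffer
        }
        return buf_->data();
    }

    // Like ReleaseBuffer: recompute the length from the buffer contents.
    void release_buffer() { len_ = std::strlen(buf_->data()); }

private:
    std::shared_ptr<std::vector<char>> buf_;
    std::size_t len_;
};
```

In this model, writing through the pointer returned by get_buffer(20) leaves a second object that was copied earlier still reading "abcd", and the stored length stays stale until release_buffer() is called.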
(5) Finally, one more caution. Look at the following code:
char* p = NULL;
const char* q = NULL;
{
    CString str = "abcd";
    q = (LPCTSTR)str;
    p = str.GetBuffer(20);
    AfxMessageBox(q); // legal
    strcpy(p, "this is test"); // legal
}
AfxMessageBox(q); // illegal, may crash
strcpy(p, "this is test"); // illegal, may crash
The point here is that once the CString object's lifetime has ended, the pointers obtained from it are no longer valid.
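One way to stay out of this trap is to copy the characters into an object that outlives the inner scope, instead of keeping the raw pointer. The sketch below uses std::string as a stand-in for CString so that it compiles without MFC; copy_out is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns a copy of the characters, not a pointer to them.
std::string copy_out() {
    std::string kept;
    {
        std::string str = "abcd";    // plays the role of the short-lived CString
        const char* q = str.c_str(); // valid only while str is alive
        kept = q;                    // copy the characters while q is still valid
    }
    // str is gone here; q would dangle, but kept owns its own copy.
    return kept;
}
```

The same pattern applies to CString itself: assign to another CString (or copy the characters) while the source object is still alive.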
3. Copy, assignment, and when the reference memory block is released
The following code walks through the process step by step:
void test()
{
    CString str("abcd");
    // str points to a reference memory block (reference count 1, length 4, allocated length 4)
    CString a;
    // a points to the initial (empty) data
    a = str;
    // a and str point to the same reference memory block (reference count 2, length 4, allocated length 4)
    CString b(a);
    // a, b and str point to the same reference memory block (reference count 3, length 4, allocated length 4)
    {
        LPCTSTR temp = (LPCTSTR)a;
        // temp points to the string address inside the reference memory block (reference count still 3, length 4, allocated length 4)
        CString d = a;
        // a, b, d and str point to the same reference memory block (reference count 4, length 4, allocated length 4)
        b = "testa";
        // This statement calls CString::operator=(LPCTSTR). b now points to a newly allocated reference memory block (reference count 1, length 5, allocated length 5)
        // At the same time, the reference count of the original block is decremented by 1. a, d and str still point to the original block (reference count 3, length 4, allocated length 4)
    }
    // d's lifetime ends here and its destructor is called, which decrements the reference count by 1 (reference count 2, length 4, allocated length 4)
    LPTSTR temp = a.GetBuffer(10);
    // This statement also allocates a new memory block. temp points to the string address of the newly allocated block (reference count 1, length 0, allocated length 10)
    // At the same time, the reference count of the original block is decremented by 1. Only str still points to the original block (reference count 1, length 4, allocated length 4)
    strcpy(temp, "temp");
    // The block a points to: reference count 1, length 0, allocated length 10
    a.ReleaseBuffer();
    // Note: the block a points to now has reference count 1, length 4, allocated length 10
}
// When the function returns, the lifetimes of all local variables have ended. str, a and b each call their own destructor,
// and the reference count of the block each one points to is decremented by 1.
// The reference counts of the blocks that str, a and b point to all reach 0, so the allocated memory blocks are released.
From the walk-through above we can see that although several CString objects may point to the same reference memory block, the mechanism is clever and very safe: the objects do not interfere with or affect one another. Of course, this requires that your code use CString correctly. In real use, such as passing a CString as a function parameter, by reference, or storing it in a CStringList, even one small misuse can lead to unpredictable errors.
5. The role of FreeExtra()
Look at this code:
(1) CString str("test");
(2) LPTSTR temp = str.GetBuffer(50);
(3) strcpy(temp, "there are 22 character");
(4) str.ReleaseBuffer();
(5) str.FreeExtra();
After line (4) has executed, as everyone knows, the reference memory block of str has reference count 1, length 22, and allocated length 50. When str.FreeExtra() executes, it releases the extra allocated memory (the block then has reference count 1, length 22, allocated length 22).
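The bookkeeping in these five steps can be modeled in a few lines of standard C++. ToyBuffer and its members are hypothetical names for this sketch; it tracks only the length and the allocated length, not the reference count, and it is not how MFC implements FreeExtra().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of "length" versus "allocated length".
struct ToyBuffer {
    std::vector<char> chars; // the allocation, including room for the terminator
    std::size_t length;      // characters in use

    explicit ToyBuffer(std::size_t alloc) : chars(alloc + 1, '\0'), length(0) {}

    std::size_t alloc_length() const { return chars.size() - 1; }

    // Like ReleaseBuffer: recompute the length from the contents.
    void release_buffer() { length = std::strlen(chars.data()); }

    // Like FreeExtra: replace the allocation with one that fits exactly.
    void free_extra() {
        std::ptrdiff_t n = static_cast<std::ptrdiff_t>(length) + 1;
        std::vector<char>(chars.begin(), chars.begin() + n).swap(chars);
    }
};
```

Copying the 22-character string into a ToyBuffer(50) and calling release_buffer() gives length 22 with allocated length 50; free_extra() then brings the allocated length down to 22.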
6. Format(...) and FormatV(...)
This is the function most prone to errors in use, because it is so rich and flexible. I do not plan to analyze it in depth here; it works just like sprintf(...). I only want to point out one thing to watch: its variable argument list. The compiler does not check the types and sizes of the arguments against the format string at compile time, so you must make sure the two correspond, or things will go wrong. For example:
CString str;
int a = 12;
str.Format("First: %l, Second: %s", a, "error"); // result? Try it
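For comparison, here is a sketch of a call where the format specifiers do match the arguments, written with snprintf from the C standard library, which follows the same formatting rules the text refers to. format_pair is a hypothetical helper name; the key point is that an int argument is paired with %d and a C string with %s.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an int and a C string with matching specifiers.
std::string format_pair(int first, const char* second) {
    char buf[64];
    // %d matches the int argument, %s matches the const char* argument.
    std::snprintf(buf, sizeof buf, "First: %d, Second: %s", first, second);
    return std::string(buf);
}
```

With CString the equivalent call would be str.Format("First: %d, Second: %s", a, "error").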
7. LockBuffer() and UnlockBuffer()
As the names suggest, these two functions lock and unlock the reference memory block. But what effect does locking have, and does it really change how the CString behaves? It is actually quite simple. Look at the following code:
(1) CString str("test");
(2) str.LockBuffer();
(3) CString temp = str;
(4) str.UnlockBuffer();
(5) str.LockBuffer();
(6) str = "error";
(7) str.ReleaseBuffer();
After line (3) has executed, unlike the usual case, temp and str do not point to the same reference memory block. You can check this with the expression (CStringData*)((CStringData*)(str.m_pchData) - 1) in the Watch window.
In fact, MSDN explains this:
While in a locked state, the string is protected in two ways:
No other string can get a reference to the data in the locked string, even if that string is assigned to the locked string.
The locked string will never reference another string, even if that other string is copied to the locked string.
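The two rules quoted above can be modeled with a small standard C++ class. LockableString is a hypothetical name, and the sketch uses a plain bool flag plus std::shared_ptr; how MFC actually records the locked state may differ, so treat this only as an illustration of the observable behavior.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical model of a shareable string that can be locked.
class LockableString {
public:
    explicit LockableString(const char* s)
        : data_(std::make_shared<std::string>(s)), locked_(false) {}

    // Rule 1: nobody may share the data of a locked string, so copy it.
    LockableString(const LockableString& o)
        : data_(o.locked_ ? std::make_shared<std::string>(*o.data_) : o.data_),
          locked_(false) {}

    LockableString& operator=(const LockableString& o) {
        if (this == &o) return *this;
        if (locked_) {
            *data_ = *o.data_; // Rule 2: a locked string keeps its own block
        } else if (o.locked_) {
            data_ = std::make_shared<std::string>(*o.data_); // Rule 1 again
        } else {
            data_ = o.data_;   // the usual case: share the block
        }
        return *this;
    }

    void lock() {
        // Take a private copy first, so that no other object shares our data.
        if (data_.use_count() > 1) data_ = std::make_shared<std::string>(*data_);
        locked_ = true;
    }
    void unlock() { locked_ = false; }

    bool shares_with(const LockableString& o) const { return data_ == o.data_; }
    const std::string& str() const { return *data_; }

private:
    std::shared_ptr<std::string> data_;
    bool locked_;
};
```

In this model, a copy taken while the source is locked gets its own block, and a copy taken after unlock() shares the block again.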
8. Is CString only for strings?
No. CString can handle not only strings but also blocks of raw memory. Quite a feature! Look at this code:
char p[20];
for (int loop = 0; loop < 20; loop++)
{
    p[loop] = 10 - loop;
}
CString str((LPCTSTR)p, 20);
char temp[20];
memcpy(temp, str, str.GetLength());
Here str copies the memory block p in full to the memory block temp. So CString can be used to handle binary data.
8. AllocSysString() and SetSysString(BSTR*)
These two functions convert between CString and BSTR. One thing to remember: after calling AllocSysString(), you must free the result with SysFreeString(...).
9. Parameter safety checks
MFC provides several macros for safety checks, such as ASSERT. CString is no exception and contains many such parameter checks, which also shows the quality of its code. Sometimes we find them annoying, because they make the Debug and Release builds behave differently: sometimes the Debug build runs normally while the Release build crashes, and sometimes it is the other way round, with Debug failing and Release working. Personally, I think we should aim for high code quality when using CString and never tolerate an error in the Debug build, even if the Release build seems to run normally; such code is very unsafe. For example:
(1) CString str("test");
(2) str.LockBuffer();
(3) LPTSTR temp = str.GetBuffer(10);
(4) strcpy(temp, "error");
(5) str.ReleaseBuffer();
(6) str.ReleaseBuffer(); // when execution reaches this line, the Debug build pops up an error dialog
10. CString exception handling
I only want to emphasize one point: CString can throw CMemoryException only in functions that allocate memory. Accordingly, the functions that MSDN declares with throw(CMemoryException) are the ones that may allocate or resize memory.
11. CString across modules
That is, what happens when a parameter of a function exported from a DLL is a CString&. Let me describe the situation I ran into.
My problem has been posted at: Http://www.9cbs.net/expert/topic/741/741921.xml?temp=.2283136
When you construct a CString object such as CString str, do you know which reference memory block str points to? You might think it points to NULL. In fact, if it did, CString's reference-counting management would run into trouble, so when CString constructs an empty string object, it points it at a fixed block of initial data, declared as follows:
AFX_STATIC_DATA int _afxInitData[] = { -1, 0, 0, 0 };
To summarize briefly: when a CString object is empty, for example after Empty() or as in CString a, its member variable m_pchData points to the _afxInitData data. When a CString object's lifetime ends, it decrements the reference count of the block it references, and if the count reaches 0 (no CString references the block any more), the block is released. The exception is that if the block the CString points to is the initial data block, no memory is released.
After all this, what does it have to do with the problem I ran into? Quite a lot. If the EXE module and the DLL module are both statically linked to MFC, this CString initial data has a different address in the EXE and in the DLL, because static linking puts a copy of the code into each module. If both modules link to the shared MFC DLL, the CString code lives in a separate DLL, and AFX_STATIC_DATA ensures the variable is defined only once, so _afxInitData has the same address in both modules.
Now the problem is fully understood! You can try it yourself:
__declspec(dllexport) void test(CString& str)
{
    str = "abdefakdfj"; // if MFC is statically linked and the incoming str is an empty string, this goes wrong
}
Final thoughts: Having written this much, I should say there are many more useful things in CString that I have not explained, such as the many overloaded operators, the search functions, and so on. For those I think it is better to read MSDN in detail, which will probably do a much better job than I can. I have only focused on the situations where things are likely to go wrong. Of course, if there are mistakes in what I have said, please point them out. Thank you!