Generic<Programming>: Typed Buffers (I)
Andrei Alexandrescu
Imagine that this installment of "Generic<Programming>" opened like this: "This article is about handling memory buffers in C++." As you closed your browser, you would hear thousands of other mouse clicks doing the same. After all, who would be interested in an article about memory buffers?

This article does treat memory buffers in C++, but with two twists. First, the buffers are generic, which means they can hold typed data (as opposed to raw bytes). Second, the buffers we will discuss are as efficient as the stored type and the host operating system allow, in every respect: allocation strategy as well as data manipulation.

As you will soon see, writing a generic buffer is a nice little exercise in templates. Writing an efficient buffer is not a complicated task in itself. Writing an exception-safe buffer is harder, but still not too hard, especially after you have read the book on exceptional C++ [1]. But writing a buffer that is both generic and efficient is like climbing a steep peak. As often happens in C++, the view from the top is well worth the effort, as long as you can get past the difficulties.
Every now and then, someone posts a question like this to the Usenet newsgroup comp.lang.c++.moderated: why doesn't auto_ptr let you use delete[] instead of delete in its destructor? The ensuing discussion usually goes like this:

"You should not use C-style arrays; use std::vector, it is efficient."
"But std::vector is not efficient enough for what I want to do."
"Why not?"

And so on. The fact is that std::vector is a well-designed class for handling contiguous sequences of objects. When I first saw std::vector's elegant design, I felt no urge to throw it away and write my own (an urge I did feel when looking at, for example, certain MFC artifacts). However, std::vector does have some inefficiencies, and they can seriously hurt some users:
* Unnecessary data initialization. Often you need to create a suitably sized vector of a primitive type (such as char) and pass it to a low-level C function that fills it with data read from a socket or a file. The problem is that, although you don't need it, the entire vector is initialized by the vector's constructor or by resize. If you have to interface with C functions, you cannot avoid this overhead.
* Exponential growth. The Standard requires vector's growth to be fast (constant time on average). This forces std::vector to grow in multiplicative steps; most vector implementations allocate a multiple of the current memory (typically double) when they need to grow automatically. For example, if you append one element to a vector holding one million elements, the vector will for a short time hold memory for three million elements (the old block plus a new block twice as large), which may not be what you want.
* "Imperialistic" memory allocation strategy. A standard vector never shrinks; it only grows. If you collect three million data samples and then decide to keep only one million, you are left with idle room for two million objects. The common idiom for this situation is to copy your vector into a newly built vector and then swap the two:
    std::vector<T> data;
    ... work with data ...
    // Shrink data
    std::vector<T>(data.begin(), data.end()).swap(data);

This idiom is less than optimal, because you have to copy the data. Worse, the idiom is not guaranteed to make 100-percent economical use of memory, because std::vector may always allocate more memory than needed. It would be better if you could shrink the vector "in place", that is, tell the memory allocator that it may reuse the memory past the end of your vector. Many memory allocators let you do that.
* Inefficient moving of objects. std::vector makes no distinction between copying and moving. To std::vector, a move is a copy followed by destruction of the source object. So if you have a vector of strings, there is no way to write a function that quickly moves a few pointers (translator's note: the pointers marking the beginning and the end of the memory each string manages) over to the new memory and have std::vector use it. Every single string is dutifully copied into a new string. When the objects being moved are not members of the COW Fan Club [2], this is a huge source of inefficiency. This is also why std::vector< std::auto_ptr<T> > is prohibited.
* More inefficiencies in copying and moving data. Some STL implementations do not distinguish between a vector of primitive data types and a vector of user-defined types (Metrowerks' is an exception). This means they copy data with the most general construct, namely a for loop. You would be entitled to think that a good compiler replaces a loop that copies a range of integers with a more efficient memcpy call. That is not the case: with two compilers reputed to generate efficient code, Microsoft Visual C++ and Metrowerks CodeWarrior, in full speed optimization mode, copying integers with a loop is significantly slower than the corresponding memcpy call. So if you want efficiency, you need to call memcpy in your own code.
* Unnecessary exception safety. Many std::vector implementations (but not all) assume that the stored type's constructor, copy constructor, and assignment operator may throw an exception. This is not the case for primitive types and for many other types, so the code these implementations generate for such types can be unnecessarily large and slow.

In many applications, some or all of these issues do not matter at all. But when you have stringent performance requirements, std::vector becomes less attractive, and you will look for a more optimized structure that gives you more control. The problem is that the C++ Standard offers no container below std::vector, except for arrays. You either drive a Cadillac or ride a moped.

Evidence? It is right in front of you: STL implementations use contiguous memory buffers in at least three places (besides std::vector itself): std::basic_string, std::deque's management of its memory chunks, and std::deque's chunks themselves. Yet I have not seen an STL implementation that uses vector as the back end for these structures. If std::vector were the one tool for contiguous memory in your application, it would be odd for STL implementers to use something else as their back end.

There is a gap, a vacuum, between std::vector and C-style arrays, and that is where we are headed. We will develop a contiguous memory buffer with the following characteristics:
* Generic: can store sequences of any type
* Familiar: supports a subset of std::vector's syntax and semantics
* Gives full control: offers fine-grained primitive operations alongside higher-level functions
* Generates highly optimized code for primitive types and selected user-defined types
* Allows user-controlled optimizations
* Supports advanced memory allocation functions, such as in-place expansion and fast reallocation
* Most importantly, can serve as the back end of higher-level contiguous containers; in particular, you can use it to implement vector, string, or deque

buffer: First Light

Let's define an interface for a buffer class template. buffer stores only two pointers: the beginning and the end of the memory chunk it controls. To a buffer, there is no difference between size and capacity. We therefore start from std::vector's interface and remove the capacity-related functions, which leaves the following set of member functions:
    template <class T, class Allocator = std::allocator<T> >
    class buffer
    {
    public:
        ... all of std::vector's type definitions and functions, except:
        // size_type capacity() const;
        // void reserve(size_type n);
    private:
        T* beg_;
        T* end_;
    };
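To make the two-pointer layout concrete, here is a minimal sketch. It is not the article's downloadable code: the class name tiny_buffer and its append function are hypothetical, and only enough of the interface is written out to show that the size falls out of pointer arithmetic and that, with no spare capacity, adding one element means reallocating and copying everything.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical illustration only: a buffer that stores nothing but two pointers.
template <class T, class Allocator = std::allocator<T> >
class tiny_buffer : private Allocator
{
public:
    typedef std::size_t size_type;

    tiny_buffer() : beg_(0), end_(0) {}
    ~tiny_buffer() { release(); }

    // Size is pure pointer arithmetic; there is no separate capacity to report.
    size_type size() const { return static_cast<size_type>(end_ - beg_); }

    const T& operator[](size_type i) const
    {
        assert(i < size());
        return beg_[i];
    }

    // Allocates exactly size() + 1 slots and copies everything over: O(N) per call.
    void append(const T& value)
    {
        const size_type n = size();
        T* fresh = Allocator::allocate(n + 1);
        T* built = fresh;   // one past the last object constructed so far
        try {
            built = std::uninitialized_copy(beg_, end_, fresh);
            new (static_cast<void*>(built)) T(value);
        } catch (...) {
            // Undo whatever was built in the new block, then give it back.
            while (built != fresh) (--built)->~T();
            Allocator::deallocate(fresh, n + 1);
            throw;
        }
        release();
        beg_ = fresh;
        end_ = fresh + n + 1;
    }

private:
    tiny_buffer(const tiny_buffer&);             // copying is omitted in this sketch
    tiny_buffer& operator=(const tiny_buffer&);

    // Destroys every element and frees the block.
    void release()
    {
        for (T* p = beg_; p != end_; ++p) p->~T();
        if (beg_) Allocator::deallocate(beg_, size());
    }

    T* beg_;
    T* end_;
};
```

Because the allocation is always an exact fit, the cost of append grows with the number of elements already stored.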
Interestingly, although buffer has no notion of capacity, it can implement every std::vector feature except capacity and reserve, with the caveat that buffer cannot meet the performance requirements of all of those functions. For example, buffer<T>::push_back has O(N) time complexity, whereas std::vector<T>::push_back has O(1) time complexity when averaged over a large number of calls. (This is what the Standard calls "amortized constant" complexity.) As you will see later, we can improve buffer<T>::push_back's performance in some cases while still not storing a capacity.

Implementing buffer's interface is not very complicated. Two things need attention: exception safety [3] and properly destroying objects. The standard functions std::uninitialized_copy and std::uninitialized_fill are two very useful tools here.

To let users allocate a buffer without initializing it, we need a special constructor and a few helper functions. We then need a grow_noinit primitive that grows the buffer without calling constructors. Correspondingly, we need shrink_nodestroy, which shrinks the buffer without calling destructors, and finally the slightly redundant clear_nodestroy, which empties the buffer and releases its memory without calling destructors.
    template <class T, class Allocator = std::allocator<T> >
    class buffer
    {
        ... as above ...
    public:
        enum noinit_t { noinit };
        buffer(size_type n, noinit_t, const Allocator& a = Allocator());
        void grow_noinit(size_type n);
        void shrink_nodestroy(size_type n);
        void clear_nodestroy();
    };
This extended interface gives you full control over the buffer and the data it holds. Be careful, though, not to use these extensions naively. Suppose, for example, that you use grow_noinit as follows:
    void Fun()
    {
        buffer<Widget> someWidgets;
        ...
        // Add room for 10 Widgets
        someWidgets.grow_noinit(10);
        // Initialize these Widgets
        ConstructWidgets(someWidgets.end() - 10, someWidgets.end());
        ...
    }

The problem here is that if constructing any of the 10 Widgets fails in any way, everything turns into a mess. When Fun returns, someWidgets' destructor dutifully destroys all the objects it contains; it cannot know which Widgets were constructed successfully, because buffer, unlike std::vector, has no notion of capacity. If the buffer still holds uninitialized memory, applying Widget's destructor to that memory obviously has undefined behavior.
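One way to keep such a sequence safe is to roll back by hand: destroy the objects that did get constructed, then drop the raw slots with shrink_nodestroy before the exception leaves the function. The sketch below illustrates that idea under stated assumptions. mini_buffer is a hypothetical stand-in with just enough of the interface, add_widgets is an invented helper, and Widget's constructor takes a flag only so that a failure can be simulated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for the article's buffer: only what the idiom needs.
template <class T>
class mini_buffer
{
public:
    typedef std::size_t size_type;
    mini_buffer() : beg_(0), end_(0) {}
    ~mini_buffer() { destroy_all(); ::operator delete(beg_); }

    T* end() { return end_; }
    size_type size() const { return static_cast<size_type>(end_ - beg_); }

    // Grows by n raw, unconstructed slots; existing elements are copied over.
    void grow_noinit(size_type n)
    {
        const size_type old = size();
        T* fresh = static_cast<T*>(::operator new((old + n) * sizeof(T)));
        try { std::uninitialized_copy(beg_, end_, fresh); }
        catch (...) { ::operator delete(fresh); throw; }
        destroy_all();
        ::operator delete(beg_);
        beg_ = fresh;
        end_ = fresh + old + n;
    }

    // Forgets the last n slots without running any destructor.
    void shrink_nodestroy(size_type n) { assert(n <= size()); end_ -= n; }

private:
    mini_buffer(const mini_buffer&);             // copying omitted in this sketch
    mini_buffer& operator=(const mini_buffer&);
    void destroy_all() { for (T* p = beg_; p != end_; ++p) p->~T(); }
    T* beg_;
    T* end_;
};

// Test type: counts live objects; the flag simulates a failing constructor.
struct Widget
{
    static int live;
    explicit Widget(bool fail)
    {
        if (fail) throw std::runtime_error("Widget construction failed");
        ++live;
    }
    Widget(const Widget&) { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Appends n Widgets; the one at position failIndex (if below n) throws.
// On failure the buffer is restored to the state it had before the call.
void add_widgets(mini_buffer<Widget>& buf, std::size_t n, std::size_t failIndex)
{
    buf.grow_noinit(n);
    Widget* const first = buf.end() - n;
    Widget* cur = first;
    try {
        for (; cur != buf.end(); ++cur)
            new (static_cast<void*>(cur))
                Widget(static_cast<std::size_t>(cur - first) == failIndex);
    } catch (...) {
        while (cur != first) (--cur)->~Widget();  // undo the ones that succeeded
        buf.shrink_nodestroy(n);                  // drop the raw slots
        throw;
    }
}
```

After a failed call, the buffer's destructor sees only fully constructed Widgets, so it never touches uninitialized memory.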
Type Traits

A key technique for optimizing generic code is obtaining information about the types the generic code operates on. With it, generic code can dispatch work at compile time to specialized routines tuned to specific types. In our case, for example, an important piece of information is whether the copy constructor of the type stored in buffer can throw an exception. For types whose copies never throw, the code becomes simpler because exception safety need not be handled. In addition, some compilers generate better code when try blocks are not used.

Type traits are a well-known technique for deducing type information. Boost [4] has a library that implements type traits, and so does Loki [5]. (For buffer we will use a type traits mechanism slightly different from those in Boost and Loki; it was suggested to me by Karvonen.)

Let's see how to deduce whether a type's copy constructor can throw. Frankly, determining this for arbitrary types is impossible in C++ today. We can, however, do part of the work and let users help us where they want the optimization.

You don't have to be Sherlock Holmes to know that the copy constructor of a primitive type never throws. A conservative assumption, then, is that any copy constructor other than a primitive type's may throw. Here is the corresponding code:
    namespace TypeTraits
    {
        // Enumerate all primitive types
        typedef TYPELIST_14(
            const bool,
            const char,
            const signed char,
            const unsigned char,
            const wchar_t,
            const short int,
            const unsigned short int,
            const int,
            const unsigned int,
            const long int,
            const unsigned long int,
            const float,
            const double,
            const long double)
        PrimitiveTypes;

        // A type is primitive if it appears in the list above...
        template <typename T> struct IsPrimitive
        {
            enum { value = Loki::TL::IndexOf<PrimitiveTypes, const T>::value >= 0 };
        };

        // ...or if it is a pointer
        template <typename T> struct IsPrimitive<T*>
        {
            enum { value = true };
        };

        // Conservative default: only primitive types are known not to throw
        template <typename T> struct CopyThrows
        {
            enum { value = !IsPrimitive<T>::value };
        };
    }
For brevity, the code above uses two facilities provided by Loki: typelists and IndexOf. Typelists let you create and manipulate lists of types; Loki::TL::IndexOf looks up an individual type in a typelist and returns its index. If the type is not found in the typelist, the returned index is negative. Finally, TypeTraits::CopyThrows<T>::value holds the information we need.

The mechanism for deducing type information is quite flexible. Suppose you define the following type in an application:

    struct Point
    {
        int x;
        int y;
        ... utility functions ...
    };
Point is not a primitive type, but copying it never throws. You can pass this information to the type traits mechanism. All you have to do is reopen the TypeTraits namespace and add an explicit specialization of CopyThrows:
    // In the file "Point.h", after Point's definition
    namespace TypeTraits
    {
        template <> struct CopyThrows<Point>
        {
            enum { value = false };
        };
    }
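If you want to try the traits without Loki at hand, the following self-contained sketch may help. It swaps the typelist lookup for one explicit specialization per primitive type, which is an assumption of this sketch rather than the mechanism described above; the TT_PRIMITIVE macro and the Name type are likewise invented for illustration.

```cpp
#include <cassert>
#include <string>

namespace TypeTraits
{
    // Conservative default: an unknown type is not primitive.
    template <typename T> struct IsPrimitive { enum { value = false }; };

    // Stand-in for the typelist lookup: one explicit specialization per type.
    #define TT_PRIMITIVE(T) \
        template <> struct IsPrimitive<T> { enum { value = true }; }
    TT_PRIMITIVE(bool);           TT_PRIMITIVE(char);
    TT_PRIMITIVE(signed char);    TT_PRIMITIVE(unsigned char);
    TT_PRIMITIVE(wchar_t);        TT_PRIMITIVE(short);
    TT_PRIMITIVE(unsigned short); TT_PRIMITIVE(int);
    TT_PRIMITIVE(unsigned int);   TT_PRIMITIVE(long);
    TT_PRIMITIVE(unsigned long);  TT_PRIMITIVE(float);
    TT_PRIMITIVE(double);         TT_PRIMITIVE(long double);
    #undef TT_PRIMITIVE

    // const T is primitive whenever T is.
    template <typename T> struct IsPrimitive<const T>
    {
        enum { value = IsPrimitive<T>::value };
    };

    // Pointers copy like primitive types.
    template <typename T> struct IsPrimitive<T*> { enum { value = true }; };

    // Anything that is not primitive is assumed to throw when copied.
    template <typename T> struct CopyThrows
    {
        enum { value = !IsPrimitive<T>::value };
    };
}

// A user-defined type whose copy cannot throw...
struct Point
{
    int x;
    int y;
};

// ...and the user telling the traits mechanism so.
namespace TypeTraits
{
    template <> struct CopyThrows<Point> { enum { value = false }; };
}

// Hypothetical type left at the default: copying its std::string member may throw.
struct Name
{
    std::string text;
};
```

With these definitions, TypeTraits::CopyThrows<T>::value is zero for int, const double, char*, and Point, and nonzero for Name.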
Better yet, you can specify CopyThrows for a whole category of types by means of partial template specialization. Consider the standard complex number type, std::complex<T>. You can instantiate std::complex with the primitive arithmetic types, but also with a user-defined numeric type, such as Rational or BigInt. Now, because copying a std::complex<T> object consists of copying two T objects (the real and the imaginary part), it follows that std::complex<T> has the same CopyThrows trait as T. You can express this with the following code:
    namespace TypeTraits
    {
        template <class T> struct CopyThrows< std::complex<T> >
        {
            enum { value = CopyThrows<T>::value };
        };
    }

Let's return to buffer. How does buffer use the CopyThrows information? Dispatching on a Boolean value at compile time is easy with the Int2Type class template [5][6]. To recap, Int2Type has a deceptively simple definition:
    template <int v>
    struct Int2Type
    {
        enum { value = v };
    };
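To see the dispatch in isolation, consider the following sketch. It uses a pared-down CopyThrows (everything may throw except int) and a free function fill_raw; both are assumptions made for this illustration. The overloads return a bool solely so that a caller can observe which path was selected at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

template <int v>
struct Int2Type
{
    enum { value = v };
};

// Hypothetical, pared-down trait: everything may throw unless stated otherwise.
template <typename T> struct CopyThrows { enum { value = true }; };
template <> struct CopyThrows<int> { enum { value = false }; };

// Exception-safe path: if a copy throws, destroy what was already built.
template <typename T>
bool fill_raw(T* dest, std::size_t n, const T& value, Int2Type<true>)
{
    T* cur = dest;
    try {
        for (; cur != dest + n; ++cur)
            new (static_cast<void*>(cur)) T(value);
    } catch (...) {
        while (cur != dest) (--cur)->~T();
        throw;
    }
    return true;    // reports that the guarded path ran
}

// Plain path: no try block, because copies are known not to throw.
template <typename T>
bool fill_raw(T* dest, std::size_t n, const T& value, Int2Type<false>)
{
    for (std::size_t i = 0; i != n; ++i)
        new (static_cast<void*>(dest + i)) T(value);
    return false;   // reports that the unguarded path ran
}

// Front end: the trait picks the overload at compile time.
template <typename T>
bool fill_raw(T* dest, std::size_t n, const T& value)
{
    return fill_raw(dest, n, value, Int2Type<CopyThrows<T>::value != 0>());
}
```

Only the selected overload is instantiated for a given T, so the try block costs nothing for types whose copies cannot throw.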
Here is an example of how buffer's constructor uses Int2Type to dispatch to an exception-safe or an exception-unaware initialization function:
    template <class T, class Allocator = std::allocator<T> >
    class buffer : private Allocator
    {
    private:
        enum { copyThrows = TypeTraits::CopyThrows<T>::value != 0 };

        // Exception-safe initialization
        void Init(size_type n, const T& value, Loki::Int2Type<true>)
        {
            ...
        }
        // Exception-unaware initialization
        void Init(size_type n, const T& value, Loki::Int2Type<false>)
        {
            ...
        }

    public:
        explicit buffer(size_type n, const T& value = T(),
            const Allocator& a = Allocator())
        {
            Init(n, value, Loki::Int2Type<copyThrows>());
        }
    };

Other pieces of type information that buffer may need include:
* Whether the type is MemCopyable, that is, whether copying an object gives the same result as memcpy'ing its bytes. Obviously, primitive types and POD (Plain Old Data, C-style) structures are in this category.
* Whether the type is MemMoveable, that is, whether copying an object from one location to another and then destroying the source gives the same result as memcpy'ing the object's bytes to the new location and not destroying the source. Again, primitive types and PODs are in this category. However, as you will soon see, a surprisingly large number of user-defined types are MemMoveable.
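As a rough preview of how the first of these traits might be put to work, here is a sketch that dispatches a range copy either to memcpy or to an element-by-element loop. The trait's shape, its specializations, and the copy_assign function are assumptions made for this illustration, not definitions taken from the article or its downloadable code; MemMoveable is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

template <int v>
struct Int2Type
{
    enum { value = v };
};

// Hypothetical trait: conservatively false unless someone says otherwise.
template <typename T> struct MemCopyable { enum { value = false }; };
template <> struct MemCopyable<int>  { enum { value = true }; };
template <> struct MemCopyable<char> { enum { value = true }; };

// A POD structure; its author opts in explicitly.
struct Point
{
    int x;
    int y;
};
template <> struct MemCopyable<Point> { enum { value = true }; };

// Bitwise path: one memcpy for the whole range.
template <typename T>
void copy_assign(const T* src, std::size_t n, T* dest, Int2Type<true>)
{
    if (n != 0) std::memcpy(dest, src, n * sizeof(T));
}

// General path: element-by-element assignment.
template <typename T>
void copy_assign(const T* src, std::size_t n, T* dest, Int2Type<false>)
{
    for (std::size_t i = 0; i != n; ++i) dest[i] = src[i];
}

// Front end: copies n already-constructed objects from src to dest
// (the two ranges must not overlap), picking the path at compile time.
template <typename T>
void copy_assign(const T* src, std::size_t n, T* dest)
{
    copy_assign(src, n, dest, Int2Type<MemCopyable<T>::value != 0>());
}
```

A type such as std::string is left at the default and therefore takes the element-by-element path.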
The next installment of "Generic<Programming>" will define MemCopyable and MemMoveable and use them in much the same way as CopyThrows. Does the story end with MemCopyable and MemMoveable? Not at all. As the reallocate and inplace_reallocate functions in the downloadable code suggest, we still have to face the challenges of memory allocation. A detailed description will follow when we get there!
Acknowledgment

Many thanks to Tim Sharrock for his thorough review.
References and Notes

[1] Herb Sutter. Exceptional C++ (Addison-Wesley, 2000).
[2] "COW Fan Club": COW stands for copy-on-write. COW is a strategy that works well with std::vector's policy of implementing moves as copies. However, many libraries are moving away from COW-based implementations because of the problems they cause in multithreaded programs.
[3] In general, buffer must provide the same level of exception safety as vector. See Bjarne Stroustrup, Appendix E: Standard-Library Exception Safety.
[4] Boost is a collection of cutting-edge C++ libraries.
[5] Loki is the library featured in Modern C++ Design (Addison-Wesley, 2001), written by yours truly, and is available for download.
[6] Andrei Alexandrescu. "Generic<Programming>: Mappings Between Types and Values," C/C++ Users Journal C++ Experts Forum, October 2000.

Andrei Alexandrescu is a Ph.D. student at the University of Washington in Seattle and the author of the book Modern C++ Design. You can contact him via www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar.
You can get this source code from the CUJ website or http://merced.go.nease.net/code/buffer.zip.