Generic<Programming>: A Policy-Based basic_string Implementation

zhaozj2021-02-16  64

Generic<Programming>: A Policy-Based basic_string Implementation, by Andrei Alexandrescu

This month's installment of Generic<Programming> has two new things. The first is the subject: we will discuss implementing the standard library component basic_string (better known as string, which is a convenience typedef for basic_string<char>), an important part of the C++ library. But the truly interesting thing is that the code available for download with this article is specially crafted to work with Visual C++ 6.0, a compiler known for two contradictory things: it is ubiquitous, and its support for generic programming is weak.

The code accompanying this article implements not one, not two, but twelve different variations of basic_string, each with its own tradeoffs. They are not toys. We are talking about complete, standard-compliant, industrial-strength implementations (modulo bugs, of course). Do you think that will take a lot of code? Think again. Believe me, this article is going to be a lot of fun.

One Size Does Not Fit All

First off, why would anyone bother to implement basic_string at all? Your standard library already implements it, so "yet another" basic_string implementation seems to have educational value only. Yet many people who have used strings in multithreaded programs know of a nasty problem. The standard tries to allow copy-on-write implementations of basic_string. (Copy-on-write is fondly called COW by its fans and "the mad cow" by its opponents.) COW-based strings use reference counting internally. In a multithreaded environment, such a string is either unusable or, if the library supports multithreading, unacceptably slow, even in the single-threaded parts of your program. You have to pick one.

More problems with COW strings appear with dynamically loaded libraries: if you unload a library, your program may still hold a shallow copy of a string whose memory was originally allocated inside that library.

The problems caused by COW strings have been discussed so extensively that most STL implementations do not use COW for their strings and prefer other optimization strategies. "Most" is not "all," however, so if you use threads and basic_string together you must make sure you use a non-COW implementation, at the cost of your code's portability to other STL implementations. Besides, COW has its advantages and is very useful in a large class of applications.

So wouldn't it be nice to simply choose which optimizations a string uses in a given application? "Here I need a non-COW string with the small string optimization for strings of up to 16 characters." Or: "Here I want to take advantage of COW, and I want the strings allocated on my own heap."

How to Implement basic_string in 200 Lines of Code

Abandoning the existing, useful string implementations and developing one from scratch can be a daunting task. Writing all those member functions and type definitions is not an easy job. I know this because I once implemented the basic_string interface for an application that needed COM string allocation in a multithreaded environment.

As I was carefully writing all the utility functions specified in the standard, I noticed an interesting phenomenon. Most functions revolve around a small kernel of functions and types. In other words, you can split the basic_string interface into a "core" part and a "utility" part. The utility part is the same no matter which implementation strategy you choose, while the core part varies greatly from one implementation to another. For example, the replace family of functions is a large set of utility functions that rely on the core function resize.

Here is the catch: the utility part of basic_string is also the bulkiest (more than 300 lines of code in my implementation). In contrast, writing a core, even a sophisticated one, is a comparatively easy job: my implementations range from 75 to 250 lines of code. This means that you can create a new implementation simply by plugging a different core into the utility interface. (In fact, you don't even have to, because you can download the code that accompanies this article, which is meant to be used that way.) You really are 200 lines of code away from the basic_string implementation of your dreams!

A Policy-Based String

If you have read my book [3] (ah, you hate advertisements), you know the magic word: policies! If you want to separate a particular aspect of a class and let your users choose the implementation of that aspect, you move that aspect to a template parameter and define an interface for it. This is not rocket science, but it is extremely effective.

The standard basic_string looks like this:

    namespace std {
        template <class E, class T = char_traits<E>, class A = allocator<E> >
        class basic_string;
    }

E is the character type of the string (in most cases either char or wchar_t), T controls how strings are compared and copied, and A is the allocator that we all know and love, but never use. We will add a fourth template parameter that controls the actual implementation of the string. Because it deals with how the string is physically stored, we call it the Storage policy. We call our new string flex_string because, as you will soon see, it is very flexible:

    template <class E,
              class T = char_traits<E>,
              class A = allocator<E>,
              class Storage = AllocatorStringStorage<E, A> >
    class flex_string;

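Storage defaults to AllocatorStringStorage<E, A>. A Storage implementation (called StorageImpl here) must provide the following constructors and member functions: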

    StorageImpl(const StorageImpl&);
    StorageImpl(const allocator_type&);
    StorageImpl(const E* s, size_type len, const allocator_type& a);
    StorageImpl(size_type len, E, const allocator_type&);

    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;
    size_type size() const;
    size_type max_size() const;
    size_type capacity() const;

    void resize(size_type, E);
    void reserve(size_type);
    void swap(StorageImpl&);
    const E* c_str() const;
    const E* data() const;
    allocator_type get_allocator() const;

That's about it. The specification is quite simple (and it would be simpler still without allocators). The idea is that you can efficiently implement the entire basic_string interface on top of the small kernel of types and functions that Storage provides. The flex_string class holds a Storage object by value. I chose private inheritance for a few minor conveniences. Thus, flex_string in the downloadable code looks like this:

    template <class E,
              class T = std::char_traits<E>,
              class A = std::allocator<E>,
              class Storage = AllocatorStringStorage<E, A> >
    class flex_string : private Storage
    {
    public:
        typedef typename Storage::iterator iterator;
        typedef typename Storage::const_iterator const_iterator;

        // 21.3.1 construct/copy/destroy
        explicit flex_string(const A& a = A()) : Storage(a) {}
        ...
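To make the core/utility split concrete, here is a minimal sketch, not taken from the downloadable code. MiniVectorStorage and mini_flex_string are hypothetical names, the storage exposes only a small subset of the Storage interface listed earlier, and allocators and traits are left out. The point is that append and erase are written once, purely in terms of the core operations begin, size, and resize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, cut-down Storage policy backed by std::vector.
// Only the few "core" operations used below are provided.
template <class E>
class MiniVectorStorage {
    std::vector<E> chars_;
public:
    typedef std::size_t size_type;
    typedef typename std::vector<E>::iterator iterator;
    typedef typename std::vector<E>::const_iterator const_iterator;

    iterator begin() { return chars_.begin(); }
    iterator end() { return chars_.end(); }
    const_iterator begin() const { return chars_.begin(); }
    const_iterator end() const { return chars_.end(); }
    size_type size() const { return chars_.size(); }
    void resize(size_type n, E fill) { chars_.resize(n, fill); }
};

// Hypothetical, cut-down string: the "utility" layer knows nothing about
// how characters are stored; it only calls the Storage policy's core.
template <class E, class Storage = MiniVectorStorage<E> >
class mini_flex_string : private Storage {
public:
    typedef typename Storage::size_type size_type;
    using Storage::begin;
    using Storage::end;
    using Storage::size;

    // Utility: append n characters from s, built on the core resize().
    mini_flex_string& append(const E* s, size_type n) {
        assert(s != 0 || n == 0);
        const size_type old_size = size();
        Storage::resize(old_size + n, E());
        for (size_type i = 0; i != n; ++i)
            *(begin() + old_size + i) = s[i];
        return *this;
    }

    // Utility: erase up to n characters starting at pos, again via resize().
    mini_flex_string& erase(size_type pos, size_type n) {
        assert(pos <= size());
        if (n > size() - pos) n = size() - pos;
        for (size_type i = pos + n; i != size(); ++i)
            *(begin() + (i - n)) = *(begin() + i);
        Storage::resize(size() - n, E());
        return *this;
    }
};
```

Swapping in a different storage class with the same members would leave append and erase untouched, which is the property the article relies on.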

Implementing the Storage Policy

Now let's get our hands dirty and implement Storage. An efficient string implementation holds a pointer to a buffer. The buffer, in turn, holds the length and the capacity of the string, followed by the string itself. To avoid allocating memory twice (once for the bookkeeping data and once for the characters), you can use a trick known as "the struct hack": the buffer holds a C-style character array as its last member, and that array grows dynamically to hold as many characters as needed. This is exactly what SimpleStringStorage does:

    template <class E, class A = std::allocator<E> >
    class SimpleStringStorage
    {
        struct Data
        {
            E* pEnd_;
            E* pEndOfMem_;
            E buffer_[1];
        };
        Data* pData_;

    public:
        size_type size() const
        { return pData_->pEnd_ - pData_->buffer_; }
        size_type capacity() const
        { return pData_->pEndOfMem_ - pData_->buffer_; }
        ...
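The single-allocation layout behind this listing can be tried out in isolation. The sketch below is a char-only illustration with hypothetical free functions (make_data, data_size, data_capacity, destroy_data); it is not the article's downloadable code. Like the struct hack itself, it writes past the declared end of buffer_, which mainstream compilers accept but the standard does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Same member names as the listing, specialized to char for brevity.
struct Data {
    char* pEnd_;        // one past the last character in use
    char* pEndOfMem_;   // one past the last allocated character
    char  buffer_[1];   // first character; the rest follow in the same block
};

// Hypothetical helper: one allocation holds both the bookkeeping pointers
// and room for `capacity` characters; `len` characters of s are copied in.
inline Data* make_data(const char* s, std::size_t len, std::size_t capacity) {
    assert(s != 0 && len <= capacity);
    void* raw = ::operator new(sizeof(Data) + capacity);
    Data* d = static_cast<Data*>(raw);   // a larger block viewed as a Data
    char* chars = d->buffer_;
    for (std::size_t i = 0; i != len; ++i)
        chars[i] = s[i];
    d->pEnd_ = chars + len;
    d->pEndOfMem_ = chars + capacity;
    return d;
}

inline std::size_t data_size(const Data* d) {
    return static_cast<std::size_t>(d->pEnd_ - d->buffer_);
}

inline std::size_t data_capacity(const Data* d) {
    return static_cast<std::size_t>(d->pEndOfMem_ - d->buffer_);
}

inline void destroy_data(Data* d) {
    ::operator delete(d);
}
```

Size and capacity fall out of pointer subtraction, exactly as in the size() and capacity() members of the listing, and releasing the string is a single deallocation.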

pEnd_ points to the end of the string, pEndOfMem_ points to the end of the allocated buffer, and buffer_ extends over as many characters as the string holds. In other words, buffer_ "continues" past the end of the Data structure. To achieve this flexibility, pData_ does not really point to a Data object; it points to a larger chunk of memory that is cast to Data. The struct hack is not 100% portable in theory, but in practice it is.

SimpleStringStorage has another nice little optimization: all empty strings share a single static Data instance. An alternative implementation could initialize pData_ to null for empty strings, but then many member functions would have to test for null.

SimpleStringStorage is "simple" because it ignores the allocator passed in. It just uses the standard free store (new/delete) for its memory needs. Using the passed-in allocator to allocate Data objects is harder than it seems, partly because of the design of allocators (no support for objects of arbitrary size) and partly because of compiler compatibility problems. You can find this approach, packaged as a Storage policy, in the class template AllocatorStringStorage.

Yet another possible string storage implementation simply uses std::vector as its back end. That implementation is very quick to write, and you get a nice string that reuses a well-tuned standard library facility. It also helps keep the code small. You can see this implementation in VectorStringStorage.

All three implementations use EBO (the Empty Base Optimization) [4] wherever possible. (Did I mention the buzzword "industrial strength"?) EBO is very effective here because most allocators are empty classes.

So now we have three nice basic_string implementations under our belt, at a cost of about 433 lines of code per implementation, for a total of about 1,300 lines. Not bad, especially when you consider how easily you can add new implementations.

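The effect of the Empty Base Optimization on a storage class can be observed with sizeof. The following is a minimal sketch with hypothetical class names (StorageWithMember, StorageWithEbo, ebo_saves_space); it assumes std::allocator<char> is an empty class, which is the case on mainstream standard library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical storage classes; only their layout matters here.
template <class E, class A = std::allocator<E> >
class StorageWithMember {
    A alloc_;      // even an empty allocator occupies at least one byte here
    E* pData_;
public:
    StorageWithMember() : alloc_(), pData_(0) {}
    const E* data() const { return pData_; }
    A get_allocator() const { return alloc_; }
};

template <class E, class A = std::allocator<E> >
class StorageWithEbo : private A {   // an empty base may occupy zero bytes
    E* pData_;
public:
    StorageWithEbo() : A(), pData_(0) {}
    const E* data() const { return pData_; }
    A get_allocator() const { return *this; }
};

// True when deriving from the empty allocator made the object smaller.
inline bool ebo_saves_space() {
    return sizeof(StorageWithEbo<char>) < sizeof(StorageWithMember<char>);
}
```

On typical platforms the member version is padded to two pointers' worth of memory while the derived version is the size of a single pointer, so every string object shrinks by a pointer.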
If you think this is cool, the article has achieved its purpose. But don't forget that the introduction promised a lot of fun, and the fun is only beginning.

Enter the small string optimization (SSO) [5]. The idea behind SSO is to store small strings directly inside the string object instead of in dynamically allocated memory. When the size exceeds what the string object itself can hold, a dynamic allocation strategy is used. The two strategies share the same memory inside the string object for their bookkeeping data. The string class can tell the two mechanisms apart by means of some sort of tag:

    template <class E /* , other arguments */>
    class sso_string
    {
        struct DynamicData { ... };
        static const unsigned int maxSmallStringLen = 12;
        union
        {
            E inlineBuffer_[maxSmallStringLen];
            DynamicData data_;
        };
        bool isSmall_;
        ...
    };
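A runnable, heavily simplified variant of this layout might look as follows. TinySsoString is a hypothetical class: it handles char only, replaces DynamicData with a plain heap pointer, keeps an explicit size_ member, and disables copying. It is meant only to show the tag selecting between the two union members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class, char only, copying disabled.
class TinySsoString {
    static const std::size_t maxSmallStringLen = 12;
    union {
        char  inlineBuffer_[maxSmallStringLen];  // valid when isSmall_ is true
        char* data_;                             // valid when isSmall_ is false
    };
    std::size_t size_;
    bool isSmall_;

    TinySsoString(const TinySsoString&);             // not implemented
    TinySsoString& operator=(const TinySsoString&);  // not implemented
public:
    TinySsoString(const char* s, std::size_t len)
        : size_(len), isSmall_(len <= maxSmallStringLen) {
        assert(s != 0);
        char* dst = inlineBuffer_;
        if (!isSmall_) {
            data_ = new char[len];   // too long: fall back to the heap
            dst = data_;
        }
        std::memcpy(dst, s, len);
    }
    ~TinySsoString() { if (!isSmall_) delete[] data_; }

    bool is_small() const { return isSmall_; }
    std::size_t size() const { return size_; }
    // Not null-terminated; use together with size().
    const char* data() const { return isSmall_ ? inlineBuffer_ : data_; }
};
```

Every accessor has to consult the tag before touching the union, which is the price paid for avoiding a heap allocation on short strings.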

If isSmall_ is true, the string is stored directly in inlineBuffer_; otherwise, data_ is valid. Now, which dynamic allocation strategy should DynamicData use? A std::vector? A SimpleStringStorage? An AllocatorStringStorage? The answer is, of course, "all of the above and more, please." Clearly, SSO is orthogonal to whatever storage mechanism you choose for long strings. Consequently, the SmallStringOpt class template takes the other storage as a template parameter:

    template <class E, unsigned int threshold, class Storage, typename Align = E*>
    class SmallStringOpt
    {
        enum { temp = threshold > sizeof(Storage) ? threshold : sizeof(Storage) };
    public:
        enum { maxSmallString = temp > sizeof(Align) ? temp : sizeof(Align) };
    private:
        union
        {
            E buf_[maxSmallString + 1];
            Align align_;
        };
        ... implement the Storage policy ...
    };

The buf_ member variable stores either a Storage object or the string itself. But what is that Align business? When you deal with this kind of "seated allocation," you must handle alignment very carefully. Because there is no portable way to figure out the alignment requirements of Storage, SmallStringOpt accepts a type that specifies the alignment and stores it in a dummy align_ variable.

How does SmallStringOpt tell small strings from large ones? For a small string, the last element of buf_ (that is, buf_[maxSmallString]) stores maxSmallString minus the actual length of the string; for a long string, it stores a magic constant. For a string of exactly maxSmallString characters, buf_[maxSmallString] is zero, which very nicely serves as both the null terminator and the tag.

You will see a number of tricks, casts, and low-level details in SmallStringOpt. (We are talking about an optimization here, aren't we?) But the result is remarkable: we can combine SmallStringOpt with any other Storage, including of course SimpleStringStorage, VectorStringStorage, and AllocatorStringStorage. So now we have six basic_string implementations; a small additional effort has multiplied our returns. (By the way, having fun yet?) The code now has about 1,440 lines, so you get each basic_string implementation for 240 lines. If C++ programming were karate, leveraging the multiplicative power of generic code would be like fighting several opponents at once.

Here is an example of an instantiation:

    typedef flex_string<
        char,
        std::char_traits<char>,
        std::allocator<char>,
        SmallStringOpt<char, 6, VectorStringStorage<char, std::allocator<char> > >
    > String;

This specifies a string that uses std::vector-based storage and applies the small string optimization to strings of up to 6 characters.
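The tagging scheme described for SmallStringOpt, where the last buffer element holds maxSmallString minus the string length, can be sketched on its own. SmallBuffer below is a hypothetical char-only helper; kLongMarker is an assumed value standing in for the unspecified constant used for long strings, and the long-string storage itself is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper showing only the small-string side of the scheme.
class SmallBuffer {
    static const std::size_t maxSmallString = 15;
    static const char kLongMarker = 16;   // assumed: any value above 15 works
    char buf_[maxSmallString + 1];
public:
    SmallBuffer() {
        buf_[0] = '\0';
        buf_[maxSmallString] = static_cast<char>(maxSmallString);  // all slots free
    }

    // Stores a small string; the last element records the unused slots.
    void assign_small(const char* s, std::size_t len) {
        assert(s != 0 && len <= maxSmallString);
        std::memcpy(buf_, s, len);
        buf_[len] = '\0';
        // When len == maxSmallString this writes 0 into the same element,
        // which is then both the tag and the null terminator.
        buf_[maxSmallString] = static_cast<char>(maxSmallString - len);
    }

    void mark_long() { buf_[maxSmallString] = kLongMarker; }

    bool is_small() const { return buf_[maxSmallString] != kLongMarker; }

    std::size_t size() const {
        assert(is_small());
        return maxSmallString - static_cast<std::size_t>(buf_[maxSmallString]);
    }

    const char* c_str() const {
        assert(is_small());
        return buf_;
    }
};
```

Because the tag shares a byte with the terminator of a full-length small string, the scheme needs no separate flag and no separate length field.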

Back to COW

Like it or not, you can't ignore COW: too many people find it useful. For their sake, let's implement a CowString class template that, again, can add COW to any other Storage. CowString looks like this:

    template <class E, class Storage>
    class CowString
    {
        struct Data
        {
            Storage s_;
            unsigned int refs_;
        };
        Data* pData_;
    public:
        ...

Data holds whatever Storage you choose plus a reference count. CowString itself contains only a pointer to Data, and several CowString objects may point to the same Data object. Whenever the data might change, CowString makes a true copy of its Data. Now take a look at this:

    typedef flex_string<
        char,
        std::char_traits<char>,
        std::allocator<char>,
        SmallStringOpt<char, 5, CowString<char, AllocatorStringStorage<char, std::allocator<char> > > >
    > String;

Now we have a string that does not use dynamic allocation for strings shorter than five characters. For longer strings it uses a COW strategy layered on top of an allocator-based implementation. CowString doubles the number of possible flex_string instantiations, so we now have twelve implementations at our disposal. The total code size has grown to 1,860 lines, or 155 lines per implementation. Actually, there are twenty-four implementations if you count the order in which SmallStringOpt and CowString are applied. However, applying COW to small strings is not an efficient design decision, so we always nest CowString inside SmallStringOpt and never the other way around.
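The copy-before-write behavior described for CowString can be sketched as follows. CowWrapper is a hypothetical, single-threaded illustration: it exposes read() and write() accessors instead of the full Storage interface, and in the usage note std::string merely stands in for a Storage type.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper; the reference count is not thread-safe.
template <class Storage>
class CowWrapper {
    struct Data {
        Storage s_;
        unsigned int refs_;
        explicit Data(const Storage& s) : s_(s), refs_(1) {}
    };
    Data* pData_;

    void release() {
        assert(pData_->refs_ > 0);
        if (--pData_->refs_ == 0) delete pData_;
    }

    // Called before any mutation: take a private copy if Data is shared.
    void make_unique() {
        if (pData_->refs_ > 1) {
            Data* fresh = new Data(pData_->s_);
            --pData_->refs_;
            pData_ = fresh;
        }
    }
public:
    explicit CowWrapper(const Storage& s) : pData_(new Data(s)) {}

    // Copying is shallow: both objects point to the same Data.
    CowWrapper(const CowWrapper& other) : pData_(other.pData_) {
        ++pData_->refs_;
    }

    CowWrapper& operator=(const CowWrapper& other) {
        ++other.pData_->refs_;   // increment first so self-assignment is safe
        release();
        pData_ = other.pData_;
        return *this;
    }

    ~CowWrapper() { release(); }

    const Storage& read() const { return pData_->s_; }
    Storage& write() { make_unique(); return pData_->s_; }
    bool shares_with(const CowWrapper& other) const {
        return pData_ == other.pData_;
    }
};
```

With CowWrapper<std::string>, copying an object leaves both copies sharing one Data until the first call to write(), at which point the writer gets its own copy and the other object is unaffected.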

Conclusion

basic_string is a very complex component. Yet a careful policy-based design can boost your productivity enormously. With a handful of policies you can choose among straightforward, small-string-optimized, and reference-counted implementations of basic_string, just by passing a few arguments to a class template.

References

[1] Herb Sutter. "Optimizations That Aren't (In a Multithreaded World)," C/C++ Users Journal, June 1999.
[2] Kevlin Henney. "From Mechanism to Method: Distinctly Qualified," C/C++ Users Journal C++ Experts Forum, May 2001, http://www.cuj.com/experts/1905/henney.htm.
[3] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).
[4] Andrei Alexandrescu. "Traits on Steroids," C++ Report, June 2000, http://ftp.sj.univali.br/prof/fernando montenegro/artigos/GenericProgramingcpp02.htm.
[5] Jack Reeves. "String in the Real World - Part 2," C++ Report, January 1999, http://www.bleding-edge.com/publications/c 14.htm.

Andrei Alexandrescu is a doctoral student at the University of Washington in Seattle and the author of the book Modern C++ Design. You can contact him via www.moderncppdesign.com. Andrei is also a featured instructor at the C++ Seminar. You can get the source code from the CUJ website and from http://merced.go.nease.net/code/alexandr.zip.
