A strategy-based Basic

zhaozj2021-02-11  145

Generic : a policy-based Basic_String implementation

Andrei alexandrescu

Translation: Skywalker

There are two interesting topics in the Generic section of this month. One - discussed the implementation of the Basic_String component in Standard Library (well known, string is the definition: typedef Basic_String string;), it is a very important part of C Library. But what is actually, the code available for download is specially carefully made in conjunction with VC 6.0. VC 6.0 is a compiler with two major opposing characteristics - it has the popularity of it and its weak support for generic programming.

Basic_string, which is accompanied by this article is not one, two, but twelve, with different trade-off. They are not a toy code, we discuss all the complete, efficient, compliant standards, industrial-grade strength (oh, of course, there may be bugs). You may first think of this will have a terrible code, but I believe me, this article will be very interesting.

One size does not apply all the body

First of all, everyone may puzzle why it is necessary to implement Basic_String again? He is not already implemented in Standard Library, and then realizes a Basic_String whether it only has a teaching value.

It is well known that there is a very serious problem with String in multithreaded programs. The standard library will try to use Copy-On-Write in the implementation of the Basic_String. (Copy-On-Write has a kind nickname in its support: COW, but in the eyes of the opponents, "The Mad Cow" is based on COW-based Strings Internal use reference counters, this is Multi-threaded programs are unavailable, but if Standard Library supports multi-thread in the implementation, the performance is often unacceptable when using the single thread, and it seems to be selected.

(Translation: The so-called Copy-ON-WRITE is a plurality of objects that use the same amount of data using the same amount of data. Only when the object changes its own data, it will copy a data to you, and then change the data. The advantage of this is to reduce memory allocation, assignment, etc. in the object replication. Disadvantages are that since multiple objects use the same memory data, if they don't lock, I will have trouble in multi-thread, huh, huh !!)

When you use COW STRINGS in the program that uses a dynamic connection library, you can also have a shallow copy (Shallow Copies) when you have a String allocated by a dynamic library. Release the dynamic library.

A large number of discussions arising from COW STRINGS have enabled Most STLs to abandon COW's way, and it is changed to Alternate Optimization Strategies. But "most" is not "all", so when you use Basic_String in multithreading, you have to use the Non-COW mode, so you have to give up the Basic_String implemented by STL.

On the other hand, the COW implementation is still very useful for considerable applications. Ask, you can use some of your own implementations in some application, this is not a better way? "I want to implement String with Non-COW and optimize the number of characters in 16 or less.", "," I want to use COW, and use my own heap assignment operator. "How to 200 Implement Basic_String within line code

No matter the effectiveness of multiple String implementations, only one will be a fearful task. In the external interface you implement, you need to achieve a large number of member functions and type definitions, it is really not an easy task. I understand this, because I have previously implemented a Basic_String interface for a Basic_String interface to allocate an operator and multi-threaded application.

When I can't write all UTILITY functions according to the rule of Standard, I noticed an interesting fact. A large number of member functions look around a small, core function and type - in other words, you can divide the Basic_String interface to "Core" and "Utility". The "Utility" section is not related to your implementation strategy, and "Core" has a completely different design in different implementations. For example, the Replace series function in "utility" is implemented by the Resize this "Core" function.

There is a difficult point here: the code in the "Utility" is very large (more than 700 rows in my implementation). In contrast, write a "core" section, even rigorous, able to withstand the test of the test, is also a relatively easy task - the amount of code in my implementation is approximately 75-250 rows. This means that you can relatively easily achieve different "core" in the "Utility" interface to meet different needs. You just need to achieve the huge code once. (In fact, you don't need to write even once, you can download the source code attached to this article, they are eager to use !!) Really, only 200 lines can realize the Basic_String in your dream.

A strategy-based string

Perhaps some people who have read my book know that the original intention of our initial design is to - Policies. Of course, when you want to change the implementation of a particular part in a class, you want users to choose a particular implementation, you should use this part as a template variable and define an interface. Although this approach is not high, it is indeed very effective.

Standard Basic_String statement looks like this:

Namespace STD

{

Template

Class T = CHAR_TRAITS ,

Class a = allocator >

Class Basic_String;

}

E is a string character type (usually char or WCHAR_T), T controls how string is comparison and replication, and A is all we know, but never use allocation operators. We will add a fourth template variable, responsible for how accurately implements String. Since it is handling how accurate storage string, we call it Storage Policy

The new String is named flex_string, as you can see, it has considerable flexibility, scalability.

Template

Class T = CHAR_TRAITS ,

Class a = allocator Class Storage = AllocatorstringStorage >

Class flex_string;

Storage defaults to AllocatorstringStorage . This is a very simple, straightforward storage strategy, which uses a direct Eager Copy mode (relative to COW). In this implementation, flex_string has a Storage object and uses the types and member functions it provides.

The exact degree of choosing the Storage interface may become a bit different. In essence, after using my Basic_String implementation, I found a set of functions. If there is no such function, I will not be able to provide realization, and these functions are non-redundant. This is a semi-formal condition paradigm that the Storage Policy implementation must satisfy.

Template

Class StorageImpl

{

PUBLIC:

Typedef Some_Type Size_Type;

TypeDef Some_Type Iterator;

Typedef Some_Type Const_iterator;

TypeDef Some_Type Allocator_Type;

StorageIMPL (Const StorageImpl &);

StorageIMPL (const allocator_type);

StorageImpl (Const E * S, SIZE_TYPE LEN, Const Allocator_Type & A);

StorageImpl (SIZE_TYPE LEN, E, Const Allocator_Type &);

Iterator begin ();

Const_iterator begin () const;

Iterator end ();

Const_iterator End () const;

SIZE_TYPE SIZE () Const;

SIZE_TYPE MAX_SIZE () Const;

SIZE_TYPE CAPACITY () Const;

Void Resize (SIZE_TYPE, E);

void reserve (size_type);

Void Swap (StorageImpl &);

Const E * c_str () const;

Const E * DATA () Const;

Allocator_type get_allocator () const;

}

It is very beautiful, it is quite simple (because it does not use allocator). This means you can implement a complete, efficient interface with Types and Functions defined by a very small Storage core.

The flex_string class has a Storage object in a 'value'. I chose private inheritance because some irrelevant factors. Therefore, flex_string looks like this:

Template

Class T = std :: char_traits ,

Class a = std :: allocator ,

Class Storage = AllocatorstringStorage >

Class Flex_string: Private Storage

{

PUBLIC:

TypeDef Typename Storage :: item Iterator;

TypedEf TypeName Storage :: const_iterator const_iterator;

...

// 21.3.1 Construct / Copy / Destroy

Explicit flex_string (const A & a = a ()))

: Storage (a)

{}

...

}

Implement storage strategy

OK, now let us enter the inside of Storage and see how to implement these Storage. A String has a pointer to buff. The length, maximum length, and String itself are saved in BUFF. In order to avoid allocation of memory twice (once a data tag, once a data itself), you can use a small door "The Struct Hack": BUFF's last element is a c-style's character array, which can dynamically extend to adapt to characters Number of changes. SimpleStringStorage this:

Template >

Class SimpleStringStorage

{

Struct Data

{

E * pend_;

E * pendofmem_;

E buffer_ [1];

}

Data * PDATA_;

PUBLIC:

SIZE_TYPE SIZE () Const

{Return PDATA _-> Pend_ - PDATA _-> Buffer_;

SIZE_TYPE CAPACITY () Const

{RETURN PDATA _-> PendOfMem_ - PDATA_-> buffer_;

...

}

The PEND_ pointer points to the end of the string, the pendofmem points to the end of the assigned memory buff, and the buffer_ will be extended to store multiple characters, as a string storage space - in other words, the object of the DATA data structure can be expanded To put down more characters. In order to achieve this flexibility, the PDATA_ pointer is actually not strictly pointing to the Data object, and a larger memory is filled into a Data structure. This "struct hack" is theoretically unacceptable to 100% portability, but in practical applications, it does work very well.

SimpleStringStorage has a small optimization - all EMPTY STRING shares a static DATA instance. Another implementation is to initialize the PDATA_ pointer of EMPTY STRINGS to Zero, but it cannot be tested through the popularization of many member functions.

SimpleStringStorage whose "Simple" is also because it avoids allocator. SimpleStringStorage Simplely uses the New / Delete operator to manage memory. The use of PaSSED-ILOCATORs to allocate space for Data is difficult than look, partly because Allocator's design (it does not support objects that can change the size), as well because of the compiler compatibility. You can find an implementation of the Storage policy that uses Allocator in the class template AllocatorstringStorage.

Another possible implementation is a simple use of Std :: Vector in the background. This implementation is very simple, effective, you can get STL support, which means that String can be very simple, wonderful way to reuse Standard Library. This will also help you reduce the amount of code to the minimum size. You can see this implementation in VectorstringStorage.

The above three implementations use EPO (EMPTY BASE OPTIZATION). (Remember the "industrial grade strength" I said?) Use EPO very effective because most allocators actually Empty Class.

Exhilarated C OK, so far, there are a total of approximately 1300 lines of code, with three very beautiful Basic_String implementations. On average of 433 lines. Not too bad, especially you should think of it is already very easy to add a new implementation.

If you think this is very interesting, this article has reached its purpose so far. But don't forget that we have finally said that there is a lot of interesting things, now we started again.

Now let's enter the SSO (Small String Optimization small string optimization). The idea of ​​SSO is how to store small strings in an appropriate manner (without dynamic storage). ALLOCATION STRATEGY is used when String size becomes large. Both methods share memory in the string to record data. String Class can distinguish one of these two mechanisms by tagging.

Template

Class SSO_STRING

{

Struct DynamicData {...};

Static const unsigned int maxsmallstringlen = 12;

union

{

E [MaxSmallstringlen] inlineBuffer_;

DynamicData Data_;

}

Bool issmall_;

...

}

When Issmall_ is True, String uses inlineBuffer_. Otherwise use DATA_. The question is, what kind of policy is used to allocate space for DynamicData? Std :: Vector? SimpleStringStorage? AllocatorstringStorage? The answer is of course "casual you choose"

Obviously, no matter what replaceable storage you use, SSO is Orthogonal. Therefore, the SmallStringOpt class has a Storage template variable:

Template

Class SmallStringopt

{

Enum {temp = threshold> sizeof (storage)? threshold: sizeof (storage)}}}

PUBLIC:

ENUM {MaxSmallstring = TEMP> SIZEOF (ALIGN)? TEMP: SIZEOF (align)};

Private:

union

{

E buf_ [maxsmallstring 1];

Align align_;

}

... Implement The Storage Policy ...

}

The BUF_ member variable saves the Storage object or the String string itself. But what is Align? Ok, when you want to deal with "Seated Allocation, you must be careful to process the alignment problem. Because there is no simple method to calculate the aligned value, SMALLStringopT accepts a specified type variable as a dummy variable.

How does SmallStringOpt look good, small and different strings? In a small string, the last element of BUF_ (ie, BUF_ [MAXSMALLSTRING]) is stored in size of MaxSmallString and the actual string size, but the magnificent string is a value. For a string equal to MAXSMallString, BUF_ [MAXSMALLSTRING] is equal to 0, just as a string terminator. (Translation: This implementation method is indeed very clever, buf_ [maxsmallstring] This memory unit serves a double task, which is less than equal to MAXSMALLSTRING, indicating that this is a small string, and can indicate the length of the string, equal to 0, and also as a string end When larger than maxsmallstring, represent a long string.) In this class, you will see "Digital Deception", CASTS, and underlying operation (we are discussing optimization, isn't it?), And there is a point of attention : That is we can combine SMallStringOpt and other three storage policies, including SimpleStringStorage, VectorStringStorage, and AllocatorstringStorage. So now we have six kinds of Basic_String storage strategies - our efforts have doubled returns. (Oh, is it very interesting) Now there is a 1440 line code, and the average code performed decreased to 240 lines.

This is an example:

Typedef flex_string <

CHAR,

Std :: char_traits ,

Std :: allocator ,

Smallstringopt >>

> String;

It combines an optimized storage policy that uses a VECTOR-based storage policy and a small string of less than 16 characters.

Back to COW

Whether you like or not, you can't ignore COW - because many people know that mild animals are valuable. For these benefits, let's implement a COWSTRING type template, of course, it can also use any other Storage. Cowstring looks like this:

Template

Class cowstring

{

Struct Data

{

Storage S_;

Unsigned int REFS_;

}

Data * PDATA_;

PUBLIC:

...

}

The DATA structure has a Storage and a reference counter you selected. Cowstring itself has only one pointer to the DATA structure. Multiple cowstrings may point to the same DATA object. Only changes to strings will only be generated in your own copy.

Let's look at this below:

Typedef flex_string <

CHAR,

Std :: char_traits ,

Std :: allocator ,

Smallstringopt

Cowstring >>>

> String;

This String uses a storage policy: When the number of characters is less than 5, it uses a small string optimization method without dynamically requested memory. For longer strings, the COW storage policy is used, while the storage policy of the COW itself uses allocator-based implementations. Cowstring once again doubled the implementation of Flex_String, so I have 12 ways now. The code has a total of 1860 lines, and the average is 155 lines. If you consider the order of use of SmallStringOpt and CowString, there should be 24 kinds. But there is no significance to use SmallStringOpt in COW, so you should always use cowstring in SmallStringOpt instead.

summary

Basic_string is a complex component. Despite this, a careful design storage strategy makes greatly enhance your productivity. With a small number of strategies, you can choose a direct, small string optimized, with Basic_String that has the implementation of the reference counter, just simply assigns several variables to the template class.

reference

[1] Herb Sutter. "Optimizations That Aren't (In A Multithreaded World)," C / C Users Journal, June 1999.

[2] kevlin Henney. "From membinism to method: distinctly Qualified," C / C Uses Journal C Experts Forum, May 2001 (http://www.cuj.com/experts/1905/henney.htm)

[3] Andrei AlexandRescu. Modern C Design (Addison-Wesley, 2001).

[4] Andrei Alexandrescu. "Traits on Steroids," C Report, June 2000, (http://ftp.sj.univali.br/prof/fernando montenegro/artigos/genericprogramingcpp02.htm)

[5] Jack Reeves. "String in The Real World - Part 2," C Report, January 1999, (http://www.bleding-edge.com/publications/c 14.htm).

original

(Http://www.cuj.com/experts/1906/alexandr.htm?topic=experts&topic=experts)

Code download

(Ftp://ftp.cuj.com/pub/2001/1906/alexandr.zip)

About author

Andrei AlexandRescu is the RealNetworks Development Manager. Modern C Design author

转载请注明原文地址:https://www.9cbs.com/read-5522.html

New Post(0)