Effective STL Item 23

zhaozj, 2021-02-11

Item 23: Consider replacing associative containers with sorted vectors

Many STL programmers, when they need a data structure that offers fast lookups, immediately think of the standard associative containers: set, multiset, map, and multimap. That is fine as far as it goes, but it does not go far enough. If lookup speed is really important, it is almost certainly worth considering the nonstandard hashed containers as well (see Item 25). With a suitable hash function, hashed containers can be expected to offer constant-time lookups. (With a poorly chosen hash function or a table that is too small, hash table lookup performance may degrade significantly, but this is relatively uncommon in practice.) For many applications, the expected constant-time lookups of hashed containers are preferable to the guaranteed logarithmic-time lookups of set, map, and their multi companions.
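As a rough illustration of that trade-off, the sketch below puts a tree-based lookup next to a hash-based one. It assumes C++11's std::unordered_set as the hashed container, which postdates the nonstandard hash containers the text refers to. The helper names inTree and inHash are made up for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_set>

// Tree-based lookup: guaranteed O(log n) comparisons.
bool inTree(const std::set<std::string>& s, const std::string& key) {
    return s.find(key) != s.end();
}

// Hash-based lookup: expected O(1), assuming a reasonable hash function
// and a table that is not overloaded.
bool inHash(const std::unordered_set<std::string>& h, const std::string& key) {
    return h.find(key) != h.end();
}
```

Both calls have the same shape at the call site; only the complexity guarantee behind them differs.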

Even if guaranteed logarithmic-time lookup is what you need, the standard associative containers may still not be your best choice. Counterintuitively, it is not uncommon for the standard associative containers to perform worse than the humble vector. If you want to use the STL effectively, you need to understand when and how a vector can offer faster lookups than a standard associative container.

The standard associative containers are typically implemented as balanced binary search trees. A balanced binary search tree is a data structure optimized for a mixed combination of insertions, erasures, and lookups. In other words, it is designed for applications that do some insertions, then some lookups, then perhaps some more insertions, then maybe some erasures, then a few more lookups, then more insertions or erasures, then more lookups, and so on. The key characteristic of this sequence of events is that the insertions, erasures, and lookups are all mixed together. In general, there is no way to predict what the next operation on the tree will be.

In many applications, the use of data structures is not so chaotic. Their use of a data structure falls into three distinct phases:

1. Setup. Create a new data structure by inserting lots of elements into it. During this phase, almost all operations are insertions and erasures. Lookups are rare or nonexistent.

2. Lookup. Consult the data structure to find specific pieces of information. During this phase, almost all operations are lookups. Insertions and erasures are rare or nonexistent.

3. Reorganize. Modify the contents of the data structure, perhaps by erasing all the current data and inserting new data in its place. Behaviorally, this phase is equivalent to phase 1. Once this phase is completed, the application returns to phase 2.

For applications that use their data structures in this way, a vector is likely to offer better performance (in both time and space) than an associative container. But not just any vector will do. It has to be a sorted vector, because only sorted containers work correctly with the lookup algorithms binary_search, lower_bound, equal_range, and so on (see Item 34). But why should a binary search through a (sorted) vector perform better than a binary search through a binary search tree? Because some things are trite but true, and one of them is that size matters. Others are less trite but no less true, and one of those is that locality of reference matters too.

Consider the size issue first. Suppose we need a container to hold Widget objects and, because lookup speed is important to us, we are considering both an associative container of Widgets and a sorted vector<Widget>. If we choose an associative container, we will almost certainly be using a balanced binary tree. Such a tree is made up of tree nodes, each holding not only a Widget but also a pointer to the node's left child, a pointer to its right child, and (typically) a pointer to its parent. This means that the space overhead for storing a Widget in an associative container is at least three pointers.
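To make the per-node overhead concrete, here is a minimal sketch of what such a tree node might look like. The 12-byte Widget and the TreeNode layout are assumptions for illustration only; real library nodes differ in detail and usually carry balancing data as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 12-byte payload.
struct Widget {
    std::int32_t a, b, c;
};

// Simplified sketch of a balanced-tree node. Real implementations
// usually add balancing data (for example a red-black color flag).
struct TreeNode {
    Widget value;
    TreeNode* left;
    TreeNode* right;
    TreeNode* parent;
};

// Per-element overhead of the node layout, in bytes (padding included).
constexpr std::size_t nodeOverhead() {
    return sizeof(TreeNode) - sizeof(Widget);
}

static_assert(sizeof(Widget) == 12, "payload is 12 bytes");
static_assert(sizeof(TreeNode) >= sizeof(Widget) + 3 * sizeof(void*),
              "a node costs at least three pointers beyond the Widget");
```

A vector<Widget>, by comparison, stores sizeof(Widget) bytes per element and nothing else.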

In contrast, there is no per-element overhead when we store a Widget in a vector: we simply store a Widget. The vector itself has overhead, of course, and there may be empty (reserved) space at the end of the vector (see Item 14), but the per-vector overhead is typically insignificant (usually three machine words, such as three pointers or two pointers and an int), and the empty space at the end can be trimmed off with "the swap trick" if necessary (see Item 17). Even if the extra space is not eliminated, it does not affect the analysis below, because that memory is never referenced during a lookup.

Assuming our data structures are big enough, they will be split across multiple memory pages, but the vector will require fewer pages than the associative container. That is because the vector needs no per-Widget overhead, while the associative container adds three pointers to each Widget. To see why this matters, suppose that on your system a Widget is 12 bytes, a pointer is 4 bytes, and a memory page holds 4096 (4K) bytes. Ignoring the per-container overhead, you can fit 341 Widgets on a page when they are stored in a vector, but at most 170 when they are stored in an associative container. The associative container therefore uses about twice as much memory as the vector. If you are working in an environment with virtual memory, it is easy to see how that can translate into many more page faults, and therefore a system that is significantly slower for large sets of data.
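The page arithmetic can be checked directly. The sketch below uses the figures from the text (12-byte Widget, 4-byte pointers, 4096-byte pages); the constant and function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Figures from the text: 12-byte Widget, 4-byte pointers, 4096-byte pages.
constexpr std::size_t kPageBytes = 4096;
constexpr std::size_t kWidgetBytes = 12;
constexpr std::size_t kPointerBytes = 4;

// Whole elements that fit on one page when each element carries
// `overhead` extra bytes.
constexpr std::size_t perPage(std::size_t overhead) {
    return kPageBytes / (kWidgetBytes + overhead);
}

constexpr std::size_t kVectorPerPage = perPage(0);                  // 4096 / 12
constexpr std::size_t kTreePerPage = perPage(3 * kPointerBytes);    // 4096 / 24
```

With these inputs kVectorPerPage is 341 and kTreePerPage is 170, which is where the "about twice as much memory" estimate comes from.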

In fact, I am being optimistic about the associative containers here, because I am assuming that the nodes in the binary tree are clustered together on a relatively small set of memory pages. Most STL implementations use custom memory managers (implemented on top of the containers' allocators; see Items 10 and 11) to achieve such clustering, but if your STL implementation takes no steps to improve locality of reference among tree nodes, the nodes could end up scattered all over your address space. That would lead to even more page faults. Even with a clustering memory manager, associative containers tend to suffer more page faults because, unlike contiguous-memory containers such as vector, node-based containers find it harder to guarantee that elements close to one another in the container's traversal order are also close to one another in physical memory. Yet that is exactly the memory organization that minimizes page faults during a binary search.

Bottom line: storing data in a sorted vector is likely to consume less memory than storing the same data in a standard associative container, and once page faults are taken into account, a binary search of a sorted vector is likely to be faster than a lookup in a standard associative container.

Of course, the big drawback of a sorted vector is that it must remain sorted! When a new element is inserted, everything beyond the new element must be moved up by one. That is as expensive as it sounds, and it gets even more expensive if the vector has to reallocate its underlying memory (see Item 14), because then all the elements in the vector typically have to be copied. Similarly, if an element is removed from the vector, all the elements beyond it must be moved down. Insertions and erasures are expensive for vectors but cheap for associative containers. That is why it makes sense to use a sorted vector instead of an associative container only when you know that your data structure is used in such a way that lookups are almost never mixed with insertions and erasures.
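The cost described above shows up in any helper that keeps a vector sorted one element at a time. This is a minimal sketch using int elements; insertSorted and eraseSorted are hypothetical names, not part of the STL.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insert `value` into an already sorted vector, keeping it sorted.
// Finding the position takes O(log n) comparisons, but the insert shifts
// every later element up by one, so the call as a whole is O(n).
void insertSorted(std::vector<int>& v, int value) {
    assert(std::is_sorted(v.begin(), v.end()));
    v.insert(std::upper_bound(v.begin(), v.end(), value), value);
}

// Erase one element equivalent to `value`, if present.
// Every later element shifts down by one.
bool eraseSorted(std::vector<int>& v, int value) {
    std::vector<int>::iterator it = std::lower_bound(v.begin(), v.end(), value);
    if (it == v.end() || value < *it) return false;
    v.erase(it);
    return true;
}
```

Calling such helpers repeatedly in the middle of a lookup-heavy phase is exactly the usage pattern for which a tree-based container is the better fit.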

This Item has had a lot of text but regrettably few examples, so let's look at a code skeleton that uses a sorted vector instead of a set:

vector<Widget> vw;                           // alternative to set<Widget>

...                                          // Setup phase: lots of insertions,
                                             // few lookups

sort(vw.begin(), vw.end());                  // end of Setup phase. (When
                                             // simulating a multiset, you
                                             // might prefer stable_sort
                                             // instead; see Item 31.)

Widget w;                                    // object holding the value to look up

...                                          // start Lookup phase

if (binary_search(vw.begin(), vw.end(), w)) ...   // lookup via binary_search

vector<Widget>::iterator i =
    lower_bound(vw.begin(), vw.end(), w);    // lookup via lower_bound

if (i != vw.end() && !(w < *i)) ...          // lower_bound already guarantees
                                             // !(*i < w), so "!(w < *i)"
                                             // completes the equivalence test

pair<vector<Widget>::iterator,
     vector<Widget>::iterator> range =
    equal_range(vw.begin(), vw.end(), w);    // lookup via equal_range

if (range.first != range.second) ...

...                                          // end Lookup phase, start
                                             // Reorganize phase

sort(vw.begin(), vw.end());                  // begin new Lookup phase...

As you can see, it is all quite straightforward. The hardest part is choosing among the search algorithms (binary_search, lower_bound, and so on), and Item 45 can help you make that choice.
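For readers who want something that compiles, the skeleton can be fleshed out roughly as follows. The Widget type with a single id field, its operator<, and the helper names buildSorted, contains, and countOf are all assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical Widget, ordered by a single id field.
struct Widget {
    int id;
};
bool operator<(const Widget& a, const Widget& b) { return a.id < b.id; }

typedef std::vector<Widget>::const_iterator WidgetIter;

// Setup phase: bulk-load the elements, then sort once.
std::vector<Widget> buildSorted(const std::vector<int>& ids) {
    std::vector<Widget> vw;
    vw.reserve(ids.size());
    for (std::size_t i = 0; i < ids.size(); ++i) vw.push_back(Widget{ids[i]});
    std::sort(vw.begin(), vw.end());
    return vw;
}

// Lookup phase: is there an element equivalent to w?
bool contains(const std::vector<Widget>& vw, const Widget& w) {
    return std::binary_search(vw.begin(), vw.end(), w);
}

// Lookup phase: how many elements are equivalent to w (multiset-style)?
std::size_t countOf(const std::vector<Widget>& vw, const Widget& w) {
    std::pair<WidgetIter, WidgetIter> range =
        std::equal_range(vw.begin(), vw.end(), w);
    return static_cast<std::size_t>(range.second - range.first);
}
```

A Reorganize phase would modify the vector freely and then call std::sort again before the next round of lookups.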

Things get more interesting when you decide to replace a map or multimap with a vector, because the vector must hold pair objects. After all, that is what map and multimap hold. Note, however, that if you declare an object of type map<K, V> (or its multimap equivalent), the type of the elements stored in the map is pair<const K, V>. To emulate a map or multimap with a vector, you must drop the const, because when you sort the vector, the values of its elements are moved around by assignment, which means that both components of the pair must be assignable. When using a vector to emulate a map<K, V>, then, the type stored in the vector is pair<K, V>, not pair<const K, V>.
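The element types involved can be checked at compile time. The sketch below assumes std::string keys and int values; the typedef names and sortElements are hypothetical. It sorts with pair's default operator< purely to show that the non-const pair can be rearranged, not as the ordering a map emulation should use.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// A map's element type carries a const key...
typedef std::map<std::string, int>::value_type MapElement;
static_assert(std::is_same<MapElement, std::pair<const std::string, int> >::value,
              "map<K, V> stores pair<const K, V>");

// ...so the vector stand-in drops the const to keep elements assignable.
typedef std::pair<std::string, int> VecElement;
static_assert(std::is_copy_assignable<VecElement>::value,
              "pair<K, V> can be assigned, so it can be sorted");

// Sorting rearranges elements by assignment or move, which VecElement allows.
void sortElements(std::vector<VecElement>& v) {
    std::sort(v.begin(), v.end());
}
```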

map and multimap keep their elements in sorted order, but they look only at the key part of each element (the first component of the pair) when sorting, so you must do the same when sorting the vector. You need to write a custom comparison function for your pairs, because pair's operator< looks at both components of the pair.
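The difference between the two orderings is easy to demonstrate. In this sketch, lessByKey is a hypothetical key-only comparison and the two equivalence helpers are invented for illustration. pair's built-in operator< compares first and, when the first components are equal, falls back to second.

```cpp
#include <cassert>
#include <string>
#include <utility>

typedef std::pair<std::string, int> Data;

// Key-only ordering, the way map and multimap order their elements.
bool lessByKey(const Data& lhs, const Data& rhs) {
    return lhs.first < rhs.first;
}

// Two entries are equivalent under an ordering when neither precedes
// the other.
bool equivalentByKey(const Data& a, const Data& b) {
    return !lessByKey(a, b) && !lessByKey(b, a);
}
bool equivalentByPairLess(const Data& a, const Data& b) {
    return !(a < b) && !(b < a);
}
```

Two entries with the same key but different values are equivalent under lessByKey, as they would be in a map, yet pair's operator< still ranks one before the other.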

Interestingly, you will need a second comparison function for lookups. The comparison function used for sorting takes two pair objects, but lookups are performed given only a key value. The comparison function for lookups must therefore take an object of the key type (the value being searched for) and a pair (one of the pairs stored in the vector): two different types. As an additional twist, you cannot know whether the key or the pair will be passed as the first argument, so you really need two comparison functions for lookups: one where the key is passed first and one where the pair is passed first. This example shows how to put all the pieces together:

typedef pair<string, int> Data;                    // type held in the "map"
                                                   // in this example

class DataCompare {                                // class for comparison functions
public:
    bool operator()(const Data& lhs,               // comparison function
                    const Data& rhs) const         // for sorting
    {
        return keyLess(lhs.first, rhs.first);      // keyLess is below
    }

    bool operator()(const Data& lhs,               // comparison function
                    const Data::first_type& k) const   // for lookups (form 1)
    {
        return keyLess(lhs.first, k);
    }

    bool operator()(const Data::first_type& k,     // comparison function
                    const Data& rhs) const         // for lookups (form 2)
    {
        return keyLess(k, rhs.first);
    }

private:
    bool keyLess(const Data::first_type& k1,       // the "real"
                 const Data::first_type& k2) const // comparison function
    {
        return k1 < k2;
    }
};

In this example, we assume that the sorted vector will emulate a map<string, int>. The code is almost a literal translation of the discussion above, except for the presence of the member function keyLess. That function exists to ensure consistency among the various operator() functions. Each of them simply compares two key values, so rather than duplicate the logic, we put the test in keyLess and have the operator() functions return whatever keyLess returns. This admirable act of software engineering improves the maintainability of DataCompare, but it has one minor drawback: providing operator() functions with different parameter types makes the resulting function objects unadaptable (see Item 40). Oh well.

Using a sorted vector as a map is essentially the same as using it as a set. The only major difference is that you must use DataCompare objects as the comparison functions:

vector<Data> vd;                                   // alternative to map<string, int>

...                                                // Setup phase: lots of insertions,
                                                   // few lookups

sort(vd.begin(), vd.end(), DataCompare());         // end of Setup phase. (When
                                                   // simulating a multimap, you
                                                   // might prefer stable_sort
                                                   // instead; see Item 31.)

string s;                                          // object holding the value to look up

...                                                // start Lookup phase

if (binary_search(vd.begin(), vd.end(), s,
                  DataCompare())) ...              // lookup via binary_search

vector<Data>::iterator i =
    lower_bound(vd.begin(), vd.end(), s,
                DataCompare());                    // lookup via lower_bound

if (i != vd.end() && !(s < i->first)) ...          // lower_bound already guarantees
                                                   // !(i->first < s), so
                                                   // "!(s < i->first)" completes
                                                   // the equivalence test

pair<vector<Data>::iterator,
     vector<Data>::iterator> range =
    equal_range(vd.begin(), vd.end(), s,
                DataCompare());                    // lookup via equal_range

if (range.first != range.second) ...

...                                                // end Lookup phase, start
                                                   // Reorganize phase

sort(vd.begin(), vd.end(), DataCompare());         // begin new Lookup phase...

As you can see, once you have written DataCompare, things pretty much fall into place. And once they are in place, they will generally run faster and use less memory than the corresponding design using a real map, as long as your program uses the data structure in the phased manner described earlier. If your program does not operate on the data structure in phases, using a sorted vector instead of a standard associative container will almost certainly end up wasting time.
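A compilable version of the map emulation might look like the sketch below. It restates DataCompare in compact form, assumes a string key and int value, and adds a hypothetical helper, findValue, that is not part of the original skeleton.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Data;

// Same shape as DataCompare above: one overload for sorting, two for
// lookups, all funneled through keyLess.
class DataCompare {
public:
    bool operator()(const Data& lhs, const Data& rhs) const {
        return keyLess(lhs.first, rhs.first);
    }
    bool operator()(const Data& lhs, const Data::first_type& k) const {
        return keyLess(lhs.first, k);
    }
    bool operator()(const Data::first_type& k, const Data& rhs) const {
        return keyLess(k, rhs.first);
    }
private:
    bool keyLess(const Data::first_type& k1, const Data::first_type& k2) const {
        return k1 < k2;
    }
};

// Hypothetical helper: returns a pointer to the mapped value, or null
// when the key is absent. Assumes vd is already sorted with DataCompare.
const int* findValue(const std::vector<Data>& vd, const std::string& key) {
    std::vector<Data>::const_iterator i =
        std::lower_bound(vd.begin(), vd.end(), key, DataCompare());
    if (i != vd.end() && !DataCompare()(key, *i)) return &i->second;
    return nullptr;
}
```

The pointer returned by findValue stays valid only until the vector is next modified, so it belongs to the Lookup phase and should not be held across a Reorganize phase.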

When reposting, please credit the original address: https://www.9cbs.com/read-4780.html
