Effective Standard C++ Library: Are Set Iterators Mutable or Immutable? Portability Issues in Using the Standard Library
Klaus Kreft and Angelika Langer
http://www.cuj.com/experts/1810/kreft.htm?topic=experts
In this column, we look at how implementations of the standard library differ from one another and how much those differences affect the portability of programs. Summary:
- The set container in the standard C++ library is implemented as a binary tree, which means that the value of an element determines its position in the tree. Like all standard library containers, set gives access to its elements through iterators. An iterator is designed to provide read/write access to the element it refers to when it is dereferenced. In other words, we can modify elements through a container's iterators.
- For a set, modifying an element can corrupt the internal tree structure [Note 1]. Set iterators that provide write access (so-called mutable iterators) are therefore considered dangerous. Worse, passing such an iterator to certain generic algorithms silently corrupts the set container.
- Some library implementations try to solve the problem by not providing mutable iterators for set at all. This is safe, but it severely restricts the usability of set. For that reason, other implementations decide to trust the user and do provide mutable iterators. The practical result is that set and its iterators are implemented differently from library to library, so we must watch out for portability problems.
In this column, we examine the different implementations of set, see which restrictions each introduces, and explore the resulting portability problems and how to solve them.
Internal tree structure
The set container in the standard C++ library is organized as a binary tree because the standard imposes complexity requirements on all generic algorithms and container operations. For set, it requires that access to an element take logarithmic time. To meet this requirement, an implementation of set has to be built on a binary tree structure. (Details on binary trees can be found in any textbook on data structures and algorithms [Note 2].)
The position of an element in the binary tree is determined by the sort order. This shows in the interface of set: it requires a comparator that operates on the element type. Consider an example. Suppose we have a bank account class Account with an account number and a balance. Our program keeps all accounts in a set, and the sorting criterion is the account number. The set uses less-than as its comparator, and we define a matching operator<() that compares account objects by account number.
    class Account {
      ...
      size_t _number;   // determines ordering
      double _balance;  // irrelevant for ordering
    };
    bool operator<(const Account& lhs,
                   const Account& rhs)
    { return lhs._number < rhs._number; }
    set<Account> s;

The set uses this ordering to decide where each element belongs in its internal binary tree. Whenever an element is added to the set, it is automatically placed at the correct position so that the elements stay sorted. When an element is looked up, the search algorithm can traverse the tree efficiently by following that order. Because every set operation relies on a correctly ordered tree, the tree must remain intact at all times. Naturally, the tree is hidden behind the set's interface, and as long as we modify the set only through its member functions, we cannot corrupt the tree. The standard library, however, also has iterators.

Set iterators
Like all containers in the standard library, set provides access to its elements through iterators. An iterator is a pointer-like object that can be dereferenced (via operator*()) to access the element it refers to. There are two kinds of iterator: those that provide read/write access (mutable iterators) and those that provide read-only access (immutable iterators). In the standard library, the mutable iterator type is container::iterator and the immutable iterator type is container::const_iterator. All containers, including set, must define both types.
If we hold a mutable iterator into a set and dereference it, we gain write access to the element and can change its contents. Such a change can be disastrous. Consider what we are doing: the iterator refers to a node of the binary tree. The element's position in the tree is currently correct and reflects its place in the sort order. If we change the element in a way that affects the sort order (for example, by modifying the data member that the comparator uses), the element belongs at a different position. The tree would have to be reorganized to reflect the new order, but that does not happen.
We have silently changed the element without changing its position in the binary tree. The result is a corrupted tree, and the behavior of any operation on a corrupted tree is completely unpredictable. Clearly, it makes no sense to modify elements this way through a set's mutable iterator. The obvious rule is:

Rule 1: Do not modify elements in a set in a way that breaks the sort order. This rule applies to every modification made through an iterator, pointer, or reference to an element in the container.

Reasonable as this rule seems, it is harder to follow than one might think.

Replacing elements of a set
Let us return to the earlier example to see how easy it is to follow this rule. We maintain bank accounts sorted by account number. A customer wants to switch to another account and asks that the new account replace the old one. If we have an iterator to the old account object, we could implement this as:

    set<Account> s;
    set<Account>::iterator iter;
    ...
    *iter = *new Account;  // direct modification of the element

This clearly violates Rule 1. The new account may carry the same data as the old one (name, balance, and so on), but it most likely has a different account number. Overwriting the element stored in the tree also overwrites the data used by the sorting criterion and thereby corrupts the tree. To replace an element of a set, we should use the member functions insert() and erase() rather than assigning through a set iterator. The correct way is:

    set<Account> s;
    set<Account>::iterator iter;
    ...
    s.insert(iter, *new Account);
    s.erase(iter);

The insert() member function places the new element at the correct position in the tree and so preserves the tree's integrity. This yields another rule:

Rule 2: Do not replace elements through set iterators, or through pointers or references to elements. Use the set's insert() and erase() instead.

Set containers and generic algorithms
That was a blatant violation of Rule 1. Often, however, a violation is much less obvious.
What about the following program? In our banking program, closed accounts are not removed from the set immediately; they stay there for a while until a garbage collector removes them. Obsolete accounts can be recognized by a balance of 0. They can be removed in one go with the remove_if() generic algorithm. All we need is a predicate that tests whether the balance is 0:

    bool obsolete(const Account& acc)
    { return acc.balance() == 0; }

Then we apply the remove_if() generic algorithm, and we are done:

    set<Account> s;
    ...
    // remove elements whose balance is 0
    s.erase(remove_if(s.begin(), s.end(), obsolete), s.end());

At first glance this looks fine, but it corrupts the set container.

Misnomer: remove() does not remove
In this example it is not obvious why Rule 1 is violated. In what way did we affect the sort order? From the description of remove_if() one might conclude that it removes certain elements from a sequence and yields another sequence. In fact, the standard describes the remove_if() generic algorithm like this:

    template <class ForwardIterator, class Predicate>
    ForwardIterator remove_if(ForwardIterator first,
                              ForwardIterator last,
                              Predicate pred);

    Requires: Type T is EqualityComparable.
    Effects: Eliminates all elements referred to by an iterator i in the range [first, last) for which pred(*i) != false holds.
    Returns: The end of the resulting range.
    Notes: Stable: the relative order of the elements that are not removed is the same as their relative order in the original range.
    Complexity: Exactly last - first applications of the predicate.

So why does this removal corrupt the tree structure? The answer is that every mutating algorithm can potentially corrupt the tree structure of a set container. (Translator's note by WQ: see p. 386 of the Chinese edition of "The C++ Standard Library", which uses the terms "modifying algorithm" and "mutating algorithm".)
This is because the generic algorithms in the standard library access the elements of a container through iterators, and a mutating algorithm performs its modifications through those iterators. remove_if() is such a mutating algorithm. The trouble is that the removing algorithms (remove(), remove_if(), and their variants) are often misunderstood. Their names are misleading: a removing algorithm does not remove anything. No element is actually removed from the sequence [Note 3]. Instead, all valid elements (the accounts that are not obsolete, in our example) are copied toward the front of the sequence, leaving garbage at the end. remove_if() returns an iterator to the start of that trailing garbage, and we must call the set's erase() member function ourselves to get rid of the invalid elements. Figure 1 (provided on the CUJ site at http://www.cuj.com/experts/1810/kreft_x.htm?topic=experts#x1; the link no longer works) illustrates what remove_if() does.

Inside remove_if(), the copying of valid elements to the front of the sequence is done through iterators to the elements of the sequence. The algorithm does exactly what Rule 2 warns against: it assigns one element to another by dereferencing an iterator. In any implementation of a removing algorithm we will find a statement like this:

    *iter1 = *iter2;

In our example, iter1 and iter2 are set iterators, so this assignment breaks the sort order. That leads to another rule for sets:

Rule 3: Do not apply mutating algorithms to a set.

Here, "mutating algorithm" means an algorithm that modifies elements through iterators (or references) into the container rather than through the operations (member functions) that the container provides. All mutating algorithms in the standard library (such as copy(), swap(), replace(), remove(), and reverse()) fall into this category.
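
Rule 3 leaves open how the obsolete accounts should be removed instead. One common alternative, which is not taken from the article, is to walk the set and call its own erase() member function for each obsolete element. The sketch below assumes a simplified, hypothetical Account struct with public members number and balance; erase_obsolete is likewise a name invented for illustration.

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for the article's Account class.
struct Account {
    std::size_t number;   // determines ordering
    double balance;       // irrelevant for ordering
};

bool operator<(const Account& lhs, const Account& rhs) {
    return lhs.number < rhs.number;
}

bool obsolete(const Account& acc) { return acc.balance == 0; }

// Remove every obsolete account using only the set's member functions.
// The iterator is advanced before the node is erased; erasing from a set
// invalidates only iterators to the erased element.
void erase_obsolete(std::set<Account>& s) {
    std::set<Account>::iterator it = s.begin();
    while (it != s.end()) {
        if (obsolete(*it)) {
            s.erase(it++);   // erase the old position, keep the next one
        } else {
            ++it;
        }
    }
}
```

Because no element is ever assigned through an iterator, the tree is reorganized only by the set itself, and the sketch does not depend on whether the library's set iterators are mutable.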
Dilemma
By now it should be clear that the mutable iterators of set are a trap, because they make it easy to corrupt the set container. So why do they exist? As we will see later, it makes perfect sense for a container to have mutable iterators, because we do not merely inspect the elements stored in a container; sometimes we want to modify them. In addition, the standard requires every container to have both a mutable and an immutable iterator type. This is inherent in the container concept. Throwing away set's mutable iterators would challenge that concept, which is central to the design of containers and generic algorithms in the standard library. As a result, library implementers face a dilemma, and the standard does not say how to resolve it.

Some library implementations decide to trust the user. With these implementations (let us call them relaxed), keeping the tree intact is our responsibility. As shown above, that is often harder than we think.

Other implementers want to reduce the danger and decide not to provide mutable iterators for set. In those implementations (let us call them safe), the type set<T>::iterator is a typedef for set<T>::const_iterator. The effect is that elements stored in a set cannot be modified through an iterator. This is clearly safer, because it at least removes the possibility of corrupting the tree through iterators. We can still corrupt the set through pointers and references to contained elements, but not providing mutable iterators is certainly progress.

This comes at a cost, however, in the form of restrictions. Modifying an element does not necessarily affect the sort order. What if we change only a part of the element that the ordering does not depend on? That should be a harmless modification. Unfortunately, if set has no mutable iterators, we cannot make even such harmless modifications through an iterator.
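
Whether a given library is relaxed or safe can be checked at compile time. The following sketch assumes a C++11 compiler for <type_traits>; the trait name is_immutable_iterator is hypothetical and not part of any library. It reports whether dereferencing an iterator yields a reference to const.

```cpp
#include <iterator>
#include <set>
#include <type_traits>
#include <vector>

// True when dereferencing Iter yields a reference to const,
// that is, when the iterator gives read-only access to its element.
template <class Iter>
struct is_immutable_iterator
    : std::is_const<typename std::remove_reference<
          typename std::iterator_traits<Iter>::reference>::type> {};

// const_iterator is read-only on every implementation.
static_assert(is_immutable_iterator<std::set<int>::const_iterator>::value,
              "set<int>::const_iterator must be immutable");

// A vector's plain iterator is mutable, for contrast.
static_assert(!is_immutable_iterator<std::vector<int>::iterator>::value,
              "vector<int>::iterator is mutable");

// The question the article raises: which kind is set<T>::iterator here?
bool set_iterator_is_immutable() {
    return is_immutable_iterator<std::set<int>::iterator>::value;
}
```

On a safe implementation set_iterator_is_immutable() returns true, and on a relaxed one it returns false. Later revisions of the standard (C++11 onward) require set iterators to be constant iterators, so current libraries fall into the safe category.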
Harmless modifications of elements in a set
Let us return to our example. We use a set of accounts sorted by account number.

    class Account {
      ...
      size_t _number;   // determines ordering
      double _balance;  // irrelevant for ordering
    };
    bool operator<(const Account& lhs, const Account& rhs)
    { return lhs._number < rhs._number; }
    set<Account> s;

In this case the balance has nothing to do with the ordering, so modifying the balance of an account object in the set should be safe.

    set<Account> s;
    set<Account>::iterator iter;
    ...
    // direct modification of part of the element
    iter->balance = 1000000;

With a relaxed set implementation this works; with a safe implementation the compiler complains about a constness problem. The program is undoubtedly meaningful, but it relies on the assumption that set iterators are mutable, and that assumption may not hold. So we have a problem, and it is mainly a portability problem [Note 4]: what works with one standard library implementation does not work with another. How do we solve it?

The brute-force approach
A constness problem can be solved with a cast, right? Instead of

    iter->balance = 1000000;

we can write

    *(const_cast<double*>(&iter->balance)) = 1000000;

Note that const_cast cannot be applied to an object, only to pointers and references. Another option would be to define a const member function in the Account class that modifies the balance. We hope you agree, though, that casts and fake const member functions are bad programming style and should be avoided wherever possible. Let us try to do better.

Iterator adapter
Although we cannot get rid of the const_cast, we can at least go to the heart of the problem and solve it where it arises. It is the set's iterator that gets us into trouble. Why not define a new iterator type that gives us access to the balance part of an account without exposing the account number? The idea is an iterator adapter that wraps the set iterator, permits harmless modifications, and prohibits modification of the part that is relevant to sorting.
Instead of accessing the element through the set iterator

    set<Account> s;
    set<Account>::iterator iter;
    ...
    iter->balance = 1000000;

we access it through the adapted iterator, which we call BalanceIter:

    set<Account> s;
    set<Account>::iterator iter;
    ...
    *BalanceIter(iter) = 1000000;

In practice, C++ programmers rarely implement their own iterator types because they believe it is too complicated, but it is actually quite easy and very useful. Here is a sketch of the implementation:

    class BalanceIter {
    private:
      set<Account>::iterator _i;
    public:
      explicit BalanceIter(set<Account>::iterator i)
        : _i(i) {}
      BalanceIter& operator++()
      { ++_i; return *this; }
      // ... postfix ++, pre- and postfix -- ...
      double& operator*() const
      { return *const_cast<double*>(&_i->balance); }
    };

The key points are:
- The iterator adapter stores the original iterator (the adaptee) as a data member.
- The adaptee is supplied at construction; that is, the constructor takes the original iterator as an argument.
- All the typical iterator operations must be provided, such as operator++, operator--, operator==, and so on. They are implemented by delegating to the adaptee.
- The only interesting operation is the dereference operation. It must provide write access to the balance part of the account object. Its signature differs from that of the adaptee's dereference operation because it returns a reference to the balance part rather than a reference to the whole account object.

When implementing an iterator type there are a few more details to remember, such as providing a base() member function to access the adaptee and providing the nested types that iterators need (see Listing 1, or [Note 5] for further reading, or consult your favorite book on the standard library to learn how to implement iterator adapters).

Evaluation
We still have to use an ugly const_cast somewhere, but it is now hidden in the dereference operation of the iterator adapter. The Account class does not have to be modified, so there is no need to violate const-correctness there.
We have solved the problem properly, without opening additional safety holes. One safety hole remains: generic algorithms can be applied to the adapted iterator, but that is already covered by Rule 3 (do not apply mutating algorithms to a set). In addition, the solution is portable. Even with a relaxed set implementation, the iterator adapter causes no problems. Because all of its operations are inline functions, it adds no overhead. To avoid portability problems, here is one more recommendation:

Rule 4: Do not modify elements through set iterators (of type set<T>::iterator). Use an iterator adapter for modifications to the parts of an element that do not affect sorting.

Summary
The C++ standard does not specify whether set iterators (of type set<T>::iterator) are mutable, and implementations differ. The rules we derived are:
- Rule 1: Do not modify elements in a set in a way that breaks the sort order.
- Rule 2: Do not replace elements through set iterators, or through pointers or references to elements. Use the set's insert() and erase() instead.
- Rule 3: Do not apply to a set any generic algorithm that modifies elements through iterators, pointers, or references to elements. This includes all mutating algorithms in the standard library.
- Rule 4: Do not modify elements through set iterators. Use an iterator adapter for changes that do not break the sort order.
Rules 1 through 3 always hold, regardless of the particular standard library implementation. Rule 4 concerns a portability problem that arises because set and its iterator types are implemented differently in practice.
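
As an illustration of Rule 2, the following sketch replaces an element using only the set's member functions. It assumes a simplified, hypothetical Account struct with public members, and replace_account is a name invented here. Note one deliberate difference from the article's snippet: the article inserts first and erases second, whereas this sketch erases first, so that a replacement carrying the same account number as the old element is not rejected by insert() and then lost.

```cpp
#include <cstddef>
#include <set>

// Hypothetical stand-in for the article's Account class.
struct Account {
    std::size_t number;   // determines ordering
    double balance;       // irrelevant for ordering
};

bool operator<(const Account& lhs, const Account& rhs) {
    return lhs.number < rhs.number;
}

// Replace the element at pos (which must be dereferenceable) with
// replacement. Returns an iterator to the element stored under the
// replacement's account number.
std::set<Account>::iterator
replace_account(std::set<Account>& s,
                std::set<Account>::iterator pos,
                const Account& replacement) {
    std::set<Account>::iterator hint = pos;
    ++hint;                              // successor survives the erase
    s.erase(pos);                        // the set reorganizes its own tree
    return s.insert(hint, replacement);  // hint is only a performance aid
}
```

If another element already uses the replacement's account number, insert() leaves that element in place and returns an iterator to it, so a caller that needs to detect this case would have to compare the result with the replacement.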
Listing 1: The BalanceIterator adapter

    class BalanceIterator {
    public:
      typedef set<Account>::iterator adapted_type;
      typedef adapted_type::iterator_category iterator_category;
      typedef adapted_type::value_type value_type;
      typedef adapted_type::difference_type difference_type;
      typedef double* pointer;
      typedef double& reference;

      BalanceIterator() {}
      explicit BalanceIterator(adapted_type i) : adaptee(i) {}
      // template ... (declaration truncated in the source)
      adapted_type base() const { return adaptee; }

      // The only place where constness is cast away.
      reference operator*() const
      { return const_cast<reference>(adaptee->balance); }
      pointer operator->() const
      { return &(operator*()); }

      BalanceIterator& operator++()
      { ++adaptee; return (*this); }
      BalanceIterator operator++(int)
      { BalanceIterator _tmp = *this; ++adaptee; return (_tmp); }
      BalanceIterator& operator--()
      { --adaptee; return (*this); }
      BalanceIterator operator--(int)
      { BalanceIterator _tmp = *this; --adaptee; return (_tmp); }

    private:
      adapted_type adaptee;
    };

    inline bool operator==(const BalanceIterator& x,
                           const BalanceIterator& y)
    { return x.base() == y.base(); }

    inline bool operator!=(const BalanceIterator& x,
                           const BalanceIterator& y)
    { return x.base() != y.base(); }

Notes and references
[1] Herb Sutter. "Standard Library News, Part 2: Sets and Maps," C++ Report (October 1999). This article gives background information on sets and maps; Sutter explains why keys in associative containers like set must not be modified.
[2] Cormen, Leiserson, and Rivest. Introduction to Algorithms (MIT Press, 1990).
[3] Matt Austern. "Algorithms and Containers," C++ Report (July/August 2000). This article also points out the problem of applying the remove algorithms to associative containers and suggests a solution using container-based generic algorithms and container traits, which are not part of the standard.
[4] How can one have a portability problem with the standard library? After all, the purpose of a standard is that it defines a portable platform. True, it's just that in this case we are talking about an open issue in the C++ standard: the implementation of the set iterators is still an open issue (#103) on the standards committee's issue list. The problem ...
[5] Klaus Kreft and Angelika Langer. "Iterators in the Standard C++ Library," C++ Report (November/December 1996). The article is also available at http://www.langer.camelot.de/Articles/IteratorsInStdlib/cppr9612_kreft .html.