CUJ: Standard Library: BitSet and Bit Vector


The Standard Librarian: Bitsets and Bit Vectors

Matt Austern

Http://www.cuj.com/experts/1905/AUSTERN.HTM?topic=Experts

In C++, you can twiddle bits to your heart's content, and you don't even need macros.

-------------------------------------------------- ----------------------------

Programmers who have worked in C are familiar with a common way of handling Boolean options: pack a set of options into a single word, using one bit for each option. For example, to set the access permissions of a UNIX file, you might write:

    chmod("my_file",
          S_IWUSR | S_IRUSR |
          S_IRGRP | S_IROTH);

Each constant corresponds to one bit; you can specify several options at once by combining them with a bitwise-or operation.

Packing multiple options into a word is very common. The trick is used throughout the UNIX and Win32 APIs, in the ios_base formatting flags of the C++ standard library, and in one form or another in most large programs. Sets of bits are important.

It is easy to see why this technique is so common: the alternative, an array or structure with a separate field for each option, is clumsy and wastes memory. Sometimes, however, the technique causes trouble. First, some operations are awkward: setting a named bit is straightforward (flags |= S_IRGRP), but clearing one (flags &= ~S_IWGRP) is rather ugly. You can test whether a bit is set by masking it: if (flags & S_IWUSR). A more "explicit" test, however, is wrong: if ((flags & S_IWUSR) == true), or worse, if (flags & S_IWUSR == true). Working with numbered bits rather than named ones is just as clumsy: you need an expression like flags &= ~(1 << n), usually with a cast thrown in. Finally, the technique does not scale to a large number of options: you cannot have more options than there are bits in a long.
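The pitfalls described above are easy to reproduce. The following is a minimal sketch, not code from any real API: the OPT_* constants and the helper names are hypothetical, chosen only for illustration. It shows the correct masking test next to the "explicit" comparison against true, which succeeds only when the mask happens to be bit 0.

```cpp
#include <cassert>
#include <climits>

// Hypothetical option flags, one bit each (names are illustrative only).
const unsigned OPT_READ  = 1u << 0;
const unsigned OPT_WRITE = 1u << 1;
const unsigned OPT_EXEC  = 1u << 2;

inline unsigned set_bits(unsigned flags, unsigned mask)   { return flags | mask; }
inline unsigned clear_bits(unsigned flags, unsigned mask) { return flags & ~mask; }

// Correct test: any nonzero result means the bit is set.
inline bool test_bits(unsigned flags, unsigned mask) {
    return (flags & mask) != 0;
}

// The "explicit" test from the text. true converts to 1, so the
// comparison only succeeds when the masked value is exactly 1.
inline bool broken_test(unsigned flags, unsigned mask) {
    return (flags & mask) == true;
}

// Clear bit number n, counting from the least significant bit.
inline unsigned clear_nth(unsigned flags, unsigned n) {
    assert(n < sizeof(unsigned) * CHAR_BIT);
    return flags & ~(1u << n);
}
```

With OPT_WRITE | OPT_EXEC as the flags, test_bits reports that OPT_WRITE is set while broken_test reports that it is not. The unparenthesized variant in the text is worse still, because == binds more tightly than &, so the mask is compared against true before the AND is applied.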

Because sets of bits are important, the C++ standard library provides explicit support for them; in fact, it provides several kinds of support. Sometimes you will still want low-level bit manipulation (and you will have to use it if you are interacting with a C API), but in most cases the C++ library versions are more suitable. They have a few minor problems, but most of those are easy to work around.

Bitset

The class std::bitset appears in clause 23 of the C++ Standard, under "Associative containers". That is not really where it belongs, because bitset has nothing to do with associative containers such as set and map, and it does not even satisfy the most basic requirements of an STL container. It is better to think of a bitset as an integer whose bits can be accessed individually, but one that is not limited to the width of a long. The size of a bitset is fixed at compile time (the number of bits is a template parameter), but there is no upper limit: bitset<32> is 32 bits long, and bitset<1000> is 1000 bits long. The integer operations you are used to still work on bitsets, and a few more are added for convenience. For example, you can write b1 ^ b2 to perform a bitwise exclusive or (provided b1 and b2 have the same length).

There are two different interfaces for manipulating individual bits. You can set the nth bit with b.set(n), clear it with b.reset(n), and test it with if (b.test(n)); or, almost equivalently, you can treat the bitset as an array and write b[n] = true, b[n] = false, and if (b[n]). ("Almost" because there is one small difference: the array form performs no range checking, while the set()/reset()/test() form does. If the argument you pass to set(), reset(), or test() is too large, you get an out_of_range exception.)
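As a short illustration of the two single-bit interfaces, here is a sketch under the assumption of an 8-bit set; the helper names demo_bits and test_throws are invented for this example and are not part of the library.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Returns true if b.test(pos) throws std::out_of_range.
template <std::size_t N>
bool test_throws(const std::bitset<N>& b, std::size_t pos) {
    try {
        (void)b.test(pos);          // range-checked access
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Exercises both interfaces and returns a set with bits 1 and 3 on.
inline std::bitset<8> demo_bits() {
    std::bitset<8> b;               // all bits start out cleared
    b.set(1);                       // checked member-function form
    b[3] = true;                    // unchecked array form
    b.set(5);
    b.reset(5);                     // clear bit 5 again
    assert(b.test(1) && b[3] && !b[5]);
    return b;
}
```

Calling test_throws(demo_bits(), 8) yields true, because position 8 is one past the last valid index of an 8-bit set, while b[8] would simply be undefined behavior with no diagnostic.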

If the bitset is small enough, you can use it as an integer: there is a constructor that creates a bitset from an unsigned long, and a member function, to_ulong(), that yields an unsigned long from a bitset. Of course, you cannot use that constructor directly to initialize bits beyond the range of an unsigned long; similarly, you cannot use to_ulong() to extract bits beyond that range. (If you try, and any bit beyond the range of unsigned long is set, to_ulong() throws an exception.) If necessary, however, you can get around these limits with shifts and masks:

    const int N =
      sizeof(unsigned long) * CHAR_BIT;
    unsigned long high = 0x7b62;
    unsigned long low = 0x1430;
    std::bitset<2 * N> b
      = (std::bitset<2 * N>(high) << N) |
        std::bitset<2 * N>(low);
    ...
    const std::bitset<2 * N>
      mask((unsigned long)(-1));
    low = (b & mask).to_ulong();
    high = (b >> N).to_ulong();
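The shift-and-mask technique can be wrapped in a few small functions. This is only a sketch: the names wide_bits, join, low_half, high_half, and to_ulong_overflows are assumptions made for the example. It also shows the exception mentioned earlier, which is of type std::overflow_error.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>

const std::size_t N = sizeof(unsigned long) * CHAR_BIT;
typedef std::bitset<2 * N> wide_bits;   // twice the width of unsigned long

// Extract the low N bits by masking off everything above them.
inline unsigned long low_half(const wide_bits& b) {
    const wide_bits mask(~0UL);
    return (b & mask).to_ulong();
}

// Extract the high N bits by shifting them down first.
inline unsigned long high_half(const wide_bits& b) {
    return (b >> N).to_ulong();
}

// Build a double-width bitset from two unsigned long halves.
inline wide_bits join(unsigned long high, unsigned long low) {
    wide_bits b = (wide_bits(high) << N) | wide_bits(low);
    assert(low_half(b) == low && high_half(b) == high);
    return b;
}

// True if to_ulong() refuses the value because a high bit is set.
inline bool to_ulong_overflows(const wide_bits& b) {
    try {
        (void)b.to_ulong();
    } catch (const std::overflow_error&) {
        return true;
    }
    return false;
}
```

For example, to_ulong_overflows(join(1UL, 0UL)) is true because bit N is set, whereas a value whose upper half is zero converts without trouble.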

Bit 0 is defined to be the least significant bit, so, for example, if you write:

    std::bitset<4> b(0xa);

the bits that are set are b[1] and b[3].

It is easy to use bitset as a replacement for traditional option flags: just declare a bitset object in a header file in place of the integer. We have already seen two benefits of bitset: you can have more flags than fit in a long, and you can manipulate individual bits in easier and safer ways. Another is that bitset gives you a mechanism for converting between bitsets and text. First, bitset provides the usual I/O operations. This program,

    #include <bitset>
    #include <iostream>

    int main() {
      std::bitset<12> b(3432);
      std::cout << "3432 in binary is "
                << b << std::endl;
    }

gives the obvious result:

    3432 in binary is 110101101000

Input works the same way: it reads a string of '1' and '0' characters and converts them into a bitset.
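To see input and output together, the following sketch reads a bitset from a string stream and writes it back. The function name round_trip and the choice of 8 bits are assumptions for the example. Extraction stops at the first character that is not '0' or '1', and it fails if no digits at all could be read.

```cpp
#include <bitset>
#include <cassert>
#include <sstream>
#include <string>

// Reads an 8-bit set from text and formats it again.
// Returns an empty string if no '0'/'1' characters could be read.
inline std::string round_trip(const std::string& text) {
    std::istringstream in(text);
    std::bitset<8> b;
    if (!(in >> b))
        return std::string();
    std::ostringstream out;
    out << b;
    assert(out.str().size() == 8);   // output always has all N digits
    return out.str();
}
```

A short input such as "101" is accepted and treated as the low-order bits, so it comes back padded with leading zeros as "00000101".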

Second, you can convert bitsets to and from strings: there is a constructor that takes a string argument, and a member function bitset<>::to_string(). Unfortunately, useful as these conversions are, the details make them very inconvenient. The string constructor and the to_string() member function are both member templates, because std::basic_string is itself a template in the library; the ordinary string class, std::string, is an alias for basic_string<char>.

The generality of these member templates runs afoul of some unfortunate C++ rules. You must write:

    std::bitset<6>(std::string("110101"))

instead of

    std::bitset<6>("110101");

If you pass the string literal "110101" directly, you get a compile-time error, because the compiler cannot tell which instantiation of the member template you want. Similarly, if b is a bitset, you cannot just write:

    std::string s = b.to_string();

You must use this horrible form:

    std::string s
      = b.template to_string<char,
          std::char_traits<char>,
          std::allocator<char> >();

(Yes, that funny-looking template keyword really is necessary.)

Of course, in practice you should not clutter your code with this sort of thing. Unless you really need to work with multiple character types, you can encapsulate the ugly syntactic details in helper functions:

    template <std::size_t N>
    std::bitset<N>
    from_string(const std::string& s) {
      return std::bitset<N>(s);
    }

    template <std::size_t N>
    std::string
    to_string(const std::bitset<N>& b) {
      return b.template to_string<char,
        std::char_traits<char>,
        std::allocator<char> >();
    }
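The awkwardness described here applies to the library as originally standardized. Since C++11, bitset also has a constructor taking a character pointer, and to_string() has default template arguments, so the direct forms compile. The sketch below assumes a C++11 or later compiler; flip_all is a made-up name for the example.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Parses six binary digits, inverts every bit, and formats the result.
inline std::string flip_all(const char* text) {
    assert(text != 0);
    std::bitset<6> b(text);      // C++11: constructor from const char*
    b.flip();                    // invert all six bits
    return b.to_string();        // C++11: no template arguments needed
}
```

With an older compiler, the from_string() and to_string() helpers remain the practical way to hide the syntax.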

vector<bool>

bitset does have one important limitation: its size is fixed. You can have a bitset larger than a long, but you must specify its size in advance. For a set of option flags that is fine, but it is unsuitable for some other purposes. Suppose you are processing a huge collection of items in some complicated order, and you need to keep track of which ones you have already seen. That calls for an array of Boolean values, and there is good reason to use a "packed" array with one bit per element, but bitset is no longer a reasonable choice: you will not know the number of items until run time, and items may even be added or removed.

The other mechanism for managing sets of bits in the C++ standard library is vector<bool>, a specialization of the vector<> template. In some ways vector<bool> and bitset are very similar: each element is represented by a single bit, and you can use array syntax (for example, v[3] = true) to access individual bits. The difference is that bitset uses its own idiosyncratic interface, while vector<bool> uses the usual STL interface. You can change the number of elements with the resize() member function, or add new elements with push_back(), just as with any other vector<T>.
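The "items already seen" scenario might look roughly like the following. This is a sketch only: mark_seen, was_seen, and demo_seen are hypothetical names, and items are assumed to be identified by small nonnegative integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Records that an item has been seen, growing the vector on demand.
inline void mark_seen(std::vector<bool>& seen, std::size_t item) {
    if (item >= seen.size())
        seen.resize(item + 1);      // new elements start out false
    seen[item] = true;
}

// Items beyond the current size have never been seen.
inline bool was_seen(const std::vector<bool>& seen, std::size_t item) {
    return item < seen.size() && seen[item];
}

inline std::vector<bool> demo_seen() {
    std::vector<bool> seen;         // size not known in advance
    mark_seen(seen, 3);
    mark_seen(seen, 10);
    assert(seen.size() == 11u);
    seen.push_back(true);           // appends element 11
    return seen;
}
```

None of this is possible with bitset, whose size would have to be chosen before the first item arrives.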

Although vector<bool> provides no special support for bitwise operations, you can still perform them with the usual STL generic algorithms and function objects. For example, instead of writing v3 = v1 & v2, you can write:

    std::vector<bool> v3(v1.size());
    std::transform(v1.begin(), v1.end(),
                   v2.begin(), v3.begin(),
                   std::logical_and<bool>());

Similarly, to print a vector<bool> in the same format as bitset's operator<<, you can again use an STL generic algorithm:

    std::copy(v.rbegin(), v.rend(),
              std::ostream_iterator<bool>(std::cout));

(This code relies on the fact that, by default, bool values are written as "1" and "0" rather than "true" and "false". Note also that we use rbegin() and rend() to copy the vector<bool> in reverse. That matches the way bitset is printed: the leftmost digit printed for a bitset b is b[N-1], not b[0].)
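Both fragments can be packaged as small functions and checked against each other. The names bit_and, to_text, and from_text are assumptions for this sketch, and the output is sent to a string stream rather than std::cout so that the result can be inspected.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Element-wise AND of two bit vectors of equal size.
inline std::vector<bool> bit_and(const std::vector<bool>& v1,
                                 const std::vector<bool>& v2) {
    assert(v1.size() == v2.size());
    std::vector<bool> v3(v1.size());
    std::transform(v1.begin(), v1.end(), v2.begin(), v3.begin(),
                   std::logical_and<bool>());
    return v3;
}

// Formats v the way bitset's operator<< does: highest index first.
inline std::string to_text(const std::vector<bool>& v) {
    std::ostringstream out;
    std::copy(v.rbegin(), v.rend(), std::ostream_iterator<bool>(out));
    return out.str();
}

// Inverse of to_text: the rightmost character becomes element 0.
inline std::vector<bool> from_text(const std::string& s) {
    std::vector<bool> v;
    for (std::string::const_reverse_iterator i = s.rbegin();
         i != s.rend(); ++i)
        v.push_back(*i == '1');
    return v;
}
```

For instance, to_text(bit_and(from_text("1100"), from_text("1010"))) produces "1000", the same digits a bitset<4> would print for 0xC & 0xA.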

Whenever possible, you should use bitset rather than vector<bool>: a fixed-size data structure performs better, in both space and time, than one that supports the general-purpose vector interface. (In one timing test I ran, bitset was almost five times faster than vector<bool>.) If the size of the bit set is not known in advance, you need vector<bool>.

There appears to be one other situation in which you should use vector<bool> rather than bitset: when interacting with STL generic algorithms. The generic algorithms work with iterators, and vector<bool> provides iterators (we saw v.begin() and v.end() in the examples above), while bitset does not. You can use array syntax to access individual bits of a bitset, but it has no begin() and end() member functions.

However, you should not let that stop you! Although bitset does not have the STL container interface, it is still a perfectly good (fixed-size) container. If bitset makes sense for your purposes and you need iterators, you can define a simple "index iterator" adaptor that turns iterator notation (such as *i) into array notation (such as b[n]). The implementation is straightforward: it maintains an index and a pointer to the container. The details, most of which are common to any random access iterator implementation, are in Listing 1. We also define non-member helper functions, begin() and end(), that take a bitset as an argument. (The iterator shown in Listing 1 is not as general as it could be: if we were willing to accept a slightly more cumbersome interface, we could define a class that works with any array-like type. A general-purpose index iterator adaptor is often useful when dealing with pre-STL containers, and sometimes even with STL containers such as vector.)

With bitset_iterator, bitset can now interact with STL components. For example, you can copy a bitset into a vector<bool>:

    std::bitset<10> b;
    ...
    std::vector<bool>
      v(begin(b), end(b));

However, if you read Listing 1 carefully, you may have noticed a problem with bitset_iterator: the name is a lie, because bitset_iterator is not really an iterator. If i is an iterator, then *i is supposed to return a reference to the object it points to. bitset_iterator does not do that: the const bitset_iterator returns bool rather than const bool&, and the mutable bitset_iterator returns a proxy object of type bitset<>::reference rather than bool&.

Since bits are not individually addressable, this is the best we can do; in fact, vector<bool>::iterator behaves the same way, which, again, means that vector<bool> is not a real STL container. Strictly speaking, bitset_iterator and vector<bool>::iterator are not iterators, but both come close enough that they can be used in many (not all!) places where an iterator is expected.
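The proxy behavior can be checked directly. The sketch below assumes a C++11 compiler for static_assert and <type_traits>; proxy_writes_through is a name invented for the example.

```cpp
#include <bitset>
#include <cassert>
#include <type_traits>
#include <vector>

// Neither "reference" type is a real bool&; both are proxy classes.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy");
static_assert(!std::is_same<std::bitset<8>::reference, bool&>::value,
              "bitset<>::reference is a proxy");

// An ordinary vector, by contrast, hands out genuine references.
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "vector<int>::reference is int&");

// The proxy still forwards assignments to the underlying bit.
inline bool proxy_writes_through() {
    std::vector<bool> v(4);
    assert(!v[2]);
    std::vector<bool>::reference r = v[2];   // proxy object, not bool&
    r = true;
    // bool& br = v[2];   // would not compile: no bool exists to bind to
    return v[2];
}
```

Generic code that writes bool& x = *i, or takes the address of *i, is exactly the kind that breaks with these near-iterators.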

Summary

Arrays of Boolean values are common in large programs, and the C++ standard library provides several ways to represent them. I have not covered every possibility: for example, you could use valarray<bool>, and in some cases a set is a good way to represent a sparse bit vector.

More often than not, though, the simplest approach is to use std::bitset. If you know at compile time how large your Boolean array is, or can at least specify a reasonable upper bound, bitset is simpler and more efficient. bitset's interface has some annoying quirks, but they are easy to work around with a few helper functions.

Listing 1 - bitset_iterator, an iterator adaptor class for std::bitset

    #include <bitset>
    #include <cstddef>
    #include <iterator>

    // Compile-time selection between two types.
    template <bool flag, class IfTrue, class IfFalse> struct IF;

    template <class IfTrue, class IfFalse>
    struct IF<true, IfTrue, IfFalse> {
      typedef IfTrue val;
    };

    template <class IfTrue, class IfFalse>
    struct IF<false, IfTrue, IfFalse> {
      typedef IfFalse val;
    };

    template <std::size_t N, bool is_const>
    class bitset_iterator {
    private:
      typedef std::bitset<N> Bitset;
      typedef typename IF<is_const, const Bitset, Bitset>::val
        qBitset;

    public:
      // Nested types must be public for std::iterator_traits.
      typedef std::random_access_iterator_tag
        iterator_category;
      typedef bool value_type;
      typedef std::ptrdiff_t difference_type;
      typedef typename IF<is_const, const bool, bool>::val*
        pointer;
      typedef typename IF<is_const,
                          bool,
                          typename Bitset::reference>::val
        reference;

    private:
      qBitset* b;       // the bitset being traversed
      std::size_t n;    // index of the current bit

    public:
      bitset_iterator() : b(), n() {}
      bitset_iterator(qBitset& bs, std::size_t sz)
        : b(&bs), n(sz) {}
      bitset_iterator(const bitset_iterator& x)
        : b(x.b), n(x.n) {}
      bitset_iterator& operator=(const bitset_iterator& x) {
        b = x.b;
        n = x.n;
        return *this;
      }

    public:
      reference operator*() const { return (*b)[n]; }
      reference operator[](std::ptrdiff_t x) const {
        return (*b)[n + x];
      }

      bitset_iterator& operator++() { ++n; return *this; }
      bitset_iterator operator++(int) {
        ++n;
        return bitset_iterator(*b, n - 1);
      }
      bitset_iterator& operator--() { --n; return *this; }
      bitset_iterator operator--(int) {
        --n;
        return bitset_iterator(*b, n + 1);
      }

      bitset_iterator operator+(std::ptrdiff_t x) const {
        return bitset_iterator(*b, n + x);
      }
      bitset_iterator& operator+=(std::ptrdiff_t x) {
        n += x;
        return *this;
      }
      bitset_iterator operator-(std::ptrdiff_t x) const {
        return bitset_iterator(*b, n - x);
      }
      bitset_iterator& operator-=(std::ptrdiff_t x) {
        n -= x;
        return *this;
      }

    public:
      friend bool operator==(bitset_iterator x,
                             bitset_iterator y) {
        return x.b == y.b && x.n == y.n;
      }
      friend bool operator!=(bitset_iterator x,
                             bitset_iterator y) {
        return !(x == y);
      }
      friend bool operator<(bitset_iterator x,
                            bitset_iterator y) {
        return x.n < y.n;
      }
      friend bool operator>(bitset_iterator x,
                            bitset_iterator y) {
        return y < x;
      }
      friend bool operator<=(bitset_iterator x,
                             bitset_iterator y) {
        return !(y < x);
      }
      friend bool operator>=(bitset_iterator x,
                             bitset_iterator y) {
        return !(x < y);
      }

      friend std::ptrdiff_t operator-(bitset_iterator x,
                                      bitset_iterator y) {
        return x.n - y.n;
      }
      friend bitset_iterator operator+(std::ptrdiff_t n1,
                                       bitset_iterator x) {
        return bitset_iterator(*x.b, x.n + n1);
      }
    };

    template <std::size_t N>
    bitset_iterator<N, true>
    begin(const std::bitset<N>& b) {
      return bitset_iterator<N, true>(b, 0);
    }

    template <std::size_t N>
    bitset_iterator<N, true>
    end(const std::bitset<N>& b) {
      return bitset_iterator<N, true>(b, N);
    }

    template <std::size_t N>
    bitset_iterator<N, false>
    begin(std::bitset<N>& b) {
      return bitset_iterator<N, false>(b, 0);
    }

    template <std::size_t N>
    bitset_iterator<N, false>
    end(std::bitset<N>& b) {
      return bitset_iterator<N, false>(b, N);
    }

- End of listing -
