Effectively represents a collection (transfer to MSDN)

zhaozj2021-02-16  79

Section 6: Effective Representation Collection

Release Date: 5/21/2004

| Update Date: 5/21/2004

Scott mitchell4guysfromrolla.com

April 2004

Summary: Scott Mitchell discusses data structures for implementing generalization and non-intersecting sets. Collection is the only unordered collection of unique items, which can be enumerated and compared to other collections in a variety of methods. (20 page print pages)

Download the sets.msi sample file.

This page

Introduction The basic aspect of the collection provides an effective set of data structures Maintaining a set of non-intersecting collection references related books

Introduction

One of the most basic mathematical structures is a collection, it is the unordered collection of unique objects. The object included in the collection is called a collection of elements. In order to formally, the set is in uppercase, the italic letter is represented, and its element is displayed in braces ({...}). Examples of this representation are as follows:

S = {1, 3, 5, 7, 9} T = {Scott, Jisun, SAM} u = {-4, 3.14159, Todd, X}

In mathematics, the collection is typically constituted by a number, such as a set S, which contains less than 10. However, please pay attention to the elements of the collection can be anything - numbers, people, strings, letters, variables, etc. For example, the collection T contains a human name, a collection U contains a mix of numbers, names, and variables.

In this article, we start from the basic introduction of the collection, including general representations and operations that can be performed on a collection. Then you will find how to implement a collection data structure using the defined domain. This article finally studies the non-intersecting collection and the best data structure used.

Back to top

Basic aspect of the collection

Remember that the collection is just a collection of elements. The "belonging to." Operator is recorded as XS, indicating that X is an element in the collection S. For example, if the set S contains a correct odd number of less than 10, then 1 S. This representation is read to "1 is an element of S." In addition to one element of S, there are 3s, 5s, 7s, and 9s. "Isn't ... The element" operator, records XS, means that X is not an element of collecting s.

The number of unique elements in the collection is the base of the collection. The groups of {1, 2, 3} are 3, which is the same as the base of the set {1, 1, 1, 1, 1, 1, 1, 2, 3} (because it only has three unique elements). There can be no elements at the collection. Such a collection is referred to as empty set, records {} or, the base is 0.

Many developers assume that many developers assume them as collection, such as ArrayList. However, there are some subtle differences between them. ArrayList is an ordered collection of elements - each element in ArrayList has a related order index, indicating the order. Moreover, there can be repeated elements in ArrayList.

On the other hand, the collection is disordered and includes unique items. Since the collection is disorder, the elements in the collection can be listed in any order. That is, the collection {1, 2, 3}, and {3, 1, 2} are considered equal. At the same time, any repetition in the collection is considered to be redundant. Collect {1, 1, 1, 2, 3} and set {1, 2, 3} are equal. If two collections have the same elements, they are equal. (Equal with = symbols; if S and T are equal, then write them into S = T.)

Note In mathematics, the ordered element set of repeating elements is allowed to be referred to as a list. Two lists L1 and L2, for the number of elements in the list from 1 to the list, when and only when the i element in the L1 is equal, it is considered to be equal to L2. Typically, the elements that can occur in the collection are limited to a certain domain. The domain is a collection of all possible values ​​that can appear in a collection. For example, we may only be interested in integration of integers. By limiting the domain as an integer, we can make the collection not contain non-intensive elements, such as 8.125 or SAM. (The domain is expressed as a collection u.)

Collection of relational operators

There is a set of relationship operators that are usually used with numbers. More commonly used (especially in programming languages) include <, <=, =,! =,>, And> =. According to the standard defined by the relational operator, the relational operator determines whether the number of operands on the left is related to the operand on the right. The relationship operator returns "True" or "false" value, indicating whether there is a relationship between the operands. For example, if x is less than y, then x

Relational operators, such as <, <=, =,! =,>, And> = usually used together. As we see, the collection uses the = relational operator to represent two sets of collections (also available! = To indicate that two sets are not equal), but there is no set definition <, <=,>, and> =. After all, how to determine the collection {1, 2, 3} is less than the collection {Scott, 3.14159}?

As an alternative to

Using new subset operators, we can more formally define the set equal. Given set S and T, when only ST and TS, S = T. That is, when each element in S is in T, the S and T are equal when each element in T is in S.

Note According to <= similar, there should be a collection relationship operator with> = similar. This relationship operator is called supercharge, remembering; indicating a true supercoming. Just like <= and> =, ST is when and is only when TS.

Collect operation

Just like the relationship operator, many of the calculations for digital definitions cannot be suitable for collection. Common operations for numbers include addition, multiplication, subtraction, and power, and the like. For collection, there are four basic operations: 1. And - the merger of two collections, records ST, similar to the number of numbers. The operator returns a collection that contains all elements in S and all elements in T. For example, {1, 2, 3} {2, 4, 6} are equal to {1, 2, 3, 2, 4, 6}. (Repeat 2 can be deleted to provide a more concise answer to obtain {1, 2, 3, 4, 6}.) Forms in the form of ST = {x: XS or XT}. That is, the result of S and T is a set, if the element X is in S or T, then x is included in the result set. 2. Intersection - the intersection of two collections, remembering ST, is a collection of s and t co-elements. For example, {1, 2, 3} {2, 4, 6} are equal to {2} because it is the only element that is commonly owned by {1, 2, 3}, and {2, 4, 6}. The form is written, st = {x: XS and XT}. That is to say, the result of S Turn T is a collection, if the element X is in S and in T, then X is included in the result set. 3. Difference - the difference in two collections, remembering S - T, is an element that is in S but not in t. For example, {1, 2, 3} - {2, 4, 6} are equal to {1, 3}, because 1 and 3 are elements in s, but not the elements in t. The formal design is, S - t = {x: XS and XT}. That is, the difference between the collection S and the set T is a set, if the element X is in S and not in T, then X is included in the result set. 4. Supplement - We discussed a known domain that usually restricts collection to a possible value, such as an integer. The collection of complements (recorded as s') is U - S. (Remember that u is the domain set.) If our domain is an integer of 1 to 10, and S = {1, 4, 9, 10}, then s' = {2, 3, 5, 6, 7, 8}. (The collection is similar to the number of questions. Just like the number of numbers will be obtained twice - that is, - x = x - The collection is set twice to get the original collection - s' = S.)

When researching new operations, the essence of firmness is always important. When learning any operation (whether it is a digital definition or a collection definition), some questions to ask yourself is:

? Is this operation to exchange? If X OP Y is equivalent to Y OP X, the operator OP can be exchanged. In the digital domain, the addition is an example of the exchangeable operator and the subtraction is not switchable. ? Is the operation combinable? That is to say, whether the order of operation is important. If the operator OP is incorporated, X OP (Y OP Z) is equivalent to (X OP Y) OP Z. Similarly, in the digital field, the addition is combined, but subtraction is not.

For collection, and intended to be exchanged and can be combined. ST is equal to TS, and S (TV) is equal to (ST) V. However, the difference between the collection is neither exchangeable or in combination. (To verify the difference in the collection is not exchanged, consider {1, 2, 3} - {3, 4, 5} = {1, 2}, but {3, 4, 5} - {1, 2, 3 } = {4, 5}.) Finite set and unlimited set

So far, all of our collection examples have been dealt with limited sets. Finite set is a collection of limited elements. Although it seems that it seems to violate the instinct, the collection can contain unlimited number of elements. For example, positive integers is an unlimited set because there is no boundary in the number of elements in the collection.

In mathematics, there are several unlimited sets commonly used, so they can be represented by special symbols. These symbols include:

? N = {0, 1, 2, ...}? Z = {..., -2, -1, 0, 1, 2, ...}? Q = {A / B: AZ, BZ, And b0}? R = real collection

N is a collection of natural number, or a positive integer set greater than or equal to 0. Z is an integer collection. Q is a grouped gathering, and the rational number is a number of scores that can be expressed as two integers. Finally, R is a set of real numbers, and the real number refers to all stigtes and unmetrified numbers (not representing the numbers of two integers, such as the square root of PI and 2).

Of course, unlimited sets cannot be written all because you can never write these elements, but it can be more simply expressed in mathematical representations, for example:

S = {x: xn and x> 100}

Here S is a collection of natural numbers greater than 100.

In this article, we will explore the data structure representing a limited set. Although the unlimited set is definitely in mathematics, we rarely need to use unlimited sets in computer programs. The expansion and calculation of unlimited sets also have special difficulties, because the content of unlimited set cannot be stored or enumerated in the data structure.

Note The base of the calculation is simple - just calculated the number of elements. But how do I calculate the base of the unlimited set? This discussion beyond the scope of this article, but it is necessary to realize that there are different types of bases in unlimited sets. Interestingly, the "quantity" and all integers of the integer are the same, but the number of real numbers is more than the integer.

Collection in the programming language

C , C #, Visual Basic .NET and Java do not provide built-in language function for use of the collection. If you want to use a collection, you need to create your own collection class with your appropriate method, attributes, and logic. (We will complete it accurately in the next part!) Although there have been programmed languages ​​in the past, the collection is provided as the basic generation block in the language. For example, PasCal provides a collection structure that can be used to create a collection using a clearly defined domain. To use a collection, Pascal provides an in operator to determine if an element is in a given collection. Operators , *, and - the difference in integration and collection. The following Pascal code describes the syntax of the collection:

/ * Declares a variable named PossiblenumBers, a set whose universes is the set of integers BetWeen 1 and 100 ... * / var possiblenumbers = set of 1..100; ... / * Assigns the set {1, 45, 23 , 87, 14} to PossiblenumBers: = [1, 45, 23, 87, 14]; / ​​* sets PossiblenumBers to the Union of PossiblenumBers and {3, 98} * / PossIblenumBers: = POSSIBLENUMBERS [3, 98 ]; / * Checks to see IF 4 IS An Element of Possible NumBers ... * / if 4 in PossiblenumBers the Write ("4 IS in the set!"); Other previous languages ​​allow more powerful set semantics and syntax. A language called SETL (the first letters of Set Language) was founded in the 1970s and puts forward the collection as a head. Unlike Pascal, you will not be restricted to specify the domain of the specified collection when using the collection in Setl.

Back to top

Implement a valid collection data structure

In this section, we will explore classes that create collection functions and features. When creating such a data structure, one of the primary things is that we need to decide how to store the elements of the collection. This decision can greatly affect the asymptotic efficiency of the calculations executed on the aggregate data structure. (Please keep in mind the operations that need to be performed on the collection data structure include:

In order to describe how the storage set element affects the runtime, imagine that the set class we created uses the base arraylist to save the elements of the collection. If we want to merge two sets S1 and S2 (S1 have M elements, S2 has n elements), we must perform the following steps:

1. Create a new set class T that holds S1 and S2 and 2. The elements circulating to the S1 are accessed to T in T. 3. Cyclicize the elements in S2. If the element has not yet been in T, then it is added to T.

How many steps do you need to perform and operate? Step (2) will require M step to access the M elements in S1. Step (3) will need N step, for each element in S2, we must determine if the element is in T. Use unordered ArrayLists to determine if the element is in ArrayList, must linearly enumerate the entire ArrayList. Therefore, for N elements in S2, we must search for M elements in T in T. This will result in the runtime O (M * N) of the secondary side of the operation.

The use of ArrayList and the operation requires two times, the reason is that it is determined whether an element is present in a collection requires linear time. That is, to determine if an element exists in a collection, you must completely search the collection of ArrayList. If we can reduce the runtime of the "belonging" operation of the element to a constant, you can reduce the runtime of the operation as linear O (M N). Remember that in Part 2 of this article series, the hash table provides constant runtime to determine that the item exists in the hash table. Therefore, if the elements used in the store are used, the hash table will be better than the ArrayList.

If we ask a collection domain known, you can use the bit array to implement a more efficient collection data structure. Suppose the domain contains elements E1, E2, ..., EK. Then, we can use the bits of k elements to represent a collection; if the i-bit is 1, the element EI is in the collection; on the other hand, if the i-I bit is 0, the element EI is not in the collection. A collection indicates that the range array not only saves space but also contributes to effective collection operations because these set-based operations can be done using simple bit instructions. For example, determining if the element EI needs to spend a constant time during a collection, as only the i-bit in the bit array is required. Two collections and just a set of bit array of bitors; two sets of exchanges are the bit of the collected bits of the group. The difference and subset of the collection can also be simplified as a bit operation. The note array is an array of compressed, consists of 1 and 0, usually implemented as an array of integers. Since the integer in the Microsoft .NET framework has 32 bits, a bit array can store 32-bit values ​​in an element of an array (rather than 32 array elements).

The bit operation is the operation performed on a single bit of an integer. There is also a duty operator in both binary operators. Bit and and bit OR operators are binary, each operator requires two bits and returns a bit. Only when two inputs are 1, the bit and returns 1, otherwise it returns 0. When two inputs are 0, the bit or returns 0, otherwise returns 1.

For more discussion of more C # bits, be sure to read: Bit-Wise Operators In C #.

Let us explore how to implement a collection class that uses the C # bit.

Create a PascalSet class

Need to know: Implement a collection class with a valid bit operator to implement a collection domain. This is similar to the way PASCAL uses a collection, so in order to commemorate the Pascal programming language, decide to name the set class as a Pascalset class. PascalSet limits the domain to the range of integers or characters (like the PASCAL programming language). This range can be specified in the constructor of the PascalSet.

public class PascalSet: ICloneable, ICollection {// Private member variables private int lowerBound, upperBound; private BitArray data; public PascalSet (int lowerBound, int upperBound) {// make sure lowerbound is less than or equal to upperbound if (lowerBound> upperBound ) throw new ArgumentException ( "The set's lower bound can not be greater than its upper bound."); this.lowerBound = lowerBound; this.upperBound = upperBound; // Create the BitArray data = new BitArray (upperBound - lowerBound 1); } ...}

Therefore, to create a domain is a PASCALSET with an integer collection between -100 and 250, you can use the following symptoms:

Pascalset myset = new PascalSet (-100, 250);

Implement collection operations

PascalSet implementation standard collection - and integration - and standard relational operators - subset, true subset, superchard, and true supercharge. Collecting operations and, however, the collection is poorly returns a new Pascalset instance, which contains the result of the difference between or the collection. Code the following Union (PascalSet) method described in this behavior: public virtual PascalSet Union (PascalSet s) {if throw new ArgumentException ( "Attempting to union two dissimilar sets Union can only occur between two sets with (AreSimilar (s)!). "); // do a bit-wise or to union together this.data and s.data pascalset result = new pascalset (this.lowerbound, this.upperbound); result.data = this.data.or ( S.DATA); RETURN RESULT;} Public Static Pascalset Operator (Pascalset S, Pascalset T) {Return S.Union (T);

The ARESIMILAR (PASCALSET) method determines if the passed PascalSet is the same lower and upper limit as the PascalSet instance. Therefore, and (and the integration of the intersection) is only applicable to two sets of the domain. (You can modify the code here so that the domain that returns to PascalSet is two domain collections, allowing for and calculating a collection of non-interstrous domains.) If two PascalSet has the same domain, then new PascalSet - Result - BitATA-bit OR-DATA-assigned BitArtArtaRset, which is created as the same domain, and its BitArray member variable. Note that the Pascalset class is also overloaded with the operator (like the Pascal programming language).

Enumerate members of PascalSet

Since the collection is the unordered collection of elements, it is meaningless to enable the PascalSet to implement ILIST because the collection of ILIST represents a list of sequences. Since PascalSet is a collection of elements, it makes it meaningful to implement iCollection. Since ICOLLECTION implements IEnumerable, PascalSet needs to provide a getENUMERATOR () method, which returns an IENUMERATOR instance, allowing developers to traverse the set of elements.

When you use some other basic set class to save data to create a dedicated collection class, the getENUMERATOR () method for specialized classes is often just returns the IEnumerator () method from the GetEnumerator () method from the foundation collection. Since PascalSet uses BitArtay to indicate what elements in the collection, it seems that it seems that the pASCALSET's getENUMERATOR () method returns the IEnumerator () method from the internal BitArtaRray. However, BitArray's GetEnumerator () returned to all bits in BitArtaray, returning a Boolean value for each bit - if the bit is 1 returns true, if the bit is 0, returns false.

However, the elements in the PASCALSET are just the element of the BitArtay. Therefore, we need to create a custom class that implements the IENUMERATOR, which intelligently traverses BitArray, and only returns those elements that are 1 in BitArtArray. To process this, create a class called PascalseTenumerator inside the PascalSet class. The constructor of this class accepts the current PASCALSET instance as a unique input parameter. In the MoveNext () method, it accesses each bit of BitArray until the value of 1 is found. class PascalSetEnumerator: IEnumerator {private PascalSet pSet; private int position; public PascalSetEnumerator (PascalSet pSet) {this.pSet = pSet; position = -1;} ... public bool MoveNext () {// increment position position ; // see If there is another element greater tour for (int i = position; i

The full code of the PascalSet class is included in the download included in this article. With this class, there is an interactive WinForms test application settester, from it to create a PascalSet instance and perform different collection operations, and view the result collection.

Back to top

Maintain a set of non-intersectings

Next time you search on Google, please pay attention to each result with a link to the "Similar Web". If you click the link, Google displays a list of URLs, which is related to the items you click on the "Similar Web" link. Although I don't know how Google is determined how it is related, it is a way to say:

• Make X a web page for the related pages we want to find. • Become S1 becomes a collection of web pages to the X-linked Web page. • Because S2 becomes a collection of web pages received by the web page in S1. • Because S3 becomes a collection of web pages received by the web page in S2. ? ...? Making SK a collection of web pages received by the web page in SK-1.

S1, S2 until all web pages in SK are X-related pages. We don't calculate the related web pages in accordance with you, and may choose to create a collection of related pages for all web pages, and store these relationships in a database or some other permanent memory. Then, when the user clicks the "similar web page" link of the search term, we just asked to display links related to the page.

Google has some database, where it knows all web pages. Each of these web pages has a collection of links. We can use the following algorithm to calculate the collection of related web pages:

1. Create a collection for each web page in the database, put separate web pages in the collection. (After this step, if we have N web pages in the database, then we will have a collection of n one element.) 2. For web pages in the database, find all the Web pages that it directly link. Reference to the page of these links to S. For each element P in S, the set of collects containing P is performed and operated. 3. Repeat step 2 for all Web pages in the database. After completing step 3, the web page in the database will be separated into a relevant group. To see graphical representation of this algorithm, please refer to Figure 1.

Figure 1. Linking the graphical representation of the web page grouping algorithm.

Research Figure 1, please note that there are three related parts:

? W0? W1, W2, W3 and W4? W5 and W6

So, when the user clicks the "similar web page" link of W2, they will see links to W1, W3 and W4; when you click the W6's "Similar web page" link, you should only display the link to W6.

Please note that only one collection operation is performed on this particular problem - and. Moreover, all web pages are not intersecting a collection. For a given number of sets, if they do not have a public element, these collections are called non-intersecting. For example, {1, 2, 3} and {4, 5, 6} are not intersecting, and {1, 2, 3} and {2, 4, 6} are not because they share common elements {2}. In all the steps shown in Figure 1, each collection of web pages is not intended. That is, a web page will never exist at the same time in multiple collections.

In this way, when using a non-intersecting collection, we often need to know which particular non-intersecting collection of a given element. In order to identify each collection, we arbitrarily select one as a representative. The representative is an element from a collection of intersectments that uniquely identifies the entire intersecting collection. Using the concepts of the representative, you can determine if they are in the same collection by viewing whether the two give elements have the same representative.

One non-intersecting set data structure needs two methods:

? GetRepresentative - This method accepts ä element as an input parameter and returns the element. Union (Element, Element) - This method accepts two elements. If the two elements come from the same non-intersecting collection, union () does nothing. However, if the two elements come from different non-intersectments, union () combines these two non-intersectings into a collection ã,

Now that the challenge we face is how to effectively maintain the number of non-intersecting collection, and these non-intersectings often come from two mergers? There are two basic data structures that can be used to process the problem: a set of link tables using a series of links.

Use link table maintenance does not integrate collection

In Part 4 of this article, we spent a little time to explore the basic knowledge of the link table. Remember that the link table is a series of nodes that usually have a separate reference to the next neighbor. Figure 2 shows a linked table with four elements.

Figure 2. Link table with four elements

For non-intersecting set data structures, use the modified link table represents a collection. It is not only a reference to the neighbor, and each node in the not intersecting the set link table has a reference to the collection representative. As shown in FIG. 3, all nodes in the link table points to the same node as representative, according to the convention, the node is the header of the link table. (FIG. 3 shows a link table that does not intersect collective representations from the final phase of the algorithm separated from Figure 1. Note that there is a link table for each non-intersecting set, and the node of the link table contains specific Elements that do not intersect the collection.)

Figure 3. Link table for non-intersecting a collection is represented, the non-intersecting set from the final stage of the algorithm separated from Figure 1. Since each element in the collection has a direct reference that returns to the collection representative, the getRepresentative method costs a constant time. (To understand the reason, consider how many elements in the collection will need an operation to find a representative of a given element, as it only needs to check the representative reference of the element.)

Using a link table method, combine two non-intersectments into a one of the needs to add a link table to another and update representative references in each additional node. The process of combining two non-intersecting sets is shown in Figure 4.

Figure 4. Process of combining two non-intersectings

When combining two non-intersecting sets, it does not affect the correctness of the algorithm in the end of which one of the two collections to another. However, it will affect the runtime. Suppose our consolidation algorithm randomly selects one of the two link tables, append it to the end of another. Follow the worst luck, suppose we always choose longer to add in two link tables. This will have a negative impact on the runtime of the computation because we must enumerate all nodes in the link table to update their representative references. That is, suppose we have n non-intersecting sets, S1 to SN. Each collection has an element. Then we have to do n - 1 time and operate, combine all N sets into a large collection with n elements. It is assumed that S1 and S2 are combined for the first time and operate S1 as a representative of the merge set containing two elements. Since S2 has only one element, there is only one representative reference to update. Now, it is assumed that S1 - it has two elements - perform and operate with S3, and S3 is a representative. This time there is two representative references - S1 and S2 - need to be updated. Similarly, when S4 is combined, S4 is used as a representative of the new set, then three representative references (S1, S2, S3) are required. In subsection (N-1), N-2 representative references require updates.

The number of calculations must be performed in each step, we discover the order of the entire step - N times aggregate and N-1 times and operate - requires twice - O (N2).

This worst case runtime may happen because it may select a longer collection to append to a shorter collection. A longer collection requires updating more nodes representative references. A better approach is to track the size of each collection, then, when combining two collections, add the shorter of the two link tables. The runtime when using this improvement method is reduced to O (n log2n). A thorough time analysis is a bit beyond the scope of this article, and it is omitted here. For the form of time analysis, see the readings in the reference part.

To understand the improvement from O (N2) to O (N Log2N), Figure 5 is observed, and the blue showing the growth rate of N2, and the growth rate of N log2n is displayed with pink. For smaller N values, these two results are similar, but when N exceeds 32, N log2n is more slow than N2. For example, using the original link table implementation 64 times and the operation will take more than 4,000 computing, and only 384 operations are required for optimized link tables. These differences becomes more obvious when N becomes larger.

Figure 5. Growth rate of N2 and N log2

Using forest maintenance not intersecting collection

It is also possible to use forest maintenance and non-intersecting collection. The forest is a collection of trees (understand?). Remember to use the link table implementation, the representative of the collection is the header of the list. Using forest implementation, each collection is implemented as a tree, and the representative of the collection is the root of the tree. (If you are not familiar with the concept of the tree, consider reading this article 3 in this article, we discussed the number, bifurcous and binary look trees.) Use the link table method, give an element, find the representative of its collection Soon because each node has a direct reference to its representative. However, use the link table method, and the operation will take a longer time because it needs to add a link table to another link table, which requires the representative reference to the additional node. The goal of the forest approach is to make the operation faster and the cost is a representative of the collection in the case of a given element.

Forest methods implement each non-intersecting collection as a tree and will use roots as a representative. To make two collections and operate, you can add a tree as a child of another tree. Figure 6 describes this concept with graphics.

Figure 6. Two collections

The combined two sets require a constant time because only one node needs to update its representative reference. (In Figure 6, you want to merge W1 and W3 collections, we only need to update W3 reference to W1 - nodes W4 and W5 do not need any modifications.)

The forest approach improves the time required to make two non-intersecting sets, but the time used to find the collection representative is long. Given an element, we determine the only way to represent the collection is to traverse the collection of trees until you find the root. Suppose we want to find representatives of W5 (after combining the set W1 and W3). We want to traverse the tree until reach the roots - first to W3, then go to W1. Therefore, the representative of the lookup collection needs to spend the time associated with the depth of the tree, not the same constant time as the link table represents the same constant time.

The forest approach provides two optimizations, when both use, get linear running time when performing N non-intersecting sets, means that each individual calculation has averaged constant runtime. These two optimizations are called press and compression. With these two optimizations, we must try to avoid a series of trees that are highly and thin. As discussed in Part 3 of this article, the height and width of the tree usually affect its runtime. Ideally, the tree can be dispensed out of the sector, rather than high and narrow.

Press steps and optimized

Press the step and add a short list to a long list of short lists to a long list. Specifically, the order of each set root is maintained according to the order, which provides the upper limit of the height of the tree. When combining two sets, a smaller group is added to a child with a largest order. Press the order and help to ensure the width of the tree. However, even if you use the steps and we may eventually get a very wide but very high tree. Figure 7 shows a tree map, which may be formed by using only a series of steps and optimized and operated. The problem is that the leaves nodes on the right must still have a lot of operations to find representatives of their collection. .

Figure 7. A tree that may be formed by a series of trees that use only one steps and optimized

Note Forest methods only have the same runtime when it is implemented and optimized.

Road compression optimization

Since the high tree makes the representative of the collection collection are expensive, I hope that we hope that our trees are wider and flat. Reward compression optimization can make the tree flat. As we discussed earlier, whenever the collection represents a collection of elements, the algorithm will pass through the tree to the root. Method of distance compression optimization work in this algorithm; in the process of accessing the root, the node updates their parent references to the root.

To understand how this flattened work, consider the tree in Figure 7. Now, suppose we need to find a collection representative of W13. The algorithm starts from W13, to W12, then go to W8, and finally reaches W1, returns W1 as a representative. This algorithm also has side effects when updating the father of W13 and W12 is root W1. Figure 8 shows the screen display of the tree after this approach. Figure 8. Tree after the distance is compressed

The distance compression pays a small amount of overhead in the first lookup representative, but benefits in the future representative lookup. That is, after this approach is compressed, finding a collection representative of W13 requires a step because W13 is the child of the root. In Figure 7, the representative of the Finding W13 will require three steps before the journey compression. The idea here is to pay once for improvement, and then benefit from improvements every time you perform inspections.

When using the step and the distance compression algorithm, the time required to perform N times on the non-intersecting collection is linear. That is to say, use two optimizations, the operating time of the forest method is O (n). You must believe in me, because the form of time complexity has proved very long and complicated, it is easy to write a few pages. However, if you are interested in reading the multi-page time analysis, please refer to the text in "Introduction to Algorithms" listed in the reference.

Back to top

Reference

Alur, Rajeev. "Disjoint Sets"? Cormen, Thomas H., Charles E. Leiserson and Ronald L. Rivest. "Introduction to Algorithms." Mit Press. 1990.? Devroye, LUC. "Disjoint Set Structures".

Back to top

Related books

What written in Thomas H. Cormen INTRODUCTION TO Algorithms

Scott Mitchell, with five books, the founder of the 4guysfromrolla.com website, has been engaged in Microsoft Web technology in the past five years. Scott is an independent consultant, faculty, and writer, and he recently completed a master's degree in computer science in San Diego Gargar. You can send an email to mitchell@4guysfromrolla.com or visit his network diary http://www.scottonwriting.net contact him.