Data Structure Learning (C++) Continued - Searching [2]

zhaozj2021-02-16  64

Tree-based search

Binary search requires data that is sorted and randomly accessible. That restricts it to sequential (array) storage and brings the extra burden of sorting (if elements are added one at a time, the main cost is moving data, which amounts to binary insertion). Watching a binary search run, you can see that the successive values of mid trace a path from the root of a decision tree down toward a leaf, and that this tree has the same height as a complete binary tree with the same number of nodes. The advantage of a linked structure is that no data has to be moved, so a linked tree is a natural choice for a search structure.
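To make the link between binary search and tree height concrete, here is a minimal sketch that counts how many elements a binary search inspects and compares the worst case with the height of a complete binary tree. The function names (countProbes, completeTreeHeight, worstCaseProbes) are made up for this illustration and are not part of the classes written earlier.

```cpp
#include <cstddef>
#include <vector>

// Counts how many elements binary search inspects before it finds `key`
// or gives up. `sorted` must be in ascending order.
std::size_t countProbes(const std::vector<int>& sorted, int key) {
    std::size_t probes = 0;
    std::size_t low = 0, high = sorted.size();   // search range is [low, high)
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        ++probes;
        if (sorted[mid] == key) return probes;
        if (sorted[mid] < key) low = mid + 1;    // continue in the right half
        else high = mid;                         // continue in the left half
    }
    return probes;
}

// Height (in levels) of a complete binary tree with n nodes: floor(log2 n) + 1.
std::size_t completeTreeHeight(std::size_t n) {
    std::size_t height = 0;
    while (n > 0) { ++height; n /= 2; }
    return height;
}

// Largest probe count over all successful searches in the array 0, 1, ..., n-1.
std::size_t worstCaseProbes(std::size_t n) {
    std::vector<int> data(n);
    for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<int>(i);
    std::size_t worst = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t p = countProbes(data, static_cast<int>(i));
        if (p > worst) worst = p;
    }
    return worst;
}
```

For 15 elements both worstCaseProbes(15) and completeTreeHeight(15) give 4, which is the observation made above.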

Earlier we wrote a BSTREE class, which meets our needs in general. To keep the height of the tree as small as possible under different input orders, the concept of a balanced tree was introduced, such as the earlier AVLTREE class. Balance, as the term is now used, means that the ratio of the tree's height to that of a complete m-ary tree with the same number of nodes is bounded by a constant; that is, a tree with height ≤ k·log_m(n), where k is a constant. Note that the definition of the AVL tree is actually stricter than this, so it is easy to see why one would want to relax it. Balanced trees used in practice, such as the B-tree discussed later, are sets of rules designed to meet this requirement.

Index

Some dictionaries provide a table of contents, usually of the form A ... xx; B ... xx; and so on. With it you can turn quickly to the page where a given initial letter begins (and you also know where that letter ends). The words printed in the top corners of each page give the range of words on that page, so you can tell at a glance whether the word you want is on it. These are all indexes.

An index quickly narrows the search range, and our own experience of looking things up in a dictionary suggests it is a good way to make searching more efficient. Note what the table of contents does: it summarizes how the dictionary is laid out in a form we can take in from a single page (or a few pages). Likewise, if the data is too large, we can build such a "directory" for it, so that only the relevant data block has to be read into memory.
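The directory idea can be sketched as follows, assuming sorted integer keys cut into fixed-size blocks. BlockIndex, buildIndex and contains are hypothetical names chosen for this illustration; a real index would keep the blocks on external storage and load only the one selected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One "page" of the dictionary: a sorted run of keys.
using Block = std::vector<int>;

// A hypothetical one-level index: firstKeys[i] is the smallest key in blocks[i].
struct BlockIndex {
    std::vector<int> firstKeys;
    std::vector<Block> blocks;
};

// Cuts sorted data into blocks of `blockSize` keys and records each block's first key.
BlockIndex buildIndex(const std::vector<int>& sorted, std::size_t blockSize) {
    assert(blockSize > 0);
    BlockIndex index;
    for (std::size_t i = 0; i < sorted.size(); i += blockSize) {
        std::size_t end = std::min(i + blockSize, sorted.size());
        index.blocks.emplace_back(sorted.begin() + static_cast<std::ptrdiff_t>(i),
                                  sorted.begin() + static_cast<std::ptrdiff_t>(end));
        index.firstKeys.push_back(sorted[i]);
    }
    return index;
}

// Uses the index to pick the only block that can hold `key`, then searches that block.
bool contains(const BlockIndex& index, int key) {
    // Find the first entry greater than key; the candidate block is the one before it.
    auto it = std::upper_bound(index.firstKeys.begin(), index.firstKeys.end(), key);
    if (it == index.firstKeys.begin()) return false;   // key is below every stored key
    std::size_t blockNo = static_cast<std::size_t>(it - index.firstKeys.begin()) - 1;
    const Block& block = index.blocks[blockNo];
    return std::binary_search(block.begin(), block.end(), key);
}
```

Only firstKeys has to stay in memory; the search touches a single block, just as the page-corner words in a dictionary send you to one page.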

B-tree

When the data grows to a frightening size, even the index no longer fits in memory. I have seen the printed edition of the EI index, and a single retrieval directory is no smaller than the dictionaries we use. Our approach is then to index the index. Obviously each index block should be as large as possible, so that one read yields as much information as possible and repeated lookups are avoided (each of which generally involves an access to external storage). The AVL tree is not up to this task, and we need a new structure.

I believe everyone who has studied this far is thoroughly fed up with the definition of the B-tree. The responsibility lies with the people who write the books. Definitions and concepts are important tools for understanding, but here they are used inappropriately. The reason is that the B-tree has no "definition" in the conceptual sense; what the books offer is only a "description", namely the visible result of the operations a B-tree performs. In any case, let's look at the definitions in existing books (omitting "or an empty tree"):

Yan Weimin and Wu Weimin, "Data Structures (C Language Edition)"

1. Each node has at most m subtrees.

2. If the root is not a leaf node, it has at least two subtrees.

3. Every non-terminal node other than the root has at least ⌈m/2⌉ subtrees.

4. Every non-terminal node contains the information (n, A0, K1, A1, K2, A2, ..., Kn, An), where each Ki is a key, and Ki ...

Yin Renkun et al., "Data Structures (Using Object-Oriented Methods and C++)"

1. It is an m-way search tree (in effect, items 1 and 4 above).

2. The root node has at least two children.

3. Every node other than the root (excluding failure nodes) has at least ⌈m/2⌉ children.

4. All failure nodes are on the same level. These nodes are really external nodes, not nodes of the tree itself.

A rather baffling definition: why does the root have at least two children while every other node has at least ⌈m/2⌉? Compare it with the definition of the AVL tree to see what is wrong. The AVL tree is defined only by its balance criterion, "the heights of the left and right subtrees differ by at most 1"; balance factors and rotations are not mentioned at all, because they are merely the means of maintaining balance. The B-tree definition, by contrast, folds the results of its balancing measures into the definition itself, which only confuses the reader, so as a definition it is unsatisfactory. Perhaps a reasonable definition would read: an m-way search tree that is kept "balanced" in the following way ... Of course we must first settle what "balanced" means, and an intuitive definition of that has already been given.

Next, let's see what "the following way" of maintaining balance is. Recall the AVL tree: rotation is a truly elegant method, changing subtree heights while preserving the ordering. I burned a lot of brain cells turning the books' baffling explanation of the AVL tree into a working class, and that is what I am proudest of so far. By the time you read this, you should already have seen that explanation of the AVL tree.

A two-way tree such as the AVL tree can be rotated, but how would one rotate an m-way tree? A different idea is needed, and our predecessors found it: splitting and merging nodes.

Splitting

A node is split when it is full; that is, when the m-th element arrives in a node, the node splits. How? The middle element acts as the pivot, and the elements to its left and to its right each form a node. These become the left and right children of the middle element. (This still borrows binary-tree terminology; an m-way search tree is in effect a blend of several binary search trees.) The ordering is preserved throughout. The middle element is then inserted into the parent of the original node, and the parent may have to split in the same way, level by level, up to the root (or until no split is needed). Following the split process explains the definitions above. With this method, the root either has no children or, once it has split, has at least two. A non-root node comes into being only when a full node splits, so it has at least ⌈m/2⌉ children.

"All failure nodes are on the same level" is the balance criterion of the B-tree, and it is easy to see that such a tree is balanced.

So what the books present as the definition of the B-tree is really the outward manifestation of the measures it takes to stay balanced. It should not be treated as a definition of the B-tree, only as a description.
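The split step described above can be sketched as follows. This is only an illustration under stated assumptions: keys are ints, child pointers are left out, and SplitResult and splitNode are hypothetical names rather than members of any class from this series.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of splitting an overfull node: two halves plus the key that moves up.
struct SplitResult {
    std::vector<int> left;   // keys smaller than the median
    int median;              // key to be inserted into the parent
    std::vector<int> right;  // keys larger than the median
};

// Splits the sorted keys of a node of order m that has just received its m-th key.
SplitResult splitNode(const std::vector<int>& keys, std::size_t m) {
    assert(m >= 3 && keys.size() == m);
    std::size_t mid = (m + 1) / 2 - 1;   // zero-based position ceil(m/2) - 1
    auto pivot = keys.begin() + static_cast<std::ptrdiff_t>(mid);
    SplitResult result;
    result.left.assign(keys.begin(), pivot);
    result.median = *pivot;
    result.right.assign(pivot + 1, keys.end());
    return result;
}
```

A node with k keys has k + 1 children, so the left half (⌈m/2⌉ - 1 keys) has ⌈m/2⌉ children and the right half (⌊m/2⌋ keys) has ⌊m/2⌋ + 1, which is where the "at least ⌈m/2⌉ children" in the book descriptions comes from.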

Merging

As in a binary search tree, deletion reduces to removing an element from a leaf node. If the element is not in a leaf, it is overwritten with an element taken from a leaf, which turns the operation into a deletion from that leaf. Likewise, the tree must be rebalanced whenever an imbalance arises (that is, whenever the description of the B-tree is no longer satisfied).

Notice that an element in the parent together with its left and right children may once have been a single node; put another way, they can be merged into one node. So if together they hold more than m-1 elements, one element is moved from the fuller child to the emptier one. Think of the old monk on TV counting his prayer beads; the element travels like this: fuller node → parent node (the bead in hand) → emptier node. If together they hold at most m-1 elements, they are merged into a single node. The parent then has one element fewer, and if that leaves it unbalanced, it is handled in the same way.
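The borrow-or-merge decision just described can be sketched as follows, again with int keys and without child pointers. Siblings and rebalance are hypothetical names for this illustration, and the function is assumed to be called only when one of the two children has become too small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two adjacent sibling nodes and the parent key that separates them.
struct Siblings {
    std::vector<int> left;
    int separator;
    std::vector<int> right;
    bool merged;   // true once everything has been moved into `left`
};

// Rebalances two siblings in a B-tree of order m (at most m-1 keys per node).
Siblings rebalance(Siblings s, std::size_t m) {
    assert(m >= 3 && !s.merged);
    std::size_t total = s.left.size() + 1 + s.right.size();
    if (total > m - 1) {
        // Too many keys for one node: pass one key through the parent.
        if (s.left.size() > s.right.size()) {
            s.right.insert(s.right.begin(), s.separator);
            s.separator = s.left.back();
            s.left.pop_back();
        } else {
            s.left.push_back(s.separator);
            s.separator = s.right.front();
            s.right.erase(s.right.begin());
        }
    } else {
        // Everything fits in one node: pull the separator down and merge.
        s.left.push_back(s.separator);
        s.left.insert(s.left.end(), s.right.begin(), s.right.end());
        s.right.clear();
        s.merged = true;   // the parent has lost `separator` and may now need fixing
    }
    return s;
}
```

When merged is true, the caller would remove the separator from the parent and repeat the check one level up, which is the upward propagation the text refers to.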

Having looked at splitting and merging, notice that both insertion and deletion start at a leaf node. Once an imbalance occurs, whether it is resolved by a split or a merge, a change of one element more (or one fewer) is passed up to the parent, which may then need rebalancing itself. The adjustment of the AVL tree works in much the same way. The difference is that the AVL tree cannot split or merge, so it rotates; or, put the other way round, the B-tree cannot rotate, so it splits and merges. Both adjustment methods change the height (difference) while preserving the ordering.

When a tree has many branches it becomes awkward to work with. The B-tree was proposed to reduce exchanges between internal and external storage. Like external sorting, it has little use in general applications, and unless you write system-level software (or are preparing for an exam) you will rarely need it, so I will not explain it in more detail.

If the tree lives only in memory, a small number of branches is better, for example 3 (each node then has 2 or 3 branches, hence the name 2-3 tree). In practice the AVL tree (or the red-black tree) already does the job very well, so there is no need to pay much attention to multiway search trees.

B+ tree

People can be really tiresome: some already take the B-tree for a binary tree, and now along comes the B+ tree. The original naming does not help either; "binary" and "balance" are abbreviated without keeping more than one letter, so everything is just B. The B+ tree is likewise meant for external storage, is of little use in general applications, and most people need not worry about it (exams excepted). Note how it differs from the B-tree: the non-leaf nodes of a B+ tree are an index, so deleting a key does not always require changing the index; a search for a key always has to go down to a leaf node; and the leaf nodes are linked together, so they can be traversed from head to tail. Rather like a threaded binary tree, isn't it?
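The head-to-tail traversal of the linked leaves can be sketched as follows. Leaf and scanLeaves are hypothetical names for this illustration; the index levels above the leaves are omitted because the scan never needs them.

```cpp
#include <vector>

// Hypothetical B+ tree leaf: sorted keys plus a link to the next leaf.
struct Leaf {
    std::vector<int> keys;
    const Leaf* next;   // nullptr for the last leaf
};

// Visits every key in ascending order by following the leaf links only.
std::vector<int> scanLeaves(const Leaf* first) {
    std::vector<int> out;
    for (const Leaf* leaf = first; leaf != nullptr; leaf = leaf->next)
        out.insert(out.end(), leaf->keys.begin(), leaf->keys.end());
    return out;
}
```

A range query would work the same way: descend through the index to the leaf holding the lower bound, then follow next until the upper bound is passed.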

The Yan edition and the Yin edition differ considerably in their definition (more precisely, description) of the B+ tree, specifically in how many elements a node holds. For teaching purposes I prefer the Yin edition's description, because it shows more directly how the B+ tree evolves from the B-tree. The Yan edition introduces an additional distinction that scatters the reader's attention.

Hashing (hash table)

I am inclined to think that the hash table was born of the way storage is organized today. Consider our memory (RAM): every cell has an address. For an array, once you know the base address you can immediately compute the address of the n-th element and read it at once. The converse does not hold: an element's address cannot be determined from its content. To find the address of the cell holding a given content, you must examine the cells one by one or, if the data is ordered, use binary search.

If our memory could support lookup by content, all the methods introduced so far would pale: I want to find 15, and I am taken straight to 15. Such an O(1) method is what everyone dreams of. Whether fortunately (search methods still have an important role to play) or unfortunately (direct O(1) lookup cannot be realized), the kind of memory we want, associative memory, is still nowhere in sight (or perhaps available only in limited capacity).

Since the hardware does not support it, we start from the software. I have gone on at such length to make one point: existing methods are built on the existing architecture, and when the architecture changes, the methods will change with it.

Clearly, as long as we can establish a functional relationship between content and index, we can realize the mapping content → function → index on the existing storage structure.

In fact, any content can be mapped to an integer (such a rule can always be found). The price of a one-to-one mapping, however, is that the range of integers becomes huge, and a great deal of space is wasted. For example, an index table of events for 1901-2000 needs only an array of size 100. (If that range looks odd, it is a historical legacy: the year before AD 1 is 1 BC, with no year zero in between, and the calendar has been revised over thousands of years. In short, just follow convention.)

But suppose we build an index table for all of human history. It spans five thousand years (more if prehistoric peoples are counted), yet not every year has an event (or rather, an event we consider worth noting), and for some events the exact year is unknown. A one-to-one table is clearly unreasonable here, since most of the space would sit idle.

So the function actually used is a compressing one: its output is confined to a limited range, typically from 0 up to the table length. As everyday experience tells us, compression squeezes some elements together whether they like it or not. A reasonable way of handling such "collisions" is therefore also required. The function together with the collision-handling method makes up hashing as a search method.
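As a minimal sketch of "function plus collision handling", the year example can be written with the division method and separate chaining. These are just two textbook choices picked for illustration, not the only options, and YearTable with its members is a hypothetical class. For the 1901-2000 table no hashing would be needed at all, since year - 1901 already indexes an array of size 100 directly.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// A small chained hash table mapping a year to an event description.
class YearTable {
public:
    explicit YearTable(std::size_t size) : buckets_(size) { assert(size > 0); }

    // The compressing function: every year lands in 0 .. size-1.
    std::size_t slot(int year) const {
        int n = static_cast<int>(buckets_.size());
        return static_cast<std::size_t>(((year % n) + n) % n);  // handles negative (BC) years
    }

    // Stores the event, replacing any event already recorded for that year.
    void insert(int year, const std::string& event) {
        for (auto& entry : buckets_[slot(year)])
            if (entry.first == year) { entry.second = event; return; }
        buckets_[slot(year)].emplace_back(year, event);
    }

    // Returns a pointer to the stored event, or nullptr if the year is absent.
    const std::string* find(int year) const {
        for (const auto& entry : buckets_[slot(year)])
            if (entry.first == year) return &entry.second;
        return nullptr;
    }

private:
    // Years that collide in the same slot share one chain.
    std::vector<std::list<std::pair<int, std::string>>> buckets_;
};
```

With a table of size 7, the years 1911 and 1918 fall into the same slot; the chain keeps both, and find still tells them apart by comparing the stored year.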

For specific hash functions and collision-resolution methods, please consult a textbook; I will not explain them here.

Key tree (trie) and radix sort

This is really built on the idea of hashing. It is easy to spot the shadow of linked radix sort in it, and the trie is more like a static form of radix sort. The trie is also a good way to organize data in memory, not only on external storage. Even a thick book gives it just a page or two, so I give it only this one paragraph, ^_^. Please read a textbook; general readers can skip it.

When reposting, please cite the original address: https://www.9cbs.com/read-22955.html
