Huffman Code Introduction

zhaozj2021-02-11 219

In the book of the general data structure, the chapter of the tree will generally introduce Huffman trees and Hafman coding. Hafman coding is an application of Hawman's tree. Huffmann coding is widely used, such as JPEG is applied in JPEG.

First introduce what is Hawman tree. The Hawman Tree is also known as the optimal binary tree, is a binary tree with the shortest band length. The ribbon length of the so-called tree is the weight of all the leaf nodes in the tree multiplied to the path length to the root node (if the root point is 0 layers, the path length of the leaf junction to the root point is the leaf. Number of nodes). The ribbon length of the tree is recorded as WPL = (W1 * L1 W2 * L2 W3 * L3 ... WN * LN), n weights Wi (i = 1, 2, ... n) constitute one A binary tree with N leaves, the path length of the corresponding leaf node is Li (i = 1, 2, ... n). It can be proved that the WPL of the Havman tree is the smallest.

Hafman proposed this coding in the early 1950s, constructs the shortest code of the average length according to the probability of characters. It is a growing code. In the encoding, if the length of each codeword is strictly arranged in a reverse order of the probability of the probability of the symbol of the codeword, the average length of the encoded is minimal. (Note: The codeword is the code obtained after the symbol is encoded by Hamfman, and its length is different from the probability of symbols, so haffman code is a growing code.)

However, how to construct a Hawwman tree? The most generalized constructor is the Havman algorithm. The general data structure can be found in the book:

First, the initial set f = {T1, T2, T3, ..., Ti, a given N weight {W1, W2, W3, ..., Wi, ..., Wn} constitute N , ..., TN}, where only one weight value of Wi is only a root node of Wi, and its left and right bonies are empty. (To facilitate the implementation of algorithms on your computer, it is generally required to be arranged in ascending order of the weight Wi of Ti.)

Second, the smallest tree of two root nodes is selected as the left and right subtree of the newly constructed binary tree, and the weight of the root nodes of the new binary tree is the sum of the roots of the left and right subtails.

Third, the two trees are removed from f and the new binary tree is also arranged in ascending order.

Fourth, repeat two and three steps until there is only one binary tree in the collection f.

Implement the above algorithm with a C language that can be used with static binary or dynamic binary trees. If you use a dynamic binary tree, the following data structure is available:

Struct Tree {

Float weight; / * Weight * /

Union {

Char Leaf; / * Ye Junction Information Character * /

Struct Tree * Left; / * The left knot * /

}

Struct Tree * Right; / * Tree right knot * /

}

Struct Forest {/ * F set, indicated in the form of a list * /

Struct Tree * Ti; / * f Tree * /

Struct Forest * next; / * Next Node * /

}

Example: If the probability of the letters A, B, Z, and C is: 0.71, 0.54, 0.28, 0.43; the corresponding weight is: 75, 54, 28, 43.

(Figure 1)

After constructing the Hafman tree, you can coding according to the Hawman tree. For example, the above character consists of a Hafman tree based on the probability that it appears as the weight, the corresponding code value obtained by Hamfman encoding. As long as you use the same Hafman tree, you can restore the code to the original group of characters. Obviously Hafman coding is a prefix code, that is, any character's encoding is not a prefix of another character's encoding, otherwise, the encoding cannot be translated. For example: A, B, C, D encoding is: 0, 10, 101, 11, can be translated into BB or CA for the encoded string: 1010, since B encoding is the encoded prefix of C. The rule that just conducts Hafman coding is the path from the root node to the leaf node (including the original information), and the left child is coded to 0, and the right child is coded to 1, of course, you can also reverse it. This encoding method is a static Hafman code that scans twice on data that needs to be encoded: the frequency of each character in the original data is used, and the resulting frequency value creates a Hafman tree, and must The information of the tree is stored, that is, the frequency value of the character 0-255 (2 ^ 8 = 256) is stored in the length of 2-4BYTES, and the frequency value is stored with the length of 4bytes, the frequency value is represented by 0-- 2 ^ 32-1, this is enough to indicate the frequency of characters in large files) to decompress the same Hafman tree for decompression; second, according to the first pass scanned Hufman tree, And store the code obtained after the encoded.

The static Hafman coding method has some shortcomings: First, the meaning of the excessive file is not large, because the light of the 4BYTES is stored in the storage of Hawman's trees; second, Wakman Encoding, when you store encoded information, if you use a communication network, a large delay is caused; three, frequent disk read / write access reduces the speed of data encoding when encoding larger files.

Therefore, some people have proposed a dynamic Hafman coding method. Dynamic Shuman encoding uses a dynamic changed Hafman tree, encoding the code T 1 character is made according to the first T characters before the original T-character, the encoding and decoding use the same The initial haffmann tree, each of the other characters, encoding, and decoding using the same method to modify the Hafman tree, so there is no need to save the information of the Hawman tree for decoding. The time required to encode and decode a character is proportional to the coding length of the character, so dynamic Hafman coding can be performed in real time. Dynamic Avocal Coding is more complicated than static Hafman, interested readers can refer to books on data structures and algorithms.

The JPEG mentioned earlier is used in JPEG. It is not to say that JPEG can only be encoded with Hafman, but a picture after multiple steps gets a column of values, to do these values MAN code for storage or transmission. The Hafman coding method is more readily understood, and everyone can write Havman encoding and decoding procedures based on its encoding method.

转载请注明原文地址:https://www.9cbs.com/read-5991.html

9cbs

New Post(0)