A Brief History of Data Compression Technology

xiaoxiao · 2021-03-06

Computer data compression is really rather like the slimming regimens of young ladies, and it serves two major purposes. First, it saves space: sticking with the slimming metaphor, if eight of them can squeeze into a single taxi, think how much fare is saved! Second, it reduces the bandwidth that data occupies: we would all love to watch a DVD-quality movie online over a connection of less than 100 kbps, which is rather like a dieting young lady hoping to fit into a dress several sizes too small. The former requires breakthrough progress in data compression technology; the latter relies on the young lady's sheer perseverance.

Simply put, without data compression technology we could not use WinRAR to shrink the attachments in our email; without data compression technology, the digital voice recorders on the market could hold no more than twenty minutes of speech; without data compression technology, downloading a movie from the Internet might take half a year... So how did all this come about? How did data compression technology develop?

A Chance Encounter with Probability

Scholars familiar with Chinese history know that abbreviations such as "Ban-Ma" refer to the historians Ban Gu and Sima Qian. This fondness for brevity has carried straight into today's Internet age: when we type "7456" on a BBS to mean "this is driving me mad", or write "B4" for "before", we should at least be aware that this is, in fact, the simplest form of data compression.

Data compression in the strict sense originates from people's understanding of probability. When we encode textual information, if shorter codes are assigned to symbols that appear with higher probability and longer codes to those with lower probability, the total length of the encoding can be shortened. Long before computers appeared, the famous Morse code had already put this principle into practice. In the Morse code table, each letter corresponds to a unique combination of dots and dashes: the letter E, which occurs with the highest probability, is encoded as a single dot ".", while the low-probability letter Z is encoded as "--..". Obviously, this effectively shortens the total length of the transmitted message.

C. E. Shannon, the father of information theory, was the first to use mathematical language to make clear the relationship between probability and information redundancy. In his 1948 paper "A Mathematical Theory of Communication", Shannon pointed out that all information contains redundancy, and that the amount of redundancy depends on the probability, or uncertainty, of each symbol (digit, letter or word) appearing in the information. Borrowing a concept from thermodynamics, Shannon called the average amount of information left after redundancy is removed "information entropy", and gave a mathematical expression for calculating it. This great paper was later hailed as the founding work of information theory, and information entropy laid the theoretical foundation for all data compression algorithms. In essence, the purpose of data compression is to remove the redundancy in information, and information entropy and its related theorems describe that degree of redundancy precisely, in mathematical terms. Using the entropy formula, people can calculate the limit of information coding: under a given probability model, the length of a lossless code can never be shorter than the value given by the entropy formula.
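To make the entropy bound concrete, here is a minimal Python sketch (my own illustration, not part of the original article) that estimates the order-0 entropy of a short string; the example input "abracadabra" is arbitrary. No lossless code that treats symbols independently can average fewer bits per symbol than this value.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: str) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)) over the observed symbol frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

text = "abracadabra"          # arbitrary example input
h = entropy_bits_per_symbol(text)
print(f"{h:.3f} bits/symbol -> at least {h * len(text):.1f} bits for lossless coding")
```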

With a complete theory in hand, the next task was to find concrete algorithms and push their output as close as possible to the limit set by information entropy. Of course, most engineers know that turning theory into practice is no easy matter, just as building an atomic bomb from nothing more than the formula E = mc² is anything but straightforward.

Designing a concrete compression algorithm is usually more like playing a mathematical game. Developers first look for ways to estimate or measure, as precisely as possible, the probabilities of the symbols in the information, and then design a set of coding rules that describe each symbol with the shortest possible code. Statistical knowledge has proved quite effective for the first task: people have successively devised probability models such as the static model, the semi-static model, the adaptive model, the Markov model and the prediction-by-partial-matching model. By comparison, the development of coding methods has been more tortuous. In 1948, alongside his information entropy theory, Shannon also gave a simple coding method, now known as Shannon coding. In 1952, R. M. Fano went a step further and proposed Fano coding. These early methods revealed the basic laws of variable-length coding and could achieve a certain degree of compression, but they were still a long way from a truly practical compression algorithm.

The first truly practical coding method was proposed by D. A. Huffman in his 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". To this day, many "data structures" textbooks still mention the method, which later generations call Huffman coding, when discussing binary trees. Huffman coding is so famous in the computer world that the story of its creation has itself become a popular topic. It is said that in 1952 the young Huffman was a student at the Massachusetts Institute of Technology; to prove to his professor that he could skip the final examination, he designed this coding method, simple yet far-reaching in its influence.
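The greedy construction behind Huffman coding (repeatedly merge the two least frequent subtrees) fits in a few lines. Below is a minimal, illustrative Python sketch, assuming a plain string as input; the helper name and example text are mine, not Huffman's.

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict:
    """Build a prefix code by repeatedly merging the two least frequent subtrees."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                              # degenerate input with one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tick = len(heap)                                # tie-breaker so dicts are never compared
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tick, merged])
        tick += 1
    return heap[0][2]

print(huffman_codes("abracadabra"))   # frequent 'a' gets a short code, rare 'c' and 'd' get long ones
```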

Huffman coding is efficient, fast and flexible to implement, and it has been widely used in data compression since the 1960s. For example, compact, a compression program on early UNIX systems that few people remember today, was in fact an implementation of order-0 adaptive Huffman coding. In the early 1980s, Huffman coding appeared on CP/M and DOS systems, its representative program being SQ. Today, Huffman coding can still be found in many well-known compression tools and algorithms (such as WinRAR, gzip and JPEG). However, the code length produced by Huffman coding is only an approximation of the value computed from information entropy and cannot truly reach that limit. Precisely for this reason, modern compression techniques usually treat Huffman coding only as a final coding stage, not as the whole of a data compression algorithm.

Scientists never gave up the ideal of challenging the limit of information entropy. Around 1968, P. Elias developed the Shannon and Fano methods further and constructed, from a mathematical standpoint, what is now called Shannon-Fano-Elias coding. Following this line of thought, J. Rissanen proposed in 1976 a coding method that can successfully approach the information entropy limit: arithmetic coding. In 1982, Rissanen and G. G. Langdon improved arithmetic coding. Thereafter, people combined arithmetic coding with the prediction by partial matching (PPM) model proposed by J. G. Cleary and I. H. Witten in 1984, and developed algorithms whose compression is almost perfect. Today, the general-purpose compression algorithms known as PPMC, PPMd or PPMZ, famous for their compression ratios, all follow this line of thought.
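The interval-narrowing idea behind arithmetic coding can be shown in a toy sketch. The version below (encoder only, my own illustration) uses floating point purely for readability and assumes a fixed symbol model; real coders work with scaled integers, renormalization and adaptive models, which is exactly where the speed problems discussed next come from.

```python
def arithmetic_encode(symbols, probs):
    """Toy arithmetic coder: narrow the interval [low, high) by each symbol's probability slice."""
    ranges, cum = {}, 0.0
    for sym, p in probs.items():          # cumulative ranges, e.g. 'a' -> [0.0, 0.6)
        ranges[sym] = (cum, cum + p)
        cum += p
    low, high = 0.0, 1.0
    for sym in symbols:
        span = high - low
        lo_frac, hi_frac = ranges[sym]
        low, high = low + span * lo_frac, low + span * hi_frac
    return (low + high) / 2               # any number in the final interval identifies the message

model = {"a": 0.6, "b": 0.3, "c": 0.1}    # assumed static probabilities
print(arithmetic_encode("aabac", model))  # likely symbols shrink the interval less, so fewer bits are needed
```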

For lossless compression, the PPM model combined with arithmetic coding can already approach the limit of information entropy to the greatest possible extent. It would seem that the story of compression technology could end here. Unfortunately, things are seldom as simple as we imagine: although arithmetic coding achieves the shortest code length, its inherent complexity makes any concrete implementation as slow as a snail. Even in an era when, thanks to Moore's Law, CPU speeds rise by the day, arithmetic coding programs have struggled to run fast enough for everyday applications. Frankly, were it not for the two Jewish scholars introduced below, we might still not know when convenient, practical compression tools like WinZip would have arrived.

Legends from Another Tribe

Reverse thinking has always been a magic weapon in science and engineering. Just as most people were racking their brains over how to improve Huffman or arithmetic coding in pursuit of a "perfect" encoding, two clever Jewish scholars, J. Ziv and A. Lempel, stepped entirely outside the design framework of Huffman and arithmetic coding and created a series of compression algorithms that compress better than Huffman coding and run faster than arithmetic coding. We usually refer to them, by the initials of the two inventors' names, as the LZ family of algorithms.

In chronological order, the LZ family developed roughly as follows. In 1977, Ziv and Lempel published the paper "A Universal Algorithm for Sequential Data Compression"; the algorithm it describes is now called LZ77. In 1978, the two published a sequel, "Compression of Individual Sequences via Variable-Rate Coding", describing the algorithm later known as LZ78. In 1984, T. A. Welch published a paper entitled "A Technique for High-Performance Data Compression", describing his work at the Sperry Research Center (which was later merged into Unisys); it is a variant of LZ78 and became the very famous LZW algorithm. After 1990, T. C. Bell and others proposed further variants and improved versions of the LZ family.

To be honest, there is nothing novel about the idea behind the LZ family: it has neither a deep theoretical background nor complicated mathematical formulas. It simply carries on humanity's age-old fondness for dictionaries and applies dictionary techniques to general-purpose data compression in an extremely clever way. To put it plainly, once you think of replacing every word in a text with its page and line number in a dictionary, you have in fact grasped the essence of the LZ family. Although this dictionary-based model looks, on the surface, quite different from the statistical approach of Shannon and Huffman, its effect is the same: it too can approach the limit of information entropy. Moreover, it can be proved theoretically that the LZ family still conforms, in essence, to the basic law of information entropy.
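LZW, the Welch variant mentioned above, captures this dictionary idea almost literally: it grows a dictionary of strings it has already seen and outputs dictionary indices instead of the strings themselves. A minimal sketch of the encoder (illustrative only; real implementations add code-width management and dictionary resets):

```python
def lzw_compress(data: bytes) -> list:
    """Classic LZW: learn a dictionary of previously seen strings and emit their indices."""
    dictionary = {bytes([i]): i for i in range(256)}   # start with all single bytes
    next_code, prefix, output = 256, b"", []
    for byte in data:
        candidate = prefix + bytes([byte])
        if candidate in dictionary:
            prefix = candidate                 # keep extending the current match
        else:
            output.append(dictionary[prefix])  # emit the longest known prefix
            dictionary[candidate] = next_code  # learn the new string
            next_code += 1
            prefix = bytes([byte])
    if prefix:
        output.append(dictionary[prefix])
    return output

sample = b"TOBEORNOTTOBEORTOBEORNOT"            # the usual textbook example
print(len(lzw_compress(sample)), "codes for", len(sample), "bytes")
```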

The superiority of the LZ family was soon felt in the field of data compression, and the number of tools based on it exploded. The compress program, built on the LZW algorithm, appeared first on UNIX systems and quickly became the de facto compression standard of the UNIX world. It was followed by the ARC program in the MS-DOS environment, along with imitations such as PKWare's PKARC. The famous compression tools LHarc and ARJ that followed in the late 1980s and early 1990s were outstanding representatives of the LZ77 algorithm. Today, LZ77, LZ78, LZW and their many variants all but monopolize general-purpose data compression: familiar tools such as PKZIP, WinZip, WinRAR and gzip, and file formats such as ZIP, GIF and PNG, are all beneficiaries of the LZ family; even encrypted file formats such as PGP chose an LZ algorithm as their standard for data compression.

No one can deny the contribution these two Jewish scholars made to data compression technology. What I want to stress here is that, in engineering, a one-sided pursuit of theoretical perfection often yields little return; if, like Ziv and Lempel, you can change your way of thinking and look at a problem from a different angle, you too may invent a new algorithm and earn your own place in the history of technology.

Sound and Picture in Fashion

The LZ family essentially solved the problem of balancing speed and compression ratio in general-purpose data compression. But there is another vast territory in the field of data compression still waiting to be explored. Shannon's information theory tells us that the more we know about the nature of a piece of information, the smaller we can compress it. In other words, if a compression algorithm is designed not for arbitrary data sources but for a special kind of data whose basic properties are known, its compression can be improved further still. This reminds us that, while developing general-purpose algorithms, we must also carefully study dedicated algorithms for various kinds of special data. For instance, in today's digital life, the images, audio and video produced by digital cameras, digital voice recorders, digital music players and digital camcorders all have to be stored on hard disks or transferred over a USB cable. In fact, the compression of multimedia information has always been an important topic in data compression, and each of its branches may dominate a future technology trend and bring boundless business opportunities to developers of digital products, communication equipment and application software.

Let us begin with the compression of image data. Images are typically divided into bi-level (binary) images, grayscale images, color images and other types, and the compression methods for each type are not the same.

The invention and wide use of facsimile technology spurred the rapid development of bi-level image compression algorithms. The CCITT (the International Telegraph and Telephone Consultative Committee, an organization under the International Telecommunication Union, ITU) established a series of image compression standards for fax applications, dedicated to the compression and transmission of bi-level images. These include CCITT Group 1 and Group 2 in the late 1970s, CCITT Group 3 in 1980 and CCITT Group 4 in 1984. To suit different kinds of fax images, these standards use one-dimensional MH coding and two-dimensional MR coding, built on techniques such as run-length encoding (RLE) and Huffman coding. Today, when we send or receive a fax at the office or at home, most of the time we are using the CCITT Group 3 standard, while some fax devices built for digital networks, and some TIFF files storing bi-level images, use CCITT Group 4. In 1993, the Joint Bi-level Image Experts Group (JBIG) formed by the CCITT and the ISO (International Organization for Standardization) further developed bi-level image compression into the more general JBIG standard.

In fact, many general-purpose compression algorithms, including the LZ family, also perform well on bi-level images. For example, the GIF image file format, born in 1987, uses the LZW compression algorithm, while the PNG format, which appeared in 1995 and is more polished than GIF, chose zlib, based on a variant of LZ77, to compress its image data. In addition, using the Huffman coding, arithmetic coding and PPM models mentioned earlier, people have implemented many other effective image compression algorithms.
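The run-length coding at the heart of the fax standards above exploits the fact that a scan line of a bi-level image consists of long runs of white and black pixels. A minimal sketch (my own toy example; the real MH/MR schemes go on to replace the run lengths with fixed Huffman codewords):

```python
def run_lengths(row):
    """1-D run-length encoding of a bi-level scan line as (pixel value, run length) pairs."""
    runs = []
    current, count = row[0], 1
    for pixel in row[1:]:
        if pixel == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = pixel, 1
    runs.append((current, count))
    return runs

scan_line = [0] * 20 + [1] * 3 + [0] * 40 + [1] * 2   # hypothetical scan line
print(run_lengths(scan_line))                          # 65 pixels collapse into 4 pairs
```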

For the more common grayscale or color images whose pixel values vary continuously in space (such as digital photographs), however, the advantage of general-purpose compression algorithms is far less obvious. Fortunately, scientists discovered that if we are allowed to alter some unimportant pixel values when compressing such data, that is, to accept a certain loss of precision (when compressing ordinary data we can never tolerate any loss of precision, but when a compressed digital photo is displayed, if the color of a few leaves in a forest comes out slightly darker, viewers usually do not notice), we may achieve breakthrough progress in compression ratio. This idea was revolutionary for the field of data compression: by giving up some precision within a range the user can tolerate, we can compress an image (and likewise audio and video) to a tenth, a hundredth or even a thousandth of its original size, far beyond the capability limit of general-purpose algorithms. Perhaps this is the same wisdom as the saying we often hear in life: take a step back, and you will find the sea and sky wide open.

Compression that permits such a loss of precision is called lossy compression. In image compression, the famous JPEG standard is the classic among lossy compression algorithms. The JPEG standard was formulated by the Joint Photographic Experts Group (JPEG); work began in 1986, and it became an international standard in 1994. JPEG uses the discrete cosine transform (DCT) as its core algorithm and controls the precision and size of the image by adjusting a quality factor. For photographs and other grayscale or color images whose tones change continuously, JPEG can generally compress the image to between one tenth and one twentieth of its original size while preserving image quality. If image quality is not a concern, JPEG can compress the image almost without limit.
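To make the DCT-plus-quantization idea concrete, here is a small NumPy sketch (my own illustration, not JPEG's actual code path): an 8x8 tile is level-shifted, transformed, and coarsely quantized. The flat divisor 16 is an assumed stand-in for JPEG's quality-scaled quantization tables; it is the quantization step, not the transform itself, that discards precision.

```python
import numpy as np

def dct2(block: np.ndarray) -> np.ndarray:
    """Naive orthonormal 2-D DCT-II of a square block: C @ X @ C.T."""
    n = block.shape[0]
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c @ block @ c.T

tile = np.arange(64, dtype=float).reshape(8, 8)   # hypothetical smooth 8x8 tile
coeffs = dct2(tile - 128)                         # level shift, then transform
quantized = np.round(coeffs / 16)                 # crude uniform quantizer (assumed step size)
print(int(np.count_nonzero(quantized)), "of 64 coefficients survive quantization")
```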

The most recent development of the JPEG standard was launched in 1996 and officially became the international standard JPEG 2000 in 2001. Compared with JPEG, JPEG 2000 makes major improvements, the most important being the replacement of JPEG's discrete cosine transform with the discrete wavelet transform (DWT). At the same file size, an image compressed with JPEG 2000 has higher quality and smaller loss of precision than one compressed with JPEG. As a new standard, JPEG 2000 is not yet widely deployed, but many companies, including digital camera manufacturers, are optimistic about its prospects, and the day it shines in the field of image compression should not be far off.

The JPEG standard's idea of trading precision for compression directly influenced the compression of video data. In 1988, the CCITT produced draft recommendation H.261 for videophone and videoconferencing. The basic approach of H.261 is to compress each frame of the video stream with an algorithm similar to JPEG, while using motion-compensated inter-frame prediction to remove redundancy along the time dimension of the stream. On this basis, ISO adopted the MPEG-1 standard, proposed by the Moving Picture Experts Group (MPEG), in 1993. MPEG-1 encodes ordinary-quality video effectively; most of the VCD discs we see today use MPEG-1 to compress their video data.
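A toy sketch of the motion-compensation idea behind H.261 and MPEG (my own illustration, not any standard's actual search): for each block of the current frame, the encoder looks for a nearby block in the previous frame that predicts it well, then codes only the motion vector and the small prediction residual instead of the raw pixels. The frame sizes, block size and search range below are arbitrary choices.

```python
import numpy as np

def best_motion_vector(prev, curr, top, left, size=8, search=4):
    """Exhaustive block matching: offset into the previous frame with minimum sum of
    absolute differences (SAD) against the current block."""
    block = curr[top:top + size, left:left + size].astype(int)
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue
            sad = int(np.abs(prev[y:y + size, x:x + size].astype(int) - block).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, (64, 64), dtype=np.uint8)
curr_frame = np.roll(prev_frame, shift=2, axis=1)          # fake a 2-pixel horizontal pan
print(best_motion_vector(prev_frame, curr_frame, 16, 16))  # ((0, -2), 0): perfect prediction
```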

To support clearer video images, especially high-end applications such as digital television, ISO put forward the new MPEG-2 standard (equivalent to ITU-T H.262) in 1994. MPEG-2 grades image quality into levels and can accommodate video applications of different quality, such as ordinary television programs, videoconferencing and high-definition digital TV. The DVDs in our daily lives that deliver a high-definition picture use the MPEG-2 standard to compress their video data.

The development of the Internet placed still higher demands on video compression. Spurred by new requirements such as content-based interaction, object editing and random access, ISO adopted the MPEG-4 standard (corresponding to ITU-T's H.263 and H.263+ standards) in 1999. MPEG-4 offers a higher compression ratio and supports advanced features such as the encoding of concurrent data streams, content-based interaction, enhanced temporal random access, error resilience and content-based scalability. The DivX and XviD file formats that have sprung up on the Internet use MPEG-4 to squeeze video data into far less storage space and communication bandwidth, turning the dream of publishing and downloading digital movies over the Internet into reality.

Just as video compression grew up with the television industry, audio compression technology was first developed by engineers in radio broadcasting, voice communication and related fields, and speech coding and compression has been the most actively studied area. Since H. Dudley invented the vocoder in 1939, people have developed speech analysis and processing techniques such as pulse code modulation (PCM), linear predictive coding (LPC), vector quantization (VQ), adaptive transform coding (ATC) and sub-band coding (SBC). These techniques capture the characteristics of speech while reducing redundant information (a toy sketch of the prediction idea appears below). Like JPEG in image compression, most speech coding techniques accept a certain loss of precision in exchange for higher coding efficiency. Moreover, to store or transmit speech signals more efficiently as binary data, these techniques usually apply general-purpose methods such as Huffman coding and arithmetic coding after the speech signal has been digitized, to squeeze out the remaining redundancy in the data stream.

For the ordinary audio stored in computers and digital appliances (such as digital voice recorders and digital music players), the compression methods we use most are mainly the audio compression standards of the MPEG family. For example, the MPEG-1 standard provides three optional audio compression layers, Layer I, Layer II and Layer III; MPEG-2 further introduced AAC (Advanced Audio Coding); and the audio part of the MPEG-4 standard supports applications as different as synthetic sound coding and natural sound coding. Among these many audio standards, the most famous is surely MPEG-1 Layer III, better known as MP3. From MP3 players to MP3 phones, from the MP3 files on our hard disks to the endless stream of MP3 downloads on the Internet, MP3 has long since outgrown the realm of data compression technology and become an icon of popular culture.
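The "prediction removes redundancy" idea behind LPC-style coders, mentioned above, can be shown with a toy first-order sketch (my own illustration, not any actual speech codec): each sample is predicted from the previous one, and only the residual is kept. For smooth signals such as speech, the residuals are much smaller than the samples and can therefore be coded with fewer bits.

```python
import numpy as np

t = np.arange(200)
signal = np.round(1000 * np.sin(2 * np.pi * t / 40)).astype(int)     # hypothetical smooth waveform
residual = np.diff(signal, prepend=signal[0])                        # first-order prediction error
print(int(np.abs(signal).max()), "->", int(np.abs(residual).max()))  # peak drops from 1000 to ~157
```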

Clearly, in a digital era centered on multimedia information, data compression technology, and in particular dedicated compression techniques for image, audio and video data, still has plenty to do and considerable room to grow; after all, people's appetite for more information of higher quality is endless.

Back to the Future

From information entropy to arithmetic coding, from the two Jewish scholars to WinRAR, from JPEG to MP3, the development of data compression technology reads like a scroll filled with the words "innovation", "challenge", "breakthrough" and "change". Perhaps the point of reciting all these dates, names, standards and papers here is simply to remind everyone that the achievements of our predecessors are merely targets for later generations to surpass. Who knows how many more Shannons and Huffmans will appear in the years to come?

Speaking of the future, we may as well add a few remarks on the trends in the development of data compression technology.

In 1994, M. Burrows and D. J. Wheeler jointly proposed a new general-purpose data compression algorithm. Its core idea is to sort the matrix of characters obtained by rotating the input string and then transform it; the transform is known as the Burrows-Wheeler Transform, or BWT (a short code sketch of the transform appears below). The approach of Burrows and Wheeler differs markedly from the design ideas of all earlier general-purpose compression algorithms, including those of Ziv and Lempel. Today the BWT algorithm has achieved great success in the open-source compression tool bzip; bzip compresses text files far better than tools based on the LZ family. This shows, at the very least, that even in general-purpose data compression, new breakthroughs can still be found as long as we keep innovating in both concepts and techniques.

Fractal compression has been a hot topic in image compression in recent years. The technique has its origins in the fractal geometry founded by B. Mandelbrot in 1977. M. Barnsley laid the theoretical groundwork for fractal image compression in the late 1980s, and since the 1990s A. Jacquin and others have proposed many experimental fractal compression algorithms. Today, many people believe fractal compression is the most promising technical direction in image compression, while just as many dismiss it. Whatever its prospects, the research on fractal compression suggests that, after decades of rapid development, we may need a brand-new theory, or several more effective mathematical models, to support and propel the next leap in data compression technology.
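Returning to the BWT mentioned above, the transform itself is surprisingly small; a naive sketch (my own illustration, with an added sentinel character) is shown below. Production tools like bzip replace the O(n² log n) rotation sort with fast suffix sorting and follow the transform with move-to-front, run-length and Huffman coding stages.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler transform: sort all rotations of the string, output the last column."""
    s = text + sentinel                                  # unique end marker pins down the original row
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

print(repr(bwt("banana")))   # 'annb\x00aa': equal characters cluster, which later stages exploit
```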

Artificial intelligence is another keyword that may deeply affect the future of data compression. Since Shannon held that whether, and by how much, information can be compressed is directly tied to its uncertainty, then supposing artificial intelligence one day matures to the point where a computer can, like a human being, infer what follows from only a small amount of context, compressing information to a thousandth of its original size, or even less, would no longer be a fantasy.

Having reviewed history, people always like to gaze into the future. But the future is, after all, the future: if the course of technology could be laid out simply by talking about it, where would the flavor of innovation be? As I see it, what the future holds is not so important. What matters is to hurry up and download a few blockbusters from the Internet, then sink into the sofa and savor the endless pleasure that data compression brings us.

Please credit the original source when reposting: https://www.9cbs.com/read-94880.html
