[Repost] MPEG Audio Layer-3 (MP3) audio compression

xiaoxiao2021-03-06  38

MPEG Audio Layer-3 audio compression

Introduction

MP3 is now a very popular compression technology. It compresses high-fidelity digital audio at a ratio of about 12:1, so the songs of more than a dozen CDs fit on a single MP3 disc, and playback still sounds like the original CD. MP3 is played not only on computers: many domestic Super VCD manufacturers have also launched players that can play MP3 discs. The text below draws on several foreign-language sources to give a brief introduction to MP3.

Table of contents

MP3 history

Sound quality

Basics of perceptual audio coding

Frequently asked questions about MP3

MP3 history

In 1987, IIS began research on perceptual audio coding for Digital Audio Broadcasting (DAB) within the framework of the EUREKA project EU147. In cooperation with the University of Erlangen (Professor Seitzer), IIS eventually devised a very powerful compression algorithm, which was later standardized as ISO-MPEG Audio Layer 3 (MP3) (IS 11172-3 and IS 13818-3). Without data reduction, a digital audio signal typically consists of 16-bit samples recorded at a sampling rate of more than twice the actual audio bandwidth (for example 44.1 kHz for CDs). You therefore need about 1.4 Mbit to represent just one second of stereo music in CD quality. With MPEG audio coding, the original sound data from a CD can be reduced by a factor of 12 without affecting sound quality. Factors of 24 or even more still maintain a sound quality that is better than what you would get by simply lowering the sampling rate and the resolution of the samples. Basically, this is achieved by perceptual coding techniques that address how the human ear perceives sound waves. With MPEG audio you get strong data reduction while keeping CD sound quality:

1:4 with Layer 1 (corresponds to 384 kbps for a stereo signal)
1:6 ... 1:8 with Layer 2 (corresponds to 256..192 kbps for a stereo signal)
1:10 ... 1:12 with Layer 3 (corresponds to 128..112 kbps for a stereo signal)

By exploiting stereo effects and limiting the audio bandwidth, the coding schemes can reach an acceptable sound quality at even lower bit rates. MPEG Layer 3 is the most powerful member of the MPEG audio coding family. For a given sound quality level it requires the lowest bit rate, and for a given bit rate it achieves the highest sound quality.

Sound quality

Some typical performance figures for MPEG Layer 3:

Sound quality            Bandwidth  Mode    Bit rate        Compression ratio
Telephone sound          2.5 kHz    mono    8 kbps *        96:1
Better than short wave   4.5 kHz    mono    16 kbps         48:1
Better than AM radio     7.5 kHz    mono    32 kbps         24:1
Similar to FM radio      11 kHz     stereo  56..64 kbps     26..24:1
Near CD                  15 kHz     stereo  96 kbps         16:1
CD                       >15 kHz    stereo  112..128 kbps   14..12:1

* A non-ISO extension of MPEG Layer 3 is used here for better performance (MPEG 2.5)

In all international listening tests, MPEG Layer 3 convincingly proved its superior performance, maintaining the original sound quality at a compression ratio of 1:12 (about 64 kbps per channel). If the application can tolerate a bandwidth limit of about 10 kHz, a reasonable sound quality for stereo signals can be obtained even at a compression ratio of 1:24. For low bit rate audio coding in digital audio broadcasting applications, ITU-R recommends (ITU-R doc. BS.1115) MPEG Layer 3 at a bit rate of 60 kbps per channel.

Basics of perceptual audio coding

Preface

This part defines the concepts of audio compression, audio encoding, and audio decoding, and gives a brief introduction to audio coding.

Audio compression

Without audio compression, high-quality digital audio data takes a lot of disk space to store and a lot of channel bandwidth to transmit. Take a small example. You want to store a one-minute song you like on your hard drive. You want CD quality, so you use a 44.1 kHz sampling frequency, stereo, and a quantization precision of 16 bits per sample. 44.1 kHz means that 44,100 values per second come from your sound card or sound file. Because the signal is two-channel stereo you multiply by 2, and because 16 bits are two bytes you multiply by 2 again. The song therefore takes 44,100 samples/second * 2 channels * 2 bytes/sample * 60 seconds/minute = 10,584,000 bytes, about 10 MB of storage space. If you want to download it from the Internet over a 28.8 kbps modem, it takes 10,584,000 bytes * 8 bits/byte / (28,800 bits/second * 60 seconds/minute) = approximately 49 minutes, just to download one minute of stereo music.

Digital audio coding (which here means the same as digital audio compression) is the technique of reducing the storage space or channel bandwidth that audio data requires. Modern audio coding techniques (such as MPEG Layer 3 or MPEG-2 AAC) can reduce the amount of data by a factor of 12 with no loss, or only a small loss, of sound quality, and the remaining distortion cannot (or can hardly) be perceived by the human ear. Such schemes are therefore the key technology for high-quality, low bit rate applications, including soundtracks for CD-ROM games, solid-state sound memories, Internet audio, digital audio broadcasting systems, and the like.
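The arithmetic in this example is easy to check. The following is a minimal sketch in Python; the function names are made up for illustration, and the 12:1 figure is simply the ratio quoted in the text.

```python
def pcm_bytes(sample_rate_hz, channels, bytes_per_sample, seconds):
    """Size of uncompressed PCM audio in bytes."""
    return sample_rate_hz * channels * bytes_per_sample * seconds


def download_minutes(size_bytes, modem_bits_per_second):
    """Transfer time in minutes over a link of the given raw bit rate."""
    return size_bytes * 8 / modem_bits_per_second / 60


# One minute of CD-quality stereo: 44.1 kHz, 2 channels, 16 bits = 2 bytes.
one_minute = pcm_bytes(44_100, 2, 2, 60)

# Download time over a 28.8 kbps modem, uncompressed and at 12:1.
uncompressed_minutes = download_minutes(one_minute, 28_800)
compressed_minutes = download_minutes(one_minute / 12, 28_800)

print(one_minute, uncompressed_minutes, round(compressed_minutes, 2))
```

At a 12:1 ratio the same minute of music needs roughly four minutes of download time instead of 49.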

Two parts of audio compression

Audio compression really consists of two parts. The first part, encoding, converts the sound signal stored in a WAVE file into a highly compressed form called a bitstream (the coded audio data). To play this bitstream on your sound card you need the second part, decoding. The decoder takes the bitstream and reconstructs a WAVE file from it.

How does it work?

High coding efficiency is achieved by removing the redundant parts of the signal and the irrelevant parts that the human auditory system cannot perceive. All encoders use the same basic structure. The coding scheme can be described as "perceptual noise shaping" or "perceptual subband / transform coding". The encoder analyzes the spectral components of the sound signal by computing a filter bank (transform), and uses a psychoacoustic model to estimate the noise level that is just perceptible. In its quantization and coding stage, the encoder tries to allocate the available data bits so that both the bit rate requirement and the masking requirement are met as far as possible. The decoder is much less complex. Its only task is to synthesize the sound signal from the coded spectral components.
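The bit allocation idea can be illustrated with a toy model. The sketch below is not the Layer 3 algorithm (which uses non-uniform quantization, scale factors and entropy coding inside iteration loops). It is a simplified greedy allocator built on two assumptions: each subband has a known signal-to-mask ratio in dB, and each extra bit lowers the quantization noise by about 6.02 dB (the usual rule of thumb for a uniform quantizer). All names and numbers are hypothetical.

```python
def allocate_bits(signal_to_mask_db, total_bits, db_per_bit=6.02, max_bits=15):
    """Greedy toy allocation: always give the next bit to the subband whose
    quantization noise is furthest above its masking threshold."""
    bits = [0] * len(signal_to_mask_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio per band: positive means audible noise.
        nmr = [smr - db_per_bit * b for smr, b in zip(signal_to_mask_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all noise is already below the masking threshold
        bits[worst] += 1
    return bits


# Hypothetical signal-to-mask ratios (dB) for four subbands.
smr = [20.0, 5.0, -3.0, 12.0]
tight_budget = allocate_bits(smr, total_bits=6)
ample_budget = allocate_bits(smr, total_bits=100)
print(tight_budget, ample_budget)
```

With a tight budget the allocator spends every bit where the noise is most audible; with an ample budget it stops as soon as the noise in every band is masked, which is why the band with a negative signal-to-mask ratio never receives any bits.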

Compression ratio, bit rate and quality

There is no clear-cut theory for these questions yet. The sound file obtained after encoding and decoding is no longer identical to the original sound file, because all superfluous information (more precisely, the redundant parts of the signal and the irrelevant parts that cannot be perceived) has been removed. The reconstructed WAVE file differs from the original WAVE file, but the two sound the same. How large the difference is depends on how strongly the signal was compressed. Because the compression ratio is hard to measure in some cases, experts use the term bit rate when they discuss how strongly a sound signal is compressed. The bit rate is the average number of bits used for one second of sound data. Its usual unit is kbps, that is, kilobits per second (1 k = 1000, which is what the CD figure below implies). The digital audio signal on a CD has a bit rate of 1411.2 kbps. Near-CD sound quality is reached at about 96 kbps.

Frequently asked questions about MP3
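The relation between bit rate and compression ratio can be made concrete with a short calculation. This is a minimal sketch; the function names are illustrative, and the ratios are taken relative to the 44.1 kHz, 16-bit stereo CD signal mentioned above.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kbps (1 k = 1000)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(source_kbps, coded_kbps):
    """How many times smaller the coded stream is than the source."""
    return source_kbps / coded_kbps


cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)
ratio_at_96 = compression_ratio(cd_kbps, 96)
ratio_at_128 = compression_ratio(cd_kbps, 128)

print(cd_kbps, round(ratio_at_96, 1), round(ratio_at_128, 1))
```

Relative to CD audio, 96 kbps corresponds to a ratio of about 14.7:1 and 128 kbps to about 11:1.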

Q: OK, MP3 is obviously the key to many applications. What are its limitations? A: MP3 is a perceptual audio coding scheme. It is designed around the human ear and aims to preserve the quality of the original sound. In contrast, a dedicated speech codec is a tool for the voice domain, and it aims to preserve the intelligibility of the speech signal. Advanced speech coding schemes (such as CS-ACELP [LD-CELP], standardized by ITU as G.723.1 [G.728]) achieve usable speech reproduction at bit rates as low as 5.3 kbps, with a codec delay below 40 ms. At such low bit rates they perform better than MP3 on pure speech signals, and their low delay makes them well suited for full-duplex voice communication. Within MPEG-4, a scalable scheme that integrates speech coding and perceptual audio coding is being designed.

Q: Can you tell me more about codec delay? A: The standard gives the theoretical minimum delay:

Layer 1: 19 ms (<50 ms)
Layer 2: 35 ms (100 ms)
Layer 3: 59 ms (150 ms)

Practical values are larger than the theoretical ones. Because they depend on the specific implementation, exact figures cannot be given. The values in parentheses are only rough estimates, and real codecs may show even higher values. In general, only a few specific applications cannot tolerate this delay, such as feedback links in telecommunication. For most other applications the delay is acceptable.

Q: What is MPEG? A: MPEG is the "Moving Picture Experts Group", which works under the joint direction of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The group's work is primarily aimed at coding standards for moving pictures and audio. MPEG has its own homepage, which provides a lot of information about these standards.

Q: Is MPEG-3 the same as MPEG Layer 3? A: No. Layer 3 is a powerful audio coding scheme and is part of the MPEG standards. It is defined in the audio parts of the international standards MPEG-1 and MPEG-2. There is, however, no standard called MPEG-3.

Q: How do I get the MPEG documents? A: You can look them up on the ISO site.

Q: Is there public C source code available? A: Public C source code is available on many sites, such as ftp://ftp.iis.fhg.de/pub/layer3/public C /. This code is only meant to illustrate the principle, so don't expect much performance from it.

Q: When people talk about MPEG audio, I always hear "Layer 1, Layer 2, Layer 3". What do they mean? A: MPEG describes the compression of audio signals with high-performance perceptual coding schemes. It specifies a family of three audio coding schemes, called Layer 1, Layer 2, and Layer 3. From Layer 1 to Layer 3, the complexity of the encoder and its performance (sound quality per bit rate) increase. The three codecs are compatible in a hierarchical way: a Layer N decoder can decode bitstreams encoded in Layer N and in all layers below N.

Q: So we have a family of three audio coding schemes. What exactly does the MPEG standard define? A: For each layer, the standard specifies the bitstream format and the decoder in detail. To leave room for future improvements, it does not specify the encoder, but for each layer an informative chapter gives an example encoder implementation.

Q: What do the three audio layers have in common? A: All layers use the same basic structure. The coding scheme can be described as "perceptual noise shaping" or "perceptual subband / transform coding". The encoder analyzes the spectral components of the sound signal by computing a filter bank (transform), and uses a psychoacoustic model to estimate the noise level that is just perceptible. In its quantization and coding stage, the encoder tries to allocate the available data bits so that both the bit rate requirement and the masking requirement are met as far as possible. The decoder is much less complex. Its only task is to synthesize the sound signal from the coded spectral components. All layers use the same analysis filter bank (polyphase with 32 subbands). Layer 3 adds an MDCT transform to increase the frequency resolution. All layers use the same header information in their bitstream, to support the hierarchical structure of the standard. All layers have a similar sensitivity to bit errors; they support the insertion of ancillary data into their audio bitstream; they use sampling frequencies of 32, 44.1 or 48 kHz; and they work at similar bit rates (Layer 1 from 32 kbps to 448 kbps, Layer 2 from 32 kbps to 384 kbps, Layer 3 from 32 kbps to 320 kbps).

Q: From a global point of view, what are the main differences between the three layers? A: From Layer 1 to Layer 3, complexity increases (mainly in the encoder), the overall codec delay increases, and performance (sound quality per bit rate) increases as well.
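The shared header makes it possible to read the layer, bit rate, and sampling frequency of a frame with a few bit operations. The following sketch handles MPEG-1 headers only, and its tables are restricted to the bit rates and sampling frequencies listed above. The function name and the returned dictionary are illustrative choices, and it ignores the remaining header fields (CRC flag, padding, emphasis and so on).

```python
# Bit rate tables (kbps) for MPEG-1, indexed by the 4-bit bitrate field.
# Index 0 means "free format", represented here as None.
BITRATES_KBPS = {
    1: [None, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    2: [None, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    3: [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
}
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000]
LAYER_FROM_BITS = {0b11: 1, 0b10: 2, 0b01: 3}
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_mpeg1_header(header: bytes) -> dict:
    """Decode the main fields of a 4-byte MPEG-1 audio frame header."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 21 != 0x7FF:
        raise ValueError("frame sync not found")
    if (word >> 19) & 0b11 != 0b11:
        raise ValueError("not an MPEG-1 header")
    layer_bits = (word >> 17) & 0b11
    bitrate_index = (word >> 12) & 0xF
    rate_index = (word >> 10) & 0b11
    if layer_bits not in LAYER_FROM_BITS or bitrate_index == 0xF or rate_index == 0b11:
        raise ValueError("reserved value in header")
    layer = LAYER_FROM_BITS[layer_bits]
    return {
        "layer": layer,
        "bitrate_kbps": BITRATES_KBPS[layer][bitrate_index],
        "sample_rate_hz": SAMPLE_RATES_HZ[rate_index],
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
    }


# A header as commonly found at the start of a 128 kbps, 44.1 kHz MP3 frame.
info = parse_mpeg1_header(bytes.fromhex("FFFB9064"))
print(info)
```

The last entry of each table reproduces the upper bit rate limits quoted for the three layers (448, 384 and 320 kbps).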

Q: For the audio part, what are the main differences between MPEG-1 and MPEG-2? A: MPEG-1 and MPEG-2 use the same family of audio codecs: Layer 1, Layer 2, and Layer 3. The new audio features of MPEG-2 are the "low sampling frequency extension" and the "multichannel extension". The low sampling frequency extension addresses applications with limited bandwidth requirements that need very low bit rates; the new sampling frequencies are 16, 22.05 and 24 kHz, and the bit rates extend down to 8 kbps. The multichannel extension addresses surround sound systems with five main channels (left, right, center, left surround and right surround). Some surround sound systems even add a low frequency enhancement channel to handle low-frequency signals. The multichannel extension caters for such systems and allows up to 7 channels to be included.

Q: Is all of this compatible? A: More or less, yes. The exception is the low sampling frequency extension: obviously, a pure MPEG-1 decoder cannot handle the new sampling frequencies.

Q: What do you mean by compatible? Does that include all the additional channels? Please explain. A: During the definition phase of MPEG-2, compatibility was a major topic. The main idea is to keep the same basic frame format as MPEG-1: the main data field carries two signals, like the left and right channels of MPEG-1, and an additional data field carries the multichannel extension information. Without going into detail, two terms need explaining here. "Forward compatibility": an MPEG-2 decoder can accept an MPEG-1 audio bitstream (which has only one or two channels). "Backward compatibility": an MPEG-1 decoder can at least decode the two channels in the main data field of an MPEG-2 audio stream. The MPEG-2 encoder can matrix the surround information into the left and right channels as follows:

left channel = left signal + a * center signal + b * left surround signal
right channel = right signal + a * center signal + b * right surround signal

In this way an MPEG-1 decoder reproduces a downmix that contains the information of all 5 channels. An MPEG-2 decoder uses the multichannel extension information (3 more audio signals) to reconstruct the 5 surround channels.

Q: In your footnote you point out that, to get good performance at a very low bit rate such as 8 kbps, you use a non-ISO extension called MPEG 2.5. What is that about? A: Oh, yes. With the low sampling frequency extension, the MPEG-2 standard allows bit rates down to 8 kbps. At such a low bit rate the useful audio bandwidth has to be limited anyway, for example to 3 kHz, so the actual sampling frequency can be lowered as well, for example to 8 kHz. The lower the sampling frequency, the better the frequency resolution, the worse the time resolution, and the better the ratio between control information and audio payload in the bitstream format. Since the MPEG-2 standard defines 16 kHz as its lowest sampling frequency, we introduced a further extension that halves the low sampling frequencies of MPEG-2 again, giving 8, 11.025 and 12 kHz. We call this extension MPEG 2.5.
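Returning to the compatibility matrix in the answer on MPEG-2 multichannel compatibility: read as a weighted sum, each compatible channel is the front channel plus a times the center signal plus b times the matching surround signal. A minimal sketch of that downmix follows. The default weights of 1/sqrt(2) are an assumption chosen for illustration, since the text does not give values for a and b, and the function name is hypothetical.

```python
import math


def compatible_downmix(left, right, center, left_surround, right_surround,
                       a=1 / math.sqrt(2), b=1 / math.sqrt(2)):
    """Fold five surround samples into a stereo pair that an MPEG-1
    decoder could play. The weights a and b are assumed values."""
    left_out = left + a * center + b * left_surround
    right_out = right + a * center + b * right_surround
    return left_out, right_out


# One sample per channel, with simple weights so the sums are easy to follow.
stereo_pair = compatible_downmix(1.0, 2.0, 4.0, 8.0, 16.0, a=0.5, b=0.25)
print(stereo_pair)
```

A multichannel decoder can undo this mix because the extension data carries three more signals in addition to the two compatible channels.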

Q: I read your description of "CD-like" performance. You said that Layer 1 reaches a compression ratio of 4:1 (a total rate of 384 kbps), Layer 2 a ratio of 6..8:1 (a total rate of 256..192 kbps), and Layer 3 a ratio of 12..14:1 (a total rate of 128..112 kbps). Can you explain this in more detail? A: OK. Each layer improves on the previous one to some extent. The simplest form is Layer 1, which was mainly designed for DCC (Digital Compact Cassette), where it is used at 384 kbps. Layer 2 was designed as a trade-off between complexity and performance. It maintains sound quality at bit rates down to 192 kbps; below that, the sound quality suffers. Layer 3 was designed for low bit rates from the start, and it adds some "advanced features" to Layer 2: the frequency resolution is 18 times higher, which lets a Layer 3 encoder fit the quantization noise more closely under the masking threshold. Only Layer 3 uses entropy coding to further reduce redundancy, only Layer 3 uses a bit reservoir to reduce artifacts, and Layer 3 can use more advanced joint stereo coding schemes.
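The factor of 18 can be turned into concrete numbers. The sketch below assumes that the MDCT splits each of the 32 subbands into 18 frequency lines (576 lines in total), which is how the factor of 18 is usually explained for long blocks, and it uses a 44.1 kHz sampling frequency as an example. The function name is illustrative.

```python
def line_spacing_hz(sample_rate_hz, frequency_lines):
    """Width of one frequency line when the band from 0 Hz to half the
    sampling frequency is split evenly into the given number of lines."""
    return sample_rate_hz / 2 / frequency_lines


SUBBANDS = 32            # polyphase filter bank shared by all layers
MDCT_LINES_PER_BAND = 18  # assumed long-block MDCT size in Layer 3

subband_width = line_spacing_hz(44_100, SUBBANDS)
layer3_width = line_spacing_hz(44_100, SUBBANDS * MDCT_LINES_PER_BAND)

print(subband_width, layer3_width, subband_width / layer3_width)
```

At 44.1 kHz a subband is about 689 Hz wide, while a Layer 3 frequency line is about 38 Hz wide, which is the finer grid the encoder uses to shape its quantization noise.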

Q: Oh. Please tell us more about sound quality. How do you evaluate it? A: At present there is no alternative to expensive listening tests. During the ISO-MPEG development process, a series of international listening tests with many trained listeners was carried out. All of these tests used the "triple stimulus, hidden reference" method and the CCIR (International Radio Consultative Committee) impairment scale to assess sound quality. The listening sequence is "ABC": A is the original, and B and C are the original and the coded signal in random order. The listener must grade both B and C with a number from 1.0 to 5.0. The grades mean: 5.0 = transparent (this should be the original signal), 4.0 = perceptible but not annoying (first differences noticeable), 3.0 = slightly annoying, 2.0 = annoying, 1.0 = very annoying.

Q: Listening tests are indeed very expensive. Is there really no alternative? A: Not yet, at least. That may change. For assessing the sound quality of perceptual codecs, all conventional "quality" parameters (such as signal-to-noise ratio, distortion, bandwidth) are of little use, because a codec may introduce noise and distortion as long as they do not affect the perceived sound quality. So listening tests are required, and if they are carefully prepared and carried out, they give reliable results. However, IIS is also working on the development and standardization of tools for sound quality assessment. With the first products now available, a real-time measurement tool can provide a detailed analysis of perceptual audio codecs.

Q: OK, back to listening tests and performance evaluation. Tell us about some of the results. A: For the details you will have to study the AES papers and the MPEG documents. For MPEG Layer 3, the main result is that it achieves good performance at low bit rates (64 kbps or less). Moreover, although Layer 3 uses the same basic tools as Layer 2, it has some advanced coding features for very low bit rates. A good example is the ISO-MPEG listening test carried out in Japan in September 94 (doc. ISO/IEC JTC1/SC29/WG11 N0848, 11 Nov. 94). Another interesting result is the conclusion of task group TG 10/2 within ITU-R, which recommends the use of low bit rate audio coding in digital sound broadcasting applications (ITU-R doc. BS.1115).

Q: That is very interesting! Can you tell me more about this recommendation? A: Task group TG 10/2 completed its work in October 93. The recommendation defines three fields of broadcast application. It recommends Layer 2 at 180 kbps per channel for contribution and distribution (20 kHz bandwidth, no audible impairment with up to 5 cascaded codecs), Layer 2 at 128 kbps per channel for emission (20 kHz bandwidth), and MPEG Layer 3 at 60 (120) kbps for mono (stereo) signals on commentary links (15 kHz bandwidth).

Q: Where can I get more information? A: Perceptual audio coding has been a regular topic at many academic conferences for about 10 years. For example, the AES (Audio Engineering Society) holds two conventions a year. You may find the following papers helpful:

When reposting, please cite the original address: https://www.9cbs.com/read-66016.html
