MP3 file structure

zhaozj  2021-02-16

MP3 file format

One. Overview:

An MP3 file is composed of frames; the frame is the smallest unit of an MP3 file. The full name of MP3 is MPEG-1 Layer 3 audio. MPEG (Moving Picture Experts Group) refers here to a family of audio and video compression standards, and MPEG audio is the sound part of the MPEG-1 standard, known as the MPEG Audio Layer. It is divided into three layers by compression quality and encoding complexity, Layer 1, Layer 2 and Layer 3, which correspond to the three file types MP1, MP2 and MP3; different layers are used for different purposes.

The higher the layer, the more complex the encoder and the higher the compression ratio: MP1 and MP2 reach 4:1 and 6:1 to 8:1 respectively, while MP3 reaches 10:1 to 12:1. In other words, one minute of CD-quality music that needs about 10 MB uncompressed takes only about 1 MB after MP3 encoding. MP3 compression is lossy, however. To reduce audible distortion, MP3 uses perceptual coding: the audio signal is first analyzed into its spectrum, the components at the noise level are filtered out, and the remaining components are quantized and packed to form an MP3 file with a high compression ratio whose playback still sounds close to the original source.

Two. Overall MP3 file structure:

An MP3 file is generally divided into three parts: TAG_V2 (ID3v2), Frame, and TAG_V1 (ID3v1).

ID3v2: contains information such as author, composer and album. Its length is not fixed, and it extends the amount of information that ID3v1 can hold.

Frame: a series of frames whose number is determined by the file size and the frame length. The length of each frame may or may not be fixed, depending on the bit rate. Each frame is divided into a frame header and a data body. The frame header records the bit rate, sampling rate, version and other information of the MP3 data, and every frame is independent of the others.

ID3v1: contains information such as author, composer and album, with a fixed length of 128 bytes.

Three. MP3 frame format:

Each frame starts with a frame header (FrameHeader) that is 4 bytes (32 bits) long. The header may be followed by a 2-byte CRC. Whether the CRC is present is determined by bit 16 of the FrameHeader: if the bit is 0, a 2-byte CRC follows the header; if it is 1, there is no CRC. The data body of the frame (MAIN_DATA) comes next. The format is as follows:

FrameHeader    CRC (optional)    MAIN_DATA
4 bytes        0 or 2 bytes      length is calculated from the frame header

1. The frame header (FrameHeader) format is as follows:

AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM

The 13 fields, labelled A to M, have the following meaning:

Sign  Length (bits)  Position (bits)  Description

A     11             (31-21)          Frame sync (all bits set)

B     2              (20,19)          MPEG audio version
      00 - MPEG Version 2.5
      01 - reserved
      10 - MPEG Version 2
      11 - MPEG Version 1

C     2              (18,17)          Layer description
      00 - reserved
      01 - Layer III
      10 - Layer II
      11 - Layer I

D     1              (16)             Protection bit
      0 - protected by CRC (16-bit CRC follows the header)
      1 - not protected

E     4              (15-12)          Bitrate index
      Bits  V1,L1  V1,L2  V1,L3  V2,L1  V2,L2  V2,L3
      0000  free   free   free   free   free   free
      0001  32     32     32     32     8      8
      0010  64     48     40     48     16     16
      0011  96     56     48     56     24     24
      0100  128    64     56     64     32     32
      0101  160    80     64     80     40     40
      0110  192    96     80     96     48     48
      0111  224    112    96     112    56     56
      1000  256    128    112    128    64     64
      1001  288    160    128    144    80     80
      1010  320    192    160    160    96     96
      1011  352    224    192    176    112    112
      1100  384    256    224    192    128    128
      1101  416    320    256    224    144    144
      1110  448    384    320    256    160    160
      1111  bad    bad    bad    bad    bad    bad
      NOTES: All values are in kbps.
      V1 - MPEG Version 1; V2 - MPEG Version 2 and Version 2.5
      L1 - Layer I; L2 - Layer II; L3 - Layer III
      "free" means free format: a constant bitrate that is not in the table and is chosen by the application.
      "bad" means that this is not an allowed value.
      For V2, Layer II and Layer III use the same bitrate values.

F     2              (11,10)          Sampling rate frequency index (values are in Hz)
      Bits  MPEG1    MPEG2    MPEG2.5
      00    44100    22050    11025
      01    48000    24000    12000
      10    32000    16000    8000
      11    reserv.  reserv.  reserv.

G     1              (9)              Padding bit
      0 - frame is not padded
      1 - frame is padded with one extra slot (one byte for Layer II and Layer III)

H     1              (8)              Private bit (no defined meaning in the format; free for application use)

I     2              (7,6)            Channel mode
      00 - Stereo
      01 - Joint stereo (stereo)
      10 - Dual channel (two independent mono channels)
      11 - Single channel (mono)

J     2              (5,4)            Mode extension (only if joint stereo)
      Value  Intensity stereo  MS stereo
      00     off               off
      01     on                off
      10     off               on
      11     on                on

K     1              (3)              Copyright
      0 - audio is not copyrighted
      1 - audio is copyrighted

L     1              (2)              Original
      0 - copy of original media
      1 - original media

M     2              (1,0)            Emphasis
      00 - none
      01 - 50/15 microseconds
      10 - reserved
      11 - CCITT J.17
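The field layout above can be turned into a small decoder. The following is a minimal sketch, not a reference implementation: the type `Mp3Header`, the table `kBitrateV1L3` and the functions `mp3_parse_header` and `mp3_sample_rate` are hypothetical names chosen for illustration, and only the V1,L3 bitrate column is included.

```c
#include <stdint.h>

/* Hypothetical container for the 13 header fields (letters as in the table). */
typedef struct {
    unsigned sync;          /* A: frame sync, 0x7FF when valid */
    unsigned version;       /* B: 0 = MPEG 2.5, 2 = MPEG 2, 3 = MPEG 1 */
    unsigned layer;         /* C: 1 = Layer III, 2 = Layer II, 3 = Layer I */
    unsigned protection;    /* D: 0 = CRC follows the header */
    unsigned bitrate_index; /* E */
    unsigned rate_index;    /* F */
    unsigned padding;       /* G */
    unsigned private_bit;   /* H */
    unsigned channel_mode;  /* I */
    unsigned mode_ext;      /* J */
    unsigned copyright;     /* K */
    unsigned original;      /* L */
    unsigned emphasis;      /* M */
} Mp3Header;

/* V1,L3 column of the bitrate table in kbps; 0 = free, -1 = bad. */
static const int kBitrateV1L3[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, -1
};

/* Splits the 4 header bytes into fields. Returns 1 if the header looks
 * valid, 0 if the sync is missing or a reserved/bad value is used. */
static int mp3_parse_header(const uint8_t b[4], Mp3Header *h)
{
    uint32_t w = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                 ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];

    h->sync          = (w >> 21) & 0x7FF;
    h->version       = (w >> 19) & 0x3;
    h->layer         = (w >> 17) & 0x3;
    h->protection    = (w >> 16) & 0x1;
    h->bitrate_index = (w >> 12) & 0xF;
    h->rate_index    = (w >> 10) & 0x3;
    h->padding       = (w >> 9)  & 0x1;
    h->private_bit   = (w >> 8)  & 0x1;
    h->channel_mode  = (w >> 6)  & 0x3;
    h->mode_ext      = (w >> 4)  & 0x3;
    h->copyright     = (w >> 3)  & 0x1;
    h->original      = (w >> 2)  & 0x1;
    h->emphasis      =  w        & 0x3;

    if (h->sync != 0x7FF) return 0;                 /* no frame sync */
    if (h->version == 1 || h->layer == 0) return 0; /* reserved values */
    if (h->bitrate_index == 15 || h->rate_index == 3) return 0;
    return 1;
}

/* Sampling rate in Hz: MPEG2 halves and MPEG2.5 quarters the MPEG1 rate. */
static int mp3_sample_rate(const Mp3Header *h)
{
    static const int mpeg1_hz[4] = {44100, 48000, 32000, 0};
    int hz = mpeg1_hz[h->rate_index];
    if (h->version == 2) return hz / 2;
    if (h->version == 0) return hz / 4;
    return hz;
}
```

For the header bytes FF FB 30 4C, this sketch yields MPEG1, Layer III, no CRC, bitrate index 3 (48 kbps in the V1,L3 column), 44100 Hz and joint stereo.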

1) Playing time per frame: an MPEG1 Layer III frame always holds 1152 samples, so no matter how long the frame is in bytes, each frame plays for about 26 ms at a sampling rate of 44100 Hz (1152 / 44100 s).

2) Data frame size:

FrameSize = (((MPEGVersion == MPEG1 ? 144 : 72) * Bitrate) / SamplingRate) + PaddingBit

For example: Bitrate = 128000, SamplingRate = 44100, and PaddingBit = 1

FrameSize = (144 * 128000) / 44100 + 1 = 417 + 1 = 418 bytes (the division is rounded down)
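The frame size formula can be sketched in C as follows. The function names are hypothetical, the bitrate is assumed to be given in bit/s, and only Layer III is covered (144 for MPEG1, 72 otherwise). Integer division rounds 144 * 128000 / 44100 down to 417, and a set padding bit adds one byte. The figure of 576 samples per frame for MPEG2 and MPEG2.5 Layer III is not stated in the text above and is added here as background.

```c
/* Frame size in bytes for Layer III.
 * bitrate is in bit/s, sampling_rate in Hz, padding_bit is 0 or 1. */
static int mp3_frame_size(int is_mpeg1, int bitrate, int sampling_rate,
                          int padding_bit)
{
    int coeff = is_mpeg1 ? 144 : 72;
    return (coeff * bitrate) / sampling_rate + padding_bit;
}

/* Playing time of one Layer III frame in milliseconds:
 * 1152 samples per frame for MPEG1, 576 for MPEG2 and MPEG2.5. */
static double mp3_frame_duration_ms(int is_mpeg1, int sampling_rate)
{
    int samples = is_mpeg1 ? 1152 : 576;
    return 1000.0 * samples / sampling_rate;
}
```

With these helpers, `mp3_frame_size(1, 128000, 44100, 0)` gives 417 and `mp3_frame_size(1, 128000, 44100, 1)` gives 418, while `mp3_frame_duration_ms(1, 44100)` is a little over 26 ms.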

2. Main_data:

The length of MAIN_DATA varies according to whether the bitrate in the frame header changes. A given MP3 song may come in three versions, for example: 96 kbps (96 thousand bits per second), 128 kbps and 192 kbps. Kbps (the bit rate) is the amount of data per second of audio; the higher the value, the better the sound quality and the larger the file. An MP3 file with a constant bitrate is called CBR, and most MP3 files are CBR. A file whose bitrate changes is called VBR, and its frame length may change from frame to frame. The differences between CBR and VBR are as follows:

1) CBR: with a fixed bit rate the frame size is also fixed (by the formula above). Once the total length of the file and the frame length are known, the total playing time can be calculated from the playing time of each frame, and fast forward, rewind and similar operations can be controlled by counting frames.
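The CBR arithmetic can be sketched as below. This is only an approximation under stated assumptions: the function names are hypothetical, `audio_bytes` is assumed to exclude the ID3 tags, and the occasional padding byte is ignored, so results drift slightly on real files.

```c
#include <stdint.h>

/* Number of whole frames in a CBR stream (tags and padding ignored). */
static int64_t cbr_frame_count(int64_t audio_bytes, int frame_size)
{
    return audio_bytes / frame_size;
}

/* Total playing time: frame count times the playing time of one frame. */
static double cbr_duration_ms(int64_t audio_bytes, int frame_size,
                              double frame_ms)
{
    return (double)cbr_frame_count(audio_bytes, frame_size) * frame_ms;
}

/* Byte offset (relative to the first frame) of the frame that contains
 * the requested time, found by counting frames. */
static int64_t cbr_offset_for_time(double time_ms, int frame_size,
                                   double frame_ms)
{
    int64_t frame_index = (int64_t)(time_ms / frame_ms);
    return frame_index * frame_size;
}
```

For example, 417000 bytes of audio with 417-byte frames is 1000 frames, which at 26 ms per frame is 26 seconds of playback.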

2) VBR: VBR is an algorithm introduced by Xing, so the MP3 frame contains the keyword "Xing" (many popular small programs can now also perform VBR compression; whether they follow this convention is unknown). The keyword is stored in the first valid frame of the MP3 file and identifies the file as VBR. The first frame also stores the total number of frames in the file, from which the total time is easy to obtain, and a 100-byte table that indexes the frames at 100 equal divisions of the total time. Take a 4-minute song, 240 s, divided into 100 segments: neighbouring index entries are 2.4 s apart, so with this index only a few frames need to be scanned to find the frame header needed for fast forward. See the following description:

This system was created to minimize file length while preserving sound quality. Higher frequencies generally need more space for encoding (that is why many codecs cut all frequencies above about 16 kHz) and lower tones require less. So if some part of a song does not contain higher tones, using e.g. 192 kbps wastes space; e.g. 96 kbps should be enough. That is the principle of VBR: the codec looks over each frame and then chooses a bitrate suitable for its sound quality.

It sounds perfect, but it brings some problems. If you want to jump 2 minutes ahead in a song, it is not a problem with CBR, because you can simply count the number of bytes to skip. With VBR this is impossible: frame lengths are arbitrary, so you have to either go frame by frame and count (time consuming and very impractical) or use another mechanism for an approximate count. If you want to cut 5 minutes from the middle of a VBR file (we all know CDs where the last song takes 10 minutes but 5 minutes of it is pure silence), the problems are the same.

The result? VBR files are more difficult to control and adjust. I also do not like the feeling that sound quality changes at every moment, and AFAIK many codecs have problems creating VBR in good quality. Personally I cannot see any reason to use VBR: I do not care whether one CD in MP3 is 55 MB with CBR or 51 MB with VBR. But everybody has a different taste, and some people prefer VBR.

The VBR file structure is the same as for CBR, but the first frame does not contain audio data; it is used for special information about the VBR file.

Structure of the first frame:

Byte    Content

0-3     Standard audio frame header (as described above). Mostly it contains the values FF FB 30 4C, from which you can calculate FrameLen = 156 bytes, which is exactly enough space for storing the VBR info. This header contains some important information valid for the whole file:
        - MPEG (MPEG1 or MPEG2)
        - Sampling rate frequency index
        - Channel (joint stereo etc.)

4-x     Not used until the string "Xing" (58 69 6E 67). This string is used as the main VBR file identifier. Its position depends on the values of MPEG and CHANNEL (those from a few lines above):

        36-39   "Xing" for MPEG1 and CHANNEL != mono (mostly used)
        21-24   "Xing" for MPEG1 and CHANNEL == mono
        21-24   "Xing" for MPEG2 and CHANNEL != mono
        13-16   "Xing" for MPEG2 and CHANNEL == mono

After the "Xing" string come the flags, the number of frames in the file and the size of the file in bytes. Each of these items is 4 bytes long and is stored as an integer in big-endian order: the first byte is the most significant and the last is the least.
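Locating the "Xing" string and reading the big-endian fields that follow it might look like the sketch below. The names `XingInfo`, `xing_offset` and `xing_parse` are hypothetical. The sketch also assumes that the frame count and byte count are present only when their flag bits (0x1 and 0x2) are set, which the text above does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 4-byte big-endian integer (most significant byte first). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Offset of "Xing" from the start of the frame, per the table above. */
static int xing_offset(int is_mpeg1, int is_mono)
{
    if (is_mpeg1) return is_mono ? 21 : 36;
    return is_mono ? 13 : 21;
}

typedef struct {
    uint32_t flags;
    uint32_t frames; /* 0 if the frames flag is not set */
    uint32_t bytes;  /* 0 if the bytes flag is not set */
} XingInfo;

/* Returns 1 and fills *out if the frame carries a Xing header, else 0. */
static int xing_parse(const uint8_t *frame, size_t len,
                      int is_mpeg1, int is_mono, XingInfo *out)
{
    size_t pos = (size_t)xing_offset(is_mpeg1, is_mono);

    if (len < pos + 8) return 0;
    if (memcmp(frame + pos, "Xing", 4) != 0) return 0;
    pos += 4;

    out->flags = read_be32(frame + pos);
    pos += 4;
    out->frames = 0;
    out->bytes = 0;

    if (out->flags & 0x1) {            /* frames flag */
        if (len < pos + 4) return 0;
        out->frames = read_be32(frame + pos);
        pos += 4;
    }
    if (out->flags & 0x2) {            /* bytes flag */
        if (len < pos + 4) return 0;
        out->bytes = read_be32(frame + pos);
        pos += 4;
    }
    return 1;
}
```

The caller is expected to pass the first frame of the file, starting at its 4-byte header, together with the MPEG version and channel mode decoded from that header.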

The following schema is for MPEG1 and CHANNEL != mono:

40-43   Flags

        Value         Name            Description
        00 00 00 01   Frames flag     Set if the number of frames in the file is stored
        00 00 00 02   Bytes flag      Set if the file size in bytes is stored
        00 00 00 04   TOC flag        Set if values for the TOC (see below) are stored
        00 00 00 08   VBR scale flag  Set if values for the VBR scale are stored

        All these values can be stored simultaneously.

44-47   Frames: number of frames in the file (including the first info one)

48-51   Bytes: file length in bytes

52-151  TOC (table of contents). Contains 100 indexes (one byte each) for easier lookup in the file; it approximately solves the problem of moving inside the file. Each byte maps to a position in the file according to this formula:

        (TOC[i] / 256) * fileLenInBytes

        So if a song lasts e.g. 240 s and you want to jump to second 60 (and the file is 5 000 000 bytes long), you can use:

        TOC[(60 / 240) * 100] = TOC[25]

        and the corresponding byte in the file is then approximately at:

        (TOC[25] / 256) * 5000000

        If you want to trim a VBR file you should also reconstruct Frames, Bytes and TOC properly.

152-155 VBR scale. I do not know exactly how these values are stored, but this item probably does not have a deeper meaning.
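The TOC lookup can be sketched as follows. The function name `xing_toc_seek` is hypothetical, and clamping the index to 99 when the requested time equals the total time is a choice made here, not something the format description specifies. The multiplication is done before the division by 256 so that no precision is lost in integer arithmetic.

```c
#include <stdint.h>

/* Approximate byte position for a playback time, using the 100-entry TOC:
 * index = (time / total) * 100, position = (TOC[index] / 256) * file_len. */
static uint32_t xing_toc_seek(const uint8_t toc[100], uint32_t file_len,
                              double seconds, double total_seconds)
{
    int i = (int)(seconds / total_seconds * 100.0);
    if (i < 0) i = 0;
    if (i > 99) i = 99;   /* keep the index inside the table */
    return (uint32_t)((uint64_t)toc[i] * file_len / 256);
}
```

With the example above (240 s song, jump to second 60, file of 5 000 000 bytes), the index is 25; if TOC[25] happened to be 64, the estimated position would be 64 / 256 of the file, i.e. byte 1 250 000.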

Four. ID3v1

ID3v1 is relatively simple. It is stored at the end of the MP3 file: open an MP3 file with a hex editor and look at the last 128 bytes, which are stored in order. The data structure is defined as follows:

typedef struct tagID3V1
{
    char Header[3];    /* tag header; must be "TAG", otherwise there is assumed to be no tag */
    char Title[30];    /* title */
    char Artist[30];   /* author */
    char Album[30];    /* album */
    char Year[4];      /* year of release */
    char Comment[28];  /* comment */
    char reserve;      /* reserved */
    char track;        /* track */
    char genre;        /* genre */
} ID3V1, *pID3V1;

The ID3v1 fields are stored sequentially with no identifiers separating them. A field shorter than its slot, for example a title of fewer than 30 bytes, is padded with '\0' bytes; otherwise the fields that follow would be read incorrectly.
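Because the fields are fixed-width and not necessarily NUL-terminated, a reader has to copy each one into a slightly larger buffer. The following is a minimal sketch; `Id3v1Info`, `id3v1_field` and `id3v1_parse` are hypothetical names, and trimming trailing spaces is an extra assumption for files that pad with spaces instead of '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Decoded tag with NUL-terminated strings (one extra byte per field). */
typedef struct {
    char title[31];
    char artist[31];
    char album[31];
    char year[5];
    char comment[29];
    unsigned char track;
    unsigned char genre;
} Id3v1Info;

/* Copies a fixed-width field of n bytes and terminates it; trailing
 * spaces are trimmed in case the writer padded with spaces. */
static void id3v1_field(const unsigned char *src, size_t n, char *dst)
{
    size_t len;
    memcpy(dst, src, n);
    dst[n] = '\0';
    len = strlen(dst);               /* stops at the first '\0' padding byte */
    while (len > 0 && dst[len - 1] == ' ')
        dst[--len] = '\0';
}

/* Parses the last 128 bytes of a file. Returns 1 if they hold a tag. */
static int id3v1_parse(const unsigned char tag[128], Id3v1Info *out)
{
    if (memcmp(tag, "TAG", 3) != 0)
        return 0;                    /* no "TAG" header: no ID3v1 tag */
    id3v1_field(tag + 3,  30, out->title);
    id3v1_field(tag + 33, 30, out->artist);
    id3v1_field(tag + 63, 30, out->album);
    id3v1_field(tag + 93, 4,  out->year);
    id3v1_field(tag + 97, 28, out->comment);
    /* tag[125] is the reserved byte */
    out->track = tag[126];
    out->genre = tag[127];
    return 1;
}
```

The 128-byte buffer would typically be filled by seeking to 128 bytes before the end of the file, for example with `fseek(fp, -128L, SEEK_END)` followed by `fread`.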

Genre is stored as a raw numeric code; the lookup table is as follows:

/* Standard genres */
0 = "Blues"; 1 = "ClassicRock"; 2 = "Country"; 3 = "Dance"; 4 = "Disco"; 5 = "Funk"; 6 = "Grunge"; 7 = "Hip-Hop"; 8 = "Jazz"; 9 = "Metal";
10 = "NewAge"; 11 = "Oldies"; 12 = "Other"; 13 = "Pop"; 14 = "R&B"; 15 = "Rap"; 16 = "Reggae"; 17 = "Rock"; 18 = "Techno"; 19 = "Industrial";
20 = "Alternative"; 21 = "Ska"; 22 = "DeathMetal"; 23 = "Pranks"; 24 = "Soundtrack"; 25 = "Euro-Techno"; 26 = "Ambient"; 27 = "Trip-Hop"; 28 = "Vocal"; 29 = "Jazz+Funk";
30 = "Fusion"; 31 = "Trance"; 32 = "Classical"; 33 = "Instrumental"; 34 = "Acid"; 35 = "House"; 36 = "Game"; 37 = "SoundClip"; 38 = "Gospel"; 39 = "Noise";
40 = "AlternRock"; 41 = "Bass"; 42 = "Soul"; 43 = "Punk"; 44 = "Space"; 45 = "Meditative"; 46 = "InstrumentalPop"; 47 = "InstrumentalRock"; 48 = "Ethnic"; 49 = "Gothic";
50 = "Darkwave"; 51 = "Techno-Industrial"; 52 = "Electronic"; 53 = "Pop-Folk"; 54 = "Eurodance"; 55 = "Dream"; 56 = "SouthernRock"; 57 = "Comedy"; 58 = "Cult"; 59 = "Gangsta";
60 = "Top40"; 61 = "ChristianRap"; 62 = "Pop/Funk"; 63 = "Jungle"; 64 = "NativeAmerican"; 65 = "Cabaret"; 66 = "NewWave"; 67 = "Psychadelic"; 68 = "Rave"; 69 = "Showtunes";
70 = "Trailer"; 71 = "Lo-Fi"; 72 = "Tribal"; 73 = "AcidPunk"; 74 = "AcidJazz"; 75 = "Polka"; 76 = "Retro"; 77 = "Musical"; 78 = "Rock&Roll"; 79 = "HardRock";

/* Extended genres */
80 = "Folk"; 81 = "Folk-Rock"; 82 = "NationalFolk"; 83 = "Swing"; 84 = "FastFusion"; 85 = "Bebob"; 86 = "Latin"; 87 = "Revival"; 88 = "Celtic"; 89 = "Bluegrass";
90 = "Avantgarde"; 91 = "GothicRock"; 92 = "ProgressiveRock"; 93 = "PsychedelicRock"; 94 = "SymphonicRock"; 95 = "SlowRock"; 96 = "BigBand"; 97 = "Chorus"; 98 = "EasyListening"; 99 = "Acoustic";
100 = "Humour"; 101 = "Speech"; 102 = "Chanson"; 103 = "Opera"; 104 = "ChamberMusic"; 105 = "Sonata"; 106 = "Symphony"; 107 = "BootyBass"; 108 = "Primus"; 109 = "PornGroove";
110 = "Satire"; 111 = "SlowJam"; 112 = "Club"; 113 = "Tango"; 114 = "Samba"; 115 = "Folklore"; 116 = "Ballad"; 117 = "PowerBallad"; 118 = "RhythmicSoul"; 119 = "Freestyle";
120 = "Duet"; 121 = "PunkRock"; 122 = "DrumSolo"; 123 = "Acapella"; 124 = "Euro-House"; 125 = "DanceHall"; 126 = "Goa"; 127 = "Drum&Bass"; 128 = "Club-House"; 129 = "Hardcore";
130 = "Terror"; 131 = "Indie"; 132 = "BritPop"; 133 = "Negerpunk"; 134 = "PolskPunk"; 135 = "Beat"; 136 = "ChristianGangstaRap"; 137 = "HeavyMetal"; 138 = "BlackMetal"; 139 = "Crossover";
140 = "ContemporaryChristian"; 141 = "ChristianRock"; 142 = "Merengue"; 143 = "Salsa"; 144 = "ThrashMetal"; 145 = "Anime"; 146 = "JPop"; 147 = "SynthPop";

Five. ID3v2

ID3v2 has had 4 versions so far, but popular player software generally supports only the third one, namely ID3v2.3. Since ID3v1 is recorded at the end of the MP3 file, ID3v2 has to be recorded at the head of the file (if ID3v3 is released one day, I really do not know where it will be recorded). For the same reason, operating on ID3v2 is slower than operating on ID3v1. The ID3v2 structure is also much more complicated than ID3v1, but it is more comprehensive and extensible than its predecessor.

The following describes ID3v2.3. Each ID3v2.3 tag consists of a tag header and several tag frames, optionally with an extended tag header. Track information such as title and author is stored in the tag frames. The tag header and the tag frames are stored sequentially at the head of the MP3 file.

1. Tag header

The tag header is the first 10 bytes of the file. The data structure is as follows:

char Header[3];  /* must be "ID3", otherwise the tag is assumed not to exist */
char Ver;        /* version number; 3 for ID3v2.3 */
char Revision;   /* revision number; 0 for this version */
char Flag;       /* flag byte; only three bits are defined in this version, details below */
char Size[4];    /* tag size: the size of everything after the 10-byte tag header, i.e. the extended header (if any), all tag frames and padding */

1) Flag byte

The flag byte is usually 0. It is defined as follows:

abc00000

a - indicates whether unsynchronisation is used (the author could not find this word in a dictionary; it is generally not set)
b - indicates whether there is an extended header; generally there is none (at least WinAMP does not write one), so it is generally not set
c - indicates whether this is an experimental tag (99.99% of tags are not experimental, so it is generally not set)

2) Tag size

The tag size occupies four bytes, but only 7 bits of each byte are used; the highest bit is unused and always 0. The format is therefore:

0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx

When calculating, drop the leading 0 bits to obtain a 28-bit binary number, which is the tag size (I do not understand why it is done this way). The formula is as follows:

int total_size;
total_size = (Size[0] & 0x7F) * 0x200000 + (Size[1] & 0x7F) * 0x4000 + (Size[2] & 0x7F) * 0x80 + (Size[3] & 0x7F);

2. Tag frames

Each tag frame consists of a 10-byte frame header and at least one byte of content whose length is not fixed. Frames are stored sequentially in the file, with no special separator between the frame header and the content or between one frame and the next. The content of a complete frame can be read only after the content size has been obtained from the frame header. When reading, pay attention to the size so that you do not read into the content or header of other frames. The frame header is defined as follows:

char FrameID[4];  /* four characters identifying the frame and its content; a table of commonly used identifiers follows */
char Size[4];     /* size of the frame content, not including the frame header; at least 1 */
char Flags[2];    /* flags; only 6 bits are used, explained later */

1) Frame identifier

A frame is identified by four characters that indicate the meaning of its content. Commonly used identifiers are:

TIT2 = title: the content is the title of this song (the same pattern applies below)
TPE1 = author
TALB = album
TRCK = track, format N/M, where N is the number of the track on the album and M is the total number of tracks on the album; N and M are numbers written as ASCII digits
TYER = year, a number written as ASCII digits
TCON = genre, written directly as a string
COMM = comment, format "eng\0comment content", where eng indicates the natural language of the comment

2) Size

There is no trick here: all 8 bits of each byte are used. The format is:

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx

The algorithm is as follows:

int FSize;
FSize = Size[0] * 0x1000000 + Size[1] * 0x10000 + Size[2] * 0x100 + Size[3];

3) Flags

Only 6 bits are defined; the other 10 bits are 0, and in most cases all 16 bits are 0. The format is as follows:

abc00000 ijk00000

a - tag alter preservation flag; when set, the frame should be discarded if the tag is altered and the frame is unknown to the software
b - file alter preservation flag; when set, the frame should be discarded if the file is altered and the frame is unknown to the software
c - read-only flag; when set, the frame must not be modified (but I have not found any software that honours this flag)
i - compression flag; when set, the frame content is compressed (with zlib in ID3v2.3)
j - encryption flag (I have not seen an MP3 tag that is encrypted)
k - grouping flag; when set, the frame belongs to a group with other frames

It is worth mentioning that when WinAMP saves and reads frame content it adds a '\0' in front of the content and counts this byte in the size of the frame content.

Details can be found at http://www.id3.org/. For reading and writing ID3v1 and ID3v2 I wrote two classes in Delphi; if you want them you can write to me at QDZHANG@sohu.com.

4) Meaning of the frame identifiers

Declared ID3v2 frames

The following frames are declared in this draft.

AENC  Audio encryption
APIC  Attached picture

COMM  Comments
COMR  Commercial frame

ENCR  Encryption method registration
EQUA  Equalization
ETCO  Event timing codes

GEOB  General encapsulated object
GRID  Group identification registration

IPLS  Involved people list

LINK  Linked information

MCDI  Music CD identifier
MLLT  MPEG location lookup table

OWNE  Ownership frame

PRIV  Private frame
PCNT  Play counter
POPM  Popularimeter
POSS  Position synchronisation frame

RBUF  Recommended buffer size
RVAD  Relative volume adjustment
RVRB  Reverb

SYLT  Synchronized lyric/text
SYTC  Synchronized tempo codes

TALB  Album/Movie/Show title
TBPM  BPM (beats per minute)
TCOM  Composer
TCON  Content type
TCOP  Copyright message
TDAT  Date
TDLY  Playlist delay
TENC  Encoded by
TEXT  Lyricist/Text writer
TFLT  File type
TIME  Time
TIT1  Content group description
TIT2  Title/songname/content description
TIT3  Subtitle/Description refinement
TKEY  Initial key
TLAN  Language(s)
TLEN  Length
TMED  Media type
TOAL  Original album/movie/show title
TOFN  Original filename
TOLY  Original lyricist(s)/text writer(s)
TOPE  Original artist(s)/performer(s)
TORY  Original release year
TOWN  File owner/licensee
TPE1  Lead performer(s)/Soloist(s)
TPE2  Band/orchestra/accompaniment
TPE3  Conductor/performer refinement
TPE4  Interpreted, remixed, or otherwise modified by
TPOS  Part of a set
TPUB  Publisher
TRCK  Track number/Position in set
TRDA  Recording dates
TRSN  Internet radio station name
TRSO  Internet radio station owner
TSIZ  Size
TSRC  ISRC (international standard recording code)
TSSE  Software/Hardware and settings used for encoding
TYER  Year
TXXX  User defined text information frame

UFID  Unique file identifier
USER  Terms of use
USLT  Unsychronized lyric/text transcription

WCOM  Commercial information
WCOP  Copyright/Legal information
WOAF  Official audio file webpage
WOAR  Official artist/performer webpage
WOAS  Official audio source webpage
WORS  Official internet radio station homepage
WPAY  Payment
WPUB  Publishers official webpage
WXXX  User defined URL link frame
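The two size encodings from section five are easy to confuse, so the sketch below decodes both: the 7-bit-per-byte tag size from the tag header and the plain big-endian size from a frame header. The names `Id3v2Frame`, `id3v2_tag_size`, `id3v2_frame_size`, `id3v2_read_tag_header` and `id3v2_read_frame_header` are hypothetical, and treating a frame ID that starts with a zero byte as the beginning of padding is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Tag size: four bytes with 7 significant bits each. The shifts 21, 14
 * and 7 correspond to the multipliers 0x200000, 0x4000 and 0x80. */
static uint32_t id3v2_tag_size(const uint8_t s[4])
{
    return ((uint32_t)(s[0] & 0x7F) << 21) |
           ((uint32_t)(s[1] & 0x7F) << 14) |
           ((uint32_t)(s[2] & 0x7F) << 7)  |
            (uint32_t)(s[3] & 0x7F);
}

/* Frame size in ID3v2.3: ordinary big-endian, all 8 bits of each byte. */
static uint32_t id3v2_frame_size(const uint8_t s[4])
{
    return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
           ((uint32_t)s[2] << 8)  |  (uint32_t)s[3];
}

typedef struct {
    char id[5];      /* four-character frame ID plus terminating NUL */
    uint32_t size;   /* content size, excluding the 10-byte frame header */
    uint16_t flags;  /* the two flag bytes, first byte in the high half */
} Id3v2Frame;

/* Returns 1 and stores the tag size if h starts with "ID3", else 0. */
static int id3v2_read_tag_header(const uint8_t h[10], uint32_t *tag_size)
{
    if (memcmp(h, "ID3", 3) != 0)
        return 0;
    *tag_size = id3v2_tag_size(h + 6);
    return 1;
}

/* Decodes one 10-byte frame header. Returns 0 when the ID starts with a
 * zero byte, which this sketch treats as the start of padding. */
static int id3v2_read_frame_header(const uint8_t h[10], Id3v2Frame *f)
{
    if (h[0] == 0)
        return 0;
    memcpy(f->id, h, 4);
    f->id[4] = '\0';
    f->size = id3v2_frame_size(h + 4);
    f->flags = (uint16_t)((h[8] << 8) | h[9]);
    return 1;
}
```

A reader would call `id3v2_read_tag_header` on the first 10 bytes of the file and then step through the frames, advancing by 10 + size bytes each time, until the tag size is used up or padding is reached.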
