Introduction to the Gzip File Format

zhaozj, 2021-02-16

Gzip was originally written by Jean-loup Gailly and Mark Adler to compress files on UNIX systems. The .gz files commonly seen on Linux are in gzip format. Today gzip is one of the most widely used compressed data formats on the Internet, both as a file format and as a transfer encoding. Gzip content encoding in HTTP is a common technique for improving web application performance, and high-traffic websites often compress their responses with gzip so that pages load faster for users.

Gzip itself is only a file (container) format. The compressed data inside it is normally in the DEFLATE format, which compresses data using the LZ77 algorithm combined with Huffman coding.
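The following minimal Python sketch shows that gzip is only a container around DEFLATE. Python and its standard zlib module are this example's choice, not something the article uses. The sketch compresses the same input twice: once with a gzip wrapper, and once as a bare DEFLATE stream. It assumes zlib writes its default 10-byte gzip header with no optional fields. `gzip_and_raw_deflate` is a hypothetical helper name.

```python
import zlib

def gzip_and_raw_deflate(data: bytes, level: int = 6):
    """Hypothetical helper: compress `data` once as gzip and once as bare DEFLATE."""
    # wbits = 31 (16 + 15): gzip wrapper around DEFLATE with a 32 KiB window.
    gz = zlib.compressobj(level, zlib.DEFLATED, 31)
    gz_bytes = gz.compress(data) + gz.flush()
    # wbits = -15: the same DEFLATE stream with no header and no trailer.
    raw = zlib.compressobj(level, zlib.DEFLATED, -15)
    raw_bytes = raw.compress(data) + raw.flush()
    return gz_bytes, raw_bytes

data = b"gzip is a container; DEFLATE does the compressing. " * 20
gz_bytes, raw_bytes = gzip_and_raw_deflate(data)

# zlib writes a 10-byte gzip header (no optional fields) and an 8-byte trailer
# (CRC32 + ISIZE), so everything in between is the raw DEFLATE stream.
print(gz_bytes[:2].hex())                        # 1f8b
print(gz_bytes[10:-8] == raw_bytes)              # True
print(zlib.decompress(raw_bytes, -15) == data)   # True
```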

A gzip file consists of one or more "members", although in practice it usually contains only one. Each member contains a header, the compressed data, and a trailer, in this order:

+-----+-----+----+-----+-------+-----+----+-----------------+-----------------+-------+-------+
| ID1 | ID2 | CM | FLG | MTIME | XFL | OS | optional fields | compressed data | CRC32 | ISIZE |
+-----+-----+----+-----+-------+-----+----+-----------------+-----------------+-------+-------+
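The member layout can be illustrated by assembling one member by hand. This is a sketch under stated assumptions:

- `build_member` is a hypothetical helper.
- It always writes FLG = 0 (no optional fields) and OS = 255 (unknown).
- It relies on Python's zlib for the DEFLATE step and on `gzip.decompress` to check the result.

```python
import gzip
import struct
import zlib

def build_member(data: bytes, level: int = 9, mtime: int = 0) -> bytes:
    """Hypothetical helper: assemble one gzip member (header + DEFLATE data + trailer)."""
    # XFL: 2 = maximum compression, 4 = fastest, 0 = neither.
    xfl = 2 if level == 9 else (4 if level == 1 else 0)
    # Fixed 10-byte header, little-endian: ID1, ID2, CM = 8 (DEFLATE),
    # FLG = 0 (no optional fields), MTIME, XFL, OS = 255 (unknown).
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0, mtime, xfl, 255)
    # Compressed data: a raw DEFLATE stream (negative wbits = no zlib/gzip wrapper).
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    body = comp.compress(data) + comp.flush()
    # Trailer: CRC32 of the uncompressed data, then ISIZE = its length modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + body + trailer

member = build_member(b"hello, gzip\n")
print(gzip.decompress(member))   # b'hello, gzip\n'

# Members can be concatenated; decompression returns the concatenated contents.
two = build_member(b"first ") + build_member(b"second", level=1)
print(gzip.decompress(two))      # b'first second'
```

Each member carries its own trailer and can be validated independently. That is why concatenated members such as `two` decompress to the concatenation of their contents.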

1. Header section

ID1 and ID2: 1 byte each. The fixed values ID1 = 31 (0x1f) and ID2 = 139 (0x8b) identify the gzip format.

CM: 1 byte. Compression method. Only one value is defined: CM = 8, meaning DEFLATE.

FLG: 1 byte. Flags:
  bit 0    FTEXT     the data is probably text
  bit 1    FHCRC     a CRC16 header checksum is present
  bit 2    FEXTRA    an extra field is present
  bit 3    FNAME     the original file name is present
  bit 4    FCOMMENT  a comment is present
  bits 5-7           reserved

MTIME: 4 bytes. Modification time of the original file, in Unix format (seconds since 1970-01-01 00:00:00 GMT).

XFL: 1 byte. Extra flags. When CM = 8, XFL = 2 means the compressor used maximum compression (the slowest algorithm). XFL = 4 means it used the fastest algorithm (least compression).

OS: 1 byte. The operating system, or more precisely the file system, on which compression took place:
  0    FAT file system (MS-DOS, OS/2, NT/Win32)
  1    Amiga
  2    VMS (or OpenVMS)
  3    Unix
  4    VM/CMS
  5    Atari TOS
  6    HPFS file system (OS/2, NT)
  7    Macintosh
  8    Z-System
  9    CP/M
  10   TOPS-20
  11   NTFS file system (NT)
  12   QDOS
  13   Acorn RISCOS
  255  unknown

The optional header fields follow in this order.

Extra field (if FLG.FEXTRA = 1):

+-------+===================================+
| XLEN  | ...XLEN bytes of "extra field"... |
+-------+===================================+

The extra field is made up of subfields, each laid out as:

+-----+-----+-------+==================================+
| SI1 | SI2 |  LEN  | ...LEN bytes of subfield data... |
+-----+-----+-------+==================================+

Original file name (if FLG.FNAME = 1):

+===========================================+
| ...original file name, zero-terminated... |
+===========================================+

Comment (if FLG.FCOMMENT = 1). The comment may contain only ISO 8859-1 characters:

+=====================================+
| ...file comment, zero-terminated... |
+=====================================+

Header CRC (if FLG.FHCRC = 1):

+-------+
| CRC16 |
+-------+

CRC16 holds the two least significant bytes of the CRC32 of all header bytes that precede it.
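The header fields above can be walked in order with a small parser. The sketch below makes these assumptions:

- `parse_header`, `OS_NAMES`, and the keys of the returned dictionary are invented for illustration.
- The buffer holds a complete header; there is no bounds checking beyond what Python's slicing and `struct` provide.

The parser reads the fixed 10 bytes first. It then reads the optional fields in the order FEXTRA, FNAME, FCOMMENT, FHCRC, splitting the extra field into SI1/SI2/LEN subfields.

```python
import gzip
import io
import struct
import zlib

# OS byte values from the table above (short labels chosen for this sketch).
OS_NAMES = {0: "FAT", 1: "Amiga", 2: "VMS", 3: "Unix", 4: "VM/CMS", 5: "Atari TOS",
            6: "HPFS", 7: "Macintosh", 8: "Z-System", 9: "CP/M", 10: "TOPS-20",
            11: "NTFS", 12: "QDOS", 13: "Acorn RISCOS", 255: "unknown"}
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10

def parse_header(buf: bytes) -> dict:
    """Hypothetical parser: return the header fields plus the DEFLATE data offset."""
    id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack_from("<BBBBIBB", buf, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip member")
    if cm != 8:
        raise ValueError("unsupported compression method %d" % cm)
    if flg & 0xE0:
        raise ValueError("reserved FLG bits 5-7 are set")
    info = {"text": bool(flg & FTEXT), "mtime": mtime, "xfl": xfl,
            "os": OS_NAMES.get(os_byte, "undefined (%d)" % os_byte),
            "extra": [], "name": None, "comment": None}
    pos = 10                                  # optional fields start after the fixed header
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        end = pos + xlen
        while pos + 4 <= end:                 # subfield: SI1, SI2, LEN (2 bytes), data
            si1, si2, length = struct.unpack_from("<BBH", buf, pos)
            info["extra"].append((bytes([si1, si2]), buf[pos + 4:pos + 4 + length]))
            pos += 4 + length
        pos = end
    if flg & FNAME:                           # zero-terminated, ISO 8859-1
        nul = buf.index(b"\x00", pos)
        info["name"] = buf[pos:nul].decode("latin-1")
        pos = nul + 1
    if flg & FCOMMENT:                        # zero-terminated, ISO 8859-1
        nul = buf.index(b"\x00", pos)
        info["comment"] = buf[pos:nul].decode("latin-1")
        pos = nul + 1
    if flg & FHCRC:                           # low 16 bits of CRC32 over the preceding bytes
        (crc16,) = struct.unpack_from("<H", buf, pos)
        if crc16 != zlib.crc32(buf[:pos]) & 0xFFFF:
            raise ValueError("header CRC16 mismatch")
        pos += 2
    info["data_offset"] = pos
    return info

# Usage: Python's GzipFile stores the file name it is given, so FNAME is set.
bio = io.BytesIO()
with gzip.GzipFile(filename="hello.txt", mode="wb", fileobj=bio, mtime=0) as f:
    f.write(b"hello")
blob = bio.getvalue()
info = parse_header(blob)
print(info["name"], info["data_offset"])                                # hello.txt 20
print(zlib.decompressobj(-15).decompress(blob[info["data_offset"]:]))   # b'hello'
```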

When the extra field is present, each subfield is identified by the two bytes SI1 and SI2, and LEN gives the length of that subfield's data. XLEN is the total length of the extra field in bytes. For example, SI1 = 0x41 ('A') and SI2 = 0x70 ('p') identify a subfield carrying Apollo file type information.

2. Data section

The data section is in DEFLATE format and consists of a series of blocks, each laid out as follows:

+--------+-------+======+
| BFINAL | BTYPE | data |
+--------+-------+======+
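The BFINAL and BTYPE bits of the first block can be read directly from the first byte of a raw DEFLATE stream. This works because DEFLATE packs header bits starting at each byte's least significant bit (RFC 1951). The minimal sketch below uses two hypothetical helpers, `first_block_header` and `raw_deflate`, and assumes Python's zlib as the compressor.

```python
import zlib

BTYPE_NAMES = {0: "no compression", 1: "fixed Huffman", 2: "dynamic Huffman", 3: "reserved"}

def first_block_header(deflate_stream: bytes):
    """Hypothetical helper: return (BFINAL, BTYPE) of the first block in a raw DEFLATE stream."""
    # DEFLATE fills each byte starting from its least significant bit, so BFINAL is
    # bit 0 of the first byte and BTYPE is bits 1-2 (bit 1 being BTYPE's low bit).
    first = deflate_stream[0]
    bfinal = first & 0x01
    btype = (first >> 1) & 0x03
    return bfinal, btype

def raw_deflate(data: bytes, level: int) -> bytes:
    """Hypothetical helper: compress into a bare DEFLATE stream (no gzip wrapper)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

for level in (0, 9):
    bfinal, btype = first_block_header(raw_deflate(b"hello hello hello", level))
    print(level, bfinal, BTYPE_NAMES[btype])
```

At level 0, zlib emits stored blocks (BTYPE = 00). At higher levels it chooses fixed or dynamic Huffman codes, depending on which encoding is smaller, so the BTYPE printed for level 9 depends on the input.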

BFINAL: 1 bit. 0 means more blocks follow; 1 means this is the last block.

BTYPE: 2 bits. 00 = no compression; 01 = compressed with fixed (static) Huffman codes; 10 = compressed with dynamic Huffman codes; 11 = reserved.

For the details of each case, see the RFC documents listed below.

3. Tail section

CRC32: 4 bytes. CRC-32 checksum of the original (uncompressed) data.

ISIZE: 4 bytes. Length of the original (uncompressed) data in bytes, modulo 2^32.

Multi-byte values in gzip are stored least significant byte first (little-endian). This is the opposite of zlib, which stores them most significant byte first.

Below is a brief analysis of the format of the gzip file gzip-1.3.3.tar.gz:

Gzip and zlib are closely related. For more detailed descriptions of zlib, gzip, and DEFLATE, see RFC 1950-1952, which also point to further references.

Gzip has become part of the GNU Project. Its official site is www.gzip.org, where the gzip source code can be downloaded. At the time of writing, the latest release was 1.2.4, with 1.3.3 available as a beta.

[Resources]
GZIP official website: www.gzip.org
RFC 1950 - ZLIB Compressed Data Format Specification version 3.3
RFC 1951 - DEFLATE Compressed Data Format Specification version 1.3
RFC 1952 - GZIP file format specification version 4.3

Kernel Studio: www.kernelstudio.com
First published: 2003-12-16
Last revised: 2003-12-16
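The tail section and gzip's little-endian byte order can be checked with a short routine like the one below. It is a sketch, not a full decoder:

- `check_trailer` is a hypothetical name.
- The default `data_offset = 10` assumes a header with no optional fields. This holds for the output of Python's `gzip.compress`.

```python
import gzip
import struct
import zlib

def check_trailer(member: bytes, data_offset: int = 10) -> bool:
    """Hypothetical helper: inflate one member and compare its trailer with the output."""
    # data_offset = 10 assumes a header without optional fields (FLG = 0).
    d = zlib.decompressobj(-15)                   # raw DEFLATE: no zlib/gzip wrapper
    out = d.decompress(member[data_offset:]) + d.flush()
    if not d.eof:
        raise ValueError("truncated DEFLATE stream")
    trailer = d.unused_data[:8]                   # the 8 bytes after the last block
    if len(trailer) < 8:
        raise ValueError("missing trailer")
    crc32, isize = struct.unpack("<II", trailer)  # both fields are little-endian
    return crc32 == zlib.crc32(out) and isize == len(out) % 2**32

member = gzip.compress(b"gzip trailer demo", mtime=0)
print(check_trailer(member))                      # True
print(member[-4:].hex())                          # 11000000 (ISIZE = 17, low byte first)
```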

Please credit the original source when reposting: https://www.9cbs.com/read-23661.html
