EXIF format analysis and processing through XML
Raptor [Mental Studio]
Http://eental.mentsu.com
With the popularity of digital cameras, EXIF is now supported by most image processing software. My own small program (see "My Album of Human Information Assistant") is also an image processing tool, but although it currently supports the JPEG file format, it does not support EXIF.
So, what is EXIF? EXIF is an abbreviation of Exchangeable Image File Format, a standard developed by the Japan Electronics and Information Technology Industries Association (JEITA) for exchanging image data between different software or devices; a typical application is a digital camera connected directly to a printer to print photos. EXIF also carries rich shooting information: from it you can learn which camera took a photo and the aperture, shutter speed, ISO, and so on that were used. The latest version of EXIF also supports audio files.
The most authoritative documentation on EXIF is of course JEITA's standard specification [1], whose latest version is 2.2. JEITA's website offers two language versions (Japanese and English; JEITA states that the Japanese version prevails), but both must be paid for. Fortunately, I found a copy of the English version through Google.
EXIF supports only two image file formats: TIFF [2] and JPEG [3, 4]. JPEG is used for compressed images and TIFF for uncompressed images. This article focuses on the JPEG format.
As we know, the JPEG file format stores information about the image in so-called marker segments, which gives it very good flexibility and extensibility, far better than early formats such as PCX, GIF, and BMP. (PCX was originally designed for 16-color images; when 256-color images appeared, the original format definition was broken and the palette had to be placed at the end of the file. GIF has an internal block mechanism, which was later extended to support animation, but the basic information in its file header still uses a fixed format.) EXIF takes advantage of this mechanism.
Every marker segment in a JPEG file begins with a WORD (16-bit) value. (Note: this value is stored in the file with the high byte first and the low byte second; byte order is discussed later in this article.) This value is the so-called marker, and each marker identifies the meaning of its segment. If the segment has content (that is, its length is greater than 0; whether it has content is determined by the specific marker), the next WORD is the length of the segment, stored in the same byte order as the marker. The content of the segment is defined differently for each marker. For example, FFD8 is called SOI and indicates the start of the image; this segment has no content. FFE0 is APP0, that is, application segment 0, which holds application-defined data and is already used by JFIF [4]. This segment does have content: the next WORD is the length of the segment, and the segment content is defined by the JFIF specification.
EXIF is an extension of the same kind as JFIF, and it uses two marker segments, APP1 and APP2. Two markers are needed because, as mentioned earlier, the length of a segment is stored in a WORD and therefore cannot exceed 64K. EXIF supports an image format called FlashPix whose data is likely to exceed 64K, so APP2 is used for it, and there can be more than one APP2 segment. However, FlashPix support is an extension of EXIF (described in Appendix F of [1]) and is rarely used, so this article does not discuss it. The APP1 segment defined by EXIF is a standard JPEG marker segment, as shown in Table 1. The value of the APP1 marker is FFE1. Length is the length of the segment; it includes the two bytes of the Length field itself but not the two bytes of the marker. The rest of the segment is the EXIF data.
The format of the EXIF data is also very simple, as shown in Table 2. It has two parts: the EXIF header and the TIFF header. The EXIF header is six bytes long: a four-character ASCIIZ string (an ASCII string terminated by NULL) with the content "Exif", followed by one more zero byte used to align the data. The TIFF header follows the standard TIFF file format definition (TIFF is also a flexible file format, to some extent too flexible), which allows the EXIF information in both JPEG and TIFF files to be processed in a consistent way.
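The marker and length layout described above can be sketched as a small segment walker. This is only an illustrative sketch, not code from the original article: the function name iter_segments and the STANDALONE set are my own, and the sketch assumes there are no fill bytes between segments and stops at the SOS marker, where the compressed image data begins.

```python
import struct

# Markers that stand alone and carry no length field: SOI, EOI, TEM, RST0-RST7.
STANDALONE = {0xFFD8, 0xFFD9, 0xFF01} | set(range(0xFFD0, 0xFFD8))

def iter_segments(data):
    """Yield (marker, payload) pairs up to the start of the scan data (SOS)."""
    pos = 0
    while pos + 2 <= len(data):
        (marker,) = struct.unpack_from(">H", data, pos)  # high byte first
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        pos += 2
        if marker in STANDALONE:
            yield marker, b""
            if marker == 0xFFD9:  # EOI: end of image
                return
            continue
        # The length WORD counts its own two bytes but not the marker.
        (length,) = struct.unpack_from(">H", data, pos)
        yield marker, data[pos + 2 : pos + length]
        pos += length
        if marker == 0xFFDA:  # SOS: entropy-coded data follows, stop here
            return
```

Filtering the result for marker 0xFFE1 would give the APP1 payloads, which is where the EXIF data lives.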
Offset    Length (bytes)    Content
0x00      2                 APP1 marker (0xFFE1)
0x02      2                 Length
0x04      Length - 2        EXIF data
Table 1: APP1 segment format definition
Offset    Length (bytes)     Content
0x00      6                  EXIF header
0x06      APP1 Length - 8    TIFF header
Table 2: EXIF format definition
Offset    Length (bytes)    Content
0x00      2                 Byte order
0x02      2                 Flag (0x2A)
0x04      4                 Offset of the first IFD
Table 3: TIFF image file header format definition
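Tables 1 and 2 can be read together as a short routine that peels the TIFF data out of a complete APP1 segment. The sketch below is an illustration under stated assumptions: the name split_app1 is hypothetical, and the input is assumed to be the whole segment, including the two marker bytes.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"  # 4-character identifier, NULL terminator, one pad byte

def split_app1(segment):
    """Split a full APP1 segment (marker included) into (length, tiff_bytes)."""
    marker, length = struct.unpack_from(">HH", segment, 0)
    if marker != 0xFFE1:
        raise ValueError("not an APP1 segment")
    exif_data = segment[4 : 2 + length]  # Length - 2 bytes of EXIF data (Table 1)
    if exif_data[:6] != EXIF_HEADER:
        raise ValueError("APP1 segment does not carry EXIF data")
    return length, exif_data[6:]         # Length - 8 bytes of TIFF data (Table 2)
```

The returned bytes start at the TIFF header, which is the origin for every offset used later in the file.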
The TIFF header [2] has two parts: the image file header and the IFD (Image File Directory) linked list. The image file header is defined in Table 3. Byte Order states the byte order used by this TIFF file. It is written as two characters, and there are two choices, II and MM (this MM has nothing to do with the internet slang "MM"): II means Intel byte order and MM means Motorola byte order (see below). Flag is the signature of the TIFF file format and is always 0x002A, that is, decimal 42. The final DWORD is the offset of the first IFD, measured from the start of the TIFF header; if the first IFD immediately follows the image file header, this value is 8 (the size of the image file header).
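As a rough sketch of how the eight bytes in Table 3 might be decoded, the function below returns a byte order prefix for Python's struct module together with the offset of the first IFD. The function name parse_tiff_header and the choice to return a struct prefix are my own assumptions, not part of the specification.

```python
import struct

def parse_tiff_header(tiff):
    """Return (struct_prefix, first_ifd_offset) for the 8-byte image file header."""
    order = tiff[:2]
    if order == b"II":
        prefix = "<"  # Intel order: low byte first
    elif order == b"MM":
        prefix = ">"  # Motorola order: high byte first
    else:
        raise ValueError("unknown byte order mark")
    # Flag is a WORD and the IFD offset a DWORD, both in the declared byte order.
    flag, first_ifd = struct.unpack_from(prefix + "HI", tiff, 2)
    if flag != 0x002A:
        raise ValueError("missing TIFF flag 42")
    return prefix, first_ifd
```

Returning the prefix lets every later read of the file reuse the same byte order without testing for II or MM again.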
Description of the byte order:
Byte order is something any exchangeable file format has to deal with. An "exchangeable file format" is one that can be interpreted correctly on a variety of software and hardware platforms, and the byte order problem originates in the hardware.
In the early days of the CPU (the 8-bit era), many 8-bit CPUs could handle 16-bit data through extended instructions, though of course in two steps. This raised the byte order question: should the high byte or the low byte be processed first? Different CPU vendors made different choices. Vendors such as Intel and Zilog store the low byte first, that is, the low byte of the data goes at the lower address. Motorola (which makes more than mobile phones; it is one of the world's largest electronics manufacturers) stores the high byte first, consistent with the usual human reading order. On the hardware side, the IBM PC and its compatibles, built on the Intel architecture, use the Intel order, while the Apple Mac, built on the PowerPC chip developed jointly by IBM, Motorola, and Apple, uses the Motorola order. Today the byte order problem is not limited to image formats: the Unicode character set (UCS) uses 16 bits (UCS-2) or 32 bits (UCS-4) to represent a character, so it faces the same problem.
Following the characteristics of each order, Intel's byte order is also called little-endian, and Motorola's byte order is called big-endian.
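A quick way to see the difference is to pack the same 16-bit value in both orders. This is a small sketch using the TIFF flag 0x002A as the example value; the helper name read_word is hypothetical.

```python
import struct

value = 0x002A  # the TIFF flag, decimal 42

little = struct.pack("<H", value)  # Intel order: low byte at the lower address
big = struct.pack(">H", value)     # Motorola order: high byte at the lower address

def read_word(raw, order):
    """Decode a 16-bit value given the TIFF byte order mark b"II" or b"MM"."""
    return int.from_bytes(raw, "little" if order == b"II" else "big")
```

Reading the two little-endian bytes with the wrong order gives 0x2A00 instead of 0x002A, which is why the byte order mark has to be checked before anything else in the file is decoded.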
Figure 1: IFD chain structure
The IFDs form a linked list, as shown in Figure 1. The end of each IFD holds the offset of the next IFD (again measured from the start of the TIFF header); an offset of 0 marks the end of the list. EXIF uses only two TIFF IFDs, called IFD0 and IFD1, but it defines three IFDs of its own: the EXIF IFD, the GPS IFD, and the Interoperability IFD. Their structure is the same as a standard TIFF IFD, but they are not recorded in the TIFF IFD linked list; instead they are attached as extension records of IFD0.
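The chain in Figure 1 can be followed with a short loop. Note that this sketch relies on a detail not yet covered in this part of the article: in the TIFF 6.0 layout, each IFD starts with a 2-byte entry count followed by 12-byte entries, and the link to the next IFD comes right after them. The function name ifd_offsets is my own.

```python
import struct

def ifd_offsets(tiff):
    """Return the offsets of every IFD in the chain, measured from the TIFF header."""
    prefix = "<" if tiff[:2] == b"II" else ">"
    (offset,) = struct.unpack_from(prefix + "I", tiff, 4)  # offset of the first IFD
    offsets = []
    while offset != 0 and offset not in offsets:  # second test guards against loops
        offsets.append(offset)
        (count,) = struct.unpack_from(prefix + "H", tiff, offset)
        # Skip the 2-byte entry count and the 12-byte entries to reach the link.
        (offset,) = struct.unpack_from(prefix + "I", tiff, offset + 2 + 12 * count)
    return offsets
```

For an EXIF file this would normally return two offsets, those of IFD0 and IFD1; the EXIF, GPS, and Interoperability IFDs do not appear because they are not part of this chain.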
(to be continued)