1 Introduction
Since its publication in 1982, RFC 822 has defined the standard format of textual mail messages on the Internet. The format has been so successful that it has been adopted, wholly or partially, well beyond the Internet and the Internet SMTP transport defined by RFC 821. As the format has seen wider use, a number of its limitations have proven increasingly restrictive for users.
RFC 822 was intended to specify a format for text messages. Non-text messages, such as multimedia messages containing audio or images, are therefore simply not mentioned. Even for text, RFC 822 is inadequate for users whose languages require character sets richer than US-ASCII. Because RFC 822 does not specify mechanisms for mail containing audio, video, Asian language text, or even text in most European languages, additional specifications are needed.
A notable limitation of mail systems based on RFC 821/822 is that they restrict the contents of electronic mail messages to relatively short lines (for example, 1000 characters or less [RFC821]) of 7-bit US-ASCII. This forces users to convert any non-text data they wish to send into 7-bit bytes representable as printable US-ASCII characters before invoking a local mail user agent (UA). Encodings currently used on the Internet for this purpose include pure hexadecimal, uuencode, the base 64 scheme specified in RFC 1421, the Andrew Toolkit Representation (ATK), and many others.
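As an illustration of the conversion described above, the following Python sketch encodes arbitrary bytes with base 64 and with pure hexadecimal, then checks that the results contain only printable 7-bit US-ASCII characters. The sample data and the helper name is_printable_ascii are illustrative assumptions, not part of RFC 822 or of this document.

```python
import base64

# Arbitrary non-text data: every possible byte value once (illustrative sample).
data = bytes(range(256))

# Two of the encodings named above: base 64 and pure hexadecimal.
b64_text = base64.b64encode(data)
hex_text = data.hex().encode("ascii")


def is_printable_ascii(encoded: bytes) -> bool:
    """Return True if every byte is a printable US-ASCII character (32 to 126)."""
    return all(32 <= b <= 126 for b in encoded)


# Both encodings are reversible, so the receiver can recover the original data.
restored_from_b64 = base64.b64decode(b64_text)
restored_from_hex = bytes.fromhex(hex_text.decode("ascii"))
```

Note that base64.b64encode returns one unbroken line. A real mailer would also have to wrap the output so that it respects the line-length limit.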
The limitations of RFC 822 mail become even more apparent when gateways are designed to exchange mail between RFC 822 hosts and X.400 hosts. X.400 [x400] specifies mechanisms for including non-text material in electronic mail messages. The current standards for mapping X.400 messages to RFC 822 messages specify that X.400 non-text material must either be converted to IA5Text format or be discarded, with the RFC 822 user notified that discarding has occurred. This is clearly undesirable, because information that a user may wish to receive is lost. Even when a user agent cannot handle the non-text material, the user might have some mechanism external to the UA that can extract useful information from it. Moreover, this approach does not allow for the possibility that the message may eventually be gatewayed back into an X.400 message handling system, where the non-text information would become useful again.
This document describes several mechanisms that combine to solve most of these problems without introducing any serious incompatibilities with existing RFC 822 mail. In particular, it describes:
(1) A "MIME-Version" header field, which uses a version number to declare that a message conforms to MIME and allows mail processing agents to distinguish such messages from those generated by older or non-conforming software, which are presumed to lack the field.
(2) A "Content-Type" header field, generalized from RFC 1049, which specifies the media type and subtype of the data in the body of a message and fully specifies the native representation (canonical form) of that data.
(3) A "Content-Transfer-Encoding" header field, which specifies both the encoding transformation applied to the body and the domain of the result. Encoding transformations other than the identity transformation are usually applied so that data can pass through mail transport mechanisms that have data or character set limitations.
(4) Two additional header fields, "Content-ID" and "Content-Description", which further describe the data in a body.
All of the header fields defined in this document are subject to the general syntactic rules for header fields specified in RFC 822. In particular, all of these header fields except "Content-Disposition" can include RFC 822 comments, which have no semantic content and should be ignored during MIME processing. Finally, to specify and promote interoperability, RFC 2049 provides a basic applicability statement for a subset of the above mechanisms, which defines a minimal level of conformance with this document.
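To make the header fields listed above concrete, here is a minimal sketch that reads them from a small message using Python's standard email package. The message text, addresses, and identifiers are hypothetical and chosen only for illustration. The package is one possible tool, not something this document prescribes.

```python
from email import message_from_string

# A hypothetical single-part MIME message; all values are illustrative.
raw = (
    "From: sender@example.com\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=ISO-8859-1\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "Content-ID: <greeting.1@example.com>\r\n"
    "Content-Description: A short greeting\r\n"
    "\r\n"
    "Caf=E9\r\n"
)

msg = message_from_string(raw)

mime_version = msg["MIME-Version"]           # declares MIME conformance
media_type = msg.get_content_type()          # "type/subtype", lowercased
charset = msg.get_content_charset()          # Content-Type parameter, lowercased
encoding = msg["Content-Transfer-Encoding"]  # transformation applied to the body
content_id = msg["Content-ID"]
description = msg["Content-Description"]

# Undo the transfer encoding, then map the resulting bytes to characters.
body_text = msg.get_payload(decode=True).decode(charset)
```

In this sketch the body travels as 7-bit text ("Caf=E9") and is restored to its original characters only after the transfer encoding is reversed and the declared character set is applied.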
Historical Note: Several of the mechanisms described in this set of documents may seem somewhat strange or even baroque at first reading. It is important to note that compatibility with existing standards and robustness across existing practice were two of the highest priorities of the working group that developed this set of documents. In particular, compatibility was always favored over elegance.
Please refer to the current edition of the "Internet Official Protocol Standards" for the standardization state and status of this protocol. RFC 822 and STD 3, RFC 1123 also provide essential background for MIME, since no conforming implementation of MIME can violate them. In addition, several other informational RFCs will be of interest to MIME implementors, in particular RFC 1344, RFC 1345, and RFC 1524.
2 Definitions, Conventions, and Generic BNF Grammar
Although the mechanisms specified in this set of documents are all described in prose, most are also described formally in the augmented BNF notation of RFC 822. Implementors need to be familiar with this notation to understand this set of documents, and are referred to RFC 822 for a complete explanation of it.
Some of the augmented BNF in this set of documents refers by name to syntax rules defined in RFC 822. A complete formal grammar is therefore obtained by combining the collected grammar appendices of each document in this set with the BNF of RFC 822 and the modifications to RFC 822 defined in RFC 1123 (which specifically changes the syntax for "return", "date", and "mailbox").
All numeric and byte values in this set of documents are given in decimal notation. All media type values, subtype values, and parameter names are case-insensitive. Parameter values, however, are case-sensitive unless otherwise specified for the specific parameter.
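The case rules for media types and parameters can be sketched as follows. The function parse_content_type is a hypothetical, deliberately simplified helper: it assumes that no semicolon or equals sign appears inside a quoted parameter value, and it is not a full implementation of the header grammar.

```python
def parse_content_type(value: str):
    """Split a Content-Type value into (type, subtype, parameters).

    Type, subtype, and parameter names are folded to lowercase because they
    are case-insensitive. Parameter values are kept exactly as written.
    """
    media, *raw_params = [piece.strip() for piece in value.split(";")]
    type_, _, subtype = media.partition("/")
    params = {}
    for raw in raw_params:
        if not raw:
            continue
        name, _, val = raw.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return type_.strip().lower(), subtype.strip().lower(), params


a = parse_content_type('Multipart/Mixed; BOUNDARY="AbC123"')
b = parse_content_type('multipart/mixed; boundary="AbC123"')
c = parse_content_type('multipart/mixed; boundary="abc123"')
```

Here a and b compare equal because they differ only in the case of the type, subtype, and parameter name, while c differs from both because the parameter value itself is compared case-sensitively.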
Formatting Note: Notes such as this one provide additional nonessential information that the reader may skip without missing anything essential. Their primary purpose is to convey the rationale behind this set of documents, or to place the documents in their proper historical or evolutionary context. Such information may be skipped by those focused entirely on building a conforming implementation, but may be useful to those who wish to understand why certain design choices were made.
2.1 CRLF
In this set of documents, the term CRLF refers to the sequence of bytes corresponding to the two US-ASCII characters CR (decimal value 13) and LF (decimal value 10), which, taken together in this order, denote a line break in RFC 822 mail.
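The CRLF sequence can be written down directly from its two decimal values. The sketch below also shows one possible way to rewrite locally stored text so that every line break becomes CRLF. The helper to_crlf and the choice to treat bare CR and bare LF as line breaks are assumptions made for illustration, not requirements stated in this section.

```python
import re

CR, LF = 13, 10          # decimal values of the two US-ASCII characters
CRLF = bytes([CR, LF])   # the two-byte line break used in RFC 822 mail


def to_crlf(data: bytes) -> bytes:
    """Rewrite CRLF, bare CR, and bare LF line breaks uniformly as CRLF."""
    return re.sub(rb"\r\n|\r|\n", CRLF, data)


normalized = to_crlf(b"a\nb\r\nc\rd")
```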
2.2 Character Set
In MIME, the term "character set" refers to a method of converting a sequence of bytes into a sequence of characters. Note that unconditional and unambiguous conversion in the other direction is not required: not all characters may be representable in a given character set, and a character set may provide more than one sequence of bytes to represent a particular sequence of characters.
This definition is intended to allow various kinds of character encodings to be used as character sets, from simple single-table mappings such as US-ASCII to complex table-switching methods such as those that use ISO 2022 techniques. However, the definition associated with a MIME character set name must fully specify the mapping to be performed. In particular, the use of external profiling information to determine the exact mapping is not permitted.
Note: The term "character set" was originally used to describe straightforward schemes such as US-ASCII and ISO-8859-1, which have a simple one-to-one mapping from single bytes to single characters. Multi-byte coded character sets and switching techniques make the situation more complex. For example, some communities use the term "character encoding" for what MIME calls a "character set", and use the phrase "coded character set" to denote an abstract mapping from integers (not bytes) to characters.
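The asymmetry in this definition can be seen in a short sketch using Python's built-in codecs. The sample strings are illustrative, and UTF-7 is used only as a convenient example of a character set in which two different byte sequences decode to the same characters. It is not named in this section.

```python
# Bytes to characters: the direction a MIME character set must define.
text = b"caf\xe9".decode("iso-8859-1")

# The reverse direction may fail: not every character is representable
# in a given character set.
try:
    "caf\u00e9".encode("us-ascii")
    representable = True
except UnicodeEncodeError:
    representable = False

# More than one byte sequence may represent the same characters.
direct = b"A".decode("utf-7")       # the letter written directly
shifted = b"+AEE-".decode("utf-7")  # the same letter in a shifted sequence
```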
2.3 Message
The term "message", when not further qualified, means either a complete (top-level) RFC 822 message being transferred on a network, or a message encapsulated in a body of type "message/rfc822" or "message/partial".
2.4 Entity
The term "entity" refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity. The specification of such entities is the essence of MIME. Since the contents of an entity are often called the "body", it makes sense to speak of the body of an entity. Any sort of field may be present in the header of an entity, but only those fields whose names begin with "Content-" actually have any MIME-related meaning. Note that this does not imply that the other fields have no meaning at all: an entity that is also a message has non-MIME header fields whose meanings are defined by RFC 822.
2.5 Body Part
The term "body part" refers to an entity inside a multipart entity.
2.6 Body
The term "body", when not further qualified, means the body of an entity, that is, the body of either a message or a body part.
Note: The previous four definitions are clearly circular. This is unavoidable, since the overall structure of a MIME message is indeed recursive.
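The recursive relationship between messages, entities, body parts, and bodies can be sketched with Python's standard email package. The sample message, its boundary string, and the helper name outline are hypothetical and serve only to illustrate the structure.

```python
from email import message_from_string

# A hypothetical multipart message containing two body parts.
raw = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="frontier"\r\n'
    "\r\n"
    "--frontier\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello.\r\n"
    "--frontier\r\n"
    "Content-Type: application/octet-stream\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n"
    "AAEC\r\n"
    "--frontier--\r\n"
)


def outline(entity, depth=0):
    """Return (depth, media type) for an entity and, recursively, its parts."""
    rows = [(depth, entity.get_content_type())]
    if entity.is_multipart():
        # The body of a multipart entity is a list of body parts,
        # and each body part is itself an entity.
        for part in entity.get_payload():
            rows.extend(outline(part, depth + 1))
    return rows


structure = outline(message_from_string(raw))
```

The top-level message appears at depth 0 and each body part at depth 1. A body part that was itself multipart would simply recurse one level deeper.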
2.7 7bit Data
"7bit data" refers to data that is represented entirely as relatively short lines, with 998 bytes or less between CRLF line separation sequences [RFC-821]. No bytes with decimal values greater than 127 are allowed, and neither are NULs (bytes with decimal value 0). CR (decimal value 13) and LF (decimal value 10) bytes occur only as part of CRLF line separation sequences.
2.8 8bit Data
"8bit data" refers to data that is represented entirely as relatively short lines, with 998 bytes or less between CRLF line separation sequences [RFC-821], but in which bytes with decimal values greater than 127 may be used. As with "7bit data", CR (decimal value 13) and LF (decimal value 10) bytes occur only as part of CRLF line separation sequences, and NULs (bytes with decimal value 0) are not allowed.
2.9 Binary Data
"Binary data" refers to data in which any sequence of bytes whatsoever is allowed.
2.10 Lines
"Lines" are defined as sequences of bytes separated by CRLF sequences. This is consistent with both RFC 821 and RFC 822. "Lines" refers only to a unit of data in a message, which may or may not correspond to something that is actually displayed by a user agent.