Multipurpose Internet Mail Extensions (MIME)
Part One: Format of Internet Message Bodies
(RFC 2045 - Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies)
Status of this Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Abstract
STD 11, RFC 822, defines a message representation protocol that specifies US-ASCII message headers in considerable detail and leaves the message content, or message body, as flat US-ASCII text. This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for:
(1) Textual message bodies in character sets other than US-ASCII
(2) An extensible set of different formats for non-textual message bodies
(3) Multi-part message bodies
(4) Textual header information in character sets other than US-ASCII
These documents are based on earlier work documented in RFC 934, STD 11, and RFC 1049, but extend and revise them. Because RFC 822 said so little about message bodies, these documents are largely orthogonal to RFC 822 rather than a revision of it.
This initial document specifies the various headers used to describe the structure of MIME messages. The second document, RFC 2046, defines the general structure of the MIME media typing system and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 that allow non-US-ASCII text in Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities. The fifth and final document, RFC 2049, describes MIME conformance criteria and provides some illustrative examples of MIME message formats, as well as the acknowledgements and the bibliography.
These documents are revisions of RFC 1521, RFC 1522, and RFC 1590, which were themselves revisions of RFC 1341 and RFC 1342. An appendix in RFC 2049 describes the differences and changes from previous versions.
Table of Contents
1. Introduction
2. Definitions, conventions, and generic BNF grammar
2.1 CRLF
2.2 Character Set
2.3 Message
2.4 Entity
2.5 Body Part
2.6 Body
2.7 7bit Data
2.8 8bit Data
2.9 Binary Data
2.10 Lines
3. MIME Header Fields
4. MIME-Version header field
5. Content-Type header field
5.1 Syntax of the Content-Type header field
5.2 Content-Type defaults
6. Content-Transfer-Encoding header field
6.1 Content-Transfer-Encoding syntax
6.2 Content-Transfer-Encoding semantics
6.3 New Content-Transfer-Encodings
6.4 Interpretation and use
6.5 Translating encodings
6.6 Canonical encoding model
6.7 Quoted-Printable Content-Transfer-Encoding
6.8 Base64 Content-Transfer-Encoding
7. Content-ID header field
8. Content-Description header field
9. Additional MIME header fields
10. Summary
11. Security considerations
12. Authors' addresses
Appendix A: Collected grammar
1. Introduction
Since its publication in 1982, RFC 822 has defined the standard format of textual mail messages on the Internet. The format has been so successful that it has been adopted, wholly or partially, well beyond the confines of the Internet and the Internet SMTP transport defined by RFC 821. As the format has seen wider use, a number of its limitations have proven increasingly restrictive for the user community.
RFC 822 was intended to specify a format for text messages. Non-text messages, such as multimedia messages that might include audio or images, are simply not mentioned. Even for text, RFC 822 is inadequate for mail users whose languages require character sets richer than US-ASCII. Since RFC 822 does not specify mechanisms for mail containing audio, video, Asian language text, or even text in most European languages, additional specifications are needed.
One of the notable limitations of RFC 821/822 based mail systems is that they limit the contents of electronic mail messages to relatively short lines (e.g. 1000 characters or less [RFC-821]) of 7bit US-ASCII. This forces users to convert any non-textual data they wish to send into 7-bit bytes representable as printable US-ASCII characters before invoking a local mail UA (User Agent). Examples of such encodings currently used in the Internet include pure hexadecimal, uuencode, the base 64 scheme specified in RFC 1421, the Andrew Toolkit Representation [ATK], and many others.
The limitations of RFC 822 mail become even more apparent when gateways are designed to exchange mail messages between RFC 822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for including non-textual material within electronic mail messages. The current standards for mapping X.400 messages to RFC 822 messages specify either that X.400 non-textual material must be converted to (not encoded in) IA5Text format, or that it must be discarded, with the RFC 822 user notified that discarding has occurred. This is clearly undesirable, as information that a user may wish to receive is lost. Even if a user agent cannot handle the non-textual material, the user might have some mechanism external to the UA that can extract useful information from it. Moreover, this approach does not allow for the fact that the message may eventually be gatewayed back into an X.400 message handling system, where the non-textual information would become useful again.
This document describes several mechanisms that combine to solve most of these problems without introducing any serious incompatibilities with the existing world of RFC 822 mail. In particular, it describes:
(1) A "MIME-Version" header field, which uses a version number to declare that a message conforms to MIME and allows mail processing agents to distinguish such messages from those generated by older or non-conformant software, which are presumed to lack such a field.
(2) A "Content-Type" header field, generalized from RFC 1049, which can be used to specify the media type and subtype of the data in the body of a message and to fully specify the native representation (canonical form) of such data.
(3) A "Content-Transfer-Encoding" header field, which can be used to specify both the encoding transformation that was applied to the body and the domain of the result. Encoding transformations other than the identity transformation are usually applied to data so that it can pass through mail transport mechanisms that have data or character set limitations.
(4) Two additional header fields, "Content-ID" and "Content-Description", which can be used to further describe the data in a body.
All of the header fields defined in this document are subject to the general syntactic rules for header fields specified in RFC 822. In particular, all of these header fields except for "Content-Disposition" can include RFC 822 comments, which have no semantic content and should be ignored during MIME processing.
Finally, to specify and promote interoperability, RFC 2049 provides a basic applicability statement for a subset of the above mechanisms that defines a minimal level of conformance with this document.
Historical Note: Several of the mechanisms described in this set of documents may seem somewhat strange or even baroque at first reading. It is important to note that compatibility with existing standards and robustness across existing practice were two of the highest priorities of the working group that developed this set of documents. In particular, compatibility was always favored over elegance.
Please refer to the current edition of the "Internet Official Protocol Standards" for the standardization state and status of this protocol. RFC 822 and STD 3, RFC 1123 also provide essential background for MIME, since no conforming implementation of MIME can violate them. In addition, several other RFC documents will be of interest to the MIME implementor, in particular RFC 1344, RFC 1345, and RFC 1524.
2. Definitions, conventions, and generic BNF grammar
Although the mechanisms specified in this set of documents are all described in prose, most are also described formally in the augmented BNF notation of RFC 822. Implementors will need to be familiar with this notation in order to understand this set of documents, and are referred to RFC 822 for a complete explanation of the augmented BNF notation.
Some of the augmented BNF in this set of documents refers by name to syntax rules defined in RFC 822. A complete formal grammar is therefore obtained by combining the collected grammar appendices of each document in this set with the BNF of RFC 822 plus the modifications to RFC 822 defined in RFC 1123 (which specifically changes the syntax for "return", "date", and "mailbox").
All numeric and byte values in this set of documents are given in decimal notation. All media type values, subtype values, and parameter names are case-insensitive. However, parameter values are case-sensitive unless otherwise specified for the specific parameter.
Formatting Note: Notes such as this one provide additional, nonessential information that the reader may skip without missing anything essential. The primary purpose of these notes is to convey the rationale behind this set of documents, or to place them in the proper historical or evolutionary context. Such information may be skipped by those who are focused entirely on building a conformant implementation, but may be of use to those who wish to understand why certain design choices were made.
2.1 CRLF
In this set of documents, the term CRLF refers to the sequence of bytes corresponding to the two US-ASCII characters CR (decimal value 13) and LF (decimal value 10) which, taken together in this order, denote a line break in RFC 822 mail.
2.2 Character Set
In MIME, the term "character set" refers to a method of converting a sequence of bytes into a sequence of characters. Note that unconditional and unambiguous conversion in the other direction is not required: not all characters may be representable in a given character set, and a character set may provide more than one sequence of bytes to represent a particular sequence of characters.
This definition is intended to allow various kinds of character encodings to be used as character sets, from simple single-table mappings such as US-ASCII to complex table-switching methods such as those that use the techniques of ISO 2022. However, the definition associated with a MIME character set name must fully specify the mapping to be performed. In particular, the use of external profiling information to determine the exact mapping is not permitted.
Note: The term "character set" was originally used to describe such straightforward schemes as US-ASCII and ISO-8859-1, which have a simple one-to-one mapping from single bytes to single characters. Multi-byte coded character sets and switching techniques make the situation more complex. For example, some communities use the term "character encoding" for what MIME calls a "character set", and use the phrase "coded character set" to refer to an abstract mapping from integers (not bytes) to characters.
2.3 Message
The term "message", when not further qualified, means either a (complete or "top-level") RFC 822 message being transferred on a network, or a message encapsulated in a body of type "message/rfc822" or "message/partial".
2.4 Entity
The term "entity" refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in the body of a multipart entity. The specification of such entities is the essence of MIME. Since the contents of an entity are often called the "body", it makes sense to speak of the body of an entity. Any sort of field may be present in the header of an entity, but only those fields whose names begin with "Content-" actually have any MIME-related meaning. Note that this does not imply that the other fields have no meaning at all: an entity that is also a message has non-MIME header fields whose meanings are defined by RFC 822.
2.5 Body Part
The term "body part" refers to an entity inside a multipart entity.
2.6 Body
The term "body", when not further qualified, means the body of an entity, that is, the body of either a message or a body part.
Note: The previous four definitions are clearly circular. This is unavoidable, since the overall structure of a MIME message is indeed recursive.
2.7 7bit Data
"7bit data" refers to data that is all represented as relatively short lines, with 998 bytes or fewer between CRLF line separation sequences [RFC-821]. No bytes with decimal values greater than 127 are allowed, and neither are NULs (bytes with decimal value 0). CR (decimal value 13) and LF (decimal value 10) bytes occur only as part of CRLF line separation sequences.
2.8 8bit Data
"8bit data" refers to data that is all represented as relatively short lines, with 998 bytes or fewer between CRLF line separation sequences [RFC-821], but bytes with decimal values greater than 127 may be used. As with "7bit data", CR (decimal value 13) and LF (decimal value 10) bytes occur only as part of CRLF line separation sequences, and no NULs (decimal value 0) are allowed.
2.9 Binary Data
"Binary data" refers to data in which any sequence of bytes whatsoever is allowed.
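The three data domains defined in sections 2.7 to 2.9 can be told apart mechanically. The following Python sketch is one possible reading of those definitions; the function name `classify_domain` is hypothetical and is not part of any MIME specification or library.

```python
def classify_domain(data: bytes) -> str:
    """Return "7bit", "8bit", or "binary" for a block of body data.

    Illustrative sketch only: it applies the line-length, NUL, and
    CR/LF rules of sections 2.7 and 2.8, and falls back to "binary".
    """
    # NUL bytes are excluded from both line-oriented domains.
    if b"\x00" in data:
        return "binary"
    for line in data.split(b"\r\n"):
        # After splitting on CRLF, any remaining CR or LF is a bare one.
        if b"\r" in line or b"\n" in line:
            return "binary"
        # At most 998 bytes are allowed between CRLF separators.
        if len(line) > 998:
            return "binary"
    # Line structure is acceptable; the byte range decides 7bit vs 8bit.
    if any(b > 127 for b in data):
        return "8bit"
    return "7bit"
```

For example, `classify_domain(b"Hello\r\nWorld\r\n")` yields "7bit", while the same text with a bare LF instead of CRLF is classified as "binary".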
2.10 Lines
"Lines" are defined as sequences of bytes separated by CRLF sequences. This is consistent with both RFC 821 and RFC 822. "Lines" refers only to a unit of data in a message, which may or may not correspond to something that is actually displayed by a user agent.
3. MIME Header Fields
MIME defines a number of new RFC 822 header fields that are used to describe the content of a MIME entity. These header fields occur in at least two contexts:
(1) As part of a regular RFC 822 message header.
(2) In a MIME body part header within a multipart construct.
The formal definition of these header fields is as follows:
     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     MIME-message-headers := entity-headers
                             fields
                             version CRLF
                             ; The ordering of the header
                             ; fields implied by this BNF
                             ; definition should be ignored.

     MIME-part-headers := entity-headers
                          [ fields ]
                          ; Any field not beginning with
                          ; "content-" can have no defined
                          ; meaning and may be ignored.
                          ; The ordering of the header
                          ; fields implied by this BNF
                          ; definition should be ignored.
The syntax of the various specific MIME header fields is described in the following sections.
4. MIME-Version header field
Since RFC 822 was published in 1982, there has really been only one format standard for Internet messages, and there has been little perceived need to declare the format standard in use. This document is an independent specification that complements RFC 822. Although the extensions in this document have been defined in such a way as to be compatible with RFC 822, a mail-processing agent may still need to know whether a message was composed with the new standard in mind.
Therefore, this document defines a new header field, "MIME-Version", which is used to declare the version of the Internet message body format standard in use.
Messages composed in accordance with this document must include this header field, with the following verbatim text:
     MIME-Version: 1.0
The presence of this header field is an assertion that the message has been composed in compliance with this document.
Since a future document might extend the message format standard again, a formal BNF is given for the content of the MIME-Version field:
     version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
Thus, future format specifiers, which might replace or extend "1.0", are constrained to be two integer fields separated by a period. If a message is received with a MIME-Version value other than "1.0", it cannot be assumed to conform to this document.
It is also worth noting that version control for specific media types is not accomplished using the MIME-Version mechanism. In particular, some formats (such as application/postscript) have version numbering conventions that are internal to the media format. Where such conventions exist, MIME does nothing to supersede them. Where no such conventions exist, a MIME media type might use a "version" parameter in the "Content-Type" field if necessary.
Note to implementors: When checking MIME-Version values, any RFC 822 comment strings that are present must be ignored. In particular, the following four MIME-Version fields are equivalent:
     MIME-Version: 1.0
     MIME-Version: 1.0 (produced by MetaSend Vx.x)
     MIME-Version: (produced by MetaSend Vx.x) 1.0
     MIME-Version: 1.(produced by MetaSend Vx.x)0
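The equivalence of the four forms above can be checked with a small amount of code. The Python sketch below strips RFC 822 comments (which may nest) and then matches the `1*DIGIT "." 1*DIGIT` rule. The helper names `strip_comments` and `parse_mime_version` are hypothetical, and the handling of white space is deliberately lenient rather than a strict RFC 822 tokenizer.

```python
import re


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments (parenthesized, possibly nested) from a field value."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # A backslash quotes the next character; keep the pair
            # only when it appears outside a comment.
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_mime_version(value: str):
    """Return (major, minor), or None if the value does not look like a version.

    Assumption: all white space is removed before matching, which is
    more permissive than a strict parser would be.
    """
    cleaned = re.sub(r"\s+", "", strip_comments(value))
    match = re.fullmatch(r"(\d+)\.(\d+)", cleaned)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

With this sketch, each of the four field values listed above parses to the same pair, `(1, 0)`.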
In the absence of a MIME-Version field, a receiving mail user agent (whether it conforms to MIME requirements or not) may optionally choose to interpret the body of the message according to local conventions. Many such conventions are currently in use, and it should be noted that in practice non-MIME messages can contain just about anything.
It is impossible to be certain that a non-MIME mail message is actually plain text in the US-ASCII character set. It might well be a message that, using some set of nonstandard local conventions that predate MIME, includes text in another character set or non-textual data presented in a manner that cannot be automatically recognized (e.g., a uuencoded compressed UNIX tar file).
5. Content-Type header field
The purpose of the "Content-Type" field is to describe the data contained in the body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner. The value in this field is called a "media type".
Historical Note: The "Content-Type" header field was first defined in RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one that is largely compatible with the mechanism given here.
The "Content-Type" header field specifies the nature of the data in the body of an entity by giving media type and subtype identifiers, and by providing auxiliary information that may be required for certain media types. After the media type and subtype names, the remainder of the header field is simply a set of parameters, specified in an attribute=value notation. The ordering of parameters is not significant.
In general, the top-level media type is used to declare the general type of data, while the subtype specifies a specific format for that type of data. Thus, a media type of "image/xyz" is enough to tell a user agent that the data is an image, even if the user agent has no knowledge of the specific image format "xyz". Such information can be used, for example, to decide whether or not to show a user the raw data from an unrecognized subtype. Such an action might be reasonable for unrecognized subtypes of text, but not for unrecognized subtypes of image or audio. For this reason, registered subtypes of text, image, audio, and video should not contain embedded information that is really of a different type. Such compound formats should be represented using the "multipart" or "application" types.
Parameters are modifiers of the media subtype, and as such do not fundamentally affect the nature of the content. The set of meaningful parameters depends on the media type and subtype. Most parameters are associated with a single specific subtype. However, a given top-level media type may define parameters that are applicable to any subtype of that type. Parameters may be required by their defining content type or subtype, or they may be optional. MIME implementations must ignore any parameters whose names they do not recognize.
For example, the "charset" parameter is applicable to any subtype of "text", while the "boundary" parameter is required for any subtype of the "multipart" media type.
There are no globally meaningful parameters that apply to all media types. Truly global mechanisms are best addressed, in the MIME model, by the definition of additional "Content-*" header fields.
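As an informal illustration of how a receiver might read the media type and its parameters, the sketch below uses Python's standard `email` package. The sample messages and the parameter name `x-unknown` are invented for the example; the point is only that type and parameter names are matched without regard to case and that unrecognized parameters are simply never looked at.

```python
from email import message_from_string

# A text entity with a charset parameter and one unrecognized parameter.
text_msg = message_from_string(
    "MIME-Version: 1.0\n"
    "Content-Type: Text/Plain; Charset=ISO-8859-1; x-unknown=ignored\n"
    "\n"
    "body text\n"
)

media_type = text_msg.get_content_type()      # normalized to lower case
main_type = text_msg.get_content_maintype()
charset = text_msg.get_param("CHARSET")       # name lookup ignores case
missing = text_msg.get_param("no-such-parameter")

# A multipart entity, where the boundary parameter is required.
multi_msg = message_from_string(
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="simple boundary"\n'
    "\n"
)
boundary = multi_msg.get_boundary()           # quotes are not part of the value
```

Here `media_type` is "text/plain", `charset` keeps the spelling used by the sender, and `missing` is `None` because no such parameter was supplied.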
An initial set of seven top-level media types is defined in RFC 2046. Five of these are discrete types whose content is essentially opaque as far as MIME processing is concerned. The remaining two are composite types whose contents require additional handling by MIME processors.
This set of top-level media types is intended to be substantially complete. It is expected that additions to the larger set of supported types can generally be accomplished by creating new subtypes of these initial types. In the future, more top-level types may be defined only by a standards-track extension to this standard. If another top-level type is to be used for any reason, it must be given a name starting with "X-" to indicate its non-standard status and to avoid a potential conflict with a future official name.
5.1 Syntax of the Content-Type header field
In the augmented BNF notation of RFC 822, a "Content-Type" header field value is defined as follows:
     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <">
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values
Note that the definition of "tspecials" is the same as the RFC 822 definition of "specials", with the addition of the three characters "/", "?", and "=", and the removal of ".".
Note also that a subtype specification is mandatory: it may not be omitted from a "Content-Type" header field. As such, there are no default subtypes.
The type, subtype, and parameter names are not case-sensitive. For example, "TEXT", "Text", and "TeXt" are all equivalent top-level media types. Parameter values are normally case-sensitive, but are sometimes interpreted in a case-insensitive fashion, depending on the intended use. (For example, multipart boundaries are case-sensitive, but the "access-type" parameter is not.)
Note that the value of a quoted-string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not part of the value of the parameter, but merely delimit that value. In addition, comments are allowed in accordance with the RFC 822 rules for structured header fields. Thus the following two forms are completely equivalent:
     Content-type: text/plain; charset=us-ascii (Plain text)
     Content-type: text/plain; charset="us-ascii"
Beyond this syntax, the only syntactic constraint on the definition of subtype names is that their uses must not conflict. That is, it would be undesirable to have two different communities using "Content-Type: application/foobar" to mean two different things. The process of defining new media subtypes is therefore not intended to be a mechanism for imposing restrictions, but simply a mechanism for publicizing their definition and usage.
There are, therefore, two acceptable mechanisms for defining new media subtypes:
(1) Private values (starting with "X-") may be defined bilaterally between two cooperating agents without outside registration or standardization. Such values cannot be registered or standardized.
(2) New standard values should be registered with IANA as described in RFC 2048.
The second document in this set, RFC 2046, defines the initial set of media types for MIME.
5.2 Content-Type defaults
RFC 822 messages without a "Content-Type" header field are taken by this protocol to be plain text in the US-ASCII character set, which can be explicitly specified as:
     Content-type: text/plain; charset=us-ascii
This default is assumed if no "Content-Type" header field is specified. It is also recommended that this default be assumed when a syntactically invalid "Content-Type" header field is encountered. In the presence of a "MIME-Version" header field and the absence of any "Content-Type" header field, a receiving user agent can also assume that plain US-ASCII text was the sender's intent. Plain US-ASCII text may still be assumed in the absence of a "MIME-Version" header field or in the presence of a syntactically invalid "Content-Type" header field, but the sender's intent might have been otherwise.
6. Content-Transfer-Encoding header field
Many media types that could usefully be transported via email are represented, in their "natural" format, as 8bit character or binary data. Such data cannot be transmitted over some transfer protocols. For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII data with lines no longer than 1000 characters, including any trailing CRLF line separator.
It is necessary, therefore, to define a standard mechanism for encoding such data into a 7bit short-line format. Proper labelling of unencoded material in less restrictive formats, for direct use over less restrictive transports, is also desirable.
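To make the idea of a 7bit short-line encoding concrete, the sketch below runs data through the two encodings this document goes on to define, base64 and quoted-printable, using Python's standard `base64` and `quopri` modules. The helper `looks_7bit` and the sample data are assumptions made for this illustration; the 76-character limit it checks is the line length these standard-library encoders aim for, which is well inside the SMTP limit mentioned above.

```python
import base64
import quopri


def looks_7bit(encoded: bytes, max_line: int = 76) -> bool:
    """True if every byte is non-NUL US-ASCII and no line exceeds max_line bytes."""
    if any(b == 0 or b > 127 for b in encoded):
        return False
    return all(len(line) <= max_line for line in encoded.splitlines())


# Arbitrary binary data, including NUL and bytes above 127.
binary_data = bytes(range(256))
# Mostly-ASCII text with a few 8bit characters (ISO-8859-1).
latin1_text = "Gr\u00fc\u00dfe aus K\u00f6ln".encode("iso-8859-1")

b64 = base64.encodebytes(binary_data)
qp = quopri.encodestring(latin1_text)

# Both encodings stay within US-ASCII and decode back to the original bytes.
b64_ok = looks_7bit(b64) and base64.decodebytes(b64) == binary_data
qp_ok = looks_7bit(qp) and quopri.decodestring(qp) == latin1_text
```

Note that these standard-library encoders end lines with a bare LF; a mail generator would still have to emit CRLF line separators on the wire.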
This document specifies that such encodings will be indicated by a new "Content-Transfer-Encoding" header field. This field has not been defined by any previous standard.
6.1 Content-Transfer-Encoding syntax
The value of the "Content-Transfer-Encoding" field is a single token specifying the type of encoding, as enumerated below. Formally:
     encoding := "Content-Transfer-Encoding" ":" mechanism

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token
These values are not case-sensitive: "Base64", "BASE64", and "bAsE64" are all equivalent. An encoding type of "7bit" requires that the body is already in a 7bit mail-ready representation. This is the default value; that is, "Content-Transfer-Encoding: 7bit" is assumed if the "Content-Transfer-Encoding" header field is not present.
6.2 Content-Transfer-Encoding semantics
This single "Content-Transfer-Encoding" token actually provides two pieces of information. It specifies what sort of encoding transformation the body was subjected to, and hence what decoding operation must be used to restore it to its original form, and it specifies the domain of the result.
The transformation part of any Content-Transfer-Encoding specifies, either explicitly or implicitly, a single, well-defined decoding algorithm. For any sequence of encoded bytes, this algorithm either transforms it into the original sequence of bytes that was encoded, or shows that it is illegal as an encoded sequence. Content-Transfer-Encoding transformations never depend on any additional external profile information for proper operation. Note that while decoders must produce a single, well-defined output for a valid encoding, no such restriction exists for encoders: encoding a given sequence of bytes to different, equivalent encoded sequences is perfectly legal.
Three transformations are currently defined: identity, the "quoted-printable" encoding, and the "base64" encoding. The domains are "binary", "8bit", and "7bit".
The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all mean that the identity (i.e. no) encoding transformation has been performed. As such, they serve simply as indicators of the domain of the body data, and provide useful information about the sort of encoding that might be needed for transmission in a given transport system. The terms "7bit data", "8bit data", and "binary data" are all defined in Section 2.
The quoted-printable and base64 encodings transform their input from an arbitrary domain into material in the "7bit" range, thus making it safe to carry over restricted transports. The specific definitions of the transformations are given below.
The proper "Content-Transfer-Encoding" label must always be used. Labelling unencoded data containing 8bit characters as "7bit" is not allowed, nor is labelling unencoded non-line-oriented data as anything other than "binary".
Unlike media subtypes, a proliferation of Content-Transfer-Encoding values is both undesirable and unnecessary. However, establishing only a single transformation into the "7bit" domain does not seem possible. There is a tradeoff between the desire for a compact and efficient encoding of largely binary data and the desire for a somewhat readable encoding of data that is mostly, but not entirely, 7bit. For this reason, at least two encoding mechanisms are necessary: a more or less readable encoding (quoted-printable) and a "dense" or "uniform" encoding (base64).
Mail transport for unencoded 8bit data is defined in RFC 1652. As of the initial publication of this document, there are no standardized Internet mail transports for which it is legitimate to include unencoded binary data in mail bodies. Thus there are no circumstances in which the "binary" Content-Transfer-Encoding is actually valid in Internet mail. However, in the event that binary mail transport becomes a reality in Internet mail, or when MIME is used in conjunction with any other binary-capable mail transport mechanism, binary bodies must be labelled as such using this mechanism.
Note: The five values defined for the "Content-Transfer-Encoding" field imply nothing about the media type other than the algorithm by which it was encoded or the transport system requirements if unencoded.
6.3 New Content-Transfer-Encodings
Implementors may, if necessary, define private Content-Transfer-Encoding values, but must use an x-token, that is, a name prefixed by "X-", to indicate its non-standard status, e.g., "Content-Transfer-Encoding: x-my-new-encoding". Additional standardized Content-Transfer-Encoding values must be specified by a standards-track RFC. The requirements such specifications must meet are given in RFC 2048. As such, all of the Content-Transfer-Encoding namespace except that beginning with "X-" is explicitly reserved to the IETF for future use.
Unlike media types and subtypes, the creation of new Content-Transfer-Encoding values is strongly discouraged, as it seems likely to hinder interoperability with little potential benefit.
6.4 Interpretation and use
If a "Content-Transfer-Encoding" header field appears as part of a message header, it applies to the entire body of that message. If a "Content-Transfer-Encoding" header field appears as part of an entity's headers, it applies only to the body of that entity. If an entity is of type "multipart", the Content-Transfer-Encoding is not permitted to have any value other than "7bit", "8bit", or "binary". Even more severe restrictions apply to some subtypes of the "message" type.
It should be noted that most media types are defined in terms of bytes rather than bits, so the mechanisms described here are mechanisms for encoding arbitrary byte streams, not bit streams. If a bit stream is to be encoded via one of these mechanisms, it must first be converted to an 8bit byte stream using the network standard bit order ("big-endian"), in which the earlier bits in a stream become the higher-order bits in an 8bit byte. A bit stream not ending at an 8bit boundary must be padded with zeroes. RFC 2046 provides a mechanism for noting the addition of such padding in the case of the "application/octet-stream" media type, which has a "padding" parameter.
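The bit-stream conversion described above can be sketched in a few lines of Python. The function name `pack_bits` is hypothetical; the returned padding count corresponds to the number of zero bits appended, which is the kind of information the "padding" parameter is meant to record.

```python
def pack_bits(bits):
    """Pack 0/1 values into bytes, earliest bit in the highest-order position.

    Returns (data, padding), where padding is the number of zero bits
    appended to reach an 8-bit boundary.
    """
    bits = list(bits)
    padding = (-len(bits)) % 8
    bits += [0] * padding
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            # Shift left so that earlier bits end up in higher positions.
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out), padding
```

For instance, the three-bit stream 1, 0, 1 becomes the single byte 0xA0 with five bits of padding.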
The encoding mechanisms defined here explicitly encode all data in US-ASCII. Thus, for example, suppose an entity has header fields such as:
     Content-Type: text/plain; charset=ISO-8859-1
     Content-Transfer-Encoding: base64
This must be interpreted to mean that the body is a base64 US-ASCII encoding of data that was originally in ISO-8859-1, and will be in that character set again after decoding.
Certain Content-Transfer-Encoding values may only be used on certain media types. In particular, it is expressly forbidden to use any encodings other than "7bit", "8bit", or "binary" with any composite media type, i.e. one that recursively includes other "Content-Type" fields. Currently the only composite media types are "multipart" and "message". All encodings that are desired for bodies of type "multipart" or "message" must be done at the innermost level, by encoding the actual body that needs to be encoded.
It should also be noted that, by definition, if a composite entity has a transfer-encoding value such as "7bit", but one of the enclosed entities has a less restrictive value such as "8bit", then either the outer "7bit" labelling is in error, because 8bit data are included, or the inner "8bit" labelling places an unnecessarily high demand on the transport system, because the data actually included are 7bit-safe.
Note on encoding restrictions: Though the prohibition against using Content-Transfer-Encodings on composite body data may seem overly restrictive, it is necessary to prevent nested encodings, in which data are passed through an encoding algorithm multiple times and must be decoded multiple times in order to be properly viewed. Nested encodings add considerable complexity to user agents: aside from the obvious efficiency problems with such multiple encodings, they can obscure the basic structure of a message. In particular, they can imply that several decoding operations are necessary simply to find out what types of bodies a message contains.
Banning nested encodings may complicate the job of certain mail gateways, but this is a smaller problem than the effect nested encodings would have on user agents.
Any entity with an unrecognized Content-Transfer-Encoding must be treated as if it had a Content-Type of "application/octet-stream", regardless of what the Content-Type header field actually says.
Note on the relationship between Content-Type and Content-Transfer-Encoding: It may seem that the Content-Transfer-Encoding could be inferred from the characteristics of the media being encoded, or at least that certain Content-Transfer-Encodings could be mandated for use with specific media types. This is not the case, for several reasons.
First, given the varying types of transport used for mail, some encodings may be appropriate for some combinations of media type and transport but not for others. (For example, over an 8-bit transport no encoding is required for text in certain character sets, while such encoding is clearly required over a 7-bit transport.)
Second, a given media type may require different transfer encodings under different circumstances. For example, many PostScript bodies consist entirely of short lines of 7-bit data and hence require no encoding at all. Other PostScript bodies (especially those using the Level 2 PostScript binary encoding mechanism) may only be reasonably represented using a binary transport encoding.
Finally, since the Content-Type field is intended to be an open-ended specification mechanism, strictly associating media types with encodings would couple the specification of an application protocol to a specific lower-level transport. This is undesirable, because the designers of a media type should not have to be aware of every transport in use and its limitations.
6.5 Translating Encodings
The quoted-printable and base64 encodings are designed so that conversion between them is possible.
The only issue that arises in such a conversion is the handling of hard line breaks in quoted-printable output. When converting from quoted-printable to base64, a hard line break in the quoted-printable form represents a CRLF sequence in the canonical form of the data. It must therefore be converted to the corresponding encoded CRLF in the base64 form of the data. Similarly, a CRLF sequence in the canonical form of the data obtained after base64 decoding must be converted to a quoted-printable hard line break, but only when converting text data.
6.6 Canonical Encoding Model
Previous versions of this RFC caused some confusion about when mail data are converted to canonical form and encoded, how this process treats CRLF sequences (given that the representation of line breaks varies greatly from system to system), and how content-transfer-encodings relate to character sets. For this reason, RFC 2049 presents a canonical model for encoding.
6.7 Quoted-Printable Content-Transfer-Encoding
The quoted-printable encoding is intended for data that consist largely of octets corresponding to printable characters in the US-ASCII character set. It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly US-ASCII text, the encoded form remains largely recognizable by humans. A body that is entirely US-ASCII may also be encoded in quoted-printable to ensure the integrity of the data should the message pass through a character-translating and/or line-wrapping gateway.
In this encoding, octets are represented according to the following rules:
(1) (General 8-bit representation) Any octet, except a CR or LF that is part of a CRLF line break in the canonical (standard) form of the data being encoded, may be represented by an "=" followed by a two-digit hexadecimal representation of the octet's value. The digits of the hexadecimal alphabet, for this purpose, are "0123456789ABCDEF". Uppercase letters must be used; lowercase letters are not allowed.
Thus, for example, the decimal value 12 (US-ASCII form feed) can be represented as "=0C", and the decimal value 61 (US-ASCII equal sign) can be represented as "=3D". This rule must be followed except where the following rules allow an alternative encoding.
(2) (Literal representation) Octets with decimal values 33 through 60 inclusive, and 62 through 126 inclusive, may be represented as the US-ASCII characters that correspond to those octets (exclamation point "!" through less-than sign "<", and greater-than sign ">" through tilde "~", respectively).
(3) (White space) Octets with decimal values 9 and 32 may be represented as US-ASCII TAB (HT) and SPACE characters, respectively, but must not be so represented at the end of an encoded line. Any TAB (HT) or SPACE character on an encoded line must therefore be followed on that line by a printable character. In particular, an "=" at the end of an encoded line, indicating a soft line break (see rule 5), may follow one or more TAB (HT) or SPACE characters. It follows that an octet with decimal value 9 or 32 appearing at the end of an encoded line must be represented according to rule (1). This rule is necessary because some MTAs (Message Transport Agents, programs that transport messages from one user to another or perform part of such a transfer) are known to pad lines of text with SPACEs, while others are known to remove white space from the end of a line. Therefore, when decoding a quoted-printable body, any trailing white space on a line must be deleted, since it will necessarily have been added by an intermediate transport agent.
(4) (Line breaks) A line break in a text body, represented as a CRLF sequence in the canonical form of the text, must be represented by an RFC 822 line break, which is also a CRLF sequence, in the quoted-printable encoding. Since the canonical representations of media types other than text do not generally represent line breaks as CRLF sequences, no hard line breaks (that is, line breaks that are meaningful and are to be displayed to the user) can occur in the quoted-printable encoding of such types.
Sequences such as "=0D", "=0A", "=0A=0D", and "=0D=0A" will therefore routinely appear in non-text data encoded in quoted-printable.
Note that many implementations may choose to encode the local representation of various content types directly, rather than converting to canonical form first, encoding, and then converting back to the local representation. In particular, this may apply to plain text on systems that use a line termination convention other than CRLF. Such an optimization is permissible, but only when the combined canonicalization-and-encoding step is equivalent to performing the three steps separately.
(5) (Soft line breaks) The quoted-printable encoding requires that encoded lines be no more than 76 characters long. If longer lines are to be encoded, "soft" line breaks must be used. An equal sign ("=") as the last character on an encoded line indicates such a non-significant (soft) line break in the encoded text.
Thus, if the raw form of the data is a single unencoded line that says:
     Now's the time for all folk to come to the aid of their country.
then it can be represented in the quoted-printable encoding as:
     Now's the time =
     for all folk to come=
      to the aid of their country.
This provides a mechanism by which long lines are encoded in such a way that the user agent can restore them. The 76-character limit does not count the trailing CRLF, but does count all other characters, including any equal signs.
Since the hyphen character ("-") may be represented as itself in quoted-printable, care must be taken, when encapsulating a quoted-printable body inside one or more multipart entities, to ensure that the boundary delimiter does not appear anywhere in the encoded body. (A good strategy is to choose a boundary that includes a character sequence such as "=_", which can never appear in a quoted-printable body. See the definition of multipart messages in RFC 2046.)
Note: The quoted-printable encoding represents a compromise between readability and reliability in transport.
Bodies encoded with quoted-printable will work reliably over most mail gateways, but may not work perfectly over a few gateways, notably those involving translation into EBCDIC. The base64 Content-Transfer-Encoding offers a higher level of confidence. One way to get reasonably reliable transport through EBCDIC gateways is to also encode the following US-ASCII characters according to rule (1):
     !"#$@[\]^`{|}~
Because quoted-printable data are generally assumed to be line-oriented, the representation of the breaks between lines may be altered in transport, in the same way that plain text mail has always been altered when passing between systems with differing line break conventions. If such alterations would corrupt the data, it is more sensible to use base64 rather than quoted-printable.
Note: Several kinds of substrings cannot be generated by the quoted-printable encoding rules, and are therefore formally illegal if they appear in the output of a quoted-printable encoder. This note lists these cases and suggests ways to handle such illegal substrings if they are encountered in quoted-printable data that is to be decoded.
(1) An "=" followed by two hexadecimal digits, one or both of which are lowercase letters in "abcdef", is formally illegal. A robust implementation might choose to recognize them as the corresponding uppercase letters.
(2) An "=" followed by a character that is neither a hexadecimal digit (including "abcdef") nor the CR character of a CRLF pair is illegal. This case can arise when US-ASCII text has been included in a quoted-printable part of a message without itself having been quoted-printable encoded. A reasonable approach for a robust implementation is to include the "=" and the following character in the decoded data without any transformation and, if possible, to indicate to the user that proper decoding was not possible at this point in the data.
(3) An "=" cannot be the last or second-to-last character in an encoded object. This can be handled as in case (2) above.
(4) Control characters other than TAB, or CR and LF as parts of CRLF pairs, must not appear. The same is true for octets with decimal values greater than 126. If a decoder finds them in incoming quoted-printable data, a robust implementation might exclude them from the decoded data and warn the user that illegal characters were discovered.
(5) Encoded lines must not be longer than 76 characters, not counting the trailing CRLF. If longer lines are found in incoming encoded data, a robust implementation might nevertheless decode them, and might report the erroneous encoding to the user.
Warning to implementors: If binary data are encoded in quoted-printable, care must be taken to encode CR and LF characters as "=0D" and "=0A", respectively. In particular, a CRLF sequence in binary data should be encoded as "=0D=0A". Otherwise, if CRLF were represented as a hard line break, it might be decoded incorrectly on platforms with different line break conventions.
The syntax of quoted-printable data is described by the following grammar:
     quoted-printable := qp-line *(CRLF qp-line)

     qp-line := *(qp-segment transport-padding CRLF)
                qp-part transport-padding

     qp-part := qp-section
                ; Maximum length of 76 characters

     qp-segment := qp-section *(SPACE / TAB) "="
                   ; Maximum length of 76 characters

     qp-section := [*(ptext / SPACE / TAB) ptext]

     ptext := hex-octet / safe-char

     safe-char := <any octet with decimal value of 33 through
                  60 inclusive, and 62 through 126>
                  ; Characters not listed as "mail-safe" in
                  ; RFC 2049 are also not recommended.

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC 2049 as "mail-safe".

     transport-padding := *LWSP-char
                          ; Composers must not generate
                          ; non-zero length transport
                          ; padding, but receivers must
                          ; be able to handle padding
                          ; added by message transports.
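The encoding rules and the warning about binary data can be sketched as a small encoder. This is a minimal sketch under stated assumptions, not a reference implementation: the function name qp_encode_binary is hypothetical, the input is treated as binary (so every CR and LF is encoded with rule (1) and no hard line breaks are produced), and soft line breaks are inserted greedily.

```python
def qp_encode_binary(data: bytes, max_len: int = 76) -> str:
    """Quoted-printable encode binary data (hypothetical helper).

    Assumption: the data are binary, so CR and LF are always written
    as "=0D" and "=0A" and the only line breaks are soft ones.
    """
    tokens = []
    for i, b in enumerate(data):
        is_last = i == len(data) - 1
        if 33 <= b <= 60 or 62 <= b <= 126:
            tokens.append(chr(b))            # rule (2): literal representation
        elif b in (9, 32) and not is_last:
            tokens.append(chr(b))            # rule (3): TAB/SPACE inside the data
        else:
            tokens.append("=%02X" % b)       # rule (1): uppercase hex escape

    lines, line = [], ""
    for tok in tokens:
        # Rule (5): keep one column free for the "=" of a soft line break.
        if len(line) + len(tok) > max_len - 1:
            lines.append(line + "=")
            line = ""
        line += tok
    lines.append(line)
    return "\r\n".join(lines)
```

With this sketch, the three octets of "a=b" become "a=3Db", a CRLF pair in the input becomes "=0D=0A", and a trailing space is written as "=20" so that no encoded line ends in white space. A TAB or SPACE that falls just before a soft line break is left literal, because the "=" that follows it is a printable character.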
Important: The addition of LWSP between the elements shown in this BNF is not allowed, since this BNF does not specify a structured header field.
6.8 Base64 Content-Transfer-Encoding
The base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable. The encoding and decoding algorithms are simple, and the encoded data are consistently only about 33 percent larger than the unencoded data. This encoding is virtually identical to the one used in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
A 65-character subset of US-ASCII is used, allowing 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)
Note: This subset has the important property that it is represented identically in all versions of ISO 646, including US-ASCII, and all characters in the subset are also represented identically in all versions of EBCDIC. Other popular encodings, such as the encoding used by the uuencode utility, Macintosh BinHex 4.0 [RFC-1741], and the base85 encoding, do not share these properties, and thus do not fulfill the portability requirements that a binary transport encoding for mail must meet.
The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating three 8-bit input groups. These 24 bits are then treated as four concatenated 6-bit groups, each of which is translated into a single character of the base64 alphabet. When encoding a bit stream via base64, the bit stream must be presumed to be ordered with the most significant bit first. That is, the first bit in the stream is the high-order bit of the first byte, the eighth bit is the low-order bit of the first byte, and so on.
Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string.
These characters, listed in Table 1 below, are selected so as to be universally representable. The set excludes characters with particular significance to SMTP (for example ".", CR, and LF) and to the multipart boundary delimiters defined in RFC 2046 (for example "-").
                    Table 1: The Base64 Alphabet

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y

The encoded output stream must be represented in lines of no more than 76 characters each. Decoding software must ignore all line breaks and any other characters not found in Table 1. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even rejection of the message might be appropriate under some circumstances.
Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the "=" character. Since all base64 input is an integral number of octets, only the following cases can arise:
(1) The final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output is 4 characters with no "=" padding.
(2) The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output is two characters followed by two "=" padding characters.
(3) The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output is three characters followed by one "=" padding character.
Because "=" is used only for padding at the end of the data, the occurrence of any "=" character may be taken as evidence that the end of the data has been reached (without truncation in transit).
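The grouping into 24-bit quanta and the three padding cases can be sketched as follows. This is a minimal sketch for illustration, not a substitute for a tested library: the names ALPHABET and b64_encode are assumptions made for this example, and the alphabet string simply spells out Table 1 in index order.

```python
# Table 1 in index order: values 0-25, 26-51, 52-61, 62, 63.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")


def b64_encode(data: bytes, line_len: int = 76) -> str:
    """Base64 encode data into CRLF-separated lines (hypothetical helper)."""
    chars = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Add zero bits on the right so the final group is a full 24 bits.
        group = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indexes, most significant first.
        quad = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 3 octets -> 4 chars, 2 octets -> 3 chars + "=", 1 octet -> 2 chars + "==".
        keep = len(chunk) + 1
        chars.extend(quad[:keep] + ["="] * (4 - keep))
    text = "".join(chars)
    # Emit lines of at most line_len characters.
    return "\r\n".join(text[i:i + line_len] for i in range(0, len(text), line_len))
```

In this sketch the inputs "Man", "Ma", and "M" exercise cases (1), (2 reversed in length order), and the single-octet case respectively: they encode to "TWFu", "TWE=", and "TQ==". In practice an existing library routine would normally be preferred over hand-written code.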
However, the absence of "=" proves nothing: when the number of octets transmitted is a multiple of three, no "=" characters are present, and no such end-of-data assurance is available.
Any characters outside the base64 alphabet are to be ignored in base64-encoded data.
Care must be taken to use the proper octets for line breaks if base64 encoding is applied directly to text that has not been converted to canonical form. In particular, text line breaks must be converted into CRLF sequences before base64 encoding. Note that in some implementations this may be done directly by the encoder rather than in a prior canonicalization step.
Note: There is no need to worry about quoting potential boundary delimiters within base64-encoded bodies inside multipart entities, because the base64 encoding does not use the hyphen character ("-").
7. Content-ID Header Field
In constructing a high-level user agent, it may be desirable to allow one body to refer to another. Accordingly, bodies may be labelled using the "Content-ID" header field, which is syntactically identical to the "Message-ID" header field:
     id := "Content-ID" ":" msg-id
Like Message-ID values, Content-ID values must be generated to be world-unique.
The Content-ID value may be used to identify MIME entities uniquely in several contexts, particularly for caching data referenced by the message/external-body mechanism. Although the Content-ID header field is generally optional, its use is mandatory in implementations that generate data of the optional MIME media type "message/external-body". That is, each message/external-body entity must have a Content-ID field to permit caching of such data.
Note also that the Content-ID value has special semantics in the case of the multipart/alternative media type. This is explained in the section of RFC 2046 dealing with multipart/alternative.
8. Content-Description Header Field
It is often desirable to associate some descriptive information with a given body.
For example, it may be useful to mark an "image" body as "a picture of the Space Shuttle Endeavor." Such text may be placed in the Content-Description header field. This header field is always optional.
     description := "Content-Description" ":" *text
The description is presumed to be given in the US-ASCII character set, although the mechanism specified in RFC 2047 may be used for non-US-ASCII Content-Description values.
9. Additional MIME Header Fields
Future documents may define additional MIME header fields for various purposes. Any new header field that further describes the content of a message should begin with the string "Content-", so that such fields can be distinguished from ordinary RFC 822 message header fields.
     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">
10. Summary
Using the MIME-Version, Content-Type, and Content-Transfer-Encoding header fields, it is possible to include arbitrary types of data in RFC 822 conformant mail messages in a standardized way. No restriction imposed by either RFC 821 or RFC 822 is violated, and care has been taken to avoid problems caused by additional restrictions imposed by the characteristics of some Internet mail transport mechanisms (see RFC 2049).
The next document in this set, RFC 2046, specifies the initial set of media types that can be labelled and transported using these headers.
11. Security Considerations
Security issues are discussed in the second document in this set, RFC 2046.
12. Authors' Addresses
For more information, the authors of this document are best contacted via Internet mail:
     Ned Freed
     Innosoft International, Inc.
     1050 East Garvey Avenue South
     West Covina, CA 91790
     USA
     Phone: +1 818 919 3600
     Fax:   +1 818 919 3614
     EMail: ned@innosoft.com

     Nathaniel S. Borenstein
     First Virtual Holdings
     25 Washington Avenue
     Morristown, NJ 07960
     USA
     Phone: +1 201 540 8967
     Fax:   +1 201 993 3032
     EMail: nsb@nsb.fv.com
MIME is a result of the work of the Internet Engineering Task Force Working Group on RFC 822 Extensions.
The chairman of that group, Greg Vaudreuil, may be reached at:
     Gregory M. Vaudreuil
     Octel Network Services
     17080 Dallas Parkway
     Dallas, TX 75248-1905
     USA
     EMail: Greg.Vaudreuil@Octel.Com
Appendix A: Collected Grammar
This appendix contains the complete BNF grammar for all the syntax specified by this document.
By itself, however, this grammar is incomplete. It refers by name to several syntax rules that are defined by RFC 822. Rather than reproduce those definitions here, and risk unintentional differences between the two, this document simply refers the reader to RFC 822 for the remaining definitions. Wherever a term is undefined, it refers to the RFC 822 definition.
     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     composite-type := "message" / "multipart" / extension-token

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     description := "Content-Description" ":" *text

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     encoding := "Content-Transfer-Encoding" ":" mechanism

     entity-headers := [ content CRLF ]
                       [ encoding CRLF ]
                       [ id CRLF ]
                       [ description CRLF ]
                       *( MIME-extension-field CRLF )

     extension-token := ietf-token / x-token

     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
                  ; Octet must be used for characters > 127, =,
                  ; SPACEs or TABs at the ends of lines, and is
                  ; recommended for any character not listed in
                  ; RFC 2049 as "mail-safe".

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     id := "Content-ID" ":" msg-id

     mechanism := "7bit" / "8bit" / "binary" /
                  "quoted-printable" / "base64" /
                  ietf-token / x-token

     MIME-extension-field := <Any RFC 822 header field which
                              begins with the string
                              "Content-">