3.2 Uniform Resource Identifiers
URIs have been known by many names: WWW addresses, Universal Document Identifiers, Universal Resource Identifiers [2], and finally the combination of Uniform Resource Locators (URL) [4] and Names (URN). As far as HTTP is concerned, Uniform Resource Identifiers are simply formatted strings which identify, via name, location, or any other characteristic, a network resource.
3.2.1 General Syntax
URIs in HTTP can be represented in absolute form or relative to some known base URI [9], depending upon the context of their use. The two forms are differentiated by the fact that absolute URIs always begin with a scheme name followed by a colon (":").
URI            = ( absoluteURI | relativeURI ) [ "#" fragment ]
absoluteURI    = scheme ":" *( uchar | reserved )
relativeURI    = net_path | abs_path | rel_path
net_path       = "//" net_loc [ abs_path ]
abs_path       = "/" rel_path
rel_path       = [ path ] [ ";" params ] [ "?" query ]
path           = fsegment *( "/" segment )
fsegment       = 1*pchar
segment        = *pchar
params         = param *( ";" param )
param          = *( pchar | "/" )
scheme         = 1*( ALPHA | DIGIT | "+" | "-" | "." )
net_loc        = *( pchar | ";" | "?" )
query          = *( uchar | reserved )
fragment       = *( uchar | reserved )
pchar          = uchar | ":" | "@" | "&" | "=" | "+"
uchar          = unreserved | escape
unreserved     = ALPHA | DIGIT | safe | extra | national
escape         = "%" HEX HEX
reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
extra          = "!" | "*" | "'" | "(" | ")" | ","
safe           = "$" | "-" | "_" | "."
unsafe         = CTL | SP | <"> | "#" | "%" | "<" | ">"
national       = <any OCTET excluding ALPHA, DIGIT, reserved, extra, safe, and unsafe>
Berners-Lee, et al            Informational            [Page 14]
For definitive information on URL syntax and semantics, see RFC 1738 [4] and RFC 1808 [9]. The BNF above includes national characters not allowed in valid URLs as specified by RFC 1738, since HTTP servers are not restricted in the set of unreserved characters allowed to represent the rel_path part of addresses, and HTTP proxies may receive requests for URIs not defined by RFC 1738.
3.2.2 http URL
The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs.
http_URL       = "http:" "//" host [ ":" port ] [ abs_path ]
host           = <A legal Internet host domain name or IP address (in dotted-decimal form), as defined by Section 2.1 of RFC 1123>
port           = *DIGIT
If the port is empty or not given, port 80 is assumed. The semantics are that the identified resource is located at the server listening for TCP connections on that port of that host, and the Request-URI for the resource is abs_path. If the abs_path is not present in the URL, it must be given as "/" when used as a Request-URI (Section 5.1.2).
Note: Although the HTTP protocol is independent of the transport layer protocol, the http URL only identifies resources by their TCP location, and thus non-TCP resources must be identified by some other URI scheme.
The canonical form for "http" URLs is obtained by converting any uppercase characters in host to their lowercase equivalent (host names are case-insensitive), eliding the [ ":" port ] if the port is 80, and replacing an empty abs_path with "/".
3.3 Date/Time Formats
HTTP/1.0 applications have historically allowed three different formats for the representation of date/time stamps:
Sun, 06 Nov 1994 08:49:37 GMT    ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT   ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994         ; ANSI C's asctime() format
The first format is preferred as an Internet standard and represents a fixed-length subset of that defined by RFC 1123 [6].
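As a rough illustration, the three layouts listed above can be told apart by simply trying each one in turn. The following Python sketch does that with the standard library; `parse_http_date` is a hypothetical helper name, the day and month names are assumed to be parsed under an English (C) locale, and every value is treated as GMT, which is what this section requires of HTTP/1.0 timestamps.

```python
from datetime import datetime, timezone

# strptime layouts for the three HTTP/1.0 date formats, preferred one first.
_LAYOUTS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850 (two-digit year)
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime(), no zone given
)


def parse_http_date(text):
    """Return a timezone-aware UTC datetime, or None if no layout matches."""
    candidate = text.strip()
    for layout in _LAYOUTS:
        try:
            parsed = datetime.strptime(candidate, layout)
        except ValueError:
            continue  # not this layout, try the next one
        # HTTP/1.0 dates are always GMT, including the asctime() form.
        return parsed.replace(tzinfo=timezone.utc)
    return None
```

Note that `strptime` resolves the two-digit year of the RFC 850 form with a fixed pivot (69 to 99 map to 1969 to 1999), which is a property of the Python library and not something this specification defines.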
The second format is in common use, but is based on the obsolete RFC 850 [10] date format and lacks a four-digit year. HTTP/1.0 clients and servers that parse the date value should accept all three formats, though they must never generate the third (asctime) format.
Note: Recipients of date values are encouraged to be robust in accepting date values that may have been generated by non-HTTP applications, as is sometimes the case when retrieving or posting messages via proxies/gateways to SMTP or NNTP.
All HTTP/1.0 date/time stamps must be represented in Universal Time (UT), also known as Greenwich Mean Time (GMT), without exception. This is indicated in the first two formats by the inclusion of "GMT" as the three-letter abbreviation for time zone, and should be assumed when reading the asctime format.
HTTP-date      = rfc1123-date | rfc850-date | asctime-date
rfc1123-date   = wkday "," SP date1 SP time SP "GMT"
rfc850-date    = weekday "," SP date2 SP time SP "GMT"
asctime-date   = wkday SP date3 SP time SP 4DIGIT
date1          = 2DIGIT SP month SP 4DIGIT            ; day month year (e.g., 02 Jun 1982)
date2          = 2DIGIT "-" month "-" 2DIGIT          ; day-month-year (e.g., 02-Jun-82)
date3          = month SP ( 2DIGIT | ( SP 1DIGIT ) )  ; month day (e.g., Jun  2)
time           = 2DIGIT ":" 2DIGIT ":" 2DIGIT         ; 00:00:00 - 23:59:59
wkday          = "Mon" | "Tue" | "Wed" | "Thu" | "Fri" | "Sat" | "Sun"
weekday        = "Monday" | "Tuesday" | "Wednesday" | "Thursday" | "Friday" | "Saturday" | "Sunday"
month          = "Jan" | "Feb" | "Mar" | "Apr" | "May" | "Jun" | "Jul" | "Aug" | "Sep" | "Oct" | "Nov" | "Dec"
Note: HTTP requirements for the date/time stamp format apply only to their usage within the protocol stream. Clients and servers are not required to use these formats for user presentation, request logging, etc.
3.4 Character Sets
HTTP uses the same definition of the term "character set" as that described for MIME:
The term "character set" is used in this document to refer to a method used with one or more tables to convert a sequence of octets into a sequence of characters.
Note that unconditional conversion in the other direction is not required, in that not all characters may be available in a given character set and a character set may provide more than one sequence of octets to represent a particular character. This definition is intended to allow various kinds of character encodings, from simple single-table mappings such as US-ASCII to complex table switching methods such as those that use ISO 2022's techniques. However, the definition associated with a MIME character set name must fully specify the mapping to be performed from octets to characters. In particular, use of external profiling information to determine the exact mapping is not permitted.
Note: This use of the term "character set" is more commonly referred to as a "character encoding." However, since HTTP and MIME share the same registry, it is important that the terminology also be shared.
HTTP character sets are identified by case-insensitive tokens. The complete set of tokens is defined by the IANA Character Set registry [15]. However, because that registry does not define a single, consistent token for each character set, we define here the preferred names for those character sets most likely to be used with HTTP entities. These character sets include those registered by RFC 1521 [5], namely the US-ASCII [17] and ISO-8859 [18] character sets, and other names specifically recommended for use within MIME charset parameters.
charset = "US-ASCII"
        | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
        | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
        | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
        | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
        | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
        | token
Although HTTP allows an arbitrary token to be used as a charset value, any token that has a predefined value within the IANA Character Set registry [15] must represent the character set defined by that registry. Applications should limit their use of character sets to those defined by the IANA registry.
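A small sketch of how a recipient might map a received charset token onto the preferred names listed above. It compares tokens without regard to case (HTTP charset tokens are case-insensitive) and passes any unlisted token through unchanged, since the grammar also admits an arbitrary token. The names `PREFERRED_CHARSETS` and `preferred_charset` are hypothetical and exist only for this illustration.

```python
# Preferred charset names from the grammar in this section.
PREFERRED_CHARSETS = (
    "US-ASCII",
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3",
    "ISO-8859-4", "ISO-8859-5", "ISO-8859-6",
    "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
    "ISO-2022-JP", "ISO-2022-JP-2", "ISO-2022-KR",
    "UNICODE-1-1", "UNICODE-1-1-UTF-7", "UNICODE-1-1-UTF-8",
)

# Lookup table keyed by the lower-cased name for case-insensitive matching.
_BY_FOLDED_NAME = {name.lower(): name for name in PREFERRED_CHARSETS}


def preferred_charset(token):
    """Return the preferred spelling of a charset token.

    Tokens outside the preferred list are returned as given.
    """
    return _BY_FOLDED_NAME.get(token.lower(), token)
```

For example, `preferred_charset("iso-8859-1")` yields `"ISO-8859-1"`, while an unlisted token such as `"x-example"` comes back as is.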
The character set of an entity body should be labelled as the lowest common denominator of the character codes used within that body, with the exception that no label is preferred over the labels US-ASCII or ISO-8859-1.
3.5 Content Codings
Content coding values are used to indicate an encoding transformation that has been applied to a resource. Content codings are primarily used to allow a document to be compressed or encrypted without losing the identity of its underlying media type. Typically, the resource is stored in this encoding and only decoded before rendering or analogous usage.
content-coding = "x-gzip" | "x-compress" | token
Note: For future compatibility, HTTP/1.0 applications should consider "gzip" and "compress" to be equivalent to "x-gzip" and "x-compress", respectively.
All content-coding values are case-insensitive. HTTP/1.0 uses content-coding values in the Content-Encoding (Section 10.3) header field. Although the value describes the content-coding, what is more important is that it indicates what decoding mechanism will be required to remove the encoding. Note that a single program may be capable of decoding multiple content-coding formats. Two values are defined by this specification:
x-gzip: An encoding format produced by the file compression program "gzip" (GNU zip) developed by Jean-loup Gailly. This format is typically a Lempel-Ziv coding (LZ77) with a 32 bit CRC.
x-compress: The encoding format produced by the file compression program "compress". This format is an adaptive Lempel-Ziv-Welch coding (LZW).
Note: Use of program names for the identification of encoding formats is not desirable and should be discouraged for future encodings. Their use here is representative of historical practice, not good design.
3.6 Media Types
HTTP uses Internet Media Types [13] in the Content-Type header field (Section 10.5) in order to provide open and extensible data typing.
media-type     = type "/" subtype *( ";" parameter )
type           = token
subtype        = token
Parameters may follow the type/subtype in the form of attribute/value pairs.
parameter      = attribute "=" value
attribute      = token
value          = token | quoted-string
The type, subtype, and parameter attribute names are case-insensitive. Parameter values may or may not be case-sensitive, depending on the semantics of the parameter name. LWS must not be generated between the type and subtype, nor between an attribute and its value. Upon receipt of a media type with an unrecognized parameter, a user agent should treat the media type as if the unrecognized parameter and its value were not present.
Some older HTTP applications do not recognize media type parameters. HTTP/1.0 applications should only use media type parameters when they are necessary to define the content of a message.
Media-type values are registered with the Internet Assigned Number Authority (IANA [15]). The media type registration process is outlined in RFC 1590 [13]. Use of non-registered media types is discouraged.
3.6.1 Canonicalization and Text Defaults
Internet media types are registered with a canonical form. In general, an Entity-Body transferred via HTTP must be represented in the appropriate canonical form prior to its transmission. If the body has been encoded with a Content-Encoding, the underlying data should be in canonical form prior to being encoded.
Media subtypes of the "text" type use CRLF as the text line break when in canonical form. However, HTTP allows the transport of text media with plain CR or LF alone representing a line break when used consistently within the Entity-Body. HTTP applications must accept CRLF, bare CR, and bare LF as being representative of a line break in text media received via HTTP.
In addition, if the text media is represented in a character set that does not use octets 13 and 10 for CR and LF respectively, as is the case for some multi-byte character sets, HTTP allows the use of whatever octet sequences are defined by that character set to represent the equivalent of CR and LF for line breaks. This flexibility regarding line breaks applies only to text media in the Entity-Body.
A bare CR or LF should not be substituted for CRLF within any of the HTTP control structures (such as header fields and multipart boundaries).
The "charset" parameter is used with some media types to define the character set (Section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets must be labelled with an appropriate charset value in order to be consistently interpreted by the recipient.
Note: Many current HTTP servers provide data using charsets other than "ISO-8859-1" without proper labelling. This situation reduces interoperability and is not recommended. To compensate for this, some HTTP user agents provide a configuration option to allow the user to change the default interpretation of the media type character set when no charset parameter is given.
3.6.2 Multipart Types
MIME provides for a number of "multipart" types, which are encapsulations of several entities within a single message's Entity-Body. The multipart types registered by IANA [15] do not have any special meaning for HTTP/1.0, though user agents may need to understand each type in order to correctly interpret the purpose of each body-part. An HTTP user agent should follow the same or similar behavior as a MIME user agent does upon receipt of a multipart type. HTTP servers should not assume that all HTTP clients are prepared to handle multipart types.
All multipart types share a common syntax and must include a boundary parameter as part of the media type value. The message body is itself a protocol element and must therefore use only CRLF to represent line breaks between body-parts. Multipart body-parts may contain HTTP header fields which are significant to the meaning of that part.
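The media-type grammar of Section 3.6, the "text" charset default of Section 3.6.1, and the boundary requirement above can be tied together in a short Python sketch. It is an illustration under stated assumptions and not a complete parser: the helper names are hypothetical, type, subtype, and attribute names are folded to lower case because they are case-insensitive, and quoted-string values are taken to contain no embedded double quote.

```python
import re

# token: one or more characters that are neither controls nor tspecials.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_TYPE_SUBTYPE = re.compile(rf"({_TOKEN})/({_TOKEN})")
# parameter: ";" attribute "=" ( token | quoted-string )
_PARAMETER = re.compile(rf'\s*;\s*({_TOKEN})=(?:({_TOKEN})|"([^"]*)")')


def parse_media_type(value):
    """Split a media-type value into (type, subtype, parameters)."""
    value = value.strip()
    head = _TYPE_SUBTYPE.match(value)
    if head is None:
        raise ValueError(f"not a media type: {value!r}")
    params = {}
    pos = head.end()
    while pos < len(value):
        found = _PARAMETER.match(value, pos)
        if found is None:
            raise ValueError(f"malformed parameter in: {value!r}")
        name, token_value, quoted_value = found.groups()
        # Attribute names are case-insensitive; values are kept as sent.
        params[name.lower()] = (
            token_value if token_value is not None else quoted_value
        )
        pos = found.end()
    return head.group(1).lower(), head.group(2).lower(), params


def text_charset(value):
    """Charset of a "text" media type, defaulting to ISO-8859-1."""
    major, _subtype, params = parse_media_type(value)
    if major != "text":
        return None
    return params.get("charset", "ISO-8859-1")


def multipart_boundary(value):
    """Boundary of a multipart media type, or None for other types."""
    major, _subtype, params = parse_media_type(value)
    if major != "multipart":
        return None
    if "boundary" not in params:
        raise ValueError("multipart type without a boundary parameter")
    return params["boundary"]
```

Unrecognized parameters are simply collected in the dictionary and otherwise ignored, which matches the rule that a user agent should treat them as if they were not present.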
3.7 Product Tokens
Product tokens are used to allow communicating applications to identify themselves via a simple product token, with an optional slash and version designator. Most fields using product tokens also allow subproducts which form a significant part of the application to be listed, separated by whitespace.
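To make the description above concrete, here is a minimal Python sketch that splits a field value into products and optional versions. The function name `parse_product_tokens` and the sample value are illustrative assumptions, and the sketch deliberately ignores anything other than whitespace-separated product tokens (for instance, parenthesized comments that some header fields may also carry).

```python
def parse_product_tokens(field_value):
    """Return a list of (product, version) pairs; version may be None."""
    products = []
    for item in field_value.split():
        # A product is a name, optionally followed by "/" and a version.
        name, _slash, version = item.partition("/")
        products.append((name, version or None))
    return products


# Illustrative input only: an application followed by one subproduct.
print(parse_product_tokens("CERN-LineMode/2.15 libwww/2.17b3"))
```

The pairs come back in the order in which they were listed, so the first entry identifies the application itself and any later entries are its subproducts.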