2 Lexical conventions [lex]

2.1 Phases of translation [lex.phases]

The precedence among the syntax rules of translation is specified by the following phases.13)
1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences (2.3) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently.)

2. Each instance of a new-line character and an immediately preceding backslash character is deleted, splicing physical source lines to form logical source lines. If, as a result, a character sequence that matches the syntax of a universal-character-name is produced, the behavior is undefined. If a source file that is not empty does not end in a new-line character, or ends in a new-line character immediately preceded by a backslash character, the behavior is undefined.

3. The source file is decomposed into preprocessing tokens (2.4) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or partial comment.14) Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined. The process of dividing a source file's characters into preprocessing tokens is context-dependent. [Example: see the handling of < within a #include preprocessing directive.]

4. Preprocessing directives are executed and macro invocations are expanded. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation (16.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively.

5. Each source character set member, escape sequence, or universal-character-name in character literals and string literals is converted to a member of the execution character set (2.13.2, 2.13.4).

6. Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.

7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token (2.6). The resulting tokens are syntactically and semantically analyzed and translated. [Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation.]

8. Translated translation units and instantiation units are combined as follows: [Note: some or all of these may be supplied from a library.] Each translated translation unit is examined to produce a list of required instantiations. [Note: this may include instantiations which have been explicitly requested (14.7.2).] The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available. [Note: an implementation could encode sufficient information into the translated translation unit so as to ensure the source is not required here.] All the required instantiations are performed to produce instantiation units. [Note: these are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions.] The program is ill-formed if any instantiation fails.

9. All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
13) Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.