Lexical
In D, lexical analysis is independent of syntax parsing and semantic analysis. The lexical analyzer splits the source file into tokens, and the lexical grammar describes what those tokens are. The grammar is designed for high-speed scanning: it has a minimal set of special-case rules and only one phase of translation, which makes it easy to write a correct scanner. The tokens are also readily recognizable to anyone familiar with C and C++.
Compilation phases
Compilation is divided into multiple phases. No phase depends on the phases that follow it. For example, the scanner is not affected by the semantic analyzer. This separation makes language tools such as syntax-directed editors relatively easy to build. It also makes it possible to compress D source code by storing it in tokenized form.
1. Source character set. The source file is checked to determine which character set it uses, and the appropriate scanner is selected. ASCII and the UTF formats are accepted.
2. Lexical analysis. The source file is divided into a sequence of tokens. Special tokens are processed and then removed.
3. Syntax analysis. The sequence of tokens is parsed into a syntax tree.
4. Semantic analysis. The syntax tree is traversed to declare variables, load symbol tables, assign types, and in general determine the meaning of the program.
5. Optimization. Optimization is an optional pass that tries to rewrite the program into a semantically equivalent but faster version.
6. Code generation. Instructions are selected from the target architecture to implement the semantics of the program. The typical result is an object file suitable for input to a linker.
Source text
D source text can be in one of the following formats:
    ASCII
    UTF-8
    UTF-16BE
    UTF-16LE
    UTF-32BE
    UTF-32LE
UTF-8 is a superset of traditional 7-bit ASCII. The source text may begin with one of the following UTF byte order marks (BOMs):
    Format      BOM
    UTF-8       EF BB BF
    UTF-16BE    FE FF
    UTF-16LE    FF FE
    UTF-32BE    00 00 FE FF
    UTF-32LE    FF FE 00 00
    ASCII       no BOM
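The BOM table can be turned into a small detection routine. The following Python sketch is only an illustration, not the compiler's actual logic; the names BOMS and detect_bom are made up for this example. It assumes the first bytes of the file are available as a bytes object.

```python
# BOM byte sequences from the table above.
# UTF-32LE must be tested before UTF-16LE, because its BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def detect_bom(data: bytes):
    """Return (format, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A result of (None, 0) only means that no BOM was found. The file may then be ASCII or BOM-less UTF, and the sketch does not try to tell those apart.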
There are no digraphs or trigraphs in D. (Translator's note: trigraphs are the nine three-character sequences ??=, ??/, ??', ??(, ??), ??!, ??<, ??> and ??-, which are replaced by the characters #, \, ^, [, ], |, {, } and ~ respectively. They were introduced so that these characters could be typed on early keyboards that lacked them. Digraphs serve a similar purpose. Evidently Walter considers both to be obsolete.)
The source text consists of white space, end of lines, comments, special token sequences, and tokens, all followed by end of file.
The source text is split into tokens using the maximal munch technique: the lexical analyzer always tries to form the longest token it can. For example, >> is a right shift token, not two greater-than tokens.
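Maximal munch is easy to see in a toy scanner. The Python sketch below is an illustration under simplifying assumptions: it knows only a small subset of the operator tokens, skips white space, and rejects everything else. OPERATORS and scan_operators are hypothetical names.

```python
# A small subset of the operator tokens, enough to show the technique.
OPERATORS = [">>>=", ">>>", ">>=", ">>", ">=", ">",
             "<<=", "<<", "<=", "<", "==", "="]
# Try longer spellings first so that the longest match wins.
OPERATORS.sort(key=len, reverse=True)


def scan_operators(text: str):
    """Split text into operator tokens, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for op in OPERATORS:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"no operator token at offset {i}")
    return tokens
```

With this scanner, ">>" yields one right shift token, while "> >" yields two greater-than tokens because the white space separates them.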
End of file
EndOfFile:
    physical end of the file
    \u0000
    \u001A
The source text is terminated by whichever of these comes first.
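The "whichever comes first" rule can be sketched as a function that returns the index where the source text ends. This is a minimal Python illustration that assumes the file has already been decoded into a string; source_end is a hypothetical name.

```python
def source_end(text: str) -> int:
    """Return the index at which the source text ends."""
    end = len(text)  # physical end of the file
    for terminator in ("\u0000", "\u001a"):
        pos = text.find(terminator)
        if pos != -1:
            end = min(end, pos)  # keep the earliest terminator seen
    return end
```

Anything after the returned index is ignored, so text[:source_end(text)] is what the scanner would see.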
End of line
EndOfLine:
    \u000D
    \u000A
    \u000D \u000A
    EndOfFile
There is no backslash line splicing, so a logical line cannot be continued across several physical lines. There is also no limit on the length of a line.
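The end-of-line rules can be sketched as a line splitter. The Python example below is an illustration only and split_lines is a hypothetical name. It is written by hand because Python's built-in str.splitlines also breaks on characters such as vertical tab and form feed, which are treated as spaces here, not as line ends.

```python
def split_lines(text: str):
    """Split text at \\u000D, \\u000A, or \\u000D \\u000A."""
    lines = []
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r" or ch == "\n":
            lines.append(text[start:i])
            # \u000D \u000A counts as one end of line, not two.
            if ch == "\r" and text.startswith("\n", i + 1):
                i += 1
            i += 1
            start = i
        else:
            i += 1
    # The end of the file also ends the final line.
    lines.append(text[start:])
    return lines
```

Because there is no line splicing, a trailing backslash is left in place and the next line stays separate.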
White space
WhiteSpace:
    Space
    Space WhiteSpace
Space:
    \u0020
    \u0009
    \u000B
    \u000C
    EndOfLine
    Comment
White space is defined as a sequence of one or more spaces, tabs, vertical tabs, form feeds, end of lines, or comments.
Comments
Comment:
    /* Characters */
    // Characters EndOfLine
    /+ Characters +/
D has three kinds of comments:
1. Block comments can span multiple lines, but do not nest.
2. Line comments terminate at the end of the line.
3. Nesting comments can span multiple lines and can nest.
Conceptually, comments are processed before tokenization. This means that embedded strings and comments do not prevent the recognition of comment openings and closings:
    a = /+ // +/ 1;        // parses as if 'a = 1;'
    a = /+ "+/" +/ 1";     // parses as if 'a = " +/ 1";'
    a = /+ /* +/ */ 3;     // parses as if 'a = */ 3;'
Comments cannot be used to concatenate tokens. For example, abc/**/def is two tokens, abc and def, not a single token abcdef.
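The comment rules can be sketched as a pre-pass that runs before tokenization. The Python example below is an illustration, not the compiler's implementation, and strip_comments is a hypothetical name. It assumes the nesting comment delimiters are /+ and +/, replaces each comment with a single space (so a comment separates tokens and never joins them), and deliberately gives string literals no special treatment, as described above.

```python
def strip_comments(src: str) -> str:
    """Replace every comment in src with a single space."""
    out = []
    i = 0
    n = len(src)
    while i < n:
        two = src[i:i + 2]
        if two == "//":
            # Line comment: runs up to, but not including, the end of line.
            while i < n and src[i] not in "\r\n":
                i += 1
            out.append(" ")
        elif two == "/*":
            # Block comment: ends at the first */ and does not nest.
            close = src.find("*/", i + 2)
            if close == -1:
                raise ValueError("unterminated /* comment")
            i = close + 2
            out.append(" ")
        elif two == "/+":
            # Nesting comment: track the depth until the matching +/.
            depth = 1
            i += 2
            while depth > 0:
                if i >= n:
                    raise ValueError("unterminated /+ comment")
                pair = src[i:i + 2]
                if pair == "/+":
                    depth += 1
                    i += 2
                elif pair == "+/":
                    depth -= 1
                    i += 2
                else:
                    i += 1
            out.append(" ")
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Running it on abc/**/def produces "abc def", which a scanner would then split into the two tokens abc and def.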
Tokens
Token:
    Identifier
    StringLiteral
    CharacterLiteral
    IntegerLiteral
    FloatLiteral
    Keyword
    /
    /=
    .
    ..
    ...
    &
    &=
    &&
    |
    |=
    ||
    -
    -=
    --
    +
    +=
    ++
    <
    <=
    <<
    <<=
    <>
    <>=
    >
    >=
    >>=
    >>>=
    >>
    >>>
    !
    !=
    !==
    !<>
    !<>=
    !<
    !<=
    !>
    !>=
    (
    )
    [
    ]
    {
    }
    ?
    ,
    ;
    :
    $
    =
    ==
    ===
    *
    *=
    %
    %=
    ^
    ^=
    ~
    ~=
Identifiers
Identifier:
    IdentifierStart
    IdentifierStart IdentifierChars
IdentifierChars:
    IdentifierChar
    IdentifierChar IdentifierChars
IdentifierStart:
    _
    Letter
    UniversalAlpha
IdentifierChar:
    IdentifierStart
    Digit
An identifier starts with a letter, an underscore, or a universal alpha, and is followed by any number of letters, underscores, digits, or universal alphas. Universal alphas are defined in ISO/IEC 9899:1999(E) Appendix D (the C99 standard). Identifiers can be arbitrarily long and are case sensitive. Identifiers starting with two underscores are reserved.
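The identifier grammar translates directly into a few predicates. The Python sketch below is an approximation and all function names are hypothetical. It assumes that str.isalpha() is an acceptable stand-in for Letter and UniversalAlpha, although the exact set in C99 Appendix D differs from Python's notion of a letter. It also assumes that Digit means the ASCII digits 0 to 9.

```python
def is_identifier_start(ch: str) -> bool:
    # Assumption: str.isalpha() approximates Letter and UniversalAlpha.
    return ch == "_" or ch.isalpha()


def is_identifier_char(ch: str) -> bool:
    # Assumption: Digit is one of the ASCII digits 0-9.
    return is_identifier_start(ch) or ch in "0123456789"


def is_identifier(text: str) -> bool:
    """True when text matches the Identifier production."""
    return (text != ""
            and is_identifier_start(text[0])
            and all(is_identifier_char(c) for c in text[1:]))


def is_reserved(text: str) -> bool:
    """True for identifiers that start with two underscores."""
    return is_identifier(text) and text.startswith("__")
```

The sketch does not exclude keywords, which match the same character pattern but are separate tokens. Comparison is plain string equality, so Abc and abc are different identifiers.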