D Language Lexical Analysis (1)

xiaoxiao  2021-03-06

Lexical

In D, lexical analysis is independent of syntax analysis and semantic analysis. The lexical analyzer splits the source file into tokens, and the lexical grammar describes what those tokens are. The lexical grammar is designed for fast scanning: it has a minimal set of special-case rules and a single phase of translation, which makes it easy to write a correct scanner. The tokens are also easy to recognize for anyone familiar with C and C++.

Phases of Compilation

Compilation is divided into multiple phases. No phase depends on any phase that follows it. For example, the scanner does not depend on the semantic analyzer. This separation makes it relatively easy to build language tools such as syntax-directed editors. It also makes it possible to compress D source code by storing it in tokenized form.

Source character set: The source file is checked to determine which character set it uses, and the appropriate scanner is selected. ASCII and UTF formats are accepted.

Lexical analysis: The source file is divided into a sequence of tokens. Special tokens are processed and then removed.

Syntax analysis: The token sequence is parsed into a syntax tree.

Semantic analysis: The syntax tree is traversed to declare variables, load symbol tables, assign types, and in general determine the meaning of the program.

Optimization: An optional pass that tries to rewrite the program into a semantically equivalent but faster version.

Code generation: Instructions from the target architecture are selected to implement the semantics of the program. The typical result is an object file suitable as input to a linker.

Source Text

D source text can be in one of the following formats:

ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE

UTF-8 is a superset of traditional 7-bit ASCII. The source text may begin with one of the following UTF BOMs (byte order marks):

Format      BOM
UTF-8       EF BB BF
UTF-16BE    FE FF
UTF-16LE    FF FE
UTF-32BE    00 00 FE FF
UTF-32LE    FF FE 00 00
ASCII       no BOM
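The BOM table can be turned into a small detection routine. The following is only a sketch in Python, not how any D compiler actually does it; the names `BOMS` and `detect_format` are made up for illustration. One detail worth noticing is that the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM, so the longer patterns have to be tested first.

```python
# Hypothetical helper: map the leading bytes of a source file to a format name.
# 4-byte BOMs come first because FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def detect_format(data: bytes) -> str:
    """Return the format implied by the BOM, or "ASCII" when there is no BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Assumption taken from the table: no BOM is treated as ASCII.
    return "ASCII"
```

For example, `detect_format(b"\xef\xbb\xbfint x;")` yields `"UTF-8"`, while a file that starts directly with `int x;` falls through to `"ASCII"`.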

There are no digraphs or trigraphs in D. (Translator's note: trigraphs are the nine three-character sequences ??=, ??/, ??', ??(, ??), ??!, ??<, ??> and ??-, which C replaces with #, \, ^, [, ], |, {, } and ~ respectively. Trigraphs were introduced so that these characters could be typed on early keyboards that lacked them. Digraphs serve a similar purpose. Evidently Walter considers both obsolete.)

The source text consists of white space, end of lines, comments, special token sequences, and tokens, all followed by an end of file.

The source text is split into tokens using a greedy ("maximal munch") algorithm: the lexical analyzer always tries to form the longest token it can. For example, >> is a right shift token, not two greater-than tokens.
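The greedy rule can be illustrated with a few operator tokens. This is a minimal sketch in Python that assumes only a subset of the operators from the token list; `OPERATORS`, `next_operator` and `split_operators` are illustrative names, not part of any D tool.

```python
# A subset of the operator tokens listed later in this article.
OPERATORS = [
    ">", ">=", ">>", ">>=", ">>>", ">>>=",
    "<", "<=", "<<", "<<=", "<>", "<>=",
    "=", "==", "===", "!", "!=", "!==",
]

# Try longer spellings first so that the longest possible token wins.
BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def next_operator(text, pos=0):
    """Return the longest operator starting at text[pos], or None."""
    for op in BY_LENGTH:
        if text.startswith(op, pos):
            return op
    return None


def split_operators(text):
    """Split a string made only of operator characters into greedy tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        op = next_operator(text, pos)
        if op is None:
            raise ValueError("no operator at position %d" % pos)
        tokens.append(op)
        pos += len(op)
    return tokens
```

With this sketch, `next_operator(">> 1")` picks `">>"` rather than `">"`, and `split_operators(">>>>")` produces `">>>"` followed by `">"`.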

End of File

EndOfFile:
    physical end of the file
    \u0000
    \u001A

The source text ends at whichever of these is encountered first.
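As a small illustration of the end-of-file rule, the sketch below (Python, with the hypothetical names `END_MARKERS` and `effective_source`) cuts already-decoded source text at the first \u0000 or \u001A. Anything after that point is simply never scanned.

```python
# The two in-band end-of-file characters from the EndOfFile rule.
END_MARKERS = ("\u0000", "\u001a")


def effective_source(text: str) -> str:
    """Return the part of the text that precedes the first end-of-file marker."""
    cut = len(text)  # default: the physical end of the file
    for marker in END_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```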

End of Line

EndOfLine:
    \u000D
    \u000A
    \u000D \u000A
    EndOfFile

A line cannot be spliced across several physical lines (there is no backslash line continuation), and there is no limit on the length of a line.
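The three end-of-line spellings can be recognized with one pattern, as long as the two-character form is tried before the single characters. A minimal Python sketch follows; `END_OF_LINE` and `split_lines` are names invented for this example, and the EndOfFile alternative is not modeled.

```python
import re

# "\r\n" must come first, otherwise a CR LF pair would count as two line ends.
END_OF_LINE = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list:
    """Split source text at every EndOfLine sequence."""
    return END_OF_LINE.split(text)
```

Note that the order matters only for CR followed by LF. The reverse sequence, LF followed by CR, is two separate line ends under this grammar.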

White Space

WhiteSpace:
    Space
    Space WhiteSpace

Space:
    \u0020
    \u0009
    \u000B
    \u000C
    EndOfLine
    Comment

White space is defined as a sequence of one or more spaces, tabs, vertical tabs, form feeds, end of lines, or comments.

Comments

Comment:
    /* Characters */
    // Characters EndOfLine
    /+ Characters +/

D has three kinds of comments:

Block comments (/* */) can span multiple lines, but do not nest.

Line comments (//) terminate at the end of the line.

Nesting comments (/+ +/) can span multiple lines and can nest.

Conceptually, comments are processed before tokenization. This means that strings and other comments embedded inside a comment do not affect the recognition of where the comment opens and closes:

a = /+ // +/ 1;       // parses as if 'a = 1;'

a = /+ "+/" +/ 1";    // parses as if 'a = " +/ 1";'

a = /+ /* +/ */ 3;    // parses as if 'a = */ 3;'

Comments cannot be used to concatenate tokens. For example, abc/**/def is two tokens, abc and def, not the single token abcdef.
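The behaviour of the three comment kinds can be reproduced with a short pre-pass. The following Python sketch is a simplification, not the actual D scanner: it does not recognize string literals outside comments, and the function name `strip_comments` is hypothetical. Each comment is replaced by one space, which matches the rule that a comment is white space and therefore separates tokens.

```python
def strip_comments(src: str) -> str:
    """Replace every comment with a single space (string literals are not handled)."""
    out = []
    i, n = 0, len(src)
    while i < n:
        two = src[i:i + 2]
        if two == "//":
            # Line comment: skip up to, but not including, the end of line.
            while i < n and src[i] not in "\r\n":
                i += 1
            out.append(" ")
        elif two == "/*":
            # Block comment: ends at the first "*/" and does not nest.
            end = src.find("*/", i + 2)
            i = n if end < 0 else end + 2
            out.append(" ")
        elif two == "/+":
            # Nesting comment: keep a depth counter for "/+" and "+/".
            depth, i = 1, i + 2
            while i < n and depth > 0:
                pair = src[i:i + 2]
                if pair == "/+":
                    depth, i = depth + 1, i + 2
                elif pair == "+/":
                    depth, i = depth - 1, i + 2
                else:
                    i += 1
            out.append(" ")
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Applied to the three examples above, and after collapsing runs of spaces, the sketch gives `a = 1;`, `a = " +/ 1";` and `a = */ 3;`. It also turns `abc/**/def` into `abc def`, so the two identifiers stay separate.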

Tokens

Token:
    Identifier
    StringLiteral
    CharacterLiteral
    IntegerLiteral
    FloatLiteral
    Keyword
    /
    /=
    .
    ..
    ...
    &
    &=
    &&
    |
    |=
    ||
    -
    -=
    --
    +
    +=
    ++
    <
    <=
    <<
    <<=
    <>
    <>=
    >
    >=
    >>=
    >>>=
    >>
    >>>
    !
    !=
    !==
    !<>
    !<>=
    !<
    !<=
    !>
    !>=
    (
    )
    [
    ]
    {
    }
    ?
    ,
    ;
    :
    $
    =
    ==
    ===
    *
    *=
    %
    %=
    ^
    ^=
    ~
    ~=

Identifiers

Identifier:
    IdentifierStart
    IdentifierStart IdentifierChars

IdentifierChars:
    IdentifierChar
    IdentifierChar IdentifierChars

IdentifierStart:
    _
    Letter
    UniversalAlpha

IdentifierChar:
    IdentifierStart
    Digit

An identifier starts with a letter, an underscore, or a universal alpha, followed by any number of letters, underscores, digits, or universal alphas. Universal alphas are defined in ISO/IEC 9899:1999(E) Appendix D (this is the C99 standard). Identifiers can be arbitrarily long and are case sensitive. Identifiers starting with two underscores are reserved.
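The identifier grammar translates almost directly into code. The Python sketch below is an approximation under a stated assumption: Python's `str.isalpha()` stands in for both Letter and UniversalAlpha, and it is not the exact character set of C99 Appendix D. The names `is_identifier` and `is_reserved` are invented for this example, and keywords are not excluded.

```python
def _is_start(c: str) -> bool:
    # IdentifierStart: "_", Letter, or UniversalAlpha.
    # Assumption: str.isalpha() only approximates the C99 Appendix D ranges.
    return c == "_" or c.isalpha()


def is_identifier(s: str) -> bool:
    """Check the shape IdentifierStart followed by zero or more IdentifierChar."""
    if not s or not _is_start(s[0]):
        return False
    # IdentifierChar: IdentifierStart or an ASCII digit.
    return all(_is_start(c) or "0" <= c <= "9" for c in s[1:])


def is_reserved(s: str) -> bool:
    """Identifiers that begin with two underscores are reserved."""
    return is_identifier(s) and s.startswith("__")
```

Because identifiers are case sensitive, `Foo` and `foo` are both accepted here as two distinct names. No length check is needed, since identifiers can be arbitrarily long.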
