My Compiler: An EBNF Dialect

xiaoxiao, 2021-03-05

C3 language definition (EBNF dialect)

Keywords: Section, Class, End, Scanner, Parser, Rule, Label, Letter, Digit, LetterOrDigit, UpperLetter, LowerLetter, Symbol, WhiteSpace, CRLF

Operators: => , + * [] ! |

Other characters: () ;

Among them, Section, Class, End, Rule and Label are general-purpose keywords of the EBNF language.

A Label may only derive a terminal symbol (a string).

Scanner and Parser are special keywords that introduce the scanner rules and the parser rules, respectively.

Letter and the other character-class keywords are used in Scanner rules, where each one matches the characters its name describes.

=> Derivation: the rule on the left derives the expression on the right. It may not appear inside the derived expression.

; Ends a derivation, like the end of a logical line (statement) in C.

, Join (concatenation); a blank can also be used.

+ Positive closure, a postfix operator: the preceding element may occur any positive number of times (one or more).

* Star closure, a postfix operator: the preceding element may occur any natural number of times (zero or more).

[] Optional: the enclosed elements may or may not match.

() Parentheses, used for grouping to control precedence.

! Negation, a postfix operator reserved for scanner rules. It may be applied only to a single character or to an or-expression (a character set).

| Or: any one of several alternatives can be matched.

Of these, +, * and ! are unary operators; join and or are binary operators.

Precedence, from highest to lowest: unary operators (with * and + at the same level) > join > or.

Unicode characters are handled no differently from ordinary characters.

Because of implementation difficulties, scanner rules are written as .NET regular expressions in strings, like Label values, instead of following standard EBNF semantics. The main obstacle is the technique for converting EBNF rules into matching code.
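Since scanner rules end up as regular expressions, each C3 operator has a fairly direct regex counterpart. The following is a minimal sketch of one such mapping, assuming Python's `re` module in place of .NET; the helper names (`lit`, `join`, `alt`, `opt`, `star`, `plus`, `neg`, `matches`) are hypothetical and are not part of C3 or the author's compiler.

```python
import re

# Hypothetical combinators: each turns C3 operands into a regex fragment.
# Every result is wrapped in a non-capturing group so fragments nest safely.

def lit(s: str) -> str:
    """A quoted string such as "Hello": match the text literally."""
    return "(?:" + re.escape(s) + ")"

def join(*parts: str) -> str:
    """Join (',' or blank): the parts must match one after another."""
    return "(?:" + "".join(parts) + ")"

def alt(*parts: str) -> str:
    """Or ('|'): any one of the alternatives may match."""
    return "(?:" + "|".join(parts) + ")"

def opt(part: str) -> str:
    """Optional ('[...]'): the part may match or be skipped."""
    return "(?:" + part + ")?"

def star(part: str) -> str:
    """Star closure ('*'): zero or more repetitions."""
    return "(?:" + part + ")*"

def plus(part: str) -> str:
    """Positive closure ('+'): one or more repetitions."""
    return "(?:" + part + ")+"

def neg(part: str) -> str:
    """Negation ('!'): one character that does NOT match `part`.
    `part` must match exactly one character, mirroring the limit on '!'."""
    return r"(?:(?!" + part + r")[\s\S])"

def matches(pattern: str, text: str) -> bool:
    """A token is legal only if the whole text matches the rule."""
    return re.fullmatch(pattern, text) is not None

# Example: ID => "AAA" ["SSS"] "DDD";
rule = join(lit("AAA"), opt(lit("SSS")), lit("DDD"))
print(matches(rule, "AAADDD"), matches(rule, "AAASSSDDD"), matches(rule, "AAASDDD"))
# True True False
```

Wrapping every fragment in `(?:...)` keeps the helpers composable, so grouping is decided by how the calls are nested rather than by regex operator precedence.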

Sample code:

1.

ID => Letter+;

In this example, ID is a positive closure of letters, that is, one or more letters. For example, A, AAAAA and Hello are all legal IDs, while A2 and __a__ are illegal.
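A minimal Python sketch of this rule as a regular expression follows. It is an assumption, not the author's code: Python's `re` has no `\p{L}`, so the class `[^\W\d_]` (word characters minus digits and underscore) stands in for Letter, which also accepts non-ASCII letters. `LETTER_ID` and `is_letter_id` are hypothetical names.

```python
import re

# ID => Letter+;   one or more letters.
# [^\W\d_] approximates \p{L}: a word character that is not a digit or '_'.
LETTER_ID = re.compile(r"[^\W\d_]+")

def is_letter_id(text: str) -> bool:
    # fullmatch: the whole token must consist of letters, not just a prefix.
    return LETTER_ID.fullmatch(text) is not None

for word in ["A", "AAAAA", "Hello", "héllo", "A2", "__a__"]:
    print(word, is_letter_id(word))
```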

2.

ID => LetterOrDigit*;

ID is a star closure of letters or digits, that is, zero or more letters or digits. For example, φ (the empty string), AA, SSS, A3 and A2 are all legal.
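The only practical difference between the star closure and the positive closure is the empty string. The sketch below contrasts the two in Python, assuming `[^\W_]` (any word character except `_`) as an approximation of LetterOrDigit; `STAR_ID`, `PLUS_ID` and `accepts` are hypothetical names.

```python
import re

# ID => LetterOrDigit*;   zero or more letters or digits.
# [^\W_] approximates "letter or digit": any word character except '_'.
STAR_ID = re.compile(r"[^\W_]*")
# For contrast, the positive closure LetterOrDigit+ rejects the empty string.
PLUS_ID = re.compile(r"[^\W_]+")

def accepts(pattern: re.Pattern, text: str) -> bool:
    return pattern.fullmatch(text) is not None

for word in ["", "AA", "SSS", "A3", "A_3"]:
    print(repr(word), accepts(STAR_ID, word), accepts(PLUS_ID, word))
```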

3.

ID => Letter!+;

ID consists of one or more characters that are not letters. For example, 32-9023&*) is legal, while 234AF is illegal.
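Regular expressions have no postfix "not" operator, so `!` has to be translated. One option, sketched below in Python as an assumption rather than the author's translation, is a negative lookahead followed by "any one character"; this works as long as the negated part matches a single character, which is exactly the restriction C3 places on `!`. `LETTER`, `NOT_LETTER` and `is_non_letter_id` are hypothetical names.

```python
import re

LETTER = r"[^\W\d_]"   # approximation of a Unicode letter (\p{L})

# Letter!  -> exactly one character that is not a letter.
# (?!...) rejects a letter at this position; [\s\S] then consumes any one character.
NOT_LETTER = rf"(?:(?!{LETTER})[\s\S])"

# ID => Letter!+;   one or more non-letter characters.
NON_LETTER_ID = re.compile(NOT_LETTER + "+")

def is_non_letter_id(text: str) -> bool:
    return NON_LETTER_ID.fullmatch(text) is not None

print(is_non_letter_id("32-9023&*)"))  # every character is a non-letter
print(is_non_letter_id("234AF"))       # 'A' and 'F' are letters
```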

4.

ID => "Hello";

ID has only one legal form, namely Hello; any other form is illegal.

5.

ID => "Hello" | "hi";

ID has only two legal forms, namely Hello and hi; any other form is illegal.
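Literal rules like examples 4 and 5 translate directly: escape each string and join the alternatives with `|`. The Python sketch below assumes literals are case-sensitive, which the text does not state; `HELLO`, `HELLO_OR_HI` and `accepts` are hypothetical names.

```python
import re

# ID => "Hello";            a single literal form
HELLO = re.compile(re.escape("Hello"))
# ID => "Hello" | "hi";     either literal, nothing else
HELLO_OR_HI = re.compile("|".join(re.escape(s) for s in ["Hello", "hi"]))

def accepts(pattern: re.Pattern, text: str) -> bool:
    # fullmatch rather than match: "Hello!" must not be accepted just
    # because it starts with "Hello".
    return pattern.fullmatch(text) is not None

for word in ["Hello", "Hello!", "hi", "hey"]:
    print(word, accepts(HELLO, word), accepts(HELLO_OR_HI, word))
```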

6.

ID => "AAA" ["SSS"] "DDD";

ID can match AAADDD or AAASSSDDD; that is, SSS is optional.
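An optional element behaves like an alternative between the element and the empty string. For a rule built only from literals, this means the whole language can be enumerated. The Python sketch below does that with `itertools.product`; `optional` and `language` are hypothetical helpers for illustration, not part of C3.

```python
from itertools import product

# Each position lists the strings it can contribute.
# [x] is treated as the alternative x | "" (the empty string).
def optional(s: str) -> list[str]:
    return [s, ""]

def language(*positions: list[str]) -> set[str]:
    """All strings generated by joining one choice from each position."""
    return {"".join(choice) for choice in product(*positions)}

# ID => "AAA" ["SSS"] "DDD";
print(sorted(language(["AAA"], optional("SSS"), ["DDD"])))
# ['AAADDD', 'AAASSSDDD']
```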

Notes:

Do not begin a statement with the WhiteSpace keyword, because no matching code is generated for it. Calls to other rules are placed before the end of the match.

With this, the basic structure of the EBNF dialect is in place. Next we use C3 to describe itself, which also shows how C3 is used and demonstrates that it works.

EBNF self-description:

Class Scanner 'Required: without a scanner the whole compiler cannot work.

Section Operator 'Declares all operators. Single-line comments begin with ', as in VB.

Label Trans => "=>"; 'Derivation: the rule on the left derives the right-hand side.

Label EndLine => ";"; 'Some rules are very long, so for readability

'a terminator marks the end of a statement and whitespace is ignored;

'as in C, one statement can then span several lines.

Label Join => ","; 'Originally, whitespace was planned to act as join, but in later versions

'whitespace carries no meaning, so two words need a separator; hence this operator is kept.

Label Positive => "+"; 'Positive closure

Label Repeat => "*"; 'Star closure

Label LOpt => "["; 'Optional start

Label ROpt => "]"; 'Optional end. The optional marker is really a pair of symbols, so both are listed;

'the same holds for the parentheses used for grouping / changing precedence.

Label Not => "!"; 'Negation; only for single characters or character sets in the scanner.

End section

Section Keyword 'Keyword section: all keywords are placed here.

Label K_Section => "section"; 'declares a section

Label K_Class => "class"; 'declares a class; the current syntax has only two classes, Scanner and Parser

Label K_End => "end"; 'ends a section or a class

Label K_Scanner => "scanner";

Label K_Parser => "parser";

Label K_Rule => "rule"; 'declares a grammar rule

Label K_Label => "label"; 'declares a string constant

Section Letters

Label K_Letter => "letter";

Label K_Digit => "digit";

Label K_LOD => "letterordigit";

Label K_Upper => "upperletter";

Label K_Lower => "lowerletter";

Label K_Symbol => "symbol";

Label K_WhiteSpace => "whitespace";

End Section

Label K_CRLF => "crlf";

End Section

Section Comment

Label LineComment => "'"; 'line-comment marker

End section

Section Other 'This section uses .NET regular expressions for simplicity,

'avoiding the complexity of the conversion.

Rule ID => ("_" LetterOrDigit | Letter) ("_" | LetterOrDigit)*;

"((_[\d\p{L}])|\p{L})[_\d\p{L}]*"

Rule string => "" "" "" "" "" "" "" "" "" "([^" "" "" "" "" "" "

End section

END CLASS

Class Parser

Rule Program => ScannerDecl ParserDecl; 'The scanner class comes first, followed by the parser class.

Rule ScannerDecl =>
    "class" "scanner"
    [OperatorSection]
    [KeywordSection]
    [CommentSection]
    [OtherSection]
    "end" "class";

Rule OperatorSection =>
    "section" "operator"
    ScSectionDecl*
    LabelDecl*
    "end" "section";

Rule KeywordSection =>
    "section" "keyword"
    ScSectionDecl*
    LabelDecl*
    "end" "section";

Rule CommentSection =>
    "section" "comment"
    [LineComDecl]
    [BlockComStart 'declaring a block-comment start also requires declaring its end; otherwise it is ignored
     BlockComEnd]
    "end" "section";

Rule OtherSection =>
    "section" "other"
    ScSectionDecl*
    LabelDecl*
    "end" "section";

Rule LineComDecl => "rule" "linecomment" "=>" String; 'A comment marker can obviously only be a string.

Rule BlockComStart => "rule" "block" "=>" String;

Rule BlockComEnd => "rule" "block" "=>" String;

Rule ScSectionDecl =>
    "section" ID
    (LabelDecl | ScRuleDecl)*
    "end" "section";

Rule ScRoSectionDecl => 'A section that allows only rules; the specification above requires it,
    'but it is not actually used in the implementation.
    "section" ID
    ScRuleDecl*
    "end" "section";

Rule LabelDecl => "label" ID "=>" String;

Rule ScRuleDecl => "rule" ID "=>" ScExp;

Rule ScQExp => ("(" ScExp ")") | ("[" ScExp "]");

Rule ScExp => ScOrExp ["!"] ["*" | "+"];

Rule ScOrExp => ScJoinExp ("|" ScJoinExp)*;

Rule ScJoinExp => ScExpElem ([","] ScExpElem)*;

Rule ScExpElem => String | Letters | ScQExp;

'Parser rules follow

Rule ParserDecl =>
    "class" "parser"
    (SectionDecl | RuleDecl)*
    "end" "class";

Rule RuleDecl => "rule" ID "=>" Exp;

Rule QExp => ("(" Exp ")") | ("[" Exp "]");

Rule Exp => OrExp ["!"] ["*" | "+"];

Rule OrExp => JoinExp ("|" JoinExp)*;

Rule JoinExp => ExpElem ([","] ExpElem)*;

Rule ExpElem => String | ID | QExp;

Rule SectionDecl =>
    "section" ID
    (LabelDecl | RuleDecl)*
    "end" "section";

Rule RoSectionDecl => 'a section that allows only rules
    "section" ID
    RuleDecl*
    "end" "section";

END CLASS
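The `Rule ID` entry in Section Other relies on `\p{L}`, which .NET supports but Python's built-in `re` does not. As a hedged sketch, the same identifier pattern can be approximated in Python with the substitutions listed in the comments; `ID_PATTERN` and `is_identifier` are hypothetical names, and the stand-in classes are close to, but not identical with, the .NET categories.

```python
import re

# .NET original (Section Other):  ((_[\d\p{L}])|\p{L})[_\d\p{L}]*
# Assumed stand-ins, since Python's re has no \p{L}:
#   \p{L}        -> [^\W\d_]   (word character that is not a digit or '_')
#   [\d\p{L}]    -> [^\W_]     (letter or digit)
#   [_\d\p{L}]   -> \w         (letter, digit or '_')
ID_PATTERN = re.compile(r"(?:_[^\W_]|[^\W\d_])\w*")

def is_identifier(text: str) -> bool:
    # An identifier starts with a letter, or with '_' followed by a letter or digit.
    return ID_PATTERN.fullmatch(text) is not None

for name in ["count", "_x1", "x_y_2", "_", "__a", "2abc"]:
    print(name, is_identifier(name))
```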

The EBNF above is given to provide a deeper understanding of C3's purpose, format, meaning and grammar. Next we start building the compiler for C3.
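As a small illustration of how directly these rules map onto code, the scanner-expression rules (ScExp, ScOrExp, ScJoinExp, ScExpElem, ScQExp) can be turned into a recursive-descent parser with one method per rule. The Python sketch below is an assumption, not the author's compiler: the tokenizer, the tuple-shaped syntax tree and all names (`tokenize`, `ScParser`, `parse_sc_exp`) are hypothetical, strings have no escape sequences, and identifiers stand for the Letters keywords.

```python
import re
from typing import List, Optional, Tuple

Node = Tuple  # e.g. ("str", "abc"), ("class", "Letter"), ("join", a, b)

# Hypothetical tokens: "strings" (no escapes), identifiers, one-char operators.
TOKEN_RE = re.compile(r'\s*(?:("[^"]*")|([A-Za-z_]\w*)|([()\[\]|,!*+]))')

def tokenize(src: str) -> List[str]:
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(m.lastindex))  # the one group that matched
        pos = m.end()
    return tokens

class ScParser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def starts_elem(self) -> bool:
        tok = self.peek()
        return tok is not None and (tok in ("(", "[") or tok[0] == '"'
                                    or tok[0].isalpha() or tok[0] == "_")

    def sc_exp(self) -> Node:  # ScExp => ScOrExp ["!"] ["*" | "+"];
        node = self.sc_or_exp()
        if self.peek() == "!":
            self.take("!")
            node = ("not", node)
        if self.peek() in ("*", "+"):
            node = ("star" if self.take() == "*" else "plus", node)
        return node

    def sc_or_exp(self) -> Node:  # ScOrExp => ScJoinExp ("|" ScJoinExp)*;
        items = [self.sc_join_exp()]
        while self.peek() == "|":
            self.take("|")
            items.append(self.sc_join_exp())
        return items[0] if len(items) == 1 else ("or", *items)

    def sc_join_exp(self) -> Node:  # ScJoinExp => ScExpElem ([","] ScExpElem)*;
        items = [self.sc_exp_elem()]
        while True:
            if self.peek() == ",":
                self.take(",")
            elif not self.starts_elem():
                break
            items.append(self.sc_exp_elem())
        return items[0] if len(items) == 1 else ("join", *items)

    def sc_exp_elem(self) -> Node:  # ScExpElem => String | Letters | ScQExp;
        tok = self.peek()
        if tok in ("(", "["):  # ScQExp => ("(" ScExp ")") | ("[" ScExp "]");
            self.take(tok)
            node = self.sc_exp()
            self.take(")" if tok == "(" else "]")
            return node if tok == "(" else ("opt", node)
        if self.starts_elem():
            tok = self.take()
            return ("str", tok[1:-1]) if tok.startswith('"') else ("class", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse_sc_exp(src: str) -> Node:
    parser = ScParser(tokenize(src))
    node = parser.sc_exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return node

print(parse_sc_exp('"AAA" ["SSS"] "DDD"'))
# ('join', ('str', 'AAA'), ('opt', ('str', 'SSS')), ('str', 'DDD'))
```

Read literally, ScExp attaches `!`, `*` and `+` to a whole ScOrExp, so `"a" "b"*` parses as a starred join; moving the postfix check into ScExpElem would give the tighter binding stated in the precedence list.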

Please credit the original address when reposting: https://www.9cbs.com/read-34121.html
