C3 language definition (EBNF dialect)
Keywords:
Section, Class, End, Scanner, Parser, Rule, Label, Letter, Digit, LetterOrDigit, UpperLetter, LowerLetter, Symbol, Whitespace, CRLF
Operators:
=>  ,  *  +  []  !  |
Other characters:
()  ;
Of these, Section, Class, End, Rule, and Label are general keywords of the EBNF language.
A Label may only derive a terminal, that is, a string literal.
Scanner and Parser are special keywords that denote the rule sets of the scanner and of the parser.
Letter and the keywords listed after it are for scanner rules; each one stands for the class of literal characters it names.
=> Derivation: the rule on the left is derived from the expression on the right. It may not appear inside the derived expression.
; Derivation terminator, similar to the statement terminator in C.
, Concatenation; whitespace can be used instead.
+ Positive closure, a postfix operator: one or more occurrences of the preceding element are accepted.
* Star closure, a postfix operator: zero or more occurrences of the preceding element are accepted.
[] Optional: the enclosed elements may or may not be matched.
() Parentheses, used to control priority.
! Negation, a postfix operator reserved for scanner rules. It may only be applied to a single character or to an "or" of single characters.
| Or: any one of several elements may be matched.
*, + and ! are unary operators.
Concatenation and or are binary operators.
Priority:
! > * = + > concatenation > or
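The ordering of closure, concatenation, and or can be tried out with an ordinary regular expression engine, which ranks its postfix repetition, concatenation, and alternation the same way. The following is a minimal sketch, assuming Python's `re` module as a stand-in for the .NET engine; the expression and the helper name `matches` are hypothetical and do not come from the C3 implementation.

```python
import re

# Hypothetical C3 expression:  "a" "b" * | "c"
# With closure > concatenation > or, it groups as ("a" ("b" *)) | "c",
# which is the same grouping a regex engine gives to  ab*|c.
pattern = re.compile(r"ab*|c")

def matches(text: str) -> bool:
    """Return True when the whole text is derived by the expression."""
    return pattern.fullmatch(text) is not None

for candidate in ["a", "abbb", "c", "abab", "ac"]:
    print(candidate, matches(candidate))
```

If closure bound more loosely than concatenation, "abab" would be accepted; with the ordering above it is rejected.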
No distinction is made between Unicode characters and ordinary characters.
Because of implementation issues, scanner rules are written as .NET regular expressions held in a Label-style string rather than following standard EBNF semantics. This is mainly due to technical difficulties in the conversion.
Sample code:
1.
ID => Letter +;
In this example the token ID is a positive closure of letters, that is, one or more letters. For example, A, AAAAA, and Hello are all legal IDs, while A2 and __a__ are illegal.
2.
ID => LetterOrDigit *;
The ID is a star closure of letters or digits, that is, zero or more letters or digits. For example, φ (the empty string), AA, SSS, A3, and A2 are all legal.
3.
ID => Letter ! +;
The ID consists of characters that are not letters. For example, 32-9023&*) is legal, while 234AF is illegal.
4.
ID => "Hello";
The ID has only one legal form, Hello; every other form is illegal.
5.
ID => "Hello" | "hi";
The ID has exactly two legal forms, Hello and hi; every other form is illegal.
6.
ID => "AAA" ["SSS"] "DDD";
The ID matches either AAADDD or AAASSSDDD, that is, SSS is optional.
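The six samples above can be checked mechanically by writing each one as an equivalent regular expression. This is only a sketch under stated assumptions: it uses Python's `re` module rather than .NET, it restricts letters to ASCII for brevity, and the names `SAMPLES` and `is_legal` are hypothetical.

```python
import re

# Hypothetical regex equivalents of the six samples (ASCII letters only).
SAMPLES = {
    1: r"[A-Za-z]+",       # one or more letters (positive closure)
    2: r"[A-Za-z0-9]*",    # zero or more letters or digits (star closure)
    3: r"[^A-Za-z]+",      # characters that are not letters (negation)
    4: r"Hello",           # a single literal
    5: r"Hello|hi",        # one of two literals (or)
    6: r"AAA(?:SSS)?DDD",  # literal, optional literal, literal
}

def is_legal(sample: int, text: str) -> bool:
    """Return True when text is a legal ID under the numbered sample."""
    return re.fullmatch(SAMPLES[sample], text) is not None

print(is_legal(1, "Hello"), is_legal(1, "A2"))
print(is_legal(6, "AAADDD"), is_legal(6, "AAASSSDDD"))
```

`re.fullmatch` is used so that the whole candidate string has to be derived by the rule, which is how the legal and illegal examples in the text are judged.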
Precautions:
Do not begin a statement with the Whitespace keyword, because no matching code is generated for it. Rule calls are placed before the end of the match.
With this, the basic structure of an EBNF dialect has taken shape. Next we use C3 to describe itself, which both shows how C3 is used and demonstrates that it is usable.
EBNF self-description:
Class Scanner 'Required: without a scanner the whole compiler cannot work.
Section Operator 'Operators: declares all operators; word-style operators like those of VB can also be used.
Label Trans => "=>"; 'Derivation: the rule on the left derives the expression on the right.
Label endline => ";"; 'Some rules are very long, so for the sake of readability
'a terminator marks the end of a statement and whitespace is ignored.
'A statement can then be split across several lines, as in C.
Label Join => ","; 'In the initial design whitespace was meaningful, but in later versions whitespace
'has no meaning, so two words need a separator; the comma is reserved for that.
Label Positive => "+"; 'Positive closure
Label repeat => "*"; 'Star closure
Label lopt => "["; 'Start of optional
Label Ropt => "]"; 'End of optional. The optional operator actually consists of two symbols,
'as do the parentheses used for grouping / changing priority.
Label NOT => "!"; 'Negation; only for a single character or a character set in the scanner.
End Section
Section Keyword 'Keywords: all keywords are declared here.
Label k_section => "section"; 'Used to declare a section
Label k_class => "class"; 'Declares a class; the current syntax has only two classes, Scanner and Parser
Label K_END => "end"; 'Ends a section or a class
Label k_scanner => "scanner";
Label K_Parser => "Parser";
Label k_rule => "rule"; 'Declares a grammar rule
Label k_label => "label"; 'Declares a string constant
Section Letters
Label K_Letter => "Letter";
Label K_Digit => "Digit";
Label K_LOD => "LetterOrDigit";
Label k_upper => "UpperLetter";
Label K_Lower => "LowerLetter";
Label K_Symbol => "Symbol";
Label K_Whitespace => "Whitespace";
End Section
Label K_CRLF => "CRLF";
End Section
Section Comment
Label linecomment => "'"; 'Line comment
End Section
Section Other 'This section is simplified by using .NET regular expressions, which reduces
'the complexity of the conversion.
Rule ID => ("_" LetterOrDigit | Letter) ("_" | LetterOrDigit) *;
"((_[\d\p{L}])|\p{L})[_\d\p{L}]*"
Rule string => "" "" "" "" "" "" "" "" "" "([^" "" "" "" "" "" "
End Section
End Class
Class Parser
Rule Program => ScannerDecl ParserDecl; 'The scanner class comes first, followed by the parser class.
Rule ScannerDecl =>
"Class" "Scanner"
[OperatorSection]
[KeywordSection]
[CommentSection]
[OtherSection]
"End" "Class";
Rule OperatorSection =>
"Section" "Operator"
ScSectionDecl *
LabelDecl *
"End" "Section";
Rule KeywordSection =>
"Section" "Keyword"
ScSectionDecl *
LabelDecl *
"End" "Section";
Rule CommentSection =>
"Section" "Comment"
[LineComDecl]
[BlockComStart 'If the start is declared, the end must be declared too; otherwise it is ignored.
BlockComEnd]
"End" "Section";
Rule OtherSection =>
"Section" "Other"
ScSectionDecl *
LabelDecl *
"End" "Section";
Rule LineComDecl => "Rule" "LineComment" "=>" String; 'A comment marker can obviously only be a string.
Rule BlockComStart => "Rule" "Block" "=>" String;
Rule BlockComEnd => "Rule" "Block" "=>" String;
Rule ScSectionDecl =>
"Section" ID
(LabelDecl | ScRuleDecl) *
"End" "Section";
Rule ScRoSectionDecl => 'Only allows sections that contain rules. The specification above calls for it,
'but to simplify the implementation it is not actually used.
"Section" ID
ScRuleDecl *
"End" "Section";
Rule LabelDecl => "Label" ID "=>" String;
Rule ScRuleDecl => "Rule" ID "=>" ScExp;
Rule ScQExp => ("(" ScExp ")") | ("[" ScExp "]");
Rule ScExp => ScOrExp ["!"] ["*" | "+"];
Rule ScOrExp => ScJoinExp ("|" ScJoinExp) *;
Rule ScJoinExp => ScExpElem ([","] ScExpElem) *;
Rule ScExpElem => String | Letters | ScQExp;
'The parser rules follow below.
Rule ParserDecl =>
"Class" "Parser"
(SectionDecl |
RuleDecl) *
"End" "Class";
Rule RuleDecl => "Rule" ID "=>" Exp;
Rule QExp => ("(" Exp ")") | ("[" Exp "]");
Rule Exp => OrExp ["!"] ["*" | "+"];
Rule OrExp => JoinExp ("|" JoinExp) *;
Rule JoinExp => ExpElem ([","] ExpElem) *;
Rule ExpElem => String | ID | QExp;
Rule SectionDecl =>
"Section" ID
(LabelDecl | RuleDecl) *
"End" "Section";
Rule RoSectionDecl => 'Only allows sections that contain rules
"Section" ID
RuleDecl *
"End" "Section";
End Class
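The parser-side expression rules in the listing (Exp, the or rule, the join rule, the element rule, and QExp) map naturally onto a recursive-descent parser with one method per rule. The following is a minimal sketch in Python, not the C3 compiler itself: the tokenizer, the tuple-shaped tree, and every name in it (`tokenize`, `Parser`, `parse`) are assumptions made for illustration. The rules are followed literally, so a postfix `!`, `*`, or `+` attaches to the whole Exp; to put a closure on a single element, that element is wrapped in parentheses.

```python
import re

# Hypothetical token shapes: "string", identifier, or one symbol character.
TOKEN = re.compile(r'\s*(?:("[^"]*")|([A-Za-z_][A-Za-z0-9_]*)|([()\[\]|,!*+]))')

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1) or m.group(2) or m.group(3))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def starts_elem(self):
        tok = self.peek()
        return tok is not None and (
            tok in ("(", "[") or tok[0] == '"' or tok[0] == "_" or tok[0].isalpha()
        )

    def exp(self):       # Exp => OrExp ["!"] ["*" | "+"]
        node = self.or_exp()
        if self.peek() == "!":
            self.take()
            node = ("not", node)
        if self.peek() in ("*", "+"):
            node = ("star" if self.take() == "*" else "plus", node)
        return node

    def or_exp(self):    # OrExp => JoinExp ("|" JoinExp) *
        alts = [self.join_exp()]
        while self.peek() == "|":
            self.take()
            alts.append(self.join_exp())
        return alts[0] if len(alts) == 1 else ("or", *alts)

    def join_exp(self):  # JoinExp => ExpElem ([","] ExpElem) *
        items = [self.exp_elem()]
        while self.peek() == "," or self.starts_elem():
            if self.peek() == ",":
                self.take()
            items.append(self.exp_elem())
        return items[0] if len(items) == 1 else ("seq", *items)

    def exp_elem(self):  # ExpElem => String | ID | QExp
        tok = self.peek()
        if tok == "(":   # QExp => "(" Exp ")"
            self.take()
            node = self.exp()
            self.take(")")
            return node
        if tok == "[":   # QExp => "[" Exp "]"
            self.take()
            node = self.exp()
            self.take("]")
            return ("opt", node)
        if tok is not None and tok[0] == '"':
            return ("str", self.take()[1:-1])
        if self.starts_elem():
            return ("id", self.take())
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    node = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return node

print(parse('"AAA" ["SSS"] "DDD"'))
```

Each method consumes exactly the tokens its rule describes, so a derivation that the grammar rejects surfaces as a `SyntaxError` at the first offending token.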
The EBNF above is given to provide a deeper understanding of the purpose, format, meaning, and grammar of C3. Next we start on the compiler for C3.