Construct a language recognizer
ANTLR: ANother Tool for Language Recognition (http://www.antlr.org)
Original source: http://www.jguru.com/faq/view.jsp?EID=78
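Glossary of terms used in this article:
Grammar: the description of a language's structure
Syntax: the structure imposed on a token stream
Action: code embedded in a grammar
Lexer: lexical analyzer (scanner, tokenizer)
Parser: syntax analyzer
Token: a unit of input produced by the lexer
AST: abstract syntax tree
Lexer grammar: the grammar from which the lexer is generated
Parser grammar: the grammar from which the parser is generated
Tree grammar: a grammar that describes the structure of a tree
Tree parser: a parser generated from a tree grammar
Walk: to traverse a tree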
To construct a language recognizer, you describe the structure of the language in a grammar and then use ANTLR to generate a Java or C++ program that recognizes sentences in that language. You can add a few simple operators to build an intermediate-form syntax tree automatically, which can then be used for translation. You can also embed Java or C++ actions in the grammar to collect information or perform translation.
For a simple translation you write two grammars: a lexer grammar and a parser grammar. From them ANTLR generates a lexer (often called a scanner or tokenizer) and a parser. The lexer turns the input character stream into a stream of tokens, and the parser applies a grammatical structure (syntax) to that token stream. Here is a simple lexer that matches commas, integers, and identifiers:
class IntAndIDLexer extends Lexer;
INT   : ('0'..'9')+ ;
ID    : ('a'..'z')+ ;
COMMA : ',' ;
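To make the lexer's job concrete, here is a minimal hand-written sketch in plain Java of what such a lexer does. It is not the code ANTLR generates; the class name IntAndIdLexerSketch and the "TYPE:text" string representation of tokens are assumptions made for illustration, and it assumes that an INT or ID is a run of one or more digits or lowercase letters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written stand-in for a generated lexer (illustration only).
class IntAndIdLexerSketch {
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    // Turns a character stream into a list of tokens written as "TYPE:text".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (isDigit(c)) {
                // INT: one or more digits
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add("INT:" + input.substring(start, i));
            } else if (isLower(c)) {
                // ID: one or more lowercase letters
                while (i < input.length() && isLower(input.charAt(i))) i++;
                tokens.add("ID:" + input.substring(start, i));
            } else if (c == ',') {
                // COMMA: a single comma
                i++;
                tokens.add("COMMA:,");
            } else {
                throw new IllegalArgumentException(
                    "unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("32,a,size,28923,i"));
    }
}
```

Each branch corresponds to one lexer rule; any character that no rule covers is reported as an error instead of being skipped.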
The parser sees a stream of tokens, which it obtains by asking the lexer for one token at a time. It also verifies that this token stream has the correct syntactic structure. If your language is defined as "a series of integers and identifiers separated by commas", you need a grammar that looks like this:
class SeriesParser extends Parser;
/** Match an element (INT or ID) with possibly a
 *  bunch of ", element" pairs to follow matching
 *  input that looks like 32,a,size,28923,i
 */
series : element (COMMA element)* ;
/** Match either an INT or ID */
element : INT | ID ;
You may want to embed actions in the grammar so that code is executed when the parser sees a particular input construct. To print how many elements were found, you can add actions like this:
class SeriesParser extends Parser; // I'm using Java...
/** Match an element (INT or ID) with possibly a
 *  bunch of ", element" pairs to follow matching
 *  input that looks like 32,a,size,28923,i
 */
series
{ /* This is considered an initialization action
   * and is done before recognition of this rule
   * begins. These look like local variables to
   * the resulting method SeriesParser.series()
   */
  int n = 1; // how many elements? At least 1
}
    : element (COMMA element {n++;})*
      {System.out.println("There were " + n + " elements");}
    ;
/** Match either an INT or ID */
element : INT | ID ;
So what does ANTLR do with these grammars? For SeriesParser, ANTLR generates roughly the following code (with the error handling removed):
public class SeriesParser extends antlr.LLkParser implements SeriesParserTokenTypes {
    // I cut out the usual set of constructors and a few
    // other details.
    /** Match an element (INT or ID) with possibly a
     *  bunch of ", element" pairs to follow matching
     *  input that looks like 32,a,size,28923,i
     */
    public final void series() {
        element(); // match an element
        _loop3:
        do {
            if ((LA(1) == COMMA)) {
                match(COMMA);
                element();
            }
            else {
                break _loop3;
            }
        } while (true);
    }
    /** Match either an INT or ID */
    public final void element() {
        switch (LA(1)) {
        case INT:
        {
            match(INT);
            break;
        }
        case ID:
        {
            match(ID);
            break;
        }
        }
    }
}
Study the code above and you will see the correspondence between the grammar and the generated code, which resembles what you would write by hand.
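To experiment with that correspondence without installing ANTLR, the following self-contained sketch re-creates the recognizer by hand in plain Java. It is an illustration, not ANTLR's output: the class name SeriesRecognizerSketch, the token-type numbering, the built-in tokenizer, and the final end-of-input check are all assumptions added here. The LA() and match() helpers only imitate the roles they play in the generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written recognizer for: series : element (COMMA element)* ;
class SeriesRecognizerSketch {
    static final int EOF = 0, INT = 1, ID = 2, COMMA = 3;

    private final List<Integer> types = new ArrayList<>();
    private int pos = 0;

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    // Tokenizes the whole input up front; only token types are kept.
    SeriesRecognizerSketch(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                types.add(INT);
            } else if (isLower(c)) {
                while (i < input.length() && isLower(input.charAt(i))) i++;
                types.add(ID);
            } else if (c == ',') {
                i++;
                types.add(COMMA);
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        types.add(EOF);
    }

    // Type of the k-th token of lookahead (k = 1 is the current token).
    private int LA(int k) {
        return types.get(Math.min(pos + k - 1, types.size() - 1));
    }

    // Consumes the current token if it has the expected type.
    private void match(int type) {
        if (LA(1) != type) {
            throw new IllegalStateException("unexpected token at index " + pos);
        }
        pos++;
    }

    // series : element (COMMA element {n++;})* ; returns the element count.
    int series() {
        int n = 1;
        element();
        while (LA(1) == COMMA) {
            match(COMMA);
            element();
            n++;
        }
        match(EOF); // assumption: the sketch insists on consuming all input
        return n;
    }

    // element : INT | ID ;
    private void element() {
        switch (LA(1)) {
            case INT: match(INT); break;
            case ID:  match(ID);  break;
            default:  throw new IllegalStateException("expected INT or ID at index " + pos);
        }
    }

    public static void main(String[] args) {
        int n = new SeriesRecognizerSketch("32,a,size,28923,i").series();
        System.out.println("There were " + n + " elements");
    }
}
```

Each grammar rule becomes one method, the (...)* loop becomes a while loop guarded by a one-token lookahead test, and the alternatives of element become the cases of a switch.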
To use the lexer and parser, you need a main() method that creates instances of them, connects them, and calls a rule in the parser:
public static void main(String[] args) throws Exception {
    // DataInputStream is java.io.DataInputStream
    DataInputStream input = new DataInputStream(System.in);
    // attach lexer to the input stream
    IntAndIDLexer lexer = new IntAndIDLexer(input);
    // create parser attached to lexer
    SeriesParser parser = new SeriesParser(lexer);
    // start up the parser by calling the rule
    // at which you want to begin parsing.
    parser.series();
}
To print the text of the tokens the parser matches, you must label the corresponding token references. Each label refers to the Token object constructed by the lexer. Within an action, you can then get the text of that token object:
class SeriesParser extends Parser;
series : element (COMMA element)* ;
element
    : a:INT {System.out.println(a.getText());}
    | b:ID  {System.out.println(b.getText());}
    ;
The input 32,a,size,28923,i results in the following output:
32
a
size
28923
i
More complex translations usually require multiple passes over the input, so programmers typically build an intermediate form called an abstract syntax tree (AST), which is a structured representation of the input text. You can walk the tree with handwritten code, or you can describe the tree's structure with an ANTLR tree grammar. Actions embedded in the tree grammar are executed when the tree parser reaches the corresponding position in the input tree.
How do you build a simple tree? It's easy: tell ANTLR to build trees and it will create one node per input token; that is, ANTLR builds a linked list of the input tokens. To make things more interesting, add a '!' after the COMMA token reference to indicate that commas should not be included in the tree:
class SeriesParser extends Parser;
options {
    buildAST = true;
}
series : element (COMMA! element)* ;
element : INT | ID ;
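The shape of the resulting tree can be pictured with a small plain-Java sketch. This is not ANTLR's AST implementation: the Node class, its fields, and the build() and describe() helpers are hypothetical names used only to show a flat, sibling-linked list that holds one node per INT or ID and no node for the commas.

```java
// Hypothetical sketch of a flat tree: a sibling-linked list of element nodes.
class FlatAstSketch {
    static final class Node {
        final String type;   // "INT" or "ID"
        final String text;   // the matched text
        Node nextSibling;    // next element, or null at the end of the list
        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Builds one node per element; commas only separate elements and get no node.
    static Node build(String input) {
        Node head = null, tail = null;
        for (String piece : input.split(",", -1)) {
            String type;
            if (piece.matches("[0-9]+")) {
                type = "INT";
            } else if (piece.matches("[a-z]+")) {
                type = "ID";
            } else {
                throw new IllegalArgumentException("not an INT or ID: '" + piece + "'");
            }
            Node node = new Node(type, piece);
            if (head == null) {
                head = node;
            } else {
                tail.nextSibling = node;
            }
            tail = node;
        }
        return head;
    }

    // Renders the list as "TYPE(text) TYPE(text) ..." for inspection.
    static String describe(Node first) {
        StringBuilder out = new StringBuilder();
        for (Node n = first; n != null; n = n.nextSibling) {
            if (out.length() > 0) out.append(' ');
            out.append(n.type).append('(').append(n.text).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(build("32,a,size,28923,i")));
    }
}
```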
What do we do with the AST? You could subclass ANTLR's CommonAST and add a walk() method or something similar, but a better way is to describe the structure of the AST with another grammar. A tree grammar is like an executable comment on your intermediate form. Here is a small tree grammar that matches the trees generated by our parser grammar:
class SeriesTreeParser extends TreeParser;
/** Match a flat tree (a list) of one or more INTs or IDs.
 *  This rule differs from SeriesParser.series(), which
 *  is in a different grammar.
 */
series : (INT | ID)+ ;
Note that the tree grammar is simpler than the parser grammar. In general, you construct a relatively simple tree that is easy to walk, so that you do not have to deal with input text containing all the whitespace and other syntax that exists only for the benefit of human readers. To invoke your tree parser, extend the main() method:
public static void main(String[] args) throws Exception {
    DataInputStream input = new DataInputStream(System.in);
    // attach lexer to the input stream
    IntAndIDLexer lexer = new IntAndIDLexer(input);
    // create parser attached to lexer
    SeriesParser parser = new SeriesParser(lexer);
    // start up the parser by calling the rule
    // at which you want to begin parsing.
    parser.series();
    // get the tree out of the parser
    // (AST is antlr.collections.AST)
    AST resultTree = parser.getAST();
    // make an instance of the tree parser
    SeriesTreeParser treeParser = new SeriesTreeParser();
    // begin tree parser at only rule
    treeParser.series(resultTree);
}
You can add actions to a tree grammar just as easily as to a parser grammar. To print the list of integers and identifiers as the parser grammar did, add a couple of actions:
class SeriesTreeParser extends TreeParser;
series
    : ( a:INT {System.out.println(a.getText());}
      | b:ID  {System.out.println(b.getText());}
      )+
    ;
What if you want to walk the AST again with different actions? One way is to define another rule that matches the same structure but has different actions:
class SeriesTreeParser extends TreeParser;
series
    : ( a:INT {System.out.println(a.getText());}
      | b:ID  {System.out.println(b.getText());}
      )+
    ;
/** Sum up all the integers for fun. */
passTwo
{
    int sum = 0;
}
    : ( a:INT {sum += Integer.parseInt(a.getText());}
      | ID
      )+
      {System.out.println("sum is " + sum);}
    ;
Your main() method then needs to call this new rule after calling the series rule:
public static void main(String[] args) throws Exception {
    ...
    // get the tree out of the parser
    AST resultTree = parser.getAST();
    // make an instance of the tree parser
    SeriesTreeParser treeParser = new SeriesTreeParser();
    treeParser.series(resultTree);  // walk AST once
    treeParser.passTwo(resultTree); // walk AST again!!
}
You will see the following output:
32
a
size
28923
i
sum is 28955
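The two-pass idea can also be tried without ANTLR. The sketch below walks a hand-built flat list twice in plain Java: once to collect the text of every node and once to sum the integers. The Node class and the passOne(), passTwo(), and sample() helpers are hypothetical names chosen for this illustration; the tree is constructed directly instead of being produced by a parser.

```java
// Hypothetical sketch: two independent walks over the same flat tree.
class TwoPassWalkSketch {
    static final class Node {
        final String type;       // "INT" or "ID"
        final String text;       // the token text
        final Node nextSibling;  // next node in the list, or null
        Node(String type, String text, Node nextSibling) {
            this.type = type;
            this.text = text;
            this.nextSibling = nextSibling;
        }
    }

    // First walk: the text of every node, one per line (like rule "series").
    static String passOne(Node first) {
        StringBuilder out = new StringBuilder();
        for (Node n = first; n != null; n = n.nextSibling) {
            out.append(n.text).append('\n');
        }
        return out.toString();
    }

    // Second walk: add up INT nodes and ignore ID nodes (like rule "passTwo").
    static int passTwo(Node first) {
        int sum = 0;
        for (Node n = first; n != null; n = n.nextSibling) {
            if (n.type.equals("INT")) {
                sum += Integer.parseInt(n.text);
            }
        }
        return sum;
    }

    // The tree for the input 32,a,size,28923,i with the commas already dropped.
    static Node sample() {
        return new Node("INT", "32",
               new Node("ID", "a",
               new Node("ID", "size",
               new Node("INT", "28923",
               new Node("ID", "i", null)))));
    }

    public static void main(String[] args) {
        Node tree = sample();
        System.out.print(passOne(tree));                 // walk the tree once
        System.out.println("sum is " + passTwo(tree));   // walk it again
    }
}
```

Because the tree is kept in memory, the second walk does not need to re-read or re-parse the input, which is the main reason for building an AST when several passes are required.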