Summary
ANTLR (ANother Tool for Language Recognition) is a tool, based on LL(k) grammars, for generating syntax analysis programs (hereinafter referred to as analyzers). By default the analyzers it generates are written in Java rather than in the more efficient C++. This article describes how to use ANTLR to generate C++ and how to organize the project with VC6.0, and gives an example. Finally, it makes a small improvement to ANTLR itself.
Keywords
ANTLR, Syntax Analyzer, Syntax Analyzer Generation Tool
Introduction to ANTLR
Automatic generation of analyzers has long been a research direction in compiler theory. Early programmers wrote analyzers by hand, which was time-consuming and produced analyzers that were unstable and hard to modify or port. With the rise of automation, more and more programmers have abandoned this manual approach.
ANTLR (formerly PCCTS, the Purdue Compiler Construction Tool Set) is a tool that generates analyzers automatically. It accepts a grammar description of a language and produces a program that recognizes that language. Semantic actions can be inserted into the grammar description to tell ANTLR how to build an abstract syntax tree (AST) and how to generate output.
ANTLR is becoming increasingly popular (some commentators call its appearance a milestone), not only because it is powerful, easy to extend, and open source, but also because the code it generates closely resembles code written with the recursive descent method (the main method used for hand-written analyzers) and is easy to read. By contrast, the code produced by another well-known analyzer generator, YACC (Yet Another Compiler-Compiler, which uses LR analysis), is much harder to read.
At present there are few articles about ANTLR in China, and those that exist only describe using ANTLR to generate analyzers in Java. In fact, ANTLR can also generate C++ source (starting with version 2.7.3 it also supports C#, and Python support is planned), but some preparation is required. The specific steps are described in detail below.
The latest version of ANTLR can be downloaded from the official ANTLR website (http://www.antlr.org). As of June 2004, the latest version was 2.7.4. The download is a tar.gz archive of about 1.3 MB; unpack it to a directory (represented below).
ANTLR is written in Java and requires a JDK. This article assumes that a JDK is already installed on your machine and that CLASSPATH is set correctly.
Grammar file
A grammar is the set of rules by which a language is recognized, and it is the basis on which ANTLR generates a program. The grammar file is the core of working with ANTLR: it is the interface between the programmer and ANTLR.
Writing the grammar file is essentially the process of solving the problem. The programmer only needs to concentrate on the logic of the solution rather than on the implementation details of a particular programming language, which reduces the chance of errors.
Overview
This article introduces the syntax of a grammar file only briefly; details can be found in the ANTLR documentation.
A grammar file generally contains a header block, an options block, the parser with its rule definitions, and the lexer with its token definitions. The most important parts are the rule and token definitions.
Rule definitions closely resemble the Extended Backus-Naur Form (EBNF) of compiler theory. A rule consists of a rule name, a rule body, a semicolon that marks the end, and an exception-handling section (which may be omitted). For example, the following rule describes the syntax of an assignment statement in C:
assignment_stat:
    ID '=' expr ';'
    ;
It means that an assignment statement consists of an ID, an equals sign, an expression, and a semicolon, in that order.
Tokens are defined in much the same way as rules. For example, the following token definition describes a decimal integer:
NUM:
    ('1'..'9') ('0'..'9')*
    ;
It means that the first character of a number (NUM) is one of '1' to '9', followed by zero or more characters from '0' to '9'.
Note that the name of a rule must begin with a lowercase letter, and the name of a token must begin with an uppercase letter.
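To make the NUM definition concrete, here is a minimal hand-written C++ sketch of the same pattern: one character from '1' to '9' followed by zero or more characters from '0' to '9'. The function name matchNum is hypothetical and is not something ANTLR generates; it only illustrates what the token definition accepts.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not ANTLR output): returns the length of the longest
// prefix of `text` starting at `pos` that matches ('1'..'9') ('0'..'9')*,
// or 0 if the text at `pos` does not start with such a number.
std::size_t matchNum(const std::string& text, std::size_t pos = 0)
{
    if (pos >= text.size() || text[pos] < '1' || text[pos] > '9')
        return 0;                       // first character must be '1'..'9'
    std::size_t end = pos + 1;
    while (end < text.size() && text[end] >= '0' && text[end] <= '9')
        ++end;                          // zero or more '0'..'9'
    return end - pos;
}
```

Under this definition a lone "0" is not a NUM, because the first character must be between '1' and '9'.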
Setting the language generated by ANTLR
ANTLR has many options that can be set in the options block of the grammar file, including language, which selects the language of the analyzer that ANTLR generates. To generate an analyzer in C++, set it as follows:
options
{
    language = "Cpp";
    // other options
}
The default value of the language option is "Java". To generate a C# program, set language to "CSharp".
Example of a C++ program
An example of using ANTLR to generate an analyzer in C++ is given below. The analyzer reads an arithmetic expression entered by the user and gives the final result of the expression. Besides addition, subtraction, multiplication and division, the expression may contain the power operator "^" and the three trigonometric functions sin, cos and tan.
Before starting, we must first build the library that is needed to compile and link the program generated by ANTLR.
Building a static link library
Building the C++ program generated by ANTLR requires a runtime library. The source code of this runtime library is also fully open, located in the
First use VC6.0 to create a new Win32 Static Library project named antlrlib; do not select the "Pre-Compiled header" or "MFC support" options.
Click the menu "Project" → "Add To Project" → "Files...", and add all files under antlr-2.7.4/lib/cpp/src except dll.cpp (do not add dll.cpp, otherwise the project will not compile).
So that VC6.0 can find the required header files, you need to add
Building the whole project now produces the ANTLR runtime library antlrlib.lib (some documents say that the RTTI option must be enabled in the project settings, but this does not appear to be necessary).
Writing a grammar file
Based on the requirements, it is not difficult to write the following grammar file:
header {
// Header names are missing from this copy; <math.h> and <stdlib.h> are assumed
// because the actions below call pow, sin, cos, tan and atof.
#include <math.h>
#include <stdlib.h>
}
options
{
    language = "Cpp";
}
class ExprParser extends Parser;
{
}
// Rules
expr returns [double value = 0]
{double x;}
    :
    value = term
    (
        PLUS x = term {value += x;}
    |
        MINUS x = term {value -= x;}
    )*
    ;
    exception
    catch [ANTLR_USE_NAMESPACE(antlr)ANTLRException& ex] {
        // catch all exceptions and report it
        reportError(ex.toString());
    }
term returns [double value = 0]
{double x;}
    :
    value = factor
    (
        STAR x = factor {value *= x;}
    |
        SLASH x = factor {value /= x;}
    )*
    ;
factor returns [double value = 0]
{double x;}
    :
    value = atom
    (
        TOK_POW x = atom {value = pow(value, x);}
    )*
    ;
atom returns [double value = 0]
{double x;}
    :
    i:NUM
    {
        value = atof((i->getText()).c_str());
    }
    |
    TOK_SIN x = atom {value = sin(x);}
    |
    TOK_COS x = atom {value = cos(x);}
    |
    TOK_TAN x = atom {value = tan(x);}
    |
    LPAREN value = expr RPAREN
    ;
    exception
    catch [ANTLR_USE_NAMESPACE(antlr)ANTLRException& ex] {
        reportError(ex.toString());
    }
class ExprLexer extends Lexer;
options {
    k = 1;
    caseSensitive = false;
}
// Tokens
LPAREN: '(';
RPAREN: ')';
PLUS: '+';
MINUS: '-';
STAR: '*';
SLASH: '/';
NUM: ('0'..'9') ('0'..'9')* ('.' ('0'..'9')+)?;
RETURN: '\n';
// Math tokens
TOK_SIN: "sin";
TOK_COS: "cos";
TOK_TAN: "tan";
TOK_POW: '^';
// White space
WS:
    (
        ' '
    |
        '\t'
    )
    {$setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP);}
    ;
Save the file as test.g ('g' is the default extension for grammar files).
This grammar file defines a parser class, ExprParser, and a lexer class, ExprLexer. ANTLR will generate a header file and an implementation file for each of the two classes.
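To give a feel for the recursive descent style that the generated parser resembles, the following is a hand-written C++ sketch with one member function per rule of the grammar (expr, term, factor, atom). It is not the code ANTLR produces: the class ExprEvaluator, the function evaluate, and the helper functions are hypothetical names, and the lexer is folded into the parser for brevity.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hand-written sketch (not ANTLR output): one member function per grammar rule.
class ExprEvaluator {
public:
    explicit ExprEvaluator(const std::string& text) : text_(text), pos_(0) {}

    // expr : term ( PLUS term | MINUS term )* ;
    double expr() {
        double value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

private:
    // term : factor ( STAR factor | SLASH factor )* ;
    double term() {
        double value = factor();
        for (;;) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor : atom ( TOK_POW atom )* ;  (so "^" groups from left to right)
    double factor() {
        double value = atom();
        while (accept('^')) value = std::pow(value, atom());
        return value;
    }

    // atom : NUM | TOK_SIN atom | TOK_COS atom | TOK_TAN atom | LPAREN expr RPAREN ;
    double atom() {
        skipBlanks();
        if (isDigitAt(pos_)) return number();
        if (acceptWord("sin")) return std::sin(atom());
        if (acceptWord("cos")) return std::cos(atom());
        if (acceptWord("tan")) return std::tan(atom());
        if (accept('(')) {
            const double value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        throw std::runtime_error("unexpected input");
    }

    // NUM : digits, optionally followed by '.' and more digits.
    double number() {
        const std::size_t start = pos_;
        while (isDigitAt(pos_)) ++pos_;
        if (pos_ < text_.size() && text_[pos_] == '.' && isDigitAt(pos_ + 1)) {
            ++pos_;
            while (isDigitAt(pos_)) ++pos_;
        }
        return std::atof(text_.substr(start, pos_ - start).c_str());
    }

    // WS : blanks and tabs are skipped between tokens.
    void skipBlanks() {
        while (pos_ < text_.size() && (text_[pos_] == ' ' || text_[pos_] == '\t')) ++pos_;
    }
    bool isDigitAt(std::size_t i) const {
        return i < text_.size() && std::isdigit(static_cast<unsigned char>(text_[i])) != 0;
    }
    bool accept(char c) {
        skipBlanks();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    // Case-insensitive keyword match, mirroring the lexer's case-insensitivity.
    bool acceptWord(const std::string& word) {
        skipBlanks();
        if (text_.size() - pos_ < word.size()) return false;
        for (std::size_t i = 0; i < word.size(); ++i)
            if (std::tolower(static_cast<unsigned char>(text_[pos_ + i])) != word[i]) return false;
        pos_ += word.size();
        return true;
    }

    std::string text_;
    std::size_t pos_;
};

// Convenience wrapper: evaluate a whole expression string.
double evaluate(const std::string& text) { return ExprEvaluator(text).expr(); }
```

For instance, evaluate("1+2*3") follows expr into term, so the multiplication is applied before the addition. Because factor is written as a loop, as in the grammar, a chain such as 2^3^2 is grouped from the left as (2^3)^2.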
Organizing the project with VC6.0
Now the analyzer source code has to be generated from the grammar file, some other code added, and everything compiled, which is a slightly cumbersome process. To simplify these steps, VC6.0 is used as the development environment to organize the project.
Creating the project
Use VC6.0 to create a Win32 Console Application project named antlrcpp, choosing "An empty project". Click "Project" → "Add To Project" → "Files..." and add the grammar file test.g to the project.
Right-click the grammar file in the FileView of VC, select "Settings..." in the pop-up menu, and in the dialog that appears select the "Custom Build" tab, as shown in the figure:
In "Commands", fill in the command that invokes ANTLR to compile the grammar file:
java -cp
In "Outputs", fill in the names of all the files that ANTLR generates when it compiles the grammar file, as follows:
ExprLexer.cpp
ExprLexer.hpp
ExprParser.cpp
ExprParser.hpp
ExprParserTokenTypes.hpp
ExprParserTokenTypes.txt
Generating the analyzer source code
Once the settings are complete, the grammar file can be compiled: select the grammar file and press Ctrl+F7 (or click the Compile button on the toolbar). The following is displayed in the Output window of VC:
--------------------Configuration: antlrcpp - Win32 Debug--------------------
Performing Custom Build Step on ./testjava.g
ANTLR Parser Generator Version 2.7.4 1989-2004 jGuru.com
ExprLexer.cpp - 0 error(s), 0 warning(s)
The source code of the analyzer has now been generated. Click the menu "Project" → "Add To Project" → "Files..." and add all the generated .cpp and .hpp files to the project.
Specifying the input
The source code generated by ANTLR is only the core of the analyzer; the programmer still has to specify where the analyzer reads its input. To do this, create a new file main.cpp.
If the analyzer should analyze a string entered from the keyboard, the code is as follows:
#include "ExprParser.hpp"
#include "ExprLexer.hpp"
#include <iostream>
using namespace std;
int main()
{
    // The lexer reads characters from standard input; the parser reads tokens from the lexer.
    ExprLexer lexer(cin);
    ExprParser parser(lexer);
    double x = 0;
    x = parser.expr();
    cout << "The result is: " << x << endl;
    return 0;
}
If the analyzer should analyze the contents of a file instead, the corresponding code is as follows:
#include "ExprParser.hpp"
#include "ExprLexer.hpp"
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
    // The lexer reads characters from the file test.in instead of standard input.
    fstream from("test.in");
    ExprLexer lexer(from);
    ExprParser parser(lexer);
    double x = 0;
    x = parser.expr();
    cout << "The result is: " << x << endl;
    return 0;
}
Compiling to get the final result
Before building the project, VC6.0 has to be told where to find the required header files. The way to add a header search path was described earlier and is not repeated here; see the figure.
If you build the project now, you will see many link errors, because the compiled object files are not linked with the runtime library. To specify the library, select the "Link" tab in the dialog above and add "antlrlib.lib" under "Object/library modules:".
The whole project can now be built. The resulting executable runs as shown in the figure:
A small improvement to ANTLR
The ANTLR download includes all of the source code, and modifying it is permitted. We can therefore improve ANTLR to suit our own needs.
Statement of the problem
If a rule (which ANTLR compiles into a function) specifies default values for its parameters, then in the generated program the implementation of the corresponding function does not show those default values (showing them would violate C++ syntax). For the sake of readability, however, we would like a small improvement: in the implementation, the default values of the parameters are enclosed in comment delimiters ("/*" and "*/") rather than hidden completely.
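The C++ constraint behind this can be shown with a small sketch. The function scale and its parameters are hypothetical and unrelated to ANTLR; the point is only that a default value given in the declaration may not be repeated in the definition, so a comment is the way to keep it visible there.

```cpp
// Declaration (what would go in a header): the default value is stated here.
double scale(double value, double factor = 2.0);

// Definition: repeating "= 2.0" here would be a compile error, so the
// default is kept only as a comment for the reader.
double scale(double value, double factor /*= 2.0*/)
{
    return value * factor;
}
```

A call such as scale(3.0) still uses the default from the declaration; the comment in the definition is purely documentation.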
Code modification
Find the
String oldarg = rblk.argAction;
String newarg = "";
String comma = "";
int eqpos = oldarg.indexOf('=');
if (eqpos != -1)
{
    int cmpos = 0;
    while (cmpos != -1)
    {
        newarg = newarg + comma + oldarg.substring(0, eqpos).trim();
        comma = ",";
        cmpos = oldarg.indexOf(',', eqpos);
        if (cmpos != -1)
        {
            // cut off part we just handled
            oldarg = oldarg.substring(cmpos + 1).trim();
            eqpos = oldarg.indexOf('=');
        }
    }
}
else
    newarg = oldarg;
println(newarg);
Modify it as follows:
String oldarg = rblk.argAction;
String newarg = "";
String comma = "";
int eqpos = oldarg.indexOf('=');
if (eqpos != -1)
{
    int cmpos = 0;
    while (cmpos != -1)
    {
        newarg = newarg + comma + oldarg.substring(0, eqpos).trim() + "/*";
        comma = ",";
        cmpos = oldarg.indexOf(',', eqpos);
        if (cmpos != -1)
        {
            // get the default value of the argument
            newarg = newarg + oldarg.substring(eqpos, cmpos) + "*/";
            // cut off part we just handled
            oldarg = oldarg.substring(cmpos + 1).trim();
            eqpos = oldarg.indexOf('=');
        } else {
            newarg = newarg + oldarg.substring(eqpos).trim() + "*/";
        }
    }
}
else
    newarg = oldarg;
println(newarg);
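The effect of the modified Java code can be illustrated with a small standalone C++ sketch of the same idea. It is not a port of the ANTLR source: the functions trimBlanks and commentOutDefaults are hypothetical, and the sketch simply splits the argument list on commas and wraps each "= value" part in comment delimiters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips leading and trailing blanks and tabs.
std::string trimBlanks(const std::string& s)
{
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: rewrites "T a = 1, U b = 2" as "T a/*= 1*/,U b/*= 2*/".
// Like the Java code above, it splits naively on commas, so a default value
// that itself contains a comma is not handled.
std::string commentOutDefaults(const std::string& args)
{
    std::string result;
    bool first = true;
    std::size_t start = 0;
    while (start <= args.size()) {
        std::size_t comma = args.find(',', start);
        if (comma == std::string::npos) comma = args.size();
        const std::string piece = args.substr(start, comma - start);
        const std::size_t eq = piece.find('=');
        if (!first) result += ",";
        first = false;
        if (eq == std::string::npos)
            result += trimBlanks(piece);    // argument without a default value
        else
            result += trimBlanks(piece.substr(0, eq)) + "/*" + trimBlanks(piece.substr(eq)) + "*/";
        start = comma + 1;
    }
    return result;
}
```

With the argument list of the expr rule, "double value = 0", this sketch yields "double value/*= 0*/", which is the form the improvement aims for in the generated implementation.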
Recompiling ANTLR
The steps to recompile ANTLR are:
Open a command-line environment, change the current path to
javac *.java
Then change the current path to
java -cp
Finally, regenerate the JAR file by entering the following command:
java -cp
At this point, ANTLR has been updated.
Conclusion
We hope this article has given you a better understanding of ANTLR. Using ANTLR can greatly reduce the burden of writing analyzers. Combined with efficient C++, the analyzers generated by ANTLR are comparable in efficiency to hand-written ones. I believe that ANTLR has a bright future.
Build environment
Windows 2000 SP4, VC6.0, ANTLR 2.7.4, J2SDK 1.4.1
References
ANTLR 2.7.4 documentation
www.antlr.org