Using ANTLR to Generate Parsers in C++

xiaoxiao2021-03-06  68

Summary

ANTLR (ANother Tool for Language Recognition) is a parser generator based on LL(k) grammars. By default, the parsers it generates are written in Java rather than in the more efficient C++. This article describes how to use ANTLR to generate C++ parsers, with VC6.0 used to organize the project, and gives an example. Finally, it makes a small improvement to ANTLR itself.

Keywords

ANTLR, parser, parser generator

Introduction to ANTLR

Automatic generation of parsers has long been a research direction in compiler theory. Early programmers wrote parsers by hand, which was not only time-consuming but also produced parsers that were unstable and hard to modify and port. With the rise of automation, more and more programmers have abandoned this manual approach.

ANTLR (formerly PCCTS, the Purdue Compiler Construction Tool Set) is a parser generator: it accepts a grammar-based description of a language and produces a program that recognizes that language. Semantic actions can be inserted into the grammar description to tell ANTLR how to build an abstract syntax tree (AST) and how to generate output.

ANTLR is becoming increasingly popular (some commentators call its appearance a milestone), not only because it is powerful, easy to extend, and open source, but also because the code it generates closely resembles code written by hand with the recursive-descent method (the main method for hand-written parsers) and is therefore easy to read. By contrast, the output of another well-known parser generator, YACC (Yet Another Compiler-Compiler, which uses LR parsing), is much harder to read.

At present there are few articles about ANTLR in China, and those that exist only describe using ANTLR to generate parsers in Java. In fact, ANTLR can also generate C++ source code (starting with version 2.7.3 it also supports C#, and Python support is planned), but some preparation is required. The specific steps are described in detail in this article.

The latest version of ANTLR can be downloaded from ANTLR's official website (http://www.antlr.org). As of June 2004, the latest version was 2.7.4. The download is a tar.gz archive of about 1.3 MB; extract it to a directory (the paths below, which begin with /antlr-2.7.4, are relative to that directory).

ANTLR is written in Java and requires a JDK. This article assumes that a JDK is already installed on your machine and that CLASSPATH is set correctly.

Grammar files

A grammar is the set of rules by which a language is recognized, and it is the basis on which ANTLR generates a program. The grammar file is the core of working with ANTLR: it is the interface between the programmer and ANTLR.

Writing a grammar file is essentially the process of describing the solution to the problem. The programmer only needs to concentrate on the logic of the problem rather than get caught up in the implementation details of a particular programming language, which reduces the chance of errors.

Overview

This article introduces the syntax of grammar files only briefly; details can be found in the ANTLR documentation.

A grammar file generally contains a header block, an options block, a parser with its rule definitions, and a lexer with its token definitions. The most important parts are the rule and token definitions.

Rule definitions closely resemble the Extended Backus-Naur Form (EBNF) used in compiler theory. A rule consists of a rule name, a rule body, a semicolon that marks the end of the rule, and an exception-handling section (which may be omitted). For example, the following rule describes the syntax of the assignment statement in C:

    assignment_stat:
        ID '=' expr ';'
        ;

It means that an assignment statement consists of an ID, an equals sign, an expression, and a semicolon, in that order.
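To show why such a rule maps naturally onto recursive-descent code, here is a minimal hand-written sketch of a function for the assignment rule. It is not ANTLR output: the token kinds, the MiniParser class, and the simplification of expr to a single NUM or ID are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds used only by this sketch.
enum TokenKind { ID, ASSIGN, NUM, SEMI, END };

// A tiny hand-written parser over an already tokenized input.
class MiniParser {
public:
    explicit MiniParser(const std::vector<TokenKind>& toks) : toks_(toks), pos_(0) {}

    // assignment_stat : ID '=' expr ';' ;
    // Each element of the rule body becomes one call, in the same order.
    bool assignmentStat() {
        return match(ID) && match(ASSIGN) && expr() && match(SEMI);
    }

private:
    // Assumption: expr is reduced to a single NUM or ID to keep the sketch short.
    bool expr() { return match(NUM) || match(ID); }

    // Look at the next token without consuming it.
    TokenKind peek() const { return pos_ < toks_.size() ? toks_[pos_] : END; }

    // Consume the next token only if it has the expected kind.
    bool match(TokenKind k) {
        if (peek() != k) return false;
        ++pos_;
        return true;
    }

    std::vector<TokenKind> toks_;
    std::size_t pos_;
};
```

A token sequence such as ID, ASSIGN, NUM, SEMI is accepted, while the same sequence without the trailing SEMI is rejected.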

Tokens are defined in much the same way as rules. For example, the following token definition describes a decimal integer:

    NUM:
        ('1'..'9') ('0'..'9')*
        ;

It means that the first character of a number (NUM) is a character from '1' to '9', followed by zero or more characters from '0' to '9'.

Note that rule names must begin with a lowercase letter, and token names must begin with an uppercase letter.
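The NUM definition above can be read as a small matching procedure. The following is a hand-written sketch of that procedure, not code produced by ANTLR; the function name isDecimalInteger is an assumption chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if s matches NUM : ('1'..'9') ('0'..'9')* ;
bool isDecimalInteger(const std::string& s) {
    // The first character must be in '1'..'9'.
    if (s.empty() || s[0] < '1' || s[0] > '9') return false;
    // Every remaining character must be in '0'..'9'.
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (s[i] < '0' || s[i] > '9') return false;
    }
    return true;
}
```

With this definition, "42" and "907" are accepted, while "0", "007", and "12a" are rejected, because the first character may not be '0' and every later character must be a digit.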

Setting the language ANTLR generates

ANTLR has many options that can be set in the options block of the grammar file, including the language of the code that ANTLR generates. To generate a parser in C++, set it as follows:

    options
    {
        language = "Cpp";
        // other options
    }

The default value of the language option is "Java". To generate a program in C#, set language to "CSharp".

A C++ example

An example of a C++ parser generated by ANTLR is given below. The parser analyzes an arithmetic expression entered by the user and gives the final value of the expression. In addition to addition, subtraction, multiplication, and division, the expression may contain the power operator "^" and the three trigonometric functions sin, cos, and tan.

Before starting, we must first build the library that is needed to compile and link the program generated by ANTLR.

Building the static library

Building the C++ program generated by ANTLR requires a runtime library. The source code of the runtime library is also fully open and is located in the /antlr-2.7.4/lib/cpp directory. It can be compiled either as a static library or as a dynamic-link library. For ANTLR 2.7.4, compiling the dynamic-link library requires VC7.0 or later, so here we compile it as a static library.

First, use VC6.0 to create a new Win32 Static Library project named antlrlib; do not select the "Pre-Compiled header" or "MFC support" options.

Click "Project" -> "Add To Project" -> "Files...", and add all files under /antlr-2.7.4/lib/cpp/src except dll.cpp (do not add dll.cpp, otherwise the project will not compile).

So that VC6.0 can find the required header files, /antlr-2.7.4/lib/cpp must be added to the header search path. To do this, click "Project" -> "Settings...", select the "C/C++" tab in the dialog that appears, select "Preprocessor" in the "Category" drop-down list, and in "Additional include directories" enter the following, as shown in the figure:

    /antlr-2.7.4/lib/cpp

At this point, building the whole project produces the ANTLR runtime library antlrlib.lib (some documents say that the RTTI option must be enabled in the project settings, but this does not seem to be necessary).

Writing the grammar file

Based on the requirements above, it is not difficult to write the following grammar file:

    header {
    #include <math.h>    // pow, sin, cos, tan
    #include <stdlib.h>  // atof
    }

    options
    {
        language = "Cpp";
    }

    class ExprParser extends Parser;

    {
    }

    // rules

    expr returns [double value = 0]
    {double x;}
        :
        value = term
        (
            PLUS x = term {value += x;}
            |
            MINUS x = term {value -= x;}
        )*
        ;
        exception
        catch [ANTLR_USE_NAMESPACE(antlr)ANTLRException& ex] {
            // catch all exceptions and report them
            reportError(ex.toString());
        }

    term returns [double value = 0]
    {double x;}
        :
        value = factor
        (
            STAR x = factor {value *= x;}
            |
            SLASH x = factor {value /= x;}
        )*
        ;

    factor returns [double value = 0]
    {double x;}
        :
        value = atom
        (
            TOK_POW x = atom {value = pow(value, x);}
        )*
        ;

    atom returns [double value = 0]
    {double x;}
        :
        i:NUM
        {
            value = atof((i->getText()).c_str());
        }
        |
        TOK_SIN x = atom {value = sin(x);}
        |
        TOK_COS x = atom {value = cos(x);}
        |
        TOK_TAN x = atom {value = tan(x);}
        |
        LPAREN value = expr RPAREN
        ;
        exception
        catch [ANTLR_USE_NAMESPACE(antlr)ANTLRException& ex] {
            reportError(ex.toString());
        }

    class ExprLexer extends Lexer;

    options {
        k = 1;
        caseSensitive = false;
    }

    // tokens

    LPAREN: '(';
    RPAREN: ')';
    PLUS: '+';
    MINUS: '-';
    STAR: '*';
    SLASH: '/';
    NUM: ('0'..'9') ('0'..'9')* ('.' ('0'..'9')+)?;
    RETURN: '\n';

    // math tokens

    TOK_SIN: "sin";
    TOK_COS: "cos";
    TOK_TAN: "tan";
    TOK_POW: '^';

    // white space

    WS:
        (
            ' '
            |
            '\t'
        )
        {$setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP);}
        ;

Save the file as test.g ('.g' is the default grammar file extension).

This grammar file defines a parser class, ExprParser, and a lexer class, ExprLexer. ANTLR will generate a header file and an implementation file for each of the two classes.
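To make concrete what the rules expr, term, factor, and atom compute, the following is a hand-written recursive-descent sketch with one function per rule. It is an illustration only, not the code ANTLR generates: the MiniCalc class and its helper names are assumptions, it works directly on characters instead of tokens, it is case-sensitive, and its number matching is slightly looser than the NUM token.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical hand-written evaluator mirroring the grammar rules.
class MiniCalc {
public:
    explicit MiniCalc(const std::string& text) : s_(text), pos_(0) {}

    // expr : term ( PLUS term | MINUS term )* ;
    double expr() {
        double value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

private:
    // term : factor ( STAR factor | SLASH factor )* ;
    double term() {
        double value = factor();
        for (;;) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor : atom ( TOK_POW atom )* ;  left-associative, as in the grammar
    double factor() {
        double value = atom();
        while (accept('^')) value = std::pow(value, atom());
        return value;
    }

    // atom : NUM | TOK_SIN atom | TOK_COS atom | TOK_TAN atom | LPAREN expr RPAREN ;
    double atom() {
        if (acceptWord("sin")) return std::sin(atom());
        if (acceptWord("cos")) return std::cos(atom());
        if (acceptWord("tan")) return std::tan(atom());
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        // Digits with an optional fractional part.
        std::size_t start = pos_;
        while (isDigitAt(pos_)) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        if (pos_ < s_.size() && s_[pos_] == '.') {
            ++pos_;
            while (isDigitAt(pos_)) ++pos_;
        }
        return std::atof(s_.substr(start, pos_ - start).c_str());
    }

    bool isDigitAt(std::size_t i) const {
        return i < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i])) != 0;
    }

    // Skip blanks and tabs, like the WS token.
    void skipSpace() {
        while (pos_ < s_.size() && (s_[pos_] == ' ' || s_[pos_] == '\t')) ++pos_;
    }

    // Consume character c if it is next in the input.
    bool accept(char c) {
        skipSpace();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // Consume the keyword w if it is next in the input.
    bool acceptWord(const std::string& w) {
        skipSpace();
        if (s_.compare(pos_, w.size(), w) == 0) { pos_ += w.size(); return true; }
        return false;
    }

    std::string s_;
    std::size_t pos_;
};
```

For example, MiniCalc("1+2*3").expr() evaluates the multiplication first because term is called from inside expr, which is the same way the grammar encodes operator precedence. Because factor loops from left to right, 2^3^2 is evaluated as (2^3)^2.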

Organizing the project with VC6.0

The next steps are to generate the parser source code from the grammar file, add some other code, and finally compile everything, which is a slightly cumbersome process. We therefore use VC6.0 as the development environment to organize the project and simplify these steps.

Creating the project

Use VC6.0 to create a Win32 Console Application project named antlrcpp, choosing "An empty project". Click "Project" -> "Add To Project" -> "Files..." and add the grammar file test.g to the project.

Right-click the grammar file in the FileView of VC and select "Settings..." in the pop-up menu. In the dialog that appears, select the "Custom Build" tab, as shown in the figure:

In "Commands", fill in the command that invokes ANTLR to compile the grammar file:

    java -cp /antlr-2.7.4/antlr.jar antlr.Tool -o "$(WkspDir)" $(InputName).g

In "Outputs", list the names of all the files that ANTLR generates when it compiles the grammar file:

    ExprLexer.cpp
    ExprLexer.hpp
    ExprParser.cpp
    ExprParser.hpp
    ExprParserTokenTypes.hpp
    ExprParserTokenTypes.txt

Generating the parser source code

Once the settings are complete, the grammar file can be compiled: select the grammar file and press Ctrl+F7 (or click the Compile button on the toolbar). The following is displayed in the Output window of VC:

    --------------------Configuration: antlrcpp - Win32 Debug--------------------
    Performing Custom Build Step on .\testjava.g
    ANTLR Parser Generator   Version 2.7.4   1989-2004 jGuru.com
    ExprLexer.cpp - 0 error(s), 0 warning(s)

The source code of the parser has now been generated. Click "Project" -> "Add To Project" -> "Files..." and add all the generated .cpp and .hpp files to the project.

Specifying the input

The source code generated by ANTLR is only the core of the parser; the programmer still has to specify where the parser's input comes from. To do this, create a new file, main.cpp, that supplies the input.

If the parser should analyze a string typed at the keyboard, the code is as follows:

    #include "ExprParser.hpp"
    #include "ExprLexer.hpp"
    #include <iostream>

    using namespace std;

    int main()
    {
        ExprLexer lexer(cin);       // the lexer reads characters from standard input
        ExprParser parser(lexer);   // the parser pulls tokens from the lexer
        double x = 0;
        x = parser.expr();          // expr is the start rule; it returns the value
        cout << "The result is: " << x << endl;
        return 0;
    }

If the parser should analyze a string stored in a file, the corresponding code is as follows:

    #include "ExprParser.hpp"
    #include "ExprLexer.hpp"
    #include <iostream>
    #include <fstream>

    using namespace std;

    int main()
    {
        ifstream from("test.in");   // the lexer reads characters from this file
        ExprLexer lexer(from);
        ExprParser parser(lexer);
        double x = 0;
        x = parser.expr();
        cout << "The result is: " << x << endl;
        return 0;
    }

Compiling to get the final result

Before building the project, VC6.0 must be told where to find the required header files. The method for adding a header search path was described above and is not repeated here; the setting is as shown in the figure.

If you build the project now, you will see many link errors, because the compiled object files have not been linked with the runtime library. To specify the library, select the "Link" tab in the dialog above and add "antlrlib.lib" to "Object/library modules:".

The whole project can now be built. The resulting executable runs as shown in the figure:

A small improvement to ANTLR

The ANTLR download includes all of its source code, and modification is permitted. We can therefore adapt ANTLR to our own particular needs.

Statement of the problem

If a rule (which becomes a function once ANTLR has compiled the grammar) specifies a default value for a parameter, then in the generated program the implementation of the corresponding function omits the default value of that parameter (otherwise it would violate C++ syntax). For the sake of readability, however, we would like to make a small improvement: in the implementation, the default value of the parameter is enclosed in comment delimiters ("/*" and "*/") instead of being hidden completely.
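The C++ restriction behind this can be seen in a small stand-alone sketch. The class DemoParser and its member scale are hypothetical names used only here; they are not part of ANTLR or of the generated code.

```cpp
#include <cassert>

// Hypothetical class standing in for a generated parser.
class DemoParser {
public:
    // The default value is written once, in the declaration.
    double scale(double value, double factor = 2.0);
};

// Repeating "= 2.0" in the definition would be a compile error,
// so the default survives only as a comment for the reader.
double DemoParser::scale(double value, double factor /*= 2.0*/) {
    return value * factor;
}
```

Calling scale(3.0) on a DemoParser object uses the default from the declaration, and the commented default in the definition serves purely as documentation.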

Code modification

Find the file /antlr-2.7.4/antlr/CppCodeGenerator.java and modify lines 3502 to 3526. The original code is:

    String oldarg = rblk.argAction;
    String newarg = "";
    String comma = "";
    int eqpos = oldarg.indexOf('=');
    if (eqpos != -1)
    {
        int cmpos = 0;
        while (cmpos != -1)
        {
            newarg = newarg + comma + oldarg.substring(0, eqpos).trim();
            comma = ",";
            cmpos = oldarg.indexOf(',', eqpos);
            if (cmpos != -1)
            {
                // cut off part we just handled
                oldarg = oldarg.substring(cmpos + 1).trim();
                eqpos = oldarg.indexOf('=');
            }
        }
    }
    else
        newarg = oldarg;
    println(newarg);

Modify it as follows:

    String oldarg = rblk.argAction;
    String newarg = "";
    String comma = "";
    int eqpos = oldarg.indexOf('=');
    if (eqpos != -1)
    {
        int cmpos = 0;
        while (cmpos != -1)
        {
            newarg = newarg + comma + oldarg.substring(0, eqpos).trim() + "/*";
            comma = ",";
            cmpos = oldarg.indexOf(',', eqpos);
            if (cmpos != -1)
            {
                // get the default value of the argument
                newarg = newarg + oldarg.substring(eqpos, cmpos) + "*/";
                // cut off part we just handled
                oldarg = oldarg.substring(cmpos + 1).trim();
                eqpos = oldarg.indexOf('=');
            } else {
                newarg = newarg + oldarg.substring(eqpos).trim() + "*/";
            }
        }
    }
    else
        newarg = oldarg;
    println(newarg);
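To see the effect of the modified logic without rebuilding ANTLR, the same string transformation can be tried in isolation. The following is a C++ sketch of that transformation, not part of ANTLR: the function names trim and commentOutDefaults are assumptions, and one extra guard (marked in a comment) is added for the case where a later parameter has no default value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Remove leading and trailing blanks and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: wrap each "= default" of a parameter list in /* */.
std::string commentOutDefaults(std::string oldarg) {
    std::string newarg;
    std::string comma;
    std::size_t eqpos = oldarg.find('=');
    if (eqpos == std::string::npos) return oldarg;  // no defaults at all

    for (;;) {
        // Text before '=' is the parameter itself; open the comment after it.
        newarg += comma + trim(oldarg.substr(0, eqpos)) + "/*";
        comma = ",";
        std::size_t cmpos = oldarg.find(',', eqpos);
        if (cmpos == std::string::npos) {
            // Last parameter: the rest of the string is its default value.
            newarg += trim(oldarg.substr(eqpos)) + "*/";
            break;
        }
        // Copy the default value up to the comma, then close the comment.
        newarg += oldarg.substr(eqpos, cmpos - eqpos) + "*/";
        // Cut off the part just handled.
        oldarg = trim(oldarg.substr(cmpos + 1));
        eqpos = oldarg.find('=');
        if (eqpos == std::string::npos) {
            // Guard added in this sketch only: remaining text has no default.
            newarg += comma + oldarg;
            break;
        }
    }
    return newarg;
}
```

Given the argument list "double x = 0", this sketch produces "double x/*= 0*/", which is the form the modified generator is meant to print in the function implementation.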

Recompiling ANTLR

The steps to recompile ANTLR are as follows.

Open a command prompt, change the current directory to /antlr-2.7.4/antlr/build, and enter the following command:

    javac *.java

Then change the current directory to /antlr-2.7.4 and enter the following command to compile the sources:

    java -cp /antlr-2.7.4/antlr.jar antlr.build.Tool build

Finally, regenerate the JAR file by entering the following command:

    java -cp /antlr-2.7.4/antlr.jar antlr.build.Tool jar

At this point, ANTLR has been updated.

Conclusion

After reading this article, you should have a better understanding of ANTLR. Using ANTLR can greatly reduce the burden of writing a parser. Combined with efficient C++, a parser generated by ANTLR is close in efficiency to a hand-written one. I believe ANTLR has a bright future.

Build environment

Windows 2000 SP4, VC6.0, ANTLR 2.7.4, J2SDK 1.4.1

References

ANTLR 2.7.4 documentation

www.antlr.org

