Yacc and Lex Quick Start Lex and Yacc
Ashish Bansal Software Engineer, Sapient Company November 2000
Content: Lexlex General Expression LEX Programming C Global Declaration Lex Mode Rules C Code in LEX Combining All Advanced LexyAcc Use YACC Writing YACC Syntax Rules Additional C Code Other Command Row Options To combine LEX and YACC About the author
Lex and Yacc are two very important and powerful tools of UNIX. In fact, if you master Lex and Yacc, their powerful features make the compiler of Fortran and C, like children. Ashish Bansal discusses you in detail the two tools used to write your own language and compilers, including regular expressions, declarations, matching modes, variables, yacc syntax, and parser code. Finally, he explained how to combine LEX and Yacc.
Lex represents Lexical Analyzar. YACC represents YET ANOTHER Compiler Compiler. Let us start from Lex.
Lexlex is a tool for generating a scanner. The scanner is a program for identifying the vocabulary mode in the text. These vocabulary modes (or conventional expressions) are defined in a special sentence structure, which will be discussed later.
A matching conventional expression may contain related actions. This action may also include returning a tag. When Lex receives input of a file or text, it tries to match text with regular expressions. It reads an input character once until a matching mode is found. If you can find a matching mode, Lex performs the relevant action (possibly, it may include returning a tag). On the other hand, if there is no conventional expression that can be matched, further processing will be stopped, and the Lex will display an error message.
LEX and C are strong coupled. One .lex file (LEX file has a .lex extension) passed through the LEX utility and generates C's output file. These files are compiled into executable versions of the lexical analyzer.
Lex's conventional expressions routine expression is a mode description using a meta-language. Expression consists of symbols. Symbols are typically characters and numbers, but there are other tags with special meaning in Lex. The following two tables define some of the tags used in Lex and give several typical examples.
Use lex to define conventional expressions
Character Meaning A-Z, 0-9, A-Z constitutes a partial pattern character and a number. Match any of the characters, except / n. - Used to specify the range. For example: A-Z refers to all characters from A to Z. [] A character set. Match any character in brackets. If the first character is ^ then it represents a negative mode. For example: [ABC] matches any one of A, B, and C. * Match 0 or more of the above modes. Match 1 or more of the above modes. • Match 0 or 1 above mode. $ As the last character of the mode matches the end of the line. {} Indicates the number of possibilities that may occur. For example: A {1, 3} means A may appear once or 3 times. / Used to escape the metamorphic characters. Also used to overwrite the specific meaning defined in this table, only the original meaning of characters. ^ Negation. | Logic between expressions or. "
General expressions
General expression mean Joke [RS] matches Jokes or Joker. A {1,2} SHIS matches AASHIS, ASHIS, AASHI, ASHI. (A [b-e]) matches 0 or 1 in all characters from b to e after A appearance position.
The tag in Lex declares that the variable name in C is declared. Each tag has a related expression. (Examples of markers and expressions are given in the table below.) Using the example in this table, we can edit a quote statistics. Our first task is to explain how to declare tags. Mark declaration example
Tag Related Expression Meaning Digital (Number) ([0-9]) 1 or Multiple Digital Characters (Chars) [A-ZA-Z] Arbitrary Character Space ("One Number (Word) 1 or more Chars variables (Variable) (characters) * (character) * (number) *
Lex programming LEX programming can be divided into three steps:
The format specified by LEX can be understood. Run Lex on this file to generate the C code of the scanner. Compile and link C code to generate executable scanners.
Note: If the scanner is part of the parser developed with YACC, only the first step and the second step is required. For help for this special problem, please read YACC and combine LEX and YACC.
Let us look at the program format that Lex can understand. One LEX program is divided into three segments: the first paragraph is a global statement of C and LEX, and the second paragraph includes mode (C code), and the third paragraph is a supplemental C function. For example, there are also main () functions in the third paragraph. These paragraphs are bound by %%. So, return to the LEX program of the word count, let's take a look at the composition of the different sections.
C and LEX global statements we can increase C variable declarations. Here we will declare a whole variable for the word count statistics to save the number of words statistics. We will also make LEX tag declarations.
Declaration of the word count statistics
% {
INT WordCount = 0;
%}
Chars [A-ZA-Z / 抃 '/./ "]
Numbers ([0-9])
Delim ["" / n / t]
Whitespace {DELIM}
Words {chars}
%%
Two percentage markers indicate the end of this paragraph in the LEX program and the beginning of the second paragraph in the third paragraph.
LEX's pattern matching rules let's take a look at the rules of the tag we have to match. (We will use C to define the action after the tag match.) Continue to see our word count statistics, the following is the rules that match the match.
LEX rules in the word count statistics program
{Words} {WordCount ; / *
INCREASE the word count by one * /}
{Whitespace} {/ * do
Nothing * /}
{NumBers} {/ * One May
Want to add some processing here * /}
%%
The third paragraph of C code LEX programming, that is, a function declaration that covers C (sometimes the main function). Note that this paragraph must include a yywrap () function. Lex has a set of functions and variables available. One of them is YYWRAP. In general, the definition of yyWrap () is as follows. We will explore this issue in advanced Lex.
Word Count Statistics C Code Segment
void main ()
{
Yylex (); / * Start the
Analysis * /
Printf ("NO OF Words:
% D / N ", WordCount;
}
Int yywrap ()
{
Return 1;
}
In the previous section we discussed the basic elements of LEX programming, which will help you write simple lexical analysis programs. In the Senior Lex, we will discuss the functions provided by Lex so you can write more complex programs.
Combine them all .lex files are the scanner of Lex. It represented in the lex program: $ lex
This generates the lex.yy.c file, which can be compiled with the C compiler. It can also use a parser to generate an executable, or contain the LEX library by option 杔 l in the link step.
Here is some LEX signs:
-c represents a C action, it is default. -t writes the lex.yy.c program instead of the standard output. -v provides a two-line statistical summary. -n does not print -V summary.
Advanced Lexlex has several functions and variables that provide different information that can be used to compile programs that implement complex functions. Some variables and functions are listed in the table below, as well as their use. For a detailed list, please refer to the Lex or Flex manual (see the resources of the text).
Lex variable
YYINFILE * Type. It points to the current file that Lexer is parsing. YYOoutFile * type. It points to the location of the recorded LEXER output. By default, yyin and yyout points to standard input and output. The text of the YYTEXT matching mode is stored in this variable (char *). YYLENG gives the length of the matching mode. Yylineno provides current number of line information. (Lexer is not supported.)
LEX function
Yylex () This function begins to analyze. It is automatically generated by Lex. YYWRAP () This function is called at the end of the file (or input). If the return value of the function is 1, the resolution is stopped. So it can be used to parse multiple files. The code can be written in the third paragraph, which can parse multiple files. The method is to point to different files using the YYIN file pointer (see on the table) until all files are parsed. Finally, yyWrap () can return 1 to represent the end of the resolution. YYLESS (INT N) This function can be used to send all read tags except for worry? YYMORE () This function tells LEXER to attach the next tag to the current tag.
The discussion of Lex is here. Let's discuss Yacc ...
YACCYACC represents YET Another Compiler Compiler. Yacc's GNU is called Bison. It is a tool that translates all syntax of any programming language into a YACC language parser for such languages. It is written in a Barcean paradigm (BNF, Backus Naur Form). According to the convention, YACC files have .y suffix. The compiling line calls the Yacc compiler:
$ Yacc
Before further explaining, let's review what is grammar. In the previous section, we see that Lex identifies tags from the input sequence. If you are looking at the marker sequence, you may want to perform an action when you appear in this sequence. The specification of the effective sequence in this case is called syntax. YACC grammar files include this language specification. It also contains things you want to do when the sequence matches.
In order to more clear this concept, let us take English as an example. This set of tags may be: nouns, verbs, adjectives, and more. In order to use these markers, your structure must meet certain rules. A simple sentence may be a noun verb or noun verb noun. (Such as i Care. See Spot Run.)
So in us, tag itself from the language (lex), and the tag sequence allows these tags (tag sequences).
Terminal and non-terminal symbol terminal symbols: represents a class of equivalent markers on grammatical structures. Terminal symbols have three types:
Name tag: These are defined by the% Token identifier. In accordance with the practice, they are all capitalized.
Character tag: The writing of character constants is the same as C. For example, it is a character marker. String tag: Writing is the same as the string constant of C. For example, "<<" is a string tag.
Lexer returns a naming tag. Non-terminal symbol: is a group of non-terminal symbols and terminal symbols. According to conventions, they are both lowercase. In the example, File is a non-terminal tag and the NAME is a terminal tag.
Create a compiler with YACC to include four steps:
Generate a parser by running Yacc on a grammatical file. Description Syntax:
Write a .y syntax file (simultaneously explaining C here to do this). Write a lexical analyzer to process the input and pass the tag to the parser. This can be done using lex. Write a function and start parsing by calling yyparse (). Write an error handling routine (such as YYERROR ()). Compile YACC generated code and other related source files. Link the target file to the appropriate executable parser library.
Writing syntax with YACC is like Lex, a YACC program is divided into three segments in double hundred. They are: declaration, grammar rules and c code. We will parse a format as an example of a file as an example as an example to explain the symptom rules. We assume that there are multiple names and ages, which are separated by spaces. When seeing each of the YACC program, we will write a grammar file for our example.
C and YACC Declaration C declaration may define the types and variables used in the action, and macros. You can also include the header file. Each YACC declaration segment declares that the name of the terminal symbol and the non-terminal symbol (tag) may also describe operator priority and data types for different symbols. Lexer (lex) is usually returned to these tags. All of these tags must be described in the YACC declaration.
In an example of file parsing, we are interested in these tags: name, equal sign, and agn. Name is a value consisting entirely of characters. AGE is a number. So the declaration will be like this:
Document analysis example
%
#typedef char * string; / *
To Specify token Types as char * * /
#define yystype string / *
A Yacc Variable Which Has The Value of Returned Token * /
%}
% Token Name EQ Age
%%
You may feel that yystype is a bit strange. But similar to LEX, YACC also has a set of variables and functions for users to expand. YYSTYPE defines the type used to copy the value from LeXer to the parser or Yylval (another YACC variable) of YACC. The default type is INT. Since the string can be copied from the Lexer, the type is redefined as char *. For a detailed discussion of YACC variables, please refer to YACC Manual (see Resources).
YACC syntax rules YACC syntax rules have the following general format:
Result: Components {/ *
Action to be taken in c * /}
;
In this example, Result is the non-terminal symbol described in the rule. Components are different terminals and non-terminal symbols that are placed together in accordance with rules. If you match the specific sequence, you can follow the action you want to perform. Consider the following example:
Param: name eq name {
Printf ("/ TNAME:% S / TValue (Name):% S / N", $ 1, $ 3);
| Name Eq Value {
Printf ("/ TNAME:% S / TValue (Value):% S / N", $ 1, $ 3);
;
If the sequence Name EQ Name in the previous example matches, the action in the corresponding {} parentheses will be performed. Another useful here is the use of $ 1 and $ 3, which references values that mark Name and Name (or Value). Lexer returns these values via YACC variables Yylval. The LEX code for marking NAME is like this: char [a-za-z]
Name {char}
%%
{Name} {yylval = strdup (YYTEXT);
Return name;}
The rule segment of the file parsed example is like this:
File analysis grammar
File: Record file
| Record
;
Record: Name Eq Age {
Printf ("% S is now% s Years Old !!!", $ 1, $ 3);
;
%%
Additional C code Now let's take a look at the last paragraph of the grammar file, add C code. (This section is optional, if someone wants to slightly, "a function is like main () call the yyparse () function (YYLEX () equivalent function in Yacc). In general, YACC is best to provide code for the YYERROR (CHAR MSG) function. Call YYERROR (Char MSG) when the parser encounters an error. Error message is passed as a parameter. A simple Yyerror (char *) may be like this:
Int Yyerror (char * msg)
{
Printf ("Error:% S
Encountered at line number:% d / n ", MSG, YYLINENO);
}
Yylineno offers row information.
This section also includes the main function of the file analysis example:
Additional C code
void main ()
{
Yyparse ();
}
Int Yyerror (char * msg)
{
Printf ("Error:% S
Encountered / N ", MSG);
To generate a code, you may use the following command:
$ Yacc 杁
This generates output file Y. Tab.h and y.tab.c, which can be compiled (such as GCC) with any standard C compiler on UNIX.
Other common options for the command line
'-d', '- defines': Write additional output files that contain these macros: The tag type name, semantic value type, which is defined in the syntax, and some external variable declarations. If the parser output file is called 'name.c', then '-d' file is called 'Name.h'. If you want to put Yylex define in a separate source file, you need 'name.h', because Yylex must be able to reference the tag type code and Yylval variables. '-B file-prefix', '- file-prefix = prefix': Specifies the prefix that all YACC output file names can be used. Select a name, just enter the file name called 'prefix.c'. '-O outfile', '- output-file = outfile': Specifies the output file name of the parser file. Other output files are named after the output file described in '-d' option.
The YACC library is usually automatically included in the compilation step. But it can also be explicitly included in order to specify the 杔 Y option in the compilation step. In this case, the compilation command line is:
$ cc
Combining LEX and YACC to now we have discussed LEX and YACC. Now let's take a look at how they use it.
A program typically calls the yylex () function when returning a tag each time. It is only terminated only if the file ends or the error tag.
A parser generated by YACC calls the Yylex () function to get the tag. Yylex () can be generated by lex or completely written by yourself. For Lexer generated by Lex, it is necessary to use YACC, and a tag must be returned whenever matching a mode in LEX. Therefore, the action in the match mode in the LEX is:
{Pattern} {/ * do smthg * /
Return token_name;}
So YACC will get the returned mark. When Yacc recompores a .y file with a 杁 tag, a header is generated, it has the definition of #define for each tag. If LEX and Yacc are used together, the header must include in the C declaration segment in the corresponding LEX file .lex.
Let's go back to the name and age of the file parsed example, take a look at the code of the Lex and Yacc files.
Name.y - grammar file
%
Typedef char * string;
#define yyntepe string
%}
% Token Name EQ Age
%%
File: Record file
| Record
;
Record: Name Eq Age {
Printf ("% s iS% s years old !!!! / n", $ 1, $ 3);
;
%%
int main ()
{
Yyparse ();
Return 0;
}
Int Yyerror (char * msg)
{
Printf ("Error
Encountered:% S / N ", MSG);
}
Name.lex - LEX parser file
% {
#include "y.tab.h"
#include
#include
EXTERN CHAR * YYLVAL;
%}
Char [A-ZA-Z]
Num [0-9]
EQ [=]
Name {char}
AGE {Num}
%%
{Name} {yylval = strdup (YYTEXT);
Return name;}
{EQ} {RETURN EQ;
{age} {yylval = strdup (yytext);
Return Age;}
%%
Int yywrap ()
{
Return 1;
}
As a reference, we list Y. Tab.h, Yacc generated header files.
Y. Tab.h - Yacc generated header file
# Define name 257
# Define EQ 258
# Define ag 259
We have discussed LEX and YACC. What languages do you want to compile today?
Resource
Lex and Yacc, Levine, Mason and Branson, O 扲 Eilly and its partner company, 2nd Ed. Program Development in Unix, J. T. SHEN, Prentice-Hall India. Compilers: Principles, Techniques and Tools, Ahoo, Sethi and Ullman, Addison-Wesley Pub. Co., 1985, 11. Bison Manual from GNU. Flex Manual from GNU. INFO ON Bison, (GNU YACC). MKS Lex and YACC version comment. Lex and yacc guidance. Lex and Yacc and Compiler Writing Guide. Java version of Lex guidance is called Jlex. Using Lex and Yacc's Formalizing a Grammar instance. Author Introduction Ashish Bansal Bachelor of Electronics and Communication Engineering of the University of Varase Hindu University of Varase, India. He is currently a SAPInt company software engineer. His email is Abansal@sapient.com.