From Lex & Yacc to Compilers (2. Flex)

zhaozj2021-02-08  259

Author: tangl_99

QQ: 8664220

MSN: TANGL_99@hotmail.com

Email: tangl_99@sohu.com

Having covered regular expressions in the first article, let's put them to use: we will use Flex, a lexical analysis tool, to build our lexical analyzer.

There are plenty of Lex tutorials around, so I will give only a brief introduction here and then concentrate on how to use Lex and Yacc and on the techniques involved. If you still do not understand how to use Lex or Yacc after reading this, please look online, where there are many more tutorials. One widely read tutorial that I know of is "Yacc and Lex Quick Start", by Ashish Bansal.

Flex stands for Fast Lex, and Lex stands for Lexical Analyzer. Flex can be found in Cygwin or GNUPro. It is a Unix tool that is commonly distributed alongside the GNU tools, although it is not formally part of the GNU project. Versions that run on Windows can also be found online.

We usually describe the tokens to be scanned with regular expressions and put them into a Lex input file. Then we run the command flex xxx.l (where xxx.l is the input file). Once Lex has processed it, we get a C source file called lex.yy.c. This C source file is our lexical scanner. The C source that Lex generates for the scanner is usually complicated and huge, and we generally do not read it (rest assured, flex does not get it wrong).

Let's look at a few Lex input files that I have used.

This one is part of a Lex input file that I wrote some time ago for the scripting engine of an RPG game on the GBA.

Example 2.1

%{
/* need this for the call to atof() below */
/* The names of the three system headers were lost from this listing;
   <stdio.h>, <stdlib.h> and <string.h> are assumed here. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "globals.h"
%}

digit       [0-9]
number      ("-"|"+")?{digit}+
hexnumber   "0x"({digit}|[a-fA-F])+
letter      [a-zA-Z]
identifier  ({letter}|_)({number}|{letter}|_)*
newline     [\n]
whitespace  [ \t]+
string      \"[^"]*\"
comment     "#"[^#]*"#"

%%

{string}        { return VM_STRING; }
"Logo"          { return VMIN_LOGO; }
"FaceIn"        { return VMIN_FACEIN; }
"FaceOut"       { return VMIN_FACEOUT; }
"LoadTile"      { return VMIN_LOAD_TILE; }
"CreateRole"    { return VMIN_CREATE_ROLE; }
"ReleaseRole"   { return VMIN_RELEASE_ROLE; }
"CreateMap"     { return VMIN_CREATE_MAP; }
"ReleaseMap"    { return VMIN_RELEASE_MAP; }
"ShowBitmap"    { return VMIN_SHOWBITMAP; }
"CreateDialog"  { return VMIN_CREATE_DIALOG; }
"ReleaseDialog" { return VMIN_RELEASE_DIALOG; }
"Fight"         { return VMIN_FIGHT; }
"Delay"         { return VMIN_DELAY; }
"PressA"        { return VMIN_PRESS_A; }
"PressB"        { return VMIN_PRESS_B; }
"PressR"        { return VMIN_PRESS_R; }
"PressL"        { return VMIN_PRESS_L; }
"PressStart"    { return VMIN_PRESS_START; }
"PressSelect"   { return VMIN_PRESS_SELECT; }
{number}        { return VM_NUMBER; }
{whitespace}    { /* skip whitespace */ }
{identifier}    { return VM_ID; }
{newline}       ;
.               ;

%%

int yywrap()
{
    return 1;
}

The file has three parts, separated by %%. Whatever is placed between %{ and %} in the first part is copied verbatim to the top of the C code that Lex outputs. We can use it to define macros and functions, include headers, and so on. There is nothing special in this Lex input file: only the usual header includes of an ordinary C source file.

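%{
/* need this for the call to atof() below */
/* The names of the three system headers were lost from this listing;
   <stdio.h>, <stdlib.h> and <string.h> are assumed here. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "globals.h"
%}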

Besides the section enclosed in %{ and %}, the rest of the first part consists of regular expression definitions.

This is where the regular expressions from the first article come in handy.

Let's take a look at the regular expressions I defined here:

digit       [0-9]
number      ("-"|"+")?{digit}+
hexnumber   "0x"({digit}|[a-fA-F])+
letter      [a-zA-Z]
identifier  ({letter}|_)({number}|{letter}|_)*
newline     [\n]
whitespace  [ \t]+
string      \"[^"]*\"
comment     "#"[^#]*"#"

digit needs no explanation: it is the Arabic digits 0-9, an example already given in the first article. number is one or more repetitions of digit, optionally preceded by a "+" or "-" sign.

Note:

"a": the literal character a, even if a is a metacharacter
\a: the literal character a when a is a metacharacter
a?: an optional a, that is, either a or nothing
a|b: a or b
(a): a itself
[abc]: any one of the characters a, b or c
[a-d]: any one of a, b, c or d
[^ab]: any character except a or b
.: any character except a newline
{xxx}: the regular expression named xxx
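To make the notation concrete, here is a minimal hand-written sketch in C of what the pattern ("-"|"+")?{digit}+ accepts. The function name match_number is hypothetical and is not part of Lex; flex generates a table-driven automaton rather than code like this.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest prefix of s that
   matches ("-"|"+")?{digit}+, or 0 if there is no match. */
size_t match_number(const char *s)
{
    size_t i = 0;

    /* ("-"|"+")? : an optional sign */
    if (s[i] == '-' || s[i] == '+')
        i++;

    /* {digit}+ : one or more digits */
    size_t first_digit = i;
    while (isdigit((unsigned char)s[i]))
        i++;

    /* A sign with no digit after it is not a number. */
    return (i > first_digit) ? i : 0;
}
```

For example, the input -123abc yields a match of length 4, while a lone - yields no match because {digit}+ requires at least one digit.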

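One definition deserves a special explanation:

newline     [\n]

newline is the line break. Here I wrap \n in square brackets and write [\n]. Strictly speaking the brackets are optional: flex gives \n its usual C meaning (the newline character) even outside a character class, so \n and [\n] match the same thing, and the \a rule above yields a literal character only when a is not one of the C escape letters. Sometimes newline is also written as [\n]|[\r\n]. The reason is that a file opened in text mode generally presents a line break as a single \n (0xA), whereas in binary mode a line break is sometimes \r\n (0xD, 0xA), two characters in total. Note that [\r\n] is a character class matching a single \r or \n; to match the pair as one unit, write \r\n outside brackets.

The second part defines the actions to perform when a regular expression is matched.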

These actions are C code, and they end up inside the yylex() function of the C file that Lex outputs.

The actions in the example above are very typical: each one simply returns a value.

When we use the C code that Lex generates from outside, the only function we need is int yylex(). Calling yylex() scans the input for the next match of a regular expression and then runs the corresponding action. If the action returns a value, yylex() returns that value. By default yylex() returns 0 to indicate that the end of the file has been reached, so do not return 0 from your own actions, to avoid conflicts. An action may also finish without returning, in which case yylex() automatically goes on scanning for the next match, until the end of the file.

Whenever a string is matched, the global variable yytext holds that string.
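The calling convention just described can be sketched in plain C. This is not flex output: toy_yylex, toy_yytext, toy_input, the TOK_ constants and count_tokens are hypothetical stand-ins written by hand, so that the example compiles without lex.yy.c. It assumes only the behavior stated above: a nonzero return value per token, 0 at the end of the input, and the matched text left in a global buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { TOK_NUMBER = 1, TOK_ID = 2 };   /* never 0: 0 means end of input */

static const char *toy_input = "";     /* stand-in for the input behind yyin */
static char toy_yytext[64];            /* stand-in for yytext */

/* Hand-written stand-in for yylex(): returns a token code, or 0 at the end. */
static int toy_yylex(void)
{
    for (;;) {
        unsigned char c = (unsigned char)*toy_input;
        size_t n = 0;

        if (c == '\0')
            return 0;                  /* end of input */

        if (isdigit(c)) {
            while (isdigit((unsigned char)toy_input[n]))
                n++;
        } else if (isalpha(c) || c == '_') {
            while (isalnum((unsigned char)toy_input[n]) || toy_input[n] == '_')
                n++;
        } else {
            toy_input++;               /* action that does not return: keep scanning */
            continue;
        }

        /* Copy the matched text, truncating if it does not fit. */
        size_t keep = n < sizeof toy_yytext - 1 ? n : sizeof toy_yytext - 1;
        memcpy(toy_yytext, toy_input, keep);
        toy_yytext[keep] = '\0';
        toy_input += n;
        return isdigit(c) ? TOK_NUMBER : TOK_ID;
    }
}

/* Typical driver loop: call the scanner until it reports the end with 0. */
int count_tokens(const char *src, int *numbers, int *ids)
{
    int tok, total = 0;

    *numbers = *ids = 0;
    toy_input = src;
    while ((tok = toy_yylex()) != 0) {
        if (tok == TOK_NUMBER)
            (*numbers)++;
        else
            (*ids)++;
        total++;
    }
    return total;
}
```

With a real flex scanner the loop has the same shape: replace toy_yylex() with yylex() and read the matched text from yytext.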

Pay attention to the order of these regular expressions.

If a string can be matched by several regular expressions at once, flex first prefers the rule that matches the longest text; when the matches have the same length, the rule defined earlier wins. That is why I usually put the string rule at the very front.
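As a rough illustration of this selection rule, the following C sketch decides between two rules: the keyword "Logo" and an identifier. The helper names match_logo, match_identifier and pick_rule are hypothetical, and the identifier matcher is simplified to letters, digits and underscores. It is a sketch of the rule under those assumptions, not of how flex implements it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rule 0: the keyword "Logo". Returns the match length, or 0. */
static size_t match_logo(const char *s)
{
    return strncmp(s, "Logo", 4) == 0 ? 4 : 0;
}

/* Rule 1: a simplified identifier, (letter|_)(letter|digit|_)*. */
static size_t match_identifier(const char *s)
{
    size_t n = 0;

    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Returns the index of the winning rule, or -1 if no rule matches.
   The longest match wins; on a tie, the rule listed first wins. */
int pick_rule(const char *s)
{
    size_t len[2] = { match_logo(s), match_identifier(s) };
    int best = -1;
    size_t best_len = 0;

    for (int r = 0; r < 2; r++) {
        if (len[r] > best_len) {       /* strict '>' keeps the earlier rule on ties */
            best = r;
            best_len = len[r];
        }
    }
    return best;
}
```

For the input Logo both rules match four characters, so the keyword rule wins because it comes first; for Logos the identifier rule matches five characters and wins on length.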

If characters in the file are not matched by any rule in the Lex input file, they are automatically echoed to the output. So remember to add the {newline} and . rules after all the other regular expressions, so that such characters are swallowed.

OK, let's look at the variables that Lex makes available in the C file it outputs.

Lex variables

yyin
FILE * type. Points to the file that the lexer is currently parsing.

yyout
FILE * type. Points to where the lexer's output is written. By default, yyin and yyout point to standard input and standard output.

yytext
The text of the matched pattern is stored in this variable (char *).

yyleng
Gives the length of the matched pattern.

yylineno
Provides the current line number. (Not supported by every lexer.)

Example 2.2

This is the Lex input file from the source code of the book "Compiler Construction: Principles and Practice". You can use it as a reference. The author's compiler is for TINY, a small language that he defined himself.

/****************************************************/
/* File: tiny.l                                     */
/* Lex specification for TINY                       */
/* Compiler Construction: Principles and Practice   */
/* Kenneth C. Louden                                */
/****************************************************/

%{
#include "globals.h"
#include "util.h"
#include "scan.h"
/* lexeme of identifier or reserved word */
char tokenString[MAXTOKENLEN+1];
%}

digit       [0-9]
number      {digit}+
letter      [a-zA-Z]
identifier  {letter}+
newline     \n
whitespace  [ \t]+

%%

"if"            {return IF;}
"then"          {return THEN;}
"else"          {return ELSE;}
"end"           {return END;}
"repeat"        {return REPEAT;}
"until"         {return UNTIL;}
"read"          {return READ;}
"write"         {return WRITE;}
":="            {return ASSIGN;}
"="             {return EQ;}
"<"             {return LT;}
"+"             {return PLUS;}
"-"             {return MINUS;}
"*"             {return TIMES;}
"/"             {return OVER;}
"("             {return LPAREN;}
")"             {return RPAREN;}
";"             {return SEMI;}
{number}        {return NUM;}
{identifier}    {return ID;}
{newline}       {lineno++;}
{whitespace}    {/* skip whitespace */}
"{"             { int c;  /* int rather than char, so that EOF compares correctly */
                  do
                  { c = input();
                    if (c == EOF) break;
                    if (c == '\n') lineno++;
                  } while (c != '}');
                }
.               {return ERROR;}

%%

TokenType getToken(void)
{ static int firstTime = TRUE;
  TokenType currentToken;
  if (firstTime)
  { firstTime = FALSE;
    lineno++;
    yyin = source;
    yyout = listing;
  }
  currentToken = yylex();
  strncpy(tokenString, yytext, MAXTOKENLEN);
  if (TraceScan) {
    fprintf(listing, "\t%d: ", lineno);
    printToken(currentToken, tokenString);
  }
  return currentToken;
}

What is a little different here is that the author uses a separate getToken function in place of yylex as the function exposed to the outside. getToken still calls the default Lex function yylex() internally, and does some extra work as well. However, I suggest that you do not write your own output function as the author did, because later, when we work with Yacc, the parser that Yacc generates recognizes only yylex() as the function that supplies the lexical results.

if (firstTime)
{ firstTime = FALSE;
  lineno++;
  yyin = source;
  yyout = listing;
}

Here yyin, yyout, source and listing are all of type FILE *. yyin is the file that the Lex-generated scanner reads, and yyout is the default output file (in practice we usually do not need yyout: even when we want to produce some output, we write it with fprintf).

"{"             { int c;  /* int rather than char, so that EOF compares correctly */
                  do
                  { c = input();
                    if (c == EOF) break;
                    if (c == '\n') lineno++;
                  } while (c != '}');
                }

Comments in the author's TINY language are enclosed in {}. The author does not write a regular expression for comments. Instead, once "{" has been matched, the action uses the Lex internal function input() to read characters one at a time until it finds }, which skips the comment text. (A regular expression for C-style /* */ comments is very hard to write, so we often use this approach and code the DFA, the scanning automaton, by hand.)
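To show what such a hand-written DFA for /* */ comments might look like, here is a minimal standalone sketch in C. The function name strip_c_comments and the state names are hypothetical. The sketch works on a string in memory instead of calling input(), and it deliberately ignores string literals, so it is an illustration of the automaton rather than a complete C preprocessor step.

```c
#include <stddef.h>

/* States of the hand-written comment-skipping automaton. */
enum state { CODE, SLASH, COMMENT, STAR };

/* Copies `in` to `out` with C-style comments removed. `out` must have room
   for strlen(in) + 1 bytes. Returns the number of bytes written, not
   counting the terminating NUL. An unterminated comment runs to the end. */
size_t strip_c_comments(const char *in, char *out)
{
    enum state st = CODE;
    size_t n = 0;

    for (; *in != '\0'; in++) {
        char c = *in;
        switch (st) {
        case CODE:                     /* ordinary text */
            if (c == '/')
                st = SLASH;
            else
                out[n++] = c;
            break;
        case SLASH:                    /* seen '/', not yet emitted */
            if (c == '*') {
                st = COMMENT;
            } else if (c == '/') {
                out[n++] = '/';        /* emit the earlier '/', stay in SLASH */
            } else {
                out[n++] = '/';
                out[n++] = c;
                st = CODE;
            }
            break;
        case COMMENT:                  /* inside the comment */
            if (c == '*')
                st = STAR;
            break;
        case STAR:                     /* inside the comment, just seen '*' */
            if (c == '/')
                st = CODE;
            else if (c != '*')
                st = COMMENT;
            break;
        }
    }
    if (st == SLASH)
        out[n++] = '/';                /* a trailing '/' is ordinary text */
    out[n] = '\0';
    return n;
}
```

The STAR state is the part that is awkward to express as a regular expression: after a * the automaton must still accept further * characters before the closing /.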

This article has explained the Flex input file through two fairly practical examples. Once again, if you are new to Lex, please first read the article I recommended above; it can be found online at IBM. The next article, on Yacc, deals with BNF grammars. Please consult other standard tutorials first.

2003-9-30

Chengdu, Sichuan University

Please credit the original address when reposting: https://www.9cbs.com/read-3199.html
