Say the compiler from Lex & Yacc (6. Mathematical Expression)

zhaozj2021-02-08  237

Say the compiler from Lex & Yacc (6. Mathematical Expression)

Foreword

The most important algorithm in the grammar analysis is LL-top down and LR bottom-up algorithm. The previous article mainly explained the theory of the LL algorithm and a LL algorithm's grammar analyzer JAVACC. This article is the most in the LL (1) algorithm A simple form recursive decline algorithm analyzes mathematical expression problems in the routine algorithm problem. At the same time, this paper also introduces the analyzer code general method for hand-constructed EBNF literacy. I hope this paper can realize your own grammar analyzer. Help.

Mathematical expression problem

When learning algorithms, the expression of four mixed operations is a very classic algorithm.

For example, there is a mathematical expression "122 2 * (11-1) / (3- (2-0)). We need to describe the description according to this string and then calculate its results.

INPUT:

122 2 * (11-1) / (3- (2-0))

OUTPUT:

142

The four mixed calculations also need to take into account the priority of parentheses, multiplier, and decline, usually the usual solution is to use the stack. That conventional algorithm and the LL algorithm have the same integrity, more or say, then the algorithm is actually it's the same.

Traditional mathematical expression processing algorithm

This traditional algorithm is actually unknowingly using the essence of the LL (1) algorithm. It is mainly relying on the stack-type data structure to save numbers and symbols, and then perform mathematical calculations according to the priority of the operation symbol, and save the results in the stack. .

Two stacks are used in the traditional algorithm. One is a value of the value, temporarily calling the value stack. The other is saving the symbol, called the symbol stack. We specify a marker # to indicate the bottom. Let's take a look at how to calculate how to calculate how to calculate how to calculate how to calculate A simple expression 11 2-8 * (5-3).

In order to display the entire calculation process, we are expressed in the chart of the stack below.

The change in the symbol stack and the value stack is based on the input string. Basic stack operations can be simply used in the following sentences .Start:

1. If you get the numbers in the current input string, press the value stack directly. Then go to Start.

2. If the symbol is obtained in the current input string, the symbol is determined. 1) If the symbol is ' ' or '-', then pops up the symbol of the symbol stack, calculates the value in the stack until the symbol of the pop-up is not * , /, , -. 2) If the symbol is '*' or '/', press the symbol stack 3) If the symbol is '(', then directly press "('into the symbol stack 4) if the symbol is') ', Follow the symbols in the order of the symbol stack, calculate the value of the stack, press the result into the value stack until the top of the symbol stack is' (', finally popping up '('. Finally turned to START)

3. If the current input string is EOF (string end symbol), the numeric value in the stack is calculated, and the symbol stack is not symbol.

Grammar analysis mathematical expression

Or maybe you have used your own way to solve this program problem, but we will work very wonderfully to solve this algorithm problem through a set of grammar analysis theories established by compilation principles.

The first is to establish a mathematical expression of the EBNF.EBNF grammar can more flexibly represent BNF, which is an extension of BNF paradigm. The following is the computational expression of the computational expression in the introduction of the previous JavaCC.

Expression -> Term {addop term} addop -> " " | "-" term -> factor {mulop factor} mulop -> "*" | "/" factor -> id | NUM | "

Let's take a look at how to achieve a recursive decline according to this EBNF grammar. It is a few steps to achieve it. (Note that the following steps are not only for mathematical expressions for this section, but included All usual recursive decline in literary analyzer implementation)

Grammar analysis implementation

1. STEP establishment lexical analysis

This series of articles, first quarters, the first quarter is the words of the lexical analysis. Because the lexical analysis is the premise of grammar analysis, then we should also take into account the realization of the lexical analysis when we realize the recursive decline algorithm.

This article is to handle the problem of just a mathematical expression, then through the above grammar, you can see the words that need to be identified is nothing more than 2 ID, NUM, and 4 arithmetric symbols ' ' '-' '*' '/' and 2. Brand '(' ') This article does not explain the automaton principle of the lexical analysis. This part should be relatively thorough in the compilation principle. The so-called automaton is the algorithm of each character in accordance with a certain step. Use the following plots to represent the identification of ID and NUM (Identification steps or algorithms) NUM:

Basic algorithm is that if the character is the character is DIGIT ('0' - '9'), then enter the Check loop. If you enter or Digit, then you jump back and looped, if you enter iv (not '0' - '9' ), Then directly accept, receive this string as NUM type token.

ID:

Like NUM, when the link is Letter, then enter the ID of the ID. Just after entering the Check loop, there are two possibilities that may continue in the loop, which is DIGIT and Letter ('A' - 'Z'). When the input is neither DIGIT, it is not letter, jumps out of the Check loop, enters Accept, and turn the received character into the ID type token.

Through this limiting, we can easily write the character analysis program.

However, before this, we have to write code to identify Letter and Digit. We build two functions isletter and isdigit to complete this feature.

Int isletter (char ch)

{

IF (ch> = 'a' && ch <= 'z')

Return 1;

IF (ch> = 'a' && ch <= 'z')

Return 1;

Return 0;

}

Int isdigit (char ch)

{

IF (CH> = '0' && ch <= '9')

Return 1;

Return 0;

}

There is two auxiliary functions, then next, we write the GetToken lexical analysis function, and its function is to analyze from the input, get one token. We first define the Tyken type.

#define id1

#define Num 2

#define Plus 3 // ' '

#define minus 4 // '-'

#define Timers 5 // '*'

#define over 6 // '/'

#define lparen 7 // '('

#define rparen 8 // ')'

#define error 255

The above comments have been said to the symbol token representative mean, I don't say more. But I need to pay attention to it, here we define a constant of Error, but there is no error to TOKEN here, it just processed the results A definition of an error handling information.

Char token [10];

Char * nextchar;

Const char g_strcalculate [] = "122 2 * (11-1) / (3- (2-0))";

We need to define the Token marker and a pointer to the input string. TOKEN record is the TEXT (string) of the current getToken () .nextchar is the pointer to the input string. Finally, we will define a to analyze Mathematical expression of the input string g_strcalculate.

Int gettoken ()

{

Char * ptoken = token;

While (* nextchar == '|| * nextchar ==' / n '|| * nextchar ==' / t ')

NEXTCHAR ;

Switch (* nextchar)

{

Case ' ': Nextchar ; Return Plus;

Case '-': Next Char ; Return Minus; Case '*': Nextchar ; Return Timers;

Case '/': nextchar ; return over;

Case '(': nextchar ; return lparen;

Case ')': Nextchar ; Return Rparen;

DEFAULT: BREAK;

}

// id word method identification analysis

IF (isletter (* nextchar))

{

While (isletter (* nextchar) || isdigit (* nextchar))

{

* ptoken = * nextchar;

NEXTCHAR ;

Ptoken ;

}

* ptoken = '/ 0';

Printf ("Gettoken: token =% S / N", token);

Return ID;

}

// Num word method identification analysis

IF (isdigit (* nextchar))

{

While (isDigit (* nextchar))

{

* ptoken = * nextchar;

NEXTCHAR ;

Ptoken ;

}

* ptoken = '/ 0';

Printf ("Gettoken: token =% S / N", token);

Return Num;

}

Return Error;

}

The code is very simple, I don't write much any comments. In the function, I first use the char * ptoken record token's first address, which is used for the backed string replication (constructed token). At the same time, the first part of the processing code is filtered out Space, tab, and newline. Then the word method analysis of the calculation symbol. The calculation symbol is a fixed word symbol, so it is very simple, directly use Switch to judge * nextchar. And the back ID, NUM identification is Fully according to the previous finite auto motivation to write. In the ID chart, the ID of the ID is first to identify the first character is Letter, then I wrote the first line of IF (isletter). Nextchar), if you meet, enter the Check loop, which is while (isletter (* nextchar)) loop. In the loop, we record * Nextchar to token symbol. Finally, jump out of the checkt cycle, enter accept In the code RETURN ID. For the words of NUM, I don't say much.

2. Establish a grammap identification function according to EBNF grammar

First see the first non-ending

Expression -> Term {Addop Term}

Expression is also our total input result function. We first define the function int Expression (), which is the value of the expression we have to handle. In the yet, the first is TERM, we directly call the TERM function to complete Then, 0 to 0 to unlimited ADDOP TERM, then use a loop. Use the non-end symbol Addop. Program code I don't have especially for this non-end symbol. We are directly in the code ' ' || '-' instead of addop. The code is as follows.

int Expression ()

{

INT TEMP = TERM (); // The first TERM in the correspondence

Int tokentype;

While (* nextchar == ' ' || * nextchar == '-') // {Addop Term} in the corresponding text method

{

Tokentype = gettoken ();

Switch (tokentype)

{

Case Plus: Temp = Term ();

Break;

Case minus:

Temp - = term ();

Break;

DEFAULT:

Break;

}

}

Return Temp;

}

Then there is a second text method. Similarly, I don't have specially write a grammatical function for Mulop, but instead of replace it directly with '*' || '/'.

Term -> Factor {Mulop Factor} is the same as the following functions

Int term ()

{

Int temp;

Int tokentype;

Temp = factor (); // correspondence in the correspondence

While (* nextchar == '*' || * nextchar == '//') // {Mulop Factor} in the corresponding text method

{

Tokentype = gettoken ();

Switch (tokentype)

{

Case Timers:

Temp * = factor ();

Break;

Case over:

TEMP / = Factor ();

Break;

DEFAULT:

Break;

}

}

Return Temp;

}

Finally, Factor Grammar Factor -> ID | NUM | "(" Expression ")"

This grammar involves the choice of granting in the grammar. We can know by LL (1) grammar. We can know that this can be used through id, Num, "(" Expression ")" three generated first end. Different to judge. ID's first word symbol is definitely letter. And Num first word symbols are definitely Digit. "" "The first word symbol is definitely" (". And ID, NUM Judging that we have been done in the INT Gettoken () function. List the code of the Factor literary function.

Int factor ()

{

Int number;

Switch (GetToken ())

{

Case ID: BREAK;

Case Num:

Number = ATOI (TOKEN);

Break;

Case Lparen:

Number = expression ();

GetToken ()! = rparen)

Printf ("Lost ')' In The Expression / N");

Break;

DEFAULT:

Break;

}

Return Number;

}

Ok, write the functions appearing above to the program file, plus a main function, you can compile it.

Int main (int Argc, char * argv [])

{

Nextchar = g_strcalculate;

Printf ("Result =% D / N", Expression ());

System ("pause");

Return 0;

}

The source of the entire mathematical expression can be downloaded here.

Http://member.netese.com/~qinj/tangl_99/my doc / calculate / main.c

3. Summary

From the three EBNF literacy, we can easily draw some mechanical laws.

1. A separate grammap function can be written as a non-ending in EBNF literacy, such as expression (), term (), Factor ().

2. The selection of the generated style in the grammar can be identified according to the first word symbol, which is the first set of the first token. For example, the Factor above is the type of token directly through GetToken, then Depending on the different TOKEN to TOKEN, Switch handles different generated code.

3. {} (0 to unlimited loop) in the grammar, such as {addop term}, {mulop factor} can be done directly through the loop statement. However, the cycle condition still needs to be judged that the next token is not addop or mulop's first collection Elements. For example, addop is ' ' and '-', mulop is nothing more than '*' and '/', so it is easy to judge, directly passing * nextchar can also be judged. If the next token is not AddOp or Mulop's first set element, then you should jump out of the loop. Return to the text method. Although the EBNF literacy structure recursive declined analyzer code is so simple, it is as mentioned in << Compilation Principles and Practices >> Book, it has it Particularity. Many times, just convert BNF literacy into EBNF literacy itself is a very difficult thing. This requires the LL (1) text method mentioned earlier to eliminate left instructions and extracts the left-style problem.

Comparison with traditional algorithms

When we understand that EBNF grammar and recursive decline, the structured mathematical expression processing code is simpler, easy. The reason is also mentioned before, the first thing EBNF literacy is very easy to write, secondly, from eBnf The creation of literacy to code is also very mechanized, and it is also very easy. Finally, the recursive decline algorithm does not contact the operation of the stack, using the programming language itself recursive itself, allowing the program language to deal with the operation of the stack (not to recurrent decline) The algorithm did not contact the stack at all, and it can be said that the recursive process in the data structure is equal to the stack processing).

The regenerative algorithm is also a computational operation of the easy expansion of the expression. For example, we want to add functions (such as sin, cos), and add the highest level of all operations. So we modify the Factor literacy method.

Factor -> ID | NUM | "(" Expression ")"

| "SIN" "" "" "

| "COS" ")" Expression ")"

As for the implementation of the code, we only need to add a few CASE statements in the Switch in the int factory function.

2003-12-06

Author: Tang Liang tangl_99

QQ: 8664220

MSN: TANGL_99@hotmail.com

Email: tangl_99@sohu.com

Chengdu, Sichuan University, Computer College

转载请注明原文地址:https://www.9cbs.com/read-3179.html

New Post(0)