Analyze mathematical expressions using recursive decline algorithm

xiaoxiao2021-03-06  41

Foreword's teaching materials in China have attached importance to theoretical learning and lack of practice. I originally wanted to introduce a classic algorithm problem - mathematical expression problem, to illustrate the practice of a grammar analysis algorithm in the compilation principle. There is a topic called grammar analysis in the compilation principle of our study. The grammar analysis is to analyze the formal language in a fixed grammar format. Both our compilation principles must contain two types of grammar analysis algorithms, one is the LL algorithm, and the other is the LR algorithm. The LL algorithm is also called the top-down analysis algorithm, relatively simple, suitable for manual code, and the LR algorithm is the analysis algorithm from the bottom up, which is complex, usually we all generate its related code by using the tool Yacc. This paper analyzes the mathematical expression problems in the routine algorithm problem in the simplest form recursive decreased decrease in the LL (1) algorithm. At the same time, this paper also introduces the analyzer code common method for hand-constructed EBNF literacy. It is hoped that the practice of this article can help you realize your own grammar analyzer.

Mathematical expression problem

When learning algorithms, the expression of four mixed operations is a very classic algorithm. For example, there is a mathematical expression "122 2 * (11-1) / (3- (2-0)). We need to calculate the result according to the description of this string. INPUT: 122 2 * (11-1) / (3- (2-0)) Output: 142

The four mixed operations also need to take into account the priority computing of parentheses, multiplier, and decline, usually the usual solution is to use the stack. This routine algorithm and the LL algorithm have the same integrity, and the two algorithms are actually the same.

Traditional mathematical expression processing algorithm This traditional algorithm is actually unknowingly using the essence of the LL (1) algorithm. It is mainly relying on the stack-type data structure separately saves numbers and symbols, and then mathematically calculates according to the priority of the operation symbol, and saves the results in the stack. Two stacks were used in the traditional algorithm. One is a saving value, temporarily calling the stack. The other is to save the symbol, called the symbol stack. We specify a marker # to indicate the bottom of the stack. Let's take a look at how to calculate a simple expression: 11 2-8 * (5-3)

The change in the symbol stack and the value stack is based on the input string. The basic stack operation can be simply used in the following sentences. START: 1. If you get a number in the current input string, press the stack directly, then go to START. 2. If the symbol is obtained in the current input string, the symbol is determined: 1) If the symbol is ' ' or '-', then pops up the symbol of the symbol stack, calculates the value of the stack until the symbol of the pop-up is not *, / , , -; 2) If the symbol is '*' or '/', press the symbol stack; 3) If the symbol is '(', directly press '(') ('into the symbol stack; 4) If the symbol is') ', Follow the symbols in the order of the symbol stack, calculate the value in the stack, press the result into the value stack until the top of the symbol stack is' (', finally popping up '('. Final Turning to START. 3. If the current input string If you get EOF (string end symbol), the numeric value in the stack is calculated, knowing the symbol stack is not symbol. Syntax Analysis Mathematical Expression

Or maybe you have used your own way to solve this program problem, but we will work very wonderfully to solve this algorithm problem through a set of grammar analysis theories established by compilation principles. The first is the establishment of a mathematical expression of the grammar EBNF. EBNF grammar can more flexibly represent BNF, which is an extension of BNF paradigm. Below is a grammar of the expression.

Expression -> Term {addop term} addop -> " " | "-" term -> factor {mulop factor} mulop -> "*" | "/" factor -> id | NUM | "

Let's take a look at how to implement a recursive decline in analysis based on this EBNF literature. In general, it is more than a few steps. (Note: The following steps are not only for mathematical expressions for this section, but includes all common recursive decline in grafting analyzers.

Grammar analysis implementation

1. Step Establishing Less Law

Since the lexical analysis is the premise of grammar analysis, then we should also take into account the realization of the lexical analysis when implementing the recursive decline algorithm. This article is to handle the problem of just a mathematical expression, then through the above grammar, you can see the words that need to be identified is nothing more than 2 ID, NUM, and 4 arithmetric symbols' ',' - ',' * ',' / 'And 2 brackets' (', ')'. This article does not explain the automatinal principles of the lexical analysis, and this part of this should be relatively thorough in the principle of compilation. The so-called automaton is the algorithm for identifying each character in a certain step. The basic algorithm of identification of ID and NUM can be used in several drawings, and if the input character is DIGIT ('0' - '9'), then enter the CHECK loop, if input or DIGIT, then jump back to the loop, if the input is Other (not '0' - '9'), then directly accept, receive this string as NUM type token. Like NUM, when it is entered, then the finite automation of the ID is entered. Only after entering the Check cycle, there are two possibilities to continue to stay in the loop, which is DIGIT and Letter ('A' - 'Z'). When the input is neither DIGIT, it is not letter, jumps out of the Check loop, enters Accept, and turn the received character into the ID type token. Through this limiting, we can easily write the character analysis program. However, before this, we have to write code to identify Letter and Digit. We build two functions isletter and isdigit to complete this feature. INTISLETTER (CHAR CH) {IF (CH> = 'a' && ch <= 'z') Return 1; if (CH> = 'a' && ch <= 'z') Return 1; Return 0;} intisdigit CHAR CH) {IF (CH> = '0' && ch <= '9') Return 1; Return 0;}

There is a two auxiliary function, then next, we write the GetToken lexical analysis function, and its function is to analyze from the input, get one one token. We first define the type of token. #define ID 1 # Define Num 2 # Define Plus 3 // ' ' # Define Minus 4 // '-' # Define Timers 5 // '*' # define over 6 // '/' # Define LParen 7 // '(' #define rparen 8 // ')' # Define Error 255 The above note has been said to the meaning of the symbol token, I don't say more. However, it should be noted that we define a constant of Error, but we don't have an error to TOKEN, which is just a definition of an error handling information for our later processes.

Chartoken [10]; char * nextchar; constcharg_strcalculate [] = "122 2 * (11-1) / (3- (2-0))";

We need to define the token marker and a pointer to the input string. Token record is the TOKEN's Text (string) obtained by getToken (). Nextchar is a pointer that is currently referring to the input string. Finally, we will casually define the input string g_strcalculate to analyze the mathematical expressions.

IntgetToken () {char * ptoken = token; while (* nextchar == '|| * nextchar ==' / n '|| * nextchar ==' / t ') Nextchar ; switch (* nextchar) {casse' ': nextchar ; return PLUS; case' - ': nextchar ; return MINUS; case' * ': nextchar ; return TIMERS; case' / ': nextchar ; return OVER; case' ( ': nextchar ; return LPAREN; case') ': nextchar ; default: Break;} // id word method IF (isletter (* nextchar)) {while (isletter (* nextchar) || isdigit (* nextchar)) {* ptoken = * nextchar; Nextchar ; ptoken ;} * ptoken = '/ 0'; Printf ("gettoken: token =% s / n", token; returnid;} // Num word method IF (isdigit (* nextchar)) {while (Isdigit (* nextchar)) {* ptoken = * nextchar; Nextchar ; ptoken ;} * Pt Oken = '/ 0'; Printf ("gettoken: token =% s / n", token); return num;} return error;}

The code is very simple, I didn't write much notes. In the function, the first address of the char * ptoken record token is first used, which is used for the back string replication (constructed token). At the same time, the first part of the processing code is filtered out of space, tab, and wrap, then the lexical analysis of the calculation symbol. The calculation symbol is a fixed word symbol, so its identification is very simple, directly use Switch to judge * nextchar. The latter ID, NUM identification is written in full, according to the previous finite automation representation. In the ID chart, the ID of the ID is to identify the first character is letter, then I wrote the first line of IF (isletter (* nextchar)), if satisfied, enter the Check loop, that is While (isletter (* nextchar) || isdigit (* nextchar)) loops. We record * nextchar to token symbols in the loop. Finally, jump out of the Check loop, enter accept, and return in the code. This is also true for NUM's lexical identification, I don't have much to say. 2. According to EBNF literacy, establish a grammatical identification function first look at the first non-ending generation expression expression -> Term {addop term} Expression is also our total input result function. We first define the function int expression (), and its return value is the value of the expression we want to process. In the right generation, the first is TERM, we directly call the TERM function to complete. Then 0 to unlimited ADDOP TERM, then use a loop. Non-end symbols are used in the literary law. I don't have a function of this non-end symbol in the program code. We replace Addop directly in the code in ' ' || '-'. code show as below. INTEXPRESSION () {INT TEMP = TERM (); // The first terminttokenType in the corresponding text method; while (* nextchar == ' ' || * nextchar == '-') // {Addop Term in the corresponding gram } {Tokentype = getToken (); switch (tokenty) {CASE PLUS: Temp = TERM (); Break; Case Minus: Temp - = Term (); Break; default: Break;}} return temp;} then Two grammar. Similarly, I don't especially write a textual method for Mulop, but instead of directing '*' || '\'.

Term -> Factor {Mulop Factor}

Similarly, the following functions INT TERM () {int Temp; int tokentype; temp = factor (); // corresponds to Factor While in the gram (* nextchar == '*' || * nextchar == '//') // {Mulop Factor} {tokentype = gettoken (); switch (tokentype) {Case: temp; default: case: temp / = factor (); }}} Return Temp;

Finally, Factor Grammar Factor-> ID | NUM | "(" Expression ")"

This grammar involves the choice of granting in the grammar. The theory of LL (1) grammar We can know that this can be determined by id, num, "" express ")") "three generated first endors. The first word symbol of the ID is definitely letter. And the first word symbol of Num is definitely DIGIT. "" Expression ")" The first word symbol is definitely "". " And the judgment of Num has been made in the int GetToken () function when the words are analyzed. The code for the Factor literary function is listed below.

INTFACTOR () {Int Number; Switch ()) {Case: Break; Case Num: Number = ATOI (Token); Break; Case Lparen: Number = Expression (); if (Gettoken ()! = rparen) Printf ("lost ')' in the expression / n"); Break; default: Break;} Return Number;

Ok, write the functions appearing above to the program file, plus a main function, you can compile it.

Int main (int Argc, char * argv []) {nextchar = g_strcalcalculate; Printf ("Result =% D / N", Expression ()); system ("pause"); return 0;}

The source of the entire mathematical expression can be downloaded here. Http: // member. Netease. COM / ¯ Qinj / tangl_99 / my doc / calculate / main. c

3. Summary We can easily draw some mechanical laws from above. 1. A separate grammap function can be written to the non-terminator in the EBNF literary law, such as expression (), TERM (), Factor (). 2. The selection of the generated formula can be identified according to the first word symbol, which is the first set of the first in the LL (1) algorithm. For example, the Factor above is the type of the next token directly through GetToken, and then processes the different generated code according to the different TOKEN of the type. 3. {} (0 to unlimited loop) in the text method, such as {addop term}, {mulop factor} can be done directly through the loop statement. However, the condition of the loop still needs to be judged that the next token is an element that is not the first set of AddOp or Mulop. For example, in the code above, addop is ' ' and '-', mulop is nothing more than '*' and '/', so it is easy to judge, directly through * nextchar, can also be judged. If the next TOKEN is not the first set element of Addop or Mulop, then the loop should be jumped and returned to the literacy function. Although the EBNF literacy structure recursive declined analyzer code is so simple, it is as mentioned in << Compilation Principles and Practices >> Book, it has its particularity. Many times, just convert BNF literacy into EBNF literacy itself is a very difficult thing, which requires us to use the LL (1) grammar to remove left instructions and extract the left. Comparison with traditional algorithms

When we understand that the EBNF literacy and recursive declines, the structured mathematical expression processing code is simpler than the traditional algorithm, and it is easy. The reason is also mentioned earlier. First, EBNF literacy of this thing is very easy to write. Secondly, from the creation of EBNF literacy to code, it is also very easy. Finally, the recursive decline algorithm does not contact the stack operation, using the program The recursive itself characteristics of the language itself makes the operation of the program language to deal with the stack (not to say that the algorithm of recursive decline is not exposed to the stack, and it can be said that the recursive processing in the data structure is equal to the stack processing). The algorithm of recursive decline can also be easily expanded. For example, we want to add functions (such as SiN, CoS), and add the highest level of all operations. Then we modify the Factor literacy.

Factor -> ID | NUM | "|" SIN ")" "" "" "") "Expression") "As for the implementation of the code, we only need to be in the int factor function. Add a few CASE statements in Switch.

转载请注明原文地址:https://www.9cbs.com/read-59372.html

New Post(0)