Readers who have studied compiler principles probably already know how to analyze a sentence. In this article, with reference to Chen Huang's "Advanced Program Design Language Compilation Principle", I take the compiler-principles point of view and describe how to implement a syntax analysis program through a typical example, the analysis of arithmetic expressions. The goal is to help you understand how to construct a practical syntax analysis program, and to give programmers a method for solving practical problems.
This article covers the following: 1. The grammar of arithmetic expressions; 2. Constructing the algorithm and production functions for top-down syntax analysis; 3. Improving the production functions; 4. Error handling in syntax analysis; 5. Implementing top-down syntax analysis.
1. The grammar of arithmetic expressions
The arithmetic expressions handled here involve five kinds of operations: addition, subtraction, multiplication, division, and parentheses. A simple grammar for such expressions, G1, contains the following productions:

G1: E -> E + E | E - E | E * E | E / E | (E) | i

To make operator precedence explicit (parentheses bind tighter than multiplication and division, which in turn bind tighter than addition and subtraction), G1 can be rewritten as grammar G2:

G2: E -> T + E | T - E | T
    T -> F * T | F / T | F
    F -> (E) | i

Any arithmetic expression built from addition, subtraction, multiplication, division, and parentheses, with the precedence described above, can be derived from this grammar. For example, the expression i+i*(i+i) has the following derivation (where i stands for a number or a variable identifier; a derivation must begin from the start symbol E, and the one below is a leftmost derivation):
E => T+E => F+E => i+E => i+T => i+F*T => i+i*T => i+i*F => i+i*(E) => i+i*(T+E) => i+i*(F+E) => i+i*(i+E) => i+i*(i+T) => i+i*(i+F) => i+i*(i+i)
In this article, we use the grammar G2 to construct a syntax analysis program.
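The leftmost derivation above can be replayed mechanically, which is a handy way to check it. The following is only a sketch: the names `step`, `steps`, and `derive` are made up for this illustration, and the list of productions is my own reading of the derivation. Each step rewrites the leftmost nonterminal in the current sentential form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One derivation step: rewrite the leftmost nonterminal `lhs` with `rhs`. */
struct step { char lhs; const char *rhs; };

/* Productions of G2 applied, in order, to derive i+i*(i+i). */
static const struct step steps[] = {
    {'E', "T+E"}, {'T', "F"}, {'F', "i"},   {'E', "T"},   {'T', "F*T"},
    {'F', "i"},   {'T', "F"}, {'F', "(E)"}, {'E', "T+E"}, {'T', "F"},
    {'F', "i"},   {'E', "T"}, {'T', "F"},   {'F', "i"},
};

/* Replays the steps starting from "E" and copies the final sentence to `out`.
   Returns 0 on success, -1 if a step does not match the leftmost
   nonterminal or a buffer is too small. */
int derive(char *out, size_t size)
{
    char form[128] = "E";                       /* current sentential form */

    assert(out != NULL && size > 0);
    for (size_t k = 0; k < sizeof steps / sizeof steps[0]; k++) {
        size_t pos  = strcspn(form, "ETF");     /* leftmost nonterminal */
        size_t rlen = strlen(steps[k].rhs);
        size_t flen = strlen(form);

        if (form[pos] != steps[k].lhs)          /* wrong production here */
            return -1;
        if (flen - 1 + rlen >= sizeof form)     /* would overflow `form` */
            return -1;
        /* Shift the tail (including the terminating NUL), then copy the
           right-hand side over the nonterminal. */
        memmove(form + pos + rlen, form + pos + 1, flen - pos);
        memcpy(form + pos, steps[k].rhs, rlen);
        printf("=> %s\n", form);
    }
    if (strlen(form) >= size)
        return -1;
    strcpy(out, form);
    return 0;
}
```

Calling `derive(buf, sizeof buf)` with a sufficiently large buffer prints every sentential form and leaves the final sentence in `buf`. If a production in the list is changed so that it no longer matches the leftmost nonterminal, the function returns -1.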
2. Constructing the algorithm and production functions for top-down syntax analysis
The derivation of a sentence, from the start symbol E down to the terminals, can be drawn as a syntax tree: the root node (the start symbol) is at the top and the leaf nodes (the terminals) are at the bottom. Top-down syntax analysis is the process of traversing such a tree "from the top". Each traversal starts at the root node (the start symbol), passes through the intermediate nodes (the nonterminals other than the start symbol), and reaches the leaf nodes (the terminals). If each production is written as a function, the syntax tree can easily be traversed by having these functions call one another recursively and return. The three productions of grammar G2 therefore need three functions:

void E_AddSub();    // corresponds to the production for nonterminal E
void T_MulDiv();    // corresponds to the production for nonterminal T
void F_Number();    // corresponds to the production for nonterminal F
We implement top-down syntax analysis by analyzing the input character stream. During the analysis we need an input character buffer that holds the arithmetic expression string, a character indicator that marks the character currently being analyzed, and an error handling module. The algorithm uses three global members, ch, advance, and error, with the following meanings:

Character ch: the character the indicator currently points to.
Function advance(): moves the indicator to the next character in the input character buffer.
Function error(): the error handling function.
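The three global members described above could be set up roughly as follows. This is a minimal sketch under my own assumptions: the names `input`, `pos`, `init_input`, and `error_count` do not come from the article, the buffer is a NUL-terminated C string, and `error()` only counts and reports the error rather than recovering from it.

```c
#include <assert.h>
#include <stdio.h>

static const char *input = "";  /* input character buffer (assumed name) */
static size_t pos = 0;          /* character indicator (assumed name) */
char ch = '\0';                 /* character the indicator points to */
int error_count = 0;            /* assumed helper: errors reported so far */

/* Load a new expression and point the indicator at its first character. */
void init_input(const char *text)
{
    assert(text != NULL);
    input = text;
    pos = 0;
    ch = input[0];
    error_count = 0;
}

/* Move the indicator to the next character in the buffer.
   At the end of the input it stays on the terminating '\0'. */
void advance(void)
{
    if (input[pos] != '\0')
        pos++;
    ch = input[pos];
}

/* Placeholder error handler: count the error and report its position. */
void error(void)
{
    error_count++;
    fprintf(stderr, "syntax error at position %zu\n", pos);
}
```

Keeping `ch` in step with the indicator means the analysis functions only ever compare `ch` against a terminal and never touch the buffer directly.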
With these in place we can construct the top-down syntax analysis algorithm. First consider the production E -> T + E | T - E | T, which can be split into the following three productions:

E -> T + E
E -> T - E
E -> T

We begin with the syntax analysis function for E -> T + E:
// Listing 1: syntax analysis function for the production E -> T + E
void E_AddSub()
{
    T_MulDiv();        // call the analysis function for nonterminal T
    if (ch == '+')     // if the current character is '+'
    {
        advance();     // move to the next character
        E_AddSub();    // call the analysis function for nonterminal E
    }
    else               // if it is not '+'
        error();       // handle the error
}
Having seen this function, you can probably work out the top-down analysis function for the production E -> T - E yourself: simply change the '+' in if (ch == '+') to '-'. The function for the production E -> T is even simpler:
// Listing 2: syntax analysis function for the production E -> T
void E_AddSub()
{
    T_MulDiv();    // call the analysis function for nonterminal T
}
As you can see, one analysis function is written for each production, and the syntax tree is traversed through the calls between them. Since E -> T + E, E -> T - E, and E -> T can be combined into E -> T + E | T - E | T, the three corresponding functions can also be combined into one. For E -> T, the function for E only calls the analysis function for nonterminal T, so no error needs to be reported even when the next character is neither '+' nor '-'. The productions E -> T + E and E -> T - E are merged with the branch condition if (ch == '+' || ch == '-'). The combined function for E is as follows:
// Listing 3: analysis function for the production E -> T + E | T - E | T
void E_AddSub()
{
    T_MulDiv();                    // call the analysis function for nonterminal T
    if (ch == '+' || ch == '-')    // if the current character is '+' or '-':
                                   // for '+', derive with E -> T + E;
                                   // for '-', derive with E -> T - E
    {
        advance();                 // move to the next character
        E_AddSub();                // call the analysis function for nonterminal E
    }
    // The derivation for E -> T + E | T - E ends here.
    // If the next character is neither '+' nor '-', the function has derived
    // according to E -> T, and no error handling is needed.
}

Similarly, the top-down syntax analysis functions for the productions T -> F * T | F / T | F and F -> (E) | i are easy to write:
// Listing 4: analysis function for the production T -> F * T | F / T | F
void T_MulDiv()
{
    F_Number();                    // call the analysis function for nonterminal F
    if (ch == '*' || ch == '/')    // if the current character is '*' or '/':
                                   // for '*', derive with T -> F * T;
                                   // for '/', derive with T -> F / T
    {
        advance();                 // move to the next character
        T_MulDiv();                // call the analysis function for nonterminal T
    }
    // The derivation for T -> F * T | F / T ends here.
    // If the next character is neither '*' nor '/', the function has derived
    // according to T -> F, and no error handling is needed.
}
// Listing 5: analysis function for the production F -> (E) | i
void F_Number()
{
    if (ch == '(')          // if the current character is '(',
    {                       // derive with F -> (E)
        advance();          // skip '('; the indicator moves to the next character
        E_AddSub();         // call the analysis function for nonterminal E
        if (ch != ')')      // a left parenthesis must be matched by a right one
            error();        // if ')' is missing, report an error
        advance();          // skip ')' (when it is present, the syntax is correct)
        return;
    }
    if (isdigit((unsigned char)ch))    // if the current character is a digit
    {                                  // (isdigit needs <ctype.h>), derive with F -> i
        advance();          // skip the digit; the indicator moves to the next character
    }                       // syntax correct; the derivation F -> i is complete
    else                    // the current character is neither a digit nor '('
        error();            // report an error
    return;
}

Because a derivation always begins from the start symbol E, syntax analysis must begin by calling the analysis function for E. This is done in the main program:
// Listing 6: main program
int main()
{
    ......    // initialize the input character buffer and the character indicator
    // call the analysis function for the start symbol E
    // to begin top-down syntax analysis
    E_AddSub();
    ......    // analysis finished
    return 0;
}
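Putting the listings together, a complete recognizer might look like the sketch below. Several details are my own assumptions rather than part of the article: i is taken to be a single decimal digit, `error()` just counts errors, white space is not skipped, and the wrapper `recognize` (a made-up name) also rejects input that is left over once E has been analyzed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *input;   /* input character buffer */
static size_t pos;          /* character indicator */
static char ch;             /* character the indicator points to */
static int errors;          /* assumed helper: number of errors seen */

static void advance(void) { if (input[pos] != '\0') pos++; ch = input[pos]; }
static void error(void)   { errors++; }

static void E_AddSub(void);

/* F -> (E) | i, where i is assumed to be one digit */
static void F_Number(void)
{
    if (ch == '(') {
        advance();              /* skip '(' */
        E_AddSub();
        if (ch != ')')
            error();            /* missing right parenthesis */
        advance();              /* skip ')' */
        return;
    }
    if (isdigit((unsigned char)ch))
        advance();              /* skip the digit */
    else
        error();                /* neither a digit nor '(' */
}

/* T -> F * T | F / T | F */
static void T_MulDiv(void)
{
    F_Number();
    if (ch == '*' || ch == '/') {
        advance();
        T_MulDiv();
    }
}

/* E -> T + E | T - E | T */
static void E_AddSub(void)
{
    T_MulDiv();
    if (ch == '+' || ch == '-') {
        advance();
        E_AddSub();
    }
}

/* Returns 1 if `text` is a sentence of G2 (with single-digit i), else 0. */
int recognize(const char *text)
{
    assert(text != NULL);
    input = text;
    pos = 0;
    ch = input[0];
    errors = 0;
    E_AddSub();                 /* start from the start symbol E */
    if (ch != '\0')             /* leftover input such as the ")" in "1)" */
        error();
    return errors == 0;
}
```

With these assumptions, `recognize("1+2*(3+4)")` returns 1, while `recognize("1+")` and `recognize("(1+2")` return 0. Because i is a single digit here, "12" is rejected as well. Accepting longer numbers would mean skipping a whole run of digits in `F_Number`.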
In this way, the functions above traverse the syntax tree from the top down and so demonstrate the process of top-down syntax analysis. However, they do not yet do anything concrete, such as evaluating an arithmetic expression and computing its value. I will take up these issues in the following sections.
(to be continued)