Let's build a compiler!
BY
Jack W. Crenshaw, ph.d.
Part II: Expression Analysis
Start
If you have read this series of tutorial, you should know what we will do next. You should have copied the CRADLE program to your Turbo Pascal, and successfully compiled it. So you should have prepared to start new learning. The purpose of this article is to learn how to analyze and translate mathematical expressions. Our defined output is a series of assembly statements that can complete the specified operation. Here, we define the expression in the right side of the equation, such as X = 2 * Y 3 / (4 * z), our progress will be slow. That is to make the beginners do not lose their position. If some courses have been learned before, then they will have a great advantage for the latter. For those readers who have experienced experience, I want to say: Please temporarily pat. We will speed up the progress later.
Single number
In order to maintain the consistent style of this tutorial (KISS guidelines, remember?) We started from the easiest example. That is an expression consisting only of a single number. Before you start the encoding, make sure you already have the routine I last. We will use it in other trials. Add the following code in routines: {--------------------------------------- ----------------------} {PARSE AND TRANSLATE A MATH Expression} Procedure Expression; Begin Emitln ('Move #' GetNum ', D0') End ; {----------------------------------------------------------------------------------------------------------------------------------------- --------------- Add "Expression;" in the main program.
{------------------------------------- --------------}
Begin
Init;
EXPRESSION;
End.
{------------------------------------- --------------}
Run this program now. Try to enter all individual numbers to do tests. The resulting output should be a line assembly code. Now try to enter any other characters, you will see an error in the appropriate way.
congratulations! You have written a translator that can run! OK, I admit that its function is very limited. But don't throw it away easily. This small compiler has completely completed all the work you have to do in a very limited range: it correctly identifies the legitimate statement in the input language of our definition, and generates correct, The assembly code executed, which is also suitable for translating into target language formats. It is important that it also recognizes illegal statements and gives meaningful error messages. Who can you still have more? When we expand this analyzer, we best keep it these two features.
There are other features in this applet worth mentioning. First, we didn't leave the code generation and grammar analysis ... The analyzer once you know what we want, and generate the target code directly. Of course, in an actual compiler, the input is read from the disk file from the disk file, and then outputs it in another disk file, but when we do test, this method we use is undoubtedly much easier.
At the same time, pay attention to an expression must generate a result to place a place. I select 68000 D0 registers to save them. I can also choose another register, but use D0 more appropriate.
Dual expression
It is undeniable that only a character is too far from our request, so now take a look at how to expand it. Assuming that the expression we want to process is as follows: 1 2 or 4-3 or, in general form is,
OK, what we want to do is basically letting the Term process completed the work that Ji Expression. So just rename the Expression process to TERM, and enter the new version of the Expression process.
{------------------------------------- --------------}
{Parse and Translate An Expression}
PROCEDURE EXPRESSION;
Begin
Term;
Emitln ('MOVE D0, D1');
Case Look of
' ': Add;
'-': Subtract;
Else EXPECTED ('Addop');
END;
END;
{------------------------------------- -------------}
Below, you will enter the following two processes before the Expression process. {------------------------------------- -------------} {Identify and translate an additional operation}
PROCEDURE ADD;
Begin
Match (' ');
Term;
Emitln ('Add D1, D0');
END;
{------------------------------------- ------------}
{Identify and translate a subtraction operation}
Procedure Subtract;
Begin
Match ('-');
Term;
Emitln ('SUB D1, D0');
END;
{------------------------------------- ------------}
After entering the above processes, the order between them should be:
o Term (actually is the original Expression version)
o Add
o Subtract
o Expression
Run this program now. Try to enter any two single numbers you can think of, and connect them with ' ' or '-'. Four assembly instructions will be generated each time. Now try to enter some well-designed expressions that contain errors. Is the analyzer captured these mistakes?
Look at the code generated after running. There are two things worth paying attention. First of all, these codes are not like our handwritten code, a little awkward. For example, code sequence: Move # n, d0 Move D0, D1
It is inefficient. If we handwritten these code, we generally put the data directly in D1. This is the following instructions: The code generated by the compiler is generally lower than the code efficiency of our handwritten code. Remember this. It will run through our entire tutorial. All compilers have this problem varying degrees. Many computer scientists will spend their own hearts of blood to study code optimization issues, and there are indeed some results can be used to improve the quality of the target code. Some compilers are doing very well in code optimization, but they are also selling very expensive due to their complexity, in short, this is a forgotten battlefield ... I will briefly introduce some optimization methods before this section. They can complete some mild optimization, but this is just to tell you that we can do some optimization for the code without taking too much trouble. Remember, here we just learn, not to test how compact target code we can generate. Later, we will deliberately ignore the code optimization problem, and focus on how to make the generated code can run normally.
But now our program can't do this! The generated code has an error! The wrong place is to subtract down, subtract D1 (save the second parameter) from D0 (save the first parameter). This method is wrong, so the symbol error of the final result is incorrect. Let's improve the Subtract process, add the modification of the symbol,
{------------------------------------- ------------}
{Identification and translate a subtraction operation} Procedure Subtract;
Begin
Match ('-');
Term;
Emitln ('SUB D1, D0');
Emitln ('NEG D0');
END;
{------------------------------------- ------------}
Now our code is low, but at least gives at least the correct result! Unfortunately, this rule can indicate the meaning of mathematical expressions, but the order of TERM appears in the expression in the expression we are not very habit. No way, this is just one of you must learn to adapt. At this point, we will encounter when it comes to division.
OK, to this step we have enabled the analyzer to identify two numbers and differences. We can only let it identify individual numbers. But the expression in the actual problem is arbitrary. Enter a single number "1" running program to see if there is any problem.
It is not working properly, is it? why? Because we told the analyzer, the only legal form of the expression is between two TERMs. We must rewrite the Expression process to process more cases. A real analyzer is so moving.
General expression
In practical problems, an expression can be composed of one or more TERM, which is divided between "addition operators" (' ' or '-'). Writing in the form of BNF:
We can add a simple loop to the Expression process to describe this new definition.
{------------------------------------- --------------}
{Analysis and translation expressions}
PROCEDURE EXPRESSION;
Begin
Term;
While Look in [' ', '-'] do begin
Emitln ('MOVE D0, D1');
Case Look of
' ': Add;
'-': Subtract;
Else EXPECTED ('addop');
END;
END;
{------------------------------------- -------------}
Now we have stepped forward! This version can handle any number of TERMs, and we just add two codes. If you are further in-depth, you will find that this is the characteristics of the top-down analyzer ... It only needs to increase a small amount of code to handle the extension of this language. This makes our plan to gradually improve the analyzer. It is also to note that the code of the Expression process is very matched to the BNF definition of the expression, which is also a feature of this method. When you are fine in this approach, you will find that you can quickly convert a BNF expression into a corresponding analyzer code.
OK, compile the new version of this analyzer, try it. As usual, verify that this "compiler" can handle any legal expressions, whether it will give meaningful error information on illegal forms. It's not bad, is it? You may find that in our test version, the error message is mixed with the generated code to output together. That's just because we used the display in the test as an output. In actual products, there will be two divided outputs, one is an output file, one is output to the screen.
Use stack
I used to insist on explaining the problem in the code, unless it is absolutely necessary, I will not introduce any complicated theory, but here I am going to break this. At present, the analyzer uses D0 as the primary register, and the intermediate result is stored with D1. So far, it has works well because we deal with just "addition operators" ' ' and '-', as long as there is a new TERM appearing, it is added to the result. But in general, this is not perfect. Consider this expression 1 (2- (3 (4-5))))
If we put 1 in D1, where is 2?
Fortunately, there is a simple solution. Like any modern microprocessor, 68000 also has the structure of the stack, and the stack is an excellent place to save a series of data. So don't need to transfer Term from D0 to D1, we only need to put it in the stack. In the 68000 assembly, the statement of the stack operation is - (SP) and the stack is written, (SP) .
Ok, we change the Emitln statement in the Expression process to: Emitln ('Move D0, - (SP)');
Two statements containing add and subtract are changed to Emitln ('Add (SP) , D0') and Emitln ('SUB (SP) , D0'),
Now try the analyzer to ensure that our changes have not destroyed it. The code generated now is lower than previous efficiency, but this is inevitable.
Multiplication and division
Now start processing some truly trouble. You know, in addition to the "addition operator", there are other mathematical operators ... The expression can also have multiplication and division operators. You also know that there is a priority relationship between the expression of each operator, so the following expression 2 3 * 4, we will think that it is multiplication first, then do addition. (Looking back to see why we need to use the stack?)
In the early days of the development of compiler technology, people use some fairly complex techniques to achieve priority relationships between operators. But that's already outdated, use the top-down analysis technology we can enjoy these rules. So far, the form of TERM we consider has always been only a single decimal number.
More general, we define Term to become something generated by factor (Factor), that is, What is a factor? Now we understand it, the factor is TERM ... a single number. Note The following symmetry: Term has the same form of expression. In fact, we can change the process of processing expressions and rename the process of processing, and then add to the analyzer. However, in order to avoid confusion, the following is the latest complete code of the analyzer. (Note the method used in processing the order in the order in the Divide process.) {------------------------------------- --------------} {Analysis and translate a factor} Procedure factor; Begin Emitln ('Move #' Getnum ', D0') END; {------------------------------------- -------------} {Identification and translation multiplication} Procedure Multiply; Begin Match ('*'); Factor; Emitln ('Muls (SP) , D0'); END; {------------------------------------- ------------} {Identification and translation division operation} Procedure divide; Begin Match ('/'); Factor; Emitln ('Move (SP) , D1'); Emitln ('Divs D1, D0'); END; {------------------------------------- --------------} {Analytical translation of Term} Procedure Term; Begin Factor; While Look In ['*', '/'] do begin Emitln ('Move D0, - (SP)'); Case Look of '*': Multiply; '/': Divide; Else EXPECTED ('Mulop'); END; END; END; {------------------------------------- -------------} {Identification and translation of addition operation} PROCEDURE ADD; Begin Match (' '); Term; Emitln ('Add (SP) , D0'); END; {------------------------------------- ------------} {Identify and translate subtraction operation} Procedure Subtract; Begin Match ('-'); Term; Emitln ('SUB , D0'); Emitln ('NEG D0'); END; {------------------------------------- --------------} {Analysis and translation expressions} PROCEDURE EXPRESSION; Begin Term; While Look in [' ', '-'] Do Beginemitln ('Move D0, - (SP)'); Case Look of ' ': Add; '-': Subtract; Else EXPECTED ('Addop'); END; END; END; {------------------------------------- -------------} Good guy! A close to the practical analyzer / translator, only 55 lines of Pascal language source program! If you continue to ignore its efficiency problem, then this output looks very useful, I also hope that you ignore the efficiency problem. Remember, our purpose is not to generate a very compact code.