Let's build a compiler! (3)

xiaoxiao2021-03-06  127

Let's Build a Compiler! (3) --- Part 2: Expression analysis

brackets

We can add a part of the processed parentheses to the analyzer. Brackets are a mechanism for enforcing changes in operational priorities. For example, expression 2 * (3 4),

Braces enforce the addition to the plus before the multiplication. More importantly, braces provide us with a mechanism for defining an expression of any complexity. For example, expressions (1 2) / ((3 4) (5-6))

Putting the processes of parenthesis to our analyzer, the key is to recognize that a brackets enclosed in parentheses, no matter how complicated, relative to other parts outside the parentheses, it seems equivalent to a simple Factor. So a representation of Term is: :: = ()

This has involved the concept of recursive. An expression can contain Term, and this TERM can also contain another expression, which can contain Term .... So endless.

Regardless of whether it looks complicated, we can solve this problem when we can add a few statements to the Factor process.

{------------------------------------- --------------}

{Analysis and translation factor}

PROCEDURE Expression; Forward;

Procedure factor;

Begin

if Look = '(' Then Begin

Match (');

EXPRESSION;

Match (')');

end

Else

Emitln ('Move #' Getnum ', D0');

END;

{------------------------------------- -------------}

Please pay attention again, our extension of this analyzer is easy, and these code matches the BNF generated.

As before, compile the new version of the analyzer to ensure that it correctly analyzes the legitimate statement and gives an error message for illegal speech.

One dollar subtraction

So far, we seem to have enabled this analyzer to handle any form of expression, is this? OK, enter this statement Try: -1

Oh! It can't work properly, is it? The Expression process hopes that each expression will start with an integer, so it cannot recognize the opening of the beginning. You will find that the analyzer cannot recognize 3, and it cannot be identified - (3-2). Such an expression.

There are two ways to solve this problem. The easiest way (although not the best way) is a false, there is a 0-3, so -3 becomes 0-3. We can easily correct the expression existing version with this method.

{------------------------------------- --------------}

{Analysis and translation expressions}

PROCEDURE EXPRESSION;

Begin

If isaddop (look) THEN

Emitln ('CLR D0')

Else

Term;

While Isaddop (Look) Do Begin

Emitln ('Move D0, - (SP)');

Case Look of

' ': Add;

'-': Subtract;

Else EXPECTED ('addop');

END;

END;

{------------------------------------- -------------}

I am very easy to make a modification of the source program! We have only added 3 lines of new code. Pay attention to the newly referenced function isaddop. Because two times of testing of the addition operator, I defined a new function for it. Function IsadDo should be as intuitive as the function isalpha. Its code is as follows:

{------------------------------------- -------------}

{Identifying addition operator}

Function isaddop (C: char): boolean;

Begin

IsadDop: = C in [' ', '-'];

END;

{------------------------------------- -------------}

OK, change our program as above and recompile. You should put the function isaddop to your program. It is also required to be used later. Now try again -1 again. Hoot! The efficiency of the code generated is really low ... The 6 line code is just to load a simple constant ... but at least it is correct. Remember, our purpose here is not to replace Turbo Pascal.

So far our work is just the structure of our expression analyzer. The current version should be able to correctly analyze and compile any form of expressions you deliberately entered. But it is still limited to the process of processing TERM is only a single decimal number. But I hope that now you have realized that we can make greater extensions on the analyzer, but only do some small modifications. Maybe I heard a variable or even a function call is only another form of factor, you will not be surprised.

In the next section, I will show you how to extend our analyzers to handle the above problems, and how to extends to extend their numbers and variable names that can handle multiple characters. You will see that we are not far from a real practical analyzer.

About code optimization

At the beginning of this section, I said that some tips on how to improve the quality of the target code. Although I also said that the main purpose of this series of tutorials is not to make a compact code. But you should at least understand that we are not wasting time here ... Because we can make further modifications to this analyzer to make it generate better code, rather than throw away all our work we have so far, and stove. Like the previous case, do some optimized measures are not so difficult ... Just add some extra code to the analyzer.

We have two basic optimization methods:

o Target code generation after it is improved

This is the principle of "Peephole" optimization method. The usual idea is that we know what kind of instruction combination will be generated by the compiler, and we also know which instruction combinations are low efficiency (such as the code generated above -1). So what we have to do is to scan the generated code, find those command combinations that are not efficient, and replace them with a better efficient instruction. This approach will cause the code to expand vigorously, but is a method of comparable direct mode matching. The only complex place is that there may be many combinations need to be found. And this method is called the peeking hole optimization method is that it is only looks only in a group of instructions at a time. Peephole optimization method is obvious to the optimization effect of the target code, almost no change in the structure of the compiler itself. Despite this, it is still necessary to pay a certain price in the speed, volume, and complexity of the compiler. Find all those instruction portfolios to use a lot of IF tests, each IF may be the root cause of the error, which is of course a lot of time.

In the implementation of the classic peephole optimizer, it is carried out as the second time of the compiler. The code output from the previous output is written to the disk and then the optimizer reads and processes the target code file. In fact, you will find that an optimizer can even become a relatively independent program with the entire compiler. Because the optimizer is just in a small "window" instruction (this is the origin of this method name), a better implementation method is to place the output code of several rows into the buffer, then scan after each Emitn statement. Buffer. o Generates a better code in the beginning

This method requires that we look for some specific situations in the source program before calling EMIT. As a less typical example, when doing add 0 addition, we should define constant 0, and call the EMIT process to generate a CLR statement instead of the LOAD instruction, or simply nothing. For a closer example, if we want to identify one yuan during the Factor, rather than recognition during the Expression process, we can treat -1 as a normal constant, not from their absolute value to generate negative numbers. These tasks are not difficult to complete ... only need to add some additional code. For this method, my opinion is that once we have a compiler that can run, it can generate useful code, we can often turn back to improve it, make it generated more compact. This is why there will be a 2.0 version in the world.

There are other optimization methods worth mentioning, they have no doubt that can generate beautiful compact code. I think the way to introduce below is my "invention", although I can't prove that it is my original, but I have not seen it in any published publication.

In fact, this method is to use the CPU register, thereby avoiding excessive use stacks. Recall that we only do the addition and subtraction, we use the registers D0 and D1, not the stack, remember it? This method works well because the number of operands used in "stack" will not exceed two operations.

The 68000 processor has 8 registers. Why don't you use them to construct a stack of our own management? The key is to recognize that at any time during the processing, the analyzer knows how many operands in the stack, so the analyzer must be able to properly manage the stack. We can set a custom "stack pointer" to indicate the top of the stack and is used to locate the corresponding register. For example, the Factor process will not be placed directly in the register D0, but put it in a register that happens to be currently in the "stack top" position.

What we have to do is to effectively consist of registers, replacing the CPU management memory stack by our own hop. For most expressions, the depth of the stack will not exceed 8, so we can get quite beautiful target code. Of course, we have to deal with those expressions that are not a common stack of more than 8, but that is not a problem. As long as we simply let our stack over from the stack of CPU management. The expression of the target code does not exceed the code difference between the stack, and the deeper is more than 8 expressions, and the target code quality is much better than the expression of the stack is less than 8.

In order to write the above views in the tutorial, I have implemented this method in advance to make sure it is feasible. When you really practice, you will find that you can't really use 8 registers to construct the stack ... You do at least if you need to idle a register to exchange the operand in the division operation (I hope that 68000 has an instruction like Xthl, Like an 8080 processor!). For expressions that contain functions, I need to call a register for function calls. Even so, the quality of the target code is still greatly improved for most expressions.

Now you have seen it, you have to generate a better code is not so difficult, but you will indeed add the complexity of the compiler ... Therefore, we strongly recommend that we continue to ignore efficiency in the subsequent part of this tutorial. I hope you can remember It is indeed that we can improve the quality of code without the need for another stove to throw away our work. Next lesson, I will explain how to handle the Factor and function calls in the form of variables, and will continue to demonstrate how easily demonstrate how to handle multi-character token and blank characters.

转载请注明原文地址:https://www.9cbs.com/read-125765.html

New Post(0)