Let's build a compiler! (4)

xiaoxiao2021-03-06 144

Let's Build a Compiler

Jack W. Crenshaw, ph.d.

Part III: Reproduction

Introduction

In the previous chapter, we experinated the various technologies of analyzing and translating a general mathematical expression. Finally, we have completed a simple analyzer that handles any complex expressions that meet the following two conditions:

o None variables in o expression, only numbers consisting of Factor

o Factor consisting of numbers is limited to single digital characters

In this chapter, we will cancel the above limit. And also to expand the analyzer so that it can process the case where the function call is included. To remember, the second restriction mentioned above is mainly our own strengths. This is for handling convenience, but also allows us to focus on the foundation concept. You will see later, this restriction is also easy to eliminate, so we don't have to stick to this now.

variable

In practical problems, most expressions we have seen include variables, such as

B * B 4 * a * c

It is not a good analyzer that cannot be handled by variables. Fortunately, it is easy to do this. :: = | ()

The relationship of "or" "or", that is, any of them is a legitimate Factor. At the same time, it is not difficult to distinguish it in the end of the expression in the expression ... In one case, the first character is the left bracket '(', and the first character is a numeric character.

If we look for variables as another form of Factor, should you be surprised? The above BNF generated can be expanded to:

:: = | () |

This extension does not generate any secondary sense: If the first character is a letter character, the variable is obtained; if the first character is a numeric character, it is a number. When I recall the previous processing of numbers, we use it as an immediate number, and the generated code will directly put it in the D0 register. Now we use the same method to handle variables, just put it in D0 is a variable.

A slightly complicated part of the code generation is that the vast majority of 68,000 operating systems, including the SK * DOS I use, require the target code to write "location-independent" form, roughly meaning that all things are with PC ( Program Counter - Translator Note) Register is related. Therefore, the instruction to load the variable should be written

Move X (PC), D0x represents the variable name. According to the result, we rewrite the Factor process:

{------------------------------------- --------------}

{PARSE AND TRANSLATE A MATH FACTOR}

PROCEDURE Expression; Forward;

Procedure factor;

Begin

if Look = '(' Then Begin

Match (');

EXPRESSION;

Match (')');

end

Else if isalpha (look) THEN

Emitln ('Move' GetName '(PC), D0')

Else

Emitln ('Move #' Getnum ', D0');

END;

{------------------------------------- -------------}

As I mentioned earlier, it is easy to extension the analyzer because we design the analyzer. The conclusions above this section are still established. This time we only added two code. At the same time, pay attention to how the IF-ELSE-ELSE structure matches the BNF syntax. OK, compile and test this new version. function

There is also a factor, which is supported by the majority programming languages, which is the function call. But now I want to deal with the functional problem is still early, because we haven't talked about the parameter delivery problem. Another reason is that in an actual programming language, the identifier generally does not only represent a type, and one of them is a function. Although now, although there is no processing of functions, I still insist on analyzing the function identifier in the expression. There are two reasons, one is to make the analyzer close to its final form, and the other is that the processing function identifier will lead a new very worth mentioning topic.

At present, our analyzer can be called "predictive analyzer." It means at any time, you can use the first line character to accurately know what to do next. But when we add the situation after the process of the function is different. Each language has a naming mechanism of an identifier. Now our naming mechanism is to simply specify the identifier as letters "a" to "z". The problem is that the variable names and function names follow the same rules. So how do we know when an identifier is a variable, when is the function? One solution is that they are required to declare them before using variables or functions. Pascal is the method used. Another way is that we can request a parameter list after the function name (the parameter table can be empty). This is the method of C language usage.

Since there is no mechanism to declare the variable, we now use the method in C. Also because there is no mechanism for parameters, you can only use empty parameter tables. So our function call form is like this:

x ().

Since the parameter table is not handled, you can use a BSR command (meaning function call).

Nowadays, there are two possibilities in the "if isalpha" branch test in the Factor, and we have handled these two situations in a separate process. Modify the Factor as follows:

{------------------------------------- --------------}

{PARSE AND TRANSLATE A MATH FACTOR}

PROCEDURE Expression; Forward;

Procedure factor;

Begin

if Look = '(' Then Begin

Match (');

EXPRESSION;

Match (')');

end

Else if isalpha (look) THEN

Ident

Else

Emitln ('Move #' Getnum ', D0');

END;

{------------------------------------- -------------}

Then insert the next new process into the Factor process.

{------------------------------------- --------------}

{Parse and Translate An Identifier}

Procedure Ident;

VAR name: char;

Begin

Name: = getName;

if Look = '(' Then Begin

Match (');

Match (')');

Emitln ('BSR' Name);

end

Else

Emitln ('Move' Name '(PC), D0') end;

{------------------------------------- --------------}

OK, compile the new version of the analyzer. See if it analyzes all legal expressions? Can I give an error message for illegal expressions?

It is important to note that even if we no longer have a predictive analyzer, there is no complexity of recursive decline. When the Factor process discovers an identifier (that is, a letter), Facotr doesn't know that the variable name is still a function name, but it doesn't matter. It passes the identifier to the Ident process, indicating the type of the identifier by Ident. Take a look at the Ident process, it saves the identifier first and then reads the next input character to determine which identifier being processed.

Remember the above method. It is very powerful. No matter when you encounter a variety of possibilities, you need to view the input forward, you can use it. Even if you have to look forward to several symbols, this method is still applicable.

Re-processing

When we talk about the structure of the compiler, it will lead to another important topic: Error handling. Please note that although our current analyzer can refuse (almost) all illegal expressions and give meaningful error messages, we don't have a lot of work in this regard. In fact, in the entire compiler, only two incorrect processing procedures are called. Even so, they are still unnecessary ... If you look back at the Term and Expression procedures, you will find that the statement that calls the error handling process is actually unable to be executed. Earlier I put them in the program just for insurance, but now I don't need it. So why not delete them?

How can I get a beautiful error handling without paying an extra price? That is also very simple, I just avoid direct reading characters directly with GetChar. Instead, I put the error handles in GetName, GetNum, and Match, allowing them to complete all error detection. Smart readers will notice some calls to Match (for example, calls in Add and Subtract) are unnecessary ... When we arrive at the call to Match, it is actually known which character is read. ...... But doing this can maintain the symmetry of the program, and use Match also helps to keep the formula of getchar that does not directly call.

There is also a problem, we have not told the analyzer to end the flag of the end of the line, and how to deal with the blank character embedded in the expression. So blank characters (or any other characters that do not belong to the analyzer character) will stop the analysis process.

Take a look at the example below:

1 2 3 4

Our processing method is to slightly blank characters and specify the expression as the end sign with the carriage. To process the problem of the expression, we add the following line at the end of the Expression process: if Look <> Cr THEN EXPECTED ('newline');

Don't forget to add CR = ^ M in a constant definition.

Assigning statement

OK, our analyzer is very good. We only used 88 lines of code, so many features.

Of course, if there is no successor after the expression is completed, then our analysis is not much. Expression usually (but not any time) appears in the assignment statement, the form is as follows:

We are close to the analysis of a complete assignment statement. Let's finish this last step. In fact, just need to add the following new process after the Expression process: {------------------------------- -----------------------------}

{Parse and translate an associternment statement}

Procedure assignment;

VAR name: char;

Begin

Name: = getName;

Match ('=');

EXPRESSION;

Emitln ('LEA' NAME '(PC), A0');

Emitln ('Move D0, (A0)')

END;

{------------------------------------- -------------}

Please note that the above code is matched to the BNF generated. And the error handling does not use any additional code, still use the GetName and Match procedure.

Due to the provisions of the 68000 processor, the two assembly code generated above should be written as related to the PC register.

Ok, now call the Expression process during the Assignment, and no longer calls in the main program.

SON OF A GUN! We have been able to translate the assignment statement. If the assignment statement is the unique statement of a certain programming language, then just put the assignment process in a loop, a complete compiler is completed!

In fact, the assignment statement is not the only statement form, and there is a control statement (such as if statement and loop statement), process, declaration, etc. But it is worth celebrating that the processing of mathematical expressions in front of our previous completion is already one of the most challenging work in the compilation of the language. Compared with our work, the translation of the control statement is easier. I will explain them in Chapter 5. The processing of other statements is also similar, as long as the Kiss principle is remembered.

Multi-character Token

Before I have been carefully defined to analyze the single character token, and guarantee that it is not difficult to analyze the multi-character token. I don't know if you believe this. Below I have to continue to use the methods used in previous ways to complete this extension because it will continue to make our work easily. Before you start, please save the current version with another file name. In the next chapter, we will use this version of the single-character TOKEN.

Most compilers will divide the processing of the processing input stream into a separate module called the lexical scan. Its idea is this, the scanner receives and analyzes a character to pick a character's input, and then return a separate marker unit (which is token). Maybe we need to do this later, but it seems that there is no need. We can complete the processing of multi-character token as long as we do small modifications to getName and getNum procedures.

The label is generally defined as a letter, but it can be connected to the letter can also be connected to the digital characters. Based on this, we need a new function to identify the identifier:

{------------------------------------- -------------} {recognize an alphaumeric}

Function isalnum (c: char): Boolean; begin isalynum: = isalpha (c) or isdigit (c); end; {------------------------ ---------------------------------------}

Add it to your analyzer. I am putting it behind the isdigit process.

Now, modify the function getName so that it returns a string instead of a character: {---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------} {Get An Identifier} Function GetName: String; Var token: String; Begin token: = ''; If not isalpha (LOOK) THEN EXPECTED ('Name'); While Isalnum (Look) Do Begin token: = Token Upcase (LOOK); GetChar; End; GetName: = Token; End; {------ -------------------------------------------------- --------}

Assignment, modify the getNum process:

{------------------------------------- -------------} {Get a Number}

Function GetNum: string; var value: string; begin value: = '; if not isdigit (LOOK); While Isdigit (Look) Do Begin Value: = Value Look; GetChar; End; GetNum : = Value; End; {----------------------------------------- -------------------}

Surprising, this is all the changes we have to do! The local variables in the Ident and Assignment process are originally declared as "char" type, and now you must change to String [8]. (You can also define longer length, but there is a limit). Ok, recompile and test the analyzer. Now you believe this modification is very simple?

转载请注明原文地址:https://www.9cbs.com/read-125763.html

9cbs

New Post(0)