Let's build a compiler! (1)

xiaoxiao2021-03-06  107

Foreword

This document contains all parts of the Jack Crenshaw compiler development tutorial, including new chapter 15. The predetermined reader group is those non-computer majors, and they love the computer and want to know how the compiler works. Many compiler theories in this tutorial are omitted, but the content of compiler practice is overlap. When you learn this tutorial, you should be able to design and construct your own compiler. It may not be the best compiler in the world and cannot generate a compact target code. Your results may never drive Borland or Microsoft out of the market, but it works normally, and it belongs to yourself.

Jack W. Crenshaw

Let's build a compiler!

BY

Jack W. Crenshaw, ph.d.

Part 1: Getting Started

Introduction

This series of articles is a tutorial for the development programming language analyzer and compiler, including theory and practice two aspects. In this tutorial, we will involve compiler structures, design a new programming language and build a content that can actually run the compiler.

Although I am not a computer scientist who has been educated (my Ph.D. is in another different field: physics), I am interested in compilers for many years. I bought almost all books in this area and tried to digest the essence of content in the book. I don't mind telling you that the progress of my study is actually slow. The compiler textbook is written by the students who lead to computer science. It is difficult to understand for those non-computer majors. But a few years ago, I began to gradually understand the connotation of those theories. True to change the situation is when I started experimenting on my computer. Now I want to share what I have realized with you. But after you learn this tutorial, you are still not a computer expert, and you can't just understand all the Dao Austrian compiler theory. I intend to completely go to those preference theory. You will learn all the practical content to be understood when building a compiler.

This tutorial takes the way to learn. I will make a demonstration on a computer during learning exploration. I hope you can follow me, repeat the experiments I did and complete some of your own things. I use Turbo Pascal 4.0. I will periodically insert some examples written in TP. These examples can be used directly to compile the code, I hope you can copy on your computer and run them. If you don't have Turbo Pascal, your study will be restricted, so it is strongly recommended that you get a copy of a TP. After all, TP is an excellent product, there are many other good features!

There are some articles in the theory of compiler to show you some examples or show you a complete work (such as Small-c) you can copy and use them, but you don't know much about its internal working principle. I hope that this situation mentioned above is more coming, so that you can have some own feelings after you get from the book, not just to rewrite my work again, but it is possible to get on it. floor.

It is true that this is a wild heart, not one or two papers can be clear. I want to write a series of articles to complete it. Each article will tell the content of a compiler theory, and remain relatively independent. If you have a limited time, and you only want to see the article on a certain aspect. Every article will be uploaded after completion, so you have to wait until the last article is released after it can eventually complete your study. Please keep the tolerance.

The general compilation principle textbook covers many basic theories we are not prepared. Typical order is:

o Introduction to the compiler.

o One chapter or two chapters describes the syntax generated by the BACKUS-NAUR FORM.

o One chapter or two chapters describes the lexical scan, focusing on determinism and non-deterministic have poor automation.

ORIC sections describe the grammatical analysis theory, starting from the Top-Down recursive drop method, ending at the LALR analyzer. o In one chapter, the intermediate language is introduced, focusing on P-CODE and reverse Poland expressions.

o To introduce different methods of processing subroutines, parameter delivery, type declarations, etc. with many chapters.

The content of the O-chapter introduces the target code generation, which is usually a virtual machine based on some instruction sets. Most readers (in fact, college students who have elected this lesson) have never read here.

o The last chapter or two chapters describe the code optimization. This part is often not read.

I will take a completely different ways in this series of tutorials. First of all, I won't be too much to hover in different implementation methods. I will give a practical way. If you want to study those different ways, that's good ... I support you ... But I will insist on explaining my way. I will also omit most of the theory of faint sleep. Don't misunderstand me: I am not a small look, when dealing with some tricky problems, compiling theory is extremely important. But I believe that the most important thing should be placed foremost. Here we will explain 95% of compilation technology that does not require many theoretical guidance.

I will also only discuss a syntax analysis method: a recursive decline in the top downward, which is the only technology that can manually build a compiler. Other methods are useful when you have an automatic constructor like YACC and don't care how much memory space is consumed.

I also borrowed the results of Ron Cain, Ron Cain is the author of the Small C compiler. Although most other compiler authors are accustomed to using a intermediate language before generating the target code, such as P-CODE, and divide the compiler into two parts (front and later, the front end generation P-CODE, the rear end processing P-CODE to generate an executable target code), Ron tells us that the target code that generates assembly code form is a more direct way. (Of course, there are many shortcomings in this way, such as platform portability, not conducive to code optimization, etc. - Translator Note) These target code is not the most compact code ... The optimization of code is a more complex work. . But it can still run and have a good operation. I don't want you to think that we have no value in the back of the compiler, so I intend to introduce you to some compiler optimization.

Finally, I will use some skills you have summed up. I found them very helpful when I understand the compiler's workflow, and I don't need me to spend too much. One of the main tricks is to use single characters in early design. I think if the analyzer identifies and handles the character sequence I-T-L, I will make it the same operation to IF-THEN-ELSE. This is the ability to do. In the second lesson, I will show you how easy to extend the single-character analyzer into the token to handle any length. Another tip is completely ignored file I / O, I think if I can read the input from the keyboard and send the output to the screen, you can also read or output the disk file. Experience proves that a compiler can work correctly in the above way, it is a simple thing to redirect I / O to disk files. The last trick is not correct or repaired. Our program only recognizes errors, and will not let the program crash because of the discovery of the error, just simply discover the first error place ... just like the old version of Turbo. You will also see some other techniques later. Most of them have not been mentioned in any compiler textbook, but it is really effective.

About programming style and efficiency. As you can see, I write programs tend to divide them into a number of very small and easy to understand modules. We have to discuss learning procedure without a length of more than 15-20 rows. I am a faithful supporter of the KISS (Keep IT Simple, Sidney) principle in software development. When a thing is capable of completing a simple method, I will never complete it with a superior or complex method. Lack of efficiency? Maybe, but you will like this result. Brian Kernighan has said that first let the program run, then let it run faster. If you want to make your work code more compact, it is easy, because these codes are readily readable. Despite this, I am strongly suggested that you can do it when you can run in full accordance with your requirements. I still have a habit until I find that I really need a module to start writing it. Trying to foresee if the problem that you may encounter will make you crazy, and usually you will find that your original guess is wrong. Now we have a full screen editor and a very fast compiler. When I find a more functional module, I will do not hesitate to modify the original version. At that time, I only wrote a version that only met my current needs.

The last thing to pay attention to: One of the principles we have to persist is not in the P-Code or virtual machine, but use the target code in the compilation form. Maybe you don't like the assembly code I used ... it is a compilation of 68000cpu, which is the assembly language running on my system. I think you will find it to convert them into compilation instructions on any other CPU, such as X86 compilation, is a light, at least I didn't find any problems. In fact, I hope there is a pair of x86 compilation than I am more in a row, I can provide X86 assembly code with the content code equivalent.

Origin

Each program requires some common or routine parts ... such as I / O routines, error information routines, and more. The procedures we develop here are no exception. I try to minimize the proportion of these raw materials so that we will focus on those that really important problems and not in the forest. The code given below shows the least things required to complete some of the work in the compilation. It consists of some I / O routines, an error handling routine, and a main program skeleton. I call it as our origin. When we develop other subroutines, we will add them to "Organized" and increase the calling statement. Please copy this program because we will use it more than once.

There are many different methods for lectone scanning. In UNIX systems, the authors of the compiler tend to use the getc () function and the ungetc () function (the ungetc () function function is to press the character read from the input stream back to the input stream - Translator Note) . Here I also use this method, I use a separate global variable to pre-read a character. A part of the code during the initialization process (which is the only part of the only part of the current "starts the entire compilation process by reading the first character from the input stream. Don't need any special skills under Turbo Pascal 4.0 ... The following to get the next character from the input stream later later.

{------------------------------------- -------------}

PROGRAM CRAdle;

{------------------------------------- -------------}

{Constant declaration}

Const Tab = ^ i;

{------------------------------------- -------------}

{Variable declaration}

Var Look: char; {Read Reading Character}

{------------------------------------- -------------}

{Read characters from the input stream}

PROCEDURE GETCHAR;

Begin

Read (LOOK);

{------------------------------------- -------------}

{Report error}

Procedure Error (s: string);

Begin

Writeln;

Writeln (^ g, 'error:', s, '.');

END;

{------------------------------------- -------------}

{Report error and stop operation}

Procedure Abort (S: String);

Begin

Error (s);

Halt;

END;

{------------------------------------- -------------}

{Report next steps to appear characters}

Procedure expected (s: string);

Begin

Abort (S 'EXPECTED');

END;

{------------------------------------- -------------}

{Matching specific input character}

Procedure match (x: char);

Begin

if Look = x Then GetChar

Else EXPECTED ('' '' x '' ');

END;

{------------------------------------- -------------}

{Identify alphabetic character}

Function isalpha (C: char): boolean;

Begin

ISALPHA: = Upcase (c) in ['a' .. 'z'];

END;

{------------------------------------- -------------}

{Identification numeric character}

Function isdigit (c: char): boolean;

Begin

Isdigit: = c in ['0' .. '9'];

END;

{------------------------------------- -------------}

{Get identifier}

Function GetName: char;

Begin

If not isalpha (look) THEN EXPECTED ('Name');

GetName: = Upcase (LOOK);

GetChar;

END;

{------------------------------------- -------------}

{Acquired number}

Function Getnum: char;

Begin

If not isdigit (look) THEN EXPECTED ('INTEGER');

Getnum: = LOOK;

GetChar;

END;

{------------------------------------- -------------}

{After the tabletor output string}

Procedure emit (s: string);

Begin

Write (Tab, s);

END;

{------------------------------------- -------------}

{Table character output string and wrap}

Procedure Emitln (S: String);

Begin

Emit (s);

Writeln; end;

{------------------------------------- -------------}

{initialization}

PROCEDURE INIT;

Begin

GetChar;

END;

{------------------------------------- -------------}

{Main program}

Begin

Init;

End.

{------------------------------------- -------------}

The above is to write the code for entry. Copy them to the TP to compile. Make sure you can compile success and run correctly. Then continue to learn the first lesson, we will explain how expression analysis.

转载请注明原文地址:https://www.9cbs.com/read-125768.html

New Post(0)