Compilation principle learning introduction
Why do the university courses open compilation principles? This course is concerned about the principle and technical issues of compiler. It seems that the computer's basic field is not in the side, but the principle of compilation has been a compulsory course of the University undergraduate, and it has become a must-have for graduate entrance examination. Compilation principles and technologies are essentially an algorithm problem. Of course, because this problem is very complex, its solution algorithm is relatively complicated. Our data structure and algorithm analysis is also the algorithm, but the basic algorithm, in order to say that the algorithm is introduced, and the principle of compilation This course is to compare the algorithm. In the 1950s, the compiler has been considered a very difficult thing, and the first Fortran compiler is said to have taken 18 years. While people try to write compilers, many theoretical and technologies related to compilation are born, and these theories and technologies are more worth more than one actual compiler. Just like mathematicians to solve the famous Gotbach guess, although there is no final solving problem, there is a lot of books of many famous families.
Recommended reference book
Although the compilation theory has developed to today, there has been a more mature part, but as a college student, it is still too difficult to write a compiler like Turboc C, Java. Not only written compilers, this course is also difficult to learn compilation.
It is because of compilation principles to learn relatively difficult, then there are good teachers and good textbooks. Teachers are not what we can change, but we can read it in our own wishes. I recommend a few good compilation textbooks. The books I recommend are classic textbooks in foreign countries, because in the domestic textbooks, I really haven't found satisfactory.
The first book is called "Compilers Principles, Techniques, And Tools", and another loud name is the dragon book. The reason is that there is a red dragon on the cover of this book, but also because this book is really too famous in the basic field of compilation, so many foreign scholars are directly named Dragon Book. Recently, the Machinery Industry Press has published the Chinese version of this book, name is called "Compilation Principle". The book is more early, probably completed in 85 or 86, one of the authors is also a famous Bell Lab scientist. The core compilation principle inside the inside has not changed, so it has been to today, its value is extraordinary. The biggest feature of this book is that through an actual small example, put the rough content of the compilation principle, so that many of the compilation principles have a very fast, and know why there will be these theories, how Use these theories. And this is what I feel the lack of domestic textbooks. Therefore, the domestic textbook is not writer who is willing to self-study. In short, it is half a day, but I don't know what is used inside.
The second book is originally called "Modern Compiler Design", Chinese name is "Modern Compile Program Design". The book is from the People's Posts and Telecommunications Press. This book is more concerned about the practice of compilation principles, and a lot of actual program code is given in the book, and there are many actual compilation technical issues, etc. Another feature of this book is its "modern". In traditional compilation principles, you can't see algorithms such as "garbage collection" in Java. Because Java's explanation executive language is only popular in recent years. If you want to learn the theoretical knowledge of compilation principles, then you must see the previous dragon book, if you want to do an advanced compiler yourself, then you have to see this "modern compilation program design".
The third book is the "Compilation Principle and Practice" recommended by many domestic compilation principles. Perhaps this book introduces this book to be more early, I remember that I bought this book in high school, but I also finished reading the book for some time. This book is also a good choice as an entry tutorial. The compilation principle given in the book is also quite meticulous, although it is not as deep as the previous dragon book, but many places have been to the university undergraduate teaching is very deep. The characteristics of the book are paying attention to practices, but it is not as good as the practice of "modern compile program design" in front. The key points of this book are still practiced in principle, not the previous technical practice. "Compilation Principles and Practices" is also gradually practicing a modern compiler Tiny C. Waiting for you to read the whole book, you can write a tiny C. The authors also have a very detailed description of the two commonly used compilation related tools of Lex and Yacc, which is also difficult to see in the domestic textbook. These three textbooks have been recommended, all of which are English and Chinese. Many English good students only like to see the original book, not my feelings are the translation of these three books very good, there is no need to buy English. The essence of understanding theory is more important than understanding the text.
Compilation principle
As mentioned earlier, the principle of learning compilation is actually learning algorithms, nothing special. However, these algorithms have already formed a set of theories. Let's take a look at what is in the primary theory in the compilation principle.
Almost every compilation principle is divided into words, gramatic analysis (LL algorithm, recursive decline algorithm, LR algorithm), semantic analysis, runtime environment, intermediate code, code generation, code optimized these parts. In fact, the textbooks in many compilation principles are arranged in accordance with the 85,86-published books, so that the content format of this dragon book is almost the format of the compilation principle, including domestic textbooks. In general, undergraduate teaching in the university is impossible to finish all parts of the above, but rather be compared to the previous sections. Like the code to optimize the part, just like a bottomless hole, if you have to talk about it, it is impossible for lessons that have been opened alone. Therefore, it is generally relatively high for undergraduate students, and the requirements for lexic analysis and grammar analysis are relatively high.
The lexical analysis is relatively simple. It may be that the lexical analysis program itself is very simple, and many people who have not studied compilation can also write a variety of lexical analysis procedures. However, when the compilation principle explains the lexical analysis, the principle of the regular expression and the automaton is followed, and then the generation of the explanatory term analysis program in a very standard manner. This kind of practice is obvious, that is to let the lexical analysis from the program rise to the theory.
The grammatical analysis section is more troublesome. There are generally two syntaxial analysis algorithms, and the LL is from the top down algorithm and the LR from the bottom up algorithm. The LL algorithm also said that when the LR algorithm is, it is difficult. Many self-study compilation principles have gave up self-study after experiencing the understanding of the LR algorithm. In fact, these things are as long as everyone understands, and it is not like a lexical analysis. The grammar analyzer like the LR algorithm is generally generated with tool Yacc, and there is no comparison in practice. For the special recursive decline algorithm in the LL algorithm, because of its practice, it should be required to write yourself by each student. Of course, there are still many well-known LL algorithms, but if you change in a non-C platform, such as Java, Delphi, you can't use YACC tools, then you only have yourself to write a grammar analyzer.
When you have learned the lexical analysis and grammatical analysis, you may have such a question: "What is wrong with the lexical analysis and grammatical analysis?" As a compiler's perspective, the compiler needs to convert the programmer to one. Convenient processing data structure (abstract syntax tree or syntax tree), then this conversion process is through lexical analysis and grammar analysis. In fact, lexic analysis is not a must-have part of the compiler, but we extract this cumbersome work separately in order to simplify the grammatic analysis, it will be extracted separately. In addition to the compiler part, it is also useful in other places, lexical analysis and grammar analysis. For example, when we enter the command under DOS, UNIX, Linux, how the program analyzes the command of the command you entered, which is also a simple application. In summary, the work of these two parts is to convert the text information of the non-Rules into a data structure that is better analyzed. So why do the tutorial of the compilation principle ultimately convert the source analysis to be analyzed into "tree" data structure? STACK, LINE, LIST ... STACK, LINE, LIST ... in the data structure, each have its own characteristics. But this structure has a strong recursive, that is to say that we can extract any Node Node Node, it is still a complete tree. This is in line with the formal language of our current compilation. For example, we use a letter tree in a function, cycles in the cycle, such as conditions, etc., it can be visually demonstrated to represent this data structure in Tree. Similarly, we are also such recursive when we perform a form language. In the part of the code generated behind the compilation principle, it will introduce a stack-type intermediate code. We can use the abstract syntax tree according to the analysis, it is easy to use, which can be used to recursively traverse the abstract syntax tree to generate this command code. . This code is actually widely used in other interpreted languages. As now popular Java, .NET, its underlying byte code bytecode, it is to say that this is based on the stack-based instruction code. Regarding semantic analysis, grammar guidance translation, type inspection, etc., in fact, it is a process of improving the abstract syntax tree obtained in front. For example, when we write C language programs, we know that if a floating point is directly assigned to an integer, the type does not match, then how do the C language compiler know? That is to pass the type of this step. The language that supports polymorphisms like C languages, which is more complicated. The textbooks of most compilation principles are all explained some better handling strategies. Because new problems are always in happening, the old way is not enough to solve.
Originally, as a compiler, the role is the source program generated by the user input to the final code generation. But when I explain the final code generation, I have to explain the machine operating environment. Because if you don't know how the machine performs the final code, you certainly can't know how to generate the final code. This part of this part I feel that it is even more than the principle of compilation itself. Because it will pass the running process of a computer in front of you, you may not engage in the development of compilers in the future, but as long as it is a field related to computer software development, it will involve the execution process of the program. The runtime environment explains how to be more clear how a computer program is stored, how to load, how to do it. About some of the content, I strongly recommend that everyone will look at the explanations on the dragon book, from the most basic storage organization, storage allocation strategy, access, parameter delivery, symbolic table to dynamic storage allocation (Malloc, New) Detailed explanation. These things are often done when we write usual procedures, but we are less to find how it is completed inside.
About the intermediate code generation, code generation, code optimization section is really difficult. Many domestic textbooks will be very simple to go to this part. The students have listened to the students, but they are only known as understanding. I don't know how to apply. However, if this part of this content is to be serious, the course of alone is not finished. In the book "Compilation Principle and Practice", the explanation of this part is just right. The author mainly explains or a stack-based command code. It is very easy to understand. After people look, it is easy to imitate it, and they can write their code generation after they come down. Of course, for other code generation technology, the explanation of code optimization technology is very simple. If you want to study the code generation technology carefully, there is also an additional "Advance Compiler Desgin and Implement". The book is now introduced by the Machinery Press, very heavy, and the original English. However, I didn't listed it as a recommendation book to everyone. After all, I can figure out the content of the dragon book. I have already a very good master in China. I will see this "Advance Compiler Desgin and Implement" also. Not late. The code optimization part is still a less important part of the undergraduate teaching. It is to be in the practice process. I believe that everyone is not very useful. After all, the compiler you do can correctly generate the execution code is very good, and talk about what optimization? About practice
The program of compilation is just a course of explaining the principles, not specialized compilation technology courses. These two courses are very different. Compilation technology is more concerned about the actual techniques used in the compiler, and the principles are concerned about the basic theory. However, computer science itself is a very practical course. If you can learn, it is called true learning. Li Yang said when explaining crazy English, as long as you will use a word a phrase when you use a word a phrase, you can learn this word or phrase, not just knowing its spelling and meaning. In fact, any learning is the same, if it lacks the combination of practice, you can't think of the society.
The curriculum of compilation is mainly to explain the theoretical and principles generated by the compiler, so simple, write a compiler is the best practical process. However, you have to be careful, compile system may be one of the most complex systems in all software systems, otherwise why do you have a compiler's writing into a course called compilation? I admire those who learn how to write the operating system, learn the principle of compilation, start to write the compiler, indeed, in China, the students who dare to do this too little. And no matter what you can do, at least this attempt will make your program design, and the system planned arrangement has improved a lot. I will give some difficulties that may encounter during the practice process, I hope to help you before you can't get in front of you.
1. Lex and yacc. These two tools are tools for analyzing the speech analysis as the lexical analysis. If you write a compiler yourself, I don't recommend that you have learned this kind of thing. Lex and YACC should be a must-have for the textbook for each compilation principle, but it is very little seen in the domestic textbook. These two tools are small things under UNIX systems, if you want to use in Windows, then you are best going to Cygwin software. It is a Southone that simulates UNIX under Windows, which contains two tools for Flex.exe and Bison.exe (YACC). These two tools are still very troublesome (in fact, many very useful tools under UNIX) This is all like this), but in this book in "Compilation Principle and Practice", the explanation of these two tools is very detailed, and there is also a lot of practical examples.
2. Do the interpreted language than the compiler that generates machine code. Although it is said that the interpreted compiler is like Java, you have to write an interpreter, but you don't have to look for the machine code. If you do the generated final machine code compiler, there may be a problem with the register-based code generation method. As mentioned earlier, if you generate a stack-based code, the code generation process is very simple, there is not much thing to consider, if you consider the final machine code generation, you must consider how the machine's register is assigned Waiting for trouble. 3. Consider using the grammar files that have been generated by others, try not to write the lexical files and grammar files yourself. Previous friends have said that a good program language is written, almost half of the compiler. It is indeed this, the writing of the grammar file is a very difficult thing. Now you can find the language files and grammar files such as C language, C , Java, Tiny C, Minus C, and you can use it down. .
In the book "Compilation Principle and Practice", the author gives a total code of Tiny C. I feel that the author's compiler is very good, relatively simple to other php, perl, etc. Many, easy to understand, and clearly show a completed compilation system implementation. The source code can be downloaded on the author's website.
Later
The learning principles may be a difficult course, especially for classmates who are not interested in compiling systems. Since it has been used as an undergraduate compulsory course, then the atriore theory it will be introduced throughout the computer The scientific field also has a relatively important position.
If we are studying history, we will find that many people known as the program design master are masters in the field. The Basic language of the first micro-machine is written, and the "world of the world" of Delphi's Borland is designed. The most powerful programmer ", Sun's Java father, Bell Lab C father ...