Garbage collection
Note
This article was published in the February 2004 issue of "9CBS Development Master".
I wrote this article to share the simple and interesting history of garbage collection technology. Before putting pen to paper, I stood at the window and watched the garbage truck hauling rubbish out of our residential community. Much like the work of sanitation workers in everyday life, garbage collection in software development is a technology that automatically cleans up and removes memory garbage. It effectively prevents two dangers of dynamic memory allocation: memory exhaustion caused by too much accumulated garbage (much like household rubbish clogging the sewer), and illegal references caused by releasing memory improperly (much like discovering that the bottle of milk we just bought expired three years ago).
According to historians, the ancient Egyptians built well-developed sewage and garbage disposal systems in their cities, and the Chinese once had the city with the strongest cleaning capacity in the world: Chang'an. Today, as we enjoy the convenience and comfort of automatic garbage collection in software development, we should at least remember that this spirit of refusing to live with mess is something humanity has had since ancient times.
The pioneering era
Most programmers in China first felt the enormous appeal of garbage collection through the Java language, and many have come to regard Java and garbage collection as inseparable. In fact, garbage collection had been developing and maturing for more than 30 years before Java appeared; what Java did was bring this remarkable technology to the broad mass of programmers.
If garbage collection must have a twin brother, the Lisp language is the obvious candidate. Lisp, born at MIT around 1960, was the first language to depend heavily on dynamic memory allocation: almost all data in Lisp takes the form of a "list", and the space occupied by lists is allocated dynamically on the heap. This built-in dynamic memory management meant that the designers of Lisp had to solve the problem of automatically releasing every memory block in the heap (otherwise Lisp programmers would have drowned in countless free or delete statements in their programs), and this led directly to the birth and development of garbage collection technology.

As an aside, when I was a student a teacher told us that Lisp is the language that has contributed the most to modern software development technology. At the time I dismissed the claim: how could a language smothered in parentheses, one that looks like a labyrinth, be greater than C or Pascal? Now that I know that garbage collection, data structures, artificial intelligence, parallel processing, virtual machines, metadata, and many other technologies familiar to programmers all originated with Lisp, I very much want to apologize to that teacher and take back my childish opinion.
Once we know how closely Lisp and garbage collection are related, it is easy to understand why the two pioneers of garbage collection, J. McCarthy and M. L. Minsky, are also important figures in the history of Lisp. J. McCarthy is the father of Lisp; he described a garbage collection algorithm and its implementation at the same time as he invented the language. M. L. Minsky, while working on Lisp, became the founder of several of the garbage collection algorithms that are mainstream today. Like many technical masters of that period, McCarthy and Minsky achieved enviable results in many different fields. Perhaps in the pioneering 1960s of software development history, quick-thinking and strong-willed researchers found it easier to become the do-everything tough guys of the frontier.
Before looking at the origins of the garbage collection algorithms, it is worth reviewing the main ways memory is allocated. Most mainstream languages and runtime environments support three basic kinds of memory allocation:
First, static allocation: the way static variables and global variables are allocated. Statically allocated memory is like the durable furniture in a home. It normally never needs to be released or reclaimed, because nobody throws the big wardrobe out of the window with the daily rubbish.
Second, automatic allocation: the way memory is allocated on the stack for local variables. Memory on the stack is released automatically when the code block exits. This is like visitors to our home: when night falls they go back to their own homes. Apart from the odd guest who does not know when to leave, we generally do not need to bundle our guests into garbage bags and sweep them out of the door.
Third, dynamic allocation: allocating space on the heap at run time to store data. Memory blocks on the heap are like the napkins we use every day: once used, they must be thrown in the trash, or the room will become a mess. Lazy people like me dream of a household robot that tidies up after them. In software development, if you are too lazy to release memory yourself, you need a similar robot, which is in fact a garbage collector implemented with a particular algorithm.
In other words, every garbage collection algorithm described below collects and cleans up discarded "napkins" while the program runs. These algorithms operate neither on static variables nor on local variables, but on memory blocks allocated in the heap.
Reference counting algorithm
Before 1960, when people were designing a garbage collection mechanism for the still-embryonic Lisp language, the first algorithm that came to mind was reference counting. Using napkins as the example, the algorithm can be described roughly as follows:
After lunch, to capture a design idea before it slipped away, I pulled a napkin from the pack, intending to sketch the system architecture on it. Under the rules of napkin reference counting, before drawing I must first write the count value 1 in a corner of the napkin to show that I am using it. If you then want to look at my sketch as well, you must add 1 to the count on the napkin, making it 2, which shows that two people are using the napkin at the same time (of course, I will not let you blow your nose on it). When you have finished reading, you must subtract 1 from the count to show that you are done with the napkin. Likewise, once I have copied everything on the napkin into my notebook, I will dutifully subtract 1 from the count too. At this point, barring accidents, the count on the napkin should be 0, and the garbage collector (assume it is a robot responsible for cleaning) will pick the napkin up and put it in the trash, because the collector's only mission is to find napkins whose count is 0 and clear them away.
The advantages and defects of reference counting are equally obvious. The collection work itself is fast, but the algorithm imposes an extra cost on every memory allocation and pointer operation in the program (incrementing or decrementing the reference count of a memory block). More importantly, reference counting cannot release memory blocks that refer to each other in a cycle. On this point, D. Hillis has a witty and incisive parable:
One day, a student came to Moon and said: "I know how to design a better garbage collector. We must record the number of pointers pointing to each node." Moon patiently told the student the following story: "One day, a student came to Moon and said: 'I know how to design a better garbage collector ...'"

D. Hillis's story works in just the same way as the one we often tell: "Once upon a time there was a mountain, on the mountain there was a temple, and in the temple there was an old monk ..." It shows that reference counting alone cannot solve every problem in garbage collection. For this reason, reference counting is often excluded from garbage collection algorithms in the narrow sense. Still, as the simplest and most intuitive solution, reference counting has advantages of its own that nothing else replaces. Around the 1980s, D. P. Friedman, D. S. Wise, H. G. Baker and others made several improvements to the algorithm, so that reference counting and its variants (such as deferred reference counting) can still prove their worth in simple environments, or in modern garbage collection systems that combine several algorithms.
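The napkin bookkeeping and the circular-reference problem can both be seen in a few lines of code. What follows is only a minimal Python sketch: the `Block` class and the `add_ref`, `release` and `link` helpers are hypothetical names invented for this illustration, not part of any real collector.

```python
class Block:
    """A heap block carrying an explicit reference count (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.count = 0      # number of references currently pointing here
        self.refs = []      # blocks this block points to
        self.freed = False


def add_ref(target):
    """Someone starts using the block: increment its count."""
    target.count += 1


def release(target):
    """Someone stops using the block: decrement, reclaim when it hits zero."""
    target.count -= 1
    if target.count == 0:
        target.freed = True
        # Reclaiming a block also drops the references it was holding.
        for child in target.refs:
            release(child)
        target.refs.clear()


def link(source, target):
    """Store a pointer to target inside source."""
    source.refs.append(target)
    add_ref(target)


# Acyclic case: the napkin is reclaimed as soon as the last user lets go.
napkin = Block("napkin")
add_ref(napkin)      # I take the napkin: count 1
add_ref(napkin)      # you look at it too: count 2
release(napkin)      # you finish reading: count 1
release(napkin)      # I finish copying: count 0, reclaimed
print(napkin.freed)  # True

# Cyclic case: a and b point at each other, so neither count reaches zero.
a, b = Block("a"), Block("b")
add_ref(a)           # the program holds a
add_ref(b)           # the program holds b
link(a, b)
link(b, a)
release(a)           # the program drops a: count 2 -> 1
release(b)           # the program drops b: count 2 -> 1
print(a.freed, b.freed)  # False False: unreachable, yet never reclaimed
```

The second half is the student's mistake in miniature: once the program lets go of `a` and `b`, nothing can reach them, but each still keeps the other's count at 1.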
Mark-Sweep algorithm
The first practical and complete garbage collection algorithm was the mark-sweep algorithm, proposed by J. McCarthy and others and successfully applied in the Lisp language. Still using napkins as the example, the mark-sweep algorithm runs like this:
During lunch, everyone in the restaurant takes napkins as they need them. When the garbage collection robot wants to collect the discarded napkins, it first makes everyone stop eating, then asks each person in turn: "Are you using a napkin? Which napkin are you using?" Based on the answers, the robot draws a mark on every napkin that someone is still using. When the questioning is over, the robot looks for every napkin on the tables that carries no mark (these are obviously used, discarded napkins) and throws them all in the trash.
As the name implies, the mark-sweep algorithm runs in two phases, "mark" and "sweep". This phased approach laid the conceptual foundation for modern garbage collection algorithms. Unlike reference counting, mark-sweep does not require the runtime environment to monitor every memory allocation and pointer operation; it only has to trace where each pointer variable points during the mark phase. Garbage collectors built on this idea are commonly called tracing collectors.
With the success of Lisp, the mark-sweep algorithm was adopted in most early Lisp runtime environments. Seen from today, the original version of mark-sweep still had many defects, such as low efficiency (marking and sweeping are both quite time-consuming), but as the later discussion shows, almost all modern garbage collection algorithms are continuations of the mark-sweep idea. On this point alone, the contribution of J. McCarthy and others to garbage collection is no smaller than their achievement with the Lisp language itself.
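The two phases are short enough to write out. The following is a minimal Python sketch of a tracing collector over a toy object graph; `Obj`, `mark`, `sweep` and `collect` are hypothetical names chosen for the illustration, and the "heap" is simply a Python list.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing pointers
        self.marked = False


def mark(roots):
    """Mark phase: trace everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: keep marked objects, drop the rest, reset the marks."""
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)     # a -> b, reachable from the root
c.refs.append(d)     # c and d point at each other but nothing reaches them
d.refs.append(c)
heap = [a, b, c, d]
heap = collect(heap, roots=[a])
print([obj.name for obj in heap])  # ['a', 'b']
```

Note that the unreachable cycle `c`/`d` is reclaimed here, which is exactly the case reference counting cannot handle. The price is that the program must stand still while `collect` runs.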
Copying algorithm
To address the efficiency defects of the mark-sweep algorithm, M. L. Minsky published a famous paper in 1963, "A Lisp Garbage Collector Algorithm Using Serial Secondary Storage". The algorithm described in that paper is known as the copying algorithm, and M. L. Minsky also introduced it successfully into an implementation of Lisp. The copying algorithm splits the heap into two halves and uses a simple copy operation to do the work of garbage collection, which is a rather interesting idea. Borrowing the napkin metaphor again, Minsky's copying algorithm can be understood like this:
The garbage collection robot divides the restaurant into two identical halves, a southern zone and a northern zone. At lunch, everyone first eats in the southern zone (because space is limited, the number of diners is naturally halved) and may use napkins freely. When the robot decides it is time to collect the waste napkins, it asks all the diners to move from the southern zone to the northern zone as fast as they can, taking along the napkins they are still using. Once everyone has moved, the robot simply throws every napkin left lying in the southern zone into the trash, and the job is done. The next collection works in much the same way; the only difference is that people now move from north to south. Round and round it goes, and each collection needs only one move (that is, one copy), so the speed of collection is unmatched. Of course, the diners' toil in trekking back and forth between north and south is something the robot never gives a thought to.
M. L. Minsky's invention was a real stroke of imagination. Partitioning and copying not only greatly improved the efficiency of garbage collection, they also made the memory allocation algorithm simpler than ever before (since there is no need to deal with complications such as memory fragmentation, allocating memory is just a matter of advancing the top-of-heap pointer and handing out space in order). It is practically a miracle! But every miracle has its price: in garbage collection, the cost the copying algorithm pays for its efficiency is that it artificially cuts the usable memory in half. Frankly, that price is too high.
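Both halves of that trade-off, the trivially simple allocator and the halved heap, show up in a small model. The following Python sketch is an illustration of the semispace idea only; it is not a reconstruction of Minsky's original algorithm (which worked with serial secondary storage), and `Cell` and `SemiSpaceHeap` are hypothetical names.

```python
class Cell:
    def __init__(self, value, refs=None):
        self.value = value
        self.refs = refs if refs is not None else []


class SemiSpaceHeap:
    def __init__(self, size):
        self.half = size // 2            # only half the memory is usable
        self.from_space = [None] * self.half
        self.to_space = [None] * self.half
        self.top = 0                     # top-of-heap pointer into from_space

    def alloc(self, value):
        """Allocation just advances the top-of-heap pointer."""
        if self.top >= self.half:
            raise MemoryError("semispace full, collect first")
        cell = Cell(value)
        self.from_space[self.top] = cell
        self.top += 1
        return cell

    def collect(self, roots):
        """Copy everything reachable from roots into the other half."""
        forwarded = {}                   # id(old cell) -> its copy
        new_top = 0

        def copy(cell):
            nonlocal new_top
            if id(cell) in forwarded:
                return forwarded[id(cell)]
            clone = Cell(cell.value)
            forwarded[id(cell)] = clone
            self.to_space[new_top] = clone
            new_top += 1
            clone.refs = [copy(r) for r in cell.refs]
            return clone

        new_roots = [copy(r) for r in roots]
        # Whatever was left behind in the old half is garbage: swap halves.
        self.from_space, self.to_space = self.to_space, [None] * self.half
        self.top = new_top
        return new_roots


heap = SemiSpaceHeap(size=8)             # 8 slots in total, 4 usable
a = heap.alloc("a")
b = heap.alloc("b")
heap.alloc("junk1")
heap.alloc("junk2")                      # the usable half is now full
a.refs.append(b)
(a,) = heap.collect([a])                 # the program must use the new copy
print(heap.top)                          # 2: only a and b were copied
print(a.value, a.refs[0].value)          # a b
heap.alloc("c")                          # there is room again
```

The collector never visits the garbage at all; its cost depends only on how much data survives. The recursion in `copy` keeps the sketch short but would overflow on deep structures, so a real implementation would traverse iteratively.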
Whatever its advantages and disadvantages, the copying algorithm succeeded in practice alongside the mark-sweep algorithm. Besides M. L. Minsky's own work in Lisp, from the late 1960s to the early 1970s R. R. Fenichel, J. C. Yochelson and others improved the copying algorithm in different Lisp implementations, and S. Arnborg successfully applied it to the Simula language.
With that, the three traditional algorithms of garbage collection (reference counting, mark-sweep and copying) had all appeared around 1960. Each has its own strengths, and each has a fatal defect. From the late 1960s onward, researchers turned most of their energy to improving or combining these three traditional algorithms, playing to their strengths and avoiding their weaknesses, in order to meet the ever higher demands that programming languages and runtime environments placed on the efficiency and real-time behavior of garbage collection.
The era of maturity
From the 1970s onward, as research and practical experience deepened, people gradually realized that an ideal garbage collector should not cause the application to pause at run time and should not consume excessive memory or CPU resources, and that none of the three traditional algorithms could meet these requirements. New algorithms or ideas were needed to solve the many problems met in practice. The researchers' goals at the time included:
First, improving the efficiency of garbage collection. A garbage collector using the mark-sweep algorithm consumes considerable CPU resources when it runs. In early Lisp runtime environments, collecting memory garbage could take up as much as 40% of the total running time of the system! This inefficiency gave Lisp a bad reputation for execution speed; even today, many people reflexively and wrongly assume that all Lisp programs are unbearably slow.
Second, reducing the memory consumed by garbage collection. This problem arises mainly with the copying algorithm. Although the copying algorithm made a qualitative breakthrough in efficiency, the cost of sacrificing half the memory space was still enormous. In the early days of computing, when memory was measured in KB, wasting half of it was little short of extortion or highway robbery.
Third, finding real-time garbage collection algorithms. Whatever their efficiency, all three traditional algorithms must interrupt the program's current work while they collect garbage. The delay this causes is intolerable for many programs, especially those carrying out critical tasks. Improving the traditional algorithms to produce a real-time collector that runs quietly in the background without affecting the current process, or at least without appearing to, is clearly an even more challenging task.
The determination with which researchers explored this unknown territory, and the results of their work, were equally impressive: during the 1970s and 1980s, a large number of new algorithms and new ideas that proved themselves in practical systems came to the fore. It is thanks to these increasingly mature garbage collection algorithms that today we can allocate memory blocks as we please in Java or .NET without worrying about the risk of running out of space.
Mark-Compact algorithm
The mark-compact algorithm is an organic combination of the mark-sweep and copying algorithms. Combining mark-sweep's advantage in memory usage with copying's advantage in execution efficiency is a result everyone would like to see. However, merging two garbage collection algorithms is not as simple as 1 plus 1 equals 2; some new ideas have to be introduced. Around 1970, researchers including G. L. Steele, C. J. Cheney and D. S. Wise gradually found the right direction, and the outline of the mark-compact algorithm gradually became clear:
In our familiar restaurant, this time the garbage collection robot no longer divides the room into a southern and a northern zone. When a collection is due, the robot first carries out the first step of mark-sweep and draws a mark on every napkin that is in use. It then orders all the diners to take their marked napkins and gather at the southern end of the restaurant, throwing the unmarked waste napkins toward the northern end as they go. That way, the robot only has to stand at the northern end, holding the garbage bin, and catch the waste napkins as they fly toward it.
Experiments show that the overall efficiency of the mark-compact algorithm is higher than that of mark-sweep, and that it does not need to sacrifice half the storage space as the copying algorithm does, which is clearly a very desirable result. Many modern garbage collectors use the mark-compact algorithm or an improved version of it.
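To make "gathering at one end" concrete, here is a minimal Python sketch in which the heap is a list and an address is simply a list index. It is an illustration under those assumptions, not any particular published compaction scheme; `Slot`, `mark`, `compact` and `collect` are hypothetical names.

```python
class Slot:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)   # addresses (heap indices) this slot points to
        self.marked = False


def mark(heap, roots):
    """Mark every slot reachable from the root addresses."""
    stack = list(roots)
    while stack:
        slot = heap[stack.pop()]
        if slot is not None and not slot.marked:
            slot.marked = True
            stack.extend(slot.refs)


def compact(heap, roots):
    # Pass 1: decide the new address of every marked slot.
    new_addr = {}
    next_free = 0
    for addr, slot in enumerate(heap):
        if slot is not None and slot.marked:
            new_addr[addr] = next_free
            next_free += 1
    # Pass 2: rewrite pointers and slide live slots toward address 0.
    # A slot never moves upward, so writing in place is safe.
    for addr, slot in enumerate(heap):
        if addr in new_addr:
            slot.refs = [new_addr[r] for r in slot.refs]
            slot.marked = False
            heap[new_addr[addr]] = slot
    # Everything above next_free is now free space in one piece.
    for addr in range(next_free, len(heap)):
        heap[addr] = None
    return [new_addr[r] for r in roots], next_free


def collect(heap, roots):
    mark(heap, roots)
    return compact(heap, roots)


heap = [Slot("a", refs=[2]), Slot("junk1"), Slot("b"),
        Slot("junk2", refs=[0]), None, None]
roots, free = collect(heap, roots=[0])
print([s.name if s else None for s in heap])  # ['a', 'b', None, None, None, None]
print(heap[0].refs, roots, free)              # [1] [0] 2
```

After the collection the free space is one contiguous run starting at `free`, so allocation can again be a simple pointer advance, as with the copying algorithm, but the whole heap remains usable.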
Incremental Collecting Algorithm
Research into real-time garbage collection led directly to the birth of incremental collection algorithms. The earliest idea for real-time collection was this: design a multi-process runtime environment in which one process performs garbage collection while another executes the program code. The collection work then appears to be completed quietly in the background, without interrupting the program.
In the napkin example, this idea means that the garbage collection robot looks for discarded napkins and throws them in the trash while people are still eating. This seemingly simple idea runs into conflicts between the processes when it is designed and implemented. For example, if the collection consists of a mark phase and a sweep phase, then the results the collector painstakingly records in the first phase may well be invalidated by memory operations in the other process, leaving the second phase with nothing reliable to work from.
M. L. Minsky and D. E. Knuth did early research on the technical difficulties of real-time garbage collection, and in 1975 G. L. Steele published a paper entitled "Multiprocessing Compactifying Garbage Collection", which described a real-time garbage collection algorithm later known as the "Minsky-Knuth-Steele algorithm". E. W. Dijkstra, L. Lamport, R. R. Fenichel and J. C. Yochelson also made contributions to this field one after another. In 1978, H. G. Baker published the paper "List Processing in Real Time on a Serial Computer", which systematically described an incremental collection algorithm for garbage collection in multi-process environments.
Incremental collection algorithms are still built on the traditional mark-sweep and copying algorithms. By handling the conflicts between processes properly, an incremental algorithm lets the collector carry out its marking, sweeping or copying work in stages. Analyzing the internal mechanisms of the various incremental algorithms in detail is rather tedious. All the reader needs to know here is that the efforts of H. G. Baker and others have made real-time garbage collection a reality, and we no longer have to be troubled by garbage collection interrupting a running program.
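One well-known way of handling the conflict between the collector and the running program is tri-color marking with a write barrier, the scheme associated with the on-the-fly collector of E. W. Dijkstra and his co-authors. It is shown here only as an illustration of the staged approach, not as the mechanism of any specific paper named above. In this minimal Python sketch, `Node`, `IncrementalMarker` and the `budget` parameter are hypothetical names.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.color = WHITE   # white: not yet seen by the collector


class IncrementalMarker:
    def __init__(self, roots):
        self.grey = []       # seen, but children not yet scanned
        for root in roots:
            self.shade(root)

    def shade(self, node):
        """Turn a white node grey and queue it for scanning."""
        if node.color == WHITE:
            node.color = GREY
            self.grey.append(node)

    def step(self, budget):
        """Scan at most `budget` grey nodes, then hand control back.

        Returns True once marking has finished."""
        while budget > 0 and self.grey:
            node = self.grey.pop()
            for child in node.refs:
                self.shade(child)
            node.color = BLACK   # fully scanned
            budget -= 1
        return not self.grey

    def write(self, source, target):
        """Write barrier: the program stores pointers through this method."""
        source.refs.append(target)
        if source.color == BLACK:
            self.shade(target)   # never let a black node point at a white one


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.refs.append(b)
marker = IncrementalMarker(roots=[a])
marker.step(budget=1)            # collector scans a, then yields
marker.write(a, c)               # program links a -> c in mid-collection
while not marker.step(budget=1): # collector finishes in small slices
    pass
print([n.color for n in (a, b, c, d)])  # ['black', 'black', 'black', 'white']
```

Without the barrier, `c` would still be white when marking ended and would be swept although the program can reach it; with it, only the genuinely unreachable `d` remains white.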
Generational Collecting Algorithm
As with most software development technologies, statistics can act as a powerful catalyst in the development of a technique. Around 1980, researchers who were good at applying statistical analysis found that most memory blocks live only a short time, and that a garbage collector should therefore concentrate its effort on checking and cleaning up recently allocated blocks. The value of this finding for garbage collection can be summed up with the napkin example:
If the garbage collection robot is smart enough to notice people's habits in using napkins (some people use one napkin before the meal and one after, some hold on to a single napkin from start to finish, and some get through a napkin with every sneeze), it can draw up a better collection plan and always come to collect shortly after people have thrown their napkins away. This statistics-based approach can of course improve the tidiness of the restaurant many times over.
D. E. Knuth, T. Knight, G. Sussman, R. Stallman and others did the earliest research on classifying memory garbage. In 1983, H. Lieberman and C. Hewitt published a paper entitled "A Real-Time Garbage Collector Based on the Lifetimes of Objects". This famous paper marks the formal birth of the generational collection algorithm. Since then, through the joint efforts of H. G. Baker, R. L. Hudson, J. E. B. Moss and others, generational collection has gradually become the mainstream technique in the field of garbage collection.

A generational collector usually divides the memory blocks in the heap into two classes, old and young. The collector handles the two classes with different algorithms or strategies, and spends most of its working time on the young blocks. Generational collection lets a garbage collector work more effectively with limited resources, and today's Java virtual machines are the best proof of this gain in efficiency.
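The young/old split can be sketched in a few dozen lines. This Python model is only an illustration: `GenerationalHeap`, `minor_collect`, `major_collect` and the `PROMOTE_AGE` threshold are hypothetical, and the sketch treats every old object as a root during a young collection instead of maintaining the remembered set of old-to-young pointers that a real collector would keep.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.age = 0            # number of young collections survived


class GenerationalHeap:
    PROMOTE_AGE = 2             # hypothetical promotion threshold

    def __init__(self):
        self.young = []
        self.old = []

    def alloc(self, name):
        obj = Obj(name)
        self.young.append(obj)  # every new block starts out young
        return obj

    def _reachable(self, starts):
        seen, stack = set(), list(starts)
        while stack:
            obj = stack.pop()
            if id(obj) not in seen:
                seen.add(id(obj))
                stack.extend(obj.refs)
        return seen

    def minor_collect(self, roots):
        """Frequent pass: only young objects can be reclaimed."""
        # Simplification: all old objects count as roots (no remembered set).
        live = self._reachable(list(roots) + self.old)
        survivors = [o for o in self.young if id(o) in live]
        self.young = []
        for obj in survivors:
            obj.age += 1
            if obj.age >= self.PROMOTE_AGE:
                self.old.append(obj)     # survived long enough: promote
            else:
                self.young.append(obj)

    def major_collect(self, roots):
        """Rare pass: examine both generations."""
        live = self._reachable(roots)
        self.young = [o for o in self.young if id(o) in live]
        self.old = [o for o in self.old if id(o) in live]


heap = GenerationalHeap()
keep = heap.alloc("keep")
for i in range(3):
    heap.alloc(f"temp{i}")               # short-lived blocks
heap.minor_collect(roots=[keep])         # temps reclaimed, keep has age 1
heap.alloc("temp3")
heap.minor_collect(roots=[keep])         # keep reaches age 2 and is promoted
print([o.name for o in heap.young], [o.name for o in heap.old])  # [] ['keep']
```

Once `keep` sits in the old generation, later calls to `minor_collect` no longer have to decide its fate; it is only reconsidered by the rarer `major_collect`.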
Application wave
Lisp was the first beneficiary of garbage collection technology, but clearly not the last. After Lisp, many languages, traditional, modern and postmodern, took garbage collection to their hearts. To name just a few: Simula, born in 1964; Smalltalk in 1969; Prolog in 1970; ML in 1973; Scheme in 1975; Modula-3 in 1983; Eiffel in 1986; Haskell in 1987 ... all of them adopted automatic garbage collection. Of course, the garbage collection algorithm used by each language may differ, and most languages and runtime environments even use several algorithms at once. In any case, these examples show that from the day it was born, garbage collection has never been a rarefied "academic" technology.
Garbage collection can also be of great use in the C and C++ languages we know so well. As we learned back in school, C and C++ do not provide a garbage collection mechanism themselves, but that does not stop us from using function libraries or class libraries with garbage collection capability in our programs. For example, as early as 1988, H. J. Boehm and A. J. Demers successfully implemented a function library using a conservative GC algorithm (see http://www.hpl.hp.com/personal/hans_boehm/ GC). We can use this library to perform automatic garbage collection in C or C++, and if necessary we can even let traditional C/C++ code and C/C++ code that uses automatic garbage collection work together in the same program.
The birth of the Java language in 1995 turned garbage collection overnight into one of the most popular technologies in software development. From a certain point of view, it is hard to say whether Java benefited from garbage collection or garbage collection became famous through the popularity of Java. It is worth noting that different versions of the Java virtual machine do not use exactly the same garbage collection mechanism; the Java virtual machine itself has gone through a development from simple to complex. In version 1.4.1 of the Java virtual machine, the garbage collection algorithms available include generational collection, copying collection, incremental collection, mark-compact, parallel copying, parallel scavenging, concurrent collection and more. The steady improvement in the speed of Java programs is due in large part to the development and refinement of garbage collection technology.

Although many application platforms and operating systems that include garbage collection have appeared over the years, Microsoft .NET is the first truly practical common language runtime environment that includes a garbage collection mechanism. In fact, all the languages on the .NET platform, including C#, Visual Basic .NET, Visual C++ .NET and J#, can use the garbage collection mechanism provided by the platform in almost exactly the same way. It seems fair to say that .NET represents a major change for garbage collection in the application field: it turns garbage collection from a mere technique into part of the intrinsic culture of the application environment and even of the operating system. The influence of this change on future software development technology may well far exceed the commercial value of the .NET platform itself.
Trend
Today, the people dedicated to garbage collection research are still working tirelessly. Their research directions include garbage collection in complex transactional environments, garbage collection for databases and other specific systems, and more.
Among programmers, however, there are still many who sneer at garbage collection. They would rather trust the free or delete commands they write themselves, line by line, than hand over the heavy job of collecting garbage to garbage collectors they regard as stupid and clumsy.
Personally, I believe the spread of garbage collection is the general trend, just as life keeps getting better. Today's programmers may reject garbage collection because the collector takes up some CPU resources, but more than twenty years ago programmers likewise insisted on writing programs in machine language because high-level languages were too slow! Today, with hardware speeds advancing by leaps and bounds, should we begrudge that small loss of time, or should we stand firmly on the side of garbage collection, the cleaner of our code and runtime environments?
[Wang Wei Gang, December 2003]