Understanding .NET CLR Garbage Collection


Introduction

Memory management is a fairly complex and interesting field in computer science. Over the past few decades, memory management techniques have continued to advance, allowing systems to make more effective use of memory, a resource every computer depends on.

In general, memory management can be divided into three categories: hardware management (such as the TLB), operating system management (such as the buddy system, paging, and segmentation), and application management (such as the memory management mechanisms of C++, Java, and .NET). Given the limits of space and of the author's knowledge, this article covers only a small part of memory management: the approach used in .NET. .NET is a contemporary application framework that uses automatic memory management, commonly known as garbage collection (hereinafter GC). This article takes .NET as a representative case for analysis.

History and benefits of GC

Although this article takes .NET as its subject, the concept of GC did not originate with .NET. As early as 1958, the Lisp language implemented by the renowned John McCarthy already provided GC, which was its first appearance. Lisp programmers held that memory management was too important to be left to the programmer. But Lisp did not catch on in the years that followed, and languages with manual memory management, represented by C, gained the upper hand. From the same premise, others drew the opposite conclusion: C programmers believed that memory management was too important to be left to the system, and mocked Lisp programs as slow as a tortoise. Indeed, in an era when every byte was carefully accounted for, the speed of GC and its heavy use of system resources were unacceptable to many people. Then, in 1984, Dave Ungar's work on Smalltalk used the Generational garbage collection technique for the first time (this technique is described below), but Smalltalk did not achieve widespread use either.

It was not until the mid-1990s that GC took the historical stage in a leading role, thanks largely to the progress of Java. Today's GC is no longer what it once was. Java uses a VM (Virtual Machine) mechanism in which the VM manages the program's runtime, which of course includes GC. Around the turn of the century .NET appeared, and .NET takes an approach similar to Java's, with management handled by the CLR (Common Language Runtime). The emergence of these two camps ushered in an era of development based on virtual platforms, and GC has drawn more and more attention since then.

Why use GC, or in other words, why use automatic memory management? There are several reasons:

- It raises the level of abstraction in software development;
- Programmers can focus on the actual problem without being distracted by memory management;
- Module interfaces become clearer and the coupling between modules is reduced;
- It greatly reduces the bugs caused by manual memory management;
- It makes memory management more efficient.

In short, GC frees programmers from complex memory problems, thereby improving the speed, quality, and safety of software development.

What is GC

GC, as the name suggests, is garbage collection, and here it applies only to memory. The garbage collector (also abbreviated GC where no confusion arises) starts from the application's roots [1] and traverses all objects allocated on the heap [2], using whether each object is referenced to determine which objects are dead and which are still in use. An object that is no longer referenced by the application's roots or by another object is dead. It is the so-called garbage and needs to be reclaimed. This is the principle on which GC works. There are many algorithms that implement this principle; the more common ones include Reference Counting, Mark Sweep, and Copy Collection. The current mainstream virtual systems, the .NET CLR, the Java VM, and Rotor, all use the Mark Sweep algorithm. Since this article is based on .NET, only the Mark Sweep algorithm is described here.

Related GC algorithms

Mark Sweep

While a program runs, it keeps allocating space from the heap. When the heap is so full that there is not enough room for the next object, the Mark Sweep algorithm is triggered: garbage memory is reclaimed and returned to the free list [3].

Mark Sweep, as its name suggests, has two phases: the Mark phase and the Sweep phase. The Mark phase starts from the roots and follows the references between objects to traverse the heap, marking every object that is referenced by a root or by another reachable object. Objects left unmarked are garbage. The Sweep phase then reclaims all of the garbage, as shown in Figure 1.

Figure 1: M indicates a marked object
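The two phases can be illustrated with a small toy model. This is only a sketch under stated assumptions: the Node class, the list-based "heap", and the MarkSweepSketch name are hypothetical and have nothing to do with how the CLR actually lays out memory. It also shows why an unreachable cycle is still collected.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: the "heap" is a list of nodes, not real CLR memory.
class Node
{
    public string Name;
    public bool Marked;
    public List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

static class MarkSweepSketch
{
    // Mark phase: start from a root and follow references,
    // marking every object that can be reached.
    static void Mark(Node node)
    {
        if (node.Marked) return;   // already visited; this also stops cycles
        node.Marked = true;
        foreach (Node child in node.References) Mark(child);
    }

    // Sweep phase: walk the whole heap, drop unmarked objects, and clear
    // the mark on survivors. Returns the names of the reclaimed objects.
    public static List<string> Collect(List<Node> heap, List<Node> roots)
    {
        foreach (Node root in roots) Mark(root);

        var reclaimed = new List<string>();
        var survivors = new List<Node>();
        foreach (Node node in heap)
        {
            if (node.Marked) { node.Marked = false; survivors.Add(node); }
            else reclaimed.Add(node.Name);
        }
        heap.Clear();
        heap.AddRange(survivors);
        return reclaimed;
    }

    public static List<string> Demo()
    {
        var a = new Node("A"); var b = new Node("B");
        var c = new Node("C"); var d = new Node("D");
        a.References.Add(b);   // A -> B, reachable from the root
        c.References.Add(d);   // C and D reference each other,
        d.References.Add(c);   // but nothing reachable points at them

        var heap = new List<Node> { a, b, c, d };
        var roots = new List<Node> { a };
        return Collect(heap, roots);
    }

    static void Main()
    {
        // Prints "Reclaimed: C, D"
        Console.WriteLine("Reclaimed: " + string.Join(", ", Demo()));
    }
}
```

Note that the sweep loop visits every node in the heap, live or dead, which is the cost discussed next.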

The Mark Sweep algorithm is faster than Reference Counting and avoids the memory leaks caused by cyclic references. But it has shortcomings of its own. It has to traverse every object in the heap (live objects are traversed in the Mark phase, dead objects in the Sweep phase), so its speed is not ideal. Moreover, reclaiming garbage in place leaves a large amount of memory fragmentation.

To solve these two problems, the Mark Sweep algorithm has been improved. First, a Compact phase was added: live objects are marked, then moved so that they are contiguous in memory, and finally the addresses that refer to them and the free list are updated. This is the Mark Compact algorithm, which solves the problem of memory fragmentation. Second, to improve speed, the concept of Generation was introduced.
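The Compact phase can be sketched in the same toy style. The Block class and its integer addresses are assumptions made for illustration; a real collector would also have to patch every reference that points at a moved object, which this sketch only notes in a comment.

```csharp
using System;
using System.Collections.Generic;

// Toy model: each block records where it sits in a flat address space.
class Block
{
    public string Name;
    public int Address;
    public int Size;
    public bool Live;   // assumed to have been set by an earlier Mark phase
    public Block(string name, int address, int size, bool live)
    { Name = name; Address = address; Size = size; Live = live; }
}

static class CompactSketch
{
    // Slides live blocks toward address 0 so they become contiguous.
    // Returns the first free address after compaction.
    public static int Compact(List<Block> heap)
    {
        int next = 0;
        var survivors = new List<Block>();
        foreach (Block b in heap)      // the list is ordered by address
        {
            if (!b.Live) continue;     // garbage is simply skipped
            // "Move" the block. A real collector would also update
            // every reference that points at the old address.
            b.Address = next;
            next += b.Size;
            survivors.Add(b);
        }
        heap.Clear();
        heap.AddRange(survivors);
        return next;
    }

    public static int Demo()
    {
        var heap = new List<Block>
        {
            new Block("A", 0, 16, true),
            new Block("B", 16, 32, false),
            new Block("C", 48, 8, true),
            new Block("D", 56, 24, false),
            new Block("E", 80, 16, true),
        };
        // After compaction: A at 0, C at 16, E at 24.
        return Compact(heap);
    }

    static void Main()
    {
        // Prints "First free address: 40"
        Console.WriteLine("First free address: " + Demo());
    }
}
```

After compaction the free space is a single region starting at the returned address, so no free list of scattered holes is needed.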

Generation

The Generational Garbage Collector (also known as the Ephemeral Garbage Collector) is based on the following assumptions:

- The younger an object is, the shorter its lifetime;
- The older an object is, the longer its lifetime;
- Young objects tend to be strongly related to other objects and are accessed relatively frequently;
- Collecting and compacting part of the heap is faster than collecting and compacting the whole heap.

The idea of generations is to manage the objects in the heap by partition: the heap is divided into several regions, and the objects in each region have a different age. A newly allocated object is placed in Generation 0. When the space of Generation 0 is about to be exhausted, the Mark Compact algorithm is started. If the object is still alive after a number of GCs, it is moved to Generation 1. Likewise, if it survives further GCs, it is moved to Generation 2, where it stays until it is finally collected or dies together with the program.

The greatest advantage of generations is that a GC does not need to process the whole heap, only a small piece each time. Objects in Generation 0 are the most likely to die, so Generation 0 can be collected more often, while generations whose objects are less likely to die can be collected less often. This improves the speed of GC to a degree. It also raises several questions: how many generations should there be, how large should each one be, and how many GCs must an object survive before it is promoted? How the .NET CLR handles these questions is tested with an example later in this article.

Related data structures

The .NET GC involves three related data structures: the Managed Heap, the Finalization Queue, and the Freachable Queue.

Managed Heap

The Managed Heap is a simple, optimized heap that differs from the heap of the traditional C runtime. Its management is kept simple in order to make heap management fast, and it rests on a simple (and impossible) assumption: that the memory of the Managed Heap is endless. The Managed Heap keeps a pointer called NextObjPtr, which marks the end of the last object on the heap, that is, where the next object will be allocated. When a new object is to be allocated on the heap, the size of the new object is simply added to the value of NextObjPtr to form the new NextObjPtr. This is just a simple addition. When the value of NextObjPtr goes beyond the boundary of the Managed Heap, the heap is full and the GC is started.

Figure 2: Schematic diagram of the related data structures
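The NextObjPtr scheme amounts to what is often called bump-pointer allocation. The following is a minimal sketch of that idea, assuming integer addresses and a fixed heap size; the BumpHeapSketch class is hypothetical and is not the CLR's allocator.

```csharp
using System;

// Toy model of a bump-pointer heap. Addresses are plain integers,
// not real CLR memory.
class BumpHeapSketch
{
    private readonly int limit;   // heap boundary
    private int nextObjPtr;       // where the next object will be placed

    public BumpHeapSketch(int size) { limit = size; nextObjPtr = 0; }

    // Returns the address of the new object, or -1 when the heap is
    // full, which is the point at which a real collector would run.
    public int Allocate(int size)
    {
        if (nextObjPtr + size > limit) return -1;
        int address = nextObjPtr;
        nextObjPtr += size;       // allocation is just one addition
        return address;
    }

    static void Main()
    {
        var heap = new BumpHeapSketch(64);
        Console.WriteLine(heap.Allocate(24));   // 0
        Console.WriteLine(heap.Allocate(24));   // 24
        Console.WriteLine(heap.Allocate(24));   // -1: would pass the boundary
    }
}
```

Because allocation never searches for a free hole, this scheme only works when the collector compacts the heap and leaves the free space in one piece.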

Finalization Queue and Freachable Queue

These two queues are related to the Finalize [4] method that .NET objects provide. They do not store the objects themselves, only pointers to objects. When the new operator allocates an object on the Managed Heap, the GC examines it; if the object has a Finalize method, a pointer to the object is added to the Finalization Queue. After a GC starts, the Mark phase identifies the garbage. The GC then searches the garbage: if an object in the garbage is pointed to by an entry in the Finalization Queue, the object is separated from the garbage and its pointer is moved to the Freachable Queue. This process is called the resurrection of the object, because a dead object is brought back to life. Why rescue it? Because its Finalize method has not been executed yet, so it cannot be allowed to die. The Freachable Queue normally does nothing, but as soon as a pointer is added to it, the Finalize method of the corresponding object is triggered. The pointer is then removed from the queue, and the object can finally die.

The System.GC class of the .NET Framework provides two methods for controlling Finalize: ReRegisterForFinalize and SuppressFinalize. The former requests that the system run the object's Finalize method; the latter requests that the system not run it. ReRegisterForFinalize in fact adds the pointer to the object back to the Finalization Queue. This leads to an interesting phenomenon: because an object in the Finalization Queue can be resurrected, calling ReRegisterForFinalize inside the object's own Finalize method creates an object on the heap that never dies, like a phoenix that is reborn every time it dies.

Direct control of GC
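ReRegisterForFinalize and SuppressFinalize can be tried out directly. The following is a hedged sketch rather than a definitive demonstration: the Phoenix and Quiet classes are hypothetical, the cap on rebirths is added so the program terminates, and the runtime does not promise when (or whether) a finalizer runs, so the printed counts may vary between runtimes and build configurations.

```csharp
using System;
using System.Runtime.CompilerServices;

class Phoenix
{
    public static int FinalizeCount;
    private const int MaxRebirths = 2;   // cap added so the demo terminates

    ~Phoenix()
    {
        FinalizeCount++;
        // Put this object back on the Finalization Queue so that its
        // finalizer is eligible to run again after a later collection.
        if (FinalizeCount <= MaxRebirths) GC.ReRegisterForFinalize(this);
    }
}

class Quiet
{
    public static int FinalizeCount;
    public Quiet() { GC.SuppressFinalize(this); }   // opt out of finalization
    ~Quiet() { FinalizeCount++; }
}

static class FinalizeSketch
{
    // Allocate in a separate, non-inlined method so that no local
    // variable keeps the objects alive during the collections below.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate()
    {
        new Phoenix();
        new Quiet();
    }

    public static void Run()
    {
        Allocate();
        for (int i = 0; i < 4; i++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }

    static void Main()
    {
        Run();
        Console.WriteLine("Phoenix finalizer runs: " + Phoenix.FinalizeCount);
        Console.WriteLine("Quiet finalizer runs:   " + Quiet.FinalizeCount);
    }
}
```

The Quiet count stays at zero because its finalizer was suppressed, and the Phoenix count can be at most three because of the cap. Removing the cap would give the never-dying object described above.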

The System.GC class of the .NET Framework provides methods that operate the GC directly. The System.Runtime.InteropServices.GCHandle class provides a way to access managed objects from unmanaged memory (not discussed here). Let's first look at an example that operates the GC directly through System.GC.

using System;

namespace GCTest
{
    class GCDemo
    {
        private static void GenerationDemo()
        {
            // Let's see how many generations the GC supports (we know it's 2)
            Console.WriteLine("Maximum GC generations: {0}", GC.MaxGeneration);

            // Create a new GenObj in the heap
            GenObj obj = new GenObj("Generation");

            // Since this object is newly created, it should be in generation 0
            obj.DisplayGeneration(); // displays 0

            for (int i = 1; i <= GC.MaxGeneration; i++)
            {
                // Performing a garbage collection promotes the object's generation
                GC.Collect();
                obj.DisplayGeneration(); // displays i
            }

            obj = null; // destroy the strong reference to this object

            for (int i = 0; i <= GC.MaxGeneration; i++)
            {
                GC.Collect(i);
                GC.WaitForPendingFinalizers();
                // Suspend this thread until the freachable queue of
                // generation i has been emptied.
                // Only when i == GC.MaxGeneration will the Finalize method
                // of obj be performed.
            }

            Console.WriteLine("Demo stop: Understanding Generations");

            // Total GC times
            // generation 0: 5 times
            // generation 1: 4 times
            // generation 2: 3 times
        }

        public static void Main()
        {
            GenerationDemo();
        }
    }

    class GenObj
    {
        private string objName;

        public GenObj(string name)
        {
            this.objName = name;
        }

        public void DisplayGeneration()
        {
            Console.WriteLine("I am in generation {0}", GC.GetGeneration(this));
        }
    }
}

This is an interesting example. First, GC.MaxGeneration shows that the GC in the .NET CLR uses a three-generation structure, Generation 0 through 2. Next, a GenObj instance obj is allocated on the Managed Heap. At first obj is in Generation 0; the whole Managed Heap is then collected twice. Each time obj survives a GC it is promoted by one generation, until it reaches Generation 2. Setting obj = null removes the root's strong reference to obj, so obj becomes garbage. GC.Collect(i) is then used to collect the Managed Heap step by step; this method collects Generations 0 through i. GC.WaitForPendingFinalizers() suspends the current thread until the Finalize methods of the objects pointed to by the Freachable Queue have been executed. The purpose is to make sure that the garbage identified by this GC is fully reclaimed, without leaving objects alive merely because their Finalize methods have not yet run.

This example gives an intuitive view of how the .NET CLR handles GC. Readers who want more specific data can monitor a .NET application with the Performance Monitor (PerfMon.exe) provided by Windows.
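Per-generation counts can also be read from inside the program. The sketch below assumes a runtime that offers GC.CollectionCount (added to System.GC in versions of .NET later than the one this article was written against); the CollectionCountSketch class, the loop size, and the static Sink field are arbitrary choices for illustration. The exact numbers depend on the runtime and the machine, so no expected output is given.

```csharp
using System;

static class CollectionCountSketch
{
    // Storing each array here keeps the allocation observable,
    // so the runtime cannot skip it.
    static byte[] Sink;

    // Returns how many collections of each generation happened
    // while the allocation loop below was running.
    public static int[] Measure()
    {
        int generations = GC.MaxGeneration + 1;
        var before = new int[generations];
        for (int g = 0; g < generations; g++)
            before[g] = GC.CollectionCount(g);

        // Allocate many short-lived objects to put pressure on Generation 0.
        for (int i = 0; i < 2000000; i++)
            Sink = new byte[128];

        var delta = new int[generations];
        for (int g = 0; g < generations; g++)
            delta[g] = GC.CollectionCount(g) - before[g];
        return delta;
    }

    static void Main()
    {
        int[] delta = Measure();
        for (int g = 0; g < delta.Length; g++)
            Console.WriteLine("Generation {0}: {1} collections", g, delta[g]);
    }
}
```

Because collecting a generation also collects all younger generations, the count for Generation 0 is never lower than the count for Generation 1, which in turn is never lower than the count for Generation 2. This matches the 5, 4, 3 pattern in the comments of the example above.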

Finally, a word on how the GC handles large objects. The handling differs from the discussion above in only one respect: the GC does not compact them, because the adverse effect on system performance of moving large objects is obvious.
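One way to observe the special treatment of large objects is to ask the GC which generation a freshly allocated array belongs to. This is a sketch under assumptions the article does not state: on Microsoft's CLR, objects of roughly 85,000 bytes or more go to a separate large object heap and are reported as belonging to the oldest generation. Other runtimes may behave differently, and the LargeObjectSketch class is hypothetical.

```csharp
using System;

static class LargeObjectSketch
{
    // Allocates a byte array of the given size and reports the
    // generation the GC assigns to it right after allocation.
    public static int GenerationOf(int byteCount)
    {
        var buffer = new byte[byteCount];
        return GC.GetGeneration(buffer);
    }

    static void Main()
    {
        Console.WriteLine("Small array: generation {0}", GenerationOf(1000));
        Console.WriteLine("Large array: generation {0}", GenerationOf(200000));
    }
}
```

On such a runtime the small array is expected to start in Generation 0, while the large array is reported in the oldest generation without ever having been promoted.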

Conclusion

This article is intended to give readers a rough understanding of .NET CLR Garbage Collection. It is only a shallow discussion, and many aspects are not covered, such as how the GC works in a multithreaded environment and the various versions of the .NET GC. Interested readers can read the source code of Rotor and Mono; the source code of the Microsoft .NET Framework itself is not available. Anyone who is interested is very welcome to discuss these topics with me.

References

- "Garbage Collection - Problem and Technology", Zong Yan
- ".NET Automatic Memory Management", Cai Xuebei
- "Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework", Jeffrey Richter
- MSDN for Visual Studio .NET

2003-8-23

[1] A root is a value that the program can operate on directly.

[2] An object is an independent block of data in the heap.

[3] A data structure that records the free space in the heap.

[4] For the Finalize method, please consult the relevant documentation; it is not covered in this article.
