Garbage Collection in Java (reposted from the web)


Garbage Collection

Author: Mac Wang

Tuesday, April 15 2003 11:46 AM

I remember that when I first started learning Java, senior colleagues who already knew the language kept telling me about its advantages, such as "Write Once, Run Anywhere". What impressed me most, though, was garbage collection, because it meant I no longer had to worry about memory management. At the time I only knew that something was taking care of this dirty work for me; beyond that I was completely in the dark. It also felt a little uneasy to hand such an important job to this fellow. After all, the work was no longer in my own hands.

However, as my understanding deepened, I came to like this fellow more and more. We no longer have to deal with tedious memory management ourselves or care about which object should be released when, and it really does manage memory well. Since it does such an important job and brings us so many benefits, I think we should get to know it. At the very least, if one day an irresponsible programmer blames the garbage collector for the bugs in their own program, you can expose the lie. And only by understanding it can you truly master it, use it well, or even tune it.

What is garbage collection?

Garbage collection (GC) is a dynamic storage management technique. Put simply, the system automatically reclaims, in the background, the memory occupied by objects that the program no longer uses, following a specific garbage collection algorithm (GC algorithm for short). Such a mechanism is called garbage collection, and a programming language that provides it is said to support GC; typical examples are Java, C#, and Python. This article mainly discusses GC in the HotSpot series of Java virtual machines (JVMs).

Why use GC?

I suspect everyone asks this question on first meeting GC. To see why GC is so popular, we first need to understand what it does for us.

Most obviously, it spares programmers the trouble and danger of managing memory by hand. Before GC, we had to release the memory we had requested in good time, and before releasing it we had to be sure that it really could be released. Because manual memory management is so error-prone, it easily causes memory leaks, dangling pointers, and even system crashes, which is unacceptable for many applications with demanding requirements. Moreover, memory is a precious system resource and any accident can be fatal, so using GC improves the safety of the system.

I also think the popularity of GC reflects a broader trend in computing. Today's operating systems and programming tools provide more and more services by default, and programmers only need to know how to use them. Optimists believe this frees us from heavy low-level work so that we can spend our limited time on more important things. Pessimists believe that, troublesome and dangerous as it is, we should still manage memory ourselves as far as possible. As more and more underlying mechanisms come with default implementations, they argue, programmers become less and less capable, which runs counter to the original spirit of computing. We used to need to understand how the computer works and how to implement a given function; now we only need to know how to use a certain vendor's product. Perhaps this is the sadness of modernization!

In addition, GC's own shortcomings limit its use, such as performance problems and imperfect GC algorithms. Some early GC algorithms could not guarantee that all garbage memory would be collected. Of course, with the continuous improvement of GC algorithms and of hardware and software efficiency, these problems are gradually being solved.

How does GC work?

As mentioned earlier, how a GC works is determined by its GC algorithm. We call an implementation of one or more GC algorithms a garbage collector (each garbage collector corresponds to at least one GC algorithm). The garbage collectors used in Java include the mark-and-sweep collector, the mark-and-compact collector, the copying collector, the incremental collector, the generational collector, and the concurrent and parallel collectors (see Reference 1). These collectors are often used in combination. For example, the default configuration of the HotSpot series of JVMs uses a generational collector, which applies a non-copying mark-compact collector to the old generation and a single-threaded copying collector to the new generation. Let's take this as an example to explain how GC works.

A traditional GC works as follows. It starts by scanning all threads and registers running in the JVM. If it finds a pointer into the Java heap (a so-called root pointer, such as a static pointer), it checks the pointer further. If the pointer does refer to an object, the GC follows all the references inside that object. The directly referenced objects and all objects reachable from them are marked, which completes the marking phase. What happens next differs by collector. A mark-and-sweep collector simply reclaims the memory of the unmarked objects. A mark-and-compact collector additionally moves the live objects together so that the free space forms one large contiguous block. A copying collector relocates all marked objects to another area of the Java heap and then reclaims all the memory in the original area (see Reference 2).
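The mark and sweep steps described above can be sketched in a few lines of Java. This is only a toy model under my own assumptions: the `ToyMarkSweep` class, its `Node` type, and the list standing in for the heap are all hypothetical names invented for illustration, and none of this is how HotSpot is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a tracing collector; an illustration, not HotSpot's implementation. */
class ToyMarkSweep {
    /** A hypothetical heap object that only knows which objects it references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Mark phase: follow references from the roots and record every reachable node. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit: mark it, then queue its references
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    /** Sweep phase: drop every unmarked node from the heap and report how many were freed. */
    static int sweep(List<Node> heap, Set<Node> marked) {
        int before = heap.size();
        heap.removeIf(n -> !marked.contains(n));
        return before - heap.size();
    }

    /** Builds a four-object heap with one root and returns the number of objects freed. */
    static int demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);      // a -> b, reachable from the root
        c.refs.add(d);      // c and d reference each other,
        d.refs.add(c);      // but no root reaches either of them
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        return sweep(heap, mark(List.of(a)));
    }

    public static void main(String[] args) {
        System.out.println("freed " + demo() + " objects");   // prints "freed 2 objects"
    }
}
```

Note that the cycle between c and d is reclaimed even though each object is still referenced by the other: a tracing collector cares only about reachability from the roots.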

As we can see, a traditional GC must scan the entire Java heap every time, which inevitably leads to long pauses; that is bad news for real-time applications. The generational collector emerged in response. It exploits one fact: most objects (more than 95%) are very short-lived, and only a small number of objects stay in memory for a long time. Isolating newly created objects in a separate area (the new generation) brings at least two benefits:

Because new objects are allocated in this area in a stack-like fashion, object allocation becomes very fast;

Each time a GC is needed, most of the objects in this area are already dead, so only a very small number of live objects have to be marked, and a copying collector only has to copy that small number of objects instead of reclaiming a large number of dead ones one by one. That is why the copying collection algorithm is usually used in this area. In the region that holds long-lived objects (the old generation), by contrast, most objects survive each collection, so there are many live objects and they are packed neatly. The copying method used in the new generation is not advisable there, and the more efficient mark-compact algorithm can be used instead.
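The "stack-like" allocation mentioned in the first benefit is usually called bump-pointer allocation. Here is a minimal sketch of the idea, assuming a fixed-size young area; the `ToyNursery` class and its methods are hypothetical names of my own, and a real JVM allocates raw memory rather than handing out integer offsets.

```java
/** Toy bump-pointer allocator for a young-generation area; an illustration only. */
class ToyNursery {
    private final int capacity;
    private int top = 0;   // the bump pointer: everything below it is in use

    ToyNursery(int capacity) { this.capacity = capacity; }

    /**
     * Allocates size bytes by advancing the pointer.
     * Returns the offset of the new object, or -1 when the area is full
     * (the point at which a real JVM would start a new-generation collection).
     */
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) {
            return -1;
        }
        int address = top;
        top += size;
        return address;
    }

    int used() { return top; }

    /** Once the few survivors have been copied elsewhere, the whole area is freed at once. */
    void reset() { top = 0; }

    public static void main(String[] args) {
        ToyNursery eden = new ToyNursery(32);
        System.out.println(eden.allocate(16));  // 0
        System.out.println(eden.allocate(8));   // 16
        System.out.println(eden.allocate(16));  // -1, only 8 bytes are left
        eden.reset();
        System.out.println(eden.used());        // 0
    }
}
```

Allocation costs one comparison and one addition, and reclaiming the dead objects costs nothing per object, which is why a copying collector pays off when most objects die young.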

In addition, to make better use of the system's CPU resources (for example on multiprocessor machines), version 1.4.1 of the HotSpot JVM offers an optional multithreaded copying collector, also known as the parallel collector. It uses multiple threads to trace and copy live objects, which greatly reduces the time the system spends waiting. You can turn this collector on with the -XX:+UseParNewGC option (see Reference 3).

Figure 1: New generation parallel collector

This parallel collector copies related objects together in depth-first order, which improves memory locality (cache utilization). In later JVM versions, parallel collection will also be considered for the old generation.

In general, the JVM performs GC only in the area where new objects live; collection of the old generation starts only when the program requests it or when memory runs short.

Even with these optimizations, users may still experience pauses. On systems with large memory in particular, GC time rises because there are so many objects. At the same time, many interactive systems cannot tolerate long pauses. To resolve this conflict, the HotSpot series of JVMs offers another optional collector, the incremental low-pause garbage collector, which uses the train algorithm (see the references for an introduction) to break one large pause into many small ones. You can turn this collector on with the -Xincgc option.

Figure 2: Incremental garbage collector

In addition, to address the long pauses caused by GC in the old generation, version 1.4.1 of the HotSpot JVM implements an optional mostly-concurrent non-copying mark-sweep collector, the CMS collector for short. It divides the mark-sweep work into four phases:

Initial mark: the system marks the objects directly referenced by root pointers;

Concurrent marking: most of the marking work is done in this phase;

Remark: the remaining marking work is finished in this phase;

Concurrent sweeping: all unmarked objects are swept away.

Only the initial mark and remark phases, which are very short (for a 1 GB old generation, roughly 200 ms or even less), need to stop the application threads. Most of the work happens in the concurrent marking and concurrent sweeping phases, which use idle CPU resources, so the application's pause time is cut to a minimum. This may cost some performance at peak load (when everything is busy and no CPU is idle), but the average and maximum pause times improve by two orders of magnitude. You can turn this collector on with the -XX:+UseConcMarkSweepGC option.

Figure 3: Concurrent mark-sweep garbage collector

Of course, you can combine these options for the new and old generations to get the performance you need. The following figure illustrates the combined use of the parallel collector (new generation) and the CMS collector (old generation).

Figure 4: Using the parallel collector together with the CMS collector
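After choosing collector options on the command line, it can be useful to check which collectors the running JVM actually installed. The sketch below uses the standard java.lang.management API for that. Note the assumptions: this API arrived in Java 5, later than the 1.4.1 JVM discussed here, the `GcReport` class name is my own, and the collector and memory pool names it prints vary with the JVM version and the options used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Lists the garbage collectors of the running JVM; names depend on JVM and options. */
class GcReport {
    /** Returns the names of the collectors installed in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are reported as -1 when the JVM does not track them.
            System.out.printf("%s manages %s: %d collections, %d ms%n",
                    gc.getName(),
                    Arrays.toString(gc.getMemoryPoolNames()),
                    gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}
```

On a generational configuration you would typically see one collector listed for the new generation pools and another that also covers the old generation, which mirrors the combination shown in the figure.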

Characteristics of GC

From the above introduction, we can see that modern GC has the following features:

1. Accuracy. This has two aspects: first, the garbage collector can accurately mark live objects; second, it can accurately locate the references between objects. The former is the prerequisite for reclaiming all dead objects completely; without it, memory may leak. The latter is a necessary condition for algorithms such as compaction and copying. Because the Java HotSpot GC is fully accurate, it guarantees the following two points:

All inaccessible objects can be reliably reclaimed;

All objects can be relocated, which allows object copying and memory compaction and so effectively prevents object memory fragmentation.

Traditional conservative GCs cannot do this.

2. Unpredictability. The earliest GCs ran only when a memory allocation failed. Today's GCs use different algorithms and collection mechanisms: a collection may run on a timer, when the system has idle CPU resources, or, as in the earliest GCs, when memory consumption reaches a limit (not necessarily when memory is completely exhausted). Which of these applies depends on the garbage collector you choose and how you configure it.

3. Most GC work is synchronous, that is, while a collection is running the application cannot do anything else. Modern GCs, however, make heavy use of multithreading and asynchronous techniques to shorten each pause and to exploit idle system resources. This lets applications that use a modern GC scale better and meet a wider range of requirements.

4. The GC implementation is closely tied to the specific JVM and to the JVM's memory model. Different JVMs may use different GCs, and the JVM's memory model determines which kinds of GC it can use. The memory system of the HotSpot series of JVMs is designed as an advanced object-oriented framework, which allows these JVMs to adopt the most advanced GCs.

5. Modern GCs use algorithms that improve memory locality during the reclamation phase.

6. Modern GCs offer many optional garbage collectors, and each can be configured with different parameters. This lets us obtain the best application performance for each environment, but it also brings configuration complexity and difficulty, especially for beginners. As the saying goes, a sword can wound its owner as well as the enemy. With the adoption of each new technique, modern GC grows larger and more complicated. Only by truly understanding and mastering its characteristics can we play to its strengths, avoid its weaknesses, and get the best results.

Java programming and GC

Having read this far, you should have a rough understanding of GC. I would still like to point out a few things that are easily confused:

1. Do not try to guess when GC will happen; it is unknown. As the discussion above shows, GC may happen at any time. For example, a temporary object in a method becomes garbage once the method call completes. At that point its memory can be released, but note that this only means it may be released; nobody knows when it actually will be.

2. Java provides some classes related to GC, along with a way to explicitly ask for a collection, namely calling System.gc(), but this too is unreliable. Java does not guarantee that every call starts a GC; the call merely sends a request to the JVM, and whether a GC actually runs is unknown.

3. Pick the garbage collector that suits you. As shown above, the HotSpot series of JVMs implements many optional collectors, several of which use very advanced algorithms. So which one really suits me? Is the most advanced always the best? The answer is no: advanced is relative, and a collector may excel in one respect by sacrificing others. The one that suits you is the best. In general, if your system has no special or stringent performance requirements, use the JVM defaults. Otherwise consider a targeted collector: the incremental collector, for example, suits systems with demanding real-time requirements, while a well-provisioned system with relatively idle resources might use the CMS collector. Choosing the right collector is one side of the matter; sensible GC configuration is the other, and it is often overlooked.

4. The most important issue, and the hardest to get right, is memory leaks. It deserves a section of its own.
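Points 1 and 2 can be observed with a weak reference, which lets a program see whether an object has been collected without keeping it alive. This is a small sketch with a class name of my own choosing (`GcRequestDemo`). The first method shows the one thing that is guaranteed, the second shows something that often happens on HotSpot but that nothing in the language promises.

```java
import java.lang.ref.WeakReference;

/** Shows what System.gc() does and does not guarantee. */
class GcRequestDemo {
    /** A strongly reachable object is never collected, however often GC is requested. */
    static boolean survivesWhileReachable() {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        System.gc();                    // only a request to the JVM
        return ref.get() == strong;     // using strong here keeps it reachable during the GC
    }

    /** An unreachable object MAY have been collected after the request; there is no guarantee. */
    static boolean clearedAfterRequest() {
        WeakReference<Object> ref = new WeakReference<>(new Object());
        System.gc();
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("reachable object survived: " + survivesWhileReachable());
        System.out.println("unreachable object already collected: " + clearedAfterRequest());
    }
}
```

The second line of output can differ between JVMs, collectors, and even runs, which is exactly why a program should never depend on it.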

Memory leak

When talking about GC, memory leaks have to be mentioned. As the saying goes, "troubled times produce heroes": it was because of the demon of memory leaks that people invented GC, and after being honed by many masters, its skills have grown until it can hold its own in the martial world and is gradually being accepted there. But don't be lulled into thinking that with GC you have nothing to worry about; that would be a big mistake! To keep memory leaks from causing harm again, always keep the following tips in mind:

1. Good programming habits and a rigorous attitude are always the most important; don't let small mistakes open large black holes in memory;

2. Pay special attention to collection objects such as Hashtable and Vector;

3. Associations between objects can also be a breeding ground for memory leaks.

For further discussion, see the article in Reference 5, which covers the causes of memory leaks, how to prevent them, and some tools for detecting them.
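The second tip is the classic case, so here is a minimal sketch of it. The `LeakyRegistry` class and its methods are hypothetical names made up for this example, and I use ArrayList, but the same applies to Hashtable and Vector: as long as a long-lived collection holds a reference, the GC must treat the object as live.

```java
import java.util.ArrayList;
import java.util.List;

/** A long-lived collection that keeps objects reachable after they are needed. */
class LeakyRegistry {
    // A static collection lives as long as its class, so everything it holds stays reachable.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    /** Leaky version: the buffer is registered for bookkeeping and never removed. */
    static void handleLeaky(int size) {
        byte[] buffer = new byte[size];
        HISTORY.add(buffer);
    }

    /** Fixed version: the reference is dropped as soon as the work is done. */
    static void handleFixed(int size) {
        byte[] buffer = new byte[size];
        HISTORY.add(buffer);
        try {
            java.util.Arrays.fill(buffer, (byte) 1);   // stand-in for the real work
        } finally {
            HISTORY.remove(buffer);   // without this line the GC can never reclaim the buffer
        }
    }

    /** Number of buffers the collection still keeps alive. */
    static int retained() { return HISTORY.size(); }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) handleLeaky(1024);
        for (int i = 0; i < 3; i++) handleFixed(1024);
        System.out.println("buffers still reachable: " + retained());   // prints 3
    }
}
```

The GC is working correctly in both cases. The leak is a logic error: the program still holds references it will never use again.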

Notes

1. The Java HotSpot Performance Engine was officially released on April 27, 1999. JVMs built on this technology are what this article calls the HotSpot series. For a detailed introduction, see Steve Meloan's article "The Java HotSpot Performance Engine: An In-Depth Look" on java.sun.com.

2. Some figures in this article (Figure 1 and Figure 3) come from Reference 3, "Turbo-charging the Java HotSpot Virtual Machine, v1.4.x to Improve the Performance and Scalability of Application Servers". In them, green arrows represent application threads running on multiple CPUs and red arrows represent GC threads; the length of a red arrow roughly reflects how long the GC operation makes the system wait.

References

1. Nagendra Nagarajayya and J. Steven Mayer, "Improving Java[TM] Application Performance and Scalability by Reducing Garbage Collection Times and Sizing Memory", java.sun.com.

2. For the detailed working of these collectors, see the online course materials of the University of Bristol.

3. For more detail on these parameters and JVM options, see Alka Gupta and Michael Doyle, "Turbo-charging the Java HotSpot Virtual Machine, v1.4.x to Improve the Performance and Scalability of Application Servers", developer.java.sun.com, and the HotSpot JVM documentation.

4. Bill Venners, "Inside the Java 2 Virtual Machine", Chapter 9.

5. Jim Patrick, "Handling memory leaks in Java programs", IBM developerWorks Java zone.
