Inside the IBM Java JVM GC Implementation

zhaozj, 2021-02-11

Today, while going through my CDs during some Java optimization work, I came across an article that gives a detailed introduction to the JVM of IBM Java 1.3.0 for AS400. One author is Sam Borman, a member of the IBM Java team who is responsible for the GC module. The other is Richard Jones, a GC expert who co-wrote a monograph on GC with Rafael Lins. If you want to know more about GC, the book is listed on Amazon: http://www.amazon.com/exec/obidos/asin/0471941484/qid=1030028976/sr=1-1/ref=sr_1_1/103-9503275-3854231

Back to the topic. According to Sam Borman, the GC in IBM Java 1.3.0 is twice as fast as HotSpot's, and the gap is wider on symmetric multiprocessor architectures. How does IBM Java achieve such high-performance GC? I have condensed the 20,000-word article for everyone.

The IBM JVM GC works in three steps: the Mark Phase, the Sweep Phase, and the Compaction Phase. Before going through them, let's look at the object layout and the heap layout in IBM Java.

A Java object in the IBM VM is structured as follows:

1. size + flags
2. mptr
3. locknflags
4. object data

size + flags: a 4-byte slot (on 32-bit platforms). Its main job is to describe the size of the object. Because objects in IBM Java are allocated in multiples of 8 bytes, what the slot stores is in effect the true size divided by 8, and the low three bits of the slot are reserved as flags that mark the object:

Bit 1: the swapped bit. It is used in the Compaction Phase. The same bit is also used when the mark stack overflows, to mark the NotYetScanned state.
Bit 2: the dosed bit. It indicates that the object is referenced from a stack or a register. While this flag is set, the object cannot be moved or reclaimed in the current GC cycle. If a slot that appears to point into memory is not a real reference (for example, a plain float or integer variable whose value happens to equal the address of an object in the heap), we cannot modify that slot, so bit 2 of the object is set to 1 in this case as well.
Bit 3: the pinned bit. It marks the object as a pinned object. A pinned object cannot be moved by the GC because it may be referenced from outside the heap. A typical example is Thread. Remember the zombie threads I mentioned before? This is why they cannot be removed. Another kind of pinned object is a JNI object that is in use by native code.

mptr: also a 4-byte slot on 32-bit platforms. It has two functions:

1. If the object is not an array, mptr points to a method block, and through the method block you can reach the class block. Together these blocks tell you which class the object is an instance of. Method blocks and class blocks are allocated by the class loader, not in the heap.
2. If the object is an array, mptr holds the number of elements in the array.

locknflags: also a 4-byte slot on 32-bit platforms, but only a few of its bits are used as flags:

Bit 2: the array flag. If this bit is set, the object is an array and the mptr field holds the number of array elements.
Bit 4: the hashed and moved bit. If this bit is set, the object has been moved after it was hashed.

object data: the data of the object itself.

Heap layout, from the highest address to the lowest:

Heap Top
Heap Limit
Heap Base
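
As an aside on the object header described above, the size + flags slot can be made concrete with a small sketch. This is only an illustration: the mask names, the mapping of "bit 1/2/3" to the three lowest bits, and the helper functions are assumptions made here, not the IBM JVM's actual definitions.

```python
# Hypothetical masks for the three low flag bits of the size + flags slot.
SWAPPED = 0b001    # "bit 1": swapped / NotYetScanned
DOSED = 0b010      # "bit 2": referenced from a stack or register
PINNED = 0b100     # "bit 3": pinned object
FLAG_MASK = 0b111


def pack_size_flags(size_bytes, flags=0):
    """Pack an object size (a multiple of 8) and flag bits into one 4-byte slot."""
    if size_bytes % 8 != 0:
        raise ValueError("object sizes are multiples of 8 bytes")
    if flags & ~FLAG_MASK:
        raise ValueError("only the low three bits may carry flags")
    # Store size / 8 in the upper bits; the low three bits stay free for flags.
    slot = ((size_bytes // 8) << 3) | flags
    if slot >= 1 << 32:
        raise ValueError("slot must fit in 4 bytes")
    return slot


def unpack_size(slot):
    """Recover the object size in bytes by masking off the flag bits."""
    return slot & ~FLAG_MASK


def has_flag(slot, flag):
    """Return True if the given flag bit is set in the slot."""
    return bool(slot & flag)
```

Because the size is always a multiple of 8, storing size / 8 in the upper 29 bits leaves the numeric value of the slot equal to the size plus the flags, which is why masking is enough to read the size back.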

Heap base is the start address of the heap, and heap top is its end address. Heap limit marks the end of the part of the heap the program is currently using; it moves as the heap expands and shrinks. When running Java, you can use the -Xmx parameter to control the maximum heap size, that is, the distance between heap base and heap top.

Alloc bits and mark bits

Heap          allocbits    markbits
Heap Top      allocmax     markmax
Heap Limit    allocsize    marksize
Heap Base

The structure above shows how the heap relates to allocbits and markbits. Both are vectors of 1-bit elements, and both cover the same range as the heap. Below, two objects have been allocated in the heap, shown together with the two vectors:

Heap            allocbits           markbits
heaptop         allocmax            markmax
heaplimit       allocsize           marksize
object2 top
...
object2 base    object2 allocbit    object2 markbit
object1 top
...
object1 base    object1 allocbit

As the structure above shows, when an object is allocated in the heap, the bit corresponding to its start address is set in allocbits. Only the start address is marked. This tells us where an object was created, but not whether the object is still alive. When the Mark Phase finds that an object is still alive, the bit at the corresponding address is set in markbits.

Free list

The IBM JVM links the idle blocks together in a free list chain:

freechunk1        freechunk2           freechunkN
size              size                 size
next -----------> next ---> ...... --> next ---> NULL
free storage      free storage         free storage

With these basic concepts in place, let's look at how the Mark Phase works.
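
The two bit vectors and the free list chain can be sketched as follows. This is a minimal illustration, not IBM's code: the `BitVector` and `FreeChunk` names are hypothetical, and the granularity of one bit per 8 bytes of heap is an assumption based on the 8-byte allocation unit.

```python
from dataclasses import dataclass
from typing import Optional

GRAIN = 8  # assumed: one bit per 8-byte unit of heap


class BitVector:
    """A vector of 1-bit elements covering the heap (used for allocbits and markbits)."""

    def __init__(self, heap_base, heap_top):
        self.heap_base = heap_base
        nbits = (heap_top - heap_base) // GRAIN
        self.bits = bytearray((nbits + 7) // 8)

    def _index(self, address):
        return (address - self.heap_base) // GRAIN

    def set(self, address):
        """Set the bit that corresponds to an object's start address."""
        i = self._index(address)
        self.bits[i >> 3] |= 1 << (i & 7)

    def test(self, address):
        """Return True if the bit for this address is set."""
        i = self._index(address)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))


@dataclass
class FreeChunk:
    """One idle block on the free list: a size and a link to the next chunk."""
    address: int
    size: int
    next: Optional["FreeChunk"] = None


def free_list_sizes(head):
    """Walk the chain from head to NULL and collect the chunk sizes."""
    sizes = []
    chunk = head
    while chunk is not None:
        sizes.append(chunk.size)
        chunk = chunk.next
    return sizes
```

An allocated but dead object has its allocbit set and its markbit clear, which is exactly the situation of object1 in the figure.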

Mark Phase

The GC's Mark Phase marks all live objects. The process of marking every reachable object is called tracing. Tracing starts from the active state, which is made up of the following parts:

1. The saved registers of each thread
2. The stacks that describe the threads
3. The static elements of Java classes
4. The local and global JNI (Java Native Interface) references

Every method call in the JVM creates a frame on the C stack. The frame contains object instances, either as the result of assignments to local variables or as parameters passed into the method. All of these references are treated equally during tracing. In practice, we view the stack of each thread as a series of 4-byte slots and scan each stack from top to bottom. During the scan, every slot must be verified to point to a real object in the heap because, as mentioned earlier, a slot may simply hold an int or a float whose value happens to equal the address of an object in the heap. The scan therefore has to be conservative: it must make sure that everything it treats as a pointer really points to an object, and that the object is not moved during this GC. Only a slot that meets all of the following conditions is treated as a pointer to an object:

1. It must point to memory allocated on a multiple of 8 bytes.
2. It must be within the range of the heap (that is, greater than heap base and smaller than heap top).
3. The corresponding allocbit must be set to 1.

Objects that meet these conditions are considered to be referenced from the roots, and their dosed bit is set to 1, which means the GC cannot move or reclaim them. I think this shows why int and float are objects in C#: because they are objects, one check is saved during tracing, and that saving has a large impact on performance.

Once the root scan is complete, the rest of tracing can proceed safely. That is, the references we find inside objects are real references, so we can move the corresponding objects in the Compaction Phase and update those references. The trace process uses a stack that can hold 4K entries. All references are pushed onto this stack and marked in markbits. When pushing and marking the roots is finished, we start popping the slots and tracing them. Regular (non-array) objects reach the class block through the mptr, and the class block tells us where the references to other objects are located inside the object. When we find a reference this way and the referenced object has not been marked yet, we mark it in markbits and push it onto the stack.
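
The conservative root check and the mark-stack loop described above can be sketched like this. It is a simplified model under stated assumptions: the heap is represented as a dictionary from object start address to the list of references the object holds (standing in for allocbits plus the class block), and the function names are hypothetical.

```python
def looks_like_object_pointer(slot, heap_base, heap_top, allocated):
    """Apply the three conservative checks to the value of one stack slot."""
    return (
        slot % 8 == 0                       # 1. lies on an 8-byte boundary
        and heap_base <= slot < heap_top    # 2. falls inside the heap
        and slot in allocated               # 3. an object starts here (allocbit set)
    )


def mark(stack_slots, heap, heap_base, heap_top):
    """Trace from the roots and return (marked objects, dosed objects).

    heap maps each object's start address to the references it contains.
    """
    marked = set()      # stands in for markbits
    dosed = set()       # objects referenced from a stack; must not be moved
    mark_stack = []

    # Root scan: push and mark every slot that passes the conservative checks.
    for slot in stack_slots:
        if looks_like_object_pointer(slot, heap_base, heap_top, heap):
            dosed.add(slot)
            if slot not in marked:
                marked.add(slot)
                mark_stack.append(slot)

    # Tracing: pop an object, then mark and push its unmarked referents.
    while mark_stack:
        obj = mark_stack.pop()
        for ref in heap[obj]:
            if ref not in marked:
                marked.add(ref)
                mark_stack.append(ref)
    return marked, dosed
```

Only root slots go through the three checks; references found inside objects are trusted, which is why only root-referenced objects end up dosed in this sketch.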

Array objects use the mptr to visit each array element; any element that has not been marked is marked and pushed onto the stack. The trace process continues until the stack is empty.

Mark stack overflow

The size of the mark stack is limited, so it may overflow. If an overflow occurs, we set a global flag to indicate mark stack overflow, and we set bit 1 of the objects that could not be pushed to NotYetScanned. When the tracing process completes, we check the global flag. If an overflow is found, the NotYetScanned objects are pushed onto the stack again and a new tracing pass begins.

Parallel Mark

Because bitwise sweep and compaction avoidance are used, the GC spends most of its time in mark rather than in the other two phases. This led the IBM JVM to develop a parallel version of mark. The goal of parallel mark is to gain efficiency on 4-way and 8-way symmetric CPU systems without sacrificing anything in exchange. The basic idea is to reduce marking time by using several helper threads and a way of sharing work between them.

On a single-CPU system, only one main thread performs the GC work. Parallel mark still needs this master thread, which takes on the role of manager and coordinator. It does at least as much work as in the single-CPU case, including scanning the C stacks to identify the root pointers that need to be collected. A system with N symmetric CPUs automatically gets N-1 helper threads, distributed evenly over the CPUs. The master thread gathers the set of scanned references and hands it to the helper threads, which complete the mark work independently. Each helper thread is assigned its own local mark stack as well as a shareable queue. The shareable queue stores the NotYetScanned objects when a helper thread's mark stack overflows. The master thread then balances the objects in the shareable queue across the threads that have run out of work.

Concurrent Mark

The main purpose of concurrent mark is to reduce the GC's pause time as the heap grows. Concurrent mark is executed whenever the heap reaches the heap limit. In the concurrent phase, the GC asks each application thread (not the helper threads, but the threads you start yourself to make full use of system resources) to scan its own stack to obtain the roots. These roots are then used to trace concurrently. Tracing is performed by a low-priority background thread, and also by the application threads themselves whenever they perform a heap lock allocation while allocating memory.

Because the application threads keep running, we must record changes to objects that have already been traced. This is implemented with a write barrier. The write barrier is activated whenever a reference is updated. It tells us when an already traced object has changed, so that we know which parts of the heap must be scanned again. Concretely, the heap is divided into 512-byte sections, and each section is assigned one byte in the card table. Whenever a reference in an object is updated, the card corresponding to the start address of that object is marked. A byte is used rather than a bit because writing a byte is 2 times faster than writing a bit, and because the spare bits may be put to use in the future.
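
The card table write barrier can be illustrated with a short sketch. The `CardTable` class and its method names are hypothetical; only the 512-byte section size and the one-byte-per-section layout come from the text.

```python
CARD_SIZE = 512  # bytes of heap covered by one byte of the card table


class CardTable:
    """One byte per 512-byte heap section; a non-zero byte means 'retrace this section'."""

    def __init__(self, heap_base, heap_top):
        self.heap_base = heap_base
        ncards = (heap_top - heap_base + CARD_SIZE - 1) // CARD_SIZE
        self.cards = bytearray(ncards)

    def write_barrier(self, object_start):
        """Called whenever a reference inside the object at object_start is updated."""
        self.cards[(object_start - self.heap_base) // CARD_SIZE] = 1

    def dirty_ranges(self):
        """Yield the (start, end) heap ranges whose cards were marked."""
        for i, dirty in enumerate(self.cards):
            if dirty:
                start = self.heap_base + i * CARD_SIZE
                yield start, start + CARD_SIZE
```

Marking a card is a single byte store with no read-modify-write, which is the practical reason a whole byte per card is cheaper than a bit.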

STW collection

An STW (Stop The World) collection is executed once concurrent mark has run. STW means suspending all application threads. So even with concurrent mark, the application is not entirely free of pauses; STW is simply limited to the collection itself. In the discussion above we assumed that an STW mark, sweep, and compaction could suspend the application for a long time. In fact, the IBM GC pauses are much shorter than we thought, and STW is executed only under certain conditions. When STW is executed, the card table is scanned to find which parts of the heap need to be retraced, and then the usual sweep is performed.

The benefit of concurrent mark is that it reduces the pause time caused by STW. But it also requires the application threads to pay a price, namely the tracing work done during heap lock allocation. How large this cost is depends mainly on how much spare capacity the CPUs have. Sun's HotSpot still uses a single GC thread, so the IBM Java GC is faster and has lower latency.

Sweep Phase

Sweep is executed after mark has run. The Sweep Phase is actually the most interesting phase. A rather pointed question in our earlier discussion was whether it is necessary for the GC to control the lifetime of individual objects. That may be possible in Sun's Java, but the GC in IBM Java does not know when it sweeps an object, and does not even know which objects it sweeps.

Sweep in Sun's HotSpot uses the usual approach of scanning allocbits and markbits side by side and sweeping the memory of objects that are allocated but not marked. IBM uses a considerably more efficient method called bitwise sweep. This method looks directly for long runs of 0 bits in markbits (a 1 bit means marked; a 0 bit means memory that is idle or needs to be swept). Once a long run of 0 bits is found, the memory at the corresponding heap addresses is released. If the amount of free space exceeds 512 bytes plus the header size, the free block is put on the free list. Smaller pieces of memory are not put on the free list, because they are merged back in by a later sweep or when the heap is compacted.

With bitwise sweep, the GC does not need to delete individual objects at all, because we know that the chunk being released is free storage. So when we release a chunk, we do not actually know how many objects, or which objects, were deleted. After sweeping, the GC copies markbits to allocbits to ensure that all object references remain valid. This is why handling each object separately, as Myan suggests, is not a good idea for the GC. Everything relies on references to clear many objects at once; handling objects separately would require the HotSpot approach and would reduce GC performance.
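
Bitwise sweep can be sketched as a scan over markbits for runs of zeros. Several things here are assumptions for illustration: one mark bit per 8 bytes, a 12-byte header (three 4-byte slots), a threshold of 512 bytes plus the header size, and the use of each live object's size (as it would be read from its header) to skip over the object's body, since only its start address is marked.

```python
GRAIN = 8        # assumed: one mark bit per 8-byte unit
MIN_FREE = 512   # free chunks must exceed this many bytes plus the header size


def bitwise_sweep(markbits, live_sizes, header_size=12):
    """Return [(offset_bytes, size_bytes)] for free chunks large enough for the free list.

    markbits:   list of 0/1 values, one per 8-byte unit of heap
    live_sizes: maps the unit index of each marked object to its size in bytes
    """
    free_list = []
    n = len(markbits)
    i = 0
    while i < n:
        if markbits[i]:
            # A live object: skip its body using the size from its header.
            i += live_sizes[i] // GRAIN
            continue
        # Start of a run of 0 bits: extend it until the next marked object.
        start = i
        while i < n and not markbits[i]:
            i += 1
        size = (i - start) * GRAIN
        if size > MIN_FREE + header_size:
            free_list.append((start * GRAIN, size))
        # Smaller runs are left alone; they are not worth a free list entry.
    return free_list
```

Dead objects inside a run are never visited individually, which matches the observation that the GC does not know which objects it has swept.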

Parallel Bitwise Sweep

IBM Java also provides a parallel version of bitwise sweep for symmetric multiprocessor systems. The principle is the same as for parallel mark.

Compaction Phase

When sweeping is complete, compaction begins. Compaction in Java is quite complicated, because moving an object means updating every reference to it. If a reference comes from a stack and we cannot determine whether it points to a real object (it may be just a float), then the object it appears to point to cannot be moved. An object cannot be moved while its "dosed" bit is set. The same goes for pinned objects, those referenced by JNI: they can be moved only after JNI has unpinned them. Determining the state of a pinned object is more complicated. It mainly relies on bits in the low end of the mptr that are otherwise clear. Two places are used in this scheme:

1. The size + flags field, when tagged as olink_isswapped.
2. The mptr, when marked as GC_Firstswapped.

So it seems that having Java treat primitive types such as int differently from Object makes things harder for the GC: it leads to too many unmovable objects and to excessive fragmentation. That is bad for the GC, and I do not see what it buys elsewhere. Otherwise, why would an Integer class be needed? C# has a clear advantage on this point.

The compaction algorithm in IBM Java is surprisingly complicated, because it tries to avoid moving too many objects and to use the moves it does make to deal with free blocks that were not collected. It uses a different algorithm from HotSpot. Sam Borman gives a vivid example: imagine the entire heap as a warehouse stacked with furniture of different sizes. Gaps have opened up between the pieces because some furniture has been taken out. The job of compaction is to push the furniture in one direction to close the gaps. Push the first piece up against the wall, then push the second piece up against the first, and so on. In the end all the furniture is packed together again and the free space is on the other side. Pinned and dosed objects cannot be moved, which complicates the algorithm, but the main idea is unchanged.
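
The furniture analogy corresponds to a sliding compaction. The sketch below is a deliberately simplified model, not the IBM algorithm: objects are hypothetical `(address, size, movable)` tuples sorted by address, and pinned or dosed objects simply stay where they are.

```python
def slide_compact(objects, heap_base):
    """Slide movable objects toward heap_base and return {old_address: new_address}.

    objects: list of (address, size, movable) tuples, sorted by address,
             non-overlapping. movable is False for pinned or dosed objects.
    """
    forwarding = {}
    free = heap_base  # lowest address not yet occupied after compaction
    for address, size, movable in objects:
        if movable:
            # Push this piece of furniture up against the previous one.
            forwarding[address] = free
            free += size
        else:
            # An immovable object stays put; packing resumes just after it.
            forwarding[address] = address
            free = address + size
    return forwarding
```

After the objects are moved, every reference to a moved object has to be rewritten through the forwarding table, which is why a slot that might be a float rather than a reference forces its target to stay in place.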

Compaction Avoidance

The main purpose of compaction avoidance is to reduce how often compaction is needed when large blocks of memory are allocated, so that GC pause time stays short. The IBM JVM performs compaction under the following conditions:

1. A large allocation finds no suitable free storage and triggers an alloc failure.

2. An alloc failure occurred during the previous GC cycle.

3. Less than 5% of the active heap (the heap between heap base and heap limit) is free.

4. The active heap is no larger than 128K.

The IBM JVM performs compaction when any one of these four conditions is met. The most common is the first one.

To avoid compaction, IBM Java uses a compaction avoidance technique called wilderness preservation, which reserves a region of memory at the heap limit. This region is kept in its original, untouched state. Its size is 5% of the active heap, and the default is set to 3M. If a large block of memory is needed and there is no suitable storage in the free list, the wilderness is used so that no alloc failure is raised. Once the wilderness is exhausted, an alloc failure notifies the GC to perform compaction. In general, wilderness preservation keeps you from needing compaction, because the objects that end up using the wilderness are basically the largest objects in the application.
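
The four compaction conditions and the wilderness fallback can be written down as two small functions. This is a sketch under assumptions: the function and parameter names are hypothetical, free chunks are modelled as a plain list of sizes, and "5% free" is read as "less than 5% of the active heap is free".

```python
def should_compact(large_alloc_failed, alloc_failure_last_gc,
                   active_heap_bytes, free_bytes):
    """Return True if any one of the four compaction conditions holds."""
    return (
        large_alloc_failed                         # 1. large allocation failed
        or alloc_failure_last_gc                   # 2. alloc failure in the last GC
        or free_bytes * 20 < active_heap_bytes     # 3. less than 5% free
        or active_heap_bytes <= 128 * 1024         # 4. active heap at most 128K
    )


def allocate_large(size, free_chunks, wilderness_free):
    """Try the free list first, then the wilderness, then report an alloc failure.

    Returns (source, remaining_wilderness_bytes).
    """
    for chunk in free_chunks:
        if chunk >= size:
            return "freelist", wilderness_free
    if wilderness_free >= size:
        # The reserved region absorbs the request, so no compaction is triggered.
        return "wilderness", wilderness_free - size
    return "alloc_failure", wilderness_free
```

An "alloc_failure" result is what would feed the first condition of `should_compact`, so compaction only happens once both the free list and the wilderness have failed to satisfy a request.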

When reposting, please cite the original address: https://www.9cbs.com/read-5264.html
