An In-Depth Look at the Java HotSpot™ Performance Engine
The Java HotSpot™ Performance Engine was officially released on April 27, 1999. It is far more than a performance tuning add-on: it is a complete Java virtual machine (VM) designed for high performance from start to finish, and it often doubles the running speed of server-side Java applications.
Figure 1. SPECjvm98 results for the Java HotSpot VM running on Windows NT at 350 MHz (translator's note: SPEC is the Standard Performance Evaluation Corporation) (Source: Sun Microsystems)
Figure 2. VolanoMark results for the Java HotSpot VM running on Windows NT (Source: Sun Microsystems)
Figure 3. VolanoMark results for the Java HotSpot VM running on the Solaris™ platform (SPARC Platform Edition) (Source: Sun Microsystems)
Developers who use the Java programming language know that the virtual machine sits between a Java application and the underlying hardware platform. It executes the application's bytecode, manages system memory, provides security, and schedules multiple threads.
With the new pluggable architecture of the Java 2 platform, the Java HotSpot engine can be dropped in seamlessly to replace the traditional virtual machine and its just-in-time (JIT) compiler. Once the new performance engine is installed, any application or applet run in that Java 2 Runtime Environment (application launcher, Java Plug-in, or appletviewer) uses the Java HotSpot performance engine by default.
Drawing on the words of David Stoutamire, this article digs into the workings of the Java HotSpot performance engine, exploring what it does and how it does it, where it "screams" and where it merely "purrs", with code examples that demonstrate the engine's internal behavior.
2. Foundation
The Java HotSpot performance engine owes its outstanding performance to several key technologies:
· Adaptive ("on-the-fly") compilation
· Method inlining
· An improved, redesigned object layout
· Fast, fully accurate garbage collection
· Ultra-fast thread synchronization
These improvements are most effective for server-side applications. "This is a way of tuning performance," David Stoutamire explains. "The longer the application runs and the more Java bytecode it executes, the greater the benefit you get. If an application just bounces a ball on the screen, it probably relies on C code or system calls for the graphics, and it may run only briefly after someone clicks a link, so the Java HotSpot VM will not get a chance to show its talent."
The performance of applications written in the Java programming language generally depends on four factors:
· The overall design of the application
· The execution speed of Java bytecode
· The speed of the libraries (implemented in native code)
· The speed of the underlying hardware and operating system
The performance of a typical client application (especially a graphical one) is dominated by the native libraries, whereas a typical server-side application depends mainly on bytecode execution speed, which is where the new VM shines.
Figure 4. Where does the time go when a virtual machine is running? (Source: Sun Microsystems)
However, according to Stoutamire, future releases of the engine will specifically target the performance of client applications.
3. Adaptive compilation
Most applications spend most of their time executing a small portion of their code. The Java HotSpot performance engine analyzes an application as it runs and identifies the performance-critical regions, where the most time is spent executing bytecode. Rather than compiling the whole program up front, or compiling each method when it is called (as a JIT compiler does), the engine first runs the program in an interpreter and analyzes it while it runs to find the performance "hot spots"; it then compiles and optimizes only that performance-critical code. This monitoring continues dynamically throughout the life of the program, so the engine adapts to the application on the fly.
Adaptive compilation takes the Java HotSpot engine well beyond a just-in-time compiler. Because 20% of the code may account for most of the execution time, a JIT compiler's work on the other 80% is not always worthwhile: the gain at run time may not repay the cost of optimizing it.
The dynamic optimization method of the Java HotSpot performance engine brings the following benefits:
· Programs usually start faster, because the Java HotSpot performance engine compiles far less code up front than a JIT compiler does.
· Compilation is spread out over time, so compilation pauses are shorter and harder for the user to notice.
· Compiling only the performance-critical code "buys time" that can be spent on better optimization.
· Because only a small part of the program is compiled, less memory is needed for compiled code.
· Because the code runs for longer before it is compiled, more information can be gathered to guide better optimization (such as inlining).
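The counting idea behind hot-spot detection can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyAdaptiveRuntime`, the threshold of 1000 calls, and the use of a lambda as a stand-in for both "interpreted" and "compiled" code are all hypothetical and say nothing about how the real engine is built.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Toy model of hot-spot detection. All names and the threshold are hypothetical.
class ToyAdaptiveRuntime {
    static final int HOT_THRESHOLD = 1000; // assumed value, not the real engine's

    private final Map<String, Integer> callCounts = new HashMap<>();
    private final Map<String, IntUnaryOperator> compiled = new HashMap<>();

    // Runs the named method, counting calls until the method is considered hot.
    int invoke(String name, IntUnaryOperator body, int arg) {
        IntUnaryOperator fast = compiled.get(name);
        if (fast != null) {
            return fast.applyAsInt(arg);       // "compiled" path: no more counting
        }
        int count = callCounts.merge(name, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) {
            compiled.put(name, body);          // stand-in for compile and optimize
        }
        return body.applyAsInt(arg);           // stand-in for interpretation
    }

    boolean isCompiled(String name) {
        return compiled.containsKey(name);
    }

    public static void main(String[] args) {
        ToyAdaptiveRuntime rt = new ToyAdaptiveRuntime();
        rt.invoke("coldMethod", x -> x * 2, 21);       // called once: stays cold
        for (int i = 0; i < 5000; i++) {
            rt.invoke("hotMethod", x -> x + 1, i);     // called often: becomes hot
        }
        System.out.println("coldMethod compiled: " + rt.isCompiled("coldMethod"));
        System.out.println("hotMethod compiled: " + rt.isCompiled("hotMethod"));
    }
}
```

The point of the model is the asymmetry: code that runs once never pays a compilation cost, while code that runs thousands of times pays it once and benefits afterwards.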
4. Method inlining
On-the-fly dynamic compilation is only the start of the optimizations performed by the Java HotSpot performance engine. "Suppose I have a method that does something trivial, such as adding one to its argument and returning it," Stoutamire says. "In that case the compiler can simply generate code that adds one to the variable directly, without actually calling the method. This saves the overhead of the instruction that jumps to the method and also the overhead of the return instruction."
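The add-one case in the quotation can be written out by hand. The sketch below uses hypothetical names (`InlineDemo`, `addOne`, `sumWithCalls`, `sumInlined`); the second loop shows the effect an inlining compiler could achieve, not the code any particular VM actually generates.

```java
// Hypothetical example: the names are illustrative, not from the article.
class InlineDemo {
    // A trivial method: adds one to its argument and returns it.
    static int addOne(int x) {
        return x + 1;
    }

    // As written in the source: every iteration performs a call and a return.
    static int sumWithCalls(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += addOne(i);
        }
        return total;
    }

    // Hand-inlined equivalent: the body of addOne replaces the call.
    static int sumInlined(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        // Both versions compute the same result; only the call overhead differs.
        System.out.println(sumWithCalls(1000) == sumInlined(1000));
    }
}
```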
But method inlining runs into a problem in object-oriented code. "Usually the address you want to jump to is hard-coded in the instruction," Stoutamire says, "but not always. With dynamic dispatch, that is, a virtual method call, you have a run-time pointer that can be used to call any one of a set of different methods."
Dynamic dispatch is central to the Java programming language: a subclass can override an existing method, and at run time the appropriate method is selected automatically. "The inlining problem," Stoutamire continues, "is that you can't inline across a dynamic dispatch, because you never really know which method is going to be called, so you can't bring the method body into the call site. The reason is that Java lets you load new classes at any time. A newly loaded class may introduce a new method, and if inlining was done before it arrived, then all of a sudden the compiled code may be incorrect."
The Java HotSpot VM solves this problem with dynamic deoptimization. "Deoptimization is the ability to revert from compiled code back to interpreted code," Stoutamire explains. "In essence, it is the ability to take a stack frame from compiled code and turn it into an interpreter frame. The interpreter has its own representation of the method being executed. If you have a local variable in a method that is executing compiled code, it may live on the stack or in a register, anywhere. But the interpreter is not obliged to use the same layout. So you must be able to take everything and rearrange it so that it looks like an interpreter frame before the interpreter can continue. This is one of the things that sets the Java HotSpot performance engine apart from other virtual machines."
As its name suggests, dynamic deoptimization happens while the program is running. "Suppose you have a running program," Stoutamire says, "whose hot spots have already been compiled. During that compilation, the compiler relied on the fact that only a single class was loaded, so it could inline. Some time later a new class is loaded dynamically and invalidates the existing compiled code. So the Java HotSpot VM throws away the existing compiled code and resumes execution in the interpreter; or, if the code is still a hot spot, the compiler compiles it again and execution resumes in the new code."
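The scenario Stoutamire describes, a hot virtual call with one known target followed by the arrival of an overriding class, can be set up in a few lines. This is a minimal sketch with hypothetical class names (`Counter`, `DoubleCounter`, `DeoptDemo`). Whether a given VM actually inlines the call in the first phase and deoptimizes in the second is an assumption and cannot be observed from the program's output; the program only shows that the results stay correct either way.

```java
// Hypothetical classes illustrating the scenario; names are not from the article.
class Counter {
    int step() { return 1; }
}

// An overriding subclass. A VM typically loads it lazily, on first use.
class DoubleCounter extends Counter {
    @Override
    int step() { return 2; }
}

class DeoptDemo {
    // Virtual call site: which step() runs depends on the receiver's class.
    static long run(Counter c, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += c.step();
        }
        return total;
    }

    public static void main(String[] args) {
        // Phase 1: only Counter has been used, so step() has a single known target
        // and a compiler may choose to inline it.
        long first = run(new Counter(), 1_000_000);

        // Phase 2: DoubleCounter appears. Code that assumed a single target for
        // step() would now be wrong, which is the case deoptimization handles.
        long second = run(new DoubleCounter(), 1_000_000);

        System.out.println(first + " " + second);
    }
}
```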
5. Object layout
The improved object layout in the Java HotSpot virtual machine uses a two-word object header rather than the three-word header found in most other Java virtual machines. The first header word is a reference to the object's class; the second holds other information such as the identity hash code and garbage collection status. Only arrays have a third header field, which stores the array size. Because the average object in the Java programming language is small, saving one machine word per object has a noticeable effect on memory consumption (roughly 8% of the heap).
The Java HotSpot VM also eliminates the concept of "handles", an indirection used to access objects in memory. This reduces memory use and also speeds up processing. "The conventional object representation," Stoutamire says, "is that when one object points to another, it holds a pointer to the header of that other object. Traditional virtual machines, however, use a completely different representation. The reference does not point directly at the object but at a table. The object's header is in that table, and so is the pointer to the object's location."
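The difference between the two representations shows up most clearly when an object has to be moved. The toy model below is a sketch only: "addresses" are plain integers, and the names `HandleTableDemo`, `resolve`, `moveViaHandle`, and `moveDirect` are hypothetical. It models neither object headers nor any real VM data structure.

```java
import java.util.Arrays;

// Toy model: "addresses" are plain ints and all names are hypothetical.
class HandleTableDemo {
    // handles[i] holds the current address of the object behind handle i.
    final int[] handles = new int[8];

    // Indirect scheme: a referrer stores a handle index, so every access
    // pays for one extra lookup.
    int resolve(int handle) {
        return handles[handle];
    }

    // Moving an object in the indirect scheme rewrites exactly one table entry.
    void moveViaHandle(int handle, int newAddress) {
        handles[handle] = newAddress;
    }

    // Direct scheme: every referrer stores the address itself, so a move must
    // rewrite each matching pointer. Returns the number of pointers updated.
    static int moveDirect(int[] referrers, int oldAddress, int newAddress) {
        int updated = 0;
        for (int i = 0; i < referrers.length; i++) {
            if (referrers[i] == oldAddress) {
                referrers[i] = newAddress;
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        HandleTableDemo table = new HandleTableDemo();
        table.handles[3] = 0x1000;        // the fourth entry points at some object
        table.moveViaHandle(3, 0x2000);   // the object moves: one write
        System.out.println(Integer.toHexString(table.resolve(3)));

        int[] referrers = new int[1000];  // many objects pointing directly at it
        Arrays.fill(referrers, 0x1000);
        System.out.println(moveDirect(referrers, 0x1000, 0x2000));
    }
}
```

The trade-off is visible in the two methods: handles make a move cheap but tax every access with `resolve`, while direct references make access free and push the cost onto the move.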
This indirection is particularly useful for relocating objects in memory, which happens often during garbage collection (see below). "If object A wants to point to object B," Stoutamire explains, "it actually points into the handle table. Say object A points to the fourth entry in the table, and that fourth entry holds the pointer to object B. Now when I want to move object B, all I have to do on A's behalf is update the table entry. If only one object points to B, that is no great saving; but if 1000 objects point to B (or I can't locate all the pointers), then moving B still means updating just that single entry." Although handles make relocation simpler, the indirection is slow and the table itself takes up extra memory. Moreover, the main reason for having a handle table, making it easy to relocate objects during garbage collection, later turned out to be less compelling than it seemed.
"In fact, updating all the pointers directly when relocating an object does not cost much more," Stoutamire says. "To do garbage collection you have to trace the entire heap anyway; you must examine every pointer. And the fact that you visit every pointer means you have the chance to change it without much extra work."
Finally, because the Java HotSpot VM uses direct memory references, it does not need to create a handle when allocating memory, and apart from object memory it has no handle storage to manage. This makes allocating temporary data structures almost as fast as C's stack-based allocation, which is a big win.
6. Garbage collection
The Java programming language is the first mainstream language to provide built-in automatic garbage collection.
6.1 Garbage collection before the Java HotSpot VM
Many Java virtual machines use "conservative" or only partially accurate garbage collectors. A conservative collector assumes that anything that looks like a valid pointer may actually be one. Conservative collectors are easy to implement, but they cannot always know where every object reference in memory is. As a result they occasionally, though rarely, make mistakes, such as mistaking an integer for an object pointer, and this can cause memory leaks that are hard to find and debug.
In addition, a conservative collector must either use handles to refer to objects indirectly or avoid relocating objects, because relocating a handleless object requires updating every reference to it, and a conservative collector cannot be sure that an apparent reference really is one. This inability to relocate objects leads to memory fragmentation and rules out more sophisticated garbage collection algorithms.
Conservative garbage collection also has a negative effect on native methods. "A conservative garbage collector must ensure that it does not move anything referred to by Java code," Stoutamire explains. "This means you have to scan memory and find the pointers that point into your heap, and all of that takes time." This was one of the problems with Java's old Native Method Interface (NMI) specification. Interestingly, the problem can be solved by using a different kind of handle.
"With the Java 1.1 platform," Stoutamire says, "NMI was replaced by the Java Native Interface (JNI). You may no longer point directly at Java objects from outside; you can only point to a handle, which in turn points to the object. This means the garbage collector does not have to consider outside code. However, handles are used only when native code wants to point into the VM." He continues, "This is almost as efficient as the old approach, and garbage collection is more efficient. If you want to use the Java 2 platform and its Java HotSpot VM, you have to use JNI."
6.2 Garbage collection in the Java HotSpot performance engine
Garbage collection in the Java HotSpot performance engine has also been fundamentally redesigned. The Java HotSpot VM garbage collector is fully accurate, which provides the following guarantees:
· All unreachable object memory can be reclaimed reliably;
· All objects can be relocated, which allows memory compaction, eliminates fragmentation of object memory, and improves memory locality.
Garbage collection in the Java HotSpot VM has some advanced features. "To do accurate garbage collection," Stoutamire says, "you have to trace the whole heap, because if you fail to follow every pointer, one of them may point to something live that you then miss and reclaim by mistake, which is a very bad thing. On the other hand, if the whole heap is traced every time, garbage collection gets slower and slower as the heap grows."
6.3 Generational copying collection
One way to avoid this slowdown is the advanced generational copying algorithm used by the Java HotSpot VM. "Generational copying collection exploits the fact that most objects do not live very long," Stoutamire says. "Essentially there are two heaps, one for old objects and one for new objects. The system keeps track of references between new objects and old objects in a separate table."
With generational copying collection, most objects (typically more than 95%) are reclaimed directly by cheap "scavenges" of the new-object space (sometimes called the "nursery"). Long-lived objects are eventually copied (or "tenured") to the old-object space. "If things survive for a certain time," Stoutamire says, "they are likely to survive much longer. Before copying them you want to give them a chance to mature, but once they have proved they are not going to die, you can copy them."
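The scavenge-and-tenure cycle can be modelled in miniature. The sketch below rests on stated assumptions: the class `ToyGenerationalHeap`, the `live` flag standing in for reachability, and the tenuring age of 2 are all hypothetical, and real collectors discover liveness by tracing from roots rather than by reading a flag.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of generational copying. Names and TENURE_AGE are hypothetical.
class ToyGenerationalHeap {
    static final int TENURE_AGE = 2; // assumed: scavenges survived before tenuring

    static final class Obj {
        final String name;
        boolean live = true; // stand-in for "reachable from a root"
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    List<Obj> nursery = new ArrayList<>();
    final List<Obj> oldSpace = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        nursery.add(o);
        return o;
    }

    // Scavenge: copy the survivors; dead objects are never touched individually,
    // they vanish when the old nursery list is dropped.
    void scavenge() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : nursery) {
            if (!o.live) {
                continue;                 // dead: no work beyond the liveness check
            }
            o.age++;
            if (o.age >= TENURE_AGE) {
                oldSpace.add(o);          // matured: tenured into the old space
            } else {
                survivors.add(o);         // still young: stays in the nursery
            }
        }
        nursery = survivors;
    }

    public static void main(String[] args) {
        ToyGenerationalHeap heap = new ToyGenerationalHeap();
        heap.allocate("long-lived");
        for (int i = 0; i < 99; i++) {
            heap.allocate("temp" + i).live = false; // dies young
        }
        heap.scavenge();
        heap.scavenge();
        System.out.println("nursery=" + heap.nursery.size()
                + " old=" + heap.oldSpace.size());
    }
}
```

The cost of a scavenge in this model is proportional to the number of survivors that get copied, which is why a nursery full of short-lived objects is cheap to collect.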
Because new objects are allocated in the nursery as if on a stack, allocation is extremely fast: it involves only updating a single pointer and checking for overflow. By the time the nursery fills up, most of the objects in it are already dead. The garbage collector simply copies the remaining survivors elsewhere and so avoids doing any reclamation work on the dead ones.
6.4 Mark-compact collection
Generational copying collection handles the majority of short-lived objects in the nursery. Long-lived objects, however, eventually settle in the old-object area. There, when memory runs low or the program requests it, the old objects must be garbage-collected. The Java HotSpot VM uses a standard mark-compact collector, which does this by traversing the entire graph of live objects starting from the roots. "Mark-compact goes through memory and marks all the reachable objects," Stoutamire says, "that is, the objects that are not garbage." The gaps left by dead objects can then be compacted and reclaimed. Compacting the gaps in the heap, rather than gathering them into a free list, eliminates memory fragmentation, speeds up allocation in the old area (by doing away with the free list), and makes better use of the cache.
6.5 Incremental "train" collection
However, the mark-compact garbage collection algorithm cannot eliminate all the pauses that a user can perceive. Such pauses typically occur during collection of the old objects and are proportional to the amount of live object data. To meet the demand for "pauseless" garbage collection, the new virtual machine also provides an incremental, or "train", algorithm. The incremental collection option (selected when the program is launched, with the -Xincgc flag) gives relatively short pause times even when handling large object data sets. Incremental garbage collection is best suited to:
· Server applications, especially high-availability ones;
· Applications that handle very large "live" object data sets;
· Applications where pauses are unwelcome, such as games, animation, or other interactive programs.
"The train algorithm is a sophisticated variant of the generational copying algorithm," Stoutamire says. "Instead of having just a new space and an old space, it has a middle space made up of many small spaces. It tries to keep these small spaces as small as possible and to group objects that point to each other into the same space." Keeping tightly coupled objects in adjacent memory has additional benefits for multithreaded applications that work on different object data sets.
The train algorithm breaks up the old-space collection pause into many tiny pauses (of roughly a few milliseconds each). They are spread out over time so that the user effectively does not notice them. "If you are running a graphical program," Stoutamire says, "and dragging something with the mouse, you don't want to see it suddenly hiccup and freeze. That is often how users experience garbage collection; they notice the stall. The train algorithm can largely eliminate it."
Overall, the various garbage collection algorithms work together to make up the advanced garbage collection of the Java HotSpot VM. "The first stage is the nursery, sometimes called 'Eden'," Stoutamire says. "Most objects die young, so they never leave the Garden of Eden. Copying collection is most efficient in Eden, an area that is collected frequently, because most of what is there can simply be discarded."
Assuming incremental garbage collection is turned on, objects move next to the area managed by the train algorithm, and from there to the permanent area for long-lived objects, which is managed by the mark-compact algorithm.
However, incremental (train) mode does not come without overhead (about 10% in speed). "Incremental mode has a certain overhead," Stoutamire confirms, "which is why it is not on by default. For this release of the product we wanted the highest throughput and were less concerned about the number of pauses. If you want to use the Java HotSpot VM on the client, you can turn incremental mode on, because it gives you a more consistent, pause-free response." As a side note, Stoutamire points out that a middle garbage collection area exists even when the incremental (train) algorithm is not enabled. "There is still a middle generation," he explains, "but it is managed with a copying collection algorithm rather than with the fine-grained train algorithm."
Figure 5. Implementation of Java HotSpot garbage collection
7. Thread synchronization
According to Sun's estimates, a typical Java application spends a significant share of its hardware resources on garbage collection and on handling multiple threads (which let it process several I/O streams at the same time). The Java HotSpot virtual machine makes a breakthrough in thread synchronization and so improves performance significantly.
"For programmers, the most important thing to understand," Stoutamire says, "is that we did what they had been wishing for with native threads. In older versions of the Java VM there were ridiculous limitations on how I/O worked: whatever the circumstances, if you had a Java thread, there was a native OS thread underneath executing it. That kind of thing eventually degrades performance."
The Java HotSpot thread implementation, by contrast, uses the thread model of the host operating system to provide fully preemptive threads. "With the Java HotSpot VM," Stoutamire says, "every Java thread corresponds to a native OS thread. In traditional VMs that was not always so; sometimes one native thread corresponded to several Java threads. In that case, if one thread blocked for some reason, none of the other related threads could continue. Once you have a one-to-one correspondence between native and Java threads, as in the Java HotSpot VM, a Java thread that blocks for some reason does not affect the other Java threads. That is the primary meaning of preemption: one thread can run and be preempted by another thread. In a non-preemptive system, one thread can hold up another."
8. Future
Once the Java HotSpot performance engine is installed, the first thing most people do is take it for a spin. However, the engine is designed to perform best on real-world or enterprise systems, or on a combination of the two. Microbenchmarks written to anticipate and exploit the engine's inner workings often lead to frustration and disappointment. Sun has therefore compiled a benchmarking FAQ that explains the common misunderstandings that arise when measuring the performance gains of the engine.
A common benchmark sets up a simple piece of code and runs it for many iterations in a loop (perhaps incrementing a number on each pass). The Java HotSpot VM starts such a program in interpreted mode but soon discovers (because of the many loop iterations) that this region is a "hot spot", so it takes the method and compiles it. In the currently released product, however, the newly compiled code is used only the next time the method (main) is called. Of course, that never happens in such an overly simple microbenchmark.
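The two benchmark shapes at issue can be put side by side. This is a sketch with hypothetical names (`LoopBenchmark`, `work`) and no timing code, since timings depend entirely on the VM and the machine. It assumes the behavior the article describes for the released product, where compiled code is picked up only on a later call of the method.

```java
// Hypothetical microbenchmark shapes; names are illustrative.
class LoopBenchmark {
    // The hot loop lives in its own method, so it is entered many times.
    static long work(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Shape A: one long loop inside main. main is called exactly once, so a
        // VM that switches to compiled code only at method entry never gets the
        // chance to use the compiled version here.
        long sumA = 0;
        for (int i = 0; i < 10_000_000; i++) {
            sumA += i;
        }

        // Shape B: the same kind of work split into many calls. Once work() has
        // been compiled, every later call can enter the compiled code.
        long sumB = 0;
        for (int call = 0; call < 1_000; call++) {
            sumB += work(10_000);
        }

        System.out.println(sumA + " " + sumB);
    }
}
```

Shape B is closer to how real applications behave, which is one reason a benchmark written like shape A can understate what such a VM does for ordinary code.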
"Solution to this situation is," Stoutamire explanation, "on-stack replacement - it has been implemented in the next Java Hotspot update, but has not been released." Stack replacement and dynamic reverse on the stack Optimization is the opposite - an interpretation frame is converted into a compilation frame, but the method is still running. "In Java Hotspot 1.0, we don't care about this solution," Stoutamire said, "because it is not a typical thing to do, but it is for this situation and enables this microval quasi-test. The program is what people expect what they do. "
9. Conclusion
Initially, Sun's Java HotSpot performance engine is available for the Solaris™ operating system and for Microsoft Windows. It is free to end users and independent software developers, but vendors that bundle it with an operating system must pay a license fee (royalty). The beta release of Java HotSpot 2.0 will appear this summer, and its performance will improve by 30%.