Author: Vladimir Roubtsov
Recently, we helped develop a Java server that resembled an in-memory database. That is, we deliberately biased the design toward keeping a large amount of data in memory to improve query performance.
Once we had the prototype running, we naturally decided to profile the memory footprint of the data after it had been parsed and loaded from disk. The unsatisfactory initial results prompted us to look for a better solution.
The tool:
Because Java deliberately hides many details of memory management, finding out how much memory your objects consume takes some work. You can use the Runtime.freeMemory() method to measure the change in heap size before and after one or more objects are allocated. Several earlier articles describe this approach in detail. Unfortunately, the solution in the earlier of these articles fails because its implementation uses the wrong Runtime method, and even the later articles have their own shortcomings:
• A single call to Runtime.freeMemory() is not enough, because the JVM can decide to grow its current heap at any time (whenever it needs to, especially while running garbage collection). Unless the heap has already reached the maximum set by -Xmx, we should use Runtime.totalMemory() - Runtime.freeMemory() as the used heap size.
• A single call to Runtime.gc() does not guarantee a sufficiently thorough garbage collection. We can, for example, also request that object finalizers run. And since Runtime.gc() is not guaranteed to block until collection completes, it is a good idea to wait until the used heap size stabilizes.
• If the profiled class creates static data as part of its class initialization, the heap memory measured for the first instance of the class may include that data. We should therefore ignore the heap space consumed by the first class instance.
With these issues in mind, here is Sizeof, a tool for examining a variety of Java core and application classes:

    public class Sizeof
    {
        public static void main (String [] args) throws Exception
        {
            // Warm up all classes/methods we will use
            runGC ();
            usedMemory ();
            // Array to keep strong references to allocated objects
            final int count = 100000;
            Object [] objects = new Object [count];
            long heap1 = 0;
            // Allocate count+1 objects, discard the first one
            for (int i = -1; i < count; ++ i)
            {
                Object object = null;
                // Instantiate your data here and assign it to object
                object = new Object ();
                //object = new Integer (i);
                //object = new Long (i);
                //object = new String ();
                //object = new byte [128][1]
                if (i >= 0)
                    objects [i] = object;
                else
                {
                    object = null; // Discard the warm-up object
                    runGC ();
                    heap1 = usedMemory (); // Take a "before" heap snapshot
                }
            }
            runGC ();
            long heap2 = usedMemory (); // Take an "after" heap snapshot
            final int size = Math.round (((float) (heap2 - heap1)) / count);
            System.out.println ("'before' heap: " + heap1 +
                                ", 'after' heap: " + heap2);
            System.out.println ("heap delta: " + (heap2 - heap1) +
                ", {" + objects [0].getClass () + "} size = " + size + " bytes");
            // Release the references so the objects can be collected
            for (int i = 0; i < count; ++ i) objects [i] = null;
            objects = null;
        }

        private static void runGC () throws Exception
        {
            // It helps to call Runtime.gc()
            // using several method calls:
            for (int r = 0; r < 4; ++ r) _runGC ();
        }

        private static void _runGC () throws Exception
        {
            long usedMem1 = usedMemory (), usedMem2 = Long.MAX_VALUE;
            // Repeat until the used heap size stops shrinking (at most 500 times)
            for (int i = 0; (usedMem1 < usedMem2) && (i < 500); ++ i)
            {
                s_runtime.runFinalization ();
                s_runtime.gc ();
                Thread.yield ();
                usedMem2 = usedMem1;
                usedMem1 = usedMemory ();
            }
        }

        private static long usedMemory ()
        {
            return s_runtime.totalMemory () - s_runtime.freeMemory ();
        }

        private static final Runtime s_runtime = Runtime.getRuntime ();
    } // End of class

The key methods of Sizeof are runGC() and usedMemory(). I use the wrapper method runGC() to call _runGC() several times because doing so appears to make the collection more aggressive.
Note carefully where I call runGC(). You can edit the code between the heap1 and heap2 snapshots to instantiate whatever you are interested in.
Also note how Sizeof reports the object size: it takes the transitive closure of the data required by all count class instances and divides it by count. For most classes, the result is the memory consumed by a single class instance, including all the fields it owns.
This footprint differs from the shallow memory footprint reported by some commercial tools (for example, if an object has an int[] field, those tools report the array's memory consumption separately).
The results:
Let us apply this tool to a few classes and see whether the results match our expectations.
Note: The following results are based on JDK 1.3.1 on the Windows platform. They are not guaranteed to hold for other platforms or JDK versions.
• java.lang.Object
The base class of all objects is our first example. For java.lang.Object we get:

    'before' heap: 510696, 'after' heap: 1310696
    heap delta: 800000, {class java.lang.Object} size = 8 bytes

So a plain Object takes up 8 bytes. Of course, we cannot expect its size to be 0, because every instance must carry the data that supports the most basic operations, such as equals(), hashCode(), wait()/notify(), and so on.
• java.lang.Integer
We often wrap native int values in Integer instances so that we can store them in collections of objects. How much memory does that cost?

    'before' heap: 510696, 'after' heap: 2110696
    heap delta: 1600000, {class java.lang.Integer} size = 16 bytes

This 16-byte result is worse than we expected: an int value fits in just 4 bytes, so wrapping it in an Integer costs a 300% memory overhead (16 bytes instead of 4).
• java.lang.Long
A Long should take more space than an Integer, but it does not:

    'before' heap: 510696, 'after' heap: 2110696
    heap delta: 1600000, {class java.lang.Long} size = 16 bytes

Clearly, the actual object size on the heap depends on the low-level memory alignment that a particular JVM implementation performs for a particular CPU type. It appears that a Long is the 8 bytes of an Object plus 8 more bytes for the long value. The Integer, in contrast, contains an unused 4-byte hole, most likely because this JVM forces objects to align on an 8-byte word boundary.
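The alignment reasoning above can be written down as a few lines of arithmetic. The sketch below is not part of Sizeof: the class and method names are hypothetical, and the 8-byte header and 8-byte alignment are assumptions read off the JDK 1.3.1 measurements, so other JVMs may well behave differently.

```java
// Hypothetical helper that models the object sizes measured above.
final class AlignmentSketch {
    // Assumption from the measurements: a plain Object occupies 8 bytes.
    static final int HEADER_BYTES = 8;
    // Assumption from the measurements: sizes are padded to 8-byte multiples.
    static final int ALIGNMENT = 8;

    /** Rounds size up to the next multiple of alignment. */
    static int alignUp(int size, int alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    /** Estimated footprint of an object whose fields need fieldBytes bytes. */
    static int objectSize(int fieldBytes) {
        return alignUp(HEADER_BYTES + fieldBytes, ALIGNMENT);
    }

    public static void main(String[] args) {
        System.out.println("Object:  " + objectSize(0)); // 8: header only
        System.out.println("Integer: " + objectSize(4)); // 16: 8 + 4, padded
        System.out.println("Long:    " + objectSize(8)); // 16: 8 + 8, no padding
    }
}
```

Under these assumptions the model gives 8, 16 and 16 bytes, matching the three measurements: the Integer pays for 4 bytes of padding, which is why it is no smaller than a Long.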
• Arrays
Arrays of primitive types are instructive, partly because they reveal some hidden overhead and partly because they let us check another popular trick: wrapping a primitive value in a size-1 array in order to use it as an object. By modifying Sizeof.main() to use a loop that increases the array length on each iteration, we get for int arrays:

    length: 0, {class [I} size = 16 bytes
    length: 1, {class [I} size = 16 bytes
    length: 2, {class [I} size = 24 bytes
    length: 3, {class [I} size = 24 bytes
    length: 4, {class [I} size = 32 bytes
    length: 5, {class [I} size = 32 bytes
    length: 6, {class [I} size = 40 bytes
    length: 7, {class [I} size = 40 bytes
    length: 8, {class [I} size = 48 bytes
    length: 9, {class [I} size = 48 bytes
    length: 10, {class [I} size = 56 bytes

And for char arrays:

    length: 0, {class [C} size = 16 bytes
    length: 1, {class [C} size = 16 bytes
    length: 2, {class [C} size = 16 bytes
    length: 3, {class [C} size = 24 bytes
    length: 4, {class [C} size = 24 bytes
    length: 5, {class [C} size = 24 bytes
    length: 6, {class [C} size = 24 bytes
    length: 7, {class [C} size = 32 bytes
    length: 8, {class [C} size = 32 bytes
    length: 9, {class [C} size = 32 bytes
    length: 10, {class [C} size = 32 bytes

The 8-byte alignment shows up clearly again. In addition to the inevitable 8 bytes of Object overhead, a primitive array takes up another 8 bytes. And int[1] offers no memory advantage over an Integer, other than being a mutable version of the same data.
• Multidimensional arrays
Multidimensional arrays hold another surprise. Developers commonly use constructs such as int[dim1][dim2] in numerical and scientific computing. In an int[dim1][dim2] instance, every nested int[dim2] is an object in its own right, and each one adds the usual 16-byte array overhead. When I do not need a triangular or ragged array, that is pure overhead. The impact grows when the dimensions differ greatly.
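The array figures above fit a simple model, sketched below under stated assumptions: a 12-byte array header (the 8-byte object header plus a 4-byte length field), 4-byte references as on a 32-bit JVM, and rounding up to a multiple of 8 bytes. For an empty array that rounds to the 16 bytes mentioned above. These constants and all names are hypothetical, chosen only because they reproduce the measurements listed here; they are not taken from any JVM specification.

```java
// Hypothetical model of array footprints; constants are assumptions
// fitted to the JDK 1.3.1 measurements, not specified JVM behavior.
final class ArrayFootprintSketch {
    static final int ARRAY_HEADER_BYTES = 12; // assumed: 8-byte header + 4-byte length
    static final int REFERENCE_BYTES = 4;     // assumed: 32-bit JVM
    static final int ALIGNMENT = 8;           // assumed: 8-byte alignment

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Estimated size of a one-dimensional array. */
    static long arraySize(int length, int elementBytes) {
        return alignUp(ARRAY_HEADER_BYTES + (long) length * elementBytes);
    }

    /** Estimated size of a rectangular two-dimensional array:
     *  one outer array of references plus dim1 separate inner arrays. */
    static long nestedArraySize(int dim1, int dim2, int elementBytes) {
        return arraySize(dim1, REFERENCE_BYTES)
             + (long) dim1 * arraySize(dim2, elementBytes);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            System.out.println("int[" + n + "] = " + arraySize(n, 4)
                + " bytes, char[" + n + "] = " + arraySize(n, 2) + " bytes");
        }
        System.out.println("int[256]    = " + arraySize(256, 4) + " bytes");
        System.out.println("int[128][2] = " + nestedArraySize(128, 2, 4) + " bytes");
    }
}
```

The nested case makes the cost visible: every inner array pays its own header and padding, so many short rows cost far more than one long row of the same capacity.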
An int[128][2] instance, for example, takes 3,600 bytes. Compared with the 1,040 bytes of an int[256] instance, which has the same capacity, that is a 246% overhead. In the extreme case of byte[256][1], the overhead factor is almost 19! Compare that with C/C++, where the same syntax adds no storage overhead at all.
• java.lang.String
Let us try an empty String first, constructed as new String():

    'before' heap: 510696, 'after' heap: 4510696
    heap delta: 4000000, {class java.lang.String} size = 40 bytes

The result is rather depressing: an empty String takes 40 bytes, enough memory to hold 20 characters.
Before trying Strings with content, we need a helper method to create them. Simply using a literal, as in:

    object = "string with 20 chars";

will not work, because all such object references end up pointing to the same String instance. The language specification mandates this behavior (see also java.lang.String.intern()). So use this instead:

    public static String createString (final int length)
    {
        char [] result = new char [length];
        for (int i = 0; i < length; ++ i) result [i] = (char) i;
        return new String (result);
    }

With this creation method, we get the following results:

    length: 0, {class java.lang.String} size = 40 bytes
    length: 1, {class java.lang.String} size = 40 bytes
    length: 2, {class java.lang.String} size = 40 bytes
    length: 3, {class java.lang.String} size = 48 bytes
    length: 4, {class java.lang.String} size = 48 bytes
    length: 5, {class java.lang.String} size = 48 bytes
    length: 6, {class java.lang.String} size = 48 bytes
    length: 7, {class java.lang.String} size = 56 bytes
    length: 8, {class java.lang.String} size = 56 bytes
    length: 9, {class java.lang.String} size = 56 bytes
    length: 10, {class java.lang.String} size = 56 bytes

The results clearly show that a String's memory growth tracks the growth of its internal char array. However, the String class adds another 24 bytes of overhead.
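The relationship just described, a fixed 24 bytes on top of the internal char array, can be checked against the table with a short calculation. This is only a sketch: the names are hypothetical, the 24-byte figure comes from the measurements above, and the char array is modeled with an assumed 12-byte header rounded up to 8 bytes, which happens to reproduce the char[] figures from JDK 1.3.1.

```java
// Hypothetical model: String footprint = fixed overhead + char array footprint.
final class StringFootprintSketch {
    static final int STRING_OVERHEAD_BYTES = 24; // from the measurements above
    static final int ARRAY_HEADER_BYTES = 12;    // assumed array header size
    static final int ALIGNMENT = 8;              // assumed 8-byte alignment
    static final int CHAR_BYTES = 2;             // a Java char is 2 bytes

    /** Estimated size of a char[] of the given length. */
    static long charArraySize(int length) {
        long raw = ARRAY_HEADER_BYTES + (long) length * CHAR_BYTES;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Estimated size of a String holding length characters. */
    static long stringSize(int length) {
        return STRING_OVERHEAD_BYTES + charArraySize(length);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            System.out.println("length: " + n + ", estimated size = "
                + stringSize(n) + " bytes");
        }
    }
}
```

For lengths 0 through 10 the estimates step through 40, 48 and 56 bytes at the same lengths as the measured values.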
For a nonempty String of 10 characters or fewer, this added overhead, relative to the useful payload (2 bytes per character plus 4 bytes for the length), ranges from 100% to 400%.
What can we do?
"This is all very well, but we have no choice except to use String and the other types Java provides, do we?" I hear you ask. Let us find an answer.
• Wrapper classes
Wrapper classes such as java.lang.Integer look like a bad choice for holding large amounts of data in memory. If you want to be economical with memory, avoid them. Writing your own vector class for primitive ints is not difficult. Of course, it would be best if the Java core library already contained one. Perhaps this situation will improve considerably once Java has generic types.
• Multidimensional arrays
For large data structures built on multidimensional arrays, you can often remove the overhead of the extra dimension with a simple indexing change: convert each int[dim1][dim2] instance to an int[dim1 * dim2] instance, and change every expression a[i][j] to a[i * dim2 + j]. You give up index-range checking on the dim1 dimension, which also improves performance.
• java.lang.String
You can use a few tricks to reduce the static memory size of the Strings in your application.
First, you can try a very common technique that applies when an application loads or caches many Strings from a data file or network connection and the range of String values is limited. For example, suppose you parse an XML file in which a certain attribute occurs frequently but can take only two possible values. Your goal is to filter all Strings through a hash map, reducing all equal but distinct Strings to identical object references:

    public String internString (String s)
    {
        if (s == null) return null;
        String is = (String) m_strings.get (s);
        if (is != null)
            return is;
        else
        {
            m_strings.put (s, s);
            return s;
        }
    }

    private Map m_strings = new HashMap ();

When applicable, this trick can reduce your static memory needs severalfold.
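With generic collections, the same filtering idea can be written without casts. The following is a minimal sketch rather than a drop-in replacement: it assumes Java 8 or later for Map.putIfAbsent, and the class name StringCanonicalizer and method name canonicalize are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical generic variant of the hash-map filtering trick.
final class StringCanonicalizer {
    // Maps each distinct value to the first instance seen with that value.
    private final Map<String, String> strings = new HashMap<>();

    /** Returns the canonical instance equal to s, registering s if it is new. */
    String canonicalize(String s) {
        if (s == null) return null;
        // putIfAbsent returns the existing mapping, or null if s was just added.
        String existing = strings.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        StringCanonicalizer pool = new StringCanonicalizer();
        // Two equal but distinct instances, as a parser would produce them.
        String a = new String("true".toCharArray());
        String b = new String("true".toCharArray());
        System.out.println(a == b);                                       // false
        System.out.println(pool.canonicalize(a) == pool.canonicalize(b)); // true
    }
}
```

After filtering, only one "true" instance stays reachable through the cache; the duplicates become garbage as soon as the parser drops them.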
An experienced reader may notice that this trick duplicates the functionality of java.lang.String.intern(). There are many reasons to avoid the String.intern() method. One of them is that few current JVMs can intern large amounts of data.
What if your Strings are all different? The second trick relies on the observation that, for small Strings, the underlying char array takes only about half the memory occupied by the String that wraps it. So when an application caches many unique String values, it can keep just the char arrays in memory and convert them to Strings as needed. This works well if each such String is a temporary object that is discarded quickly. In a simple experiment, we cached 90,000 words from a dictionary file. As Strings, the data takes about 5.6 MB. As char arrays it needs only 3.4 MB, roughly 60% of the original size.
The second trick has one obvious disadvantage: you cannot turn a char[] back into a String through a constructor that takes ownership of the array without copying it. Why? Because the entire public String API guarantees that every String is immutable, every String constructor defensively copies the input data passed to it.
That brings us to the third trick, which applies when the cost of converting char arrays to Strings proves too high. It exploits the ability of java.lang.String.substring() to avoid copying data: in the JDK tested here, the method takes advantage of String immutability and creates a shallow String object that shares the character content of the original but has its internal start and end positions adjusted accordingly. For example, new String("smiles").substring(1, 5) is a String that starts at index 1 and ends at index 4 of the character buffer "smiles", which it shares by reference with the originally constructed String.
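The index convention in that example is easy to verify. The sketch below checks only the visible result of substring(); whether the new String shares the original character buffer is an implementation detail of the JDK in use (the measurements in this article are from JDK 1.3.1), so the code makes no claim about it. The class and method names are hypothetical.

```java
// Hypothetical demo of the substring() index convention.
final class SubstringSketch {
    /** Returns the characters of s from begin (inclusive) to end (exclusive). */
    static String slice(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String original = new String("smiles");
        String part = slice(original, 1, 5);
        System.out.println(part);          // prints "mile" (indices 1 through 4)
        System.out.println(part.length()); // prints 4, i.e. end - begin
        System.out.println(original);      // still "smiles": Strings are immutable
    }
}
```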
You can exploit this buffer sharing as follows: given a large collection of Strings, merge their character content into one large char array, create a String on top of it, and recreate the original Strings as substrings of this master String, as shown below:

    public static String [] compactStrings (String [] strings)
    {
        String [] result = new String [strings.length];
        // First pass: compute the total number of characters
        int offset = 0;
        for (int i = 0; i < strings.length; ++ i)
            offset += strings [i].length ();
        // Can't use StringBuffer due to how it manages capacity
        char [] allchars = new char [offset];
        // Second pass: copy every String into the shared array
        offset = 0;
        for (int i = 0; i < strings.length; ++ i)
        {
            strings [i].getChars (0, strings [i].length (), allchars, offset);
            offset += strings [i].length ();
        }
        String allstrings = new String (allchars);
        // Third pass: recreate each String as a substring of the master String
        offset = 0;
        for (int i = 0; i < strings.length; ++ i)
        {
            int end = offset + strings [i].length ();
            result [i] = allstrings.substring (offset, end);
            offset = end;
        }
        return result;
    }

The method above returns a new set of Strings equivalent to the input set but more compact in memory. Recall the 16-byte overhead of each char array: this method effectively removes it. The savings are most significant when the Strings being compacted are short. Applied to the same 90,000-word dictionary, the method reduces memory consumption from 5.6 MB to 4.2 MB, a reduction of about 25%.
• Are these efforts worth it?
The tricks described here may look like micro-optimizations. Are they worth the time it takes to implement them? Keep in mind the kind of application we started with: a server that caches a large amount of data in memory, which greatly improves performance compared with fetching the data from disk or a database. In a current 32-bit JVM, a few hundred megabytes of cached data take up a noticeable fraction of the heap. Reducing that by 30% or more should not be scoffed at, because it can measurably improve the performance of the system. Of course, these tricks are no substitute for starting with well-designed data structures, and actual decisions should be driven by profiling the application to find its real hotspots. In any case, you should now have a better idea of how much memory your objects consume.