The collection classes in java.util are among the most commonly used classes in Java, and the most frequently used collections are Lists and Maps. Concrete List implementations such as ArrayList and Vector are variable-sized lists, well suited to building, storing, and manipulating elements of any object type. Lists are accessed by integer position.
Map provides a more general way of storing elements. A Map stores pairs of elements (called "keys" and "values"), where each key maps to one value. Conceptually, you can view a List as a Map whose keys are integer indices. In fact, apart from both being defined in java.util, List and Map have no direct relationship. This article focuses on the Maps included in the core Java distribution, and also describes how to adopt or implement a specialized Map tailored to your application's specific data.
Understanding the Map Interface and Methods
The Java core classes include many predefined Map implementations. Before introducing the specific implementations, let's first look at the Map interface itself, to see what all of the implementations have in common. The Map interface defines four groups of methods, and every Map contains these methods. We begin with the two methods overridden from Object (Table 1).
Table 1: Overridden methods. Maps override these two Object methods to correctly compare the equivalence of Map objects.

equals(Object o)  Compares the specified object with this Map for equality
hashCode()        Returns the hash code for this Map
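As a quick illustration of these overridden methods, here is a minimal sketch (the class and method names are my own): two Maps holding the same key/value pairs compare equal, even across different implementations, because equals() compares the mappings themselves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapEqualityDemo {
    // Two Maps with the same key/value pairs are equal even if their
    // concrete classes differ, and the Map hashCode() contract then
    // guarantees their hash codes match as well.
    public static boolean sameMappings() {
        Map a = new HashMap();
        Map b = new TreeMap();
        a.put("one", Integer.valueOf(1));
        b.put("one", Integer.valueOf(1));
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(sameMappings()); // prints "true"
    }
}
```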
Building a Map
The Map interface defines several mutating methods for inserting and deleting elements (Table 2).
Table 2: Map update methods: these change the Map's contents.

clear()                        Removes all mappings from the Map
remove(Object key)             Deletes the key and its associated value from the Map
put(Object key, Object value)  Associates the specified value with the specified key
putAll(Map t)                  Copies all mappings from the specified Map into this Map
You might expect that putAll() could be no more efficient than a series of put() calls, since besides running the same insertion algorithm as put(), it must also iterate over the elements of the Map passed to it. But putAll() has an advantage: it can correctly resize the target Map before adding all the elements. So if you have not pre-sized the Map yourself (we will come to resizing later), putAll() may be more efficient than expected.
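A minimal sketch exercising the update methods from Table 2 (the class name is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class MapUpdateDemo {
    public static int sizeAfterUpdates() {
        Map map = new HashMap();
        map.put("a", "1");     // insert a new mapping
        map.put("a", "2");     // overwrite: put() returns the old value "1"
        map.remove("a");       // delete the key and its associated value

        Map other = new HashMap();
        other.put("b", "3");
        other.put("c", "4");
        map.putAll(other);     // copy every mapping from 'other' in one call
        return map.size();     // two mappings remain: b and c
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterUpdates()); // prints "2"
    }
}
```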
Viewing a Map
You cannot iterate over a Map's elements directly. To query a Map for elements satisfying a particular condition, or to iterate over all of its elements for whatever reason, you first need to obtain a "view" of the Map. There are three possible views (see Table 3):
All key/value pairs - see entrySet()
All keys - see keySet()
All values - see values()
The first two views return Set objects; the third returns a Collection object. In either case, the story does not end there, because you cannot directly iterate a Collection or Set object either. To iterate, you must obtain an Iterator. So to iterate over the elements of a Map, you must go through the somewhat cumbersome sequence:
Iterator keyValuePairs = aMap.entrySet().iterator();
Iterator keys = aMap.keySet().iterator();
Iterator values = aMap.values().iterator();
It is worth noting that these objects (Set, Collection, and Iterator) are views of the underlying Map, not copies of its elements. This makes them very efficient. By contrast, the toArray() method of the Collection or Set creates a new array object containing all of the Map's elements, so it is inefficient except when you actually need the elements in an array.
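A quick way to confirm that the views are live rather than copies (a minimal sketch; names are my own): removing a key from the keySet() view removes the whole mapping from the underlying Map.

```java
import java.util.HashMap;
import java.util.Map;

public class MapViewDemo {
    // Because keySet() is a live view, removing a key from it also
    // deletes the corresponding mapping from the underlying Map.
    public static int sizeAfterViewRemoval() {
        Map map = new HashMap();
        map.put("x", "1");
        map.put("y", "2");
        map.keySet().remove("x");   // deletes the mapping x -> 1 from 'map'
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterViewRemoval()); // prints "1"
    }
}
```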
I ran a small test (Test1 in the accompanying file) that uses a HashMap and compares the overhead of iterating over the Map's elements using the following two approaches:
int mapSize = aMap.size();

Iterator keyValuePairs1 = aMap.entrySet().iterator();
for (int i = 0; i < mapSize; i++)
{
    Map.Entry entry = (Map.Entry) keyValuePairs1.next();
    Object key = entry.getKey();
    Object value = entry.getValue();
    ...
}

Object[] keyValuePairs2 = aMap.entrySet().toArray();
for (int i = 0; i < mapSize; i++)
{
    Map.Entry entry = (Map.Entry) keyValuePairs2[i];
    Object key = entry.getKey();
    Object value = entry.getValue();
    ...
}

The test takes two measurements: one times the iteration alone, and the other also counts the overhead of creating the array with toArray(). The first measurement (ignoring the time needed to create the array) indicates that iterating over the array produced by the toArray() call is about 30%-60% faster than using the Iterator. However, once you include the cost of the toArray() call that creates the array, using the Iterator is actually 10%-20% faster. So if for some reason an array of the collection's elements has already been created, iterate over that array rather than over a view of the Map. But if you don't need that intermediate array, don't create it; iterate using the Iterator instead.

Table 3: Methods returning Map views: the objects these methods return let you traverse the Map's elements and also delete elements from the Map.

entrySet()  Returns a Set view of the mappings contained in the Map. Each element of the Set is a Map.Entry object whose key and value are accessed with the getKey() and getValue() methods (there is also a setValue() method)
keySet()    Returns a Set view of the keys contained in the Map. Deleting an element from the Set also deletes the corresponding mapping (key and value) from the Map
values()    Returns a Collection view of the values contained in the Map. Deleting an element from the Collection also deletes the corresponding mapping (key and value) from the Map

Accessing Elements

The Map access methods are listed in Table 4. Maps are usually optimized for access by key rather than by value; nothing in the Map definition guarantees this.
Typically, though, you can expect it to be true. For example, you can expect the containsKey() method to be as fast as get(). The containsValue() method, on the other hand, most likely has to scan all the values in the Map, so it may be considerably slower.

Table 4: Map access and test methods: these retrieve information about the Map's contents without changing them.

get(Object key)              Returns the value to which the Map maps the specified key
containsKey(Object key)      Returns true if the Map contains a mapping for the specified key
containsValue(Object value)  Returns true if the Map maps one or more keys to the specified value
isEmpty()                    Returns true if the Map contains no key-value mappings
size()                       Returns the number of key-value mappings in the Map

Timing containsKey() and containsValue() traversals of a large HashMap shows that containsValue() takes much longer - in fact, several orders of magnitude longer! (See Figure 1 and Figure 2, and Test2 in the accompanying file.) So if containsValue() is a performance problem in your application, it will show up quickly and can easily be identified by monitoring the application. In that case, you can probably devise an efficient alternative that provides the equivalent functionality. If you can't, one feasible solution is to create a second Map that uses the values of the first Map as its keys; containsValue() on the first Map then becomes the much more efficient containsKey() on the second Map.

Figure 1: Creating and running the Map test classes using JDeveloper
Figure 2: Monitoring application performance in JDeveloper

The Core Maps

Java comes with a variety of Map classes.
These Map classes can be classified into three groups:

General-purpose Maps, used to manage mappings in an application, usually implemented in the java.util package:
HashMap
Hashtable
Properties
LinkedHashMap
IdentityHashMap
TreeMap
WeakHashMap
ConcurrentHashMap

Special-purpose Maps, which you usually do not create yourself but access through other classes:
java.util.jar.Attributes
javax.print.attribute.standard.PrinterStateReasons
java.security.Provider
java.awt.RenderingHints
javax.swing.UIDefaults

An abstract class to help you implement your own Map classes:
AbstractMap

Hashing Internals: Hash-Mapping Technology

Almost all general-purpose Maps use hashing. Hashing is a very simple mechanism for mapping elements into an array, and you should understand how it works in order to make the best use of a Map. A hash-based Map structure consists of an internal array that stores the elements. Since the elements are stored in an array, there must be an indexing mechanism that determines where in the array any given key belongs. The mechanism needs to provide an integer index value smaller than the array size. This mechanism is called a hash function. In Java's hash-based Maps, the hash function converts an object into an integer suitable for indexing the internal array. You don't have to look far for an easy-to-use hash function: every object has a hashCode() method that returns an integer value. To map that value into the array, simply convert it to a positive value and take the remainder after dividing by the array size. The following is a simple Java hash function that works for any object:

int hashValue = Math.abs(key.hashCode()) % table.length;

(The % binary operator, called the remainder or modulo operator, divides the left-hand value by the right-hand value and returns the integer remainder.) In fact, until the 1.4 release, this was essentially the hash function used by the various hash-based Map classes.
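The simple hash function above can be exercised on its own (a sketch; the class name and the table size of 11 are illustrative):

```java
public class HashFunctionDemo {
    // The simple hash function from the text: convert hashCode() to a
    // non-negative value and take the remainder modulo the table size.
    public static int indexFor(Object key, int tableLength) {
        return Math.abs(key.hashCode()) % tableLength;
    }

    public static void main(String[] args) {
        int index = indexFor("some key", 11);
        // The resulting index always falls inside the internal array
        System.out.println(index >= 0 && index < 11); // prints "true"
    }
}
```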
But if you look at the code, you will see:

int hashValue = (key.hashCode() & 0x7FFFFFFF) % table.length;

which simply uses a faster mechanism for obtaining the same positive value. In version 1.4, the HashMap class implements a different, more complex hash function, based on Doug Lea's util.concurrent package (I will say more about Doug Lea later).

Figure 3: How hashing works

The figure illustrates the basic principle of hash mapping, but there is one thing we have not yet covered. Our hash function maps an arbitrary object to an array location, but what happens if two different keys map to the same location? This is inevitable; in hashing terminology, it is called a collision. Maps handle collisions by keeping a linked list of entries at the index location and simply adding colliding elements to that list. So a basic put() method for a hash-based Map might look like this:

public Object put(Object key, Object value)
{
    // Our internal array is an array of Entry objects
    // Entry[] table;

    // Get the hash code and map it to an index
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % table.length;

    // Loop through the linked list at table[index] to find out
    // whether we already have an entry for this key - if so, overwrite it
    for (Entry e = table[index]; e != null; e = e.next)
    {
        // We must check that the keys are equal, because different
        // key objects can have the same hash
        if ((e.hash == hash) && e.key.equals(key))
        {
            // This is the same key; overwrite the value
            // and return the old value from this method
            Object old = e.value;
            e.value = value;
            return old;
        }
    }

    // Still here, so this is a new key; just add a new Entry.
    // An Entry object holds a key object, a value object, an integer
    // hash, and a reference to the next Entry in the list.
    // Create a new Entry pointing at the start of the list,
    // and put that new Entry into the table
    Entry e = new Entry(hash, key, value, table[index]);
    table[index] = e;
    return null;
}

If you look at the source of the hash-based Maps, you will find that this is essentially how they work. There are further details to consider, such as handling null keys and values, and resizing the internal array. The put() method defined here also contains the algorithm of the corresponding get(), because inserting an entry involves searching the Map to find whether the key is already present (that is, get() uses the same algorithm as put(), but without the insertion and overwrite code). Linked lists are not the only way to resolve collisions; some hash maps use an alternative called "open addressing", which is not covered in this article.

Optimizing HashMaps

If the internal array of a hash map contained only one element, all entries would map to that one location, forming a long linked list. Since our updates and accesses use a linear search of the linked list, which is much slower than indexing directly into the array, this would be very inefficient. The time needed to access or update a linked list is proportional to the length of the list, whereas accessing or updating a single array element via the hash function is independent of the array size - in asymptotic terms (big-O notation), the former is O(n) and the latter is O(1).
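The put() method above, together with its matching get(), can be packaged as a tiny self-contained toy map (a sketch with illustrative names; fixed capacity, no null handling, no resizing):

```java
// A minimal, self-contained hash-based map illustrating how get()
// shares put()'s search algorithm, minus the insertion code.
public class SimpleHashMap {
    private static class Entry {
        final int hash;
        final Object key;
        Object value;
        Entry next;
        Entry(int hash, Object key, Object value, Entry next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private final Entry[] table = new Entry[11];   // fixed capacity, no resizing

    public Object put(Object key, Object value) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {  // same key: overwrite
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
        table[index] = new Entry(hash, key, value, table[index]);
        return null;
    }

    // get() walks the same bucket list as put(), but never inserts
    public Object get(Object key) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleHashMap map = new SimpleHashMap();
        map.put("a", "1");
        map.put("a", "2");                      // overwrite
        System.out.println(map.get("a"));       // prints "2"
        System.out.println(map.get("missing")); // prints "null"
    }
}
```

Note how every operation walks the bucket's linked list: as buckets fill up, that walk is exactly the linear search discussed above.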
Therefore, it makes sense to use a large array rather than letting too many entries pile up in too small an array.

Resizing Map Implementations

In hashing terminology, each position in the internal array is called a "bucket", and the number of buckets available (that is, the size of the internal array) is called the capacity. To let a Map handle any number of entries efficiently, Map implementations can resize themselves. But resizing is very expensive. It requires re-inserting every element into the new array, because a different array size means that objects now map to different index values. Keys that previously collided may no longer collide, and other keys that did not collide may now do so. This clearly suggests that if you size the Map large enough in advance, resizing can be reduced or eliminated entirely, which is likely to improve speed significantly. I ran a simple test using a 1.4.2 JVM, filling a HashMap with a large number of entries (over a million). Table 5 shows the results, with all times normalized to the pre-sized server-mode run (Test3 in the accompanying file). For the pre-sized Maps, the client-mode and server-mode JVM runs took almost the same time (after discarding the JIT compilation phase). Using the default Map size, however, triggers many resize operations whose overhead is substantial: over 50% longer in server mode, and nearly three times as long in client mode!

Table 5: Time needed to populate a pre-sized HashMap compared with a default-sized HashMap.

              Client mode  Server mode
Pre-sized     100%         100%
Default size  294%         157%

Using the Load Factor

To determine when to resize, rather than tracking the depth of the linked list in every bucket, a hash-based Map uses an extra parameter that roughly measures the density of the buckets.
The parameter, called the load factor, indicates how much "load" the Map will bear before it resizes, that is, how full it is allowed to get. The relationship between the load factor, the number of entries (the Map size), and the capacity is simple: if (load factor) x (capacity) > (Map size), the Map resizes. For example, if the default load factor is 0.75 and the default capacity is 11, then 11 x 0.75 = 8.25, which is rounded down to 8 elements. So when the eighth entry is added to that Map, it resizes itself to a larger capacity. Conversely, to calculate the initial capacity needed to avoid resizing, divide the number of entries you will add by the load factor and round up. For 100 entries and a load factor of 0.75, the capacity should be set to 100/0.75 = 133.33, rounded up to 134 (or to 135 to use an odd number). Odd numbers of buckets let a Map work more efficiently by reducing collisions; although my test (Test4 in the accompanying file) did not show that prime numbers always give better efficiency, ideally the capacity would be a prime number. Some post-1.4 Maps (such as HashMap and LinkedHashMap, but not Hashtable or IdentityHashMap) use a hash function that requires a power-of-two capacity; these Maps compute the next-highest power of two themselves, so you don't have to. The load factor itself is a compromise between space and time. A smaller load factor takes more space but reduces the likelihood of collisions, thereby speeding up accesses and updates. Using a load factor greater than 0.75 is probably unwise, and one greater than 1.0 is definitely misguided, since it is guaranteed to cause collisions.
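The capacity calculation described above can be sketched as follows (the class and method names are my own):

```java
public class CapacityDemo {
    // Initial capacity needed to hold 'entries' items without a resize:
    // divide by the load factor and round up, as described in the text.
    public static int initialCapacity(int entries, float loadFactor) {
        return (int) Math.ceil(entries / loadFactor);
    }

    public static void main(String[] args) {
        // 100 entries at the default 0.75 load factor:
        // 100/0.75 = 133.33, rounded up to 134
        System.out.println(initialCapacity(100, 0.75f)); // prints "134"
    }
}
```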
There is little benefit to load factors below 0.50, but as long as you size the Map effectively, a small load factor incurs no performance cost, only a memory cost. A smaller load factor does mean more frequent resizing if you do not pre-size the Map, which hurts performance, so pay attention to resizing when you adjust the load factor.

Choosing the Right Map

Which Map should I use? Should it be synchronized? For getting the best performance out of your application, these may be the two most important questions. For general-purpose Maps, pre-sizing the Map and choosing the load factor cover the available tuning options. Here is a simple recipe for obtaining the best Map performance:

1. Declare your Map variables as Map, not as any specific implementation - that is, not as HashMap or Hashtable or any other Map class implementation.

Map criticalMap = new HashMap();     // good
HashMap criticalMap = new HashMap(); // bad

This lets you replace any particular Map instance with only a one-line code change.
2. Download Doug Lea's util.concurrent package (http://gee.cs.oswego.edu/dl/classes/edu/oswego/cs/dl/util/concurrent/intro.html). Use ConcurrentHashMap as your default Map. When porting to version 1.5, use java.util.concurrent.ConcurrentHashMap as your default Map. Do not wrap the ConcurrentHashMap in a synchronized wrapper, even if it will be used by multiple threads.
3. Use the default capacity and load factor.
4. Monitor your application. If a Map turns out to be a bottleneck, analyze the cause and change some or all of the following: the Map class; the Map size; the load factor; the implementation of the key objects' equals() method.

Special-purpose Maps are essentially custom Map implementations built for a special purpose; unless you need one, the general-purpose Maps should achieve the performance goals you require.
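A small sketch of why declaring variables as Map pays off (the class name is illustrative): the same code runs unchanged against any implementation, so swapping the concrete class is a one-line change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SwapImplementationDemo {
    // Code written against the Map interface works with any
    // implementation: switching classes changes only the 'new' line.
    public static int countEntries(Map criticalMap) {
        criticalMap.put("x", "1");
        criticalMap.put("y", "2");
        return criticalMap.size();
    }

    public static void main(String[] args) {
        System.out.println(countEntries(new HashMap())); // prints "2"
        // Same code, different implementation - e.g. if ordering is needed:
        System.out.println(countEntries(new TreeMap())); // prints "2"
    }
}
```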
Map Selection

Perhaps you expected more elaborate considerations - was that really so easy? Well, let's take it step by step. First, which Map should you use? The answer is simple: don't choose any specific Map for your design unless the design actually requires a special type of Map. You usually do not need to select a specific Map implementation: you may know that you need a Map, but not which implementation to use. That is precisely the point of the Map interface. Choose the implementation when you need to - and if you declare your variables as "Map", replacing any particular Map implementation anywhere in your application takes only a one-line change, a minimal expense. Should you use the default Map implementation? I will come to that question shortly.

Synchronized Maps

What about synchronization? (For synchronization, you can use an inherently synchronized Map, or convert an unsynchronized Map into a synchronized one with Collections.synchronizedMap(); the latter uses a "synchronized wrapper".) This is an exceptionally complicated choice that depends entirely on how the Map is used with respect to multithreaded accesses and updates, and it has maintenance implications as well. For example, what happens if a Map starts out with no concurrent updates but later acquires them? In that case, it is easy to begin with an unsynchronized Map and forget to change it to a synchronized one when concurrent update threads are added later. That leaves your application prone to crashes (among the worst kinds of errors to identify and track down). But if everything defaults to synchronized, heavily multithreaded applications suffer dreadful performance from lock contention. It seems we need some kind of decision tree to help us choose correctly.

Doug Lea is a professor in the Computer Science Department at the State University of New York at Oswego.
He created a set of public-domain packages (collectively called util.concurrent) containing many utility classes that simplify high-performance parallel programming. These classes include two Maps, ConcurrentReaderHashMap and ConcurrentHashMap. These Map implementations are thread-safe, do not require synchronization of concurrent accesses or updates, and suit most situations where a Map is needed. They are also far more scalable than synchronized Maps (such as Hashtable) or synchronized wrappers, and their performance penalty relative to HashMap is small. The util.concurrent package forms the basis of JSR 166, which produced the concurrency utilities included in Java 1.5; Java 1.5 includes these Maps in the new java.util.concurrent package. All of this means that you don't need a decision tree to choose between a synchronized and an unsynchronized Map: just use ConcurrentHashMap. There are, of course, situations where ConcurrentHashMap is not appropriate, but they are rare and should be handled as special cases - and that is what monitoring is for.

Conclusion

I used Oracle JDeveloper to create the test classes that compare the performance of the various Maps. More important, its integrated monitor makes it quick and easy to identify performance bottlenecks; monitoring integrated into the IDE during development is typically a great help in building a successful project. Now that you have a monitor and understand the fundamentals of general-purpose Maps and their performance, you can start running your own tests to determine whether your application has Map-related bottlenecks, and where you may need to change the Maps it uses. This article has covered the basics of general-purpose Maps and their performance. There are, of course, more complex and worthwhile topics concerning specific Map implementations and how to use them for different needs; these will be covered in Part 2 of this article.