Java Theory and Practice: Concurrent Collections


ConcurrentHashMap and CopyOnWriteArrayList offer thread safety and improved scalability

Level: Introductory

Brian Goetz (Brian@quiotix.com), Principal Consultant, Quiotix Corp, September 2003

Doug Lea's util.concurrent package includes, in addition to many other useful concurrency building blocks, high-performance, thread-safe implementations of List and Map. In this month's Java Theory and Practice, Brian Goetz shows you how many concurrent programs will benefit from replacing Hashtable or synchronizedMap with ConcurrentHashMap. You can share your thoughts on this article with the author and other readers in the accompanying discussion forum (you can also reach the forum by clicking Discuss at the top or bottom of the article).

The first associative collection to appear in the Java class library was Hashtable, part of JDK 1.0. Hashtable provides an easy-to-use, thread-safe associative map, which is certainly convenient. However, that thread safety comes at a cost: all of Hashtable's methods are synchronized, and at the time, even uncontended synchronization carried a considerable performance cost. Hashtable's successor, HashMap, appeared as part of the Collections framework in JDK 1.2 and addressed thread safety by providing an unsynchronized base class plus a synchronized wrapper, Collections.synchronizedMap. By separating the base functionality from thread safety, Collections.synchronizedMap lets users who need synchronization have it, while users who don't need it don't have to pay for it.

The simple approach to synchronization, whether a synchronized Hashtable or a synchronized Map wrapper object, has two principal shortcomings. First, it is an obstacle to scalability, because only one thread can access the hash table at a time. Second, it is still not sufficient to provide true thread safety: many common compound operations require additional synchronization. Although simple operations such as get() and put() can complete safely without extra synchronization, several common sequences of operations, such as iteration or put-if-absent, still require external synchronization to avoid data races.

Conditional thread safety

The synchronized collection wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe: every individual operation is thread-safe, but compound sequences of operations can still produce data races, because the control flow of the sequence depends on the results of earlier operations. The first snippet in Listing 1 shows the common put-if-absent idiom: if an entry is not already in the map, add it. Unfortunately, between the time containsKey() returns and the time put() is called, another thread may insert a value under the same key. If you want to ensure exactly one insertion, you must wrap the pair of statements in a synchronized block that synchronizes on the Map m.

The other examples in Listing 1 concern iteration. In the first, the result of List.size() can become invalid during execution of the loop, because another thread may delete entries from the list. With unlucky timing, if an entry is removed by another thread just after the loop's last iteration begins, List.get() will return null and doSomething() will likely throw a NullPointerException. What can you do to avoid this? If another thread may be accessing the list while you iterate it, you must wrap the iteration in a synchronized block that synchronizes on the List l, locking the entire list. This solves the data race, but at a cost to concurrency: locking the whole list blocks other threads from accessing it for as long as the iteration runs. The Collections framework introduced iterators for traversing a list or other collection, which optimizes the process of iterating over a collection's elements. However, the iterators implemented in the java.util collection classes are fail-fast: if one thread is iterating a collection while another thread modifies it, the next Iterator.hasNext() or Iterator.next() call throws a ConcurrentModificationException. So, just as in the previous example, if you want to prevent ConcurrentModificationException, you must wrap the iteration in a synchronized block that synchronizes on the List l, thereby locking the entire list. (Alternatively, you can call List.toArray() and iterate over the array without synchronization, but this can be costly if the list is large.)

Listing 1. Common race conditions in synchronized maps

Map m = Collections.synchronizedMap(new HashMap());
List l = Collections.synchronizedList(new ArrayList());

// put-if-absent idiom -- contains a race condition
// may require external synchronization
if (!m.containsKey(key))
    m.put(key, value);

// ad-hoc iteration -- contains race conditions
// may require external synchronization
for (int i = 0; i < l.size(); i++) {
    doSomething(l.get(i));
}

// normal iteration -- can throw ConcurrentModificationException
// may require external synchronization
for (Iterator i = l.iterator(); i.hasNext(); ) {
    doSomething(i.next());
}
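The fix the text describes, holding the collection's own lock across the whole compound operation, can be sketched as follows. This is a minimal illustration: the helper class and method names (SafeCompositeOps, putIfAbsent, forEachSafely, doSomething) are invented for this sketch, not part of any library API.

```java
import java.util.*;

// Illustrative external synchronization for compound operations on
// the synchronized collection wrappers; names here are hypothetical.
public class SafeCompositeOps {
    // Hold the map's lock across the whole check-then-act sequence,
    // so no other thread can insert between containsKey() and put()
    public static Object putIfAbsent(Map m, Object key, Object value) {
        synchronized (m) {
            if (!m.containsKey(key)) {
                return m.put(key, value); // returns null: key was absent
            }
            return m.get(key);            // returns the existing value
        }
    }

    // Hold the list's lock across the whole traversal, so no other
    // thread can modify the list mid-iteration
    public static void forEachSafely(List l) {
        synchronized (l) {
            for (Iterator i = l.iterator(); i.hasNext(); ) {
                doSomething(i.next());
            }
        }
    }

    static void doSomething(Object o) {
        System.out.println(o);
    }

    public static void main(String[] args) {
        Map m = Collections.synchronizedMap(new HashMap());
        putIfAbsent(m, "key", "first");
        putIfAbsent(m, "key", "second"); // no-op: key already present
        System.out.println(m.get("key")); // prints "first"

        List l = Collections.synchronizedList(new ArrayList());
        l.add("a");
        l.add("b");
        forEachSafely(l);
    }
}
```

Note that correctness depends on every thread using the same lock, which is why the wrappers' documentation requires clients to synchronize on the wrapper object itself when iterating.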

The illusion of trust

The conditionally thread-safe nature of synchronizedList and synchronizedMap presents a hidden hazard: developers assume that because these collections are synchronized, they are fully thread-safe, and so they neglect to synchronize compound operations properly. The result is that while these programs appear to work under light load, under heavy load they begin throwing NullPointerException or ConcurrentModificationException.

Scalability problems

Scalability describes how an application's performance behaves as its workload and available computing resources increase. A scalable program can handle a proportionally larger workload by using more processors, memory, or I/O bandwidth. Locking a shared resource for exclusive access creates a scalability bottleneck: it prevents other threads from accessing that resource, even if idle processors are available to schedule those threads. To achieve scalability, we must eliminate or reduce our dependence on exclusive resource locks.

The problem with the synchronized collection wrappers, and with the earlier Hashtable and Vector classes, is that they synchronize on a single lock. This means that only one thread can access the collection at a time; if one thread is reading a Map, all other threads that want to read or write it must wait. The most common Map operations, get() and put(), can involve more work than is apparent: to find a given key, get() may have to traverse a long chain of hash buckets, calling Object.equals() on a large number of candidates. If the hashCode() function used by the key class does not distribute values evenly across the hash table, or if there are many hash collisions, some bucket chains will be much longer than others, and traversing a long hash chain, calling equals() on some percentage of its elements, is very slow. Under these conditions, the problem with the high cost of get() and put() is not just that the access itself is slow, but that while one thread is traversing a hash chain, all other threads are locked out and cannot access the Map.

(A hash table stores objects in buckets according to a number called a hash value, which is computed from the contents of the object. Each distinct hash value selects a bucket. To find an object, you compute its hash value and search the corresponding bucket; by quickly locating the right bucket, you greatly reduce the number of objects that must be examined. -- Translator's note)
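To make the bucket-and-chain cost concrete, here is a toy chained hash table. This is an illustration only, with invented names; it is not how HashMap or Hashtable is actually implemented.

```java
import java.util.*;

// A toy chained hash table illustrating the lookup cost described
// above; names and structure are illustrative, not JDK internals.
public class ToyHashTable {
    private final List[] buckets;

    public ToyHashTable(int nBuckets) {
        buckets = new List[nBuckets];
        for (int i = 0; i < nBuckets; i++) {
            buckets[i] = new LinkedList();
        }
    }

    // Map a key's hash value to a bucket index; a poorly distributed
    // hashCode() sends many keys to the same bucket, making chains long
    private int bucketFor(Object key) {
        return (key.hashCode() & 0x7FFFFFFF) % buckets.length;
    }

    public void add(Object key) {
        buckets[bucketFor(key)].add(key);
    }

    // Lookup cost: traverse the chain, calling equals() on each
    // candidate until a match is found -- slow when chains are long
    public boolean contains(Object key) {
        for (Iterator it = buckets[bucketFor(key)].iterator(); it.hasNext(); ) {
            if (it.next().equals(key)) {
                return true;
            }
        }
        return false;
    }
}
```

In a synchronized table, the entire chain traversal in contains() happens while holding the single table-wide lock, which is exactly the window during which all other threads are blocked.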

So get() can take a significant amount of time to execute, and in some situations the conditional thread-safety problems discussed earlier make matters much worse. The race conditions illustrated in Listing 1 often make it necessary to keep holding the lock on a single collection well after the individual operations have completed. If you hold the collection's lock for the duration of an entire iteration, other threads may be stalled outside the lock for a long time, waiting for it to be released.

Example: a simple cache. One of the most common applications of hash tables in server applications is implementing a cache. Server applications may need to cache file contents, generated pages, results of database queries, DOM trees associated with parsed XML files, and many other types of data. The primary purpose of a cache is to reuse the results of previous processing, reducing service time and increasing throughput. A typical characteristic of a cache workload is that retrievals far outnumber updates, so (ideally) a cache should offer very good get() performance. A cache that hinders performance, however, is worse than no cache at all.

If you use synchronizedMap to implement a cache, you have introduced a potential scalability bottleneck into your application, because only one thread can access the Map at a time, whether it is retrieving a value from the Map or inserting a new (key, value) pair into it.

Reducing lock granularity

One way to improve the concurrency of a HashMap while still providing thread safety is to abolish the single lock for the entire table and instead use a lock for each bucket of the hash table (or, more commonly, a pool of locks, with each lock guarding a few buckets). This means that multiple threads can access different parts of the Map simultaneously, without contending for a single collection-wide lock. This approach directly improves the scalability of insertion, retrieval, and removal operations. Unfortunately, this concurrency comes at a price: it makes methods that operate on the whole collection, such as size() or isEmpty(), harder to implement, because those methods would either have to acquire many locks at once or risk returning an incorrect result. For some situations, however, such as implementing a cache, this is a good trade-off, because retrievals and insertions are frequent while size() and isEmpty() are much rarer.
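The lock-pool idea can be sketched as follows. This is a deliberately minimal illustration of lock striping, with invented names (StripedMap, stripeFor); ConcurrentHashMap's real implementation is considerably more sophisticated.

```java
import java.util.*;

// Minimal lock-striping sketch: a fixed pool of locks, each guarding
// one sub-map of buckets. Illustrative only, not a production design.
public class StripedMap {
    private static final int N_STRIPES = 16;
    private final Object[] locks = new Object[N_STRIPES];
    private final Map[] buckets = new Map[N_STRIPES];

    public StripedMap() {
        for (int i = 0; i < N_STRIPES; i++) {
            locks[i] = new Object();
            buckets[i] = new HashMap();
        }
    }

    private int stripeFor(Object key) {
        return (key.hashCode() & 0x7FFFFFFF) % N_STRIPES;
    }

    // Threads touching keys in different stripes do not contend
    public Object get(Object key) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return buckets[s].get(key);
        }
    }

    public Object put(Object key, Object value) {
        int s = stripeFor(key);
        synchronized (locks[s]) {
            return buckets[s].put(key, value);
        }
    }

    // Whole-collection operations are the weak point: this version
    // locks each stripe in turn, so under concurrent modification the
    // result is only approximate -- the size()/isEmpty() trade-off
    // described in the text
    public int size() {
        int total = 0;
        for (int i = 0; i < N_STRIPES; i++) {
            synchronized (locks[i]) {
                total += buckets[i].size();
            }
        }
        return total;
    }
}
```

With 16 stripes, up to 16 threads can read or write disjoint parts of the map at once, rather than queueing on one lock.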

ConcurrentHashMap

The ConcurrentHashMap class in the util.concurrent package (and in the java.util.concurrent package in JDK 1.5) is a thread-safe implementation of Map that offers far better concurrency than synchronizedMap. Multiple reads can almost always execute concurrently, reads and writes can usually execute concurrently, and multiple concurrent writes can often proceed as well. (The related ConcurrentReaderHashMap class offers similar concurrency for multiple reader threads, but allows only one active writer thread.) ConcurrentHashMap is designed to optimize retrieval operations; in fact, a successful get() usually acquires no locks at all. Achieving thread safety without locking requires considerable subtlety, as well as a deep understanding of the details of the Java Memory Model. The ConcurrentHashMap implementation, along with the rest of the util.concurrent package, has been extensively reviewed by concurrency experts for correctness and thread safety. In next month's article, we will look at the details of the ConcurrentHashMap implementation.

ConcurrentHashMap achieves its higher concurrency by slightly relaxing the promises it makes to callers. A retrieval operation will return the value inserted by the most recently completed insert operation, and may also return a value being added by an insert operation that is concurrently in progress (but it will never return a nonsensical result). Iterators returned by ConcurrentHashMap.iterator() return each element at most once and never throw ConcurrentModificationException, but may or may not reflect insertions or removals that occur after the iterator was constructed. No table-wide lock is needed (or even possible) to provide thread safety when iterating the collection. ConcurrentHashMap may be used as a replacement for synchronizedMap or Hashtable in any application that does not rely on locking the entire table.

These improvements allow ConcurrentHashMap to offer far better scalability than Hashtable, without sacrificing efficiency for many common use cases, such as shared caches.
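A small demonstration of both points, the drop-in Map replacement and the iterator that tolerates concurrent-style modification, can be written against JDK 5's java.util.concurrent version (the util.concurrent version described here has the same shape). The cache keys and values below are invented for illustration.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Demonstrates ConcurrentHashMap as a drop-in Map replacement whose
// iterators never throw ConcurrentModificationException.
public class ConcurrentCacheDemo {
    public static void main(String[] args) {
        Map cache = new ConcurrentHashMap(); // same Map interface, no table-wide lock
        cache.put("page:/index", "<html>index</html>");

        // Modifying the map during traversal is safe: the iterator may
        // or may not reflect the new entry, but it never fails
        for (Iterator i = cache.keySet().iterator(); i.hasNext(); ) {
            i.next();
            cache.put("page:/about", "<html>about</html>");
        }
        System.out.println(cache.size()); // prints 2
    }
}
```

The same loop over a Collections.synchronizedMap-wrapped HashMap would throw ConcurrentModificationException on the put() inside the traversal.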

How much better? Table 1 offers a rough comparison of the scalability of Hashtable and ConcurrentHashMap. In each run, N threads executed a tight loop retrieving random keys from a Hashtable or ConcurrentHashMap, where 80 percent of the failed retrievals performed a put() operation and 1 percent of the successful retrievals performed a remove() operation. The tests ran on a dual-processor Xeon system running Linux. The data shows the run time for 10,000,000 iterations in milliseconds, normalized to the single-threaded ConcurrentHashMap case. As you can see, as the number of threads increases, the performance of ConcurrentHashMap remains scalable, while the performance of Hashtable degrades almost immediately in the presence of lock contention. The number of threads in this test may look small compared with a typical server application, but because each thread does nothing but hammer on the table continuously, this roughly simulates the contention of a much larger number of threads using the table in a real environment.

Table 1. Scalability comparison of Hashtable and ConcurrentHashMap

Threads    ConcurrentHashMap    Hashtable
1          1.00                 1.03
2          2.59                 32.40
4          5.58                 78.23
8          13.21                163.48
16         27.58                341.21
32         57.27                778.41

CopyOnWriteArrayList

The CopyOnWriteArrayList class is intended to replace ArrayList in concurrent applications where traversals vastly outnumber insertions and removals. This is quite common when an ArrayList is used to store a list of listeners, as in AWT or Swing applications, or in JavaBean classes generally. (The related CopyOnWriteArraySet uses a CopyOnWriteArrayList to implement the Set interface.)

If you use an ordinary ArrayList to store a listener list, then as long as the list is mutable and may be accessed by multiple threads, you must either clone the list before iterating it, or lock the entire list for both the clone operation and the iteration, both of which are expensive. Instead, whenever a mutative operation is performed on the list, CopyOnWriteArrayList creates a fresh copy of the underlying array, and its iterators are guaranteed to return the state of the list as of the moment the iterator was created, never throwing ConcurrentModificationException. You don't have to clone the list before iterating it, or lock it during iteration, because the copy of the array the iterator sees will never change. In other words, CopyOnWriteArrayList holds a mutable reference to an immutable array, so as long as you hold on to that reference, you get the thread-safety benefits of immutability without needing to lock the list.
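The listener-list pattern described above can be sketched as follows. The Listener interface and event type are invented for illustration; this is not a real AWT/Swing API, just the shape of the pattern.

```java
import java.util.*;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of a listener list backed by CopyOnWriteArrayList; the
// listener and event types here are hypothetical.
public class ListenerListDemo {
    public interface Listener {
        void onEvent(String event);
    }

    // Event dispatch (reads) vastly outnumbers add/remove (writes),
    // which is exactly the workload CopyOnWriteArrayList suits
    private final List listeners = new CopyOnWriteArrayList();

    public void addListener(Listener l) {
        listeners.add(l);
    }

    public void removeListener(Listener l) {
        listeners.remove(l);
    }

    // Dispatch iterates a snapshot: no cloning, no locking, and no
    // ConcurrentModificationException even if a listener registers or
    // unregisters (on this or another thread) during dispatch
    public void fire(String event) {
        for (Iterator i = listeners.iterator(); i.hasNext(); ) {
            ((Listener) i.next()).onEvent(event);
        }
    }
}
```

The trade-off is that every add or remove copies the whole array, which is why this class only pays off when traversals dominate mutations.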

