Apply HashTables in Java

xiaoxiao2021-03-06  63

Hashtables offers a very useful way to make the application's performance optimal.

By Pete Ford

Hashtables (hash table) is not a new concept in the computer field. They are used to speed up the processing speed of the computer, with today's standards, very slow, and they can quickly find a special entry when querying many data entries quickly. Although the modern machine speed has been several thousand times, it is still a very useful method for the best performance of the application.

Imagine, you have a data file containing approximately one thousand records, such as a small business customer record ?? There is a program that reads the record in memory. Each record contains a unique five-digit customer ID number, customer name, address, account balance, and more. Suppose the record is not classified in the customer ID number, so if the program is to find a special customer record as "key", the only lookup method is to search for each record continuously. Sometimes, it will soon find the record you need; but sometimes, before the program finds the record you need, it has almost searched the last record. If you want to search in 1,000 records, then find any records require average to check 500.5 ((1000 1) / 2) records. If you often need to find data, you should need a faster way to find a record.

Figure 1. Method becomes better and better

A method of speeding up a search is to divide a record into several sections, so you don't have to search for a big list, but search for several short lists. For our digital customer ID number, you can build 10 lists to form a list with a list of 0, which is a list, and it is pushed in this class. So you want to find the customer ID number 38016, you only need to search for a list of 3 starts. If there is a 1,000 record, each list has average length of 100 (1,000 records being divided into 10 lists), then searching the average comparison number of records is down 50 (see Figure 1).

Of course, if the client number is starting with 0, one tenth is one tap, and so, so this method is very suitable. If 90% of the customer number starts with 0, then the list will have 900 records, each finding average, 450 comparisons. In addition, 90% of the program needs to be executed are for numbers at the beginning of 0. Therefore, the average comparison has greatly exceeded the range of simple mathematics operations.

If we can assign records in our list in such a way, it will be better, that is, each list has the same entry record, regardless of the distribution of numbers in the key value. We need a way to mix the customer number and better distribute the results. For example, we can take each digit in the number, multiplied by a big number (different from the number of numbers), then add the result to a total, put this number with 10, and the remainder Index (INDEX). When reading the record, the program runs this hash (haveh) function on the customer number to determine which list belongs to. When the user needs to query, the same hash function is used as a "key" for the customer number, so you can search the correct list. A data structure like this is called a hashtable.

Hashtables in Java

Java contains two classes, java.util.hashtable and java.util.hashmap, which offer a wide range of HashTable mechanisms. These two classes are similar, usually provided with the same public interface. But they do have some important differences, I will talk later.

HashTable and Hashmap objects allow you to combine a key and a value and use the PUT () method to enter this to the key / value. Then you can get this value as a parameter by calling the get () method. As long as the two basic requirements are met, Key and Value can be any object. Note that because KEY and VALUE must be an object, the primitive type must be converted to an object by using methods such as Integer (int). In order to use a specific class object to be used to make a key, this class must provide two methods, equals () and havehcode (). These two methods are in java.lang.Object, so all classes can inherit these two methods; however, these two methods are generally not used in the Object class, so you usually need to overload these two method.

The equals () method compares its objects to another object if the two objects represent the same information, returns True. This method also views and make sure that both objects belong to the same class. If the two reference objects are exactly the same object, Object.Equals () returns True, which means why this method is usually not very suitable. In most cases, you need a method to compare a field, so we think that different objects representing the same data are equal.

The HashCode () method generates an int value by performing a hash function by using the content of the object. HashTable and havehmap use this value to calculate a pair of Key / Value in which bucket (or list) is located.

As an example, we can check the String class because it has its own way to implement these two methods. String.equals () compares two String objects a character, if the string is the same, return true:

String myname = "Einstein";

// the folowing test is

// ALWAYS TRUE

IF (MyName.equals ("Einstein"))))

{...

String.hashcode () runs the hash function on a string. The digital code of each character in the string is multiplied by 31, depending on the location of the characters in the string. These calculations are then added to obtain a total. This process seems to be very complicated, but it ensures better distribution value. It also proves how far you can go when you develop your own havehcode () method, the result is unique.

For example, suppose I use a HashTable to implement a book directory, and search the ISBN number of the book as a search button. I can use the String class to carry details and are ready for equals () and havehcode () methods (see List 1). We can add pairs of key / value to havehtable (see List 2) with the PUT () method.

The PUT () method accepts two parameters that belong to the Object type. The first parameter is Key; the second parameter is Value. The PUT () method calls key's HashCode () method, with this result with the number of lists in the table. Determine the remainder as an index value to determine which list is added to. Note that Key is unique in the table; if you use a already existing key to call Put (), the matching entry is modified, so it refers to a new value, and the old value is returned ( When Key does not exist in the table, the PUT () returns a null value).

To read a value in the table, we use the search key for the get () method. It returns an Object reference to the correct type: BookRecord Br = (BoookRecord) isbntable.get (

"0-345-40946-9");

System.out.println (

Author: " Br.author

"Title:" Br.TITLE);

Another useful method is remove (), which is almost the same as get (), which removes the entry from the table and returns to the call.

Your own class

If you want to use an original type to make a key, you must create an equivalent type of object. For example, if you want to use an integer key, you should use the constructor Integer (int) to generate an object from the integer. All encapsulation classes such as Integer, Float and Boolean are all aspects of the original value, they overload the equals () and havehcode () methods, so they can be used as KEY. Many of the other classes provided in JDK are like this (even the HashTable and HashMap classes achieve their own equals () and havehcode () methods), but you should view the files before doing any class objects, you should view the file. To view the source, see how equals () and havehcode () are implemented, it is also necessary. For example, Byte, Character, Short, and Integer returns the integer value represented as the hash code. This may be suitable, or you may not suit your needs.

Use HashTables in Java (continued)

-------------------------------------------------- ------------------------------

If you want to create a HashTable, this hashtable uses a class of the class yourself as a key, then you should confirm that this class's equals () and havehCode () methods provide useful values. First check the class you extend, determine if its implementation meets your needs. If not, you should overload the method.

Any basic design constraint for any equals () method is that if the object passed to it belongs to the same class, and its data field is set to a value indicating the same data, then it should return True. You should also be confident that if you pass an empty parameter to this method, then your code returns false: Public Boolean Equals (Object O)

{

IF ((o == null)

||! (o instanceof myclass)))

{

Return False;

}

// Now Compare Data Fields ...

In addition, when designing a HashCode () method, you should remember some rules. First, the method must return the same value for a specific object, regardless of this method being called (of course, as long as the content of the object is not changed, when you use an object to make a HashTable key, This should be avoided). Second, if two objects defined by your equals () method are equal, they must also generate the same hash code. Third, this is more like a policy, not a principle, you should try to design a method to generate different results for different object content. If the occasional different objects have the same hash code, this is not tight. However, if the method can only return a value ranging from 1 to 10, only 10 lists can be used, regardless of how many lists in HashTable. When designing Equals () and hashcode (), the other to remember is a performance problem. Each time you call PUT () or get (), it includes calling hashcode () to find the correct list, when the list is scanned to find KEY, it calls Equals () for each element in the list. Implement these methods to run as fast as possible, especially when you intend to make your class publicly available, because other users may want to use your in high-performance applications in high performance applications. class.

HashTable performance

The main factor affecting the HashTable efficacy is the average length of the list in the table, as the average search time is directly related to this average length. Obviously, to reduce the average length, you must increase the number of lists in HashTable; if the list is very large, most lists or all lists contain only one record, you will get the best search efficiency. However, this may be too much. If your HashTable's list is much more than the data entry, then you don't have to do such a memory, and in some cases, people can't accept such practices.

In our previous example, we know how many records we have ?? 1,000. After knowing this, we can decide how many lists should be included in our HashTable to achieve the best compromise between search speed and memory usage efficiency. However, in many cases, you don't know how many records you have to handle; the data read by the data may continue to expand, or the number of records may have changed a lot one day.

With the increase of the entries, the HashTable and HashMap classes handle this problem by dynamically expanding the form. Both classes have the first quantity of constructor in the table, and a load factor as a parameter: Public HashTable

Int initialcapacity,

Float loadFactor)

Public hashmap

Int initialcapacity,

Float loadFactor)

These two numbers are multiplied by a critical value. Each time a new entry is added to the hash table, the count is updated. When the count exceeds the critical value, the table is reset (Rehash). (The number of lists is increased to the previous quantity plus 1, all entries are transferred to the correct list.) The default constructor sets the initial capacity of 11, the load factor is 0.75, so the threshold value is 8. When the ninth record is added to the table, the hash table is re-adjusted, and it has 23 lists, and the new critical value will be 17 (23 * 0.75 integer part). As you can see, the load factor is the upper limit of the average list in the hash table, which means that in the default, there are many lists that contain more than one record. Compare our initial example, in that example, we have 1,000 records, distributed in 10 lists. If we use the default value, this table will extends to 1,500 lists. But you can control this. If the number of lists multiplied by the load factor is greater than the number of entries you handle, the table will never be reproduced, so we can imitate the following example: // Table Will Not Rehash Until IT // HAS 1,100 Entries (10 * 110) :

Hashtable myhashtable =

New hashtable (10, 110.0f);

You may not want to do this, unless you don't save memory, and don't mind additional search time, this may happen in an embedded system. However, this method may be useful because the reset is very occupied, and this method can guarantee that this will never reset this.

Note that although PUT () can make the table increase (the number of lists is increased), the reclaim () does not have opposite results. So if you have a big table, and from it removed most of the entries, you will have a big but most of the empty table.

Hashtable and havehmap

There are three important differences between HashTable and HashMap classes. The first difference is mainly historical reasons. Hashtable is an implementation of the Old Dictionary class, HashMap is an implementation of the MAP interface introduced by Java 1.2.

Perhaps the most important difference is that the HashTable method is synchronized, while the HashMap method is not. This means that although you can use any special behavior to use a HashTable in a multi-threaded application, but you have to provide an external synchronization for a HashMap. A convenient method is to use the Static SynchronizedMap () method of the Collections class, which creates a thread secure MAP object and returns it as a package object. The method of this object allows you to synchronize the potential HashMap. The result of this is that when you don't need to synchronize, you can't cut off the synchronization in HashTable (such as in a single-threaded application), and synchronize has added a lot of processing costs.

The third point is different, only HashMap allows you to take null values ​​as the key or value of the entry of a table. There is only one record in HashMap, which can be an empty key, but any number of entries can be empty Value. That is to say, if there is no search button in the table, or if the search button is found, it is an empty value, then get () will return NULL. If necessary, use the ContaInKey () method to distinguish these two situations.

Some information suggestions when you need to synchronize, use HashTable, which is used to use HashMap. However, because when needed, HashMap can be synchronized, the HashMap's function is more than the HashTable function, and it is not based on an old class, so some people think that in all cases, Hashmap is preferred in HashTable. About Properties

Sometimes you might want to use a HashTable to map the key string to the Value string. There are some examples in DOS, Windows, and UNIX, such as the string Path of Key is mapped to the Value string C: / Windows; C: / Windows / System. Hashtables are a simple way to say these, but Java provides another method.

The Java.util.Properties class is a subclass of Hashtable, designed for String Keys and Values. The usage of the Properties object is similar to the use of HashTable, but the class adds two ways to save time, you should know.

The Store () method saves the content of a Properties object to a file in a readable form. The LOAD () method is just the opposite to read the file and set the Properties object to include Keys and Values.

Note that because Properties expands HashTable, you can add keys and values ​​that are not String objects with a superclass's PUT () method. This is not available. Also, if you use the Store () for a Properties object that does not contain the String object, Store () will fail. As an alternative to PUT () and GET (), you should use setProperty () and getProperty (), they use String parameters.

Ok, I hope that you can now know how to use HashTables to speed up your handle.

About the Author:

Pete Ford has been done for more than 20 years of software development, and he is mainly studied in an embedded system and Turnkey System. He lives and working in Dallas in Texas. You can contact him through p_ford@mindspring.com.

转载请注明原文地址:https://www.9cbs.com/read-84661.html

New Post(0)