An In-Depth Look at the hashCode Method



Why is the hashCode method so important to an object? hashCode() is really just a simple hash function; compared with genuinely complex hash algorithms it is nothing sophisticated. Yet how you write it is not merely a question of coding style: it is closely tied to the performance of your objects. Different hashCode implementations can easily make the cost of accessing your objects differ by a factor of ten or even a hundred.

Let's look at two important data structures in Java: HashMap and Hashtable. They do differ in several ways, such as their inheritance hierarchy, the constraints on keys and values (whether null is allowed), and thread safety, but in terms of implementation principle they are the same. So we will use Hashtable alone for the explanation.

In Java, the array is usually the first choice for fast data access, but once the data set grows a little larger, a Hashtable will give higher query performance than an array. The reason follows.

When storing data, a Hashtable first ANDs the key's hashCode with 0x7FFFFFFF. Because an object's hashCode can be negative, this mask guarantees a non-negative integer. It then takes that value modulo the length of the Hashtable to obtain the index at which the value object will sit:

index = (o.hashCode() & 0x7FFFFFFF) % table.length;

The value object is placed directly at position index of the Hashtable, so writing works just like putting an object at position index of an array. For a query, the Hashtable computes the index from the key with the same formula,

and fetches the value object directly at that index, whereas an array would have to loop through its elements and compare them one by one. This is why a Hashtable query outperforms an array.
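As a minimal sketch of this indexing step (the class name IndexDemo, the capacity of 11, and the sample keys are assumptions for illustration, not Hashtable's real internals):

```java
// Minimal sketch of the index calculation described above.
public final class IndexDemo {

    // Mask off the sign bit so a negative hashCode still yields a
    // non-negative number, then take it modulo the table length.
    public static int indexFor(Object key, int tableLength) {
        return (key.hashCode() & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int tableLength = 11;   // assumed bucket-array size for the demo
        System.out.println(indexFor("apple", tableLength));
        System.out.println(indexFor("orange", tableLength));
    }
}
```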

Although different objects usually have different hashCodes, after the modulo by the table length, different hashCodes can very easily produce the same index.

In extreme cases, a large number of objects can end up with the identical index. This is the issue that matters most for Hashtable performance:

the hash collision.

The common kind of hash collision is simply different key objects ending up with the same index. A much rarer, almost theoretical kind arises when a set of objects is larger than the int range: since hashCode can only take values within the int range, the pigeonhole principle forces some elements of that set to share a hashCode and hence an index. This extreme case almost never occurs and can be ignored, but the general rule stands: the same hashCode always produces the same index, so different objects that happen to share a hashCode also of course share an index.
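As a quick illustration that distinct objects can share a hashCode (and therefore an index), the strings "Aa" and "BB" are a well-known pair under String's documented 31-based hash formula:

```java
public final class CollisionDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // Both evaluate to 2112 under String's s[0]*31 + s[1] formula,
        // so they land in the same bucket regardless of table length.
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // false
    }
}
```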

In fact, a well-designed Hashtable spreads its elements out fairly evenly. Because the table length is always kept somewhat larger than the actual number of elements (the load factor is typically 0.75), most index positions hold only one object and only a few hold several. Every position in the Hashtable is really a linked list. For a position holding a single object, the list has only a head node (Entry) whose next is null; its hashCode, key and value fields store the hashCode, the key and the value kept at that position. When another object maps to the same index, it becomes the next node of that list. When several objects share one index, the lookup walks the list and, by comparing hashCode and key, finds the entry whose key matches the query.
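A minimal sketch of the chaining scheme just described (SimpleChainedMap and its fixed capacity of 16 are inventions for this illustration; this is not the JDK's code, and it omits resizing and null handling):

```java
// Toy chained hash table illustrating the Entry/linked-list layout described above.
public final class SimpleChainedMap<K, V> {

    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;          // next node in the same bucket, or null

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    private int indexFor(int hash) {
        return (hash & 0x7FFFFFFF) % table.length;
    }

    public void put(K key, V value) {
        int hash = key.hashCode();
        int index = indexFor(hash);
        // Walk the bucket's list: replace the value if the key already exists.
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        // Otherwise prepend a new node to the bucket's list.
        table[index] = new Entry<>(hash, key, value, table[index]);
    }

    public V get(K key) {
        int hash = key.hashCode();
        // Compare hashCode first (cheap), then equals (authoritative).
        for (Entry<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```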

From the above we can see that hashCode has a significant impact on the access performance of HashMap and Hashtable. The elements placed in these data structures should have hashCodes that differ as much as possible. Different hashCodes do not guarantee different indexes, but identical hashCodes always produce the same index, and that is what drives the creation of hash collisions.

For an object with many properties, letting all of them participate in the hash is obviously clumsy design, because hashCode() is called almost automatically in many places, for instance during equals comparisons. If too many properties take part, the constant cost of every call grows considerably. So deciding which attributes participate in the hash is very much a design-level question.

As for the implementation, a typical hashCode method performs an operation such as:

return attribute1.hashCode() + attribute2.hashCode() + ... [+ super.hashCode()];

If the object's properties have not changed, this sum is still recomputed on every call. So you can keep a flag alongside a cached hash code, recompute only when a property that participates in the hash changes, and otherwise return the cached value. This can improve performance considerably.
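A sketch of that caching idea, assuming a hypothetical mutable Account class whose setters mark the cache dirty (the class and its fields are invented for this illustration):

```java
import java.util.Objects;

// Sketch of caching the hash code behind a dirty flag, as described above.
public final class Account {
    private String username;
    private String email;

    private int cachedHash;        // last computed hash code
    private boolean dirty = true;  // true whenever a hashed field has changed

    public void setUsername(String username) {
        this.username = username;
        this.dirty = true;         // invalidate the cache on change
    }

    public void setEmail(String email) {
        this.email = email;
        this.dirty = true;
    }

    @Override
    public int hashCode() {
        if (dirty) {
            cachedHash = Objects.hash(username, email);
            dirty = false;
        }
        return cachedHash;         // cheap on every later call
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        return Objects.equals(username, other.username)
                && Objects.equals(email, other.email);
    }
}
```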

Object's default implementation typically converts the object's internal address into an integer and uses it as the hashCode, which in practice gives nearly every object a distinct value, since different objects naturally have different internal addresses. But the Java language does not allow programmers to obtain an object's internal address, so how the runtime makes every object produce a distinct hashCode involves implementation techniques worth studying.
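To observe this identity-based default without touching object addresses, the JDK exposes System.identityHashCode; a small sketch (the printed values vary from run to run):

```java
public final class IdentityHashDemo {
    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        // Object's default hashCode is identity-based; two distinct
        // instances will almost always report different values.
        System.out.println(a.hashCode() + " vs " + b.hashCode());

        // identityHashCode reports the identity hash even for classes
        // that override hashCode, such as String.
        String s = "hello";
        System.out.println(s.hashCode());               // content-based
        System.out.println(System.identityHashCode(s)); // identity-based
    }
}
```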

Generating evenly distributed hashCodes from multiple properties is where performance and diversity pull against each other. If every property participates in the hash, the diversity of hashCode values improves greatly, but performance is sacrificed; if only a few sampled attributes participate, extreme cases will produce huge numbers of collisions. Take a Person class: if you hash on gender rather than on name or date of birth, only two or a handful of hashCode values are possible, and more than half of all accesses will collide. So where possible, generating a dedicated sequence number purely for hashing can be a good choice (of course, only when producing that sequence costs less than hashing all the attributes; otherwise it is better to simply use all the attributes directly).
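A sketch of that Person example: hashing only on gender yields at most two distinct values, while mixing in name and birth date (here via Objects.hash) spreads keys far better. The fields and method names are assumptions for the illustration.

```java
import java.util.Objects;

// Illustration of poor vs. better attribute choice for hashCode.
public final class Person {
    private final String name;
    private final String birthDate;
    private final boolean male;

    public Person(String name, String birthDate, boolean male) {
        this.name = name;
        this.birthDate = birthDate;
        this.male = male;
    }

    // Poor: only two possible values, so roughly half of all distinct
    // Person keys collide with one another.
    public int genderOnlyHash() {
        return male ? 1 : 0;
    }

    // Better: name and birth date give far more diversity.
    @Override
    public int hashCode() {
        return Objects.hash(name, birthDate, male);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return male == p.male
                && Objects.equals(name, p.name)
                && Objects.equals(birthDate, p.birthDate);
    }
}
```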

For how to balance the performance and the diversity of hashCode, you can consult the literature on hash algorithm design; in practice it does not have to be perfect, as long as it reduces clustering of the hash values. What matters is remembering that hashCode has a tangible effect on program performance and keeping that in mind while programming.

From the above we can also see that the Object class goes to considerable lengths to provide a hashCode implementation, yet in practice, how often is an object actually used as the key of a hash-based data structure? Arguably more than 95% of the time an object's hashCode is simply wasted, because it is never used at all. Unfortunately hashCode was designed into the Object class; arguably it would be better defined in an interface, so that only objects meant to serve as keys of hash data structures implement hashCode (with the data structure checking for it), while every other object could ignore hashCode entirely.
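The interface-based design the author has in mind might look roughly like the following hypothetical sketch. Hashable and HashBucketMap are invented names, not JDK types, and the map body is deliberately a toy:

```java
// Hypothetical sketch: hashing as an opt-in capability rather than an Object method.
interface Hashable {
    int hashValue();
}

final class HashBucketMap<K extends Hashable, V> {
    private final Object[] table = new Object[16];

    public void put(K key, V value) {
        // Only keys that chose to implement Hashable can reach this point,
        // so the hash contract is enforced by the type system.
        int index = (key.hashValue() & 0x7FFFFFFF) % table.length;
        table[index] = value;   // toy behavior: no chaining, last write wins
    }
}
```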

One more reminder: do not use hashCode as a persistent unique identifier. For example, do not save the hashCode of "mypasswd" in the database as the encrypted form of a password. When your JDK version changes, String's hashCode could well be implemented in a new way; your stored hash codes would then never match again, and in effect you would have invalidated all of your users.
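A sketch contrasting the fragile approach described above with a fixed, published digest algorithm (MessageDigest is a real JDK API; the class name and printed labels here are assumptions, and real password storage would also need salting and a slow KDF):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class PasswordStorageDemo {
    public static void main(String[] args) throws NoSuchAlgorithmException {
        String password = "mypasswd";

        // Fragile: the general hashCode contract gives no stability
        // guarantee across releases for arbitrary classes, so never
        // persist hash codes as identifiers or credentials.
        int fragileToken = password.hashCode();

        // Stable: SHA-256 is a fixed, published algorithm, so the digest
        // of the same input is identical on every platform and JDK.
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(password.getBytes(StandardCharsets.UTF_8));

        System.out.println("hashCode token: " + fragileToken);
        System.out.println("digest length : " + digest.length + " bytes");
    }
}
```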

Please credit the original source when reposting: https://www.9cbs.com/read-87544.html
