Hash functions are widely used in web applications, from digital signatures, error detection and login verification to saving storage space. Because the principles behind them are fairly complicated, many people confuse hashing with encryption, and are unclear about how to use a hash function, how to choose the right one, and what its advantages and disadvantages are. This article tries to answer these questions.
Briefly, hashing is a mapping of data (an algorithm), usually used to map a large piece of data to shorter data of a fixed length. This fixed-length data is called the hash value.
For example, take a string of English letters of any length, add up the ASCII values of all its characters, divide the sum by 256 and use the remainder as the hash value. The length of the input string is not limited, and the output is always between 0 and 255, so this is a legitimate hash function.
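The toy function described above can be sketched in a few lines of PHP. This is only an illustration: the name simple_hash is made up for this example, and the type declarations assume PHP 7 or later.

```php
<?php
// Toy hash from the text: add up the ASCII values of all characters
// and keep the remainder after dividing by 256.
// simple_hash is a hypothetical name used only for illustration.
function simple_hash(string $input): int
{
    $sum = 0;
    for ($i = 0; $i < strlen($input); $i++) {
        $sum += ord($input[$i]);
    }
    return $sum % 256;
}

echo simple_hash("A"), "\n";      // 65
echo simple_hash("hello"), "\n";  // (104 + 101 + 108 + 108 + 111) % 256 = 20
```

However long the input is, the result always fits in a single byte.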
The hash function above has only 256 possible hash values, so clearly many strings will produce the same hash value. We call this situation a hash collision, or simply a collision. In fact, whenever data of unlimited length is mapped to data of fixed length, collisions are unavoidable. We do not need to eliminate collisions, only to make them as unlikely as possible. To have no collisions at all, the hash value would have to be as long as the input data, which defeats the purpose of a hash function.
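A short sketch makes the collisions visible. It assumes the same sum-of-ASCII-values-modulo-256 scheme; the name toy_hash and the sample inputs are hypothetical.

```php
<?php
// Same toy scheme as in the text (sum of ASCII values modulo 256).
// toy_hash is a hypothetical name used only for illustration.
function toy_hash(string $input): int
{
    $sum = 0;
    for ($i = 0; $i < strlen($input); $i++) {
        $sum += ord($input[$i]);
    }
    return $sum % 256;
}

// Anagrams contain exactly the same characters, so their sums are identical.
var_dump(toy_hash("listen") === toy_hash("silent")); // bool(true)

// With only 256 possible outputs, 257 distinct inputs cannot all receive
// distinct hash values, so at least two of them must collide.
$seen = [];
for ($i = 0; $i < 257; $i++) {
    $seen[toy_hash("input-" . $i)] = true;
}
printf("257 inputs produced %d distinct hash values\n", count($seen));
```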
The hash functions used in real applications are usually much more complicated. The better-known ones include MD4, MD5, SHA1 and SHA256, whose hash values range from tens to hundreds of bits in length. In fact anyone can design a hash function, but the practical uses of hashing impose some basic requirements on it. Before explaining further, let us look at the common uses of hashing.
1. Digital signature
Many download sites list the hash value of each downloadable file on the page, most commonly its MD5 code. The person downloading can compute the hash value of the downloaded file and compare it with the one on the website, and so verify whether the program has been modified. This process is a form of digital signing. The concept applies to many kinds of communication. For example, if you have to send someone a very important email and want the recipient to be sure that the content was not changed in transit, you can tell the recipient the MD5 code of the email and let them verify it themselves.
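As a rough sketch of that check in PHP: the temporary file below stands in for a downloaded archive, and $publishedMd5 stands in for the value shown on the website (both are made up for this example). hash_file() computes the hash of a file on disk and hash_equals() (PHP 5.6 or later) compares the two strings.

```php
<?php
// Stand-in for a downloaded file: a temporary file with known content.
$path = tempnam(sys_get_temp_dir(), 'dl');
file_put_contents($path, "example file contents\n");

// Stand-in for the MD5 code published on the download page.
$publishedMd5 = md5("example file contents\n");

// Hash the file on disk and compare it with the published value.
$intact = hash_equals($publishedMd5, hash_file('md5', $path));
echo $intact ? "File matches the published hash\n" : "File was modified\n";

// Simulate tampering: append a few bytes and check again.
file_put_contents($path, "tampered", FILE_APPEND);
$stillIntact = hash_equals($publishedMd5, hash_file('md5', $path));
echo $stillIntact ? "File matches the published hash\n" : "File was modified\n";

unlink($path); // clean up the temporary file
```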
For this use, the ideal hash function should have two properties. First, any change to the original file should change the hash value. Second, there should be no known way to modify the original file so that the hash value stays the same.
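The first property can be observed with PHP's built-in sha1(): adding a single character to the input gives a completely different-looking hash value. The snippet is only a demonstration, and the sample strings are arbitrary.

```php
<?php
$a = sha1("I am a happy boy");
$b = sha1("I am a happy boy.");  // one extra character at the end

// Count the positions at which the two 40-character hex strings differ.
$differing = 0;
for ($i = 0; $i < strlen($a); $i++) {
    if ($a[$i] !== $b[$i]) {
        $differing++;
    }
}

echo $a, "\n", $b, "\n";
echo "differing hex digits: ", $differing, " of ", strlen($a), "\n";
```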
Of course, we must also make sure that the hash value itself is not intercepted and modified in transit, but that is a matter of communication security and is beyond the scope of this discussion of hash functions.
2. Error detection
When data is transmitted over a network it is subject to many kinds of interference, including network problems, hardware faults and software bugs. To verify that the data arrived correctly, we can send its hash value along with it, so that the recipient can confirm correctness by comparing the hash value computed from the received data against the hash value received. For this use, the requirement on the ideal hash function is similar to the one above: any change to the original data should change the hash value.
3. Login verification
Storing users' passwords directly on the server is risky. First, doing so entrusts the security of the passwords to the server administrators. Are they reliable? Don't forget that if the passwords leak, the one who takes the blame is you, not them. Second, many users use the same password on many different systems (a very bad habit, of course, but you cannot stop users from doing it). When one system is hacked and its users' passwords leak, their accounts on other systems are exposed at the same time, and the consequences can be very serious. To protect users, a well-designed system does not store the password itself but only its hash value. When the user logs in, the password entered is converted to a hash value and compared with the hash value stored on the server to verify the user's identity.
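The store-and-compare flow can be sketched as follows. The helper check_login and the sample password are hypothetical, and SHA-256 is an arbitrary choice for the illustration. For real systems, PHP's built-in password_hash() and password_verify() (PHP 5.5 or later), which add a salt and use a deliberately slow algorithm, are usually the better choice.

```php
<?php
// What the server keeps at registration time: only the hash value,
// never the password itself.
$storedHash = hash('sha256', 'correct horse battery staple');

// check_login is a hypothetical helper: hash what the user typed and
// compare it with the stored hash value.
function check_login(string $enteredPassword, string $storedHash): bool
{
    return hash_equals($storedHash, hash('sha256', $enteredPassword));
}

var_dump(check_login('correct horse battery staple', $storedHash)); // bool(true)
var_dump(check_login('wrong guess', $storedHash));                  // bool(false)
```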
A hash function used this way must make it impossible to recover the original password from the hash value. In addition, because of collisions, anyone who finds a password whose hash value equals that of the user's password can log in as that user without knowing the real password. The number of possible hash values must therefore be very large, so that the probability of a collision is very low and anyone looking for such a "fake" password has to pay a very high price.
4. Compressed storage space
One of the most classic uses of a hash function is the hash table, which can be thought of as an associative array: an array whose index is non-numeric data or a more complex data structure. Many high-level programming languages, including PHP, Perl and Gawk, support associative arrays. The principle behind them is to use a hash function to convert the index data into a number and then use that number to reach the element in the array. In most cases the range of possible index data is very large while the length of the array (the number of elements) is relatively small, so collisions become a prominent issue. From the point of view of the user (the programmer), collisions are unacceptable, because different data must correspond to different array locations, so these languages have their own methods for handling collisions.
The advantage of a hash table, and therefore of an associative array, is fast lookup. No matter how much data is stored, the lookup time stays roughly constant, which matters for applications that handle large amounts of data.
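To make the idea concrete, here is a minimal sketch of a hash table that resolves collisions by chaining, that is, each array slot holds a short list of key/value pairs. It is not how PHP implements its own arrays; the names BUCKETS, bucket_index, table_put and table_get are invented for this example, and the syntax assumes PHP 7.1 or later.

```php
<?php
// Hypothetical toy hash table with only 8 slots, so collisions are certain
// once more than 8 keys are stored.
const BUCKETS = 8;

// Convert a string key into a slot number with a hash function.
function bucket_index(string $key): int
{
    // The mask keeps the value non-negative on 32-bit PHP builds as well.
    return (crc32($key) & 0x7fffffff) % BUCKETS;
}

function table_put(array &$table, string $key, $value): void
{
    $i = bucket_index($key);
    foreach ($table[$i] as &$pair) {
        if ($pair[0] === $key) {   // key already present: overwrite the value
            $pair[1] = $value;
            return;
        }
    }
    unset($pair);
    $table[$i][] = [$key, $value]; // new key: append to this slot's chain
}

function table_get(array $table, string $key)
{
    // Only the one slot chosen by the hash function is searched.
    foreach ($table[bucket_index($key)] as [$k, $v]) {
        if ($k === $key) {
            return $v;
        }
    }
    return null;
}

$table = array_fill(0, BUCKETS, []);
for ($n = 0; $n < 20; $n++) {
    table_put($table, "key" . $n, $n); // 20 keys share 8 slots
}
var_dump(table_get($table, "key7"));    // int(7)
var_dump(table_get($table, "missing")); // NULL
```

A lookup hashes the key once and then scans only one short chain, which is why the cost does not grow with the total amount of stored data as long as the chains stay short.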
What hashing tools are available?
(From PHP 5.1.2 onward you can call hash_algos() to list the available algorithms; the manual lists 35 of them.)
Before PHP 5, there were only three built-in hash functions: CRC32, MD5 and SHA-1. Their hash value lengths are as follows:

Hash function    Hash value length (bits)
CRC32            32
MD5              128
SHA-1            160
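These lengths can be checked with the three built-in functions. The sketch below relies on the fact that each hexadecimal digit encodes 4 bits; the sample string and the variable names are arbitrary.

```php
<?php
$text = "I am a happy boy";

// crc32() returns an integer, so format it as 8 hex digits first;
// md5() and sha1() already return hex strings.
$hex = [
    'CRC32' => sprintf('%08x', crc32($text)),
    'MD5'   => md5($text),
    'SHA-1' => sha1($text),
];

$bits = [];
foreach ($hex as $name => $value) {
    $bits[$name] = strlen($value) * 4; // 4 bits per hex digit
    echo $name, ": ", $bits[$name], " bits\n";
}
// CRC32: 32 bits
// MD5: 128 bits
// SHA-1: 160 bits
```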
SHA-1 is arguably the most widely used of these hash functions, because its hash value is longer than the others', so the chance of a collision is much smaller. In addition, it was designed by the NSA (National Security Agency) and is part of the US Federal Information Processing Standards, so many complex security schemes, such as SSL, use SHA-1.
PHP has two extensions that provide more hash functions: Mhash and Hash. Hash has been a standard module since PHP 5.1.2 and does not need to be compiled or installed separately, so more and more people use it. Hash functions more advanced than SHA-1, such as SHA-256 and SHA-512 from the SHA-2 family, can be found in both extensions. But because SHA-1 has a long history, many systems continue to use it. This is especially true of login systems that store SHA-1 password hashes: because a hash function is irreversible, it is difficult for them to switch to another hash function.
Using SHA-1 is simple (PHP makes everything simple, doesn't it?):
echo sha1("I am a happy boy");
Using the Hash extension is just as simple:
echo hash("sha256", "I am a happy boy.");
Hash supports many hash functions. You can see which ones your PHP version supports with hash_algos():
print_r(hash_algos());