Author: Wang Ke
Summary MD5's full name is Message-Digest Algorithm 5 (Information - Summary Algorithm), developed by Mit Laboratory for Computer Science and RSA Data Security Inc. in the 1990s, developed by MD2, MD3 and MD4. Its role is to make large capacity information to be "compressed" into a confidential format before signing private privacy with digital signature software (that is, transforms an arbitrary length of byte string into a certain length). Whether it is MD2, MD4 or MD5, they all need to obtain a random length of information and generate a 128-bit information summary. Although the structure of these algorithms is more similar, the design of MD2 is completely different from MD4 and MD5, because MD2 is designed for 8-bit machines, while MD4 and MD5 are 32-bit computers. The descriptions of these three algorithms have a detailed description in Internet RFCS 1321 (http://www.ietf.org/rfc/rfc1321.txt), this is the most authoritative document, by Ronald L Rivest submitted to IEFT in August 1992. Rivest developed an MD2 algorithm in 1989. In this algorithm, the information is first supplied to the information, so that the byte length of the information is a multiple of 16. Then, at a 16-bit test and append to the end of the information. And calculate the hash value based on this new information. Later, Rogier and Chauvaud found that if the test and the MD2 conflict will be generated. The result of the encryption of the MD2 algorithm is unique - neither repeat. In order to enhance the safety of the algorithm, RiveSt has developed an MD4 algorithm in 1990. The MD4 algorithm also needs to fill the information to ensure that the byte length of the information plus 448 can be divided by 512 (information byte length MOD 512 = 448). Then, one of the initial lengths of information indicated by 64-bit binary is added. The information is processed into blocks of 512-bit DAMG? RD / Merkle iterative structure, and each block is to be processed by three different steps. Den Boer and Bosselaers and others quickly discovered the first step and third steps in the MD4 version. Dobbertin demonstrates how to use a regular personal computer to find a conflict in the full version of the MD4 in a few minutes (this conflict is actually a vulnerability, which will result in encryption of different content but may get the same encryption. result). There is no doubt that MD4 is eliminated. Although the MD4 algorithm has such a large vulnerability in the security, it has a guiding role in the presence of several information security encryption algorithms that have been developed thereon. In addition to MD5, there are also SHA-1, RIPE-MD, and HAVAL, etc. One year later, that is, in 1991, Rivest developed technology more approached MD5 algorithm. It adds the concept of "safe-band" (safty-beelts) based on MD4. Although MD5 is slightly slower than MD4, it is safer. This algorithm is obvious by four and MD4 designs with a little different steps. In the MD5 algorithm, the size of the information - summary and the necessary conditions of the filler are identical to the MD4. Den Boer and Bosselaers have discovered pseudo-collisions in the MD5 algorithm, but there is no other resulting result. Van Orschot and Wiener have considered a function of violent search conflicts in hash, and they guess a machine that is designed to search for MD5 conflicts (this machine is approximately manufactured in 1994 It is a million US dollars) to find a conflict every 24 days.
However, from 1991 to 2001, there is no new algorithm for MD6 or other names that have alternative MD5 algorithms, and we can see that this flaw does not have much impact on MD5 security. All of these is not enough to become MD5 problems in practical applications. Also, since the use of the MD5 algorithm does not need to pay any copyright costs, in general (non-top secret applications. But even if the application is in the top secret field, MD5 is not a very good intermediate technology), MD5 It should be considered very safe.
The typical application of the algorithm application MD5 is a message-digest to prevent tampering. For example, there are many software in UNIX to have a file name in the download, the file extension is the file named .md5, usually only one line of text in this file, general structure, MD5 (tanajiya.tar.gz) = 0ca175b9c0f726a831d895e269332461 This is a digital signature of tanajiya.tar.gz file. MD5 uses the entire file as a large text message, producing this unique MD5 information summary through its irreversible string conversion algorithm. If in the process of transmitting this file later, regardless of any form changes in the content of the file (including the transmission error caused by the line instability during the download process), as long as you recall the MD5, you will find it. The summary of the information is different, thereby determining that you get just an incorrect file. If there is another third party certification body, use MD5 to prevent "reliability" of the author, this is the so-called digital signature application. MD5 is also widely used in encryption and decryption technology. For example, the user's password in the UNIX system is stored in the file system after the MD5 (or other similar algorithm) is encrypted. When the user logs in, the system calculates the password entered into the MD5 value, then goes and saved the MD5 value in the file system to compare, and then determine if the input password is correct. By this step, the system can determine the legality of the user login system without knowing the coding of the user's password. This not only avoids the user's password known by the user of the system administrator privilege, but also adds the difficulty of password being crack to some extent. It is precisely because of this reason, the method of hacked the most deciphering password is a method called "running". There are two ways to get a dictionary, one is a character string table for daily collections, the other is to generate the MD5 value of these dictionaries with the MD5 program, and then use the target The MD5 value is retrieved in this dictionary. We assume that the maximum length of the password is 8-bit bytes (8 bytes), and the password can only be a letter and a number, a total of 26 26 10 = 62 characters, and the number of items that are arranged in the group is P (62, 1) p (62, 2) .... P (62, 8), it is already a very an astronomical number, storing this dictionary requires a TB-level disk array, and this method has a premise, It is possible to get the password MD5 value of the target account. This encryption technology is widely used in UNIX systems, which is why the UNIX system is more robust than a general operating system. Algorithm Description The MD5 algorithm brief description can be: MD5 in a 512-bit packet to process the input information, and each group is divided into 16 32-bit sub-packets, after a series of processes, the output of the algorithm is from four 32-bit packets, which will generate a 128-bit quotation value after the four 32-bit grouping levels will be generated. In the MD5 algorithm, the information is first to be filled, so that the result of the byte length to 512 is equal to 448. Therefore, the byte length of the information will be extended to N * 512 448, namely N * 64 56 bytes, n is a positive integer. The filling method is as follows, and one 1 and countless 0 are filled in the rear of the information until 0 to the information of the information is stopped using 0. Then, the length of the pre-filling before this result is additionally indicated by 64-bit binary.
After these two steps, the current information byte length = N * 512 448 64 = (n 1) * 512, that is, the length is exactly the integer multiple of 512. The reason for this is to meet the requirements of the information length in the back processing. The MD5 has four 32-bit integer parameters of chaining variable, which are: a = 0x01234567, b = 0x89abcdef, c = 0xfedcba98, d = 0x76543210. When these four link variables are set, the four-wheel loop calculation of the algorithm is started. The number of cycles is the number of 512-bit information packets in the information. Copy the above four link variables to the other four variables: A to A, B to B, C to C, D to D. The main circulation has four rounds (MD4 only three rounds), each round of cycles are similar. The first round of 16 operations. Each of the A, B, C, and D is operated for a nonlinear function calculation, and then the resulting result is added to the fourth variable, a sub-packet and a constant of the text. The resulting result is then moved to the right ring and plus one of A, B, C or D. Finally, the results are used to replace one of A, B, C or D. Take it is the four nonlinear functions used in each operation (one per wheel). F (x, y, z) = (x & y) | ((~ x) & z) g (x, y, z) = (x & z) | (Y & (~ z)) h (x, y, z) = x ^ Y ^ z i (x, y, z) = y ^ (x | (~ z)) (& Yes, | Yes, ~ is not, ^ is different or) these four functions: If x, The corresponding position of Y and Z is independent and uniform, and each bit of the result should also be independent and uniform. F is a function of bitmaping. That is, if x, then y, otherwise z. The function h is a bitmap parity. Suppose MJ represents the jth packet (from 0 to 15), << ff (A, B, C, D, MJ, S, Ti) ((A (F (B, C, D) MJ TI) <<
GG (A, B, C, D, MJ, S, Ti) represents A = B ((A (G (B, C, D) MJ Ti) <<
HH (A, B, C, D, MJ, S, Ti) represents A = B ((A (H (B, C, D) MJ Ti) <<
II (A, B, C, D, MJ, S, Ti) represents A = B ((A (i (B, C, D) MJ Ti) <<
These four rounds (64 steps) are:
first round
FF (A, B, C, D, M0, 7, 0xD76AA478)
FF (D, A, B, C, M1, 12, 0XE8C7B756)
FF (C, D, A, B, M2, 17, 0X242070DB)
FF (B, C, D, A, M3, 22, 0XC1BDCEEE)
FF (A, B, C, D, M4, 7, 0xF57c0FAF)
FF (D, A, B, C, M5, 12, 0X4787C62A)
FF (C, D, B, M6, 17, 0XA8304613)
FF (B, C, D, A, M7, 22, 0xFD469501)
FF (A, B, C, D, M8, 7, 0x698098d8)
FF (D, A, B, C, M9, 12, 0X8B44F7AF)
FF (C, D, A, B, M10, 17, 0xFFFF5BB1)
FF (B, C, D, A, M11, 22, 0X895CD7BE)
FF (A, B, C, D, M12, 7, 0x6b901122)
FF (D, A, B, C, M13, 12, 0xFD987193)
FF (C, D, A, B, M14, 17, 0XA679438E)
FF (B, C, D, A, M15, 22, 0X49B40821)
second round
GG (A, B, C, D, M1, 5, 0xF61e2562)
GG (D, A, B, C, M6, 9, 0xc040b340)
GG (C, D, A, B, M11, 14, 0X265E5A51)
GG (B, C, D, A, M0, 20, 0XE9B6C7AA)
GG (A, B, C, D, M5, 5, 0xD62F105D)
GG (D, A, B, C, M10, 9, 0x02441453)
GG (C, D, B, M15, 14, 0XD8A1E681)
GG (B, C, D, A, M4, 20, 0XE7D3FBC8)
GG (A, B, C, D, M9, 5, 0X21E1CDE6)
GG (D, A, B, C, M14, 9, 0XC33707D6)
GG (C, D, A, B, M3, 14, 0xf4d50d87)
GG (B, C, D, A, M8, 20, 0X455A14ED)
GG (A, B, C, D, M13, 5, 0XA9E3E905)
GG (D, A, B, C, M2, 9, 0xfcefa3f8)
GG (C, D, A, B, M7, 14, 0X676F02D9)
GG (B, C, D, A, M12, 20, 0X8D2A4C8A)
Third round
HH (A, B, C, D, M5, 4, 0xFFFA3942)
HH (D, A, B, C, M8, 11, 0X8771F681)
HH (C, D, A, B, M11, 16, 0X6D9D6122)
HH (B, C, D, A, M14, 23, 0XFDE5380C)
HH (A, B, C, D, M1, 4, 0XA4BEEA44)
HH (D, A, B, C, M4, 11, 0X4BDECFA9)
HH (C, D, A, B, M7, 16, 0XF6BB4B60)
HH (B, C, D, A, M10, 23, 0XBEBFBC70)
HH (A, B, C, D, M13, 4, 0X289B7EC6)
HH (D, A, B, C, M0, 11, 0XEAA127FA)
HH (C, D, A, B, M3, 16, 0xD4ef3085)
HH (B, C, D, A, M6, 23, 0X04881D05)
HH (A, B, C, D, M9, 4, 0xD9D4D039)
HH (D, A, B, C, M12, 11, 0XE6DB99E5)
HH (C, D, A, B, M15, 16, 0X1FA27CF8)
HH (B, C, D, A, M2, 23, 0XC4AC5665)
Fourth round
II (A, B, C, D, M0, 6, 0xF4292244)
II (D, A, B, C, M7, 10, 0X432AFF97)
II (C, D, B, M14, 15, 0XAB9423A7)
II (B, C, D, A, M5, 21, 0xFC93A039)
II (A, B, C, D, M12, 6, 0x65559c3)
II (D, A, B, C, M3, 10, 0X8F0CCC92)
II (C, D, A, B, M10, 15, 0xffeff47d)
II (B, C, D, A, M1, 21, 0X85845DD1)
II (A, B, C, D, M8, 6, 0x6fa87e4f)
II (D, A, B, C, M15, 10, 0XFE2CE6E0)
II (C, D, B, M6, 15, 0XA3014314)
II (B, C, D, A, M13, 21, 0X4E0811A1)
Ii (A, B, C, D, M4, 6, 0xf7537e82)
II (D, A, B, C, M11, 10, 0XBD3AF235)
II (C, D, A, B, M2, 15, 0X2AD7D2BB)
II (B, C, D, A, M9, 21, 0XEB86D391)
The constant TI can choose from:
In the first step, Ti is 4294967296 * ABS (SiN (i)) integer part, and the unit of I is an arc. (4294967296 equal to 2) 32 times)
After all of these completed, A, B, C, and D were added to A, B, C, D, respectively. Then use the next packet data to continue the algorithm, and the last output is cascaded of A, B, C, and D.
When you implement the MD5 algorithm according to the method I mentioned above, you can use the following information to make a simple test you made, see if there is any error. MD5 ("") = D41D8CD98F00B204E9800998ECF8427E
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("ABC") = 900150983cd24fb0d6963f7d28e17f72
MD5 ("Message Digest") = F96B697D7CB7938D525A2F31AAF161D0
MD5 ("AbcdefghijklmnopqrStuvwxyz") = C3FCD3D76192E4007DFB496CCA67E13B
MD5 ("AbcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrStuvwxyz0123456789) =
D174AB98D277D9F5A5611C2C9F419D9F
MD5 ("123456789012345678901234567890123456789012345678901234567890123456789
01234567890 ") = 57EDF4A22BE3C955AC49DA2E2107B67A
If you use the information on the above information to test the MD5 algorithm instance, finally, the conclusions and standard answers are exactly the same, then I will have a congratulations here. To know, my program is not derived from the same result when the first compilation is successful.
MD5's security MD5 relative to MD4 improvements: 1. Increase the fourth round; 2. Each step has a unique addition constant; 3. To reduce the symmetry of function G in the second round, (X & Y) | (x & z ) | (Y & z) | (Y & (~ z)); 4. The first step plus the previous result, which will cause faster avalanches; 5. Change the second round and The sequence of the message sub-packet is accessed in the three-wheel, which makes it more similar; 6. Approximately optimize the loop left shift shift in each round to achieve a faster avalanche effect. The displacement amount of each wheel is different. The MD5 source program has a source program that implements the MD5 algorithm with the C language in RFC1321. If you need to implement in Java or like PHP, C #, as long as you make some simple changes to the C code, It should be easier to implement.