Because of a project need (a piece of software similar to ispell, which performs a very large number of word lookups), I did some research and arrived at the hash algorithm below. Its distribution has been verified against the widely used table-based FNV hash: the distribution curves are essentially indistinguishable, and on most of the dictionaries I tried this algorithm is faster than the table-based FNV hash and at least as uniformly distributed. These are purely experimental results; I have no rigorous mathematical derivation for them, but test data from a large number of different dictionaries show that the algorithm really is efficient.

Since I had never been involved in designing a hash algorithm before, I started from the most naive idea: use % to take the remainder modulo a large prime. I built a reasonably solid test environment and planned to improve the hash model step by step according to the test results. At the beginning, my hash function looked like this:

unsigned int hash_func(char *str, int len)
{
    register unsigned int sum = 0;
    register char *p = str;

    while (p - str < len)
        sum += *(p++);

    return sum % MAX_PRIME_LESS_THAN_HASH_LEN;
}

Plotting the bucket distribution of this version, the oscillation is quite dramatic. So how can it be improved? First, think about the obvious collisions: take the two words "abcd" and "acbd". Under the hash function above these two words collide. Why? Because each character carries no information about its own position, so any two words made of the same letters hash to the same value. The first improved version of the hash function therefore attaches position information to every character:

unsigned int hash_func(char *str, int len)
{
    register unsigned int sum = 0;
    register char *p = str;

    while (p - str < len) {
        register unsigned short a = *(p++);
        sum += a * (p - str);    /* weight each character by its 1-based position */
    }

    return sum % MAX_PRIME_LESS_THAN_HASH_LEN;
}

To a certain extent this is already much better than the distribution produced without position information, but it is still very uneven. Analyzing the cause of the unevenness: because plain addition is used to accumulate the products, the result still depends too heavily on the character values themselves. So switch to the XOR operation instead:

unsigned int hash_func(char *str, int len)
{
    register unsigned int sum = 0;
    register char *p = str;

    while (p - str < len) {
        register unsigned short a = *(p++);
        sum ^= a * (p - str);    /* XOR instead of addition */
    }

    return sum % MAX_PRIME_LESS_THAN_HASH_LEN;
}

In the resulting plot the oscillation is still considerable, but the fitted regression line is noticeably flatter than in the previous two figures. The result is nevertheless still bad: bucket counts swing between roughly 100 and 800, far too wide a range. The reason is once again that the data distribution is not uniform, so accumulating everything into a single value is apparently not good enough. Looking at how the table-based hash algorithms proceed, I noticed that they combine a high part and a low part into the final result, so I borrowed their method:

unsigned int hash_func(char *str, int len)
{
    register unsigned int sum = 0;
    register unsigned int h = 0;
    register char *p = str;

    while (p - str < len) {
        register unsigned short a = *(p++);
        sum ^= a * (p - str);    /* becomes the high 16 bits of the result */
        h   ^= a / (p - str);    /* becomes the low 16 bits of the result */
    }

    return ((sum << 16) | h) % MAX_PRIME_LESS_THAN_HASH_LEN;
}

This produces the final distribution plot. The conclusion: even without any lookup table, perturbing each character with information taken from the string itself is enough to obtain a very satisfactory hash function. After switching to several different dictionaries for testing, the resulting plots stayed essentially consistent with the one above, which is very satisfying. As for the project itself (how misspelled words are detected, how automatic correction is done, and so on), that is another topic.
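To make the "abcd"/"acbd" collision argument concrete, here is a small standalone check. This is a minimal sketch: hash_v1 is the naive sum-only version and hash_v2 the position-weighted one from above, and the table size 4093 is an assumed value, since the post never states what MAX_PRIME_LESS_THAN_HASH_LEN actually is.

#include <stdio.h>

/* Assumed table size for illustration only; the real value of
 * MAX_PRIME_LESS_THAN_HASH_LEN is not given in the post. */
#define MAX_PRIME_LESS_THAN_HASH_LEN 4093

/* Version 1: plain character sum (order-blind). */
static unsigned int hash_v1(const char *str, int len)
{
    unsigned int sum = 0;
    const char *p = str;

    while (p - str < len)
        sum += *(p++);

    return sum % MAX_PRIME_LESS_THAN_HASH_LEN;
}

/* Version 2: each character weighted by its 1-based position. */
static unsigned int hash_v2(const char *str, int len)
{
    unsigned int sum = 0;
    const char *p = str;

    while (p - str < len) {
        unsigned short a = *(p++);
        sum += a * (p - str);
    }

    return sum % MAX_PRIME_LESS_THAN_HASH_LEN;
}

int main(void)
{
    /* "abcd" and "acbd" collide under v1 but separate under v2. */
    printf("v1: abcd=%u acbd=%u\n", hash_v1("abcd", 4), hash_v1("acbd", 4));
    printf("v2: abcd=%u acbd=%u\n", hash_v2("abcd", 4), hash_v2("acbd", 4));
    return 0;
}

Under v1 both words produce the same sum (97 + 98 + 99 + 100 = 394), so they land in the same bucket; under v2 the position weights yield 990 and 989 respectively, so they no longer collide.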
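And for anyone who wants to reproduce the distribution experiments, the following is a minimal sketch of the kind of test harness described above. It assumes a newline-delimited dictionary file (such as /usr/share/dict/words) and the same illustrative table size, hashes every word with the final version of the function, and prints one bucket count per line, which can then be plotted to produce figures like the ones discussed in this post.

#include <stdio.h>
#include <string.h>

#define MAX_PRIME_LESS_THAN_HASH_LEN 4093   /* assumed table size */

/* Final version of the hash from this post. */
static unsigned int hash_func(const char *str, int len)
{
    unsigned int sum = 0, h = 0;
    const char *p = str;

    while (p - str < len) {
        unsigned short a = *(p++);
        sum ^= a * (p - str);   /* (p - str) is the 1-based position here */
        h   ^= a / (p - str);
    }

    return ((sum << 16) | h) % MAX_PRIME_LESS_THAN_HASH_LEN;
}

int main(int argc, char **argv)
{
    static unsigned long count[MAX_PRIME_LESS_THAN_HASH_LEN];
    char word[256];   /* words longer than 255 bytes get truncated */
    FILE *fp;

    if (argc != 2) {
        fprintf(stderr, "usage: %s dictionary-file\n", argv[0]);
        return 1;
    }
    fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    /* One word per line; strip the trailing newline before hashing. */
    while (fgets(word, sizeof word, fp)) {
        int len = (int)strcspn(word, "\r\n");
        if (len > 0)
            count[hash_func(word, len)]++;
    }
    fclose(fp);

    /* One "bucket count" pair per line; feed this to gnuplot or similar. */
    for (int i = 0; i < MAX_PRIME_LESS_THAN_HASH_LEN; i++)
        printf("%d %lu\n", i, count[i]);
    return 0;
}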