Hash Table is implemented in MUDOS

zhaozj2021-02-16  83

This article describes the hash functions used in MUDOS, and has made a simple analysis of the hash table addressing operations that package the hash function, and finally an Object Hash Table in Mudos implements a simplified hash table.

In MUDOS, the application of the hash table is very wide, it can be said that all places that use the lookup have been used to have a hash (have the advantage of its efficiency, ideal, search, insert, and delete the operation time. Both O (1), in the application, although there is no such ideal state, but compared to other data structures, the advantages of Hash Table are still significant. In a non-desirable state, the method of processing collisions in the MUDOS source code is to use a linked list in a non-ideal state.

Hash function (haveh function)

In the latest released MUDOS source (V22.2B14), hash function is defined in Hash.c:

/ * hash.c * /

/ *

** A Simple and Fast Generic String Hasher Based on Peter K. Pearson's

** Article In CaCM 33-6, PP. 677.

* /

#define edit_source

#define no_sockets

#define no_opcodes

#include "std.h"

Static int t [] =

{

1, 87, 49, 12, 176, 178, 102, 166, 121, 193, 6, 84, 249, 230, 44, 163,

14, 197, 213, 181, 161, 85, 218, 80, 64, 239, 24, 226, 236, 142, 38, 200,

110, 177, 104, 103, 141, 253, 255, 50, 77, 101, 81, 18, 45, 96, 31, 222,

25, 107, 190, 70, 86, 237, 240, 34, 72, 242, 20, 214, 244, 227, 149, 235,

97, 234, 57, 22, 60, 250, 82, 175, 208, 5, 127, 199, 111, 62, 135, 248,

174, 169, 211, 58, 66, 154, 106, 195, 245, 171, 17, 187, 182, 179, 0, 243,

132, 56, 148, 75, 128, 133, 158, 100, 130, 126, 91, 13, 153, 246, 216, 219,

119, 68, 223, 78, 83, 88, 201, 99, 122, 11, 92, 32, 136, 114, 52, 10,

138, 30, 48, 183, 156, 35, 61, 26, 143, 74, 251, 94, 129, 162, 63, 152,

170, 7, 115, 167, 241, 206, 3, 150, 55, 59, 151, 220, 90, 53, 23, 131,

125, 173, 15, 238, 79, 95, 89, 16, 105, 137, 225, 224, 217, 160, 37, 123,

118, 73, 2, 157, 46, 116, 9, 145, 134, 228, 207, 212, 202, 215, 69, 229,

27, 188, 67, 124, 168, 252, 42, 4, 29, 108, 21, 247, 19, 205, 39, 203,

233, 40, 186, 147, 198, 192, 155, 33, 164, 191, 98, 204, 165, 180, 117, 76, 140, 36, 210, 172, 41, 54, 159, 8, 185, 232, 113, 196, 231, 47, 146, 120,

51, 65, 28, 144, 254, 221, 93, 189, 194, 139, 112, 43, 71, 109, 184, 209,

}

Inline int

Hashstr P3 (Char *, S, / * String to Hash * /

INT, MAXN, / * MAXIMUM NUMBER of Chars to consider * /

Int, HASHS

{

Register unsigned int h;

Register unsigned char * p;

H = (unsigned char) * s;

IF (h) {

IF (Hashs> 256) {

Register int oh = t [(unsigned char) * s];

For (p = (unsigned char *) s 1; * p

&&p <= (unsigned char *) s maxn; p ) {

H = t [h ^ * p];

OH = T [OH ^ * P];

}

H | = (OH << 8);

Else

For (p = (unsigned char *) s 1; * p

&&p <= (unsigned char *) s maxn; p )

H = t [h ^ * p];

}

Return (int) (hmas);

}

/ *

* Whashstr Is Faster Than Hashstr, But Requires An Even Power-of-2 Table Size

* Taken from Amylars Driver.

* /

Inline int

Whashstr P2 (Char *, S, Int, MaxN)

{

Register unsigned char oh, h;

Register unsigned char * p;

Register INT i;

IF (! s) returnograph;

IF (! * s)

Return 0;

p = (unsigned char *) s;

OH = T [* P];

H = (* (p ) 1) & 0xFF;

For (i = maxn - 1; * p && --i> = 0;) {

OH = T [OH ^ * P];

H = T [h ^ * (p )];

}

Return (OH << 8) h;

}

Note The two functions are published in Pearson, published in Communications of The ACM Archive Volume 33, Issue 6 (June 1990), article topics are: Fast Hashing of Variable-Length Text Strings. Whashstr function takes place from Amylars Driver - another MUD DRIVER? This function is faster than the first function HashStr speed, so the WhashStr function is used in MUDOS to implement the hash table.

I didn't find the article of Pearson, and there was no knowledge for the array of T [], and found another algorithm and a hash function to find another T: - Use an Array of 256 1-byte Values

Ub1 x [0..255];

- Fill it with the value of the value 0 to 255 in some pseudorandom ORDER

- (This Is Derived from the alleged rc4)

{for i in 0..255 DO

{x [I]: = i;

}

}

K = 7;

{for j in 0..3 do

{for i in 0..255 DO

{s = x [i];

K = (k s) MOD 256;

x [i] = x [k];

X [K] = S;

}

}

}

- Use the length of the message to initialize the hash

- xxx can be 0, or some constant, or some argument

Hash: = (XXX LEN) MOD 256;

- Hash The Characters in the message one byte at a time

{for i in 0..LEN-1 DO

{have: = (Hash Message [i]) MOD 256; hash: = x [hash];}}

Write a C function to generate T.

Void gen_t (int * vec)

{

INT I, J, S, K = 7;

For (i = 0; i <256; i)

{

VEC [i] = i;

}

For (j = 0; j <4; J)

{

For (i = 0; i <256; i)

{

S = VEC [I];

K = (k s)% 256;

VEC [i] = vec [k];

VEC [k] = s;

}

}

}

So, we have another hash function:

Static int simulate_t [] =

{

49, 118, 63, 252, 13, 155, 114, 130, 137, 40, 210, 62, 219, 246, 136, 221,

174, 106, 37, 227, 166, 25, 139, 19, 204, 212, 64, 176, 70, 11, 170, 58,

146, 24, 123, 77, 184, 248, 108, 251, 43, 171, 12, 141, 126, 41, 95, 142,

167, 46, 178, 235, 30, 75, 45, 208, 110, 230, 226, 50, 32, 112, 156, 180,

205, 68, 202, 203, 82, 7, 247, 217, 223, 71, 116, 76, 6, 31, 194, 183,

15, 102, 97, 215, 234, 240, 53, 119, 52, 47, 179, 99, 199, 8, 101, 35,

65, 132, 154, 239, 148, 51, 216, 74, 93, 192, 42, 86, 165, 113, 89, 48,

100, 195, 29, 211, 169, 38, 57, 214, 127, 117, 59, 39, 209, 88, 1, 134,

92, 163, 0, 66, 237, 22, 164, 200, 85, 9, 190, 129, 111, 172, 231, 14,

181, 206, 128, 23, 187, 73, 149, 193, 241, 236, 197, 159, 55, 125, 196, 60,

161, 238, 245, 94, 87, 157, 122, 158, 115, 207, 17, 20, 145, 232, 107, 16, 21, 185, 33, 225, 175, 253, 81, 182, 67, 243, 69, 220, 153, 5, 143, 3,

26, 213, 147, 222, 105, 188, 229, 191, 72, 177, 250, 135, 152, 121, 218, 44,

120, 140, 138, 28, 84, 186, 198, 131, 54, 2, 56, 78, 173, 151, 83, 27,

255, 144, 249, 189, 104, 4, 168, 98, 162, 150, 254, 242, 109, 34, 133, 224,

228, 79, 103, 201, 160, 90, 18, 61, 10, 233, 91, 80, 124, 96, 244, 36

}

Int Simulate_hashstr (Char * S, INT LEN)

{

INT H = len% 256;

For (int i = 0; i

{

H = (h s [i])% 256;

H = simulated_t [h];

}

Return H;

}

Obviously, this hash function is generated between 0 and 255.

There are still a lot of hash functions that handle strings. See Resources 3.

Use hash function addressing

In MUDOS Source (V22.2B14), five macros or functions are defined to handle addressing operations of each haveh table in applications, these macro packages the WhashStr function for easy use:

1.

/ * lex.h * /

#define defhash 64

#define defhash (s) (Whashstr ((s), 10) & (Defhash - 1))

2.

/ * lex.c * /

#define Ident_hash_size 1024

#define Identhash (s) (Whashstr ((s), 20) & (Ident_hash_size - 1))

3.

/ * Object.c * /

/ * CFG_LIVING_HASH_SIZE is defined in Options.h

#define cfg_living_hash_size 256 * /

Inline_static int hash_living_name p1 (char *, str)

{

Return Whashstr (STR, 20) & (CFG_LIVING_HASH_SIZE - 1);

}

4.

/ * OTABLE.C * /

#define objhash (s) Whashstr (S, 40) & OTABLE_SIZE_MINUS_ONE

5.

/ * stralloc.c * /

#define strhao (s) (Whashstr ((s), 20) & (HTABLE_SIZE_MINUS_ONE))

For the 1, 2, 3, the parameters, and function functions have been obvious: extract low byte as the position of the hash table where the evaluation is evaluated, the corresponding hash table is defhash, Ident_hash_Size, CFG_LIVING_HASH_SIZE, Due to the "&" operation (standing and) to extract the low byte, it is necessary to set the size of these hash tables to 2 index.

For the fourth macro, the parameter otable_size_minus_one is assigned in OTABLE.C.

/ * OTABLE.C * /

/ *

* Hash Table - List of Pointers to Heads of Object Chains.

* Each Object In Chain Has A Pointer, Next_hash, To The next Object. * /

Static Object_t ** Obj_table = 0;

void init_otable ()

{

INT X, Y;

/ * ENSURE THATABLE_SIZE IS A POWER OF 2 * /

y = OTABLE_SIZE;

For (OTABLE_SIZE = 1; OTABLE_SIZE

;

OTABLE_SIZE_MINUS_ONE = OTABLE_SIZE - 1;

Obj_table = Callocate (Otable_size, Object_t *,

Tag_obj_tbl, "init_otable");

For (x = 0; x

OBJ_TABLE [X] = 0;

}

The function of the macro otable_size is the value of the Object Table size in the storage profile. When MUDOS is started, the function of the read configuration file is called in the main function, and the otable_size assigns this value to 2 index in the init_otalbe function. And assign this value to otable_size_minus_one.

For the fifth macro, the assignment process of htable_size_minus_one is similar to OTABLE_SIZE_MINUS_ONE. But find the following questions when viewing a MUDLIB profile:

/ * Taken from config.xxx * /

# Define the size of the shared string hash table. This number surr

# a prime, probably between 1000 and 30000; if you set it to about 1/5

# of the number of distinct strings you have, you will get a hit ratio

# (Number of Comparisons to Find A String) Very Close To 1, As Found Strings

# area automatically moilt to the head of a hash chain. You will NEVER

#need more, and you will still Get results with a smaller table.

Hash Table Size: 7001

In the comment (the statement starts in a #), the Hash Table Size must be a current number (prime), but the source code needs to guarantee the index of 2. I think the cause of the contradiction before and after the fact that the MUDOS version is related to: Burglary is used by HashStr functions, this function is used in subsequent addressing operations, and uses the number as Hash Table The size is better in resolving the efficiency of the collision (speculative, unspeakable), so these Mudlib's config files have been using these traditions; currently the new Mudos version uses the WhashStr function, which is subsequent addressing operations - As previously introduced, it is "&" bit and operation, which must require the Hash Table size must be 2 index.

Decrease

With the previous analysis, the hash function and macro will be used as an object table as an example, and the source code implemented by a Hash Table, Object Table Refer to Otable.c in the latest MUDOS source (V22.2B14). The following codes are different from the source code. The reason for this is to do not involve specific applications, but only simple implementation of a Hash Table. / * hashtable.h * /

#if! defined (_hash_table_h_)

#define _hash_table_h_

#define string_size 64

Typedef struct object_s {

CHAR Name [string_size];

Struct Object_s * next_hash;

CHAR VALUE [STRING_SIZE];

} Object_t;

Void init_otable ();

Void clear_otable ();

Void Enter_Object_hash (Object_t *);

Void transove_object_hash (object_t *);

Object_t * lookup_Object_hash (char *);

Void dump ();

#ENDIF / / _HASH_TABLE_H_

/ * hashtable.c * /

#include "stdafx.h"

#include "stdlib.h"

#include "stdio.h"

#include "string.h"

#include "hashtable.h"

#define bucket_size 256

Static int OTABLE_SIZE;

Static int OTABLE_SIZE_MINUS_ONE;

INT WHASHSTR (Char *, int);

#define objhash (s) Whashstr ((char *) s, 40) & OTABLE_SIZE_MINUS_ONE

/ *

* Hash Table - List of Pointers to Heads of Object Chains.

* Each Object In Chain Has A Pointer, Next_hash, To The Next Object. / THE NEXT OBJECT.

* /

Static Object_t ** Obj_table = 0;

void init_otable ()

{

Int Y;

/ * ENSURE THATABLE_SIZE IS A POWER OF 2 * /

Y = bucket_size;

For (OTABLE_SIZE = 1; OTABLE_SIZE

;

OTABLE_SIZE_MINUS_ONE = OTABLE_SIZE - 1;

/ * by using calloc, Each Element in obj_table is initialized to 0 * /

OBJ_TABLE = (Object_t **) Calloc (Otable_size, SizeOf (Object_t *));

}

Void clear_otable ()

{

INT I;

Free (Obj_Table);

For (i = 0; i

OBJ_TABLE [I] = 0;

}

}

/ * A global. * /

Static int h;

/ *

* Looks for obj in Table, Moves It to head.

* /

Object_t * find_obj_n (char * s) {

Object_t * curr, * prev;

H = objhash (s);

Curr = Obj_Table [h];

prev = 0;

While (curr) {

IF (! strcmp (curr-> name, s)) {/ * found it * /

IF (prev) {/ * NOT at head of list * /

Prev-> next_hash = curr-> next_hash;

Curr-> Next_hash = Obj_table [h];

Obj_table [h] = curr;

}

Return (Curr); / * Pointer to Object * /

}

Prev = CURR;

Curr = curr-> next_hash;

}

Return (0); / * NOT FOUND * /

}

/ *

* Add an Object to the table - Can't Have Duplicate Names.

* /

Void Enter_Object_hash (Object_t * OB)

{

Object_t * s;

s = find_obj_n (ob-> name); / * this sets h * /

OB-> Next_hash = Obj_Table [h];

OBJ_TABLE [H] = OB;

Return;

}

/ *

* Remove an Object from the table - generally caled when it

* is removed from the next_all list - i.e. in Destruct.

* /

Void remove_object_hash (Object_t * OB)

{

Object_t * s;

s = find_obj_n (ob-> name); / * this sets h, and cycles the ob to the front * /

Obj_table [h] = ob-> next_hash;

OB-> Next_hash = 0;

Return;

}

/ *

* Lookup An Object In The Hash Table; if it isn't there, return null.

* This is only diffreent to find_object_n in That IT Collects Different

* stats; more finds area actually Done Than the user esr.

* /

Object_t * lookup_object_hash (char * s)

{

Return Find_Obj_n (s);

}

/ *

* Display the sum of elements in obj_table.

* /

Void dump ()

{

INT I, Count;

Object_t * curr;

For (i = 0; i

count = 0;

Curr = Obj_Table [i];

While (curr) {

COUNT;

Curr = curr-> next_hash;

}

Printf ("% d / n", count);

}

}

/ * Hash function. * /

Static int t [] =

{

1, 87, 49, 12, 176, 178, 102, 166, 121, 193, 6, 84, 249, 230, 44, 163, 14, 197, 213, 181, 161, 85, 218, 80, 64, 239, 24, 226, 236, 142, 38, 200,

110, 177, 104, 103, 141, 253, 255, 50, 77, 101, 81, 18, 45, 96, 31, 222,

25, 107, 190, 70, 86, 237, 240, 34, 72, 242, 20, 214, 244, 227, 149, 235,

97, 234, 57, 22, 60, 250, 82, 175, 208, 5, 127, 199, 111, 62, 135, 248,

174, 169, 211, 58, 66, 154, 106, 195, 245, 171, 17, 187, 182, 179, 0, 243,

132, 56, 148, 75, 128, 133, 158, 100, 130, 126, 91, 13, 153, 246, 216, 219,

119, 68, 223, 78, 83, 88, 201, 99, 122, 11, 92, 32, 136, 114, 52, 10,

138, 30, 48, 183, 156, 35, 61, 26, 143, 74, 251, 94, 129, 162, 63, 152,

170, 7, 115, 167, 241, 206, 3, 150, 55, 59, 151, 220, 90, 53, 23, 131,

125, 173, 15, 238, 79, 95, 89, 16, 105, 137, 225, 224, 217, 160, 37, 123,

118, 73, 2, 157, 46, 116, 9, 145, 134, 228, 207, 212, 202, 215, 69, 229,

27, 188, 67, 124, 168, 252, 42, 4, 29, 108, 21, 247, 19, 205, 39, 203,

233, 40, 186, 147, 198, 192, 155, 33, 164, 191, 98, 204, 165, 180, 117, 76,

140, 36, 210, 172, 41, 54, 159, 8, 185, 232, 113, 196, 231, 47, 146, 120,

51, 65, 28, 144, 254, 221, 93, 189, 194, 139, 112, 43, 71, 109, 184, 209,

}

INT WHASHSTR (Char * S, INT MAXN)

{

Register unsigned char oh, h;

Register unsigned char * p;

Register INT i;

IF (! s) returnograph;

IF (! * s)

Return 0;

p = (unsigned char *) s;

OH = T [* P];

H = (* (p ) 1) & 0xFF;

For (i = maxn - 1; * p && --i> = 0;) {

OH = T [OH ^ * P];

H = T [h ^ * (p )];

}

Return (OH << 8) h;

}

This code enables basic operations of Hashtable: Insert, delete, and find. In order to test this hashtable, I wrote a very simple test code, the test data was taken from http://www.graphcomp.com/info/mud/mudos/history_mudos.html, in order to simplify the code, I use Python to this The article has been processed. First, save this article is: History_Mudos.txt, I put it in the root directory of the E-plate, Python Processing the source code as follows: # preread.py

IMPORT RE

DEF foo ():

REX = Re.Compile ('[A-ZA-Z0-9] ')

f = Open ('E: /out.txt', 'W')

For Line In Open ('E: /Histroy_Mudos.txt', 'R'):

For Word in Rex.FindAll (Line):

F.Write (Word)

F.write ('/ n')

Run these code under the Python console, get the processed file out.txt, Out.txt will contain all words in History_Mudos.txt, each word in line. With pre-processed out.txt, it is much more simple to write test code. In order to make more simplified operations, the following code will use C to write:

/ * main.cpp * /

#include

#include

#include

#include

#include

#include

#include "hashtable.h"

Using namespace std;

Static Vector ob_table;

Struct constructor {

Operator () (Const string & str)

{

Object_t * pob = new object_t;

STRCPY (POB-> Name, str.c_str ());

STRCPY (POB-> Value, Strupr (const_cast )))))))));

POB-> Next_hash = 0;

OB_TABLE.PUSH_BACK (POB);

}

}

Struct destructor {

Operator () (Object_t * pob)

{

DELETE POB;

POB = 0;

}

}

Struct ob_inserter {

Operator () (Object_t * pob)

{

ENTER_Object_hash (POB);

}

}

Void create ()

{

IFStream Infile;

String Str;

Infile.open ("e: //out.txt");

Set Word_Set;

While (GetLine (Infile, Str)) {

Word_set.insert (STR);

}

For_each (Word_Set.Begin (), Word_Set.end (), Constructor ());

Infile.Close ();

Init_OTABLE ();

For_each (ob_table.begin (), ob_table.end (), ob_inserter ());

void clear ()

{

For_each (ob_table.begin (), ob_table.end (), deStructor ());

Clear_OTABLE ();

}

Void Lookup (char * s)

{

Object_t * pob = lookup_Object_hash (s);

IF (POB)

Cout << POB-> Value << Endl;

}

Int main (void)

{

CREATE ();

LOOKUP ("THE");

Lookup ("History");

LOOKUP ("of");

Lookup ("Mudos");

DUMP ();

Clear ();

}

Compile operation, the results are as follows:

THE

Of

Mudos

The number of DUMP functions is shown in the figure below, with a total of 651 points, and for the hash size of 256, such distributions are still met.

The results prove that the implementation code of this Hash Table is correct.

Reference:

[1] Sartaj Sahni. Data Structures, Algorithms, And Applications IN C

[2] Hou Jie. "STL Source Code Analysis"

[3] http://burtleburtle.net/bob/hash/index.html

[4] http://www.mudos.org/

转载请注明原文地址:https://www.9cbs.com/read-13076.html

New Post(0)