NAME

src/hash.c - Hash table

DESCRIPTION

A hashtable contains an array of bucket indexes. Buckets are nodes in a linked list, each holding a void * key and a void * value. When a hash is created, the key and value types can be set, along with matching compare and hashing functions.
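The sketch below shows one way such a callback-driven table can look: chained buckets with void * keys and values, and the hash and compare functions supplied at creation time. It is not Parrot's code. Every name (DemoHash, DemoBucket, demo_init, demo_put, demo_get_bucket, demo_hash_cstring, demo_cstring_compare) is hypothetical. The callbacks also omit the Interp * argument that the real functions take. As with the functions listed below, compare returns 0 for equal keys, and the string hash is only an example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified callback types (the real ones also take an
 * Interp *). compare returns 0 when the two keys are equal. */
typedef size_t (*demo_hash_fn)(const void *key, size_t seed);
typedef int    (*demo_compare_fn)(const void *a, const void *b);

typedef struct DemoBucket {
    struct DemoBucket *next;   /* next bucket in the same chain */
    void *key;
    void *value;
} DemoBucket;

typedef struct {
    DemoBucket    **index;     /* one chain head per slot */
    size_t          size;      /* number of slots; must be a power of two */
    size_t          seed;      /* passed to every hash call */
    demo_hash_fn    hash;
    demo_compare_fn compare;
} DemoHash;

/* C string versions of the hash and compare callbacks. */
static size_t demo_hash_cstring(const void *key, size_t seed) {
    const unsigned char *p = key;
    size_t h = seed;
    while (*p)
        h = h * 33 + *p++;     /* simple multiplicative string hash */
    return h;
}

static int demo_cstring_compare(const void *a, const void *b) {
    return strcmp(a, b);
}

/* Returns 1 on success, 0 if the slot array cannot be allocated. */
static int demo_init(DemoHash *h, size_t size, size_t seed,
                     demo_hash_fn hash, demo_compare_fn compare) {
    h->index = malloc(size * sizeof *h->index);
    if (!h->index)
        return 0;
    for (size_t i = 0; i < size; i++)
        h->index[i] = NULL;
    h->size = size;
    h->seed = seed;
    h->hash = hash;
    h->compare = compare;
    return 1;
}

static DemoBucket *demo_get_bucket(const DemoHash *h, const void *key) {
    size_t i = h->hash(key, h->seed) & (h->size - 1);
    for (DemoBucket *b = h->index[i]; b; b = b->next)
        if (h->compare(key, b->key) == 0)   /* a: search key, b: bucket key */
            return b;
    return NULL;
}

/* Inserts or overwrites. The key pointer is stored, not copied. */
static DemoBucket *demo_put(DemoHash *h, void *key, void *value) {
    DemoBucket *b = demo_get_bucket(h, key);
    if (!b) {
        size_t i = h->hash(key, h->seed) & (h->size - 1);
        b = malloc(sizeof *b);
        if (!b)
            return NULL;
        b->key = key;
        b->next = h->index[i];
        h->index[i] = b;
    }
    b->value = value;
    return b;
}
```

Swapping in different hash and compare callbacks, for example integer versions, changes the key type without touching the lookup code. This is the point of choosing the functions at creation time.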

This hash implementation uses just one piece of malloced memory. The hash->bu union points into this region: the bucket pointers lie at positive indices, and the bucket store itself lies at negative indices.
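The layout can be illustrated with a minimal sketch. It assumes a bucket store of fixed capacity placed first in the allocation, followed by the slot array, with both views anchored at the boundary between them. The names (DemoHash, DemoBucket, demo_create, demo_put, demo_get, demo_destroy) are hypothetical. Keys here are compared and hashed by pointer identity for brevity. Parrot overlays the two views in the hash->bu union. This sketch keeps them as two fields holding the same address so it stays within portable C.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DemoBucket {
    struct DemoBucket *next;
    void *key;
    void *value;
} DemoBucket;

/* Hypothetical sketch: bi and bs hold the same address, the boundary
 * between the bucket store and the slot array inside one allocation. */
typedef struct {
    char        *mem;      /* start of the single malloc'd region */
    DemoBucket **bi;       /* slots: bi[0] .. bi[size - 1] */
    DemoBucket  *bs;       /* store: bs[-1] .. bs[-n_buckets] */
    size_t size, n_buckets, n_used;
} DemoHash;

static int demo_create(DemoHash *h, size_t size, size_t n_buckets) {
    size_t store = n_buckets * sizeof(DemoBucket);
    h->mem = malloc(store + size * sizeof(DemoBucket *));
    if (!h->mem)
        return 0;
    /* store is a multiple of sizeof(DemoBucket), so the boundary is
     * aligned for both DemoBucket and DemoBucket * */
    h->bs = (DemoBucket *)(h->mem + store);
    h->bi = (DemoBucket **)(h->mem + store);
    for (size_t i = 0; i < size; i++)
        h->bi[i] = NULL;
    h->size = size;
    h->n_buckets = n_buckets;
    h->n_used = 0;
    return 1;
}

/* Takes the next free bucket from the negative side. Returns NULL once
 * the store is exhausted, the point at which the table would need to grow.
 * No duplicate-key check in this sketch. */
static DemoBucket *demo_put(DemoHash *h, void *key, void *value) {
    if (h->n_used == h->n_buckets)
        return NULL;
    h->n_used++;
    DemoBucket *b = h->bs - h->n_used;        /* bs[-1], bs[-2], ... */
    size_t i = (size_t)((uintptr_t)key % h->size);
    b->key = key;
    b->value = value;
    b->next = h->bi[i];
    h->bi[i] = b;
    return b;
}

static void *demo_get(const DemoHash *h, const void *key) {
    size_t i = (size_t)((uintptr_t)key % h->size);
    for (const DemoBucket *b = h->bi[i]; b; b = b->next)
        if (b->key == key)
            return b->value;
    return NULL;
}

/* One free() releases the slots and every bucket at once. */
static void demo_destroy(DemoHash *h) {
    free(h->mem);
    h->mem = NULL;
}
```

One payoff of this design is visible in demo_destroy. Because slots and buckets share one block, a single free() releases everything, and the table never has to track individually allocated buckets.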

This hash doesn't move during GC, so many of the old caveats no longer apply.

Functions

static size_t key_hash_STRING(Interp *interpreter, void *value, size_t seed): Returns the hashed value of the key value.
static int STRING_compare(Parrot_Interp interp, void *a, void *b): Compares the two strings, returning 0 if they are identical. a is the search key, b is the bucket key.
static size_t key_hash_cstring(Interp *interpreter, void *value, size_t seed)
static int cstring_compare(Parrot_Interp interp, void *a, void *b): C string versions of the key_hash and compare functions.
static size_t key_hash_int(Interp *interp, void *value, size_t seed)
static int int_compare(Parrot_Interp interp, void *a, void *b): Custom key_hash and compare functions.
void dump_hash(Interp *interpreter, Hash *hash): Print out the hash in human-readable form.
void mark_hash(Interp *interpreter, Hash *hash): Marks the hash and its contents as live.
void hash_visit(Interp *interpreter, Hash *hash, void *pinfo): This is used by freeze/thaw to visit the contents of the hash. pinfo is the visit info (see include/parrot/pmc_freeze.h).
static void expand_hash(Interp *interpreter, Hash *hash): For a hashtable of size N, we use MAXFULL_PERCENT % of N as the number of buckets. This way, as soon as we run out of buckets on the free list, we know that it's time to resize the hashtable. Algorithm for expansion: we double the size of the hashtable exactly. Keys are assigned to buckets with the formula bucket_index = hash(key) % hash_size, so when the hashtable doubles in size, every key is either already in the correct bucket or belongs in its current bucket plus hash_size (the old hash_size). In fact, because the hashtable size is always a power of two, this depends only on the next bit of the hash value after the ones previously used. So we scan through all the buckets in order, moving the buckets that need to be moved. No bucket is scanned twice, and the cache should be reasonably happy because the hashtable accesses will be two parallel sequential scans. (Of course, this also mucks with the ->next pointers, and they'll be all over memory.)
void new_hash(Interp *interpreter, Hash **hptr): Returns a new Parrot STRING hash in hptr.
new_pmc_hash(Interp *interpreter, PMC *container): Creates a new Parrot STRING hash in PMC_struct_val(container).
void new_cstring_hash(Interp *interpreter, Hash **hptr): Returns a new C string hash in hptr.
void new_hash_x(Interp *interpreter, Hash **hptr, PARROT_DATA_TYPES val_type, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash): Returns a new hash in hptr. FIXME: This function can go back to just returning the hash struct pointer once Buffers can define their own custom mark routines. The problem is that during DOD's stack walking, the item on the stack must be a PMC. When an auto Hash* is seen, it doesn't get properly marked (only the Hash* buffer is marked, not its contents). By passing the **hptr up to the PerlHash's or Hash's init function, the newly constructed PMC, including the newly constructed Hash, is on the stack, so that it gets marked properly.
void new_pmc_hash_x(Interp *interpreter, PMC *container, PARROT_DATA_TYPES val_type, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash): Like new_hash_x() above, but without the problems described there. The passed-in container PMC is stored in the Hash, and the newly created Hash is stored in PMC_struct_val(container).
PMC *Parrot_new_INTVAL_hash(Interp *interpreter, UINTVAL flags): Create a new Hash PMC with INTVAL keys and values. flags can be PObj_constant_FLAG or 0.
INTVAL hash_size(Interp *interpreter, Hash *hash): Return the number of used entries in the hash.
void *hash_get_idx(Interp *interpreter, Hash *hash, PMC *key): Called by iterator.
HashBucket *hash_get_bucket(Interp *interpreter, Hash *hash, void *key): Returns the bucket for key.
void *hash_get(Interp *interpreter, Hash *hash, void *key): Returns the value stored for key, or NULL if no bucket is found.
INTVAL hash_exists(Interp *interpreter, Hash *hash, void *key): Returns whether the key exists in the hash.
HashBucket *hash_put(Interp *interpreter, Hash *hash, void *key, void *value): Puts the key and value into the hash. Note that key is not copied.
void hash_delete(Interp *interpreter, Hash *hash, void *key): Deletes the key from the hash.
void hash_clone(Interp *interp, Hash *hash, Hash **dest): Clones hash to dest.
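The splitting step described for expand_hash() can be sketched on its own. This minimal version assumes separately chained buckets that cache their hash value. The cached value lives in a hypothetical hashval field, whereas the real code works from the key_hash function. The sketch also ignores the pointer fix-ups that Parrot's single-allocation layout needs after realloc. DemoTable, DemoBucket, demo_expand and demo_insert are illustrative names only. With a power-of-two size, hash % hash_size equals hash & (hash_size - 1). A bucket at old index i therefore stays at i or moves to i + old_size, depending only on the bit hashval & old_size.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct DemoBucket {
    struct DemoBucket *next;
    size_t hashval;        /* cached hash of the key (hypothetical field) */
} DemoBucket;

typedef struct {
    DemoBucket **index;
    size_t size;           /* always a power of two */
} DemoTable;

static void demo_insert(DemoTable *t, DemoBucket *b) {
    size_t i = b->hashval & (t->size - 1);   /* == hashval % size */
    b->next = t->index[i];
    t->index[i] = b;
}

/* Doubles t->size and splits every chain in one ordered pass over the old
 * slots. Slots at old_size and above start empty and only receive moved
 * buckets, so no bucket is examined twice. Returns 0 if realloc fails. */
static int demo_expand(DemoTable *t) {
    size_t old_size = t->size, new_size = old_size * 2;
    DemoBucket **idx = realloc(t->index, new_size * sizeof *idx);
    if (!idx)
        return 0;
    for (size_t i = old_size; i < new_size; i++)
        idx[i] = NULL;
    for (size_t i = 0; i < old_size; i++) {
        DemoBucket **link = &idx[i];
        while (*link) {
            DemoBucket *b = *link;
            if (b->hashval & old_size) {     /* the newly used bit is set */
                *link = b->next;             /* unlink from chain i */
                b->next = idx[i + old_size]; /* push onto chain i + old_size */
                idx[i + old_size] = b;
            } else {
                link = &b->next;             /* already in the right chain */
            }
        }
    }
    t->index = idx;
    t->size = new_size;
    return 1;
}
```

After demo_expand() returns, every bucket sits in the chain given by hashval & (size - 1). Nothing is rehashed from scratch, because each move is decided by the single bit the larger table starts to use.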

HISTORY

Initial version by Jeff G. on 2001.12.05

Substantially rewritten by Steve F.

2003.10.25

leo add function pointer for compare, hash, mark

hash keys are now (void *)

add new_cstring_hash() function

2003.11.04

bucket->value is now a plain pointer, no longer a HASH_ENTRY

With small changes, arbitrary items can again be stored if needed; see TODO in code.

2003.11.06

boemmels renamed HASH and HASHBUCKET to Hash and HashBucket

2003.11.11

leo randomize key_hash seed

extend new_hash_x() init call by value_type and _size.

2003.11.14

leo USE_STRING_EQUAL define, see comment above

2005.05.23

leo heavy rewrite: use just one piece of malloced memory

TODO

Future optimizations:

Stop reallocating the bucket pool, and instead add chunks on. (Saves pointer fixups and copying during realloc.)

Hash contraction (dunno if it's worth it)
