src/hash.c - Hash table


A hashtable contains an array of bucket indexes. Buckets are nodes in a linked list, each containing a void * key and value. During hash creation, the types of key and value as well as appropriate compare and hashing functions can be set.

This hash implementation uses just one piece of malloced memory. The hash->bs bucket store points to this region.

This hash doesn't move during GC, therefore a lot of the old caveats don't apply.

Functions ^

static size_t key_hash_STRING

Return the hashed value of the key value. See also string.c.

static int STRING_compare

Compares the two strings, returning 0 if they are identical.

static int pointer_compare

Compares the two pointers, returning 0 if they are identical

static size_t key_hash_pointer

Returns a hashvalue for a pointer.

static size_t key_hash_cstring

Create a hash value from a string.

Takes an interpreter, a pointer to a string, and a seed value. Returns the hash value.

Used by Parrot_new_cstring_hash.

static int cstring_compare

C string versions of the key_hash and compare functions.

size_t key_hash_int

Custom key_hash function.

int int_compare

Custom compare function.

void parrot_dump_hash

Print out the hash in human-readable form. Except it's empty.

void parrot_mark_hash

Marks the hash and its contents as live. Assumes that key and value are non null in all buckets.

static void hash_thaw

This is used by freeze/thaw to visit the contents of the hash.

pinfo is the visit info, (see include/parrot/pmc_freeze.h>).

static void hash_freeze

Freeze hash into a string.

Takes an interpreter, a pointer to the hash, and a pointer to the structure containing the string start location.

Use by parrot_hash_visit.

void parrot_hash_visit

Freeze or thaw hash as specified. Takes an interpreter, a pointer to the hash, and a pointer to the structure identifying what to do and the location of the string.

static void expand_hash

For a hashtable of size N, we use MAXFULL_PERCENT % of N as the number of buckets. This way, as soon as we run out of buckets on the free list, we know that it's time to resize the hashtable.

Algorithm for expansion: We exactly double the size of the hashtable. Keys are assigned to buckets with the formula

        bucket_index = hash(key) % parrot_hash_size
so when doubling the size of the hashtable, we know that every key is either already in the correct bucket, or belongs in the current bucket plus parrot_hash_size (the old parrot_hash_size). In fact, because the hashtable is always a power of two in size, it depends only on the next bit in the hash value, after the ones previously used.

So we scan through all the buckets in order, moving the buckets that need to be moved. No bucket will be scanned twice, and the cache should be reasonably happy because the hashtable accesses will be two parallel sequential scans. (Of course, this also mucks with the ->next pointers, and they'll be all over memory.)

void parrot_new_hash

Returns a new Parrot STRING hash in hptr.

void parrot_new_pmc_hash

Create a new Parrot STRING hash in PMC_struct_val(container)

void parrot_new_cstring_hash

Returns a new C string hash in hptr.

static Hash *create_hash

Creates and initializes a hash. Function pointers determine its behaviors. The container passed in is the address of the hash PMC that is using it. The hash and the PMC point to each other.

Memory from this function must be freed.

void parrot_hash_destroy

Free the memory allocated to the specified hash and its bucket store. Used by Parrot_chash_destroy.

void parrot_chash_destroy

Delete the specified hash by freeing the memory allocated to all the key-value pairs, and finally the hash itself.

void parrot_new_hash_x

Returns a new hash in hptr.

FIXME: This function can go back to just returning the hash struct pointer once Buffers can define their own custom mark routines.

The problem is: During DODs stack walking the item on the stack must be a PMC. When an auto Hash* is seen, it doesn't get properly marked (only the Hash* buffer is marked, not its contents). By passing the **hptr up to the Hash's init function, the newly constructed PMC is on the stack including this newly constructed Hash, so that it gets marked properly.

void parrot_new_pmc_hash_x

Like parrot_new_hash_x but w/o the described problems. The passed in container PMC gets stored in the Hash end the newly created Hash is in PMC_struct_val(container).

void parrot_new_pointer_hash

Create a new HASH with void * keys and values.

PMC *Parrot_new_INTVAL_hash

Create a new Hash PMC with INTVAL keys and values. flags can be PObj_constant_FLAG or 0.

INTVAL parrot_hash_size

Return the number of used entries in the hash.

void *parrot_hash_get_idx

Called by iterator.

HashBucket *parrot_hash_get_bucket

Returns the bucket for key.

void *parrot_hash_get

Returns the value keyed by key or NULL if no bucket is found.

INTVAL parrot_hash_exists

Returns whether the key exists in the hash.

HashBucket *parrot_hash_put

Puts the key and value into the hash. Note that key is not copied.

void parrot_hash_delete

Deletes the key from the hash.

void parrot_hash_clone

Clones hash to dest.




Future optimizations: