NAME
src/hash.c - Hash table
DESCRIPTION
A hashtable contains an array of bucket indexes.
Buckets are nodes in a linked list,
each containing a void *
key and value.
During hash creation,
the types of key and value as well as appropriate compare and hashing functions can be set.
This hash implementation uses just one piece of malloced memory.
The hash->bs
bucket store points to this region.
This hash doesn't move during GC, therefore a lot of the old caveats don't apply.
Functions
static size_t key_hash_STRING
Returns the hashed value of the key static int STRING_compare
Compares the two strings,
returning 0 if they are identical.
static int pointer_compare
Compares the two pointers,
returning 0 if they are identical
static size_t key_hash_pointer
Returns a hashvalue for a pointer.
static size_t key_hash_cstring
Creates and returns a hash value from a string.Takes an interpreter,
a pointer to a string,
and a seed value.
Returns the hash value.Used by Parrot_new_cstring_hash.
static int cstring_compare
Compares two C strings for equality,
returning -1,
0,
and 1 if the first string is less than,
equal to,
or greater than the second,
respectively.
size_t key_hash_int
Returns a hashed value for an integer key (passed as a void pointer,
sadly).
int int_compare
Compares two integers for equality,
returning -1,
0,
and 1 if the first is less than,
equal to,
or greater than the second,
respectively.
Uses void pointers to store the integers,
sadly.
void parrot_dump_hash
Prints out the hash in human-readable form,
at least once someone implements this.
void parrot_mark_hash
Marks the hash and its contents as live.
Assumes that key and value are non null in all buckets.
static void parrot_mark_hash_keys
Marks the hash and only its keys as live.
static void parrot_mark_hash_values
Marks the hash and only its values as live.
static void parrot_mark_hash_both
Marks the hash and both its keys and values as live.
static void hash_thaw
Visits the contents of a hash during freeze/thaw.static void hash_freeze
Freezes hash into a string.Takes an interpreter,
a pointer to the hash,
and a pointer to the structure containing the string start location.Use by parrot_hash_visit.
void parrot_hash_visit
Freezes or thaws a hash as specified.
Takes an interpreter,
a pointer to the hash,
and a pointer to the structure identifying what to do and the location of the string.
static void expand_hash
Expands a hash when necessary.For a hashtable of size N,
we use void parrot_new_hash
Creates a new Parrot STRING hash in void parrot_new_pmc_hash
Creates a new Parrot STRING hash in PMC_struct_val(container).
void parrot_new_cstring_hash
Creates a new C string hash in static Hash *create_hash
Creates and initializes a hash. Function pointers determine its behaviors. The container passed in is the address of the hash PMC that is using it. The hash and the PMC point to each other.Memory from this function must be freed.
void parrot_hash_destroy
Frees the memory allocated to the specified hash and its bucket store. Used by Parrot_chash_destroy.
void parrot_chash_destroy
Deletes the specified hash by freeing the memory allocated to all the key-value pairs, and finally the hash itself.
void parrot_chash_destroy_values
Deletes the specified hash by freeing the memory allocated to all the key-value pairs, calling the provided callback to free the values, and finally the hash itself.The callback returns void parrot_new_hash_x
Creates and stores a new hash in void parrot_new_pmc_hash_x
Creates a new PMC hash.Like parrot_new_hash_x but w/o the described problems. The passed in void parrot_new_pointer_hash
Creates a new hash with void * keys and values, storing it via the passed-in Hash **.
PMC *Parrot_new_INTVAL_hash
Creates and returns new Hash PMC with INTVAL keys and values. INTVAL parrot_hash_size
Returns the number of used entries in the hash.
void *parrot_hash_get_idx
Finds the next index into the hash's internal storage for the given Key. Used by iterators. Ugly.
HashBucket *parrot_hash_get_bucket
Returns the bucket for void *parrot_hash_get
Returns the value keyed by INTVAL parrot_hash_exists
Returns whether the key exists in the hash.
HashBucket *parrot_hash_put
Puts the key and value into the hash. Note that void parrot_hash_delete
Deletes the key from the hash.
void parrot_hash_clone
Clones
value
.
See also string.c.
pinfo
is the visit info,
(see include/parrot/pmc_freeze.h>).
MAXFULL_PERCENT
% of N as the number of buckets.
This way,
as soon as we run out of buckets on the free list,
we know that it's time to resize the hashtable.Algorithm for expansion: We exactly double the size of the hashtable.
Keys are assigned to buckets with the formula
bucket_index = hash(key) % parrot_hash_sizeWhen doubling the size of the hashtable, we know that every key is either already in the correct bucket, or belongs in the current bucket plus
parrot_hash_size
(the old parrot_hash_size
). In fact, because the hashtable is always a power of two in size, it depends only on the next bit in the hash value, after the ones previously used.We scan through all the buckets in order, moving the buckets that need to be moved. No bucket will be scanned twice, and the cache should be reasonably happy because the hashtable accesses will be two parallel sequential scans. (Of course, this also mucks with the ->next
pointers, and they'll be all over memory.)
hptr
.
hptr
.
void
and takes a void *
.
hptr
.FIXME: This function can go back to just returning the hash struct pointer once Buffers can define their own custom mark routines.The problem is: During GC stack walking the item on the stack must be a PMC. When an auto Hash*
is seen, it doesn't get properly marked (only the Hash*
buffer is marked, not its contents). By passing the **hptr
up to the Hash's init function, the newly constructed PMC is on the stack including this newly constructed Hash, so that it gets marked properly.
container
PMC gets stored in the Hash and the newly created Hash is in PMC_struct_val(container).
flags
can be PObj_constant_FLAG
or 0.
key
, if found, and NULL otherwise.
key
, or NULL
if no bucket is found.
key
is not copied.
hash
to dest
.SEE ALSO
docs/pdds/pdd08_keys.pod.
TODO
Future optimizations:
- Stop reallocating the bucket pool, and instead add chunks on. (Saves pointer fixups and copying during
realloc
.) - Hash contraction (don't if it's worth it)