| parrotcode: Hash table | |
| Contents | C |

src/hash.c - Hash table

A hashtable contains an array of bucket indexes.
Buckets are nodes in a linked list,
each containing a STRING key and a value.
The value is currently stored as a HASH_ENTRY,
which maybe makes sense for some hashes but probably doesn't for what they're currently used for,
which is PerlHash and Hash PMCs (since those should probably just be hashes of STRINGs mapping to PMCs.)
To minimize memory overhead, buckets are carved out of a pool that is allocated normally by parrot's memory subsystem. That means that the pool can get moved around a lot.

The USE_STRING_EQUAL define allows 2 different strategies to deal with non-ASCII hash keys.
#define USE_STRING_EQUAL 1#define USE_STRING_EQUAL 0Timing of (string, hash -O3, Athlon-800)
% parrot -C examples/benchmarks/hash-utf8.pasm
#define USE_STRING_EQUAL 1
0.35
8.2
#define USE_STRING_EQUAL 0
0.36
0.54

static PARROT_INLINE HashBucket *getBucket(Hash *hash, BucketIndex idx)idx.static PARROT_INLINE BucketIndex lookupBucketIndex(Hash *hash, HashIndex slot)slot.static PARROT_INLINE HashBucket *lookupBucket(Hash *hash, HashIndex slot)slot.static size_t key_hash_STRING(Interp *interpreter, Hash *hash, void *value)value.static int STRING_compare(Parrot_Interp interp, void *a, void *b)a is the search key, b is the bucket key.static size_t key_hash_cstring(Interp *interpreter, Hash *hash, void *value)static int cstring_compare(Parrot_Interp interp, void *a, void *b)key_hash and compare functions.void dump_hash(Interp *interpreter, Hash *hash)void mark_hash(Interp *interpreter, Hash *hash)void hash_visit(Interp *interpreter, Hash *hash, void *pinfo)pinfo is the visit info, (see include/parrot/pmc_freeze.h>).static void expand_hash(Interp *interpreter, Hash *hash)MAXFULL_PERCENT % of N as the number of buckets. This way, as soon as we run out of buckets on the free list, we know that it's time to resize the hashtable. bucket_index = hash(key) % hash_size
hash_size (the old hash_size). In fact, because the hashtable is always a power of two in size, it depends only on the next bit in the hash value, after the ones previously used.->next pointers, and they'll be all over memory.)static BucketIndex new_bucket(Interp *interpreter, Hash *hash, void *key, void *value)key and value set.static HashBucket *find_bucket(Interp *interpreter, Hash *hash, BucketIndex head, void *key)key.void new_hash(Interp *interpreter, Hash **hptr)hptr.void new_cstring_hash(Interp *interpreter, Hash **hptr)hptr.void new_hash_x(Interp *interpreter, Hash **hptr, PARROT_DATA_TYPES val_type, size_t val_size, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash, hash_mark_key_fn mark)hptr.Hash* is seen, it doesn't get properly marked (only the Hash* buffer is marked, not its contents). By passing the **hptr up to the PerlHash's or Hash's init function, the newly constructed PMC is on the stack including this newly constructed Hash, so that it gets marked properly.void new_pmc_hash_x(Interp *interpreter, PMC *container, PARROT_DATA_TYPES val_type, size_t val_size, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash, hash_mark_key_fn mark)container PMC gets stored in the Hash end the newly created Hash is in PMC_struct_val(container).INTVAL hash_size(Interp *interpreter, Hash *hash)STRING *hash_get_idx(Interp *interpreter, Hash *hash, PMC *key)HashBucket *hash_get_bucket(Interp *interpreter, Hash *hash, void *key)key.void *hash_get(Interp *interpreter, Hash *hash, void *key)key or NULL if no bucket is found.INTVAL hash_exists(Interp *interpreter, Hash *hash, void *key)HashBucket *hash_put(Interp *interpreter, Hash *hash, void *key, void *value)key is not copied.void hash_delete(Interp *interpreter, Hash *hash, void *key)void hash_clone(Interp *interp, Hash *hash, Hash **dest)hash to dest.

leo add function pointer for compare, hash, mark
hash keys are now (void *)
add new_cstring_hash() function
bucket->value is now a plain pointer, no more an HASH_ENTRY
With little changes, we can again store arbitrary items if needed, see TODO in code.
boemmels renamed HASH and HASHBUCKET to Hash and HashBucket
leo randomize key_hash seed
extend new_hash_x() init call by value_type and _size.
leo USE_STRING_EQUAL define, see comment above

Future optimizations:
realloc.)
|
|
|