parrotcode: Hash table | |
Contents | C |
src/hash.c - Hash table
A hashtable contains an array of bucket indexes.
Buckets are nodes in a linked list,
each containing a STRING
key and a value.
The value is currently stored as a HASH_ENTRY
,
which maybe makes sense for some hashes but probably doesn't for what they're currently used for,
which is PerlHash and Hash PMCs (since those should probably just be hashes of STRING
s mapping to PMCs.)
To minimize memory overhead, buckets are carved out of a pool that is allocated normally by parrot's memory subsystem. That means that the pool can get moved around a lot.
The USE_STRING_EQUAL
define allows 2 different strategies to deal with non-ASCII hash keys.
#define USE_STRING_EQUAL 1
#define USE_STRING_EQUAL 0
Timing of (string, hash -O3, Athlon-800)
% parrot -C examples/benchmarks/hash-utf8.pasm
#define USE_STRING_EQUAL 1
0.35
8.2
#define USE_STRING_EQUAL 0
0.36
0.54
static PARROT_INLINE HashBucket *getBucket(Hash *hash, BucketIndex idx)
idx
.static PARROT_INLINE BucketIndex lookupBucketIndex(Hash *hash, HashIndex slot)
slot
.static PARROT_INLINE HashBucket *lookupBucket(Hash *hash, HashIndex slot)
slot
.static size_t key_hash_STRING(Interp *interpreter, Hash *hash, void *value)
value
.static int STRING_compare(Parrot_Interp interp, void *a, void *b)
a
is the search key, b
is the bucket key.static size_t key_hash_cstring(Interp *interpreter, Hash *hash, void *value)
static int cstring_compare(Parrot_Interp interp, void *a, void *b)
key_hash
and compare
functions.void dump_hash(Interp *interpreter, Hash *hash)
void mark_hash(Interp *interpreter, Hash *hash)
void hash_visit(Interp *interpreter, Hash *hash, void *pinfo)
pinfo
is the visit info, (see include/parrot/pmc_freeze.h>).static void expand_hash(Interp *interpreter, Hash *hash)
MAXFULL_PERCENT
% of N as the number of buckets. This way, as soon as we run out of buckets on the free list, we know that it's time to resize the hashtable. bucket_index = hash(key) % hash_size
hash_size
(the old hash_size
). In fact, because the hashtable is always a power of two in size, it depends only on the next bit in the hash value, after the ones previously used.->next
pointers, and they'll be all over memory.)static BucketIndex new_bucket(Interp *interpreter, Hash *hash, void *key, void *value)
key
and value
set.static HashBucket *find_bucket(Interp *interpreter, Hash *hash, BucketIndex head, void *key)
key
.void new_hash(Interp *interpreter, Hash **hptr)
hptr
.void new_cstring_hash(Interp *interpreter, Hash **hptr)
hptr
.void new_hash_x(Interp *interpreter, Hash **hptr, PARROT_DATA_TYPES val_type, size_t val_size, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash, hash_mark_key_fn mark)
hptr
.Hash*
is seen, it doesn't get properly marked (only the Hash*
buffer is marked, not its contents). By passing the **hptr
up to the PerlHash's or Hash's init function, the newly constructed PMC is on the stack including this newly constructed Hash, so that it gets marked properly.void new_pmc_hash_x(Interp *interpreter, PMC *container, PARROT_DATA_TYPES val_type, size_t val_size, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash, hash_mark_key_fn mark)
container
PMC gets stored in the Hash end the newly created Hash is in PMC_struct_val(container).INTVAL hash_size(Interp *interpreter, Hash *hash)
STRING *hash_get_idx(Interp *interpreter, Hash *hash, PMC *key)
HashBucket *hash_get_bucket(Interp *interpreter, Hash *hash, void *key)
key
.void *hash_get(Interp *interpreter, Hash *hash, void *key)
key
or NULL
if no bucket is found.INTVAL hash_exists(Interp *interpreter, Hash *hash, void *key)
HashBucket *hash_put(Interp *interpreter, Hash *hash, void *key, void *value)
key
is not copied.void hash_delete(Interp *interpreter, Hash *hash, void *key)
void hash_clone(Interp *interp, Hash *hash, Hash **dest)
hash
to dest
.leo add function pointer for compare, hash, mark
hash keys are now (void *)
add new_cstring_hash()
function
bucket->value
is now a plain pointer, no more an HASH_ENTRY
With little changes, we can again store arbitrary items if needed, see TODO in code.
boemmels renamed HASH
and HASHBUCKET
to Hash
and HashBucket
leo randomize key_hash
seed
extend new_hash_x()
init call by value_type
and _size
.
leo USE_STRING_EQUAL
define, see comment above
Future optimizations:
realloc
.)
|