parrotcode: Hash table | |
Contents | C |
src/hash.c - Hash table
A hashtable contains an array of bucket indexes.
Buckets are nodes in a linked list,
each containing a STRING
key and a value.
The value is currently stored as a HASH_ENTRY
,
which maybe makes sense for some hashes but probably doesn't for what they're currently used for,
which is PerlHashes (since those should probably just be hashes of STRING
s mapping to PMCs.)
To minimize memory overhead, buckets are carved out of a pool that is allocated normally by parrot's memory subsystem. That means that the pool can get moved around a lot.
The USE_STRING_EQUAL
define allows 2 different strategies to deal with non-ASCII hash keys.
#define USE_STRING_EQUAL 1
#define USE_STRING_EQUAL 0
Timing of (string, hash -O3, Athlon-800)
% parrot -C examples/benchmarks/hash-utf8.pasm
#define USE_STRING_EQUAL 1
0.35
8.2
#define USE_STRING_EQUAL 0
0.36
0.54
static PARROT_INLINE HashBucket *getBucket(Hash *hash, BucketIndex idx)
idx
.static PARROT_INLINE BucketIndex lookupBucketIndex(Hash *hash, HashIndex slot)
slot
.static PARROT_INLINE HashBucket *lookupBucket(Hash *hash, HashIndex slot)
slot
.static size_t key_hash_STRING(Interp *interpreter, Hash *hash, void *value)
value
.static int STRING_compare(Parrot_Interp interp, void *a, void *b)
a
is the search key, b
is the bucket key.static size_t key_hash_cstring(Interp *interpreter, Hash *hash, void *value)
static int cstring_compare(Parrot_Interp interp, void *a, void *b)
key_hash
and compare
functions.void dump_hash(Interp *interpreter, Hash *hash)
void mark_hash(Interp *interpreter, Hash *hash)
void hash_visit(Interp *interpreter, Hash *hash, void *pinfo)
pinfo
is the visit info, (see include/parrot/pmc_freeze.h>).static void expand_hash(Interp *interpreter, Hash *hash)
MAXFULL_PERCENT
% of N as the number of buckets. This way, as soon as we run out of buckets on the free list, we know that it's time to resize the hashtable. bucket_index = hash(key) % hash_size
hash_size
(the old hash_size
). In fact, because the hashtable is always a power of two in size, it depends only on the next bit in the hash value, after the ones previously used.->next
pointers, and they'll be all over memory.)static BucketIndex new_bucket(Interp *interpreter, Hash *hash, void *key, void *value)
key
and value
set.static HashBucket *find_bucket(Interp *interpreter, Hash *hash, BucketIndex head, void *key)
key
.void new_hash(Interp *interpreter, Hash **hptr)
hptr
.void new_cstring_hash(Interp *interpreter, Hash **hptr)
hptr
.void new_hash_x(Interp *interpreter, Hash **hptr, PARROT_DATA_TYPES val_type, size_t val_size, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash, hash_mark_key_fn mark)
hptr
.Hash*
is seen, it doesn't get properly marked (only the Hash*
buffer is marked, not its contents). By passing the **hptr
up to PerlHash's init function, the newly constructed PMC is on the stack including this newly constructed Hash, so that it gets marked properly.void new_pmc_hash_x(Interp *interpreter, PMC *container, PARROT_DATA_TYPES val_type, size_t val_size, Hash_key_type hkey_type, hash_comp_fn compare, hash_hash_key_fn keyhash, hash_mark_key_fn mark)
container
PMC gets stored in the Hash end the newly created Hash is in PMC_struct_val(container).INTVAL hash_size(Interp *interpreter, Hash *hash)
STRING *hash_get_idx(Interp *interpreter, Hash *hash, PMC *key)
HashBucket *hash_get_bucket(Interp *interpreter, Hash *hash, void *key)
key
.void *hash_get(Interp *interpreter, Hash *hash, void *key)
key
or NULL
if no bucket is found.INTVAL hash_exists(Interp *interpreter, Hash *hash, void *key)
HashBucket *hash_put(Interp *interpreter, Hash *hash, void *key, void *value)
key
is not copied.void hash_delete(Interp *interpreter, Hash *hash, void *key)
void hash_clone(Interp *interp, Hash *hash, Hash **dest)
hash
to dest
.leo add function pointer for compare, hash, mark
hash keys are now (void *)
add new_cstring_hash()
function
bucket->value
is now a plain pointer, no more an HASH_ENTRY
With little changes, we can again store arbitrary items if needed, see TODO in code.
boemmels renamed HASH
and HASHBUCKET
to Hash
and HashBucket
leo randomize key_hash
seed
extend new_hash_x()
init call by value_type
and _size
.
leo USE_STRING_EQUAL
define, see comment above
Future optimizations:
realloc
.)
|