NAME ^

src/list.c - List aka array routines

DESCRIPTION ^

List is roughly based on concepts of IntList (thanks to Steve), so I don't repeat them here.

Especially the same invariants hold, except an empty list is really empty, meaning, push does first check for space.

The main differences are:

- List can hold items of different size, it's suitable for ints and PMCs ..., calculations are still done in terms of items. The item_size is specified at list creation time with the "type" argument.

If you later store different item types in the list, as stated initially, you'll get probably not what you want - so don't do this.

- List does auto grow. The caller may implement a different behaviour if she likes.

- Error checking for out of bounds access is minimal, caller knows better, what should be done.

- List structure itself is different from List_chunk, implying:

- list chunks don't have ->start and ->end fields. Instead the list has ->start, which is start of first chunk, and ->cap, the total usable capacity in the list.

- number of items in chunks are not fixed, but there is a mode using same sized chunks

Grow policy ^

enum_grow_fixed

All chunks are of MAX_ITEMS size, chosen, when the first access to the array is indexed and beyond MIN_ITEMS and below 10 * MAX_ITEMS

If the first access is beyond 10 * MAX_ITEMS a sparse chunk will be created.

To avoid this - and the performance penalty - set the array size before setting elements.

    new P0, .PerlArray
    set P0, 100000  # sets fixed sized, no sparse
This is only meaningful, if a lot of the entries are used too.

enum_grow_growing

Chunk sizes grow from MIN_ITEMS to MAX_ITEMS, this will be selected for pushing data on an empty array.

enum_grow_mixed

Mixture of above chunk types and when sparse chunks are present, or after insert and delete.

The chunks hold the information, how many chunks are of the same type, beginning from the current, and how many items are included in this range. See get_chunk below for details.

Sparse lists ^

To save memory, List can handle sparse arrays. This code snippet:

new P0, .IntList set P0[1000000], 42

generates 3 List_chunks, one at the beginning of the array, a big sparse chunk and a chunk for the actual data.

Setting values inside sparse chunks changes them to real chunks. For poping/shifting inside sparse chunks, s. return value below.

Chunk types ^

fixed_items

Have allocated space, size is a power of 2, consecutive chunks are same sized.

grow_items

Same, but consecutive chunks are growing.

no_power_2

Have allocated space but any size.

sparse

Only dummy allocation, chunk->items holds the items of this sparse hole.

Data types ^

A List can hold various datatypes. See src/datatypes.h for the enumeration of types.

Not all are yet implemented in list_set/list_item, see the switch().

Arbitrary length data:

Construct initializer with:

enum_type_sized

item_size (in bytes)

items_per_chunk (rounded up to power of 2, default MAX_ITEMS)

In list_assign the values are copied into the array, list_get returns a pointer as for all other data types.

See src/list_2.t and list_new_init().

Return value ^

List get functions return a (void*) pointer to the location of the stored data. The caller has to extract the value from this pointer.

For non existent data beyond the dimensions of the array a NULL pointer is returned.

For non existing data inside sparse holes, a pointer (void*)-1 is returned.

The caller can decide to assume these data as undef or 0 or whatever is appropriate.

Testing ^

See t/src/{int,}list.c and t/pmc/{int,}list.t.

Also all array usage depends on list.

Functions ^

static List_chunk *allocate_chunk(Interp *interpreter, List *list, UINTVAL items, UINTVAL size)

Make a new chunk, size bytes big, holding items items.

static void list_dump(FILE *fp, List *list, INTVAL type)

Only char and int are supported currently.

static void rebuild_chunk_ptrs(List *list, int cut)

Rebuild chunk_list and update/optimize chunk usage, helper functions.

Delete empty chunks, count chunks and fix prev pointers.

static void rebuild_sparse(List *list)

Coalesce adjacent sparse chunks.

static void rebuild_other(Interp *interpreter, List *list)

Coalesce adjacent irregular chunks.

static void rebuild_fix_ends(Interp *interpreter, List *list)

Called by rebuild_chunk_list().

static UINTVAL rebuild_chunk_list(Interp *interpreter, List *list)

Called to optimise the list when modifying it in some way.

static List_chunk *alloc_next_size(Interp *interpreter, List *list, int where, UINTVAL idx)

Calculate size and items for next chunk and allocate it.

static List_chunk *add_chunk(Interp *interpreter, List *list, int where, UINTVAL idx)

Add chunk at start or end.

UINTVAL ld(UINTVAL x)

Calculates log2(x).

Stolen from src/malloc.c.

static List_chunk *get_chunk(Interp *interpreter, List *list, UINTVAL *idx)

Get the chunk for idx, also update the idx to point into the chunk.

This routine will be called for every operation on list, so its optimized to be fast and needs an up to date chunk statistic, that rebuild_chunk_list does provide.

The scheme of operations is:

    if all_chunks_are_MAX_ITEMS
         chunk = chunk_list[ idx / MAX_ITEMS ]
         idx =   idx % MAX_ITEMS
         done.

    chunk = first
    repeat
         if (index < chunk->items)
             done.

     if (index >= items_in_chunk_block)
         index -= items_in_chunk_block
         chunk += chunks_in_chunk_block
         continue

     calc chunk and index in this block
     done.
One chunk_block consists of chunks of the same type: fixed, growing or other. So the time to look up a chunk doesn't depend on the array length, but on the complexity of the array. rebuild_chunk_list tries to reduce the complexity, but may fail, if you e.g. do a prime sieve by actually list_deleting the none prime numbers.

The complexity of the array is how many different chunk_blocks are there. They come from:

- initially fixed: 1

- initially growing: 2

- first unshift: 1 except for initially fixed arrays

- insert: 1 - 3

- delete: 1 - 2

- sparse hole: 3 (could be 2, code assumes access at either end now)

There could be some optimizer, that, after detecting almost only indexed access after some time, does reorganize the array to be all MAX_ITEMS sized, when this would improve performance.

static void split_chunk(Interp *interpreter, List *list, List_chunk *chunk, UINTVAL ix)

Split a sparse chunk, so that we have

- allocated space at idx

if sparse is big:

- MAX_ITEMS near idx and if there is still sparse space after the real chunk, this also n*MAX_ITEMS sized, so that consecutive writing would make MAX_ITEMS sized real chunks.

static void list_set(Interp *interpreter, List *list, void *item, INTVAL type, INTVAL idx)

Set item of type type in chunk at idx.

static void *list_item(Interp *interpreter, List *list, int type, INTVAL idx)

Get the pointer to the item of type type in the chunk at idx.

static void list_append(Interp *interpreter, List *list, void *item, int type, UINTVAL idx)

Add one or more chunks to end of list.

Public Interface Functions ^

List *list_new(Interp *interpreter, INTVAL type)

Returns a new list of type type.

void list_pmc_new(Interp *interpreter, PMC *container)

Create a new list containing PMC* values in PMC_data(container).

List *list_new_init(Interp *interpreter, INTVAL type, PMC *init)

list_new_init() uses these initializers:

    0 ... size (set initial size of list)
    1 ... array dimensions (multiarray)
    2 ... type (overriding type parameter)
    3 ... item_size for enum_type_sized
    4 ... items_per_chunk
After getting these values out of the key/value pairs, a new array with these values is stored in user_data, where the keys are explicit.

void list_pmc_new_init(Interp *interpreter, PMC *container, PMC *init)

Create a new list containing PMC* values in PMC_data(container).

List *list_clone(Interp *interpreter, List *other)

Return a clone of the list.

TODO - Barely tested. Optimize new array structure, fixed if big.

void list_mark(Interp *interpreter, List *list)

Mark the list and its contents as live.

void list_visit(Interp *interpreter, List *list, void *pinfo)

This is used by freeze/thaw to visit the contents of the list.

pinfo is the visit info, (see include/parrot/pmc_freeze.h>).

INTVAL list_length(Interp *interpreter, List *list)

Returns the length of the list.

void list_set_length(Interp *interpreter, List *list, INTVAL len)

Sets the length of the list to len.

void list_insert(Interp *interpreter, List *list, INTVAL idx, INTVAL n_items)

Make room for n_items at idx.

void list_delete(Interp *interpreter, List *list, INTVAL idx, INTVAL n_items)

Delete n_items at idx.

void list_push(Interp *interpreter, List *list, void *item, int type)

Pushes item of type type on to the end of the list.

void list_unshift(Interp *interpreter, List *list, void *item, int type)

Pushes item of type type on to the start of the list.

void *list_pop(Interp *interpreter, List *list, int type)

Removes and returns the last item of type type from the end of the list.

void *list_shift(Interp *interpreter, List *list, int type)

Removes and returns the first item of type type from the start of the list.

void list_assign(Interp *interpreter, List *list, INTVAL idx, void *item, int type)

Assigns item of type type to index idx.

void *list_get(Interp *interpreter, List *list, INTVAL idx, int type)

Returns the item of type type at index idx.

void list_splice(Interp *interpreter, List *list, PMC *value, INTVAL offset, INTVAL count)

Replaces count items starting at offset with the items in value.

If count is 0 then the items in value will be inserted after offset.

HISTORY ^


parrot