parrotcode: Regex stack handling routines | |
Contents | C |
src/intlists.c - Regex stack handling routines
intlist emulation code, calls routines in src/list.c.
Here is the original documentation for intlist:
The basic data structure is a variant of a doubly-linked list of 'chunks',
where a chunk is a Buffer
header subclass containing the link pointers and other metadata for the chunk.
As expected from it being a Buffer
header,
the PObj_bufstart
field points to the actual array of INTVAL
s.
The handle used by external code for one of these IntLists is just a pointer to a chunk,
always called 'list' in this code.
For now, all of the chunks are fixed-length in size. (That could easily be changed, at the cost of another integer in each header.)
Notice that I said 'variant' of a doubly-linked list.
That is because if you start at 'list' and follow prev pointers,
you will loop through all the used nodes of the list,
as usual.
But if you follow next pointers instead,
you might find a spare node hanging off the last node in the list (the last node is always list->prev
,
so if there is a spare node,
it will be at list->prev->next
.
If no spare exists,
then list->prev->next==list
.)
The first node in the list may be partly full; the intermediate nodes are always completely full; and the last node may be partly full.
Each node has a .start
field,
giving the offset of the first valid element (always zero except for possibly the first node),
and a .end
field,
giving one past the offset of the last valid element (always equal to INTLIST_CHUNK_SIZE
except for possibly the last node).
To make it concrete,
let's walk through some sample operations.
To push onto the end of the list,
first find the last chunk: list->prev
.
Then if chunk->end < INTLIST_CHUNK_SIZE
,
there is space to fit another element and so just stick it in.
If not,
we must add a chunk to the end of the list.
If there is a spare,
just link it fully into the list (forming a conventional doubly-linked list).
Otherwise,
create a new chunk and link it fully into the list.
Easy enough.
To pop something off the end,
first go to the end chunk (list->prev
).
Pop off an element and decrement .end
if the chunk is nonempty.
If it is empty,
make that last chunk into the spare (discarding the previous spare).
Then go to the previous chunk,
which is guaranteed to have .end
set to INTLIST_CHUNK_SIZE
,
and return data[.end--]
.
The length of the list is always cached in the overall header chunk. If an operation changes which chunk is the header (i.e., shift or unshift), then the length is copied to the new header.
Invariants:
There is always space in list->prev
to insert an element.
The 'list' chunk is never empty unless the entire list is empty.
In combination, the above invariants imply that the various operations are implemented as:
push
pop
shift
unshift
Direct aka indexed access of intlist data:
The classic method would be to walk the intlist->next
pointers (or optimized,
the ->prev
pointers if an index near the end is requested) and locate the chunk,
that holds the wanted list item.
To speed things up, especially for bigger lists, there are additional fields in the 'list' (the head chunk):
chunk_list
collect_runs
collect_runs
counter,
when chunk_list
was rebuilt last.n_chunks
chunk_list
If on any indexed access interpreter's collect_runs is different, the chunks might have been moved, so the chunk_list has to be rebuilt.
Getting data outside the array dimensions will return the value NULL
,
which will SIGSEGV
,
the intlist did an explicit exception,
so there is not much difference.
Of course,
a check for valid pointers could be added here.
intlist_mark
intlist_clone
intlist_new
intlist_length
intlist_assign
idx
.intlist_push
val
on the end of the list.intlist_unshift
val
on the front of the list.intlist_pop
intlist_shift
intlist_get
idx
.intlist_dump
include/parrot/intlist.h, src/list.c and include/parrot/list.h.
|