parrotcode: Generational mark and sweep garbage collection | |
Contents | C |
src/gc/gc_gms.c - Generational mark and sweep garbage collection
The following comments describe a generational garbage collection scheme for Parrot.
Keywords:
- non-copying, mark & sweep
- generational
- implicit reclamation, treadmill
A plain mark & sweep collector performs work depending on the amount of all allocated objects. The advantage of a generational GC is achieved by not processing all objects. This is based on the weak generational hypothesis, which states that young objects are likely to die early. Old objects, which have survived a few GC cycles tend to be long-lived.
The terms young and old objects imply that there is some ordering in object creation time and the ordering is also followed by object references.
Specifically object references have to follow the marking direction. In pure functional programming languages this can be a very simple scheme:
+------------+ object references
v |
old .... young .... youngest
|
<-------- scan direction
If (simplified) the only reference-like operation of the interpreter is:
cons = (car, cdr)
and the object references "car" and "cdr" are created prior to the "aggregate" "cons", all object references point always to older objects. By scanning from the youngest to the oldest objects, all non-marked objects can be reclaimed immediately. And the scan can be aborted at any time after some processing, creating a generational GC in a trivial way.
But the programming languages we are serving are working basically the other direction, when it comes to object history:
@a[$i] = $n
A reference operation like this needs first an aggregate and then the contents of it. So the scan direction is from old objects to younger ones. In such a scheme it's a bit more complicated to skip parts of the objects.
To take advantage of not processing all the objects, these are divided into generations, e.g.:
old young := nursery
generation 0 generation 1
A mark phase now processes the root set and only objects from the young generation. When all objects are either referenced by the root set or only by the young generation, the algorithm is correct and complete.
But there is of course the possibilty that a young object is stored into an aggregate of an older generation. This case is tracked by the write barrier, which remembers all such operations in the IGP (inter generational pointer) list. When now generation 1 is marked, the IGP list can be considered as an extension to the root set, so that again all live objects of the young generation are detected.
typedef struct _gc_gms_gen Gc_gms_gen
typedef struct _gc_gms_hdr Gc_gms_hdr
typedef struct _gc_gms_hdr_list Gc_gms_hdr_list
* XXX
Main problem TODO 1):
[ PCont ] ... continuation object in old generation
|
v
[ Stack chunk ] --> [ e.g. P register frame ] ... new generation
By pushing a new stack chunk onto the (old) existing stack frame, we'd need a WRITE_BARRIER that promotes the stack chunk to the old generation of the continuation. This would also need an IGP entry for the stack chunk buffer. But - as buffers aren't really containers in Parrot - this isn't possible.
To get that right, the code needs better support by the running interpreter. - never promote continuations (and stacks) in the current stack frame to an old generation - create scope_enter / scope_exit opcodes
A scope_enter happens on a subroutine call *and' with new_pad / push_pad opcodes. Each lexical scope must have its distinct register frame, else timely destruction can't work. If the frame needs active destruction, the old frame should be converted to the (new-1) generation, the inner frame is the nursery. On scope exit the newest (nursery) generation is collected and the current generation number is reset back to (new-1).
If the scope_enter doesn't indicate timely destruction, generation promoting should be done only, if object statistics indicate the presence of a fair amount of live objects.
TODO 2) in lazy sweep If timely destruction didn't find (all) eager objects, go back to older generations, until all these objects have been seen.
TODO 3) interpreter startup After all internal structures are created, promote interpreter state into initial first old generation by running one GC cycle before program execution begins (or just treat all objects as being alive).
static void parrot_gc_gms_deinit(PARROT_INTERP)
static void gc_gms_pool_init(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool))
add_free_object
, get_free_object
, alloc_objects
, and more_objects
.PARROT_API void Parrot_gc_gms_init(PARROT_INTERP)
static void gc_gms_add_free_object(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), NOTNULL(PObj *to_add))
static void gc_gms_chain_objects(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), NOTNULL(Small_Object_Arena *new_arena), size_t real_size)
1) object allocation
1a) one bunch of allocated objects was consumed: the free ptr did
hit the marker
+===+---+---+---+---+---+===+
I M I w | w | w | w | w I M I
+ +---+---+---+---+---+ +
^ ^
| |
white free == marker
All these pointer ranges include the first element, but not the last one.
[white ... free_list) is the list of all whites
1b) after allocating another bunch of objects
+===+---+---+---+---+---+---+---+---+---+---+===+
I M I w | w | w | w | w | f | f | f | f | f I M I
+ +---+---+---+---+---+---+---+---+---+---+ +
^ ^ ^
| | |
white free marker
static void gc_gms_alloc_objects(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool))
static void gc_gms_more_objects(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool))
PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL static PObj *gc_gms_get_free_object(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool))
overall header chain layout
gen 0 gen 1 ... gen N
marker [first last) [first last) ... [first last) marker
The last (youngest) generation N holds these (pool) pointers:
[ black ... gray ) during marking
[ gray ... white ) during marking
[ white ... free_list ) allocated items
[ free_list ... marker ) free items
The black, white, and generation ranges have additionally (TODO) *fin variants, which refer to PMCs that need destruction/finalization. These are always in front of the ranges to be processed first.
PARROT_MALLOC PARROT_CANNOT_RETURN_NULL static Gc_gms_gen *gc_gms_create_gen(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), size_t gen_no)
static void gc_gms_init_gen(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool))
PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL static Gc_gms_gen *gc_gms_find_gen(PARROT_INTERP, NOTNULL(Gc_gms_hdr *h), UINTVAL gen_no)
static void gc_gms_promote(PARROT_INTERP, NOTNULL(Gc_gms_hdr *h), UINTVAL gen_no)
static void gc_gms_store_hdr_list(PARROT_INTERP, NOTNULL(Gc_gms_hdr_list *l), NOTNULL(Gc_gms_hdr *h))
static void gc_gms_clear_hdr_list(PARROT_INTERP, NOTNULL(Gc_gms_hdr_list *l))
static void gc_gms_store_igp(PARROT_INTERP, NOTNULL(Gc_gms_hdr *h))
static void gc_gms_clear_igp(PARROT_INTERP, NOTNULL(Gc_gms_gen *gen))
void parrot_gc_gms_wb(PARROT_INTERP, NOTNULL(PMC *agg), NOTNULL(void *old), NOTNULL(void *new))
void parrot_gc_gms_wb_key(PARROT_INTERP, NOTNULL(PMC *agg), NOTNULL(void *old), NOTNULL(void *old_key), NOTNULL(void *new), NOTNULL(void *new_key))
static void gc_gms_merge_gen(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(Gc_gms_plan *plan))
static void gc_gms_use_gen(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(Gc_gms_plan *plan))
PARROT_WARN_UNUSED_RESULT static int set_gen_cb(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))
static void gc_gms_set_gen(PARROT_INTERP)
Header chain layout
Init: gray := black := white
3) marking the root set
3a) the white 'h' is to be set to gray to be scanned for children
+---+---+---+---+---+---+-> +---+->
| b | b | g | g | g | w | h |
+---+---+---+---+---+---+ <-+---+
^ ^ ^
| | |
black gray white
3b) DFS if 'h' needs timely destruction
+---+---+---+---+---+---+---+->
| b | b | h | g | g | g | w
+---+---+---+---+---+---+---+
^ ^ ^
| | |
black gray white
3c) BFS in the normal case
+---+---+---+---+---+---+---+->
| b | b | g | g | g | h | w
+---+---+---+---+---+---+---+
^ ^ ^
| | |
black gray white
3d) the white is a scalar and immediately blackened
+---+---+---+---+---+---+---+->
| b | b | h | g | g | g | w
+---+---+---+---+---+---+---+
^ ^ ^
| | |
black gray white
3e) blacken the gray 'h' during trace_children
+---+---+---+---+---+---+---+->
| b | b | h | g | g | g | w
+---+---+---+---+---+---+---+
^ ^ ^
| | |
black gray white
+---+---+---+---+---+---+---+->
| b | b | h | g | g | g | w
+---+---+---+---+---+---+---+
^ ^ ^
| | |
black gray white
static void gc_gms_setto_gray(PARROT_INTERP, NOTNULL(Gc_gms_hdr *h), int priority)
h
to gray.static void gc_gms_setto_black(PARROT_INTERP, NOTNULL(Gc_gms_hdr *h), int priority)
h
to black.PARROT_API void parrot_gc_gms_pobject_lives(PARROT_INTERP, NOTNULL(PObj *obj))
static int init_mark_cb(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))
static void gc_gms_init_mark(PARROT_INTERP)
static int trace_igp_cb(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))
static int gc_gms_trace_root(PARROT_INTERP, int trace_stack)
trace_stack
is true, trace system areas.static int trace_children_cb(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))
static int gc_gms_trace_children(PARROT_INTERP)
static int sweep_cb_pmc(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))
static int sweep_cb_buf(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))
static void gc_gms_sweep(PARROT_INTERP)
static int end_cycle_cb(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))
static void gc_gms_end_cycle(PARROT_INTERP)
static void parrot_gc_gms_run(PARROT_INTERP, int flags)
Parrot_do_dod_run
. flags
is one of: DOD_lazy_FLAG ... timely destruction
DOD_finish_FLAG ... run a final sweep to destruct objects at
interpreter shutdown
static void gms_debug_verify(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), NOTNULL(const char *action))
src/gc/dod.c, include/parrot/dod.h, include/parrot/pobj.h, src/gc/gc_ims.c
Initial version by leo (2005.01.12 - 2005.01.30)
|