src/gc_gms.c - Generational mark and sweep garbage collection


The following comments describe a generational garbage collection scheme for Parrot.


 - non-copying, mark & sweep
 - generational
 - implicit reclamation, treadmill


A plain mark & sweep collector performs work depending on the amount of all allocated objects. The advantage of a generational GC is achieved by not processing all objects. This is based on the weak generational hypothesis, which states that young objects are likely to die early. Old objects, which have survived a few GC cycles tend to be long-lived.

The terms young and old objects imply that there is some ordering in object creation time and the ordering is also followed by object references.

Specifically object references have to follow the marking direction. In pure functional programming languages this can be a very simple scheme:

        +------------+    object references
        v            |
   old   .... young .... youngest
                   <--------  scan direction

If (simplified) the only reference-like operation of the interpreter is:

   cons = (car, cdr)

and the object references "car" and "cdr" are created prior to the "aggregate" "cons", all object references point always to older objects. By scanning from the youngest to the oldest objects, all non-marked objects can be reclaimed immediately. And the scan can be aborted at any time after some processing, creating a generational GC in a trivial way.

But the programming languages we are serving are working basically the other direction, when it comes to object history:

  @a[$i] = $n

A reference operation like this needs first an aggregate and then the contents of it. So the scan direction is from old objects to younger ones. In such a scheme it's a bit more complicated to skip parts of the objects.

To take advantage of not processing all the objects, these are divided into generations, e.g.:

   old               young := nursery
   generation 0      generation 1

A mark phase now processes the root set and only objects from the young generation. When all objects are either referenced by the root set or only by the young generation, the algorithm is correct and complete.

But there is of course the possibilty that a young object is stored into an aggregate of an older generation. This case is tracked by the write barrier, which remembers all such operations in the IGP (inter generational pointer) list. When now generation 1 is marked, the IGP list can be considered as an extension to the root set, so that again all live objects of the young generation are detected.

Structures ^

typedef struct _gc_gms_gen Gc_gms_gen

Describes the state of one generation for one pool.

typedef struct _gc_gms_hdr Gc_gms_hdr

This header is in front of all Parrot objects. It forms a doubly-linked list of all objects in one pool and points to its generation.



These two macros convert from and to headers and objects.

typedef struct _gc_gms_hdr_list Gc_gms_hdr_list

A chained list of headers used e.g. for the IGP list.

Initialization functions ^

static void parrot_gc_gms_deinit(Interp*)

Free used resources.

static void gc_gms_pool_init(Interp *, Small_Object_Pool *pool)

Initialize pool variables. This function must set the pool function pointers for add_free_object, get_free_object, alloc_objects, and more_objects.

void Parrot_gc_gms_init(Interp *interp)

Initialize the state structures of the gc system. Called immediately before creation of memory pools.

Interface functions ^

static void gc_gms_add_free_object(Interp *, Small_Object_Pool *pool, void *to_add)

Unused. White (dead) objects are added in a bunch to the free_list.

static void *gc_gms_get_free_object(Interp *, Small_Object_Pool *pool)

Get a new object off the free_list in the given pool.

static void gc_gms_alloc_objects(Interp *, Small_Object_Pool *pool)

Allocate new objects for the given pool.

static void gc_gms_more_objects(Interp *, Small_Object_Pool *pool)

Run a GC cycle or allocate new objects for the given pool.

Generation handling functions ^

static Gc_gms_gen *gc_gms_create_gen(Interp *, Small_Object_Pool *pool, size_t gen_no)

Create a generation structure for the given generation number.

static void gc_gms_init_gen(Interp *, Small_Object_Pool *pool)

Initalize the generation system by creating the first two generations.

void parrot_gc_gms_wb(Interp *, PMC *, PMC *old, PMC *new)

Called by the write barrier. The aggregate belongs to an older generation then the new value written into it. Put the header of the new value onto the IGP list for the current generation, if it contains pointers to other items, and promote it to the old generation.

Marking functions ^

static void gc_gms_setto_gray(Interp *, Gc_gms_hdr *h, int priority)

Set the white header h to gray.

static void gc_gms_setto_black(Interp *, Gc_gms_hdr *h, int priority)

Set the white header h to black.

parrot_gc_gms_pobject_lives(Interp*, PObj *)

Set the object live - called by the pobject_lives macro

static void gc_gms_init_mark(Interp *)

Initialize the mark phase of GC.

static int gc_gms_trace_root(Interp *, int trace_stack)

Trace the root set. If trace_stack is true, trace system areas.

static int gc_gms_trace_children(Interp *)

Traverse gray objects: mark and blacken. Returns 0 if the trace was aborted lazily.

static int gc_gms_sweep(Interp *)

Free unused resources, put white objects onto free_list.

Interface function main entry ^

void parrot_gc_gms_run(Interp *, int flags)

Interface to Parrot_do_dod_run. flags is one of:

  DOD_lazy_FLAG   ... timely destruction
  DOD_finish_FLAG ... run a final sweep to destruct objects at
                      interpreter shutdown


src/dod.c, include/parrot/dod.h, include/parrot/pobj.h, src/gc_ims.c


Initial version by leo (2005.01.12 - 2005.01.30)