NAME ^

src/gc/gc_ims.c - Incremental mark and sweep garbage collection

DESCRIPTION ^

The following comments describe a new garbage collection scheme for Parrot.

The scheme of this algorithm is described in the literature with these keywords:

 - non-copying, mark & sweep
 - incremental
 - realtime
 - incremental update with write barrier

Further we might try this optimization

 - treadmill optimization or
 - implict reclamation

Drawbacks of the current mark and sweep collector. ^

 * can take arbitrary time to complete (1s for 1 Meg objects)
 * can't be used in multi-threaded Parrot
 * works fast for plain (non-aggregate) objects but suffers badly
   for nested aggregates or HLL objects
 * the sweep phase takes time proportional to the allocated storage

INCREMENTAL GARBAGE COLLECTION ^

Terms ^

object

An item like a buffer header or a PMC which is managed by Parrot's dynamic memory system.

aggregate

An object that possibly holds references to other objects. For example an array, hash, or reference PMC.

tri-color marking

All objects have one of three colors: white, grey, or black.

At the beginning of a DOD run all objects are white (not yet visited). During marking objects are greyed (visited - found alive), but their contents isn't yet scanned. A fully scanned grey object gets finally colored black. It will not again be rescanned in this DOD run.

Only aggregates can be grey, non-containers are blackened immediately.

Objects on the free-list are sometimes denoted having the color off-white or ecru.

DOD

Dead object detection. Please note that in Parrot src and docs this stands for the stop-the-world garbage collector that recycles objects.

GC

In Parrot tree the copying garbage collector that recycles string and buffer memory. Configure.pl has a switch to use a malloc library instead, which makes string and buffer memory non-moving.

collector

The DOD and reclamation system.

mutator

The normal operation of the program which may or may not change the collectors view of objects.

incremental

Garbage collection and normal program operation is interleaved. This guarantees short and bounded pause times. Garbage collection doesn't significantly interrupt program execution, collector and mutator are running pseudo-parallel.

root set

All structures in the interpreter that might point to objects. E.g. stacks, globals, and of course the registers. All objects the interpreter works with, are directly or indirectly reachable starting from the root set.

the tri-color invariant

At no time a black object may reference a white one directly. Actually this is the strong incarnation of the invariant - all paths from black objects to white objects lead over at least one grey object.

The weak tri-color invariant is: there is at least one such path to a white object, so that it's reachable.

The strong invariant is the basic idea of mark and sweep too. But as the mutator isn't running during DOD the invariant is never violated.

Due to this invariant, after the root set has been marked and when all greyed objects are marked (blackened), the white objects have to be dead.

paint it black

Or, which color do new objects have?

Actually this should be tunable. Or it depends. If objects are born white and die immediately, they get collected in the same GC cycle. OTOH when these objects are stored into an existing (black) array, we have to do more work to keep the tri-color invariant valid.

Anyway, when allocating new objects white, the collector must run more often or must do more work per increment to make the algorithm stop somewhen.

write barrier

To keep the tri-color invariant valid all pointer stores into black objects have to be tracked. If a white object would be stored into a black array, and this object isn't refered to by another object it would get collected. The write barrier greys the white object, so that it get scanned later or alternatively greys the aggregate for a rescan. The latter can be better, if a sequence of such stores would happen.

Data structure overview ^

The incremental mark and sweep collector has an additional structure in the arena_base that keeps track of the collector's state. Pool and arena structures are unchanged. Only the allocation of new arena blocks is done much more fine grained in e.g. 8K blocks.

Implicit reclamation (optional) ^

from-space

The graph of all objects found live during the last collection.

to-space

The work area of the collector. During marking live objects are "moved" from the from-space into the to-space. This is the same as the text_for_GC list used in src/dod.c. The to-space is initially empty. During marking it gets greyed and finally all reachable objects are black.

free-list

New objects are allocated from the free-list. The free-list is adjacent to the to-space. Allocating a new objects thus means, moving the free pointer one word forward and paint the new object black.

All objects get two additional pointers (forward, backward) and are arranged like in this scheme:

    <-- allocation direction         marking -->
            |                          |
  [w] <--> [w] <--> [b] <--> [b] <--> [g] <--> [g] <--> [w] <-> [w]

            ^        ^                 ^                 ^
            |        |                 |                 |
   free-list-ptr     to-space          scan-pointer      from-space

Objects get "moved" during collection by rearranging the doubly-linked object pointers. At the end of a DOD run (when the last grey object is blackened), the from-space and the free-list are merged serving as the new free-list of the next DOD cycle. This operation is just a few pointer manipulations that replaces the sweep phase of a mark and sweep collector.

Phases of operation ^

a) initialization

After interpreter creation the DOD system is initialized by marking parts of the root set (globals, internal structures).

b) program operation

For each bunch of allocated objects (A) the collector does k.A work, for some constant k > 1. As new objects are allocated black the number of whites is reduced steadily. This means that the throttle factor k could be less then one too, but this could highly increase average memory usage.

To keep the memory usage limited k > 1 must hold.

c) near the end of a DOD cycle

The rest of the root set is scanned, i.e. the registers. By deferring scanning of registers all temporaries that might have exist somewhen just stay unscanned - they will be collected in this DOD cycle, if we allocate new objects white or in the next DOD cycle.

d) finishing a DOD cycle

The current sweep of the whole arena is done, or with implicit reclamation:

Garbage gets appended to the free-list by merging the unscanned from-space with the free-list, these objects are all considered white. All other items are in the to-space and are black. These objects constitute the from-space of the new collection cycle.

Now he meaning of the black bit is reversed effectively setting the new from-space to white.

The next DOD cycle is initialized in one step a) and the new cycle starts.

Alternatively the mutator could run and allocate objects for some time, without starting the collector again, if there are plenty of free objects on all free-lists.

e) collect buffer memory

Finally, we might trigger a collect run on string and buffer memory if there is an impending shortage of resources. While the copying compactor is rather independent of the collector that cleans object headers, it's more efficient to collect buffer memory when the live information is accurate. This avoids copying of dead buffer memory.

Comparison with our current mark and sweep collector ^

  MS  ... mark and sweep (stop-the-world)
  IMS ... incremental mark and sweep
  IMIR .. incremental mark implicit reclamation

                       MS                 IMS               IMIR
  ------------------------------------------------------------------------
  operation            stop-the-world     incremental       incremental
  time per DOD cycle   unbounded          bounded           bounded
  size overhead        1 word             1 word            2 words
  time overhead        O(2*live + dead)   O(2*live + dead)  O(live)   2)

Notes:

  2) it should be possible to mark containers at once by using the
     information of the from-space pointers and tracking changes
     to the aggregate.

Implementation details and unsorted remarks ^

the object graph

The MS and IMS scheme use the next_for_GC pointer for keeping track of references. This interferes with the freeze functionality, which can use the same pointer to keep track of visited objects.

IMIR has a dedicated pointer pair to build the object graph.

Greying objects

Greying objects is done depth-first. This has much better cache locality then visiting an object again much later. In the picture above this means that grey objects are inserted at the left end of the mark chain immediately to the right of the object that gets blackened.

big aggregates

Greying has to be done in increments. Big aggregates can't have a mark vtable that could run arbitrarily long. This means that the DOD system must know the layout of arrays, hashes, and objects. This is currently true for arrays and objects but not for hashes. But the latter need some refactoring of internals anyway.

To avoid visiting all aggregate elements, it could be better to track the graph of old aggregates by using a write barrier for all writes into the array. This would basically create a generational collector. The old generation (the aggregate) isn't scanned. But changes to this "old generation" are tracked and reflected in the collectors graph of objects.

timely destruction

The interpreter arena has a count of currently active objects that need timely destruction. When during scope exit an high priority sweep is triggered, we have basically two cases:

1) all of these objects were already seen by this DOD run - the scope exit can continue.

2) Not all objects were seen - they might be alive or not. This means that the DOD cycle must run to the end to decide, if these objects are alive (or again until all are found alive).

To increase performance its likely that we need some additional information that keeps track of the location of such objects and just try to mark paths to objects that need timely destruction.

concurrent or parallel collection

As the described algorithm is already incremental its well-suited for parallel collection in a multi-threaded Parrot. The work of greying objects can be done in parallel by atomically handling a bunch of objects to another thread. After doing some increments of marking, these objects then get returned to the shared to-space. The parallel collection is finished when the last object is blackened and all threads have reached the thread barrier rendezvous point. (Please note the very different meaning of barrier here).

But also a single-threaded Parrot can vastly take advantage by running increments of the collection during waiting for I/O completion or during a sleep opcode.

FUNCTIONS ^

static void gc_ims_add_free_object(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), NOTNULL(PObj *to_add))

Add object to_add to the free_list in the given pool. pool-num_free_objects> has to be updated by the caller.

PARROT_CANNOT_RETURN_NULL PARROT_WARN_UNUSED_RESULT static PObj *gc_ims_get_free_object(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool))

Get a new object off the free_list in the given pool.

static void gc_ims_alloc_objects(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool))

Allocate new objects for the given pool.

static void gc_ims_pool_init(SHIM_INTERP, NOTNULL(Small_Object_Pool *pool))

TODO: Not yet documented!!!

static void parrot_gc_ims_deinit(PARROT_INTERP)

TODO: Not yet documented!!!

void Parrot_gc_ims_init(PARROT_INTERP)

Initialize the state structures of the gc system. Called immediately before creation of memory pools. This function must set the function pointers for add_free_object_fn, get_free_object_fn, alloc_objects_fn, and more_objects_fn.

static void parrot_gc_ims_reinit(PARROT_INTERP)

Reinitialize the collector for the next collection cycle.

static void parrot_gc_ims_mark(PARROT_INTERP)

Mark a bunch of children.

The work depends on item counts with and without a next_for_GC field. The former are marked immediately, only the latter need real work here.

static int sweep_cb(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))

TODO: Not yet documented!!!

static void parrot_gc_ims_sweep(PARROT_INTERP)

Free unused objects in all header pools.

TODO split work per pool.

static int collect_cb(PARROT_INTERP, NOTNULL(Small_Object_Pool *pool), int flag, NOTNULL(void *arg))

TODO: Not yet documented!!!

static int parrot_gc_ims_collect(PARROT_INTERP, int check_only)

Run the copying collector in memory pools, if it could yield some free memory.

static void parrot_gc_ims_run_increment(PARROT_INTERP)

Run one increment of collection. This function is triggered by object allocation.

static void parrot_gc_ims_run(PARROT_INTERP, int flags)

Interface to Parrot_do_dod_run. flags is one of:

  DOD_lazy_FLAG   ... timely destruction
  DOD_finish_FLAG ... run until live bits are clear
void Parrot_dod_ims_wb(PARROT_INTERP, NOTNULL(PMC *agg), NOTNULL(PMC *_new))

Write barrier called by the DOD_WRITE_BARRIER macro. Always when storing a white object into a black aggregate, either the object must be greyed or the aggregate must be rescanned - by greying it.

SEE ALSO ^

src/gc/dod.c, include/parrot/dod.h, include/parrot/pobj.h,

HISTORY ^

Initial version by leo (2004.08.12 - 2004.08.15)


parrot