|parrotcode: Garbage Collection Subsystems|
|Contents | Documentation|
docs/pdds/pdd09_gc.pod - Garbage Collection Subsystems
This PDD specifies Parrot's garbage collection subsystems.
Garbage collection is a process of freeing up memory that is no longer used by the interpreter, by determining which objects will not be referenced again and can be reclaimed.
All reachable objects are marked as alive, first marking a root set, and then recursively marking objects reachable from other reachable objects. Objects not reached are considered dead. After collection, all objects are reset to unmarked, and the process starts again.
Instead of a simple separation of marked (as live) and unmarked (dead), the object set is divided into three parts: white, gray, and black. The white objects are presumed dead. The gray objects have been marked as live by some other object, but haven't yet marked the objects they refer to. The black objects are live, and have marked all objects they directly refer to.
In the initial run, all objects start as white and the root set is marked gray. The marking process changes white objects to gray (marking them from another gray object), and gray objects to black (when all objects they refer to are marked). When the gray set is empty, all live objects have been marked and the white set can be collected. After a collection run, all black objects are reset to white, the root set to gray, and the process begins again.
The advantage of a tri-color mark over a simple mark is that it can be broken into smaller stages.
In this GC scheme, after all reachable objects are marked as live, a sweep through the object arenas collects all unmarked objects.
In this scheme, all objects are marked black (live) when created. White objects are free memory available for allocation. When no white objects remain (out of memory), the black objects are all changed to white, and a marking process runs to mark all reachable objects as live. Any unreachable objects are left white, and available for allocation.
In some implementations, the change from black to white is made by simply changing the interpretation of the mark bit, for example, from 1 == black to 1 == white.
In this scheme, live objects are copied into a new memory region. The entire old memory region can then be reclaimed.
In this scheme, live objects are moved closer together, eliminating fragments of free space between live objects. This compaction makes later allocation of new objects faster, since the allocator doesn't have to scan for fragments of free space.
In this scheme, all objects have a count of how often they are referred to by other objects. If that count reaches zero, the object's memory can be reclaimed. This scheme doesn't cope well with reference loops--loops of dead objects, all referencing one another but not reachable from elsewhere, never get collected.
A common disadvantage of a simple mark implemenation is that the entire system (including all threads that use the same memory pools) must be suspended while the whole memory set is examined during marking and collection. Normal operation continues only after the whole GC cycle is performed. This can lead to arbitrarily long pauses during program execution.
Rather than suspending the system for marking and collection, GC is done in small increments intermittent with normal program operation. Some implementations perform the marking as part of ordinary object access.
The pauses caused by GC don't exceed a certain limit.
The object space is divided between a young generation (short-lived temporaries) and one or more old generations. Only young generations are reset to white (presumed dead). Avoiding scanning the old generations repeatedly can considerably speed up GC.
Generational collection does not guarantee that all unreachable objects will be reclaimed, so in large systems it is sometimes combined with a mark-and-sweep or copying collection scheme, one for light collection runs performed frequently, and the other for more complete runs performed rarely.
GC marking and collection runs as a separate thread, sometimes with multiple threads participating in GC. On a multi-processor machine, concurrent GC may be truly parallel.
Parrot supports pluggable garbage collection cores, so ultimately any garbage collection model devised can run on it. However, different GC models are more or less appropriate for different application areas. The current default stop-the-world mark-and-sweep model is not well suited for concurrent/parallel execution. We will keep the simple mark-and-sweep implementation, but it will no longer be primary.
Parrot really has two independent GC models, one used for objects (PMCs) and the other used for buffers (including strings). The core difference is that buffers cannot contain other buffers, so incremental marking is unnecessary. Currently, PMCs are not allowed to move after creation, so the GC model used there is not copying nor compacting.
The primary GC model for PMCs, at least for the 1.0 release, will use a tri-color incremental marking scheme, combined with a concurrent sweep scheme.
Each PMC has a
flags member which,
among other things,
facilitates garbage collection.
At the beginning of the mark phase,
PObj_is_fully_marked_FLAG are both unset,
which flags the PMC as presumed dead (white).
The initial mark phase of the collection cycle goes through each PMC in the root set and sets the
PObj_is_live_FLAG bit in the
flags member (the PMC is gray).
It does not set the
PObj_is_fully_marked_FLAG bit (changing the PMC to black),
because in the initial mark,
the PMCs or buffers contained by a PMC are not marked.
It also appends the PMC to the end of a list used for further marking.
if the PMC has already been marked as black,
the current end of list is returned (instead of appending the already processed PMC) to prevent endless looping.
The fourth combination of the two flags,
PObj_is_live_FLAG is unset and
PObj_is_fully_marked_FLAG is set,
is reserved for PMCs of an older generation not actively participating in the GC run.
The root set for the initial marking phase includes the following core storage locations:
After the root set of PMCs have been marked,
a series of incremental mark runs are performed.
These may be performed frequently,
between other operations.
The incremental mark runs work to move gray PMCs to black.
They take a PMC from the list for further marking,
mark any PMCs or buffers it contains as gray (the
PObj_is_live_FLAG is set and the
PObj_is_fully_marked_FLAG is left unset),
and add the contained PMCs or buffers to the list for further marking.
If the PMC has a custom mark function in its vtable,
it is called at this point.
After all contained PMCs or buffers have been marked,
the PMC itself is marked as black (the
PObj_is_fully_marked_FLAG are both set).
A limit may be placed on the number of PMCs handled in each incremental mark run.
The initial marking phase also marks the root set of buffers. Because buffers cannot contain other buffers, they are immediately marked as black and not added to the list for further marking. Because PMCs may contain buffers, the buffer collection phase can't run until the incremental marking of PMCs is completed.
The root set for buffers includes the following locations:
Once a buffer is found to be live,
flags member of the buffer structure has the
PObj_is_fully_marked_FLAG bits set.
When the list for further marking is empty (all gray PMCs have changed to black), the collection stage is started. First, PMCs are collected, followed by buffers. In both cases (PMC and buffer), the "live" and "fully_marked" flags are reset after examination for reclamation.
To collect PMCs,
each PMC arena is examined from the most recently created backwards.
Each PMC is examined to see if it is live,
already on the free list,
If it is not,
then it is added to the free list and marked as being on the free list with the
To collect buffers,
each Buffer arena is examined from the most recently created backwards.
If the buffer is not live,
not already on the free list and it is not a constant or copy on write,
then it is added to the free pool for reuse and marked with the
For the most part, the variable sets between concurrent tasks don't interact. They have independent root sets and don't require information on memory usage from other tasks before performing a collection phase. In Parrot, tasks tend to be short-lived, and their variables can be considered young generations from a generational GC perspective. Because of this, a full heavyweight task will maintain its own small memory pools, quickly born and quickly dying.
Shared variables, on the other hand, do require information from multiple concurrent tasks before they can be collected. Because of this, they live in the parent interpreter's global pools, and can only be collected after all concurrent tasks have completed a full mark phase without marking the shared variable as live. Because GC in the concurrent tasks happens incrementally between operations, a full collection of the shared variables can happen lazily, and does not require a stop-the-world sweep through all concurrent tasks simultaneously.
The different GC cores are independent, but they share some code and resources. The arena structures and arena creation routines are common across most GC cores, and some GC cores also share mark routines.
The main interpreter structure has an arena_base member, which is a pointer to an Arenas struct.
The Arenas structure contains pointers to a variety of memory pools, each used for a specific purpose. Two are Memory_Pool pointers (memory_pool, constant_string_pool), and six are Small_Object_Pool structures (pmc_pool, pmc_ext_pool, constant_pmc_pool, buffer_header_pool, constant_string_header_pool).
The Arenas structure holds function pointers for the core defined interface of the currently active GC subsystem:
It holds various accounting information for the GC subsystem,
including how many GC runs have been completed,
amount of memory allocated since the last run,
and total memory allocated.
This accounting information is updated by the GC system.
The current block level for GC mark and sweep phases is stored in the Arenas structure.
(See "Blocking GC".)
void *gc_private is reserved for use by the currently active GC subsystem (with freedom for variation between GC implementations).
The Memory_Pool structure is a simple memory pool. It contains a pointer to the top block of the allocated pool, the total allocated size of the pool, the block size, and some details on the reclamation characteristics of the pool.
The Small_Object_Pool structure is a richer memory pool for object allocation. It tracks details like the number of allocated and free objects in the pool, a list of free objects, and for the generational GC implementation maintains linked lists of white, black, and gray PMCs. It contains a pointer to a simple Memory_Pool (the base storage of the pool). It holds function pointers for adding and retrieving free objects in the pool, and for allocating objects.
Currently only one GC system is active at a time, selected at configure or compile time. Future versions will support switching GC systems at execution-time to accommodate different work loads.
Each GC core provides a standard interface for interaction with the core.
Each GC core declares an initialization routine,
which is called from src/memory.c:mem_setup_allocator() after creating
arena_base in the interpreter struct.
void Parrot_gc_XXX_init(Interp *)
Each GC system declares 3 function pointers, stored in the Arenas structure.
void (*do_gc_mark) (Interp *, int flags)
GC_trace_stack_FLAGindicates that the C-stack (and other system areas like the processor registers) have to be traced too.
void (*finalize_gc_system) (Interp *)
void (*init_pool) (Interp *, struct Small_Object_Pool *)
Each GC core defines 4 function pointers stored in the Small_Object_Pool structures.
PObj * (*get_free_object) (Interp *, struct Small_Object_Pool*)
void (*add_free_object) (Interp *, struct Small_Object_Pool *, PObj *);
void (*alloc_objects) (Interp *, struct Small_Object_Pool *);
void (*more_objects) (Interp *, struct Small_Object_Pool *);
alloc_objects, and in some GC cores the same function pointer is used for both. In some GC cores,
more_objectsmay do a GC run.
Each GC core has to provide these (possibly empty) macros:
GC_WRITE_BARRIER(Interp *, PMC *agg, PMC *old, PMC *new)
oldis getting overritten by
newmay be NULL.
GC_WRITE_BARRIER_KEY(Interp *, PMC *agg, PMC *old, PObj *old_key, PMC *new, PObj *new_key)
Being able to block GC is important, so newly allocated Buffers or PMCs won't be collected before they're attached to the live tree. The following routines control GC:
Note that the blocking is recursive--if you call Parrot_block_GC_sweep() three times in succession, you need to call Parrot_unblock_GC_sweep() three times to re-enable the GC sweep phase.
For PMCs and Buffers to be collected properly, you must set the appropriate flags on them.
Directly manipulating these flags is not recommended. Always use the macros defined in include/parrot/pobj.h.
markvtable slot will be called during the GC mark phase. The mark function must call
pobject_livesfor all non-NULL objects (Buffers and PMCs) that PMC refers to.
pobject_livesmay be a macro.
The following flags can be used by the GC subsystem:
"A unified theory of garbage collection": http://portal.acm.org/citation.cfm?id=1028982
"Scalable Locality-Conscious Multithreaded Memory Allocation": http://people.cs.vt.edu/~scschnei/papers/ismm06.pdf
"Parallel and concurrent garbage collectors": http://chaoticjava.com/posts/parallel-and-concurrent-garbage-collectors/
"Region-Based Memory Management": http://www.irisa.fr/prive/talpin/papers/ic97.pdf
Dan's first musings on the GC subsystem: http://email@example.com/msg14072.html
Semi-timely and ordered destruction: http://www.sidhe.org/~dan/blog/archives/000199.html