PDD 9: Garbage Collection Subsystem

Abstract

This PDD specifies Parrot's garbage collection subsystems.

Version

$Revision$

Definitions

Garbage collection (GC)

Garbage collection is a process of freeing up memory that is no longer used by the interpreter, by determining which objects will not be referenced again and can be reclaimed.

Simple mark

All reachable objects are marked as alive, first marking a root set, and then recursively marking objects reachable from other reachable objects. Objects not reached are considered dead. After collection, all objects are reset to unmarked, and the process starts again.

Tri-color mark

Instead of a simple separation of marked (as live) and unmarked (dead), the object set is divided into three parts: white, gray, and black. The white objects are presumed dead. The gray objects have been marked as live by some other object, but haven't yet marked the objects they refer to. The black objects are live, and have marked all objects they directly refer to.

In the initial run, all objects start as white and the root set is marked gray. The marking process changes white objects to gray (marking them from another gray object), and gray objects to black (when all objects they refer to are marked). When the gray set is empty, all live objects have been marked and the white set can be collected. After a collection run, all black objects are reset to white, the root set to gray, and the process begins again.

The advantage of a tri-color mark over a simple mark is that it can be broken into smaller stages.

Mark-and-sweep

In this GC scheme, after all reachable objects are marked as live, a sweep through the object arenas collects all unmarked objects.

Mark-and-don't-sweep

In this scheme, all objects are marked black (live) when created. White objects are free memory available for allocation. When no white objects remain (out of memory), the black objects are all changed to white, and a marking process runs to mark all reachable objects as live. Any unreachable objects are left white, and available for allocation.

In some implementations, the change from black to white is made by simply changing the interpretation of the mark bit, for example, from 1 == black to 1 == white.

Copying collection

In this scheme, live objects are copied into a new memory region. The entire old memory region can then be reclaimed.

Compacting collection

In this scheme, live objects are moved closer together, eliminating fragments of free space between live objects. This compaction makes later allocation of new objects faster, since the allocator doesn't have to scan for fragments of free space.

Reference counting

In this scheme, all objects have a count of how often they are referred to by other objects. If that count reaches zero, the object's memory can be reclaimed. This scheme doesn't cope well with reference loops--loops of dead objects, all referencing one another but not reachable from elsewhere, never get collected.

Stop-the-world

A common disadvantage of a simple mark implementation is that the entire system (including all threads that use the same memory pools) must be suspended while the whole memory set is examined during marking and collection. Normal operation continues only after the whole GC cycle is performed. This can lead to arbitrarily long pauses during program execution.

Incremental

Rather than suspending the system for marking and collection, GC is done in small increments intermittent with normal program operation. Some implementations perform the marking as part of ordinary object access.

Real-time

The pauses caused by GC don't exceed a certain limit.

Generational

The object space is divided between a young generation (short-lived temporaries) and one or more old generations. Only young generations are reset to white (presumed dead). Avoiding scanning the old generations repeatedly can considerably speed up GC.

Generational collection does not guarantee that all unreachable objects will be reclaimed, so in large systems it is sometimes combined with a mark-and-sweep or copying collection scheme, one for light collection runs performed frequently, and the other for more complete runs performed rarely.

Concurrent

GC marking and collection runs as a separate thread, sometimes with multiple threads participating in GC. On a multi-processor machine, concurrent GC may be truly parallel.

Synopsis

Not applicable.

Description

- Parrot provides swappable garbage collection schemes. The GC scheme can be selected at configure/compile time. The GC scheme cannot be changed on-the-fly at runtime, but in the future may be selected with a command-line option at execution time.
- All live PMCs must be reachable from the root set of objects in the interpreter.
- Garbage collection must be safe for objects shared across multiple interpreters/threads.
- The phrase "dead object detection" and abbreviation "DOD" are deprecated.

Implementation

Parrot supports pluggable garbage collection cores, so ultimately any garbage collection model devised can run on it. However, different GC models are more or less appropriate for different application areas. The current default stop-the-world mark-and-sweep model is not well suited for concurrent/parallel execution. We will keep the simple mark-and-sweep implementation, but it will no longer be primary.

Parrot really has two independent GC models, one used for objects (PMCs) and the other used for buffers (including strings). The core difference is that buffers cannot contain other buffers, so incremental marking is unnecessary. Currently, PMCs are not allowed to move after creation, so the GC model used there is not copying nor compacting.

The primary GC model for PMCs, at least for the 1.0 release, will use a tri-color incremental marking scheme, combined with a concurrent sweep scheme.

Terminology

A GC run is composed of two distinct operations: Finding objects which are dead (the "trace" phase) and freeing dead objects for later reuse (the "sweep" phase). The sweep phase is also known as the collection phase. The trace phase is also known as the "mark phase" and less frequently as the "dead object detection" phase. The use of the term "dead object detection" and its acronym DOD has been deprecated.

Initial Marking

Each PMC has a flags member which, among other things, facilitates garbage collection. At the beginning of the mark phase, the PObj_is_live_FLAG and PObj_is_fully_marked_FLAG are both unset, which flags the PMC as presumed dead (white). The initial mark phase of the collection cycle goes through each PMC in the root set and sets the PObj_is_live_FLAG bit in the flags member (the PMC is gray). It does not set the PObj_is_fully_marked_FLAG bit (changing the PMC to black), because in the initial mark, the PMCs or buffers contained by a PMC are not marked. It also appends the PMC to the end of a list used for further marking. However, if the PMC has already been marked as black, the current end of list is returned (instead of appending the already processed PMC) to prevent endless looping.

The fourth combination of the two flags, where PObj_is_live_FLAG is unset and PObj_is_fully_marked_FLAG is set, is reserved for PMCs of an older generation not actively participating in the GC run.

The root set for the initial marking phase includes the following core storage locations:

Global stash
System stack and processor registers
Current PMC register set
Stashes
PMC register stack

Incremental Marking

After the root set of PMCs have been marked, a series of incremental mark runs are performed. These may be performed frequently, between other operations. The incremental mark runs work to move gray PMCs to black. They take a PMC from the list for further marking, mark any PMCs or buffers it contains as gray (the PObj_is_live_FLAG is set and the PObj_is_fully_marked_FLAG is left unset), and add the contained PMCs or buffers to the list for further marking. If the PMC has a custom mark function in its vtable, it is called at this point.

After all contained PMCs or buffers have been marked, the PMC itself is marked as black (the PObj_is_live_FLAG and PObj_is_fully_marked_FLAG are both set). A limit may be placed on the number of PMCs handled in each incremental mark run.

Buffer Marking

The initial marking phase also marks the root set of buffers. Because buffers cannot contain other buffers, they are immediately marked as black and not added to the list for further marking. Because PMCs may contain buffers, the buffer collection phase can't run until the incremental marking of PMCs is completed.

The root set for buffers includes the following locations:

Current String register set
String register set stack
Control stack

Once a buffer is found to be live, the flags member of the buffer structure has the PObj_live_FLAG and PObj_is_fully_marked_FLAG bits set.

Collection

When the list for further marking is empty (all gray PMCs have changed to black), the collection stage is started. First, PMCs are collected, followed by buffers. In both cases (PMC and buffer), the "live" and "fully_marked" flags are reset after examination for reclamation.

Collecting PMCs

To collect PMCs, each PMC arena is examined from the most recently created backwards. Each PMC is examined to see if it is live, already on the free list, or constant. If it is not, then it is added to the free list and marked as being on the free list with the PObj_on_free_list_FLAG.

Collecting buffers

To collect buffers, each Buffer arena is examined from the most recently created backwards. If the buffer is not live, not already on the free list and it is not a constant or copy on write, then it is added to the free pool for reuse and marked with the PObj_on_free_list_FLAG.

Concurrent collection

For the most part, the variable sets between concurrent tasks don't interact. They have independent root sets and don't require information on memory usage from other tasks before performing a collection phase. In Parrot, tasks tend to be short-lived, and their variables can be considered young generations from a generational GC perspective. Because of this, a full heavyweight task will maintain its own small memory pools, quickly born and quickly dying.

Shared variables, on the other hand, do require information from multiple concurrent tasks before they can be collected. Because of this, they live in the parent interpreter's global pools, and can only be collected after all concurrent tasks have completed a full mark phase without marking the shared variable as live. Because GC in the concurrent tasks happens incrementally between operations, a full collection of the shared variables can happen lazily, and does not require a stop-the-world sweep through all concurrent tasks simultaneously.

Internal Structures

The different GC cores are independent, but they share some code and resources. The arena structures and arena creation routines are common across most GC cores, and some GC cores also share mark routines.

The main interpreter structure has an arena_base member, which is a pointer to an Arenas struct.

The Arenas structure

The Arenas structure contains pointers to a variety of memory pools, each used for a specific purpose. Two are Memory_Pool pointers (memory_pool, constant_string_pool), and six are Small_Object_Pool structures (pmc_pool, constant_pmc_pool, constant_string_header_pool).

The Arenas structure holds function pointers for the core defined interface of the currently active GC subsystem: init_pool, do_gc_mark, finalize_gc_system. It holds various accounting information for the GC subsystem, including how many GC runs have been completed, amount of memory allocated since the last run, and total memory allocated. This accounting information is updated by the GC system. The current block level for GC mark and sweep phases is stored in the Arenas structure. (See "Blocking GC".)

The pointer void *gc_private is reserved for use by the currently active GC subsystem (with freedom for variation between GC implementations).

The Memory_Pool structure

The Memory_Pool structure is a simple memory pool. It contains a pointer to the top block of the allocated pool, the total allocated size of the pool, the block size, and some details on the reclamation characteristics of the pool.

The Small_Object_Pool structure

The Small_Object_Pool structure is a richer memory pool for object allocation. It tracks details like the number of allocated and free objects in the pool, a list of free objects, and for the generational GC implementation maintains linked lists of white, black, and gray PMCs. It contains a pointer to a simple Memory_Pool (the base storage of the pool). It holds function pointers for adding and retrieving free objects in the pool, and for allocating objects.

Internal API

Currently only one GC system is active at a time, selected at configure or compile time. Future versions will support switching GC systems at execution-time to accommodate different work loads.

Each GC core provides a standard interface for interaction with the core.

Initialization

Each GC core declares an initialization routine as a function pointer, which is installed in src/memory.c:mem_setup_allocator() after creating arena_base in the interpreter struct.

void Parrot_gc_XXX_init(Interp *)
A routine to initialize the GC system named XXX.The initialization code is responsible for the creation of the header pools and fills the function pointer slots in the interpreter's arena_base member.

Arenas structure function pointers

Each GC system declares 3 function pointers, stored in the Arenas structure.

void (*init_gc_system) (Interp *)
Initialize the GC system. Install the additional function pointers into the Arenas structure, and prepare any private storage to be used by the GC in the Arenas->gc_private field.
void (*do_gc_mark) (Interp *, int flags)
Trigger or perform a GC run. With an incremental GC core, this may only start/continue a partial mark phase or sweep phase, rather than performing an entire run from start to finish. It may take several calls to do_gc_mark in order to complete an entire run of an incremental collector.For a concurrent collector, calls to this function may activate a concurrent collection thread or, if such a thread is already running, do nothing at all.The do_gc_mark function is called from the Parrot_gc_mark_and_sweep function, and should not usually be called directly.flags is one of:
0
Run the GC normally, including the trace and the sweep phases, if applicable. Incremental GCs will likely only run one portion of the complete GC run, and repeated calls would be required for a complete run. A complete trace of all system areas is not required.
GC_trace_normal | GC_trace_stack_FLAG
Run a normal GC trace cycle, at least. This is typically called when there is a resource shortage in the buffer memory pools before the sweep phase is run. The processor registers and any other system areas have to be traced too.Behavior is determined by the GC implementation, and might or might not actually run a full GC cycle. If the system is an incremental GC, it might do nothing depending on the current state of the GC. In an incremental GC, if the GC is already past the trace phase it may opt to do nothing and return immediately. A copying collector may choose to run a mark phase if it hasn't yet, to prevent the unnecessary copying of dead objects later on.
GC_lazy_FLAG
Do a timely destruction run. The goal is either to detect all objects that need timely destruction or to do a full collection. This is called from the Parrot run-loop, typically when a lexical scope is exited and the local variables in that scope need to be cleaned up. Many types of PMC objects, such as line-buffered IO PMCs rely on this behavior for proper operation.No system areas have to be traced.
GC_finish_FLAG
Finalize and destroy all living PMCs. This is called during interpreter destruction. The GC subsystem must clear the live state of all objects and perform a sweep in the PMC header pool, so that destructors and finalizers get called. PMCs which have custom destructors rely on this behavior for proper operation.
void (*finalize_gc_system) (Interp *)
Called during interpreter destruction. Free used resources and memory pools. All PMCs must be swept, and PMCs with custom destroy VTABLE methods must have those called.
void (*init_pool) (Interp *, Small_Object_Pool *)
Initialize the given pool. Populates the Small_Object_Pool structure with initial values, and sets a series of function pointers for working with the pool. The function pointers used with the pool are discussed next.

Small_Object_Pool function pointers

Each GC core defines 4 function pointers stored in the Small_Object_Pool structures. These function pointers are used throughout Parrot to implement basic behaviors for the pool.

PObj * (*get_free_object) (Interp *, Small_Object_Pool*)
Get a free object from the pool. This function returns one free object from the given pool and removes that object from the pool's free list. PObject flags are returned clear, except flags that are used by the garbage collector itself, if any. If the pool is a buffer header pool all other object memory is zeroed.
void (*add_free_object) (Interp *, Small_Object_Pool *, PObj *);
Add a freed object to the pool's free list. This function is most often called internally to the GC itself to add items to the free list after a sweep, or when a new arena is created to add the new items to the free list. It does not need to be used in this way, however.
void (*alloc_objects) (Interp *, Small_Object_Pool *);
Allocate a new arena of objects for the pool. Initialize the new arena and add all new objects to the pool's free list. Some collectors implement a growth factor which increases the size of each new allocated arena.
void (*more_objects) (Interp *, Small_Object_Pool *);
Reallocation for additional objects. It has the same signature as alloc_objects, and in some GC cores the same function pointer is used for both. In some GC cores, more_objects may do a GC run in an attempt to free existing objects without having to allocate new ones. This function may also call pool-alloc_objects> internally, to allocate objects if a GC run fails to free any old objects.

Write Barrier

Each GC core has to provide the following macros. All of these might be defined empty, for GC cores which do not use them.

GC_WRITE_BARRIER(Interp *, PMC *agg, PMC *old, PMC *new)
This macro is invoked when in aggregate agg the element old is getting overwritten by new. Either old, new, or both may be NULL.
GC_WRITE_BARRIER_KEY(Interp *, PMC *agg, PMC *old, PObj *old_key, PMC *new, PObj *new_key)
Similar to GC_WRITE_BARRIER. Invoked when a hash key new_key is inserted into hash agg with value new, possibly replacing a key/value pair old_key and old, respectively. Any of old, old_key, new or new_key might be NULL.

Blocking GC

Being able to block GC is important, so newly allocated Buffers or PMCs won't be collected before they're attached to the live tree. Parrot provides locking mechanisms to prevent the GC from taking certain actions, such as marking or sweeping. GC block functions are nesting, and multiple calls to a lock function requires the same number of calls to the corresponding unlock function in order to operate the GC normally again. The following functions are used to block the GC from performing certain actions:

Parrot_block_GC_mark(Interp *interpreter)
Block the GC mark phase for the passed interpreter, but do not block the sweep phase. In a stop-the-world collector, this will prevent the entire collection run, but in an incremental collector this will only block if the GC is in the trace state.
Parrot_block_GC_sweep(Interp *interpreter)
Block the GC sweep phase for the passed interpreter, but do not block the trace phase.
Parrot_unblock_GC_mark(Interp *interpreter)
Unblock the GC mark phase for the passed interpreter, but do not unblock a blocked sweep phase, if it is blocked using Parrot_block_GC_sweep.
Parrot_unblock_GC_sweep(Interp *interpreter)
Unblock the GC sweep phase for the passed interpreter, but do not unblock the mark phase if it has been blocked by Parrot_block_GC_mark.
Parrot_is_blocked_GC_mark(Interp *interpreter)
Test whether the mark phase has been blocked. Notice that the sweep phase can be locked independently and cannot be determined using this function.
Parrot_is_blocked_GC_sweep(Interp *interpreter)
Test whether the sweep phase has been blocked. Notice that the mark phase can be locked independently and cannot be determined using this function.

PMC/Buffer API

Flags

For PMCs and Buffers to be collected properly, you must set the appropriate flags on them. Directly manipulating these flags is not recommended because the exact values can be changed over time. A series of macros have been created in include/parrot/pobject.h that set and check for these flags. Always use these provided macros when you need to test or set these flags.

PObj_custom_destroy_FLAG
The PMC has some sort of active destructor, and will have that destructor called when the PMC is destroyed. The destructor is typically called from within src/gc/api.c:Parrot_gc_free_pmc.
PObj_custom_mark_FLAG
The mark vtable slot will be called during the GC mark phase. The mark function must call Parrot_gc_mark_PObj_alive for all non-NULL objects (Buffers and PMCs) that PMC refers to. This flag is typically tested and the custom mark VTABLE method called from src/gc/api.c:mark_special.
PObj_external_FLAG
Set if the buffer points to memory that came from outside Parrot's memory system.
PObj_sysmem_FLAG
Set if the memory came from the system malloc. When the buffer is considered dead, the memory will be freed back to the system.
PObj_COW_FLAG
The buffer's memory is copy on write. Any changes to the buffer must first have the buffer's memory copied. The COW flag should then be removed.

The following flags can be used by the GC subsystem:

PObj_live_FLAG
The system considers the object to be alive for collection purposes. Objects with this flag set should never be collected, freed, destroyed, or put on the free list.
PObj_on_free_list_FLAG
The object is unused, and on the free list for later allocation.
PObj_custom_GC_FLAG
Mark the buffer as needing GC.

Attachments

None.

Footnotes

None.

References

"Uniprocessor Garbage Collection Techniques" http://www.cs.rice.edu/~javaplt/311/Readings/wilson92uniprocessor.pdf

"A unified theory of garbage collection": http://portal.acm.org/citation.cfm?id=1028982

"Scalable Locality-Conscious Multithreaded Memory Allocation": http://people.cs.vt.edu/~scschnei/papers/ismm06.pdf

"Parallel and concurrent garbage collectors": http://chaoticjava.com/posts/parallel-and-concurrent-garbage-collectors/

"Region-Based Memory Management": http://www.irisa.fr/prive/talpin/papers/ic97.pdf

Dan's first musings on the GC subsystem: http://www.mail-archive.com/perl6-all@perl.org/msg14072.html

Semi-timely and ordered destruction: http://www.sidhe.org/~dan/blog/archives/000199.html