PDD 9: Garbage Collection Subsystem

Abstract

This PDD specifies Parrot's garbage collection and memory management subsystems.

Definitions

Garbage collection (GC)

Garbage collection is a process of freeing up memory that is no longer used by the interpreter, by determining which objects will not be referenced again and can be reclaimed.

Mark and sweep (MS)

Starting from a known root set, the GC traces all reachable memory objects by following pointers. Objects reached in this way, and therefore visible for use by the program, are alive. Objects which are not reached in the trace are marked dead. In the second stage, sweep, all dead objects are destroyed and reclaimed.

Tri-color mark and sweep

Instead of a simple separation of marked (as live) and unmarked (dead), the object set is divided into three parts: white, gray, and black. The white objects are presumed dead. The gray objects have been marked as live by some other object, but haven't yet marked the objects they refer to. The black objects are live, and have marked all objects they directly refer to.

In the initial run, all objects start as white and the root set is marked gray. The marking process changes white objects to gray (marking them from another gray object), and gray objects to black (when all objects they refer to are marked). When the gray set is empty, all live objects have been marked and the white set can be collected. After a collection run, all black objects are reset to white, the root set to gray, and the process begins again.

The advantage of a tri-color mark over a simple mark is that it can be broken into smaller stages.

Copying collection

A copying GC copies objects from one memory region to another during the mark phase. At the end of the mark, all memory in the old region is dead and the whole region can be reclaimed at once.

Compacting collection

The compacting GC moves live objects close together in a single region in memory. This helps to elimianate fragmented free space and allows the allocation of large live objects. Compacting and copying collectors are often similar or even identical in implementation.

Uncooperative

An uncooperative GC is implemented as a separate module, often without affecting the remainder of the program. The programmer can write software without needing to be aware of the operations or implementation of the GC. The alternative is a cooperative GC, which is often implemented as a reference counting scheme and requires GC-related logic to be dispersed throughout the entire program.

Stop-the-world

A common disadvantage of a simple mark implementation is that the entire system (including all threads that use the same memory pools) must be suspended while the whole memory set is examined during marking and collection. Normal operation continues only after the whole GC cycle is performed. This can lead to arbitrarily long pauses during program execution.

Incremental

In order to alleviate the arbitrarily long pauses in a stop-the-world GC, the incremental GC breaks the mark and sweep process up into smaller, shorter phases. Each GC phase may still require the entire program to pause, but the pauses are shorter and more frequent.

Real-time

The pauses caused by GC don't exceed a certain limit.

Generational

The object space is divided between a young generation (short-lived temporaries) and one or more old generations. Only young generations are reset to white (presumed dead). The older generations are scanned less often because it is assumed that long-lived objects tend to live longer.

Concurrent

GC marking and collection runs as a separate thread, sometimes with multiple threads participating in GC. On a multi-processor machine, concurrent GC may be truly parallel.

Conservative

A conservative GC traces through memory looking for pointers to living objects. The GC does not necessarily have information about the layout of memory, so it cannot differentiate between an actual pointer and an integral value which has the characteristics of a pointer. The Conservative GC follows a policy of "no false negatives" and traces any value which appears to be a pointer.

Precise

A precise GC has intimate knowledge of the memory layout of the system and knows where to find pointers. In this way the precise collector never has any false positives.

Synopsis

Not applicable.

Description

No GC algorithm is ideal for all workloads. To support multiple workloads, Parrot provides support for pluggable uncooperative GC cores. Parrot will attempt to provide a default core which has reasonable performance for most programs. Parrot provides no built-in support for cooperative GCs.

Parrot uses two separate memory allocation mechanisms: a fixed-size system for small objects of fixed size (PMC and STRING headers, etc), and a buffer allocator for arbitrary-sized objects, such as string contents. The default fixed-size memory allocator uses a SLAB-like algorithm to allocate objects from large pre-allocated pools. The default buffer allocator uses a compacting algorithm.

Implementation

Parrot supports pluggable garbage collection cores, so ultimately any uncooperative garbage collection model devised can run on it.

Parrot really has two independent GC models, one used for objects (PMCs) and the other used for buffers (including strings). The core difference is that buffers cannot contain other buffers, so incremental marking is unnecessary.

Terminology

A GC run is composed of two distinct operations: Finding objects which are dead (the "trace" or "mark" phase) and freeing dead objects for later reuse (the "sweep" phase). The sweep phase is also known as the collection phase. The trace phase is less frequently known as the "dead object detection" phase.

Marking

Each PMC and STRING has a flags member which is a bitfield of various flags. Three flags in particular are important for GC operation. PObj_live_FLAG is set if the object is currently alive and active. PObj_on_free_list_FLAG is set if the object is currently on the free list and is available for reallocation. A third flag, PObj_grey_FLAG can be used to support tricolor mark. Despite the given names of these flags, they can be used by the active GC core for almost any purpose, or they can be ignored entirely if the GC provides another mechanism for marking the various life stages of the object. These flags are typically not used outside the GC subsystem.

Special PMCs

Root Set

The root set for the GC mark is the interpreter object and, if necessary, the C system stack. If the C system stack is traced, the GC is conservative.

Initiating a mark and sweep

Depending on the core in use, the mark and sweep phases may be initiated in different ways. A concurrent core would always be running in the background. The most common mechanism for a non-concurrent core is to initiate a run of the GC system when an attempt is made to allocate

Object marking

To mark a PMC, the Parrot_gc_mark_pmc_alive function is called. To mark a STRING, the Parrot_gc_mark_string_alive function is called. These functions mark the object alive, typically by setting the PObj_live_FLAG flag.

If the PMC contains references to other PMCs and STRINGS, it must have the PObj_custom_mark_FLAG flag set. If this flag is set, the mark VTABLE for that PMC is called to mark the pointers in that PMC. The custom_mark flag is ignored in STRINGs.

Buffer Marking

Buffers are always attached to a fixed-size header, or several headers. During the mark phase of the fixed-size objects, owned buffers are flagged as alive. At somet time after the fixed-size objects are marked, the buffer pool is compacted by moving all alive buffers to a new pool and then freeing the old pool back to the operating system.

Collection

When all objects have been marked, the collection phase begins.

Collecting objects

During the sweep phase, objects which had previously been alive but were not traced in the most recent mark phase are dead and are collected. If the PObj_custom_destroy_FLAG is set on a PMC, the GC will call the destroy VTABLE on that PMC to do custom cleanup. This flag is ignored in STRINGs.

The GC does not collect dead PMCs in any particular order and does not guarantee any ordering of collection between dependent PMCs. Some GC cores may enforce some ordering or dependency recognition, but this is not guaranteed.

Finalization

When the interpreter object is destroyed, the GC system is finalized. During finalization, all living PMCs in the system are destroyed unconditionally and all memory owned by the interpreter is freed back to the operating system.

Internal Structures

A GC core is defined in memory by a structure of function pointers to various routines that perform the primitive operations of the GC. A GC core must define most of the pointers in the interp->gc_sys structure, which is a GC_Subsystem structure.

GC_Subsystem has the following fields:

void (*finalize_gc_system) (PARROT_INTERP)
Function to finalize the GC system, by freeing all PMCs and returning all allocated memory to the operating system.
void (*destroy_child_interp)(Interp *dest_interp, Interp *child_interp)
void (*do_gc_mark)(PARROT_INTERP, UINTVAL flags)
Perform a GC mark and sweep run, or at least run a single increment of it.
void (*compact_string_pool)(PARROT_INTERP)
Compact the string pool and destroy all unused buffers.
void (*mark_special)(PARROT_INTERP, PMC *)
Mark a special PMC. A PMC is special if it has the PObj_is_special_FLAG flag set.
void (*pmc_needs_early_collection)(PARROT_INTERP, PMC *)
Flag a PMC as needing early collection.
void (*init_pool)(PARROT_INTERP, struct Fixed_Size_Pool *)
Initialize a new memory pool.
PMC* (*allocate_pmc_header)(PARROT_INTERP, UINTVAL flags)
Allocate a new PMC object from the system.
void (*free_pmc_header)(PARROT_INTERP, PMC *)
Free a PMC object back to the system.
STRING* (*allocate_string_header)(PARROT_INTERP, UINTVAL flags)
Allocate a new STRING header from the system.
void (*free_string_header)(PARROT_INTERP, STRING*)
Free a STRING object back to the system.
Buffer* (*allocate_bufferlike_header)(PARROT_INTERP, size_t size)
void (*free_bufferlike_header)(PARROT_INTERP, Buffer*, size_t size)
int (*is_pmc_ptr)(PARROT_INTERP, void*)
Determine if the given pointer is or resembles a valid PMC pointer.
int (*is_string_ptr)(PARROT_INTERP, void*)
Determine if the given pointer is or resembles a valid STRING pointer.
void (*mark_pobj_header)(PARROT_INTERP, PObj*)
void (*mark_pmc_header)(PARROT_INTERP, PMC *)
Mark a PMC alive.
void* (*allocate_pmc_attributes)(PARROT_INTERP, PMC *)
Allocate attribute storage for a PMC. The size of the attributes structure is determined from the PMCs VTABLE.
void (*free_pmc_attributes)(PARROT_INTERP, PMC *)
Free an attribute structure back to the system.
void (*allocate_string_storage) (PARROT_INTERP, STRING *str, size_t size)
Allocate buffer storage for a string.
void (*reallocate_string_storage) (PARROT_INTERP, STRING *str, size_t size)
Resize existing string storage to fit data of the new size.
void (*allocate_buffer_storage) (PARROT_INTERP, ARGMOD(Buffer *buffer), size_t nsize)
Allocate buffer storage for any purpose.
void (*reallocate_buffer_storage) (PARROT_INTERP, ARGMOD(Buffer *buffer), size_t newsize)
Reallocate or resize existing buffer storage.
void* (*allocate_fixed_size_storage)(PARROT_INTERP, size_t size)
Allocate storage for a fixed-size header which is not a PMC or a STRING. The contents of this structure are not marked automatically by GC.
void (*free_fixed_size_storage)(PARROT_INTERP, size_t size, void *)
Free a fixed-size structure back to the system.
void* (*allocate_memory_chunk)(PARROT_INTERP, size_t size)
void* (*reallocate_memory_chunk)(PARROT_INTERP, void *data, size_t newsize)
void* (*allocate_memory_chunk_with_interior_pointers)(PARROT_INTERP, size_t size)
void* (*reallocate_memory_chunk_with_interior_pointers)(PARROT_INTERP, void *data, size_t oldsize, size_t newsize)
void (*free_memory_chunk)(PARROT_INTERP, void *data)
void (*block_mark)(PARROT_INTERP)
Block the GC mark from occurring.
void (*unblock_mark)(PARROT_INTERP)
Unblock the GC mark.
unsigned int (*is_blocked_mark)(PARROT_INTERP)
Query the blocked state of the GC mark.
void (*block_sweep)(PARROT_INTERP)
Block the GC sweep phase.
void (*unblock_sweep)(PARROT_INTERP)
Unblock the GC sweep phase.
unsigned int (*is_blocked_sweep)(PARROT_INTERP)
Query the blocked state of the GC sweep.
size_t (*get_gc_info)(PARROT_INTERP, Interpinfo_enum)
Query information about the GC core.

The Memory_Pools structure

The Memory_Pools structure contains pointers to a variety of memory pools, each used for a specific purpose. Two are Var_Size_Pool pointers (memory_pool, constant_string_pool), and six are Fixed_Size_Pool structures (pmc_pool, constant_pmc_pool, constant_string_header_pool).

The Memory_Pools structure holds function pointers for the core defined interface of the currently active GC subsystem: init_pool, do_gc_mark, finalize_gc_system. It holds various accounting information for the GC subsystem, including how many GC runs have been completed, amount of memory allocated since the last run, and total memory allocated. This accounting information is updated by the GC system. The current block level for GC mark and sweep phases is stored in the Memory_Pools structure. (See "Blocking GC".)

The pointer void *gc_private is reserved for use by the currently active GC subsystem (with freedom for variation between GC implementations).

The Var_Size_Pool structure

The Var_Size_Pool structure is a simple memory pool. It contains a pointer to the top block of the allocated pool, the total allocated size of the pool, the block size, and some details on the reclamation characteristics of the pool.

The Fixed_Size_Pool structure

The Fixed_Size_Pool structure is a richer memory pool for object allocation. It tracks details like the number of allocated and free objects in the pool, a list of free objects, and for the generational GC implementation maintains linked lists of white, black, and gray PMCs. It contains a pointer to a simple Var_Size_Pool (the base storage of the pool). It holds function pointers for adding and retrieving free objects in the pool, and for allocating objects.

Internal API

Each GC core provides a standard interface for interaction with the core.

Initialization

Each GC core declares an initialization routine as a function pointer, which is installed in src/memory.c:mem_setup_allocator() after creating mem_pools in the interpreter struct.

void Parrot_gc_XXX_init(Interp *)
A routine to initialize the GC system named XXX.The initialization code is responsible for the creation of the header pools and fills the function pointer slots in the interpreter's mem_pools member.

Memory_Pools structure function pointers

Each GC system declares 3 function pointers, stored in the Memory_Pools structure.

void (*init_gc_system) (Interp *)
Initialize the GC system. Install the additional function pointers into the Memory_Pools structure, and prepare any private storage to be used by the GC in the Memory_Pools->gc_private field.
void (*do_gc_mark) (Interp *, int flags)
Trigger or perform a GC run. With an incremental GC core, this may only start/continue a partial mark phase or sweep phase, rather than performing an entire run from start to finish. It may take several calls to do_gc_mark in order to complete an entire run of an incremental collector.For a concurrent collector, calls to this function may activate a concurrent collection thread or, if such a thread is already running, do nothing at all.The do_gc_mark function is called from the Parrot_gc_mark_and_sweep function, and should not usually be called directly.flags is one of:
0
Run the GC normally, including the trace and the sweep phases, if applicable. Incremental GCs will likely only run one portion of the complete GC run, and repeated calls would be required for a complete run. A complete trace of all system areas is not required.
GC_trace_normal | GC_trace_stack_FLAG
Run a normal GC trace cycle, at least. This is typically called when there is a resource shortage in the buffer memory pools before the sweep phase is run. The processor registers and any other system areas have to be traced too.Behavior is determined by the GC implementation, and might or might not actually run a full GC cycle. If the system is an incremental GC, it might do nothing depending on the current state of the GC. In an incremental GC, if the GC is already past the trace phase it may opt to do nothing and return immediately. A copying collector may choose to run a mark phase if it hasn't yet, to prevent the unnecessary copying of dead objects later on.
GC_lazy_FLAG
Do a timely destruction run. The goal is either to detect all objects that need timely destruction or to do a full collection. This is called from the Parrot run-loop, typically when a lexical scope is exited and the local variables in that scope need to be cleaned up. Many types of PMC objects, such as line-buffered IO PMCs rely on this behavior for proper operation.No system areas have to be traced.
GC_finish_FLAG
Finalize and destroy all living PMCs. This is called during interpreter destruction. The GC subsystem must clear the live state of all objects and perform a sweep in the PMC header pool, so that destructors and finalizers get called. PMCs which have custom destructors rely on this behavior for proper operation.
void (*finalize_gc_system) (Interp *)
Called during interpreter destruction. Free used resources and memory pools. All PMCs must be swept, and PMCs with custom destroy VTABLE functions must have those called.
void (*init_pool) (Interp *, Fixed_Size_Pool *)
Initialize the given pool. Populates the Fixed_Size_Pool structure with initial values, and sets a series of function pointers for working with the pool. The function pointers used with the pool are discussed next.

Fixed_Size_Pool function pointers

Each GC core defines 4 function pointers stored in the Fixed_Size_Pool structures. These function pointers are used throughout Parrot to implement basic behaviors for the pool.

PObj * (*get_free_object) (Interp *, Fixed_Size_Pool*)
Get a free object from the pool. This function returns one free object from the given pool and removes that object from the pool's free list. PObject flags are returned clear, except flags that are used by the garbage collector itself, if any. If the pool is a buffer header pool all other object memory is zeroed.
void (*add_free_object) (Interp *, Fixed_Size_Pool *, PObj *);
Add a freed object to the pool's free list. This function is most often called internally to the GC itself to add items to the free list after a sweep, or when a new arena is created to add the new items to the free list. It does not need to be used in this way, however.
void (*alloc_objects) (Interp *, Fixed_Size_Pool *);
Allocate a new arena of objects for the pool. Initialize the new arena and add all new objects to the pool's free list. Some collectors implement a growth factor which increases the size of each new allocated arena.
void (*more_objects) (Interp *, Fixed_Size_Pool *);
Reallocation for additional objects. It has the same signature as alloc_objects, and in some GC cores the same function pointer is used for both. In some GC cores, more_objects may do a GC run in an attempt to free existing objects without having to allocate new ones. This function may also call pool-alloc_objects> internally, to allocate objects if a GC run fails to free any old objects.

Write Barrier

Each GC core has to provide the following macros. All of these might be defined empty, for GC cores which do not use them.

GC_WRITE_BARRIER(Interp *, PMC *agg, PMC *old, PMC *new)
This macro is invoked when in aggregate agg the element old is getting overwritten by new. Either old, new, or both may be NULL.
GC_WRITE_BARRIER_KEY(Interp *, PMC *agg, PMC *old, PObj *old_key, PMC *new, PObj *new_key)
Similar to GC_WRITE_BARRIER. Invoked when a hash key new_key is inserted into hash agg with value new, possibly replacing a key/value pair old_key and old, respectively. Any of old, old_key, new or new_key might be NULL.

Blocking GC

Being able to block GC is important, so newly allocated Buffers or PMCs won't be collected before they're attached to the live tree. Parrot provides locking mechanisms to prevent the GC from taking certain actions, such as marking or sweeping. GC block functions are nesting, and multiple calls to a lock function requires the same number of calls to the corresponding unlock function in order to operate the GC normally again. The following functions are used to block the GC from performing certain actions:

Parrot_block_GC_mark(Interp *interpreter)
Block the GC mark phase for the passed interpreter, but do not block the sweep phase. In a stop-the-world collector, this will prevent the entire collection run, but in an incremental collector this will only block if the GC is in the trace state.
Parrot_block_GC_sweep(Interp *interpreter)
Block the GC sweep phase for the passed interpreter, but do not block the trace phase.
Parrot_unblock_GC_mark(Interp *interpreter)
Unblock the GC mark phase for the passed interpreter, but do not unblock a blocked sweep phase, if it is blocked using Parrot_block_GC_sweep.
Parrot_unblock_GC_sweep(Interp *interpreter)
Unblock the GC sweep phase for the passed interpreter, but do not unblock the mark phase if it has been blocked by Parrot_block_GC_mark.
Parrot_is_blocked_GC_mark(Interp *interpreter)
Test whether the mark phase has been blocked. Notice that the sweep phase can be locked independently and cannot be determined using this function.
Parrot_is_blocked_GC_sweep(Interp *interpreter)
Test whether the sweep phase has been blocked. Notice that the mark phase can be locked independently and cannot be determined using this function.

PMC/Buffer API

Flags

For PMCs and Buffers to be collected properly, you must set the appropriate flags on them. Directly manipulating these flags is not recommended because the exact values can be changed over time. A series of macros have been created in include/parrot/pobject.h that set and check for these flags. Always use these provided macros when you need to test or set these flags.

PObj_custom_destroy_FLAG
The PMC has some sort of active destructor, and will have that destructor called when the PMC is destroyed. The destructor is typically called from within src/gc/api.c:Parrot_gc_free_pmc.
PObj_custom_mark_FLAG
The mark vtable slot will be called during the GC mark phase. The mark function must call Parrot_gc_mark_PObj_alive for all non-NULL objects (Buffers and PMCs) that PMC refers to. This flag is typically tested and the custom mark VTABLE function called from src/gc/api.c:mark_special.
PObj_external_FLAG
Set if the buffer points to memory that came from outside Parrot's memory system.
PObj_sysmem_FLAG
Set if the memory came from the system malloc. When the buffer is considered dead, the memory will be freed back to the system.
PObj_COW_FLAG
The buffer's memory is copy on write. Any changes to the buffer must first have the buffer's memory copied. The COW flag should then be removed.

The following flags can be used by the GC subsystem:

PObj_live_FLAG
The system considers the object to be alive for collection purposes. Objects with this flag set should never be collected, freed, destroyed, or put on the free list.
PObj_on_free_list_FLAG
The object is unused, and on the free list for later allocation.
PObj_custom_GC_FLAG
Mark the buffer as needing GC.

References

"Uniprocessor Garbage Collection Techniques" http://www.cs.rice.edu/~javaplt/311/Readings/wilson92uniprocessor.pdf

"A unified theory of garbage collection": http://portal.acm.org/citation.cfm?id=1028982

"Scalable Locality-Conscious Multithreaded Memory Allocation": http://people.cs.vt.edu/~scschnei/papers/ismm06.pdf

"Parallel and concurrent garbage collectors": http://chaoticjava.com/posts/parallel-and-concurrent-garbage-collectors/

"Region-Based Memory Management": http://www.irisa.fr/prive/talpin/papers/ic97.pdf

Dan's first musings on the GC subsystem: http://www.mail-archive.com/perl6-all@perl.org/msg14072.html

Semi-timely and ordered destruction: http://www.sidhe.org/~dan/blog/archives/000199.html