[DRAFT] PDD 1: Overview

Abstract

A high-level overview of the Parrot virtual machine.

Parrot is a virtual machine for dynamic languages like Python, PHP, Ruby, and Perl. A dynamic language is one that allows things like extension of the code base, subroutine and class definition, and altering the type system at runtime. Static languages like Java or C# restrict these features to compile time. If this sounds like an edgy idea, keep in mind that Lisp, one of the prime examples of a dynamic language, has been around since 1958. The basic paradigm shift to dynamic languages leads the way to other more advanced dynamic features like higher-order functions, closures, continuations, and coroutines.

Implementation

Parser

While individual high-level languages may implement their own parser, most will use Parrot's parser grammar engine (PGE). The parser grammar engine compiles a parsing definition (.pg) file into an executable parser for the language. The resulting parser takes source code as input and creates a raw parse tree.

Subsequent stages of compilation convert the raw parse tree into an annotated syntax tree. The tree grammar engine (TGE) compiles a transformation definition (.tg) file into an executable set of rules to transform data structures. In most compilers, the syntax tree goes through a series of transformations, starting with the raw parse tree, through a syntax tree that is close to the semantics of the HLL, and ending in a syntax tree that is close to the semantics of Parrot's bytecode. Some compilers will also insert optimization stages into the compilation process between the common transformation stages.

IMCC

The intermediate code compiler (IMCC) is the main parrot executable, and encapsulates several core low-level components.

PASM & PIR parser

This Bison and Flex based parser/lexer handles Parrot's assembly language, PASM, and the slightly higher-level language, PIR (Parrot Intermediate Representation).

Bytecode compiler

The bytecode compiler module takes a syntax tree from the parser and emits an unoptimized stream of bytecode. This code is suitable for passing straight to the interpreter, though it is probably not going to be very fast.

Note that currently, the only way to generate bytecode is by first generating PASM or PIR.

Optimizer

The optimizer module takes the bytecode stream from the compiler and optionally the syntax tree the bytecode was generated from, and optimizes the bytecode.

Interpreter

The interpreter module takes the bytecode stream from either the optimizer or the bytecode compiler and executes it. There must always be at least one interpreter module available for any program that can handle all of Perl, since it's required for use statements and BEGIN blocks.

While there must be at least one interpreter, there may be multiple interpreter modules linked into an executable. This would be the case, for example, for programs that produced Java bytecode, where one of the interpreter modules would take the bytecode stream and spit out Java bytecode instead of interpreting it.

Standalone pieces

Each piece of IMCC can, with enough support hidden away (in the form of an interpreter for the parsing module, for example), stand on its own. This means it's feasible to make the parser, bytecode compiler, optimizer and interpreter separate executables.

This allows us to develop pieces independently. It also means we can have a standalone optimizer which can spend a lot of time groveling over bytecode, far more than you might want to devote to optimizing one-liners or code that'll run only once or twice.

Subsystems

The following subsystems are each responsible for a key component of Parrot's core functionality.

I/O subsystem

The I/O subsystem provides source- and platform-independent synchronous and asynchronous I/O to Parrot. How this maps to the OS's underlying I/O code is not generally Parrot's concern, and a platform isn't obligated to provide asynchronous I/O.

Additionally, the I/O subsystem allows a program to push filters onto an input stream if necessary, to manipulate the data before it is presented to a program.

Regular expression engine

The parser grammar engine (PGE) is also Parrot's regular expression engine. The job of the regular expression engine is to compile regular expression syntax (both Perl 5 compatible syntax and Perl 6 syntax) into classes, and apply the matching rules of the classes to strings.

The regular expression engine is available to any language running on Parrot.

Data transformation engine

The tree grammar engine (TGE) is also a general-purpose data transformation tool (somewhat similar to XSLT).

API Levels

Embedding

The embedding API is the set of calls exported to an embedding application. This is a small, simple set of calls, requiring minimum effort to use.

The goal is to provide an interface that a competent programmer who is uninterested in Parrot internals can use to provide access to a Parrot interpreter within another application with very little programming or intellectual effort. Generally it should take less than thirty minutes for a simple interface, though more complete integration will take longer.

Extensions

The extension API is the set of calls exported to Parrot extensions. They provide access to most of the things an extension needs to do, while hiding the implementation details. (So that, for example, we can change the way scalars are stored without having to rewrite, or even recompile, an extension).

Guts

The guts-level APIs are the routines used within a component. These aren't guaranteed to be stable, and shouldn't be used outside a component. (For example, an extension to the interpreter shouldn't call any of the parser's internal routines).

Target Platforms

The ultimate goal of Parrot is portability to more-or-less the same platforms as Perl 5, including AIX, BeOS, BSD/OS, Cygwin, Darwin, Debian, DG/UX, DragonFlyBSD, Embedix, EPOC, FreeBSD, Gentoo, HP-UX, IRIX, Linux, Mac OS (Classic), Mac OS X, Mandriva, Minix, MS-DOS, NetBSD, NetWare, NonStop-UX, OpenBSD, OS/2, Plan 9, Red Hat, RISC OS, Slackware, Solaris, SuSE, Syllable, Symbian, TiVo (Linux), Tru64, Ubuntu, VMS, VOS, WinCE, Windows 95/98/Me/NT/2000/XP/Vista, and z/OS.

Recognizing the fact that ports depend on volunteer labor, the minimum requirements for the 1.0 launch of Parrot are portability to major versions of Linux, BSD, Mac OS X, and Windows released within 2 years prior to the 1.0 release. As we approach the 1.0 release we will actively seek porters for as many other platforms as possible.

Language Notes

Parrot for small platforms

One goal of the Parrot project, though not a requirement of the 1.0 release, is to run on small devices such as the Palm. For small platforms, any parser, compiler, and optimizer modules are replaced with a small bytecode loader module which reads in Parrot bytecode and passes it to the interpreter for execution. Note that the lack of a parser will limit the available functionality in some languages: for instance, in Perl, string eval, do, use, and require will not be available (although loading of precompiled modules via do, use, or require may be supported).

Bytecode compilation

One straightforward use of the Parrot system is to precompile a program into bytecode and save it for later use. Essentially, we would compile a program as normal, but then simply freeze the bytecode to disk for later loading.

Your HLL in; Java, CLI, or whatever out

The previous section assumes that we will be emitting Parrot bytecode. However, there are other possibilities: we could translate the bytecode to Java bytecode or .NET code, or even to a native executable. In principle, Parrot could also act as a front end to other modular compilers such as gcc or HP's GEM compiler system.

References

None.