parrotcode: Untitled | |
Contents | Language Implementations | Perl6 |
A simple introduction and overview to P6C. This document is only to be thought of as a reference to the general flow of things; to get complete details, refer to the source code. If you ignore this bit now, don't worry, it will be said many more times :)
There are 3 main types of objects that are used in P6C; it's best to become familiar with them before anything else:
tree
methods,
defined in P6C/Tree.pm.As a brief overview,
the whole process works like this: First,
perl6 code is parsed using P6C/Parser.pm,
which generates a raw parse tree.
The raw tree is then processed by methods found in P6C/Tree.pm,
turning the raw parse objects into an optree.
Context is propagated onto the optree with P6C/Context.pm,
and then evaluated and compiled to intermediate code by imcc
.
imcc
then translates...this into parrot assembly code,
which is then assembled into bytecode by the parrot assembler and run.
It's just that simple.
The first thing perl6 will do is start the first pass over your code: parse it. Assuming a normal run (no flags), perl6 will use the precompiled parser, Perl6Grammar.pm. If you made any changes to the grammar in P6C/Parser.pm, you'll need to run your code with the --force-grammar flag to recompile it.
Before the grammar itself is described, an overview of the parser will be helpful. The parser works by recursively defining a statement down to its most basic levels, and then turning the matched statement into a tree-like structure of objects. For instance, a small program like this:
$x = 4;
$y = $x + 6;
print $y;
Could be visualized as:
Program
scalar expression
operator '='
left => variable
sigil => '$'
varname => 'x'
right => scalar literal
type => number
value => 4
scalar expression
operator '='
left => variable
sigil => '$'
varname => 'y'
right => scalar expression
operator '+'
left => variable
sigil => '$'
varname => 'x'
right => scalar literal
type => number
value => 6
scalar expression
prefix 'print'
args => variable
sigil => '$'
varname => 'y'
Each node on this tree is actually an object in the form of a blessed anonymous array, created as a result of the grammar's autoaction. It holds the raw data from the parse; one element per production.
Next, on to the grammar. As stated earlier, the grammar is written in Parse::RecDescent. For almost all cases, rules should be written without an action as the final production, so that the autoaction that turns the match into an object can be applied. To get a real idea of what's going on, you should read through the grammar; it's long, but most rules are short. The base rule for the grammar is prog
, but you can probably start at stmt
unless you care about error handling and such like that.
A few notes about the grammar in general:
add_class
adds user-defined classes to the parser, and add_function
does the same for functions. add_context
is used by add_function
to create the resulting "want_for" rule for the function (to handle prototypes if it has them), as well as generating the tree deciphering method (see "Deciphering the Object Tree" for more details). Finally, got_err
handles error messages.Once the parse is complete, the object tree is deciphered by a depth-first recursive traversal of the parse tree using the tree
methods found in Tree. While there is a matching "tree" method for nearly every single rule defined in Parser, there is no way to "generate" the tree methods as they need to handle the data very specifically. However, the general gist of what they do is like this:
tree
methods of its members, gathers up the return values, and builds a branch of the optree.Propagating Context is another way of saying "Figuring out what kind of value a node is expected to yield during compilation." Context information is added to a node by adding another member, ctx
, to the node, and setting its value equal to a context object. Context objects are defined in P6C/Context.pm, which also contains the functions for figuring out context. P6C/Addcontext.pm defines the ctx_left
and <ctx_right> methods for each Node type (which deal for lvalue and rvalue context, respectively), which define how each Node type propagates its context.
Once the parse tree has been post-processed, the next stage is to travel through the op tree and compile it to IMC code. Each node type has a val
method that will be called by its parent node. This method gathers values from child nodes (by calling their val
methods), emits the code for the node's operation (using P6C/IMCC.pm::code
), and then returns the proper type for its context. Code is appended to the current function in the order in which it is generated, so subnodes have to be evaluated in the proper order. Things acting as lvalues need to define an assign
function. Currently only variables, variable declarations, and the ternary operator do so, but other things will need to as well at some point. Finally, regex nodes actually use an rx_val
function instead, which behaves differently, but serves basically the same purpose.
The compiler is very large, but there are a few things to note:
P6C/IMCC.pm::code($x)
will add the IMC code $x to the current function. Scope can further be manipulated "inside" a function by using push_scope
, which increases the depth of the scope by one, and pop_scope
, which lowers it by one.add_localvar
. Global variables can be added with add_globalvar
. Lookups can be done with findvar
, which first searches the active function's locals, then parameters, and then finally globals. This implementation may change when lexical scoping is fully implemented.From this point on, most of the work is done by external parrot processes. First, P6C/IMCC.pm is used to convert the op tree to .imc code. IMCC converts this code to .pasm. The assembler assembles the the .pasm to parrot bytecode, and then finally the bytecode is run by the driver.
Joseph F. Ryan (ryan.311@osu.edu)
$Id$
|