parrotcode: Parrot Abstract Syntax Tree | |
Contents | Documentation |
docs/pdds/pdd26_ast.pod - Parrot Abstract Syntax Tree
This PDD describes the node types and semantics of the Parrot Abstract Syntax Tree representation.
$Revision$
The Parrot Abstract Syntax Tree (PAST) is a set of Parrot classes useful for generating an abstract semantic representation of a program written in a high level language such as Perl or Python. Once a program has been translated into its equivalent PAST representation, the Parrot Compiler Toolkit can be used to generate executable PIR or bytecode from the abstract syntax tree representation.
PAST is designed to provide the common operations and semantics needed by many of the high level languages targeted by Parrot, while also being extensible to support the unique needs of specific languages.
PAST::Node
is the base class for all nodes in an abstract syntax tree.
Each PAST::Node
element has an array of children nodes (which may be empty),
as well as a number of attributes and accessor methods for the node.
Every PAST node has name
,
source
,
and pos
attributes.
The name
attribute is the node's name,
if any,
while source
and pos
are used to identify the location of the node in the original source code.
push
method,
below) and calls the appropriate accessor method for each attribute.child
to the end of the node's array of children.child
to the beginning of the node's array of children.Iterator
for the node's list of children.source
and pos
attributes to be the same as val
.
If val
is another PAST node,
then source
and pos
are simply copied from that node.
Otherwise,
val
is assumed to be a Match
object (e.g.,
from PGE) and the source and position information are obtained from that.name
attribute of the node.has_value
is true then set the node's value of attrname
to value
.
Returns the (resulting) value of attrname
for the invocant.PAST::Stmts
is simply a PAST::Node
used to contain a sequence of other PAST nodes to be evaluated.
It has no specific methods or attributes.
PAST::Val
nodes represent constant values in the abstract syntax tree.
The value
attribute represents the value of the node itself.
PAST::Var
nodes represent variable-like items within the abstract syntax tree.
The name
attribute (inherited from PAST::Node
) identifies the name of the variable as given by the high level language program.
The other attributes for PAST::Var
nodes provide the details of how the variable is defined and accessed.
PAST::Block
in which they are declared,
including any nested blocks.
If the node's isdecl
attribute is true,
then this node defines a new lexical variable within the current block.
Otherwise,
the node refers to a lexical variable already declared in the current or outer block.namespace
attribute may be used to explicitly identify the namespace of the variable,
otherwise it is assumed to be within the namespace of the PAST::Block
containing the node.PAST::Block
nodes.
The parameter's lexical name is given by the node's name
attribute.
Positional parameters are defined by the order in which they appear in the PAST::Block
node,
named parameters have values for the named
attribute.
Optional parameters are identified via the viviself
attribute (see below) indicating how the parameter is to be initialized if not supplied by the caller.
Slurpy parameters are indicated via the node's slurpy
attribute.vivibase
attribute below may be used to specify how to generate the aggregate if it doesn't already exist.name
attribute.
Nodes representing attribute variables have an optional child,
representing the object to which the attribute belongs.
If this child is not present,
the attribute is assumed to belong to the current invocant,
indicated by the special variable self
(which is implicitly passed to all subs that are flagged as a :method
.scope
is not explicitly provided in the node,
then PAST will look at the local symbol tables of any outer PAST::Block
nodes to try to determine the scope of the named variable.
If this still does not result in a scope,
then 'lexical' is assumed.viviself
attribute,
which specifies how uninitialized variables are to be initialized when first used.
The viviself
attribute may be either a string or array representing a type (e.g.,
Integer
),
or it may be a PAST representation for generating the variable's initial value.vivibase
attribute,
which specifies how the base portion of aggregate variables (see "keyed scope" above) is to be created if it doesn't already exist.
Vivibase
may either be a string or array representing the type of the newly created aggregate,
or it may be a PAST representation for generating the aggregate.isdecl
attribute.
A true value of isdecl
indicates that the variable given by this node is being created at this point within the current lexical scope.
Otherwise,
the node refers to a pre-existing variable (possibly from an outer scope).lvalue
attribute,
which indicates whether this variable is being used in an lvalue context.slurpy
attribute of parameter variables.
A true value of slurpy
indicates that the parameter variable given by this node is to be created as a slurpy parameter (consuming all remaining arguments passed in).
Named slurpy parameters are indicated by having a non-empty named
attribute and a true value of slurpy
.PAST::Op
nodes represent the operations in an abstract syntax tree.
The primary function of the node is given by its pasttype
attribute,
secondary functions may be indicated by the node's name
,
pirop
,
or other attributes as given below.
pasttype
attribute.
The pasttype
is the primary indicator of the type of operation to be performed,
the operands are typically given as the children of the node.
Defined values of pasttype
are:PAST::Op
node,
otherwise the third child is evaluated and returned as the result.
This implements the standard if-then-else logic needed by most higher level languages,
and can also be used for implementing the ternary operator.&&
operator may be implemented using an "if" node,
where the left operand is the first (condition) child,
the right operand is the second (then) child,
and the third child is left as null or uninitialized.||
) operator using this pasttype,
by setting the left operand to ||
as the first child and the right operand as the third child (leaving the second child as null).
However,
it's probably simpler to use the "unless" type as described below.if
above,
except that the second child is evaluated if the first child evaluates to false and the third child is evaluated if the first child evaluates to true.unless
type can be used to implement "short-circuit or" semantics; simply set the first child to the left operand and the second child to the right operand,
leaving the third child empty or uninitialized.
If the first child evaluates to true it is returned as the result of the operation,
otherwise the second child is evaluated and returned as the result.while
and until
above,
except the second child is evaluated before the conditional first child is evaluated for continuation of the loop.arity
(default 1).
For each iteration,
invoke the second child,
passing the elements as parameters.name
attribute,
passing the results of any child nodes as arguments.
If the node has no name
attribute,
then the first child is assumed to evaluate to a callable subroutine,
and any remaining children are used as arguments.name
on the first child,
passing the results of any child nodes as arguments.
If the node has no name
attribute,
then the first child is evaluated as a method to be called,
the second child is the invocant,
and any remaining children are arguments to the method call.pirop
attribute.
See the pirop
method below for details.
This is also the default behavior when a pirop
attribute is set and pasttype
is not.inline
attribute (a string).
See the inline
method below for details.assign
opcode,
any destination is typically generated automatically and should not be explicitly given as a child operand to the node.
The table of PIR opcodes that PAST "knows" about is given in compilers/pct/src/PAST/Compiler.pir .pasttype
is set to "inline".
The code
argument is PIR text to be inserted in the final generated code sequence.
Sequences of "%0",
"%1",
"%2",
...
"%9" in code
are replaced with the evaluated results of the first,
second,
third,
...,
tenth children nodes.
(If you need more than ten arguments to your inline PIR,
consider making it a subroutine call instead.)code
string: %r - Generate a unique PMC register for the result.
%t - Generate a unique PMC register for the result,
and initialize it with an object of type C<returns>
{{Pm: or possibly C<viviself> }}
before the execution of the inline PIR.
%u - Re-use the first child's PMC (%0) if it's a temporary
result, otherwise same as %t above.
code
, then the first child's (%0) is used as the result of this operation.PAST::Block
nodes represent lexical scopes within an abstract syntax tree, and roughly translate to individual Parrot subroutines. A PAST::Block
node nested within another PAST::Block
node acts like a nested lexical scope.
If the block has a name
attribute, that becomes the name of the resulting Parrot sub. Otherwise, a unique name is automatically generated for the block.
Each PAST::Block node can maintain its own local symbol table, see the symbol
method below.
type
. The currently understood values are 'declaration' and 'immediate'. 'Declaration' indicates that a block is simply being defined at this point, the result of the node is a reference to the block. A blocktype
of 'immediate' indicates a block that is to be immediately executed when it is evaluated in the AST, and the result of the last operation is used as the return value for the block.Value
may either be a simple string or an array of strings representing a nested namespace.symbol
method to maintain its own compile-time notion of a local symbol table. Each value of name
is given its own hash to hold information about that symbol for the block (i.e., the symbol table acts like a hash of hashes indexed by symbol name). If the symbol
method is called with named arguments, then the method sets the entries in the hash corresponding to name
in the block's symbol table. If symbol
is called with just a single name
argument, then the current hash for local symbol name
is returned.scope
entry for a symbol is typically used to provide the scope
attribute for any nested PAST::Var
nodes that do not provide an explicit scope
attribute.compiler_name
instead of being treated as standard PAST nodes. This is useful when a program may contain components of code written in other HLLs. For example, the perl6
compiler uses this feature to use PGE to compile pre-parsed Perl 6 regular expressions, rather than trying to represent the semantics of those expressions in PAST itself.PAST::Block
node having a compiler
attribute, the compiler is invoked with name
and outer
arguments so that any generated subs can have names and lexical scoping appropriate to the block (it's the responsibility of the external compiler to support this).PAST::CompUnit
nodes are used for the abstract representation of compilation units. Specific attributes and semantics are yet to be determined.
Copyright (C) 2007, The Perl Foundation.
|