parrotcode: Parrot Compiler Tools | |
Contents | Documentation |
docs/pdds/draft/pdd29_compiler_tools.pod - Parrot Compiler Tools
$Revision$
Will "Coke" Coleda Klaas-Jan Stol
In this document, when we speak of a compiler, we mean PCT-based compilers.
A High-Level Language. Examples are: Perl, Ruby, Python, Lua, Tcl, etc.
This PDD specifies the Parrot Compiler Tools (PCT).
Creating a PCT-based compiler can be done as follows:
.sub 'onload' :anon :load :init load_bytecode 'PCT.pbc' $P0 = get_hll_global ['PCT'], 'HLLCompiler' $P1 = $P0.'new'() $P1.'language'('Foo') $P1.'parsegrammar'('Foo::Grammar') $P1.'parseactions'('Foo::Grammar::Actions') .end .sub 'main' :main .param pmc args $P0 = compreg 'Foo' $P1 = $P0.'command_line'(args) .end
{{ this is the most important part; is this enough? }}
The Parrot distribution contains a Perl script to generate a compiler stub, containing all necessary files. This generated compiler will compile out of the box. It is highly suggested to use this script to get started with the PCT. The script is located in tools/dev/mk_language_shell.pl
.
{{ Not sure whether the mk_language_shell.pl script should be mentioned long term. In a sense, this script can also be considered part of the parrot compiler "tools", as it is used to create a compiler. }}
grammar Foo is PCT::Grammar; rule TOP { <statement>* {*} } rule statement { <ident> '=' <expression> {*} } rule expression is optable { ... } proto infix:<+> is precedence('1') is pirop('n_add') { ... } rule 'term:' is tighter(infix:<+>) is parsed(&term) { ... } rule term { | <ident> {*} #= ident | <number> {*} #= number }
{{ Is this a good idea? }}
class Foo::Grammar::Actions; method TOP($/) { my $past := PAST::Block.new( :blocktype('declaration'), :node($/) ); for $<statement> { $past.push( $( $_ ) ); } make $past; } method statement($/) { make PAST::Op.new( $( $<ident> ), $( $<expression> ), :pasttype('bind'), :node($/) ); } method expression($/, $key) { ... } method term($/, $key) { make $( $/{$key} ); }
Running the compiler is then done as follows:
$ parrot foo.pbc [--target=[parse|past|post|pir]] <file>
{{ other options? Maybe --target=pbc in the future, once PBC can be generated? }}
The Parrot Compiler Tools are specially designed to easily create a compiler targeting the Parrot Virtual Machine. The tools themselves run on Parrot, which implies that no other external programs are needed.
The PCT is a set of libraries and programs, to:
{{ Maybe just say it's used to create parrot-targeting compilers, not list these as =items }}
The PCT is made up of the following libraries and programs:
Although strictly speaking it is not part of the PCT, the Not Quite Perl (6) (NQP) language is typically used in all PCT-based compilers. NQP is a subset of the Perl 6 language, and is a high-level language as an alternative for PIR.
A PCT-based compiler has by default four compilation phases, or transformations. Phases can be removed and added through the API of the HLLCompiler
class. These are:
The source is read, parsed and stored in a parse tree.
The parse tree is converted into a Parrot Abstract Syntax Tree.
The PAST is converted into a Parrot Opcode Syntax Tree.
The POST is converted into executable Parrot Intermediate Representation.
The first stage of a PCT-based compiler is done by the parser
. The parser is defined as a set of Perl 6 Rules, which is processed by the Perl 6 Rules compiler. This results in a generated PIR file that implements the parser.
{{ Doesn't this make the Perl 6 rules compiler part of the PCT? }}
During the first stage, the source (input string) is parsed, resulting in a parse tree
.
The second stage converts the parse tree into a Parrot Abstract Syntax Tree (PAST). PAST is a data structure consisting of PAST nodes, each of which represents a common HLL construct. While all languages differ in syntax, many constructs in different HLLs map to the same semantics. This second transformation is executed during the parse stage. The transformations of the parse tree nodes into PAST nodes is done by so-called parse actions, which are methods of a class that is specified through the parseactions
attribute of the HLLCompiler. Such classes are implemented in NQP.
{{ How do we say that this is not obligatory; you could also use PIR, and in the future maybe other languages. }}
The third transformation converts the PAST into a Parrot Opcode Syntax Tree (POST). PAST nodes represent HLL constructs, which are transformed into a set of low-level POST nodes. A POST node is a low-level node, representing a single instruction, label, or a subroutine. While a PAST is very close to a HLL program, a POST is much closer to PIR code.
The last transformation generates PIR code from the POST.
The generated PIR is then fed into the Parrot executable, and processed into Parrot Byte Code (PBC) by the PIR compiler.
The Parrot Grammar Engine (PGE) is a component that executes regular expressions. Besides classic regular expressions, it also understands Perl 6 Rules. Such rules are special regular expressions to define a grammar.
The start symbol in a grammar is named TOP
; this is the top-level rule that is executed when the parser is invoked.
{{ insert stuff about using an operator prec. table here }}
The PCT includes a set of PAST classes. PAST classes represent common language constructs, such as a while statement
. These are described extensively in "pdds/pdd26_ast.pod" in docs.
POST::Node is the base class for all other POST classes.
The class PCT::Grammar
is a built-in grammar class that can be used as a parent class for a custom grammar. This class defines a number of rules and tokens that are inherited by child classes. Note that the concept of class
and grammar
are equivalent.
The following rules are predefined:
{{ is this necessary, or just a reference to the file? }}
All PCT-based compilers use a HLLCompiler object as a compiler driver. It acts as a facade for the compiler. This object invokes the different compiler phases.
{{ TODO: complete this }}
$S0
as a commandline prompt on the compiler in $P0
. The prompt is the text that is shown on the commandline before a command is entered when the compiler is started in interactive mode.
$S0
as a commandline banner on the compiler in $P0
. The banner is the first text that is shown when the compiler is started in interactive mode. This can be used for a copyright notice or other information.None.
docs/pdd26_ast.pod
|