README.txt - Readme file for PIRC compiler.
PIRC is a fresh implementation of the PIR language using Bison and Flex. Its main features are:
The Makefile does not work perfectly on Windows: compilation succeeds, but linking fails and has to be done manually.
cd compilers\pirc
nmake pirc
link /out:pirc.exe /nodefaultlib main.obj pircompunit.obj pirlexer.obj pirparser.obj \
     pirsymbol.obj pircompiler.obj pirmacro.obj hdocprep.obj \
     kernel32.lib msvcrt.lib ..\..\libparrot.lib
PIRC needs the shared library libparrot at run time; an easy way to arrange this is to copy libparrot.dll in the Parrot root directory to
Running PIRC is as easy as:
See 'pirc -h' for help.
The Makefile should work fine on Linux:
cd compilers/pirc && make
PIRC needs the shared library libparrot at run time; to let PIRC find it, set the path as follows:
Running is as easy as:
The new Bison/Flex based implementation of the PIR compiler is designed as a two-stage compiler:
The heredoc preprocessor takes the input as written by the PIR programmer, and flattens out all heredoc strings. An example is shown below to illustrate this concept:
The following input:
.sub main
  $S0 = <<'EOS'
This is a heredoc string
 divided
 over
 five
 lines.
EOS
.end
is transformed into:
.sub main
  $S0 = "This is a heredoc string\n divided\n over\n five\n lines.\n"
.end
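The flattening step can be sketched in a few lines of Python. This is only an illustration of the idea, not PIRC's actual preprocessor (which is generated from a Flex specification); the names `flatten_heredocs` and `escape` are hypothetical, and the sketch assumes a heredoc opener ends its line and the terminator stands alone on a line.

```python
import re

# Matches a line that ends by opening a heredoc, e.g.   $S0 = <<'EOS'
HEREDOC_START = re.compile(r"""^(?P<head>.*?)<<(?P<q>['"])(?P<tag>\w+)(?P=q)\s*$""")

def escape(text):
    """Escape characters that cannot appear raw inside a double-quoted string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def flatten_heredocs(source):
    """Replace each heredoc by an ordinary single-line string literal."""
    out = []
    lines = iter(source.splitlines())
    for line in lines:
        m = HEREDOC_START.match(line)
        if m is None:
            out.append(line)
            continue
        body = []
        for body_line in lines:            # consume lines up to the terminator
            if body_line == m.group("tag"):
                break
            body.append(body_line + "\n")
        else:
            raise SyntaxError("unterminated heredoc " + m.group("tag"))
        out.append(m.group("head") + '"' + escape("".join(body)) + '"')
    return "\n".join(out)
```

Because the inner loop shares the iterator with the outer loop, the heredoc body lines are consumed exactly once and never reach the output unchanged.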
To allow a file pulled in with .include to contain heredoc strings, the heredoc preprocessor also handles the .include directive, even though logically this is a macro function. See the discussion below for how the .include directive works.
The PIR compiler parses the output of the heredoc preprocessor. PIRC's lexer also handles macros.
The macro layer basically implements text replacements. The following directives are handled:
The .include directive takes a string argument, which is the name of a file. The contents of this file are inserted at the point where the .include directive is written. To illustrate this, consider the following example:
main.pir:
========================
.sub main
  print "hi\n"
  foo()
.end

.include "lib.pir"
========================

lib.pir:
========================
.sub foo
  print "foo\n"
.end
========================
This will result in the following output:
.sub main
  print "hi\n"
  foo()
.end

.sub foo
  print "foo\n"
.end
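The textual insertion described above can be sketched as a small recursive function. The names `expand_includes` and `read_file` are hypothetical; `read_file` stands in for whatever mechanism locates and reads a file, and the guard against a file including itself is an addition of this sketch, not something the text above specifies.

```python
import re

INCLUDE = re.compile(r'^\s*\.include\s+"([^"]+)"\s*$')

def expand_includes(name, read_file, active=()):
    """Return the lines of `name` with every .include replaced by the file it names."""
    if name in active:
        raise RecursionError("recursive .include of " + name)
    out = []
    for line in read_file(name).splitlines():
        m = INCLUDE.match(line)
        if m:
            # Splice the included file in at the point of the directive.
            out.extend(expand_includes(m.group(1), read_file, active + (name,)))
        else:
            out.append(line)
    return out
```

Passing `read_file` as a parameter keeps the sketch independent of the file system; a dictionary's `__getitem__` is enough to try it on in-memory sources.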
The .macro directive starts a macro definition. The macro preprocessor implements the expansion of macros. For instance, the following input:
.macro say(msg)
  print .msg
  print "\n"
.endm

.sub main
  .say("hi there!")
.end
will result in this output:
.sub main
  print "hi there!"
  print "\n"
.end
The .macro_const directive is similar to the .macro directive, except that a .macro_const is just a simplified .macro; it merely gives a name to some constant:
.macro_const PI 3.14

.sub main
  print "PI is approximately: "
  print .PI
  print "\n"
.end
This will result in the output:
.sub main
  print "PI is approximately: "
  print 3.14
  print "\n"
.end
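Both kinds of macro amount to text replacement, which the following Python sketch illustrates. It is a deliberately naive model, not PIRC's macro layer: the function names are hypothetical, macro invocations are assumed to stand alone on a line, arguments are split on commas without regard to string contents, and substitution does not skip over string literals.

```python
import re

CONST_DEF = re.compile(r"^\s*\.macro_const\s+(\w+)\s+(.+?)\s*$")
MACRO_DEF = re.compile(r"^\s*\.macro\s+(\w+)\s*\(([^)]*)\)\s*$")
MACRO_USE = re.compile(r"^\s*\.(\w+)\s*\((.*)\)\s*$")

def substitute(text, bindings):
    """Replace each `.name` that has a binding; other directives stay untouched."""
    return re.sub(r"\.(\w+)", lambda m: bindings.get(m.group(1), m.group(0)), text)

def expand_macros(source):
    consts, macros, out = {}, {}, []
    lines = iter(source.splitlines())
    for line in lines:
        const_def = CONST_DEF.match(line)
        macro_def = MACRO_DEF.match(line)
        use = MACRO_USE.match(line)
        if const_def:
            consts[const_def.group(1)] = const_def.group(2)
        elif macro_def:
            params = [p.strip() for p in macro_def.group(2).split(",") if p.strip()]
            body = []
            for body_line in lines:        # collect the body up to .endm
                if body_line.strip() == ".endm":
                    break
                body.append(body_line)
            macros[macro_def.group(1)] = (params, body)
        elif use and use.group(1) in macros:
            params, body = macros[use.group(1)]
            args = [a.strip() for a in use.group(2).split(",")]
            bindings = dict(consts, **dict(zip(params, args)))
            out.extend(substitute(b, bindings) for b in body)
        else:
            out.append(substitute(line, consts))
    return "\n".join(out)
```

Directives such as `.sub` and `.end` pass through unchanged because they have no entry in the bindings.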
As Parrot instructions are polymorphic, the PIR compiler is responsible for selecting the right variant of the instruction. The selection is based on the types of the operands. For instance:
set $I0, 42
will select the set_i_ic instruction: this is the set instruction, taking an integer (i) result operand and an integer constant (ic) operand. Other examples are:
$P0[1] = 42      --> set_p_kic_ic  # kic = key integer constant
$I0 = $P0["hi"]  --> set_i_p_kc    # kc = key constant from constant table
$P1 = new "Hash" --> new_p_sc      # sc = string constant
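The naming scheme in these examples can be modelled as a function from operand types to a suffix. The Python sketch below is an assumption-laden illustration: `operand_kind`, `key_kind` and `full_name` are hypothetical names, operands are passed as (text, is_key) pairs rather than parsed from PIR, and only the key forms shown in the examples above are handled.

```python
import re

def operand_kind(text):
    """Classify a register ($I0 -> i, $P0 -> p, ...) or a constant (ic, nc, sc)."""
    reg = re.fullmatch(r"\$([INSP])\d+", text)
    if reg:
        return reg.group(1).lower()
    if re.fullmatch(r"[-+]?\d+", text):
        return "ic"
    if re.fullmatch(r"[-+]?\d+\.\d+", text):
        return "nc"
    if text[:1] in ('"', "'"):
        return "sc"
    raise ValueError("unknown operand: " + text)

def key_kind(text):
    """Classify a key operand, following the examples in the text."""
    kind = operand_kind(text)
    if kind == "ic":
        return "kic"                 # key integer constant
    if kind.endswith("c"):
        return "kc"                  # other constant keys live in the constant table
    # Register keys are left out here; their naming is not covered by the text.
    raise NotImplementedError("register keys are not modelled in this sketch")

def full_name(op, operands):
    """Build the full instruction name from (text, is_key) operand pairs."""
    kinds = [key_kind(t) if is_key else operand_kind(t) for t, is_key in operands]
    return "_".join([op] + kinds)
```

For instance, `full_name("set", [("$I0", False), ("42", False)])` yields `set_i_ic`, the variant named in the text.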
Expressions that can be evaluated at compile-time are pre-evaluated, saving calculations during runtime. Some constant-folding is required, as Parrot depends on this. For instance:
add $I0, 1, 2
is not a valid Parrot instruction; there is no add_i_ic_ic instruction. Instead, this will be translated to:
set $I0, 3
which, as was explained earlier, will select the set_i_ic instruction.
The conditional branch instructions are also pre-evaluated, if possible. For instance, consider the following statement:
if 1 < 2 goto L1
It is clear at compile time that 1 is smaller than 2; so instead of evaluating this at runtime, we know for sure that the branch to label L1 will be taken, effectively replacing the above statement by:
goto L1
Likewise, if it's clear that certain instructions don't have any effect, they can be removed altogether:
if 1 > 2 goto L1  -->  nop   # nop means no opcode is emitted
$I0 = $I0 + 0     -->  nop
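The pre-evaluation rules above can be sketched over a toy instruction representation. In this Python illustration, which is not PIRC's code, registers are strings, constants are Python numbers, instructions are tuples, and `None` stands for "no opcode emitted"; the function names and the set of supported operators are assumptions of the sketch.

```python
import operator

ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def is_const(operand):
    return isinstance(operand, (int, float))

def fold_arith(op, dest, left, right):
    """Pre-evaluate `op dest, left, right` where the result is known at compile time."""
    if op == "add" and left == dest and right == 0:
        return None                               # adding zero in place has no effect
    if op in ARITH and is_const(left) and is_const(right):
        return ("set", dest, ARITH[op](left, right))
    return (op, dest, left, right)

def fold_branch(left, rel, right, label):
    """Pre-evaluate `if left rel right goto label` when both sides are constants."""
    if is_const(left) and is_const(right):
        # Always taken: unconditional jump. Never taken: emit nothing.
        return ("goto", label) if COMPARE[rel](left, right) else None
    return ("if", left, rel, right, label)
```

Anything that still involves a register is returned unchanged, so the sketch only ever removes work, never adds it.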
Another type of optimization is the selection of (slightly) more efficient variants of instructions. For instance, consider the following instruction:
$I0 = $I0 + $I1
which is actually syntactic sugar for:
add $I0, $I0, $I1
In C one would write (ignoring the fact that $I0 and $I1 are not valid C identifiers):
$I0 += $I1
which is in fact valid PIR as well. When the PIR parser sees an instruction of this form, it will automatically select the variant with 2 operands instead of the 3-operand variant. So:
add $I0, $I0, $I1 # $I0 is an out operand
will be optimized, as if you had written:
add $I0, $I1 # $I0 is an in/out operand
The PIR parser can do even more improvements, if it sees opportunity to do so. Consider the following statement:
$I0 = $I0 + 1
or, in Parrot assembly syntax:
add $I0, $I0, 1
Again, in C one would write (again ignoring the valid identifier issue):
$I0++
that is, incrementing the given identifier. Parrot has inc and dec instructions built in as well, so that the above statement $I0 = $I0 + 1 can be optimized to:
inc $I0
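The two rewrites described in this section, choosing the in/out form and then inc, fit in one small function. The Python sketch below uses the same toy tuple representation as an illustration only; `shorten` is a hypothetical name, and treating sub the same way as add (including the rewrite to dec) is an assumption that goes slightly beyond the examples in the text.

```python
def shorten(op, dest, left, right):
    """Return a cheaper equivalent of `op dest, left, right` when one exists."""
    if op in ("add", "sub") and dest == left:
        if right == 1:
            # add $I0, $I0, 1 -> inc $I0 ; sub $I0, $I0, 1 -> dec $I0
            return ("inc" if op == "add" else "dec", dest)
        # add $I0, $I0, $I1 -> add $I0, $I1  (dest becomes an in/out operand)
        return (op, dest, right)
    return (op, dest, left, right)
```

The rewrite only fires when the destination is also the left operand; `add $I0, $I1, $I0` is left alone here, although commutativity would allow it for add.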
The PIR compiler implements a vanilla register allocator. This means that each declared .param symbol, and each PIR register ($Px, $Sx, $Ix, $Nx), is assigned a unique PASM register that is associated with the original symbol or PIR register throughout the subroutine.
Any further optimizations on register usage can be implemented by writing a register allocator that takes this initial register allocation as input and generates a more optimized register usage. Research and benchmarking are needed to decide whether this yields more efficient bytecode. In the end it is a trade-off between compile-time overhead (register allocation) and runtime memory overhead (more register space needed per sub).
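A vanilla allocation of this kind needs little more than one counter per register type and a lookup table per sub. The Python sketch below illustrates the idea; the class and method names are hypothetical and do not mirror PIRC's symbol management code.

```python
class VanillaAllocator:
    """Gives every symbol or PIR register in one sub its own PASM register."""

    def __init__(self):
        self.next_free = {"I": 0, "N": 0, "S": 0, "P": 0}
        self.assigned = {}      # symbol or PIR register name -> PASM register

    def lookup(self, name, kind):
        """Return the PASM register for `name`, allocating one on first use."""
        if name not in self.assigned:
            self.assigned[name] = f"{kind}{self.next_free[kind]}"
            self.next_free[kind] += 1
        return self.assigned[name]
```

One allocator instance would be created per subroutine, so numbering restarts at 0 in each sub. No register is ever reused, which is exactly the runtime memory cost weighed above against a smarter allocator.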
The implementation of the vanilla register allocator is done in the PIR symbol management module (
The PIR parser is complete, but should be tested intensively. The back-end creates a data structure representing the input. Currently, only (almost working) PASM output is generated, but eventually a Parrot Byte Code (PBC) file should be generated. In order to do this, we need a proper API to generate the appropriate data structures (such as Parrot PackFile and friends).
The directory compilers/pirc has a number of subdirectories:
The file pir.l, from which the lexer is generated, cannot be processed by Cygwin's default version of Flex. A newer version is needed to generate a reentrant lexer; it can be downloaded from the link below.
$ ./configure
$ make
Then make sure to overwrite the supplied flex binary.
Having a look at this implementation would be greatly appreciated, and any resulting feedback even more :-)
Eventually, IMCC needs to be either fixed rigorously or rewritten altogether. PIRC is an attempt to do the latter. The following things need to be considered when replacing IMCC with PIRC:
PIR subs are stored as PMC constants in the constant table, but it is not clear how exactly this is to be done.
There must be a proper bytecode API for PIRC to use.
:immediate and related flags
Flags such as :immediate must be implemented; a sub that is marked with the :immediate flag must be run immediately after compilation.
At this moment, the following things are unclear to me; if anybody can answer these, that'd be helpful:
The following are some ideas for the near future:
languages/PIR for a PGE-based implementation.
compilers/pirc/src, a hand-written, recursive-descent PIR parser.
compilers/imcc, the current standard PIR implementation.
docs/imcc/syntax.pod for a description of PIR syntax.
docs/imcc/ for more documentation about the PIR language.
docs/pdds/pdd19_pir.pod for the PIR design document.