README.txt - Readme file for pirc/new compiler, a fresh implementation of the PIR language using Bison and Flex.
kjs
pirc/new is a fresh implementation of the PIR language. Maintaining the current default implementation (IMCC) is a bit of a pain, and it contains a lot of "XXX" and "TODO" and other kludge alerts. Eventually, this should be fixed.
PIRC is not finished yet. A lot of work is needed on the back-end before it can generate Parrot Byte Code files (PBC).
Note that pirc/new refers to the Lex/Yacc-based implementation, while 'pirc' refers to the hand-written recursive-descent implementation, which is found in the pirc/src directory.
The current set-up is a three-phase compiler:
The heredoc pre-processor takes the input, and converts all heredoc strings into normal strings. So, the following:
.sub main
foo(<<'HI', <<'BYE')
hi there!
HI
bye for now!
BYE
.end
is converted into:
.sub main
foo(" hi there!\n", " bye for now!\n\n")
.end
Currently there is a small issue with the 2nd and later heredoc arguments; they seem to get one newline character too many.
The heredoc pre-processor needs to know about POD comments, because the POD comment may contain a heredoc string, which should not be processed, as it is a comment. For that purpose, all comments (POD and line comments) are stripped in this phase.
The Heredoc pre-processor is located in compilers/pirc/heredoc.
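To make the transformation above concrete, here is a minimal sketch in Python. It is not the code in compilers/pirc/heredoc; the function names (flatten_heredocs, escape) are made up for illustration, and it assumes only single-quoted markers such as <<'HI', a terminator that stands alone on its line, and no heredoc markers inside ordinary strings. It does not strip comments.

```python
import re

# Matches a single-quoted heredoc marker such as <<'HI' (assumed subset of the syntax).
HEREDOC_MARKER = re.compile(r"<<'([^']+)'")


def escape(text):
    """Escape a heredoc body so it fits inside a double-quoted string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def flatten_heredocs(source):
    """Replace every heredoc marker with an ordinary string literal."""
    lines = source.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        bodies = []
        # Markers on one line consume their bodies in left-to-right order.
        for delim in HEREDOC_MARKER.findall(line):
            body = []
            # Assumption: the terminator is the delimiter alone on a line.
            while i < len(lines) and lines[i].strip() != delim:
                body.append(lines[i])
                i += 1
            i += 1  # skip the terminator line itself
            bodies.append("".join(text + "\n" for text in body))
        pending = iter(bodies)
        out.append(HEREDOC_MARKER.sub(
            lambda m: '"' + escape(next(pending)) + '"', line))
    return "\n".join(out)
```

Because each marker collects its body before the next marker is looked at, several heredocs on one line are handled in order, and each body ends with exactly one newline.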
The macro pre-processor takes the output of the heredoc pre-processor, and handles all macro definitions and expansions. The .include directive is handled here too. The output of the macro pre-processor is (in case of uses of the .include directive) one long big file with "pure" PIR code.
The macro pre-processor is located in compilers/pirc/macro.
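The expansion step can be sketched as follows, again in Python and again only as an illustration, not as the code in compilers/pirc/macro. The names expand_macros and split_args are hypothetical, macro definitions are passed in as a ready-made table instead of being parsed from .macro blocks, and only invocations that occupy a whole line are recognised. The .include directive is not covered.

```python
import re


def split_args(text):
    """Split an argument list on commas that are outside string literals."""
    args, current = [], []
    in_string = escaped = False
    for ch in text:
        if in_string:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == ",":
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    last = "".join(current).strip()
    if last or args:
        args.append(last)
    return args


def expand_macros(source, macros):
    """Expand lines of the form `.name(args)`.

    `macros` maps a macro name to a (parameter_names, body_text) pair.
    """
    out = []
    for line in source.split("\n"):
        m = re.fullmatch(r"\s*\.(\w+)\((.*)\)\s*", line)
        if not (m and m.group(1) in macros):
            out.append(line)
            continue
        params, body = macros[m.group(1)]
        args = split_args(m.group(2))
        if len(args) != len(params):
            raise ValueError("wrong number of arguments for ." + m.group(1))
        if params:
            binding = dict(zip(params, args))
            # Replace every `.param` in one pass so that substituted
            # arguments are never rescanned.
            pattern = re.compile(
                r"\.(" + "|".join(map(re.escape, params)) + r")\b")
            body = pattern.sub(lambda p: binding[p.group(1)], body)
        out.append(body)
    return "\n".join(out)
```

With a table such as {"foo": (["a"], "print .a")}, a line .foo("Hello world!\n") becomes print "Hello world!\n". Since the heredoc pass runs first, the argument is already an ordinary string by the time this step sees it.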
The third pass is done by the PIR parser, which takes the "pure" PIR code from the macro pre-processor. Currently, it's only a parser, but a future extension could be to generate PASM code from the PIR input. This way, it's easy to see what ops are actually executed when running the PIR file.
The PIR parser is located in compilers/pirc/new.
The new implementation also has some unique features with respect to IMCC:
In pirc/new (a new name is yet to be defined) it is allowed to use multiple heredocs as function arguments, like so:
...
foo(<<'HI', <<'BYE')
...
HI
...
BYE
As the heredoc pre-processor handles the input before the macro pre-processor, it is now possible to expand macros specifying heredoc arguments, like so:
.macro foo(a)
print .a
.end
.sub main
.foo(<<'HI')
Hello world!
HI
.end
The generated lexer and parser are fully re-entrant. (This still needs to be tested, though.)
The code is provided with comments, so you can actually understand what it does.
Although IMCC does define the option '-E', it is not really working correctly. pirc has two pre-processing options: 1) running the heredoc parser only, 2) running both the heredoc and macro processors. The output of option 2 is the code that will be given to the PIR compiler.
This is a nice opportunity to clean up the grammar of the PIR language. Hacking on IMCC's grammar is possible, but not for the faint of heart.
Currently the different compilers/pre-processors are located in different directories. The different pre-processors are invoked from the main driver in pirc.c. The latter assumes all three processors are compiled, as the following executables:
heredoc pre-processor: hdocprep
macro pre-processor: macroparser
Running a file through the whole PIR compiler is then done as follows:
$ ./pirc test.pir
When you want to run the heredoc pre-processor only, do this:
$ ./pirc -H test.pir
When you want to pre-process the file only (heredoc + macro parsing), do this:
$ ./pirc -E test.pir
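The relation between these options and the three phases can be summarised in a small Python sketch. It only models which stages run for which option; it is not how pirc.c is written. The function name stages_for is hypothetical, and since this README does not give the executable name of the PIR parser, PIR_PARSER is a placeholder, not a real file name.

```python
# Executable names taken from this README.
HEREDOC_PREP = "hdocprep"
MACRO_PREP = "macroparser"
# Placeholder: the README does not name the PIR parser executable.
PIR_PARSER = "<pir-parser>"


def stages_for(option=None):
    """Return the ordered list of stages that run for a given option."""
    if option is None:
        # No option: heredoc pass, macro pass, then the PIR parser.
        return [HEREDOC_PREP, MACRO_PREP, PIR_PARSER]
    if option == "-H":
        # Heredoc pre-processing only.
        return [HEREDOC_PREP]
    if option == "-E":
        # Full pre-processing (heredoc + macro), but no parsing.
        return [HEREDOC_PREP, MACRO_PREP]
    raise ValueError("unknown option: " + option)
```

Each stage consumes the output of the one before it, so the order of the list is significant.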
The file pir.l from which the lexer is generated is not processable by Cygwin's default version of Flex. In order to make a reentrant lexer, a newer version is needed, which can be downloaded from the link below.
http://sourceforge.net/project/downloading.php?groupname=flex&filename=flex-2.5.33.tar.gz&use_mirror=belnet
Just do:
$ ./configure
$ make
Then make sure the newly built flex binary replaces the one supplied with Cygwin.
Having a look at this implementation would be greatly appreciated, and any resulting feedback even more :-)
Eventually, IMCC needs to be either fixed rigorously or rewritten altogether. PIRC is an attempt to do the latter. The following things need to be considered when replacing IMCC with PIRC:
PIRC needs a function to decide whether an identifier is an instruction. IMCC uses a function is_op that does this. For this to work, libparrot must be linked in, and I'm having trouble doing this.
IMCC has a register allocator, but I doubt whether it can be re-used by PIRC. The whole back-end of IMCC probably needs to be redesigned.
There must be a proper bytecode API for PIRC to use.
See also:
languages/PIR for a PGE based implementation.
compilers/pirc, a hand-written, recursive-descent PIR parser.
compilers/imcc, the current standard PIR implementation.
docs/imcc/syntax.pod for a description of PIR syntax.
docs/imcc/ for more documentation about the PIR language.