parrotcode: Frequently Asked Questions | |
Contents | IMCC |
IMCC and Parrot Programming for Compiler Developers - Frequently Asked Questions
Wrong FAQ, start with the Parrot FAQ first. Then come back here because this is where the fun is.
The Parrot FAQ : http://www.parrotcode.org/faq/
IMC stands for Intermediate Code; IMCC stands for Intermediate Code Compiler. You will also see the term PIR which is for Parrot Intermediate Representation and means the same thing as IMC.
It is an intermediate language that compiles either directly to Parrot Byte code, or translates to Parrot Assembly language. It is the preferred target language for compilers for the Parrot Virtual Machine. PIR is halfway between a High Level Language (HLL) and Parrot Assembly (PASM).
IMCC was a toy compiler written by Melvin Smith as a little 2-week experiment for another toy language, Cola. It was not originally a part of Parrot, and understandably wasn't designed for public consumption. Parrot's early alpha versions (0.0.6 and earlier) included only the raw Parrot assembler that compiled Parrot Assembly language. This was considered the reference assembler. The Cola compiler, on the other hand, targeted its own little back end compiler that included a register allocator, basic block tracking and medium level expression parsing. The backend compiler was eventually named IMCC and benefited from contributions from Angel Faus, Leo Toetsch, Steve Fink and Sean O'Rourke. The first version of Perl6 written by Sean used IMCC as its backend and that's how it currently exists.
Leopold Toetsch added, among many other things, the ability for IMCC to compile PASM by proxying any instructions that were not valid IMCC through to be assembled as PASM. This was a great improvement. As Parrot's calling convention changed to a continuation style (PCC), and generally became more complex, the PASM instructions required to call or declare subroutines became just as complex. IMCC abstracted some of the convention and eventually the core team stopped using the old reference assembler altogether. Leo integrated IMCC into Parrot and now IMCC is the front-end for the Parrot VM.
Static languages, such as Java, can run on VMs that are dedicated to execution of pre-compiled byte code with no problems. Languages such as Perl, Ruby and Python are not so static. They have support for runtime evaluation and compilation and their parsers are always available. These languages run on their own "dynamic" interpreters.
Since Parrot is specialized to be a dynamic VM, it must be able to compile code on the fly. For this reason, IMCC is written in C and integrated into the VM. IMCC is fast since it does very little type checking, and since most of Parrot's ops are polymorphic, IMCC punts most of the type checking and method dispatch to runtime. This allows extremely fast compile times, which is what scripters need.
PASM is an assembly language, raw and low-level. PASM does exactly what you say, and each PASM instruction represents a single VM opcode. Assembly language can be tough to debug, simply due to the amount of instructions that a high-level compiler generates for a given construct. Assembly language typically has no concept of basic blocks, namespaces, variable tracking, etc. You must track your register usage and take care of saving/restoring values in cases where you run out of registers. This is called spilling.
IMC is medium level and a bit more friendly to write or debug. IMCC also has a builtin register allocator and spiller. IMC has the concept of a "subroutine" unit, complete with local variables and high-level sub call syntax. IMCC also allows unlimited symbolic registers. It will take care of assigning the appropriate register to your variables and will usually find the most efficient mapping so as to use as few registers as possible for a given piece of code. If you use more registers than are currently available, IMCC will generate instructions to save/restore (spill) the registers for you. This is a significant piece of every compiler.
While it is possible to write more efficient code by hand directly in PASM, it is rare. IMC is still very close to PASM as far as granularity. It is also common for IMCC to generate instructions that use less registers than handwritten PASM. This is good for cache performance.
Several reasons. IMC is so much easier to read, understand and debug. When passing snippets back and forth on the Parrot internals list, IMC is preferred since the code is much shorter than the equivalent PASM. In some cases it is necessary to debug the PASM code as bugs in IMCC are found.
Hand writing and debugging of code aside, most IMC code will be mostly compiler generated. In this respect, the most important technical reason to use IMC is the amount of abstraction it provides. IMC now completely hides the Parrot calling conventions and allows different call conventions to be selected via .pragma without changes to the high-level code emitter. This allows Parrot to change somewhat without impacting existing compilers. The workload is balanced between the IMCC team and the compiler authors. The term "modular" springs to mind.
Since development on the old assembler has stopped, IMCC will be the best way to compile bytecode classes complete with metadata and externally linkable symbols. It will still be possible to construct classes on the fly with PASM, but IMC's higher level directives allow it to do compile time construction of certain things and pack them into the bytecode in a way that does not have an equivalent set of Parrot instructions. The PASM assembler may or may not ever catch up with these features.
Not yet. IMCC is currently tightly integrated to the Parrot bytecode format. One goal is to rework IMCC's modularity to make it easy to run separately, but this is not a top priority since IMCC currently only targets Parrot. Eventually IMCC will contain a config option to build without linking the Parrot VM, but IMCC must be able to do lookups of opcodes so it will require some sort of static opcode metadata.
The basic block of execution of an IMC program is the subroutine. Subs can be simple, with no arguments or returns. Line comments are allowed in IMC using #.
# Hello world
.sub _main
print "Hello world.\n"
.end
Parrot uses the filename extension to detect whether the file is an PIR file (.pir), a Parrot Assembly file (.pasm) or a pre-compiled bytecode file (.pbc).
parrot hello.pir
Use the -o option for Parrot. You can provide an output filename, or the - character which indicates standard output. If the filename has a .pbc extension, IMCC will compile the module and assemble it to bytecode.
Examples:
parrot -o hello.pasm hello.pir
parrot -o hello.pbc hello.pir
parrot -o - hello.pir
No, and it shouldn't. IMC is an intermediate language for compiling high level languages. Interpolation (print "$count items") is a high level concept and the specifics are unique to each language. Perl6 already does interpolation without special support from IMCC.
IMC has 2 classes of variables, symbolic registers and named variables. Both are mapped to real registers, but there are a few minor differences. Named variables must be declared. They may be global or local, and may be qualified by a namespace. Symbolic registers, on the other hand, do not need declaration, but their scope never extends outside of a subroutine unit. Symbolic registers basically give compiler front ends an easy way to generate code from their parse trees or abstract syntax tree (AST). To generate expressions compilers have to create temporaries.
Symbolic registers have a $ sign for the first character, have a single letter representing the register type [S(tring), N(umber), I(nteger) or P(MC)] for the second character, and one or more digits for the rest. Although Parrot has only 32 real registers of each type, IMCC can handle an arbitrarily large number of symbolic registers of a given type.
Example:
$S1 = "hiya"
$S2 = $S1 . "mel"
$I1 = 1 + 2
$I2 = $I1 * 3
This example uses symbolic STRING and INTVAL registers as temporaries. This is the typical sort of code that compilers generate from the syntax tree.
Named variables are either local, global or namespace qualified. Currently IMCC only supports locals transparently. However, globals are supported with explicit syntax. The way to declare locals in a subroutine is with the .local directive. The .local directive also requires a type (int, num, string or a classname such as PerlArray).
Example:
.sub _main
.local int i
.local num n
i = 7
n = 5.003
.end
You can't yet. IMCC still lacks a few features and this is one of those features. You can explicitly create global variables at runtime, however, but currently it only works for PMC types, like so:
.sub _main
.local Integer i
.local Integer j
i = new Integer
i = 123
# Create the global
global "i" = i
# Refer to the global
j = global "i"
.end
Happy Hacking.
|