docs/intro.pod - The Parrot Primer
This document provides a gentle introduction to the Parrot virtual machine for anyone considering writing code for Parrot by hand, writing a compiler that targets Parrot, getting involved with Parrot development or simply wondering what on earth Parrot is.
Parrot is a virtual machine. To understand what a virtual machine is, consider what happens when you write a program in a language such as Perl, then run it with the applicable interpreter (in the case of Perl, the perl executable). First, the program you have written in a high level language is turned into simple instructions, for example fetch the value of the variable named x, add 2 to this value, store this value in the variable named y, etc. A single line of code in a high level language may be converted into tens of these simple instructions. This stage is called compilation.
The second stage involves executing these simple instructions. Some languages (for example, C) are often compiled to instructions that are understood by the CPU and as such can be executed by the hardware. Other languages, such as Perl, Python and Java, are usually compiled to CPU-independent instructions. A virtual machine (sometimes known as an interpreter) is required to execute those instructions.
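As a rough illustration of these two stages, the Python sketch below "compiles" the statement y = x + 2 into the three simple instructions mentioned above and then executes them. The instruction names (fetch, add_const, store) and both helper functions are made up for this example; they are not Parrot opcodes.

```python
# Toy illustration of compile-then-execute; not how Parrot itself works.

def compile_assignment(target, source, constant):
    """Stage 1: turn `target = source + constant` into simple instructions."""
    return [
        ("fetch", source),         # fetch the value of the source variable
        ("add_const", constant),   # add the constant to that value
        ("store", target),         # store the result in the target variable
    ]

def execute(instructions, variables):
    """Stage 2: run the simple instructions one at a time."""
    accumulator = 0
    for op, arg in instructions:
        if op == "fetch":
            accumulator = variables[arg]
        elif op == "add_const":
            accumulator += arg
        elif op == "store":
            variables[arg] = accumulator
        else:
            raise ValueError(f"unknown instruction: {op}")
    return variables

program = compile_assignment("y", "x", 2)
result = execute(program, {"x": 40})
print(result["y"])  # prints 42
```

A virtual machine plays the role of execute here: it understands the instruction list regardless of the hardware underneath.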
While the central role of a virtual machine is to efficiently execute instructions, it also performs a number of other functions. One of these is to abstract away the details of the hardware and operating system that a program is running on. Once a program has been compiled to run on a virtual machine, it will run on any platform that the VM has been implemented on. VMs may also provide security by allowing more fine-grained limitations to be placed on a program, memory management functionality and support for high level language features (such as objects, data structures, types, subroutines, etc).
Parrot is designed with the needs of dynamically typed languages (such as Perl and Python) in mind, and should be able to run programs written in these languages more efficiently than VMs developed with static languages in mind (JVM, .NET). Parrot is also designed to provide interoperability between languages that compile to it. In theory, you will be able to write a class in Perl, subclass it in Python and then instantiate and use that subclass in a Tcl program.
Historically, Parrot started out as the runtime for Perl 6. Unlike in Perl 5, the Perl 6 compiler and runtime (VM) are to be much more clearly separated. The name Parrot comes from a 2001 April Fool's joke in which Perl and Python were to collaborate on the next version of their languages. The name reflects the intention to build a VM to run not just Perl 6, but also many other languages.
Parrot can currently accept instructions to execute in four forms. PIR (Parrot Intermediate Representation) is designed to be written by people and generated by compilers. It hides away some low-level details, such as the way parameters are passed to functions. PASM (Parrot Assembly) is a level below PIR - it is still human readable/writable and can be generated by a compiler, but the author has to take care of details such as calling conventions and register allocation. PAST (Parrot Abstract Syntax Tree) enables Parrot to accept an abstract syntax tree style input - useful for those writing compilers.
All of the above forms of input are automatically converted inside Parrot to PBC (Parrot Bytecode). This is much like machine code, but understood by the Parrot interpreter. It is not intended to be human-readable or human-writable, but unlike the other forms execution can start immediately, without the need for an assembly phase. Parrot bytecode is platform independent.
The Parrot instruction set includes arithmetic and logical operators, compare and branch/jump (for implementing loops, if...then constructs, etc), finding and storing global and lexical variables, working with classes and objects, calling subroutines and methods along with their parameters, I/O, threads and more.
The Parrot VM is register based. This means that, like a hardware CPU, it has a number of fast-access units of storage called registers. There are 4 types of register in Parrot: integers (I), numbers (N), strings (S) and PMCs (P). The registers of each type are numbered from zero and named I0, I1, ..., N0, N1, and so on. Integer registers are the same size as a word on the machine Parrot is running on, and number registers also map to a native floating point type. The number of registers needed is determined per subroutine at compile time.
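The register naming scheme can be pictured with a small Python model. This is only a sketch: the class name RegisterFile, the count parameter and the type checks are assumptions made for the example, not a description of Parrot's internals.

```python
class RegisterFile:
    """Four banks of registers, addressed by names such as I0, N1, S2, P3."""

    # Python type accepted by each bank; P registers hold any object (a PMC).
    BANK_TYPES = {"I": int, "N": float, "S": str, "P": object}

    def __init__(self, count):
        # `count` registers per bank; the value is chosen by the caller.
        self.banks = {kind: [None] * count for kind in self.BANK_TYPES}

    def _locate(self, name):
        # Split a name such as "S12" into its bank letter and index.
        kind, index = name[0], int(name[1:])
        if kind not in self.banks:
            raise KeyError(f"no such register bank: {kind}")
        return kind, index

    def set(self, name, value):
        kind, index = self._locate(name)
        if not isinstance(value, self.BANK_TYPES[kind]):
            raise TypeError(f"{name} cannot hold {type(value).__name__}")
        self.banks[kind][index] = value

    def get(self, name):
        kind, index = self._locate(name)
        return self.banks[kind][index]

regs = RegisterFile(count=4)
regs.set("I0", 10)
regs.set("N0", 2.5)
regs.set("S0", "Hello world!\n")
regs.set("P0", ["any", "object"])
print(regs.get("I0"), regs.get("N0"))  # prints 10 2.5
```

Trying regs.set("I1", "text") raises a TypeError in this model, which mirrors the idea that each bank holds exactly one kind of value.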
PMC stands for Parrot Magic Cookie. PMCs represent any complex data structure or type, including aggregate data types (arrays, hash tables, etc). A PMC can implement its own behavior for arithmetic, logical and string operations performed on it, allowing for language-specific behavior to be introduced. PMCs can be built in to the Parrot executable or dynamically loaded when they are needed.
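The idea that a PMC supplies its own behavior for an operation can be sketched in Python. The two classes below are hypothetical stand-ins, not real Parrot PMC types: each holds a string but answers an "add" request differently, the way two languages might disagree about what "3" + 4 means.

```python
class NumifyingString:
    """Hypothetical PMC: treats its string as a number when added."""

    def __init__(self, text):
        self.text = text

    def add(self, other):
        return int(self.text) + other

class StrictString:
    """Hypothetical PMC: refuses to add a number to a string."""

    def __init__(self, text):
        self.text = text

    def add(self, other):
        raise TypeError("cannot add a number to a string")

def add_op(pmc, value):
    """Stand-in for a VM 'add' instruction: it defers to the PMC itself."""
    return pmc.add(value)

print(add_op(NumifyingString("3"), 4))  # prints 7
try:
    add_op(StrictString("3"), 4)
except TypeError as err:
    print("error:", err)  # prints error: cannot add a number to a string
```

The add_op function never inspects the type of its operand, so a new language can bring its own semantics by supplying a new class.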
Parrot provides garbage collection, meaning that Parrot programs do not need to free memory explicitly; memory is freed once it is no longer in use (that is, no longer referenced), the next time the garbage collector runs.
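What "no longer referenced" means can be shown with a minimal mark-and-sweep sketch in Python. It illustrates reachability only and makes no claim about which collection algorithm Parrot uses; the names Obj, heap and roots are invented for the example.

```python
class Obj:
    """A heap object that may refer to other heap objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False

def collect(heap, roots):
    """Return the objects that survive: those reachable from the roots."""
    for obj in heap:
        obj.marked = False
    # Mark phase: follow references outward from the roots.
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if not obj.marked:
            obj.marked = True
            pending.extend(obj.refs)
    # Sweep phase: anything left unmarked is unreachable and is dropped.
    return [obj for obj in heap if obj.marked]

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)   # a refers to b
c.refs.append(c)   # c refers only to itself
heap = [a, b, c]
roots = [a]        # for example, a value still held in a register

heap = collect(heap, roots)
print([obj.name for obj in heap])  # prints ['a', 'b']
```

Object c is dropped even though it refers to itself, because nothing reachable from the roots refers to it.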
Periodically, numbered releases will appear on CPAN. At this stage of the project, an awful lot is changing between releases. You can get a copy of the latest Parrot from the SVN repository. This is done as follows:
svn co https://svn.perl.org/parrot/trunk parrot
You can find more instructions at: http://www.parrotcode.org/source.html
The first step to building Parrot is to run the Configure.pl program, which looks at your platform and decides how Parrot should be built. This is done by typing:
perl Configure.pl
Once this is complete, run the make program (sometimes called nmake or dmake). This should complete, giving you a working Parrot executable.
Please report any problems that you encounter while building Parrot so the developers can fix them. You can do this by sending a message to bugs-parrot@bugs6.perl.org containing a description of your problem. Please include the myconfig file that was generated as part of the build process and any errors that you observed.
Parrot has an extensive regression test suite. This can be run by typing:
make test
Substitute the name of the make program on your platform for make. The output will look something like this:
C:\Perl\bin\perl.exe t\harness --gc-debug --running-make-test
t\library\*.t t\op\*.t t\pmc\*.t t\run\*.t t\native_pbc\*.t
imcc\t\*\*.t t\dynpmc\*.t t\p6rules\*.t t\src\*.t t\perl\*.t
t\library\dumper...............ok
t\library\getopt_long..........ok
...
All tests successful, 4 test and 71 subtests skipped.
Files=163, Tests=2719, 192 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU)
It is possible that a number of tests may fail. If this is a small number, then it is probably little to worry about, especially if you have the latest Parrot sources from the SVN repository. However, please do not let this discourage you from reporting test failures, using the same method as described for reporting build problems.
Create a file called hello.pir that contains the following code.
.sub _main
print "Hello world!\n"
end
.end
Then run it by typing:
parrot hello.pir
As expected, this will display the text Hello world! on the console, followed by a new line (due to the \n).
Let's take the program apart. The first line, .sub _main, states that the instructions that follow make up a subroutine named _main, until a .end is encountered. The second line contains the print instruction. In this case, we are calling the variant of the instruction that accepts a constant string. The assembler takes care of deciding which variant of the instruction to use for us. The third line contains the end instruction, which causes the interpreter to terminate.
We can modify hello.pir to first store the string Hello world!\n in a register and then use that register with the print instruction.
.sub _main
set S0, "Hello world!\n"
print S0
end
.end
Here we have stated exactly which register to use. However, by replacing S0 with $S0 we can delegate the choice of which register to use to Parrot. It is also possible to use an = notation instead of writing the set instruction.
.sub _main
$S0 = "Hello world!\n"
print $S0
end
.end
To make PIR even more readable, named registers can be used. These are later mapped to real numbered registers.
.sub _main
.local string hello
hello = "Hello world!\n"
print hello
end
.end
The .local directive indicates that the named register is only needed inside the current compilation unit (that is, between .sub and .end). Following .local is a type. This can be int (for I registers), float (for N registers), string (for S registers), pmc (for P registers) or the name of a PMC type.
PIR can be turned into PASM by running:
parrot -o hello.pasm hello.pir
The PASM for the final example looks like this:
_main:
set S30, "Hello world!\n"
print S30
end
PASM does not handle register allocation or provide support for named registers. It also does not have the .sub and .end directives, instead replacing them with a label at the start of the instructions.
This example introduces some more instructions and PIR syntax. Lines starting with a # are comments.
.sub _main
# State the number of squares to sum.
.local int maxnum
maxnum = 10
# Some named registers we'll use. Note how we can declare many
# registers of the same type on one line.
.local int i, total, temp
total = 0
# Loop to do the sum.
i = 1
loop:
temp = i * i
total += temp
inc i
if i <= maxnum goto loop
# Output result.
print "The sum of the first "
print maxnum
print " squares is "
print total
print ".\n"
end
.end
PIR provides a bit of syntactic sugar that makes it look more high level than assembly. For example:
temp = i * i
Is just another way of writing the more assembly-ish:
mul temp, i, i
And:
if i <= maxnum goto loop
Is the same as:
le i, maxnum, loop
And:
total += temp
Is the same as:
add total, temp
As a rule, whenever a Parrot instruction modifies the contents of a register, that will be the first register when writing the instruction in assembly form.
As is usual in assembly languages, loops and selection are implemented in terms of conditional branch statements and labels, as shown above. Assembly programming is one place where using goto is not bad form!
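To trace the desugared instructions by hand, here is a toy interpreter in Python for just the instructions the loop needs (set, mul, add, inc and le). It is a sketch, not Parrot's implementation: the tuple encoding of the program and the use of names rather than numbered registers are assumptions made for readability. In every instruction that modifies a register, that register is the first operand.

```python
def run(program):
    """Execute a list of (opcode, operands...) tuples; return the registers."""
    # Labels are entries of the form ("label", name); record their positions.
    labels = {ins[1]: pos for pos, ins in enumerate(program) if ins[0] == "label"}
    regs = {}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "label":
            continue
        elif op == "set":     # set dest, constant
            regs[args[0]] = args[1]
        elif op == "mul":     # mul dest, a, b
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "add":     # add dest, a   (dest += a)
            regs[args[0]] += regs[args[1]]
        elif op == "inc":     # inc dest
            regs[args[0]] += 1
        elif op == "le":      # le a, b, label   (branch if a <= b)
            if regs[args[0]] <= regs[args[1]]:
                pc = labels[args[2]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

sum_of_squares = [
    ("set", "maxnum", 10),
    ("set", "total", 0),
    ("set", "i", 1),
    ("label", "loop"),
    ("mul", "temp", "i", "i"),
    ("add", "total", "temp"),
    ("inc", "i"),
    ("le", "i", "maxnum", "loop"),
]

regs = run(sum_of_squares)
print("The sum of the first", regs["maxnum"], "squares is", regs["total"])
# prints The sum of the first 10 squares is 385
```

The only control flow the interpreter knows is the conditional branch in le, which is enough to express the whole loop.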
In this example we define a factorial function and call it recursively to compute the factorials of 0 through 10.
.sub _fact
# Get input parameter.
.param int n
# return (n > 1 ? n * _fact(n - 1) : 1)
.local int result
if n > 1 goto recurse
result = 1
goto return
recurse:
$I0 = n - 1
result = _fact($I0)
result *= n
return:
.return (result)
.end
.sub _main :main
.local int f, i
# We'll do factorial 0 to 10.
i = 0
loop:
f = _fact(i)
print "Factorial of "
print i
print " is "
print f
print ".\n"
inc i
if i <= 10 goto loop
# That's it.
end
.end
Let's look at the _fact sub first. A point that was glossed over earlier is why the names of subroutines all start with an underscore. This is done simply as a way of showing that the label is global rather than scoped to a particular subroutine. This is significant as the label is then visible to other subroutines.
The first line, .param int n, specifies that this subroutine takes one integer parameter and that we'd like to refer to the register it was passed in by the name n for the rest of the sub.
Much of what follows has been seen in previous examples, apart from the line reading:
result = _fact($I0)
This single line of PIR actually represents quite a few lines of PASM. First, the value in register $I0 is moved into the appropriate register for it to be received as an integer parameter by the _fact function. Other calling related registers are then set up, followed by _fact being invoked. Then, once _fact returns, the value returned by _fact is placed into the register given the name result.
Right before the .end of the _fact sub, a .return directive is used to ensure the value held in the register named result is placed into the correct register for it to be seen as a return value by the code calling the sub.
The call to _fact in main works in just the same way as the recursive call to _fact within the sub _fact itself. The only remaining bit of new syntax is the :main, written after .sub _main. By default, PIR assumes that execution begins with the first sub in the file. This behavior can be changed by marking the sub in which execution should start with :main.
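For checking the output of the PIR program, the same recursion is written below in Python. This is a reference sketch only; the function name fact is chosen to mirror _fact, and the loop mirrors the one in _main.

```python
def fact(n):
    """Mirror of _fact: return n * fact(n - 1) for n > 1, otherwise 1."""
    if n > 1:
        return n * fact(n - 1)
    return 1

# Same loop as _main: factorials of 0 through 10.
for i in range(11):
    print(f"Factorial of {i} is {fact(i)}.")
```

The last line printed is "Factorial of 10 is 3628800.", a value that still fits in a 32-bit integer register.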
To compile PIR to bytecode, use the -o flag and specify an output file with the extension .pbc.
parrot -o factorial.pbc factorial.pir
What documentation you read next depends upon what you are looking to do with Parrot. The opcodes reference and built-in PMCs reference are useful to dip into for pretty much everyone. If you intend to write or compile to PIR then there are a number of documents about PIR that are worth a read. For compiler writers, the Compiler FAQ is essential reading. If you want to get involved with Parrot development, the PDDs (Parrot Design Documents) contain some details of the internals of Parrot; a few other documents fill in the gaps. One way of helping Parrot development is to write tests, and there is a document entitled Testing Parrot that will help with this.
Much Parrot development and discussion takes place on the perl6-internals mailing list. You can subscribe by sending an email to perl6-internals-subscribe@perl.org or read the perl6-internals NNTP archive at http://www.nntp.perl.org/group/perl.perl6.internals.
The Parrot IRC channel is hosted on irc.perl.org and is named #parrot. Alternative IRC servers are at irc.pobox.com and irc.rhizomatic.net.