Parrot Compiler Tools

The previous chapters demonstrated low-level Parrot programming in PIR. That's fun, but Parrot's true power lies in hosting programs written in high-level languages such as Perl 6, Python, Ruby, Tcl, and PHP.

Parrot's language neutrality was a conscious design decision. Parrot and Perl 6 developed in close tandem in the early days; it would have been easy for the two to overlap and intermingle.

Keeping the two projects separate and encapsulated made it possible to support many other dynamic languages equally well. This modular design also benefits designers of future languages. Instead of reimplementing low-level features such as garbage collection and dynamic data types, language designers and compiler implementers can leave these infrastructure details to Parrot and focus on the syntax, capabilities, and libraries of their high-level languages.

Parrot exposes a rich interface for these languages to use, offering several important features: a robust exceptions system, compilation into platform-independent bytecode, a clean extension and embedding interface, just-in-time compilation to machine code, native library interface mechanisms, garbage collection, support for objects and classes, and a robust concurrency model. Designing a new language or implementing a new compiler for an old language is easier when the VM already provides all of these features, designed, implemented, tested, and supported.

Language interoperability is a core goal for Parrot. Different languages are suited to different tasks; heated debates explode across the Internet about which language is right for which project. There's rarely a perfect fit. Developers often settle for one particular language if only because it offers the fewest disadvantages. Parrot changes this game by allowing developers to combine multiple languages seamlessly within a single project. Well-tested libraries written in one language can interoperate with clean problem-domain expression in a second language, glued together by a third language which elegantly describes the entire system's architecture. You can use the strengths of multiple languages and mitigate their weaknesses.

For language hosting and interoperability to work, language developers need to write compilers that convert source code written in high-level languages to Parrot bytecode. This process is analogous to how a compiler such as GCC converts C or C++ into machine code, though instead of targeting machine code for a specific hardware platform, compilers hosted on Parrot produce Parrot code which can run on any hardware platform that can run Parrot.

Parrot includes a suite of compiler tools for every step of this conversion: lexical analysis, parsing, optimization, resource allocation, and code generation. Instead of writing compilers in traditional low-level languages, such as the C code that lex and yacc generate, developers can use any language hosted on Parrot in the compilation process. As a practical matter, the most common approach uses a subset of the Perl 6 programming language called Not Quite Perl (NQP) and an implementation of the Perl 6 Grammar Engine (PGE) to build compilers for Parrot.

PGE and NQP are part of the Parrot Compiler Tools. Chapter 5 discusses PGE and Chapter 6 explains NQP.

PCT Overview

The Parrot Compiler Tools (PCT) enable the creation of high-level language compilers and runtimes. Though the Perl 6 development team originally created these tools to produce Rakudo (Perl 6 on Parrot), several other Parrot-hosted compilers use them to great effect. Writing a compiler using Perl 6 syntax and dynamic language tools is much easier than writing a compiler in C, lex, and yacc.

PCT contains several classes that implement various parts of a compiler. HLL developers write language-specific subclasses to fill in the details their languages require. The PCT::HLLCompiler class specifies the compiler's interface and represents the object used to parse and execute code. The PCT::Grammar and PCT::Grammar::Actions classes represent the parser and syntax tree generators, respectively. Creating a new HLL compiler is as easy as subclassing these three entities with methods specific to your language.

Grammars and Action Files

A PCT-based compiler requires three basic files: the main entry point file, the grammar specification file, and the grammar actions file. In addition, compilers and the languages they implement often use large libraries of built-in routines to provide language-specific behaviors.

PCT's workflow is simple but customizable. The compiler passes the source code of the HLL into the grammar engine. The grammar engine parses this code and returns a special Match object which represents a parsed version of the code. The compiler then passes this Match object to the action methods, which convert it in stages into PAST (the Parrot Abstract Syntax Tree). Finally, the compiler converts this PAST into PIR code, which it can save to a file, convert to bytecode, or execute directly.
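The following toy sketch makes the stage-by-stage flow concrete. It is written in Python rather than PIR so that the idea stays separate from Parrot's real APIs. The class StagedCompiler, the stage names, and the stage functions are all hypothetical stand-ins:

- parse plays the grammar engine.
- build_ast plays the action methods.
- generate plays code generation.
- execute runs the result on a tiny stack machine instead of Parrot.

```python
import operator
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


class StagedCompiler:
    """Hypothetical: runs source text through named stages, in order."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)

    def remove_stage(self, name: str) -> None:
        # Customization hook: drop a stage, e.g. to inspect generated code.
        self.stages = [(n, f) for n, f in self.stages if n != name]

    def compile(self, source: str) -> Any:
        result: Any = source
        for _name, stage in self.stages:
            result = stage(result)  # each stage consumes the previous output
        return result


def parse(source: str) -> dict:
    # Stand-in for the grammar engine: a "match" for input like "2 + 3".
    left, op, right = source.split()
    return {"left": left, "op": op, "right": right}


def build_ast(match: dict) -> tuple:
    # Stand-in for the action methods: turn the match into a tree node.
    return ("binop", match["op"], int(match["left"]), int(match["right"]))


def generate(ast: tuple) -> List[tuple]:
    # Stand-in for code generation: emit instructions for a stack machine.
    _kind, op, left, right = ast
    return [("push", left), ("push", right), ("apply", op)]


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def execute(code: List[tuple]) -> Any:
    # Stand-in for running the generated code.
    stack: List[Any] = []
    for instr, arg in code:
        if instr == "push":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()


compiler = StagedCompiler(
    [("parse", parse), ("ast", build_ast), ("codegen", generate), ("run", execute)]
)
```

Calling compiler.compile("2 + 3") returns 5. After compiler.remove_stage("run"), the same call returns the generated instruction list instead. This shows one way a staged design lets you stop early and inspect an intermediate form.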

mk_language_shell.pl

The only way creating a new language compiler could be easier is if these files created themselves. PCT includes a tool to do just that: mk_language_shell.pl. This program automatically creates a new directory in languages/ for your new language, the three necessary files, starter files for libraries, a Makefile to automate the build process, and a basic test harness to demonstrate that your language works as expected.

These generated files are all stubs which will require extensive editing to implement a full language, but they are a well-understood and working starting point. With this single command you can create a working compiler. It's up to you to fill in the details.

mk_language_shell.pl prefers to run from within a working Parrot repository. It requires a single argument, the name of the new project to create. There are no hard-and-fast rules about names, but the Parrot developers recommend that Parrot-based implementations of existing languages use unique names.

Consider the names of Perl 5 distributions: ActivePerl and Strawberry Perl. Alternative Python implementations include IronPython (running on the CLR) and Jython (running on the JVM). The Ruby-on-Parrot compiler isn't just "Ruby": it's Cardinal. The Tcl compiler on Parrot is Partcl.

An entirely new language has no such constraints.

From the Parrot directory, invoke mk_language_shell.pl like this:

  $ cd languages/
  $ perl ../tools/build/mk_language_shell.pl <project name>

Parsing Fundamentals

The parser and the lexical analyzer are important parts of any compiler. The lexical analyzer converts the HLL input file into individual tokens. A token may consist of a single punctuation character ("+"), an identifier ("myVar"), a keyword ("while"), or any other artifact that stands on its own as a single unit. The parser attempts to match a stream of these input tokens against a given pattern, or grammar. The matching process orders the input tokens into an abstract syntax tree which the other portions of the compiler can process.
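A lexical analyzer can be sketched in a few lines of Python. This is a generic regular-expression tokenizer, not how PGE works internally. The token categories (NUMBER, KEYWORD, IDENT, PUNCT) and the tiny keyword list are assumptions chosen to match the examples above.

```python
import re
from typing import List, Tuple

# Hypothetical token categories; alternatives are tried in this order,
# so keywords are listed before identifiers.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:while|if|else)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[+\-*/(){};=<>]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split source text into (category, text) tokens, dropping whitespace."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, tokenize("while myVar + 1") yields four tokens: a KEYWORD, an IDENT, a PUNCT, and a NUMBER. The \b anchors keep an identifier such as "iffy" from being split into the keyword "if" plus a leftover.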

Parsers come in top-down and bottom-up varieties. Top-down parsers start with a top-level rule which represents the entire input, then attempt to match various combinations of subrules until they have consumed the entire input. Bottom-up parsers start with individual tokens from the lexical analyzer and attempt to combine them into larger and larger patterns until they produce a match for the top-level rule.
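The top-down strategy is easiest to see in a recursive-descent parser, where each grammar rule becomes a method that calls the methods for its subrules. This Python sketch uses a hypothetical arithmetic grammar, shown in the comments, and builds tuples as its syntax tree. It illustrates the general technique, not PGE's implementation.

```python
import re
from typing import List, Union

Node = Union[int, tuple]


class Parser:
    """A top-down (recursive-descent) parser for + - * / and parentheses."""

    def __init__(self, source: str) -> None:
        # Sketch only: characters other than digits, operators and
        # parentheses are ignored by this simple tokenizer.
        self.tokens: List[str] = re.findall(r"\d+|[-+*/()]", source)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def advance(self) -> str:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self) -> Node:
        tree = self.expr()  # the top-level rule must cover the entire input
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    # expr := term (("+" | "-") term)*
    def expr(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    # term := factor (("*" | "/") factor)*
    def term(self) -> Node:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    # factor := NUMBER | "(" expr ")"
    def factor(self) -> Node:
        tok = self.advance()
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")
```

Parsing "1 + 2 * 3" produces ("+", 1, ("*", 2, 3)). Because expr descends into term before it looks for "+", multiplication ends up deeper in the tree and binds more tightly.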

PGE is a top-down parser, although it also contains a bottom-up operator precedence parser to make processing token clusters such as mathematical expressions more efficient.
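An operator precedence parser works bottom-up. It shifts operands and operators onto stacks, and reduces the top of the stacks into a tree node whenever the operator already on the stack should be applied before the incoming one. The following Python sketch is a generic shift-reduce operator precedence parser, not PGE's. The precedence table and the right-associative "^" are assumptions, and parentheses are left out for brevity.

```python
import re
from typing import List, Union

Node = Union[int, tuple]

# Hypothetical precedence table: higher binds tighter. All operators are
# left-associative except "^", which is right-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}


def parse_expression(source: str) -> Node:
    """Shift operands and operators; reduce when precedence says so."""
    tokens = re.findall(r"\d+|[-+*/^]", source)
    operands: List[Node] = []
    operators: List[str] = []

    def reduce() -> None:
        # Combine the top operator with the top two operands into one node.
        if len(operands) < 2:
            raise SyntaxError("operator is missing an operand")
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok.isdigit():
            operands.append(int(tok))  # shift an operand
            continue
        while operators and (
            PRECEDENCE[operators[-1]] > PRECEDENCE[tok]
            or (PRECEDENCE[operators[-1]] == PRECEDENCE[tok] and tok not in RIGHT_ASSOC)
        ):
            reduce()  # the stacked operator binds first
        operators.append(tok)  # shift the operator
    while operators:
        reduce()
    if len(operands) != 1:
        raise SyntaxError("malformed expression")
    return operands[0]
```

With this table, "1 - 2 - 3" groups as ("-", ("-", 1, 2), 3) while "2 ^ 3 ^ 2" groups as ("^", 2, ("^", 3, 2)). Adding an operator means adding a table entry rather than a new grammar rule. That is one reason precedence parsers suit expression-heavy syntax.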

Driver Programs

The driver program for the new compiler must create instances of the various classes necessary to run the parser. It must also include the standard function libraries, create global variables, and handle command-line options. PCT provides several useful command-line options, but driver programs may need to override some of its behaviors.

PCT programs can run in two ways. Interactive mode runs one statement at a time in the console. File mode loads and runs an entire file at once. A driver program may specify information about the interactive prompt and environment, as well as help and error messages.
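The two modes can be sketched as a pair of Python functions behind a small dispatcher. Everything here is hypothetical and is not PCT's driver code. evaluate stands in for whatever compiles and runs a piece of source code, and the prompt and error format are the kind of details a driver program might customize.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Evaluator = Callable[[str], object]


def console_lines(prompt: str) -> Iterator[str]:
    """Read statements from the console until end-of-file (Ctrl-D)."""
    while True:
        try:
            yield input(prompt)
        except EOFError:
            return


def interactive_mode(
    evaluate: Evaluator,
    lines: Optional[Iterable[str]] = None,
    prompt: str = "> ",
    write: Callable[[str], None] = print,
) -> None:
    """Run one statement at a time; an error ends the statement, not the session."""
    for line in console_lines(prompt) if lines is None else lines:
        if not line.strip():
            continue
        try:
            write(repr(evaluate(line)))
        except Exception as exc:
            write(f"error: {exc}")


def file_mode(path: str, evaluate: Evaluator) -> object:
    """Load and run an entire file at once."""
    with open(path, encoding="utf-8") as source:
        return evaluate(source.read())


def main(argv: List[str], evaluate: Evaluator) -> object:
    # A file argument selects file mode; no arguments selects interactive mode.
    if len(argv) > 1:
        return file_mode(argv[1], evaluate)
    interactive_mode(evaluate)
    return None
```

Passing lines and write explicitly lets you drive interactive mode from a list instead of the console. This is a convenient seam for testing a driver without a terminal.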

HLLCompiler class

The HLLCompiler class implements a compiler object. This object contains references to language-specific parser grammar and actions files, as well as the steps involved in the compilation process. The stub compiler created by mk_language_shell.pl might resemble:

  .sub 'onload' :anon :load :init
      load_bytecode 'PCT.pbc'
      $P0 = get_hll_global ['PCT'], 'HLLCompiler'
      $P1 = $P0.'new'()
      $P1.'language'('MyCompiler')
      $P1.'parsegrammar'('MyCompiler::Grammar')
      $P1.'parseactions'('MyCompiler::Grammar::Actions')
  .end

  .sub 'main' :main
      .param pmc args
      $P0 = compreg 'MyCompiler'
      $P1 = $P0.'command_line'(args)
  .end

The onload subroutine creates the driver object as an instance of HLLCompiler, sets the necessary options, and registers the compiler with Parrot. The main subroutine, marked :main, drives parsing and execution. It uses the compreg opcode to retrieve the registered compiler object for the language "MyCompiler". It then calls that object's command_line method with the options received from the command line.

The compreg opcode hides some of Parrot's magic; you can use it multiple times in a program to compile and run different languages. You can create multiple instances of a compiler object for a single language (such as for runtime eval) or you can create compiler objects for multiple languages for easy interoperability. The Rakudo Perl 6 eval function uses this mechanism to allow runtime eval of code snippets in other languages:

  eval("puts 'Konnichiwa'", :lang<Ruby>);
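Conceptually, compreg behaves like a process-wide table from language names to compiler objects. This Python sketch models that idea with a dictionary and is not Parrot's implementation. register, lookup, and eval_in are hypothetical names, and the two toy "languages" simply transform their input strings.

```python
from typing import Callable, Dict

# Hypothetical stand-in for Parrot's compiler registry:
# language name -> compiler (here, just a callable on source text).
Compiler = Callable[[str], object]
_registry: Dict[str, Compiler] = {}


def register(lang: str, compiler: Compiler) -> None:
    _registry[lang] = compiler


def lookup(lang: str) -> Compiler:
    """Plays the role of compreg: fetch a compiler by language name."""
    if lang not in _registry:
        raise LookupError(f"no compiler registered for {lang!r}")
    return _registry[lang]


def eval_in(code: str, lang: str) -> object:
    """Rough analogue of eval(code, :lang<...>): dispatch by language name."""
    return lookup(lang)(code)


# Two toy "languages", registered side by side in one process.
register("Shout", lambda src: src.upper())
register("Backwards", lambda src: src[::-1])
```

Here eval_in("konnichiwa", lang="Shout") returns "KONNICHIWA". The call selects a compiler by name, much as the Rakudo example above selects one for Ruby.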

HLLCompiler methods

The previous example showed the use of several HLLCompiler methods: language, parsegrammar, and parseactions. These three methods are the bare minimum that any PCT-based compiler should configure. The language method takes a string argument that is the name of the compiler; the HLLCompiler object uses this name to register the compiler object with Parrot. The parsegrammar method takes the name of the grammar class that you write with PGE. The parseactions method takes the name of the actions class, written in NQP, that generates the AST for the compiler.
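The division of labor among these three methods can be modeled in Python. In this hypothetical sketch, ToyHLLCompiler, ListGrammar, ListActions, and COMPILERS are invented names. The grammar and actions are passed as classes, rather than as class-name strings as in the PIR example, to keep the sketch short.

```python
from typing import Dict, List, Optional

# Hypothetical registry standing in for Parrot's compiler table.
COMPILERS: Dict[str, "ToyHLLCompiler"] = {}


class ToyHLLCompiler:
    """Toy analogue of the three minimum configuration methods; not PCT."""

    def __init__(self) -> None:
        self.name: Optional[str] = None
        self.grammar: Optional[type] = None
        self.actions: Optional[type] = None

    def language(self, name: str) -> None:
        self.name = name
        COMPILERS[name] = self  # register the compiler under its name

    def parsegrammar(self, grammar: type) -> None:
        self.grammar = grammar  # class that turns source into a match

    def parseactions(self, actions: type) -> None:
        self.actions = actions  # class that turns a match into an AST

    def compile(self, source: str) -> tuple:
        if self.grammar is None or self.actions is None:
            raise RuntimeError("grammar and actions must be configured first")
        match = self.grammar().parse(source)
        return self.actions().ast(match)


class ListGrammar:
    """Hypothetical grammar: a comma-separated list of integers."""

    def parse(self, source: str) -> List[str]:
        return [item.strip() for item in source.split(",")]


class ListActions:
    """Hypothetical actions: build an AST node from the matched items."""

    def ast(self, match: List[str]) -> tuple:
        return ("list", [int(item) for item in match])


compiler = ToyHLLCompiler()
compiler.language("ListLang")
compiler.parsegrammar(ListGrammar)
compiler.parseactions(ListActions)
```

After configuration, COMPILERS["ListLang"] refers to the same object, and compiler.compile("1, 2, 3") returns ("list", [1, 2, 3]).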

If your compiler needs additional features, there are several other available methods: