Parrot Intermediate Representation
Parrot Intermediate Representation (PIR) is Parrot's native low-level language. (Parrot also has a pure native assembly language called PASM, described in Chapter 9.) PIR is fundamentally an assembly language, but it has some higher-level features such as operator syntax, syntactic sugar for subroutine and method calls, automatic register allocation, and friendlier conditional syntax. PIR is commonly used to write Parrot libraries -- including some of PCT's compilers -- and is the target form when compiling high-level languages to Parrot.
Even so, PIR is more rigid and "close to the machine" than higher-level languages like C. Files containing PIR code use the .pir extension.
Basics
PIR has a relatively simple syntax. Every line is a comment, a label, a statement, or a directive. Each statement or directive stands on its own line. There is no end-of-line symbol (such as a semicolon in other languages).
Comments
A comment begins with the # symbol and continues until the end of the line. Comments can stand alone on a line or follow a statement or directive.
    # This is a regular comment. The PIR
    # interpreter ignores this.
PIR also treats inline documentation in Pod format as a comment. An equals sign as the first character of a line marks the start of a Pod block. A =cut marker signals the end of a Pod block.
=head2 This is Pod documentation, and is treated like a
comment. The PIR interpreter ignores this.

=cut
Labels
A label attaches to a line of code so other statements can refer to it. Labels can contain letters, numbers, and underscores. By convention, labels use all capital letters to stand out from the rest of the source code. A label can precede a line of code on the same line, though outdenting labels on separate lines improves readability:
  GREET:
    say "'Allo, 'allo, 'allo."
Labels are vital to control flow.
Statements
A statement is either an opcode or syntactic sugar for one or more opcodes. An opcode is a native instruction for the virtual machine; it consists of the name of the instruction followed by zero or more arguments.
say "Norwegian Blue"
PIR also provides higher-level constructs, including symbol operators:
$I1 = 2 + 5
Under the hood, these special statement forms are just syntactic sugar for regular opcodes. The + symbol corresponds to the add opcode, the - symbol to the sub opcode, and so on. The previous example is equivalent to:
add $I1, 2, 5
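The same correspondence can be sketched for other operators. The snippet below pairs each symbolic form with its opcode form; the register numbers and the wrapping 'main' subroutine are chosen only for illustration, and mul is assumed to be the opcode behind *, by analogy with add and sub.

```pir
.sub 'main' :main
    $I0 = 7 - 2        # symbolic form
    sub $I1, 7, 2      # opcode form of the line above
    $I2 = 3 * 4        # symbolic form
    mul $I3, 3, 4      # opcode form (assumed name, by analogy with add and sub)

    say $I0
    say $I1
    say $I2
    say $I3
.end
```

If the assumption holds, each pair of registers ends up with the same value.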
Directives
Directives begin with a period (.); Parrot's parser handles them specially. Some directives specify actions that occur at compile time. Other directives represent complex operations that require the generation of multiple instructions. The .local directive declares a typed register.
.local string hello
PIR also has a macro facility to create user-defined directives.
Literals
Integers and floating point numbers are numeric literals. They can be positive or negative.
    $I0 = 42        # positive
    $I1 = -1        # negative
Integer literals can also be binary, octal, or hexadecimal:
    $I3 = 0b01010   # binary
    $I3 = 0o77      # octal
    $I2 = 0xA5      # hexadecimal
Floating point number literals have a decimal point, and can use scientific notation:
    $N0 = 3.14
    $N2 = -1.2e+4
String literals are enclosed in single or double quotes. The "Strings" section explains the differences between the quoting types.
    $S0 = "This is a valid literal string"
    $S1 = 'This is also a valid literal string'
Variables
PIR variables can store four different kinds of values: integers, numbers (floating point), strings, and objects. The simplest way to work with these values is through register variables. Register variables always start with a dollar sign ($) and a single character which specifies the type of the register: integer (I), number (N), string (S), or PMC (P). Registers have numbers as well; the "first" string register is $S0. (Register numbers may or may not correspond to the register used internally; Parrot's compiler remaps registers as appropriate.)
    $S0 = "Who's a pretty boy, then?"
    say $S0
PIR also has named variables, which are declared with the .local directive. As with registers, there are four valid types for named variables: int, num, string, and pmc (for "PolyMorphic Container"). After declaring a named variable, you can use the name anywhere you would use a register:
    .local string hello
    set hello, "'Allo, 'allo, 'allo."
    say hello
Integer (I) and Number (N) registers use platform-dependent sizes and limitations. (There are a few exceptions to this; Parrot smooths out some of the bumps and inconsistencies so that PIR code behaves the same way on all supported platforms.) Parrot treats both I and N registers as signed quantities for the purposes of arithmetic. Parrot's floating point values and operations all comply with the IEEE 754 standard.
Strings (S) are buffers of variable-sized data. The most common use of S registers and variables is to store textual data. S registers may also be buffers for binary or other non-textual data, though this is rare. (In general, a custom PMC is more useful.) Parrot strings are flexible and powerful, to account for all the complexity of human-readable (and computer-representable) textual data.
The final data type is the PMC. PMCs resemble classes and objects in object-oriented languages. They are the basis for complex data structures and object-oriented behavior in Parrot.
Strings
Strings in double-quotes accept all sorts of escape sequences using backslashes. Strings in single-quotes only allow escapes for nested quotes:
    $S0 = "This string is \n on two lines"
    $S0 = 'This is a \n one-line string with a slash in it'
Parrot supports several escape sequences in double-quoted strings:
    \xhh        1..2 hex digits
    \ooo        1..3 oct digits
    \cX         Control char X
    \x{h..h}    1..8 hex digits
    \uhhhh      4 hex digits
    \Uhhhhhhhh  8 hex digits
    \a          An ASCII alarm character
    \b          An ASCII backspace character
    \t          A tab
    \n          A newline
    \v          A vertical tab
    \f          A form feed
    \r          A carriage return
    \e          An escape
    \\          A backslash
    \"          A quote
If you need more flexibility in defining a string, use a heredoc string literal. The << operator starts a heredoc. The string terminator immediately follows. All text until the terminator is part of the heredoc. The terminator must appear on its own line, must appear at the beginning of the line, and may not have any trailing whitespace.
    $S2 = << "End_Token"
    This is a multi-line string literal. Notice that
    it doesn't use quotation marks.
End_Token
Strings: Encodings and Charsets
Strings are complicated; string declarations aren't the whole story. In olden times, strings only needed to support the ASCII character set (or charset), a mapping of 128 bit patterns to symbols and English-language characters. This worked as long as everyone using a computer read and wrote English and used a small handful of punctuation symbols.
In other words, it was woefully insufficient for international uses, polyglots, and more.
A modern string system must manage several character encodings and charsets in order to make sense out of all the string data in the world. Parrot does this. Every string has an associated encoding and an associated character set. The default charset is 8-bit ASCII, which is simple to use and is almost universally supported.
Double-quoted string constants can have an optional prefix specifying the string's encoding and charset. (As you might suspect, single-quoted strings do not support this.) Parrot will maintain these values internally, and will automatically convert strings when necessary to preserve the information. String prefixes are specified as encoding:charset: at the front of the string. Here are some examples:
    $S0 = utf8:unicode:"Hello UTF8 Unicode World!"
    $S1 = utf16:unicode:"Hello UTF16 Unicode World!"
    $S2 = ascii:"This is 8-bit ASCII"
    $S3 = binary:"This is raw, unformatted binary data"
The binary: charset treats the string as a buffer of raw unformatted binary data. It isn't really a string per se, because binary data contains no readable characters. As mentioned earlier, this exists to support libraries which manipulate binary data that doesn't easily fit into any other primitive data type.
When Parrot combines two strings (such as through concatenation), they must both use the same character set and encoding. Parrot will automatically upgrade one or both of the strings to use the next highest compatible format as necessary. ASCII strings will automatically upgrade to UTF-8 strings if needed, and UTF-8 will upgrade to UTF-16. All of these conversions happen inside Parrot; you the programmer don't need to worry about the details.
Named Variables
Calling a value "$S0" isn't very descriptive, and usually it's a lot nicer to be able to refer to values using a helpful name. For this reason Parrot allows registers to be given temporary variable names to use instead. These named variables can be used anywhere a register would normally be used, because they actually are registers with fancier names. They're declared with the .local directive, which requires a variable type and a name:
    .local string hello
    set hello, "Hello, Polly."
    say hello
This snippet defines a string variable named hello, assigns it the value "Hello, Polly.", and then prints the value. Under the hood these named variables are just normal registers, so any operation that works on a register works on a named variable as well.
The valid types are int, num, string, and pmc. It should come as no surprise that these are the same as Parrot's four built-in register types. Named variables are valid from the point of their definition to the end of the current subroutine.
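As a minimal sketch, the following subroutine declares one named variable of each of the four types and prints it. The variable names and values are hypothetical, chosen only for illustration; the pmc variable is given a String object using the same new syntax shown in the "PMC variables" section.

```pir
.sub 'main' :main
    .local int    count
    .local num    ratio
    .local string name
    .local pmc    boxed

    count = 3
    ratio = 0.5
    name  = "Polly"
    boxed = new 'String'      # a PMC must be instantiated before use
    boxed = "a PMC value"

    say count
    say ratio
    say name
    say boxed
.end
```

All four names go out of scope at the .end of 'main', in line with the scoping rule above.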
The name of a variable must be a valid PIR identifier. It can contain letters, digits and underscores but the first character has to be a letter or an underscore. There is no limit to the length of an identifier, other than good taste.
As mentioned earlier, Parrot's internal compiler may remap named registers and symbolic registers to actual registers as necessary. This happens transparently, and for the most part you never need to know about it. There's one exception: when you know something outside of Parrot must refer to a specific register exactly. (For example, when an NCI call takes a pointer to a register and returns a value through the pointer.) Use the :unique_reg modifier on a variable declaration to prevent potential register allocation changes:
    .local pmc MyUniquePMC :unique_reg
The :unique_reg attribute will not affect the behavior of Parrot otherwise.
PMC variables
PMC registers and variables act much like any integer, floating-point number, or string register or variable, but you have to instantiate a new PMC object into a type before you use it. The new instruction creates a new PMC of the specified type:
    $P0 = new 'String'
    $P0 = "Hello, Polly."
    say $P0
This example creates a String object, stores it in the PMC register $P0, assigns it the value "Hello, Polly.", and prints it. The type provided to the .local directive is either the generic pmc or a type compatible with the type passed to new:
    .local String hello    # or .local pmc hello
    hello = new 'String'
    hello = "Hello, Polly."
    say hello
PIR is a dynamic language; that dynamism is evident in how Parrot handles PMC values. Primitive registers like strings, numbers, and integers perform a special action called autoboxing when assigned to a PMC. Autoboxing is the process of converting a primitive type to a PMC object. PMC classes exist for String, Number, and Integer; notice that the primitive types are in lower-case, while the PMC classes are capitalized. If you want to box a value explicitly, use the box opcode:
    $P0 = new 'Integer'       # The boxed form of int
    $P0 = box 42

    $P1 = new 'Number'        # The boxed form of num
    $P1 = box 3.14

    $P2 = new 'String'        # The boxed form of string
    $P2 = "This is a string!"
The PMC classes Integer, Number, and String are thin overlays on the primitive types they represent. These PMC types have the benefit of the VTABLE interface. VTABLEs are a standard API that all PMCs conform to for performing standard operations. These PMC types support custom methods to perform various operations, may be passed to subroutines that expect PMC arguments, and can be subclassed by a user-defined type.
Named Constants
The .const directive declares a named constant. It resembles .local; it requires a type and a name. It also requires the assignment of a constant value. As with named variables, named constants are visible only within the compilation unit where they're declared. This example declares a named string constant hello and prints the value:
    .const string hello = "Hello, Polly."
    say hello
Named constants may be used in all the same places as literal constants, but have to be declared beforehand:
    .const int    the_answer = 42        # integer constant
    .const string mouse      = "Mouse"   # string constant
    .const num    pi         = 3.14159   # floating point constant
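To illustrate a named constant standing in for a literal, here is a small sketch. The constant names days_per_week and unit are hypothetical and exist only for this example.

```pir
.sub 'main' :main
    .const int    days_per_week = 7
    .const string unit          = " days"

    $I0 = days_per_week * 4     # the constant is used where a literal 7 could appear
    $S0 = $I0                   # stringify the integer result
    $S0 = $S0 . unit            # append the string constant
    say $S0
.end
```

Naming the value once means a later change touches only the declaration, not every statement that uses it.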
In addition to normal local constants, you can also specify a global constant which is accessible from everywhere in the current code file:
.globalconst int days = 365
Currently there is no way to specify a PMC constant in PIR source code.
Symbol Operators
PIR has many other symbolic operators: arithmetic, concatenation, comparison, bitwise, and logical. All PIR operators are translated into one or more Parrot opcodes internally, but the details of this translation stay safely hidden from the programmer. Consider this example snippet:
    .local int sum
    sum = $I42 + 5
    say sum
The statement sum = $I42 + 5 translates to the equivalent statement add sum, $I42, 5. PIR symbolic operations are:
    $I0 = $I1 + 5      # Addition
    $N0 = $N1 - 7      # Subtraction
    $I3 = 4 * 6        # Multiplication
    $N7 = 3.14 / $N2   # Division
    $S0 = $S1 . $S2    # String concatenation
PIR also provides automatic assignment operators such as +=, -=, and >>=. These operators help programmers to perform common manipulations on a data value in place, and save a few keystrokes while doing them.
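A short sketch of the three assignment operators named above, applied in sequence to one variable. The variable name total and the starting value are arbitrary.

```pir
.sub 'main' :main
    .local int total
    total = 10
    total += 5       # same as total = total + 5   (15)
    total -= 3       # same as total = total - 3   (12)
    total >>= 2      # same as total = total >> 2  (3)
    say total
.end
```

Each line updates total in place, so no temporary register is needed for the intermediate results.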
A complete list of PIR operators is available in Chapter 13.
= and Type Conversion
The = operator is very powerful. Its simplest form stores a value into one of the Parrot registers. It can assign a string value to a string register, an integer value to an int register, a floating point value into a number register, etc. However, the = operator can assign any type of value into any type of register; Parrot will handle the conversion for you automatically:
    $I0 = 5     # Integer. 5
    $S0 = $I0   # Stringify. "5"
    $N0 = $S0   # Numify. 5.0
    $I0 = $N0   # Intify. 5
Notice that conversions between the numeric formats and strings only make sense when the value to convert is a number. A string that does not start with a number converts to 0:
    $S0 = "parrot"
    $I0 = $S0   # 0
An earlier example showed a string literal assigned to a PMC register of type String. This works for all the primitive types and their autoboxed PMC equivalents:
    $P0 = new 'Integer'
    $P0 = 5
    $S0 = $P0      # Stringify. "5"
    $N0 = $P0      # Numify. 5.0
    $I0 = $P0      # Unbox. $I0 = 5

    $P1 = new 'String'
    $P1 = "5 birds"
    $S1 = $P1      # Unbox. $S1 = "5 birds"
    $I1 = $P1      # Intify. 5
    $N1 = $P1      # Numify. 5.0

    $P2 = new 'Number'
    $P2 = 3.14
    $S2 = $P2      # Stringify. "3.14"
    $I2 = $P2      # Intify. 3
    $N2 = $P2      # Unbox. $N2 = 3.14
Compilation Units
Subroutines in PIR are roughly equivalent to the subroutines or methods of a high-level language. All code in a PIR source file must appear within a subroutine. The simplest syntax for a PIR subroutine starts with the .sub directive and ends with the .end directive:
    .sub 'main'
        say "Hello, Polly."
    .end
This example defines a subroutine named main that prints the string "Hello, Polly.". Parrot will normally execute the first subroutine it encounters in the first file it runs, but you can flag any subroutine as the first one to execute with the :main marker:
    .sub 'first'
        say "Polly want a cracker?"
    .end

    .sub 'second' :main
        say "Hello, Polly."
    .end
This code prints out "Hello, Polly." but not "Polly want a cracker?". Though the first function appears first in the source code, second has the :main flag and gets called. first is never called. Revising that program produces different results:
    .sub 'first' :main
        say "Polly want a cracker?"
    .end

    .sub 'second'
        say "Hello, Polly."
    .end
The output now is "Polly want a cracker?". Execution in PIR starts at the :main function and continues until that function ends. To perform other operations, you must call other functions explicitly. Chapter 4 describes subroutines and their uses.
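To make both subroutines run, the :main subroutine has to call the other one itself. The following sketch reuses the two subroutines from the previous example and adds one explicit call, written in the call syntax that the "Subroutine Calls" section describes.

```pir
.sub 'first' :main
    say "Polly want a cracker?"
    'second'()                  # explicit call; without it 'second' never runs
.end

.sub 'second'
    say "Hello, Polly."
.end
```

With the call in place, both lines are printed, the one from 'first' before the one from 'second'.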
Flow Control
Flow control in PIR occurs entirely with conditional and unconditional branches to labels. This may seem simplistic and primitive, but here PIR shows its roots as a thin overlay on the assembly language of a virtual processor. PIR does not support high-level looping structures such as while or for loops. PIR has some support for basic if branching constructs, but does not support more complicated if/then/else branch structures.
The control structures of high-level languages hew tightly to the semantics of those languages; Parrot provides the minimal feature set necessary to implement any semantic of an HLL without dictating how that HLL may implement its features. Language agnosticism is an important design goal in Parrot, and creates a very flexible and powerful development environment for language developers.
The most basic branching instruction is the unconditional branch, goto:
    .sub 'main'
        goto L1
        say "never printed"
      L1:
        say "after branch"
        end
    .end
The first say statement never runs because the goto always skips over it to the label L1.
The conditional branches combine if or unless with goto.
    .sub 'main'
        $I0 = 42
        if $I0 goto L1
        say "never printed"
      L1:
        say "after branch"
        end
    .end
In this example, the goto branches to the label L1 only if the value stored in $I0 is true. The unless statement is similar, but it branches when the tested value is false. You can use PMC and STRING registers with if and unless. The op will call the get_bool vtable entry on any PMC so used and will convert the STRING to a boolean value. An undefined value, 0, or an empty string are all false values. All other values are true.
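As a companion to the if example, here is a minimal sketch of unless applied to a string register. It relies only on the rule stated above that an empty string is a false value; the label name IS_FALSE is arbitrary.

```pir
.sub 'main' :main
    $S0 = ""                      # an empty string is a false value
    unless $S0 goto IS_FALSE      # branches because $S0 tests false
    say "never printed"
  IS_FALSE:
    say "the empty string tested false"
.end
```

Assigning a non-empty string such as "parrot" to $S0 instead would let execution fall through to the first say.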
The comparison operators (<, <=, ==, !=, >, >=) can combine with if ... goto. These branch when the comparison is true:
    .sub 'main'
        $I0 = 42
        $I1 = 43
        if $I0 < $I1 goto L1
        say "never printed"
      L1:
        say "after branch"
        end
    .end
This example compares $I0 to $I1 and branches to the label L1 if $I0 is less than $I1. The if $I0 < $I1 goto L1 statement translates directly to the lt branch operation.
Chapter 11's "PIR Instructions" section summarizes the other comparison operators.
PIR has no special loop constructs. A combination of conditional and unconditional branches handles iteration:
    .sub 'main'
        $I0 = 1               # product
        $I1 = 5               # counter

      REDO:                   # start of loop
        $I0 = $I0 * $I1
        dec $I1
        if $I1 > 0 goto REDO  # end of loop

        say $I0
        end
    .end
This example calculates the factorial 5!. Each time through the loop it multiplies $I0 by the current value of the counter $I1, decrements the counter, and branches to the start of the loop. The loop ends when $I1 counts down to 0. This is a do-while-style loop with the condition test at the end, so the code always runs at least once through the loop.
For a while-style loop with the condition test at the start, use a conditional branch with an unconditional branch:
    .sub 'main'
        $I0 = 1               # product
        $I1 = 5               # counter

      REDO:                   # start of loop
        if $I1 <= 0 goto LAST
        $I0 = $I0 * $I1
        dec $I1
        goto REDO
      LAST:                   # end of loop

        say $I0
        end
    .end
This example tests the counter $I1 at the start of the loop. At the end of the loop, it unconditionally branches back to the start of the loop and tests the condition again. The loop ends when the counter $I1 reaches 0 and the if branches to the LAST label. If the counter isn't a positive number before the loop, the loop will never execute.
You can build any high-level flow control construct from conditional and unconditional branches; the lowest level of computer hardware works this way. All modern programming languages use branching constructs to implement their most complex flow control devices.
That doesn't make complex code easier to write in PIR. Fortunately, a series of macros exist to simplify flow control.
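As one illustration of building a higher-level construct by hand, an if/else can be sketched with one conditional branch and one unconditional branch. The variable n and the labels ODD and DONE are hypothetical names for this example, and the % operator is assumed to be PIR's modulus operator.

```pir
.sub 'main' :main
    .local int n
    n = 7

    $I0 = n % 2            # 1 when n is odd, 0 when n is even (assumes % is modulus)
    if $I0 goto ODD        # the "if" branch
    say "even"             # the "else" body runs when the test is false
    goto DONE              # skip over the "if" body
  ODD:
    say "odd"
  DONE:
    .return ()
.end
```

The unconditional goto DONE is what keeps the two bodies mutually exclusive; leaving it out would let the "else" body fall through into the "if" body.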
Macros
Needs supplementing; needs moving.
Subroutines
The most basic building block of code reuse in PIR is the subroutine. A large program may perform a calculation like "the factorial of a number" several times. Subroutines abstract this behavior into a single, named, stand-alone unit. PIR is a subroutine-based language in that all code in PIR must exist in a subroutine. Execution starts in the :main subroutine, which itself can call other subroutines. Subroutines can fit together into more elaborate chunks of code reusability such as methods and objects.
Parrot supports multiple high-level languages. Each language uses a different syntax for defining and calling subroutines. The goal of PIR is not to be a high-level language in itself, but to provide the basic tools that other languages can use to implement them. PIR's subroutine syntax may seem very primitive for this reason.
Parrot Calling Conventions
The way that Parrot calls a subroutine -- passing arguments, altering control flow, and returning results -- is the "Parrot Calling Conventions" (PCC). Parrot generally hides the details of PCC from the programmer. PIR has several constructs to gloss over these details, and the average programmer will not need to worry about them. PCC uses the Continuation Passing Style (CPS) to pass control to subroutines and back again. Again, the details are irrelevant for most uses, but the power is available to anyone who wants to take advantage of it.
Subroutine Calls
PIR's simplest subroutine call syntax looks much like a subroutine call from a high-level language. This example calls the subroutine fact with two arguments and assigns the result to $I0:
$I0 = 'fact'(count, product)
This simple statement hides much complexity. It generates a subroutine PMC object, creates a continuation PMC object to represent the control flow up to this point, passes the arguments, looks up the subroutine by name (and by signature, if necessary), calls the subroutine, and assigns the results of the call to the appropriate integer register. This is all in addition to the computation the subroutine itself performs.
Expanded Subroutine Syntax
The single line subroutine call is incredibly convenient, but it isn't always flexible enough. PIR also has a more verbose call syntax that is still more convenient than manual calls. This example looks up the subroutine fact in the global symbol table and calls it:
    find_global $P1, "fact"

    .begin_call
      .arg count
      .arg product
      .call $P1
      .result $I0
    .end_call
The whole chunk of code from .begin_call to .end_call acts as a single unit. The .arg directive sets up and passes arguments to the call. The .call directive calls the subroutine and identifies the point at which to return control flow after the subroutine has completed. The .result directive retrieves returned values from the call.
Subroutine Declarations
In addition to syntax for subroutine calls, PIR provides syntax for subroutine definitions: the .sub and .end directives shown in earlier examples. The .param directive defines input parameters and creates local named variables for them (similar to .local):
.param int c
The .return directive allows the subroutine to return control flow to the calling subroutine, and optionally returns result output values.
Here's a complete code example that implements the factorial algorithm. The subroutine fact is a separate subroutine, assembled and processed after the main function. Parrot resolves global symbols like the fact label between different units.
    # factorial.pir
    .sub 'main' :main
       .local int count
       .local int product
       count   = 5
       product = 1

       $I0 = 'fact'(count, product)

       say $I0
       end
    .end

    .sub 'fact'
       .param int c
       .param int p

    loop:
       if c <= 1 goto fin
       p = c * p
       dec c
       branch loop
    fin:
       .return (p)
    .end
This example defines two local named variables, count and product, and assigns them the values 5 and 1. It calls the fact subroutine with both variables as arguments. The fact subroutine uses .param to retrieve these parameters and .return to return the result. The final printed result is 120.
As usual, execution of the program starts at the :main subroutine.
Named Arguments
Arguments passed only by their order are positional arguments. The only differentiator between positional arguments is their position in the function call. Putting positional arguments in a different order will produce different effects, or may cause errors. Parrot also supports named arguments. Instead of being matched to parameters by their position in the argument list, named arguments are passed by name and can be in any order. Here's an example:
    .sub 'MySub'
        .param string yrs :named("age")
        .param string call :named("name")
        $S0 = "Hello " . call
        $S1 = "You are " . yrs
        $S1 .= " years old"
        say $S0
        say $S1
    .end

    .sub 'main' :main
        'MySub'("age" => 42, "name" => "Bob")
    .end
Named arguments are convenient because the order of the pairs does not matter. You can also pass these pairs in the opposite order:
    .sub 'main' :main
        'MySub'("name" => "Bob", "age" => 42)    # Same!
    .end
Optional Arguments
Some functions have arguments with appropriate default values so that callers don't always have to pass them. Parrot provides a mechanism to identify optional arguments as well as flag values to determine if the caller has passed optional arguments.
Optional parameters appear in PIR as if they're actually two parameters: the value and a flag:
    .param string name     :optional
    .param int    has_name :opt_flag
The :optional flag specifies that the given parameter is optional. The :opt_flag flag specifies an integer parameter which contains a boolean flag; this flag is true if the value was passed, and false otherwise. To provide a default value for an optional parameter, you can write:
    .param string name     :optional
    .param int    has_name :opt_flag

    if has_name goto we_have_a_name
    name = "Default value"
  we_have_a_name:
Optional parameters can be positional or named parameters. Optional positional parameters must appear at the end of the list of positional parameters. Also, the :opt_flag parameter must always appear directly after the :optional parameter.
    .sub 'Foo'
        .param int optvalue :optional
        .param int hasvalue :opt_flag
        .param pmc notoptional          # WRONG!
        ...

    .sub 'Bar'
        .param int hasvalue :opt_flag
        .param int optvalue :optional   # WRONG!
        ...

    .sub 'Baz'
        .param int optvalue :optional
        .param pmc notoptional
        .param int hasvalue :opt_flag   # WRONG!
        ...
You may mix optional parameters with named parameters:
    .sub 'MySub'
        .param int value     :named("answer") :optional
        .param int has_value :opt_flag
        ...
You can call this function in two ways:
    'MySub'("answer" => 42)  # with a value
    'MySub'()                # without
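Putting the pieces of this section together, a complete program with an optional positional parameter might look like the following sketch. The subroutine name greet, the label HAVE_NAME, and the default string are hypothetical; the default-value pattern is the one shown earlier in this section.

```pir
.sub 'main' :main
    'greet'("Polly")            # optional argument supplied
    'greet'()                   # optional argument omitted
.end

.sub 'greet'
    .param string name     :optional
    .param int    has_name :opt_flag   # must directly follow the :optional parameter

    if has_name goto HAVE_NAME
    name = "stranger"           # default used only when no argument was passed
  HAVE_NAME:
    $S0 = "Hello, " . name
    say $S0
.end
```

The first call greets Polly by name; the second falls back to the default value.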
Sub PMCs
Subroutines are a PMC type in Parrot. You can store them in PMC registers and manipulate them just as you do the other PMC types. Look up a subroutine in the current namespace with the get_global opcode:
$P0 = get_global "MySubName"
To find a subroutine in a different namespace, first look up the appropriate namespace PMC, then use that with get_global:
    $P0 = get_namespace "MyNameSpace"
    $P1 = get_global $P0, "MySubName"
You can obviously invoke a Sub PMC:
$P0(1, 2, 3)
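The lookup and the invocation can be combined into a small complete program. In this sketch the subroutine name double and its body are hypothetical; the lookup uses the get_global form shown above.

```pir
.sub 'main' :main
    $P0 = get_global 'double'   # fetch the Sub PMC from the current namespace
    $I0 = $P0(21)               # invoke it through the register
    say $I0
.end

.sub 'double'
    .param int n
    $I0 = n * 2
    .return ($I0)
.end
```

Because the Sub lives in an ordinary PMC register, it could equally be passed to another subroutine as an argument before being invoked.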
You can get or even change its name:
    $S0 = $P0               # Get the current name
    $P0 = "MyNewSubName"    # Set a new name
You can get a hash of the complete metadata for the subroutine:
$P1 = inspect $P0
The metadata fields in this hash are:
- pos_required: the number of required positional parameters to the Sub
- pos_optional: the number of optional positional parameters to the Sub
- named_required: the number of required named parameters to the Sub
- named_optional: the number of optional named parameters to the Sub
- pos_slurpy: true if the sub has a slurpy parameter to eat up extra positional args
- named_slurpy: true if the sub has a slurpy parameter to eat up extra named args
Instead of getting the whole inspection hash, you can ask about individual pieces of metadata:
$I0 = inspect $P0, "pos_required"
To get the total number of defined parameters to the Sub, call the arity method:
$I0 = $P0.'arity'()
To fetch the namespace PMC that the Sub was defined into, call the get_namespace method:
$P1 = $P0.'get_namespace'()
The Commandline
Programs written in Parrot have access to arguments passed on the command line like any other program would:
    .sub 'MyMain' :main
        .param pmc all_args :slurpy
        ...
    .end
The all_args PMC is a ResizableStringArray PMC, which means you can loop over the results, access them individually, or even modify them.
Continuation Passing Style
Continuations are snapshots of control flow: frozen images of the current execution state of the VM. Once you have a continuation, you can invoke it to return to the point where the continuation was first created. It's like a magical timewarp that allows the developer to arbitrarily move control flow back to any previous point in the program.
Continuations are not a new concept; they've boggled the minds of new Lisp and Scheme programmers for many years. Despite their power and heritage, they're not visible to most other modern programming languages or their runtimes. Parrot aims to change that: it performs almost all control flow through the use of continuations. PIR and PCT hide most of this complexity from developers, but the full power of continuations is available.
When Parrot invokes a function, it creates a continuation representing the current point in the program. It passes this continuation as an invisible parameter to the function call. When that function returns, it invokes the continuation -- in effect, it performs a goto to the point of creation of that continuation. If you have a continuation, you can invoke it to return to its point of creation any time you want.
This type of flow control -- invoking continuations instead of performing bare jumps -- is Continuation Passing Style (CPS). CPS allows Parrot to offer all sorts of neat features, such as tail-call optimizations and lexical subroutines.
Tailcalls
A subroutine may set up and call another subroutine, then return the result of the second call directly. This is a tailcall, and is an important opportunity for optimization. Here's a contrived example in pseudocode:
    call add_two(5)

    subroutine add_two(value)
      value = add_one(value)
      return add_one(value)
In this example, the subroutine add_two makes two calls to add_one. The second call to add_one is the return value. add_one gets called; its result gets returned to the caller of add_two. Nothing in add_two uses that return value directly.
A simple optimization is available for this type of code. The second call to add_one will return to the same place that add_two returns; it's perfectly safe and correct to use the same return continuation that add_two uses. The two subroutine calls can share a return continuation, instead of having to create a new continuation for each call.
PIR provides the .tailcall directive to identify similar situations. Use it in place of the .return directive. .tailcall performs this optimization by reusing the return continuation of the parent function to make the tailcall:
    .sub 'main' :main
        .local int value
        value = add_two(5)
        say value
    .end

    .sub 'add_two'
        .param int value
        .local int val2
        val2 = add_one(value)
        .tailcall add_one(val2)
    .end

    .sub 'add_one'
        .param int a
        .local int b
        b = a + 1
        .return (b)
    .end
This example prints the correct value, 7.
Creating and Using Continuations
While Parrot's use of continuations and CPS is invisible to most code, you can create and use them explicitly if you like. Continuations are like any other PMC. Create one with the new opcode:
$P0 = new 'Continuation'
The new continuation starts off in an undefined state. If you attempt to invoke a new continuation without initializing it, Parrot will raise an exception. To prepare the continuation for use, assign it a destination label with the set_addr opcode:
    $P0 = new 'Continuation'
    set_addr $P0, my_label

  my_label:
    ...
To jump to the continuation's stored label and return the context to the state it was in at the point of its creation, invoke the continuation:
invoke $P0 # Explicit using "invoke" opcode $P0() # Same, but nicer syntax
Even though you can use the subroutine notation $P0()
to invoke the continuation, it doesn't make any sense to pass arguments or obtain return values:
$P0 = new 'Continuation' set_addr $P0, my_label $P0(1, 2) # WRONG! $P1 = $P0() # WRONG!
Lexical Subroutines
Parrot offers support for lexical subroutines. You can define a subroutine by name inside a larger subroutine, where the inner subroutine is only visible and callable from the outer. The inner subroutine inherits all the lexical variables from the outer subroutine, but can itself define its own lexical variables that the outer subroutine cannot access. PIR lacks the concept of blocks or nested lexical scopes; this is how it performs the same function.
If a subroutine is lexical, find its :outer Sub with the get_outer method:
$P1 = $P0.'get_outer'()
If there is no :outer PMC, this returns a NULL PMC. Conversely, you can set the outer sub:
$P0.'set_outer'($P1)
Scope and HLLs
As mentioned previously, High Level Languages such as Perl, Python, and Ruby allow nested scopes, or blocks within blocks that can have their own lexical variables. This construct is common even in the C programming language:
    {
        int x = 0;
        int y = 1;
        {
            int z = 2;
            /* x, y, and z are all visible here */
        }
        /* only x and y are visible here */
    }
In the inner block, all three variables are visible. The variable z is only visible inside that block. The outer block has no knowledge of z. A very direct, naïve translation of this code to PIR might be:
    .local int x
    .local int y
    .local int z
    x = 0
    y = 1
    z = 2
    ...
This PIR code is similar, but the handling of the variable z is different: z is visible throughout the entire current subroutine, whereas in the C code it is visible only inside the inner block. To help approximate this effect, PIR supplies lexical subroutines to create nested lexical scopes.
PIR Scoping
Only one PIR structure supports scoping like this: the subroutine (and objects that inherit from subroutines, such as methods, coroutines, and multisubs). There are no blocks in PIR that have their own scope besides subroutines. Fortunately, we can use lexical subroutines to simulate the behavior that HLLs require:
    .sub 'MyOuter'
        .local int x,y
        .lex 'x', x
        .lex 'y', y
        'MyInner'()
        # only x and y are visible here
    .end

    .sub 'MyInner' :outer('MyOuter')
        .local int z
        .lex 'z', z
        # x, y, and z are all "visible" here
    .end
This example calls the variables in MyInner "visible" in quotes because lexically-defined variables must be accessed with the find_lex and store_lex opcodes. These two opcodes don't just access the value of a register, where the value is stored while it's being used; they also interact with the LexPad PMC that stores the data. If a value isn't properly stored in the LexPad, it won't be available in nested inner subroutines, or from :outer subroutines either.
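As a minimal sketch of this point, the following pair of subs shares one lexical through the LexPad. It assumes that the find_lex and store_lex opcodes behave as described in this chapter, that the lexical is held in a PMC register, and that an :outer sub called directly from its outer sub can see that sub's lexicals. The names 'outer_sub', 'inner_sub', and 'counter' are hypothetical and chosen only for illustration.

```pir
.sub 'outer_sub' :main
    .local pmc counter
    counter = new 'Integer'
    counter = 1
    .lex 'counter', counter      # register 'counter' backs the lexical

    'inner_sub'()                # inner sub updates the lexical

    $P0 = find_lex 'counter'     # look the lexical up again by name
    say $P0
.end

.sub 'inner_sub' :outer('outer_sub')
    # 'counter' is not declared here; find_lex searches the outer LexPad
    $P0 = find_lex 'counter'
    $I0 = $P0
    inc $I0

    # store a fresh PMC back under the same lexical name
    $P1 = new 'Integer'
    $P1 = $I0
    store_lex 'counter', $P1
.end
```

The inner sub never touches the outer sub's registers directly; it reaches 'counter' only by name, which is why the lookup has to go through the lexical opcodes.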
Lexical Variables
As we have seen above, we can declare a new subroutine to be a nested inner subroutine of an existing outer subroutine using the :outer flag. The :outer flag specifies the name of the outer subroutine. Where there may be multiple subroutines with the same name, we can use the :subid flag on the outer subroutine to give it a different--and unique--name that the lexical subroutines can reference in their :outer declarations. Within lexical subroutines, the .lex directive defines a local variable that follows these scoping rules.
LexPad and LexInfo PMCs
Subs store information about lexical variables in two different PMCs: the LexPad PMC and the LexInfo PMC. They're not visible to PIR code; Parrot uses them internally to store information about lexical variables.
LexInfo PMCs store read-only information about the lexical variables used in a Sub. Parrot creates them when it compiles a Sub. Not all subroutines get a LexInfo PMC by default; only those that need it. One way to identify such a Sub is its use of the .lex directive, but this only works for languages which know the names of lexical variables at compile time. If that's not true of your language, declare the Sub with the :lex attribute.
LexPad PMCs store run-time information about lexical variables. This includes their current values and their type information. Parrot creates LexPad PMCs for Subs that already have a LexInfo PMC. As you can imagine, Parrot must create a new LexPad for each invocation of a Sub, lest a recursive call overwrite previous lexical values.
Call the get_lexinfo method on a Subroutine PMC to access its associated LexInfo PMC:
    $P0 = get_global "MySubroutine"
    $P1 = $P0.'get_lexinfo'()
Once you have the LexInfo PMC, you can inspect the lexicals it represents:
    $I0 = elements $P1    # the number of lexical variables it holds
    $P0 = $P1["name"]     # the entry for lexical variable "name"
There's not much else you can do (though the PMC behaves like a Hash PMC, so you can iterate over its keys and values).
There is no easy way to get a reference to the current LexPad PMC in a given subroutine, but they're not useful from PIR anyway. Remember that subroutines themselves can be lexical; the lexical environment of a given variable can extend to multiple subroutines with their own LexPads. The opcodes find_lex and store_lex search through nested LexPads recursively to find the proper environment for the given variables.
Compilation Units Revisited
A subroutine is a section of code that forms a single unit. The factorial calculation example had two separate subroutines: the main subroutine and the fact subroutine. Here is that algorithm in a single subroutine:
    .sub 'main'
        $I1 = 5           # counter
        bsr fact
        say $I0
        $I1 = 6           # counter
        bsr fact
        say $I0
        end

      fact:
        $I0 = 1           # product
      L1:
        $I0 = $I0 * $I1
        dec $I1
        if $I1 > 0 goto L1
        ret
    .end
The unit of code from the fact label definition to ret is a reusable routine, but is only usable from within the main subroutine. There are several problems with this simple approach. In terms of the interface, the caller has to know to pass the argument to fact in $I1 and to get the result from $I0. This differs from Parrot's well-understood calling conventions.
Another disadvantage of this approach is that main and fact share the same subroutine. Parrot processes them as one piece of code. They share registers, and they'd share any LexInfo and LexPad PMCs, if any were needed by main. The fact routine is also not easily usable from outside the main subroutine, so other parts of your code won't have access to it.
NameSpaces, Methods, and VTABLES
PIR provides syntax to simplify writing methods and method calls for object-oriented programming. PIR allows you to define your own classes, and with those classes you can define method interfaces to them. Method calls follow Parrot's calling conventions, including the various parameter configurations, lexical scoping, and other aspects already shown.
Parrot supports several built-in classes, such as ResizablePMCArray and Integer, written in C and compiled with the rest of Parrot. You may also declare your own classes in PIR. Like other object-oriented systems, Parrot classes provide their own namespaces and support methods and attributes.
NameSpaces
NameSpaces provide a categorization mechanism to avoid name collisions. This is most useful when producing encapsulated libraries or when building large systems. Each namespace provides a separate location for function names and global variables.
Without a namespace (or in a program that eschews namespaces), all subroutines and global variables would live in one big bag, running the risk of namespace collisions thanks to namespace pollution. You couldn't tell which subroutine performed which operation when two task contexts use the same word to mean two different things.
NameSpaces are very effective at hiding private information as well as gathering similar things together.
For example, the Math namespace could store subroutines that manipulate numbers. The Images namespace could store subroutines that create and manipulate images. If your program must add two numbers together and perform additive image composition, you can use the appropriate namespaced add functions without conflict or confusion. Within the Images namespace you could also have sub-namespaces for jpeg and MRI and schematics; each of these could have its own add subroutine without getting in each other's way.
Declare a namespace in PIR with the .namespace [] directive. The brackets are not optional, but the keys inside them are. For example:
    .namespace [ ]               # The root namespace
    .namespace [ "Foo" ]         # The namespace "Foo"
    .namespace [ "Foo" ; "Bar" ] # NameSpace Foo::Bar
    .namespace                   # WRONG! The [] are needed
You may nest namespaces to arbitrary depth by separating name components with semicolons. NameSpaces are PMCs, so you can access them and manipulate them just like other data objects. Get the PMC for the root namespace using the get_root_namespace opcode:
$P0 = get_root_namespace
The current namespace may be different from the root namespace; retrieve it with the get_namespace opcode:
    $P0 = get_namespace             # get current namespace PMC
    $P0 = get_namespace ["Foo"]     # get PMC for namespace "Foo"
Parrot arranges its namespaces in a hierarchy. The root namespace is the root of the tree. Beneath the root are HLL namespaces; each HLL compiler gets its own HLL namespace where it can store its data during compilation and runtime. Each HLL namespace may itself be the root of a tree of namespaces.
NameSpace PMC
There are multiple ways to address a namespace in PIR, depending on the starting point of the lookup. They may start at the root namespace:
    # Get namespace "a/b/c" starting at the root namespace
    $P0 = get_root_namespace ["a" ; "b" ; "c"]
... or from the current HLL's namespace as a root:
    # Get namespace "a/b/c" starting in the current HLL namespace.
    $P0 = get_hll_namespace ["a" ; "b" ; "c"]
... but this is identical to a root namespace lookup with the HLL as the first branch:
$P0 = get_root_namespace ["hll" ; "a" ; "b" ; "c"]
... and, of course, relative to the current namespace without a root:
    # Get namespace "a/b/c" starting in the current namespace
    $P0 = get_namespace ["a" ; "b" ; "c"]
Given a namespace PMC, retrieve global variables and subroutine PMCs with the get_global opcode:
    $P1 = get_global $S0            # Get global in current namespace
    $P1 = get_global ["Foo"], $S0   # Get global in namespace "Foo"
    $P1 = get_global $P0, $S0       # Get global in $P0 namespace PMC
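To tie the .namespace directive and the get_global opcode together, here is a minimal sketch. The namespace name "Math" follows the earlier example in the text; the sub name 'add_ints' is hypothetical. It assumes that a sub defined under .namespace [ "Math" ] is stored in that namespace under its own name, as described above.

```pir
.namespace [ "Math" ]

.sub 'add_ints'
    .param int a
    .param int b
    $I0 = a + b
    .return ($I0)
.end

.namespace [ ]

.sub 'main' :main
    # Look the sub up by namespace key and name, then call it
    $P0 = get_global [ "Math" ], "add_ints"
    $I0 = $P0(2, 3)
    say $I0
.end
```

A different namespace could define its own 'add_ints' without colliding with this one, since the lookup always names the namespace it wants.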
Operations on the NameSpace PMC
You can perform other operations on the NameSpace PMC. You can find methods and variables that are stored in the namespace or add new ones.
For example, to add one namespace to the current namespace, use the add_namespace method:
    $P0 = get_namespace
    $P0.'add_namespace'($P1)
You can also find a namespace nested in a namespace with the find_namespace method. Note that this finds only a namespace, whereas the get_global opcode will find any PMC stored in that namespace under the given name:
    $P0 = get_namespace
    $P1 = $P0.'find_namespace'("MyOtherNameSpace")
You may also wish to create a namespace if it doesn't exist and find it otherwise. That's the purpose of the make_namespace method:
$P1 = $P0.'make_namespace'("MyNameSpace")
To manipulate Sub PMCs in a namespace, use the add_sub and find_sub methods. As with find_namespace, find_sub returns only a Sub PMC and never any other kind of global symbol:
    $P0.'add_sub'("MySub", $P2)
    $P1 = $P0.'find_sub'("MySub")
Similarly, the add_var and find_var methods work on PMCs of any type:
    $P0.'add_var'("MyVar", $P3)    # Add variable "MyVar" in $P3
    $P1 = $P0.'find_var'("MyVar")  # Find it
You can get the name of a namespace with the get_name method; this returns a ResizableStringArray of STRINGs:
$P3 = $P0.'get_name'()
Request a namespace's parent namespace with the get_parent method:
    $P5 = $P0.'get_parent'()
Find a class associated with a namespace with the get_class method:
$P6 = $P0.'get_class'()
Calling Methods
Namespaces enable plenty of interesting behaviors, such as object-oriented programming and method calls. Methods resemble subroutines with one big change: they require an invocant (an object PMC passed as the self parameter).
The basic syntax for a method call resembles a subroutine call. Previous examples have demonstrated it already. A PIR method call takes a variable for the invocant PMC and a string with the name of the method:
object."methodname"(arguments)
If you forget the quotes around the method's name, PIR will treat the method name as a named variable which contains the method's name:
    .local string methname
    methname = "Foo"
    object.methname()               # Same as object."Foo"()
    object."Foo"()                  # Same
The invocant can be a variable or register, and the method name can be a literal string, string variable, or method object PMC.
Defining Methods
Define a method like any other subroutine, respecting two changes. First, a method must be a member of a namespace (the namespace representing the class to which the method belongs). Second, it requires the :method flag.
    .namespace [ "MyClass" ]

    .sub 'MyMethod' :method
        ...
    .end
Inside the method, access the invocant object through the self parameter. self isn't the only name you can give this value, however. You can also use the :invocant flag to define a new name for the invocant object:
    .sub 'MyMethod' :method
        $S0 = self                    # Already defined as "self"
        say $S0
    .end

    .sub 'MyMethod2' :method
        .param pmc item :invocant     # "self" is now "item"
        $S0 = item
        say $S0
    .end
This example defines two methods in the Foo class. It calls one from the main body of the subroutine and the other from within the first method:
    .sub 'main'
        .local pmc class
        .local pmc obj
        newclass class, "Foo"       # create a new Foo class
        new obj, "Foo"              # instantiate a Foo object
        obj."meth"()                # call obj."meth", which is actually
                                    # in the "Foo" namespace
        say "done"
        end
    .end

    .namespace [ "Foo" ]            # start namespace "Foo"

    .sub 'meth' :method             # define Foo::meth global
        say "in meth"
        $S0 = "other_meth"          # method names can be in a register too
        self.$S0()                  # self is the invocant
    .end

    .sub 'other_meth' :method       # define another method
        say "in other_meth"         # as earlier, Parrot provides a return
    .end                            # statement
Each method call looks up the method name in the object's class namespace. The .sub directive automatically makes a symbol table entry for the subroutine in the current namespace.
You can pass multiple arguments to a method and retrieve multiple return values just like a single line subroutine call:
(res1, res2) = obj."method"(arg1, arg2)
VTABLEs
PMCs all implement a common interface of functions called VTABLEs. Every PMC implements the same set of these interfaces, which perform very specific low-level tasks on the PMC. The term VTABLE was originally a shortened form of the name "virtual function table", although the developers and the documentation no longer use that name. (In fact, if you say "virtual function table" to one of the developers, they probably won't know what you are talking about.) The virtual functions in the VTABLE, called VTABLE interfaces, are similar to ordinary functions and methods in many respects. VTABLE interfaces are occasionally called "VTABLE functions", "VTABLE methods", or even "VTABLE entries" in casual conversation.
A quick comparison shows that VTABLE interfaces are not really subroutines or methods in the way that those terms have been used throughout the rest of Parrot. Like methods on an object, VTABLE interfaces are defined for a specific class of PMC, and can be invoked on any member of that class. Likewise, in a VTABLE interface declaration, the self keyword describes the object that it is invoked upon. That's where the similarities end, however. Unlike ordinary subroutines or methods, VTABLE methods cannot be invoked directly, and they are not inherited through class hierarchies the way methods are. With this terminology out of the way, we can start talking about what VTABLEs are and how they are used in Parrot.
VTABLE interfaces are the primary way that data in the PMC is accessed and modified. VTABLEs also provide a way to invoke the PMC if it's a subroutine or subroutine-like PMC. VTABLE interfaces are not called directly from PIR code, but are instead called internally by Parrot to implement specific opcodes and behaviors. For instance, the invoke opcode calls the invoke VTABLE interface of the subroutine PMC, while the inc opcode on a PMC calls the increment VTABLE interface on that PMC. In essence, VTABLE interface overrides allow the programmer to change the way that Parrot accesses PMC data at the most fundamental level, and with it the way that opcodes act on that data.
PMCs, as we will look at more closely in later chapters, are typically implemented using PMC Script, a layer of syntax and macros over ordinary C code. A PMC compiler program converts the PMC files into C code for compilation as part of the ordinary build process. However, VTABLE interfaces can be written and overridden in PIR using the :vtable flag on a subroutine declaration. This technique is used most commonly when subclassing an existing PMC class in PIR code to create a new data type with custom access methods.
VTABLE interfaces are declared with the :vtable flag:
    .sub 'set_integer' :vtable
        # set the integer value of the PMC here
    .end
In this case the subroutine must have the same name as the VTABLE interface it is intended to implement. VTABLE interfaces all have very specific names, and you can't override one with just any arbitrary name. However, if you would like to name the function something different but still use it as a VTABLE interface, you can add an additional name parameter to the flag:
    .sub 'MySetInteger' :vtable('set_integer')
        # set the integer value of the PMC here
    .end
VTABLE interfaces are often given the :method flag also, so that they can be used directly in PIR code as methods, in addition to being used by Parrot as VTABLE interfaces. This means we can have the following:
    .namespace [ "MyClass" ]

    .sub 'ToString' :vtable('get_string') :method
        $S0 = "hello!"
        .return($S0)
    .end

    .namespace [ "OtherClass" ]

    .sub 'example'
        .local pmc myclass
        myclass = new "MyClass"
        say myclass                 # say converts to string internally
        $S0 = myclass               # Convert to a string, store in $S0
        $S0 = myclass.'ToString'()  # The same
    .end
Inside a VTABLE interface definition, the self local variable contains the PMC on which the VTABLE interface is invoked, just like in a method declaration.
Roles
As we've seen above and in the previous chapter, Class PMCs and NameSpace PMCs work to keep classes and methods together in a logical way. There is another factor to add to this mix: The Role PMC.
Roles are like classes, but don't stand on their own. They represent collections of methods and VTABLEs that can be added into an existing class. Adding a role to a class is called composing that role, and any class that has been composed with a role does that role.
Roles are created as PMCs and can be manipulated through opcodes and methods like other PMCs:
    $P0 = new 'Role'
    $P1 = get_global "MyRoleSub"
    $P0.'add_method'("MyRoleSub", $P1)
Once we've created a role and added methods to it, we can add that role to a class, or even to another role:
    $P1 = new 'Role'
    $P2 = new 'Class'
    $P1.'add_role'($P0)
    $P2.'add_role'($P0)
    addrole $P2, $P0    # Same!
Now that we have added the role, we can check whether we implement it:
$I0 = does $P2, $P0 # Yes
We can get a list of roles from our Class PMC:
$P3 = $P2.'roles'()
Roles are very useful for ensuring that related classes all implement a common interface.
Coroutines
We've mentioned coroutines several times before, and we're finally going to explain what they are. Coroutines are similar to subroutines except that they have an internal notion of state (and a cool new name). Coroutines, in addition to performing a normal .return to return control flow back to the caller and destroy the lexical environment of the subroutine, may also perform a .yield operation. .yield returns a value to the caller like .return can, but it does not destroy the lexical state of the coroutine. The next time the coroutine is called, it continues execution from the point of the last .yield, not at the beginning of the coroutine.
In a coroutine, when we continue from a .yield, the entire lexical environment is the same as it was when .yield was called. This means that the parameter values don't change, even if we call the coroutine with different arguments later.
Defining Coroutines
Coroutines are defined like any ordinary subroutine. They do not require any special flag or any special syntax to mark them as being a coroutine. However, what sets them apart is the use of the .yield directive. .yield plays several roles:
- Identifies coroutines
- Creates a continuation
- Returns a value
When Parrot sees a yield, it knows to create a Coroutine PMC object instead of a Sub PMC.
Continuations, as we have already seen, allow us to continue execution at the point of the continuation later. A continuation is like a snapshot of the current execution environment. .yield creates a continuation in the coroutine and stores the continuation object in the coroutine object for later resuming from the point of the .yield.
.yield can return one value, many values, or no values to the caller. It is basically the same as a .return in this regard.
Here is a quick example of a simple coroutine:
    .sub 'MyCoro'
        .yield(1)
        .yield(2)
        .yield(3)
        .return(4)
    .end

    .sub 'main' :main
        $I0 = MyCoro()    # 1
        $I0 = MyCoro()    # 2
        $I0 = MyCoro()    # 3
        $I0 = MyCoro()    # 4
        $I0 = MyCoro()    # 1
        $I0 = MyCoro()    # 2
        $I0 = MyCoro()    # 3
        $I0 = MyCoro()    # 4
        $I0 = MyCoro()    # 1
        $I0 = MyCoro()    # 2
        $I0 = MyCoro()    # 3
        $I0 = MyCoro()    # 4
    .end
This is obviously a contrived example, but it demonstrates how the coroutine stores its state. The coroutine stores its state when we reach a .yield directive, and when the coroutine is called again it picks up where it last left off. Coroutines also handle parameters in a way that might not be intuitive. Here's an example of this:
    .sub 'StoredConstant'
        .param int x
        .yield(x)
        .yield(x)
        .yield(x)
    .end

    .sub 'main' :main
        $I0 = StoredConstant(5)       # $I0 = 5
        $I0 = StoredConstant(6)       # $I0 = 5
        $I0 = StoredConstant(7)       # $I0 = 5
        $I0 = StoredConstant(8)       # $I0 = 8
    .end
Notice how even though we are calling the StoredConstant coroutine with different arguments each time, the value of parameter x doesn't change until the coroutine's state resets after the last .yield. Remember that a continuation takes a snapshot of the current state, and the .yield directive takes a continuation. The next time we call the coroutine, it invokes the continuation internally, and returns us to the exact same place in the exact same condition as we were in when we called the .yield. In order to reset the coroutine and enable it to take a new parameter, we must either execute a .return directive or reach the end of the coroutine.
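Because a coroutine keeps its local state between calls, it can act as a simple generator. The following is a minimal sketch under the behavior described above; the sub name 'next_id' and the local 'id' are hypothetical. The coroutine never executes a .return, so it never resets, and each call is expected to resume just after the previous .yield with 'id' intact.

```pir
.sub 'next_id'
    .local int id
    id = 0
  again:
    inc id
    .yield (id)          # hand the current value back, keep our state
    goto again           # resumed here on the next call
.end

.sub 'main' :main
    $I0 = next_id()
    say $I0
    $I0 = next_id()
    say $I0
    $I0 = next_id()
    say $I0
.end
```

Compare this with MyCoro above: there, the final .return resets the coroutine, which is why its sequence of values starts over.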
Multiple Dispatch
Multiple dispatch allows multiple subroutines with the same name to share a single namespace. These subroutines must differ, however, in their parameter lists, or "signatures". Parrot puts all subs with the same name into a single PMC called a MultiSub, which acts like a list of subroutines. When the MultiSub is invoked, it searches through that list for the subroutine with the closest matching signature. The best match is the sub that gets invoked.
Defining MultiSubs
MultiSubs are subroutines with the :multi flag applied to them. MultiSubs (also called "Multis") must all differ from one another in the number and/or type of arguments passed to the function. Having two multisubs with the same function signature could result in a parsing error, or the later function could overwrite the former one in the multi.
Multisubs are defined like this:
    .sub 'MyMulti' :multi
        # does whatever a MyMulti does
    .end
Multis belong to a specific namespace. Functions in different namespaces with the same name do not conflict with each other. It's only when multiple functions in a single namespace need to have the same name that a multi is used.
Multisubs take a special designator called a multi signature. The multi signature tells Parrot what particular combination of input parameters the multi accepts. Each multi will have a different signature, and Parrot will be able to dispatch to each one depending on the arguments passed. The multi signature is specified in the :multi flag:
    .sub 'Add' :multi(I, I)
        .param int x
        .param int y
        $I0 = x + y
        .return($I0)
    .end

    .sub 'Add' :multi(N, N)
        .param num x
        .param num y
        $N0 = x + y
        .return($N0)
    .end

    .sub 'Start' :main
        $I0 = Add(1, 2)          # 3
        $N0 = Add(3.14, 2.0)     # 5.14
        $S0 = Add("a", "b")      # ERROR! No (S, S) variant!
    .end
Multis can take I, N, S, and P types, but they can also use _ (underscore) to denote a wildcard, and a string that can be the name of a particular PMC type:
    .sub 'Add' :multi(I, I)          # Two integers
      ...

    .sub 'Add' :multi(I, 'Float')    # An integer and Float PMC
      ...

    .sub 'Add' :multi('Integer', _)  # An Integer PMC and any other argument
      ...
When we call a multi PMC, Parrot tries the most specific variant first and falls back to more general variants if no exact match exists. So if we call 'Add'(1, 2), Parrot will dispatch to the (I, I) variant. If we call 'Add'(1, "hi"), Parrot will match the ('Integer', _) variant, since the string in the second argument doesn't match I or 'Float' but does match the _ wildcard. This works because Parrot can also choose to automatically promote one of the I, N, or S values to an Integer, Float, or String PMC.
To decide which multi variant to call, Parrot computes a Manhattan distance between the argument signature of the call and the multi signature of each variant. Every difference counts as one step. A difference can be an autobox from a primitive type to a PMC, the conversion from one primitive type to another, or the matching of an argument to a _ wildcard. After Parrot calculates the distance to each variant, it calls the function with the lowest distance. Notice that it's possible to define a variant that is impossible to call, because for every potential combination of arguments there is a better match. This isn't necessarily a common occurrence, but it's something to watch out for in systems with a lot of multis and a limited number of data types in use.
Classes and Objects
It may seem more appropriate for a discussion of PIR's support for classes and objects to reside in its own chapter, instead of appearing in a generic chapter about PIR programming "basics". However, part of PIR's core functionality is its support for object-oriented programming. PIR doesn't have all the fancy syntax of other OO languages, and it doesn't even support all the features that most modern OO languages have. What PIR does have is support for some of the basic structures and abilities, the necessary subset to construct richer and higher-level object systems.
PMCs as Classes
PMCs aren't exactly "classes" in the way that this term is normally used in object-oriented programming languages. They are polymorphic data items that can be one of a large variety of predefined types. As we have seen briefly, and as we will see in more depth later, PMCs have a standard interface called the VTABLE interface. VTABLEs are a standard list of functions that all PMCs implement. (Alternately, PMCs can choose not to implement each interface explicitly and instead let Parrot call the default implementations.)
VTABLEs are very strict: there are a fixed number with fixed names and fixed argument lists. You can't create any arbitrary VTABLE interface that you want; you can only make use of the ones that Parrot supplies and expects. To circumvent this limitation, PMCs may have METHODs in addition to VTABLEs. METHODs are arbitrary code functions that can be written in C, may have any name, and may implement any behavior.
VTABLE Interfaces
Internally, all operations on PMCs are performed by calling various VTABLE interfaces.
Class and Object PMCs
The details about various PMC classes are managed by the Class PMC. Class PMCs contain information about the class, available methods, the inheritance hierarchy of the class, and various other details. Classes can be created with the newclass opcode:
    $P0 = newclass "MyClass"
Once we have created the class PMC, we can instantiate objects of that class using the new opcode. The new opcode takes either the class name or the Class PMC as an argument:
    $P1 = new $P0        # $P0 is the Class PMC
    $P2 = new "MyClass"  # Same
The new opcode can create two different types of PMC. The first type is the built-in core PMC classes. The built-in PMCs are written in C and cannot be extended from PIR without subclassing. However, you can also create user-defined PMC types in PIR. User-defined PMCs use the Object PMC type for instantiation. Object PMCs are used for all user-defined types and keep track of the methods and VTABLE override definitions. We're going to talk about methods and VTABLE overrides in the next chapter.
Subclassing PMCs
Existing built-in PMC types can be subclassed to associate additional data and methods with that PMC type. Subclassed PMC types act like their PMC base types, by sharing the same VTABLE methods and underlying data types. However, the subclass can define additional methods and attribute data storage. If necessary, new VTABLE interfaces can be defined in PIR and old VTABLE methods can be overridden using PIR. We'll talk about defining methods and VTABLE interface overrides in the next chapter.
Creating a new subclass of an existing PMC class is done using the subclass opcode:
    # create an anonymous subclass
    $P0 = subclass 'ResizablePMCArray'

    # create a subclass named "MyArray"
    $P0 = subclass 'ResizablePMCArray', 'MyArray'
This returns a Class PMC which can be used to create and modify the class by adding attributes or creating objects of that class. You can also use the new class PMC to create additional subclasses:
    $P0 = subclass 'ResizablePMCArray', 'MyArray'
    $P1 = subclass $P0, 'MyOtherArray'
Once you have created these classes, you can instantiate them as usual with the new opcode:
    $P0 = new 'MyArray'
    $P1 = new 'MyOtherArray'
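As a rough sketch of how a subclass can add behavior while keeping the base type's VTABLE behavior, the following example subclasses ResizablePMCArray and adds one method. The method name 'count' is hypothetical, and the sketch assumes that the push and elements opcodes work on the subclass exactly as they do on a plain ResizablePMCArray.

```pir
.sub 'main' :main
    # Create the subclass, then an instance of it
    $P0 = subclass 'ResizablePMCArray', 'MyArray'
    $P1 = new 'MyArray'

    # Inherited array behavior comes from the base type
    push $P1, 10
    push $P1, 20

    # The new method comes from the MyArray namespace below
    $I0 = $P1.'count'()
    say $I0
.end

.namespace [ 'MyArray' ]

.sub 'count' :method
    $I0 = elements self      # self is the MyArray instance
    .return ($I0)
.end
```

The method lives in the namespace named after the subclass, which is how the method call on the instance finds it.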
Attributes
Classes and subclasses can be given attributes, which are named data fields. Attributes are created with the addattribute opcode, and can be set and retrieved with the setattribute and getattribute opcodes respectively:
    # Create the new class with two attributes
    $P0 = newclass 'MyClass'
    addattribute $P0, 'First'
    addattribute $P0, 'Second'

    # Create a new item of type MyClass
    $P1 = new 'MyClass'

    # Attribute values are PMCs, so box the strings first
    $P2 = new 'String'
    $P2 = 'First Value'
    setattribute $P1, 'First', $P2
    $P3 = new 'String'
    $P3 = 'Second Value'
    setattribute $P1, 'Second', $P3

    # Get the attribute values as PMCs, then unbox them
    $P4 = getattribute $P1, 'First'
    $S0 = $P4
    $P5 = getattribute $P1, 'Second'
    $S1 = $P5
The values stored in attributes don't need to be strings, even though both of the ones in the example are. Attributes hold PMCs, so integers and numbers can be stored too once they are boxed in Integer or Float PMCs, as can any other PMC type.
Input and Output
Like almost everything else in Parrot, input and output are handled by PMCs. Using the print
opcode or the say
opcode like we've already seen in some examples does this internally without your knowledge. However, we can do it explicitly too. First we'll talk about basic I/O, and then we will talk about using PMC-based filehandles for more advanced operations.
Basic I/O Opcodes
We've seen print and say. These are carry-over artifacts from Perl, when Parrot was simply the VM backend to the Perl 6 language. print prints the given string argument, or the stringified form of the argument if it's not a string, to standard output. say does the same thing but also appends a trailing newline. Another opcode worth mentioning is printerr, which prints an argument to the standard error output instead.
We can read values from the standard input using the read and readline ops. read takes an integer value and returns a string with that many characters. readline reads an entire line of input from the standard input, and returns the string without the trailing newline. Here is a simple echo program that reads in characters from the user and echoes them to standard output:
.sub 'main'
  loop_top:
    $S0 = read 10
    print $S0
    goto loop_top
.end
Filehandles
The ops we have seen so far are useful if all your I/O operations are limited to the standard streams. However, there are plenty of other places you might want to read data from or send data to, such as files, sockets, and databases. All of these can be accessed through a filehandle.
Filehandles are PMCs that describe a file and keep track of an I/O operation's internal state. We can get filehandles for the standard streams using dedicated opcodes:
$P0 = getstdin     # Standard input handle
$P1 = getstdout    # Standard output handle
$P2 = getstderr    # Standard error handle
If we have a file, we can create a handle to it using the open op:
$P0 = open "my/file/name.txt"
We can also specify the exact mode that the file handle will be in:
$P0 = open "my/file/name.txt", "wa"
The mode string at the end should be familiar to C programmers, because its values are mostly the same:
r  : read
w  : write
wa : append
p  : pipe
So if we want a handle that we can both read from and write to, we use the mode string "rw". If we want to be able to read and write, but we don't want write operations to overwrite the existing contents, we use "rwa" instead.
When we are done with a filehandle that we've created, we can shut it down with the close op. Note that we should not close any of the standard streams.
close $P0
With a filehandle, we can perform all the same operations as we could earlier, but we pass the filehandle as an additional argument to tell the op where to write the data to or read it from.
print "hello"                  # Write "hello" to STDOUT
$P0 = getstdout
print $P0, "hello"             # Same, but more explicit
say $P0, " world!"             # say to STDOUT
$P1 = open "myfile.txt", "wa"
print $P1, "foo"               # Write "foo" to myfile.txt
Filehandle PMCs
Let's see a little example of a program that reads in data from a file, and prints it to STDOUT.
.sub 'main'
    $P0 = getstdout
    $P1 = open "myfile.txt", "r"
  loop_top:
    $S0 = readline $P1
    print $P0, $S0
    if $P1 goto loop_top
    close $P1
.end
This example shows that treating a filehandle PMC like a boolean value returns whether or not we have reached the end of the file. A true return value means there is more file to read. A false return value means we are at the end. In addition to this behavior, Filehandle PMCs have a number of methods that can be used to perform various operations.
$P0.'open'(STRING filename, STRING mode)
Opens the filehandle. Takes two optional strings: the name of the file to open and the open mode. If no filename is given, the previous filename associated with the filehandle is opened. If no mode is given, the previously-used mode is used.
$P0 = new 'Filehandle'
$P0.'open'("myfile.txt", "r")

$P0 = open "myfile.txt", "r"   # Same!
The open opcode internally creates a new filehandle PMC and calls the 'open'() method on it. So even though the above two code snippets act in an identical way, the latter one is a little more concise to write. The caveat is that the open opcode creates a new PMC for every call, while the 'open'() method call can reuse an existing filehandle PMC for a new file.
$P0.'isatty'()
Returns a boolean value indicating whether the filehandle is a TTY terminal.
$P0.'close'()
Closes the filehandle. Can be reopened with .'open' later.
$P0.'close'()

close $P0   # Same
The close opcode calls the 'close'() method on the Filehandle PMC internally, so these two calls are equivalent.
$P0.'is_closed'()
Returns true if the filehandle is closed, false if it is opened.
$P0.'read'(INTVAL length)
Reads length bytes from the filehandle.
$S0 = read $P0, 10

$P0.'read'(10)
The two calls are equivalent, and the read opcode calls the 'read'() method internally.
$P0.'readline'()
Reads an entire line (up to a newline character or EOF) from the filehandle.
$P0.'readline_interactive'(STRING prompt)
Displays the string prompt and then reads a line of input.
$P0.'readall'(STRING name)
Reads the entire file name into a string. If the filehandle is closed, it will open the file given by name, read the entire file, and then close the handle. If the filehandle is already open, name should not be passed (it is an optional parameter).
$P0.'flush'()
Flushes the buffer.
$P0.'print'(PMC to_print)
Prints the given value to the filehandle. The print opcode uses the 'print'() method internally.
print "Hello"

$P0 = getstdout
print $P0, "Hello!"      # Same

$P0.'print'("Hello!")    # Same
$P0.'puts'(STRING to_print)
Prints the given string value to the filehandle.
$P0.'buffer_type'(STRING new_type)
If new_type is given, changes the buffer to the new type. If it is not, returns the current type. Acceptable types are:
unbuffered
line-buffered
full-buffered
$P0.'buffer_size'(INTVAL size)
If size is given, sets the size of the buffer. If not, returns the size of the current buffer.
$P0.'mode'()
Returns the current file access mode.
$P0.'encoding'(STRING encoding)
Sets the filehandle's string encoding to encoding if given, returns the current encoding otherwise.
$P0.'eof'()
Returns true if the filehandle is at the end of the current file, false otherwise.
$P0.'get_fd'()
Returns the integer file descriptor of the current file, but only on operating systems that use file descriptors. Returns -1 on systems that do not support this.
Exceptions
Parrot includes a robust exception mechanism that is not only used internally to implement a variety of control flow constructs, but is also available for use directly from PIR code. Exceptions, in as few words as possible, are error conditions in the program. Exceptions are thrown when an error occurs, and they can be caught by special routines called handlers. This enables Parrot to recover from errors in a controlled way, instead of crashing and terminating the process entirely.
Exceptions, like most other data objects in Parrot, are PMCs. They contain and provide access to a number of different bits of data about the error, such as the location where the error was thrown (including complete backtraces), any annotation information from the file, and other data.
Throwing Exceptions
Many exceptions are used internally in Parrot to indicate error conditions. Opcodes such as die and warn throw exceptions as part of their normal operation. Other opcodes such as div throw exceptions only when an error occurs, such as an attempted division by zero.
Exceptions can also be thrown manually using the throw opcode. Here's an example:
$P0 = new 'Exception'
throw $P0
This throws the exception object as an error. If there are any available handlers in scope, the interpreter will pass the exception object to the handler and continue execution there. If there are no handlers available, Parrot will exit.
Exception Attributes
Since Exceptions are PMC objects, they can contain a number of useful data items. One such data item is the message:
$P0 = new 'Exception'
$P1 = new 'String'
$P1 = "this is an error message for the exception"
$P0["message"] = $P1
Others are the severity and the type:
$P0["severity"] = 1    # An integer value
$P0["type"] = 2        # Also an integer
Finally, there is a spot for additional data to be included:
$P0["payload"] = $P2 # Any arbitrary PMC
Exception Handlers
Exception handlers are labels in PIR code that can be jumped to when an exception is thrown. To register a label as an exception handler, use the push_eh opcode. All handlers exist on a stack. Pushing a new handler adds it to the top of the stack, and using the pop_eh opcode pops the handler off the top of the stack.
push_eh my_handler
# something that might cause an error

my_handler:
# handle the error here
Catching Exceptions
The exception PMC that was thrown can be caught using the .get_results() directive. Used inside the handler, it retrieves the Exception PMC object that was thrown:
my_handler:
.local pmc err
.get_results(err)
With the exception PMC available, the various attributes of that PMC can be accessed and analyzed for additional information about the error.
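Putting the pieces from the sections above together, a minimal sketch of a complete throw-and-catch cycle might look like the following. The label names my_handler and done and the message text are illustrative only, and the sketch assumes that the "message" key can be read back into a string register in the same way it was set.

```pir
.sub 'main'
    # Register the handler before running code that may fail
    push_eh my_handler

    # Build and throw an exception carrying a message
    $P0 = new 'Exception'
    $P1 = new 'String'
    $P1 = "something went wrong"
    $P0["message"] = $P1
    throw $P0

    # Reached only if nothing above throws
    pop_eh
    say "no error"
    goto done

  my_handler:
    # Retrieve the exception that was thrown
    .local pmc err
    .get_results(err)
    # Assumption: "message" can be read into a string register
    $S0 = err["message"]
    print "caught: "
    say $S0

  done:
    .return ()
.end
```

The pop_eh on the normal path removes the handler once the protected code has finished, so that later errors are not routed to a handler that no longer applies.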
Exception Handler PMCs
Like all other interesting data types in Parrot, exception handlers are a PMC type. When using the syntax above with push_eh LABEL, the handler PMC is created internally by Parrot. However, you can create it explicitly too if you want:
$P0 = new 'ExceptionHandler'
set_addr $P0, my_handler
push_eh $P0
...
my_handler:
...
Rethrowing and Exception Propagation
Exception handlers are nested and are stored in a stack. This is because not all handlers are intended to handle all exceptions. If a handler cannot deal with a particular exception, it can rethrow the exception to the next handler in the stack. An exception propagates through the handler stack until it reaches the default handler, which causes Parrot to exit.
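As a rough illustration of propagation, the sketch below has an inner handler that declines an exception and passes it on with the rethrow opcode, so that a handler registered by the caller receives it instead. The sub name 'risky', the label names, and the message text are all hypothetical, and the sketch assumes that rethrow takes the caught Exception PMC as its only argument.

```pir
.sub 'main'
    # Outer handler: the last stop before Parrot's default handler
    push_eh outer_handler
    'risky'()
    pop_eh
    goto done

  outer_handler:
    .local pmc err
    .get_results(err)
    $S0 = err["message"]
    print "outer handler caught: "
    say $S0

  done:
    .return ()
.end

.sub 'risky'
    # Inner handler: registered closer to the code that fails
    push_eh inner_handler

    $P0 = new 'Exception'
    $P1 = new 'String'
    $P1 = "not for the inner handler"
    $P0["message"] = $P1
    throw $P0

    # Reached only if nothing above throws
    pop_eh
    .return ()

  inner_handler:
    .local pmc err
    .get_results(err)
    # This handler cannot deal with the error, so pass it along
    rethrow err
.end
```

A real handler would normally inspect the exception first, for example its type or severity, and rethrow only the cases it cannot handle.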
Annotations
Annotations are pieces of metadata that can be stored in a bytecode file to give some information about what the original source code looked like. This is especially important when dealing with high-level languages. We'll go into detail about annotations and their use in Chapter 10.
Annotations are created using the .annotation keyword. Annotations consist of a key/value pair, where the key is a string and the value is an integer, a number, or a string. Since annotations are stored compactly as constants in the compiled bytecode, PMCs cannot be used.
.annotation 'file', 'mysource.lang'
.annotation 'line', 42
.annotation 'compiletime', 0.3456
Annotations exist, or are "in force", throughout the entire subroutine, or until they are redefined. Creating a new annotation with the same name as an old one overwrites it with the new value. The current hash of annotations can be retrieved with the annotations opcode:
.annotation 'line', 1
$P0 = annotations    # {'line' => 1}
.annotation 'line', 2
$P0 = annotations    # {'line' => 2}
Or, to retrieve a single annotation by name, you can write:
$I0 = annotations 'line'
Annotations in Exceptions
Exception objects contain information about the annotations that were in force when the exception was thrown. These can be retrieved with the 'annotations'() method of the exception PMC object:
$I0 = $P0.'annotations'('line')    # only the 'line' annotation
$P1 = $P0.'annotations'()          # hash of all annotations
Exceptions can also provide a backtrace, which shows where the program was when the exception was thrown:
$P1 = $P0.'backtrace'()
The backtrace PMC is an array of hashes. Each element in the array corresponds to a function in the current call stack. Each hash has two elements: 'annotations', which is the hash of annotations that were in effect at that point, and 'sub', which is the Sub PMC of that function.