Parrot Intermediate Representation

Parrot Intermediate Representation (PIR) is Parrot's native low-level language. (Parrot also has a pure native assembly language called PASM, described in Chapter 9.) PIR is fundamentally an assembly language, but it has some higher-level features such as operator syntax, syntactic sugar for subroutine and method calls, automatic register allocation, and more friendly conditional syntax. PIR is commonly used to write Parrot libraries -- including some of Parrot's compilers -- and is the target form when compiling high-level languages to Parrot. Even so, PIR is more rigid and "close to the machine" than some higher-level languages like C. Files containing PIR code use the .pir extension.

Basic Syntax

PIR has a relatively simple syntax. Every line is a comment, a label, a statement, or a directive. There is no end-of-line symbol (such as a semicolon in C); the end of the line is the end of the statement or directive.

Comments

A comment begins with the # symbol, and continues until the end of the line. Comments can stand alone on a line or follow a statement or directive.

    # This is a regular comment. The PIR
    # interpreter ignores this.

PIR also treats inline documentation in Pod format as a comment. An equals sign as the first character of a line marks the start of a Pod block. A =cut marker signals the end of a Pod block.

  =head2

  This is Pod documentation, and is treated like a
  comment. The PIR interpreter ignores this.

  =cut

Labels

A label attaches a name to a line of code so other statements can refer to it. Labels can contain letters, numbers, and underscores. By convention, labels use all capital letters to stand out from the rest of the source code. It's fine to put a label on the same line as a statement or directive:

    GREET: print "'Allo, 'allo, 'allo."

Readability is improved by putting labels on separate lines, outdented to stand apart from the ordinary code flow:

  GREET:
    print "'Allo, 'allo, 'allo."

Statements

A statement is either an opcode or syntactic sugar for one or more opcodes. An opcode is a native instruction for the virtual machine; it consists of the name of the instruction followed by zero or more arguments.

  print "Norwegian Blue"

PIR also provides higher-level constructs, including symbol operators:

  $I1 = 2 + 5

Under the hood, these special statement forms are just syntactic sugar for regular opcodes. The + symbol corresponds to the add opcode, the - symbol to the sub opcode, and so on. The previous example is equivalent to:

  add $I1, 2, 5

Directives

Directives look similar to opcodes, but they begin with a period (.). Parrot's parser handles them specially. Some directives specify actions that occur at compile time. Other directives represent complex operations that require the generation of multiple instructions. The .local directive, for example, declares a named variable.

  .local string hello

Literals

Integers and floating point numbers are numeric literals. They can be positive or negative.

  $I0 = 42       # positive
  $I1 = -1       # negative

Integer literals can also be binary or hexadecimal:

  $I3 = 0b01010  # binary
  $I2 = 0xA5     # hexadecimal

Floating point number literals have a decimal point, and can use scientific notation:

  $N0 = 3.14
  $N2 = -1.2e+4

String literals are enclosed in single or double quotes. The "Working with Strings" section explains the differences between the quoting types.

  $S0 = "This is a valid literal string"
  $S1 = 'This is also a valid literal string'

Variables

PIR variables can store four different kinds of values—integers, numbers (floating point), strings, and objects. Parrot's objects are called PMCs, for "PolyMorphic Container".

The simplest kind of variable is a register variable. The name of a register variable always starts with a dollar sign ($), followed by a single character which specifies the type of the variable (I for integer, N for number, S for string, or P for PMC), and ends with a unique number. Register variables don't need to be predeclared:

  $S0 = "Who's a pretty boy, then?"
  print $S0

PIR also has named variables, which are declared with the .local directive. As with register variables, there are four valid types: int, num, string, and pmc. Named variables have to be declared, but otherwise behave exactly the same as register variables.

  .local string hello
  hello = "'Allo, 'allo, 'allo."
  print hello

Constants

The .const directive declares a named constant. Named constants are similar to named variables, but their values are set in the declaration and can never be changed. Like .local, .const takes a type and a name. It also requires a literal argument to set the value of the constant.

  .const int    frog = 4                       # integer constant
  .const string name = "Superintendent Parrot" # string constant
  .const num    pi   = 3.14159                 # floating point constant

Named constants may be used in all the same places as literals, but have to be declared beforehand. The following example declares a named string constant hello and prints the value:

  .const string hello = "Hello, Polly."
  print hello

Control Structures

Rather than providing a pre-packaged set of control structures like if and while, PIR gives you the building blocks to construct your own. (PIR has many advanced features, but at heart it is an assembly language.) The most basic of these building blocks is goto, which jumps to a named label. (This is not your father's goto. It can only jump inside a subroutine, and only to a named label.) In the following code example, the print statement will run immediately after the goto statement:

    goto GREET
    # ... some skipped code ...

  GREET:
    print "'Allo, 'allo, 'allo."

Variations on the basic goto check whether a particular condition is true or false before jumping:

  if $I0 > 5 goto GREET

All of the traditional control structures can be constructed from PIR's control building blocks.
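
As a rough sketch of how that works, the following program builds the equivalent of a while loop from a label, a conditional goto, and an unconditional goto. The variable name i, the label names, and the .sub main :main wrapper (added only so the example can run on its own) are choices made for this illustration:

  .sub main :main
      .local int i
      i = 0
    LOOP:
      if i >= 3 goto DONE     # leave the loop once i reaches 3
      print i                 # prints 0, then 1, then 2
      print "\n"
      i += 1
      goto LOOP
    DONE:
      print "done\n"
  .end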

Subroutines

A PIR subroutine starts with the .sub directive and ends with the .end directive. Parameter declarations use the .param directive, and look a lot like named variable declarations. The following example declares a subroutine named greeting that takes a single string parameter named hello:

  .sub greeting
      .param string hello
      print hello
  .end
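
The declaration above never calls the subroutine. A minimal sketch of a complete program might look like the following, assuming a second subroutine named main (a name chosen for this example) marked with the :main flag so that Parrot starts execution there. It calls greeting using PIR's subroutine call syntax, which the introduction mentions but this section does not cover:

  .sub main :main
      'greeting'("Hello, Polly.\n")   # call the subroutine with one string argument
  .end

  .sub greeting
      .param string hello
      print hello                     # prints "Hello, Polly."
  .end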

That's All Folks

You now know everything you need to know about PIR. Everything else you read or learn about PIR will use one of these fundamental language structures. The rest is vocabulary.

Working with Variables

We call the simple $I0-style variables "register variables" for a specific reason: Parrot is a register-based virtual machine. It has 4 typed register sets: integers, floating-point numbers, strings, and objects. When you're working with register variables or named variables, you're actually working directly with register storage locations in the virtual machine.

If you've ever worked with an assembly language before, you may immediately jump to the conclusion that $I0 is the zeroth integer register, but Parrot is a bit smarter than that. The number of a register variable usually does not correspond to the register used internally; Parrot's compiler maps registers as appropriate for speed and memory considerations. The only guarantee Parrot gives you is that you'll always get the same storage location when you use $I0 in the same subroutine.

The most basic operation on a variable is assignment using the = operator:

  $I0 = 42        # set integer variable to the value 42
  $N3 = 3.14159   # set number variable to an approximation of pi
  $I1 = $I0       # set $I1 to the value of $I0

The exchange opcode swaps the contents of two variables of the same type. The following example sets $I0 to the value of $I1, and sets $I1 to the value of $I0.

  exchange $I0, $I1

The null opcode sets an integer or number variable to a zero value, and undefines a string or object.

  null $I0  # 0
  null $N0  # 0.0
  null $S0  # NULL
  null $P0  # PMCNULL

Working with Numbers

PIR has an extensive set of instructions that work with integers, floating-point numbers, and numeric PMCs. Many of these instructions have a variant that modifies the result in place:

  $I10 = $I11 + $I2
  $I0 += $I1

The first form of + stores the sum of the two arguments in the result variable. The second variant, +=, adds the single argument to the result variable and stores the sum back in the result variable.

The arguments can be Parrot literals, variables, or constants. If the result is an integer type, like $I0, the arguments must also be integers. A number result, like $N0, usually requires number arguments, but many numeric instructions also allow the final argument to be an integer. Instructions with a PMC result may accept an integer, floating-point, or PMC final argument:

  $P0 = $P1 * $P2
  $P0 = $P1 * $I2
  $P0 = $P1 * $N2
  $P0 *= $P1
  $P0 *= $I1
  $P0 *= $N1

We won't list every numeric opcode here, but we'll list some of the most common ones. You can get a complete list in "PIR Opcodes" in Chapter 11.

Unary numeric opcodes

The unary opcodes have a single argument, and either return a result or modify the argument in place. Some of the most common unary numeric opcodes are inc (increment), dec (decrement), abs (absolute value), neg (negate), and fact (factorial):

  $N0 = abs -5.0  # the absolute value of -5.0 is 5.0
  $I1 = fact  5   # the factorial of 5 is 120
  inc $I1         # 120 incremented by 1 is 121

Binary numeric opcodes

Binary opcodes have two arguments and a result. Parrot provides addition (+ or add), subtraction (- or sub), multiplication (* or mul), division (/ or div), modulus (% or mod), and exponent (pow) opcodes, as well as gcd (greatest common divisor) and lcm (least common multiple).

  $I0 = 12 / 5
  $I0 = 12 % 5

Floating-point operations

Although most of the numeric operations work with both numbers and integers, a few require the result to be a number. Among these are ln (natural log), log2 (log base 2), log10 (log base 10), and exp (e^x), as well as a full set of trigonometric opcodes such as sin (sine), cos (cosine), tan (tangent), sec (secant), cosh (hyperbolic cosine), tanh (hyperbolic tangent), sech (hyperbolic secant), asin (arc sine), acos (arc cosine), atan (arc tangent), asec (arc secant), exsec (exsecant), hav (haversine), and vers (versine). All angle arguments for the trigonometric functions are in radians:

  $N0 = sin $N1
  $N0 = exp 2

The majority of the floating-point operations have a single argument and a single result. Even though the result must be a number, the source can be either an integer or number.

Logical and Bitwise Operations

The logical opcodes evaluate the truth of their arguments. They're often used to make decisions on control flow. Logical operations are implemented for integers and numeric PMCs. Numeric values are false if they're 0, and true otherwise. Strings are false if they're the empty string or a single character "0", and true otherwise. PMCs are true when their get_bool vtable method returns a nonzero value.

The and opcode returns the first argument if it's false and the second argument otherwise:

  $I0 = and 0, 1  # returns 0
  $I0 = and 1, 2  # returns 2

The or opcode returns the first argument if it's true and the second argument otherwise:

  $I0 = or 1, 0  # returns 1
  $I0 = or 0, 2  # returns 2

  $P0 = or $P1, $P2

Both and and or are short-circuiting. If they can determine what value to return from the first argument, they'll never evaluate the second. This is significant only for PMCs, as they might have side effects on evaluation.

The xor opcode returns the first argument if it is the only true value, returns the second argument if it is the only true value, and returns false if both values are true or both are false:

  $I0 = xor 1, 0  # returns 1
  $I0 = xor 0, 1  # returns 1
  $I0 = xor 1, 1  # returns 0
  $I0 = xor 0, 0  # returns 0

The not opcode returns a true value when the argument is false, and a false value if the argument is true:

  $I0 = not $I1
  $P0 = not $P1

The bitwise opcodes operate on their values a single bit at a time. band, bor, and bxor return a value that is the logical AND, OR, or XOR of each bit in the source arguments. They each take two arguments. They also have a variant that modifies the result in place. bnot is the logical NOT of each bit in a single source argument.

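  $I0 = bnot $I1        # bitwise NOT of $I1
  $P0 = band $P1        # in-place variant: ANDs $P1 into $P0
  $I0 = bor $I1, $I2    # bitwise OR of $I1 and $I2
  $P0 = bxor $P1, $I2   # bitwise XOR of $P1 and $I2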

The logical and arithmetic shift operations shift their values by a specified number of bits:

  $I0 = shl $I1, $I2        # shift $I1 left by count $I2
  $I0 = shr $I1, $I2        # arithmetic shift right
  $P0 = lsr $P1, $P2        # logical shift right

Working with Strings

Parrot strings are buffers of variable-sized data. The most common use of strings is to store text data. Strings can also hold binary or other non-textual data, though this is rare. (In general, a custom PMC is more useful.) Parrot strings are flexible and powerful, to handle the complexity of human-readable (and computer-representable) text data. String operations work with string literals, variables, and constants, and with string-like PMCs.

Escape Sequences

Strings in double-quotes accept all sorts of escape sequences using backslashes. Strings in single-quotes only allow escapes for nested quotes:

  $S0 = "This string is \n on two lines"
  $S0 = 'This is a \n one-line string with a slash in it'

Parrot supports several escape sequences in double-quoted strings:

Heredocs

If you need more flexibility in defining a string, use a heredoc string literal. The << operator starts a heredoc. The string terminator immediately follows. All text until the terminator is part of the string. The terminator must appear on its own line, must appear at the beginning of the line, and may not have any trailing whitespace.

  $S2 = << "End_Token"

  This is a multi-line string literal. Notice that
  it doesn't use quotation marks.

  End_Token

Concatenating strings

Use the . operator to concatenate strings. It has a .= variant to modify the result in place.

  $S0 = "ab"
  $S1 = $S0 . "cd"  # concatenates $S0 with "cd"
  print $S1         # prints "abcd"
  print "\n"

  $S1 .= "xy"       # appends "xy" to $S1
  print $S1         # prints "abcdxy"
  print "\n"

The first . operation in the example above concatenates the string "cd" onto the string "ab" and stores the result in $S1. The second .= operation appends "xy" onto the string "abcd" in $S1.

Repeating strings

The repeat opcode repeats a string a specified number of times:

  $S0 = "a"
  $S1 = repeat $S0, 5
  print $S1            # prints "aaaaa"
  print "\n"

In this example, repeat generates a new string with "a" repeated five times and stores it in $S1.

Length of a string

The length opcode returns the length of a string in characters. This won't be the same as the length in bytes for multibyte encoded strings:

  $S0 = "abcd"
  $I0 = length $S0                # the length is 4
  print $I0
  print "\n"

length doesn't have an equivalent for PMC strings.

Substrings

The simplest version of the substr opcode takes three arguments: a source string, an offset position, and a length. It returns a substring of the original string, starting from the offset position (0 is the first character) and spanning the length:

  $S0 = substr "abcde", 1, 2        # $S0 is "bc"

This example extracts a two-character string from "abcde" at a one-character offset from the beginning of the string (starting with the second character). It generates a new string, "bc", in the destination register $S0.

When the offset position is negative, it counts backward from the end of the string. So an offset of -1 starts at the last character of the string.
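
A small sketch of a negative offset follows. The .sub main :main wrapper is an assumption added only so the fragment can run on its own:

  .sub main :main
      $S0 = substr "abcde", -2, 2   # start two characters from the end
      print $S0                     # prints "de"
      print "\n"
  .end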

substr also has a four-argument form, where the fourth argument is a string to replace the substring. This variant modifies the source string and returns the removed substring.

  $S1 = "abcde"
  $S0 = substr $S1, 1, 2, "XYZ"
  print $S0                        # prints "bc"
  print "\n"
  print $S1                        # prints "aXYZde"
  print "\n"

The example above replaces the substring "bc" in $S1 with the string "XYZ", and returns "bc" in $S0.

When the offset position in a replacing substr is one character beyond the original string length, substr appends the replacement string just like the concatenation operator. If the replacement string is an empty string, the characters are just removed from the original string.

When you don't need to capture the replaced string, there's an optimized version of substr that just does a replace without returning the removed substring.

  $S1 = "abcde"
  $S1 = substr 1, 2, "XYZ"
  print $S1                        # prints "aXYZde"
  print "\n"

Converting characters

The chr opcode takes an integer value and returns the corresponding character in the ASCII character set as a one-character string. The ord opcode takes a single character string and returns the integer value of the character at the first position in the string. Notice that the integer value of the character will differ depending on the current encoding of the string:

  $S0 = chr 65              # $S0 is "A"
  $I0 = ord $S0             # $I0 is 65, if $S0 is ASCII or UTF-8

ord has a two-argument variant that takes a character offset to select a single character from a multicharacter string. The offset must be within the length of the string:

  $I0 = ord "ABC", 2        # $I0 is 67

A negative offset counts backward from the end of the string, so -1 is the last character.

  $I0 = ord "ABC", -1       # $I0 is 67

Formatting strings

The sprintf opcode generates a formatted string from a series of values. It takes two arguments: a string specifying the format, and an array PMC containing the values to be formatted. The format string and the result can be either strings or PMCs:

  $S0 = sprintf $S1, $P2
  $P0 = sprintf $P1, $P2

The format string is similar to C's sprintf function, but with extensions for Parrot data types. Each format field in the string starts with a % and ends with a character specifying the output format. The output format characters are listed in Table 3-2.

Each format field can be specified with several options: flags, width, precision, and size. The format flags are listed in Table 3-3.

The width is a number defining the minimum width of the output from a field. The precision is the maximum width for strings or integers, and the number of decimal places for floating-point fields. If either width or precision is an asterisk (*), it takes its value from the next argument in the PMC.

The size modifier defines the type of the argument the field takes. The size modifiers are listed in Table 3-4.

The values in the aggregate PMC must have a type compatible with the specified size.

Here's a short illustration of string formats:

  $P2 = new "Array"
  $P0 = new "Integer"
  $P0 = 42
  push $P2, $P0
  $P1 = new "Float"
  $P1 = 10
  push $P2, $P1
  $S0 = sprintf "int %#Px num %+2.3Pf\n", $P2
  print $S0     # prints "int 0x2a num +10.000"
  print "\n"

The first seven lines create an Array with two elements: an Integer and a Float. The format string of the sprintf has two format fields. The first, %#Px, takes a PMC argument from the aggregate (P) and formats it as a hexadecimal integer (x), with a leading 0x (#). The second format field, %+2.3Pf, takes a PMC argument (P) and formats it as a floating-point number (f), with a minimum field width of two characters and three decimal places (2.3) and a leading sign (+).

The test files t/op/string.t and t/src/sprintf.t have many more examples of format strings.

Joining strings

The join opcode joins the elements of an array PMC into a single string. The first argument separates the individual elements of the PMC in the final string result.

  $P0 = new "Array"
  push $P0, "hi"
  push $P0, 0
  push $P0, 1
  push $P0, 0
  push $P0, "parrot"
  $S0 = join "__", $P0
  print $S0              # prints "hi__0__1__0__parrot"

This example builds an Array in $P0 with the values "hi", 0, 1, 0, and "parrot". It then joins those values (separated by the string "__") into a single string, and stores it in $S0.

Splitting strings

Splitting a string yields a new array containing the resulting substrings of the original string.

  $P0 = split "", "abc"
  set $P1, $P0[0]
  print $P1              # 'a'
  set $P1, $P0[2]
  print $P1              # 'c'

This example splits the string "abc" into individual characters and stores them in an array in $P0. It then prints out the first and third elements of the array.

Testing for substrings

The index opcode searches for a substring within a string. If it finds the substring, it returns the position where the substring was found as a character offset from the beginning of the string. If it fails to find the substring, it returns -1:

  $I0 = index "Beeblebrox", "eb"
  print $I0                       # prints 2
  print "\n"
  $I0 = index "Beeblebrox", "Ford"
  print $I0                       # prints -1
  print "\n"

index also has a three-argument version, where the third argument defines an offset position for starting the search:

  $I0 = index "Beeblebrox", "eb", 3
  print $I0                         # prints 5
  print "\n"

This finds the second "eb" in "Beeblebrox" instead of the first, because the search skips the first three characters in the string.

Bitwise Operations

The bitwise opcodes also have string variants for AND, OR, and XOR: bands, bors, and bxors. These take string or string-like PMC arguments and perform the logical operation on each byte of the strings to produce the result string.

  $S0 = bors $S1
  $P0 = bands $P1
  $S0 = bors $S1, $S2
  $P0 = bxors $P1, $S2

The bitwise string opcodes only have meaningful results when they're used with simple ASCII strings because the bitwise operation is done per byte.

Copy-On-Write

Strings use copy-on-write (COW) optimizations. An assignment such as $S1 = $S0 doesn't immediately make a copy of $S0; it only makes both variables point to the same string. Parrot doesn't make a copy of the string until one of the two strings is modified.

  $S0 = "Ford"
  $S1 = $S0
  $S1 = "Zaphod"
  print $S0                # prints "Ford"
  print $S1                # prints "Zaphod"

Modifying one of the two variables causes a new string to be created, preserving the old value in $S0 and assigning the new value to the new string in $S1. The benefit here is avoiding the cost of copying a string until a copy is actually needed.

Encodings and Charsets

Every string in Parrot has an associated encoding and character set. The default charset is 8-bit ASCII, which is almost universally supported. Double-quoted string constants can have an optional prefix specifying the string's encoding and charset. (As you might suspect, single-quoted strings do not support this.) Parrot maintains these values internally, and automatically converts strings when necessary to preserve the information. String prefixes are specified as encoding:charset: at the front of the string. Here are some examples:

  $S0 = utf8:unicode:"Hello UTF8 Unicode World!"
  $S1 = utf16:unicode:"Hello UTF16 Unicode World!"
  $S2 = ascii:"This is 8-bit ASCII"
  $S3 = binary:"This is raw, unformatted binary data"

The binary: charset treats the string as a buffer of raw unformatted binary data. It isn't really a string per se, because binary data contains no readable characters. This exists to support libraries which manipulate binary data that doesn't easily fit into any other primitive data type.

When Parrot operates on two strings (as in concatenation), they must both use the same character set and encoding. Parrot automatically upgrades one or both of the strings to the next highest compatible format as necessary. ASCII strings will automatically upgrade to UTF-8 strings if needed, and UTF-8 will upgrade to UTF-16.

Working with PMCs

Polymorphic Containers (PMCs) are the basis for complex data types and object-oriented behavior in Parrot. In PIR, any variable that isn't a low-level integer, number, or string is a PMC. PMCs act much like integer, number, or string variables, but you have to instantiate a new PMC object before you use it. The new opcode creates a new PMC object of the specified type.

  $P0 = new 'String'
  $P0 = "That's a bollard and not a parrot"
  print $P0

This example creates a String object, stores it in the PMC register variable $P0, assigns it the value "That's a bollard and not a parrot", and prints it.

Every PMC has a type that indicates what data it can store and what behavior it supports. The typeof opcode reports the type of a PMC. When the result is a string variable, typeof returns the name of the type:

  $P0 = new "String"
  $S0 = typeof $P0               # $S0 is "String"
  print $S0
  print "\n"

When the result is a PMC variable, typeof returns the Class for that object type.

Scalars

In most of the examples we've shown so far, PMCs just duplicate the functionality of integers, numbers, and strings. Parrot provides a set of simple PMCs for this exact purpose. Integer, Float, and String are thin overlays on Parrot's low-level integers, numbers, and strings.

An earlier example showed a string literal assigned to a PMC register of type String. This works for all the low-level types and their PMC equivalents:

  $P0 = new 'Integer'
  $P0 = 5

  $P1 = new 'String'
  $P1 = "5 birds"

  $P2 = new 'Float'
  $P2 = 3.14

Like literals, low-level integer, number, or string variables can be directly assigned to a PMC. The PMC handles the conversion from the low-level type to its own internal storage. (This kind of conversion of a simpler type to a more complex type is often called "boxing".)

  $I0 = 5
  $P0 = new 'Integer'
  $P0 = $I0

  $S1 = "5 birds"
  $P1 = new 'String'
  $P1 = $S1

  $N2 = 3.14
  $P2 = new 'Float'
  $P2 = $N2

The box opcode is a handy shortcut to create the appropriate PMC object from an integer, number, or string literal or variable.

  $P0 = box 3       # $P0 is an "Integer"

  $P1 = box "hello" # $P1 is a "String"

  $P2 = box 3.14    # $P2 is a "Float"

In the reverse situation, when assigning a PMC to an integer, number, or string variable, the PMC also has the ability to convert its value to the low-level type. (The reverse of "boxing" is "unboxing".)

  $P0 = box 5
  $S0 = $P0           # the string "5"
  $N0 = $P0           # the number 5.0
  $I0 = $P0           # the integer 5

  $P1 = box "5 birds"
  $S1 = $P1           # the string "5 birds"
  $I1 = $P1           # the integer 5
  $N1 = $P1           # the number 5.0

  $P2 = box 3.14
  $S2 = $P2           # the string "3.14"
  $I2 = $P2           # the integer 3
  $N2 = $P2           # the number 3.14

This example creates Integer, String, and Float PMCs, and shows the effect of assigning each one back to a low-level type.

Converting a String to an integer or number only makes sense when the contents of the string are a number. The String PMC will attempt to extract a number from the beginning of the string, but otherwise will simply return a false value.
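
The following sketch shows the second case, a string with no leading number. It assumes the false value returned for an integer conversion is 0, and wraps the code in a .sub main :main block only so that it can run on its own:

  .sub main :main
      $P0 = box "birds"   # no number at the start of this string
      $I0 = $P0           # nothing to extract, so $I0 is 0
      print $I0           # prints 0
      print "\n"
  .end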

Aggregates

PMCs can define complex types that hold multiple values, commonly called aggregates. Two of the most basic aggregate types are ordered arrays and associative arrays. The primary difference between these is that ordered arrays are indexed using integer keys, while associative arrays are indexed with string keys.

The most important feature added for aggregates is keyed access. Elements within an aggregate PMC can be stored and retrieved by a numeric or string key.

PIR also offers an extensive set of operations for manipulating aggregate data types.

Ordered and associative arrays are not the only types of aggregates available, but they are a good demonstration of using integers and strings as keys in an aggregate.

Ordered Arrays

Parrot provides several ordered array PMCs, differentiated by whether the array is intended to store booleans, integers, numbers, strings, or other PMCs, and whether the array should keep a fixed length or dynamically resize for the number of elements it stores.

All of the ordered array PMCs have zero-based integer keys. The syntax for keyed access to a PMC puts the key in square brackets after the register name:

  $P0 = new "Array"    # obtain a new array object
  $P0 = 2              # set its length
  set $P0[0], 10       # set first element to 10
  set $P0[1], $I31     # set second element to $I31
  set $I0, $P0[0]      # get the first element
  set $I1, $P0         # get array length

A key on the destination register of a set operation sets a value for that key in the aggregate. A key on the source register of a set returns the value for that key. If you set $P0 without a key, you set the length of the array, not one of its values. (Array is an autoextending array, so you never need to set its length. Other array types may require the length to be set explicitly.) And if you assign the Array to an integer, you get the length of the array.

We mention "other array types" above, not as a vague suggestion that there may be other types of arrays eventually, but as an indication that we actually have several types of array PMCs in Parrot's core. Parrot comes with FixedPMCArray, ResizablePMCArray, FixedIntegerArray, ResizableIntegerArray, FixedFloatArray, ResizableFloatArray, FixedStringArray, ResizableStringArray, FixedBooleanArray, and ResizableBooleanArray types. These various types of arrays use various packing methods to achieve higher memory efficiency for their contents than a single generic array type could. The trade-off for higher memory efficiency is that these array PMCs can only hold a single type of data.

The array PMC types that start with "Fixed" have a fixed size and do not automatically extend themselves if you attempt to add data at a higher index than the array contains. The "Resizable" variants will automatically extend themselves as more data are added, but the cost is in algorithmic complexity of checking array bounds and reallocating array memory.
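
The difference can be sketched as follows. The element values and the .sub main :main wrapper are choices made for this illustration, and the example assumes that assigning an integer to a FixedPMCArray sets its size in the same way as for Array above:

  .sub main :main
      $P0 = new "FixedPMCArray"
      $P0 = 2                   # a fixed array needs its size set before use
      $P0[0] = "a"
      $P0[1] = "b"

      $P1 = new "ResizablePMCArray"
      $P1[4] = "e"              # storing past the end extends the array
      $I0 = $P1                 # the array now has 5 elements
      print $I0                 # prints 5
      print "\n"
  .end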

To retrieve the number of items currently in an array, you can use the elements opcode.

  set $P0, 100         # allocate store for 100 elements
  set $I0, $P0          # obtain current allocation size
  elements $I0, $P0     # get element count

Some other useful instructions for working with arrays are push, pop, shift, and unshift (you'll find them in "PIR Opcodes" in Chapter 11).
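
As a brief sketch of these four opcodes, the following program adds and removes elements at both ends of a ResizablePMCArray. The values and the .sub main :main wrapper are chosen only for this example:

  .sub main :main
      $P0 = new "ResizablePMCArray"
      push $P0, "b"           # add to the end:         b
      push $P0, "c"           #                         b c
      unshift $P0, "a"        # add to the front:       a b c
      $S0 = pop $P0           # remove from the end:    $S0 is "c"
      $S1 = shift $P0         # remove from the front:  $S1 is "a"
      $I0 = elements $P0      # one element, "b", is left
      print $S0
      print $S1
      print $I0               # the three prints together produce "ca1"
      print "\n"
  .end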

Associative Arrays

An associative array maps keys to values. Other programming languages might refer to the same concept using different terms such as "dictionary" or "hash table". The Hash PMC is an unordered aggregate which uses string keys to identify elements within it.

  new $P1, "Hash"      # generate a new hash object
  set $P1["key"], 10   # set key and value
  set $I0, $P1["key"]   # obtain value for key
  set $I1, $P1          # number of entries in hash

The exists opcode tests whether a keyed value exists in an aggregate. It returns 1 if it finds the key in the aggregate, and returns 0 if it doesn't. It doesn't care if the value itself is true or false, only that the key has been set:

  new $P0, "Hash"
  set $P0["key"], 0
  exists $I0, $P0["key"] # does a value exist at "key"
  print $I0             # prints 1
  print "\n"

The delete opcode is also useful for working with hashes: it removes a key/value pair.
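
A minimal sketch of delete, checked with exists, might look like this (the .sub main :main wrapper is added only so the example can run on its own):

  .sub main :main
      $P0 = new "Hash"
      $P0["key"] = 10
      delete $P0["key"]         # remove the key/value pair
      $I0 = exists $P0["key"]   # the key is gone
      print $I0                 # prints 0
      print "\n"
  .end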

Iterators

Iterators extract values from an aggregate PMC one at a time and without extracting duplicates. Iterators are most useful in loops where an action needs to be performed on every element in an aggregate. You create an iterator by creating a new Iterator PMC, and passing the aggregate PMC to new as an additional parameter:

  new $P1, "Iterator", $P2

Alternatively, you can use the iter opcode to do the same thing:

  iter $P1, $P2     # Same!

The include file iterator.pasm defines some constants for working with iterators. The .ITERATE_FROM_START and .ITERATE_FROM_END constants are used to select whether an array iterator starts from the beginning or end of the array. Since Hash PMCs are unordered, these two constants do not have any effect on Hash iterators.

Extract a value from the iterator with the shift opcode. The iterator PMC evaluates as true while elements remain and as false once it has reached the end of the aggregate.

  .include "iterator.pasm"
      new $P2, "Array"
      push $P2, "a"
      push $P2, "b"
      push $P2, "c"
      new $P1, "Iterator", $P2
      set $P1, .ITERATE_FROM_START

  iter_loop:
      unless $P1, iter_end
      shift $P5, $P1
      print $P5                        # prints "a", "b", "c"
      branch iter_loop
  iter_end:
      # ...

Hash iterators work similarly to array iterators, but they extract keys only. With the key, you can find its value from the original hash PMC. With hashes it's only meaningful to iterate in one direction since they don't define any order for their keys.

  .include "iterator.pasm"
      new $P2, "Hash"
      set $P2["a"], 10
      set $P2["b"], 20
      set $P2["c"], 30
      new $P1, "Iterator", $P2
      set $P1, .ITERATE_FROM_START_KEYS

  iter_loop:
      unless $P1, iter_end
      shift $S5, $P1                    # one of the keys "a", "b", "c"
      set $I9, $P2[$S5]
      print $I9                        # prints e.g. 20, 10, 30
      branch iter_loop
  iter_end:
      # ...

Multi-level Keys

Arrays and hashes can hold any data type, including other aggregates. Accessing elements deep within nested data structures is a common operation, so PIR provides a way to do it in a single instruction. Complex keys specify a series of nested data structures, with each individual key separated by a semicolon:

  $P0 = new "Hash"
  $P1 = new "Array"
  $P1[2] = 42
  $P0["answer"] = $P1
  $I1 = 2
  $I0 = $P0["answer";$I1]
  print $I0
  print "\n"

This example builds up a data structure of a hash containing an array. The complex key $P0["answer";$I1] retrieves an element of the array within the hash. You can also set a value using a complex key:

  $P0["answer";0] = 5

The individual keys are integers or strings, or registers with integer or string values.
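
As a sketch of string keys at both levels, this example nests a hash inside a hash and mixes a string constant with a string register in one complex key. The key names "person" and "name" are made up for illustration:

  .sub 'main' :main
      $P0 = new "Hash"
      $P1 = new "Hash"
      $P1["name"] = "Zaphod"
      $P0["person"] = $P1
      $S1 = "name"
      $S0 = $P0["person";$S1]    # string constant, then string register
      print $S0                  # prints Zaphod
      print "\n"
  .end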

Assignment

We mentioned before that set on two PMCs simply aliases them both to the same object, and that clone creates a complete duplicate object. But if you just want to assign the value of one PMC to another PMC, you need the assign opcode:

  new $P0, "Integer"
  new $P1, "Integer"
  set $P0, 42
  set $P2, $P0
  assign $P1, $P0     # note: $P1 has to exist already
  inc $P0
  print $P0          # prints 43
  print "\n"
  print $P1          # prints 42
  print "\n"
  print $P2          # prints 43
  print "\n"

This example creates two Integer PMCs: $P0 and $P1. It gives $P0 a value of 42. It then uses set to alias $P2 to the same object, but uses assign to copy the value into $P1. When $P0 is incremented, $P2 also changes, but $P1 doesn't. The destination register for assign must have an existing object of the right type in it, since assign doesn't create a new object (as with clone) or reuse the source object (as with set).

Assignment

PMC registers contain references to PMC structures internally. The set opcode therefore doesn't copy the entire PMC; it only copies the reference to the PMC data. Here's an example that shows a side effect of this operation:

  $P0 = new "String"
  $P0 = "Ford"
  $P1 = $P0
  $P1 = "Zaphod"
  print $P0                # prints "Zaphod"
  print $P1                # prints "Zaphod"

In this example, $P0 and $P1 are both references to the same internal data structure, so when we set $P1 to the string literal "Zaphod", it overwrites the previous value "Ford". Now, both $P0 and $P1 point to the String PMC "Zaphod", even though it appears that we only set one of those two registers to that value.

Copying and Cloning

The clone opcode makes a deep copy of a string or PMC. Earlier in this chapter we saw that using the set opcode on PMC and string values doesn't copy the underlying data structure; it copies only the reference to that structure. With strings, this doesn't cause a problem because strings use Copy On Write (COW) semantics to automatically create a copy of the string when one reference is modified. PMCs don't have this behavior, so a change made through one PMC reference modifies the data that every other reference to that PMC sees.

Instead of just copying the pointer like set would do, we can use the clone opcode to create a deep copy of the PMC, not just a shallow copy of the reference.

  $P0 = new "String"
  $P0 = "Ford"
  $P1 = clone $P0
  $P0 = "Zaphod"
  print $P0        # prints "Zaphod"
  print $P1        # prints "Ford"

This example creates an identical, independent clone of the PMC in $P0 and puts a pointer to it in $P1. Later changes to $P0 have no effect on the PMC referenced in $P1.

With simple strings, the copies created by clone are COW, exactly like the copy created by set, so there is no difference between these two opcodes for strings. By convention, set is used with strings more often than clone, but there is no rule about this.

Properties

PMCs can have additional values attached to them as "properties" of the PMC. What these properties do is entirely up to the language being implemented. Most often, properties hold extra metadata about the PMC that is used by the high-level language (HLL).

The setprop opcode sets the value of a named property on a PMC. It takes three arguments: the PMC to be set with a property, the name of the property, and a PMC containing the value of the property. The getprop opcode returns the value of a property. It also takes three arguments: the register in which to store the property's value, the name of the property, and the PMC from which the property value is to be retrieved. Internally, a PMC's properties are stored in a special properties Hash, keyed by the name of the property.

  new $P0, "String"
  set $P0, "Zaphod"
  new $P1, "Integer"
  set $P1, 1
  setprop $P0, "constant", $P1       # set a property on $P0
  getprop $P3, "constant", $P0       # retrieve a property on $P0
  print $P3                          # prints 1
  print "\n"

This example creates a String object in $P0 and an Integer object with the value 1 in $P1. setprop sets a property named "constant" on the object in $P0 and gives the property the value in $P1. The "constant" property is ignored by PIR, but may be significant to the HLL that set it. getprop retrieves the value of the property "constant" on $P0 and stores it in $P3.

Properties are kept in a separate hash for each PMC. Property values are always PMCs, and the hash stores references to those PMCs rather than copies. Trying to fetch the value of a property that doesn't exist returns an Undef.

delprop deletes a property from a PMC.

  delprop $P0, "constant"  # delete the property set on $P0 above

You can also return a complete hash of all properties on a PMC with prophash.

  prophash $P0, $P1         # set $P0 to the property hash of $P1

VTABLE Interfaces

Internally, all operations on PMCs are performed by calling various VTABLE interfaces.

All PMC types share the VTABLE interface, a standard API that every PMC conforms to for performing standard operations. PMC types can also support custom methods to perform various operations, may be passed to subroutines that expect PMC arguments, and can be subclassed by a user-defined type.

PMCs are polymorphic data items that can be one of a large variety of predefined types. As we have seen briefly, and as we will see in more depth later, PMCs have a standard interface called the VTABLE interface. VTABLEs are a standard list of functions that all PMCs implement. A PMC can also choose not to implement a function explicitly and instead let Parrot call the default implementation.

VTABLEs are very strict: there is a fixed number of them, with fixed names and fixed argument lists. You can't create an arbitrary VTABLE interface; you can only make use of the ones that Parrot supplies and expects. To circumvent this limitation, PMCs may have METHODs in addition to VTABLEs. METHODs are arbitrary code functions that can be written in C, may have any name, and may implement any behavior.

Operations on a PMC are implemented by vtable functions. The result of an operation is entirely determined by the behavior of the PMC's vtable. Since PMCs define their own behavior for these vtable functions, it's important to familiarize yourself with the behavior of the particular PMC before you start performing a lot of operations on it.

Earlier in this chapter, we've seen a number of these vtable functions and how they implement the behaviors found inside the various opcodes. The vtable interface is standard, and all PMCs implement the exact same set of vtables. More of them will be discussed later in this chapter and in the various reference chapters.

Control Structures

Control flow in PIR occurs entirely with conditional and unconditional branches to labels. This may seem simplistic and primitive, but here PIR shows its roots as a thin overlay on the assembly language of a virtual processor. PIR does not support high-level looping structures such as while or for loops. PIR has some support for basic if branching constructs, but does not support more complicated if/then/else branch structures.

The control structures of high-level languages hew tightly to the semantics of those languages; Parrot provides the minimal feature set necessary to implement any semantic of an HLL without dictating how that HLL may implement its features. Language agnosticism is an important design goal in Parrot, and creates a very flexible and powerful development environment for language developers.

The most basic branching instruction is the unconditional branch, goto:

  .sub 'main'
      goto L1
      print "never printed"

  L1:
      print "after branch"
  .end

The first print statement never runs because the goto always skips over it to the label L1.

The conditional branches combine if or unless with goto.

  .sub 'main'
      $I0 = 42
      if $I0 goto L1
      say "never printed"
  L1:
      say "after branch"
  .end

In this example, the goto branches to the label L1 only if the value stored in $I0 is true. The unless statement is similar, but it branches when the tested value is false. You can use PMC and STRING registers with if and unless. The op will call the get_bool vtable entry on any PMC so used and will convert the STRING to a boolean value. An undefined value, 0, or an empty string are all false values. All other values are true.
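
As a brief sketch of the same pattern with unless and a STRING register, the following tests an empty string, which is a false value:

  .sub 'main'
      $S0 = ""
      unless $S0 goto L1
      say "never printed"
  L1:
      say "after branch"
  .end

Because the empty string is false, unless takes the branch and skips the first say statement.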

The comparison operators (<, <=, ==, !=, >, >=) can combine with if ... goto. These branch when the comparison is true:

  .sub 'main'
      $I0 = 42
      $I1 = 43
      if $I0 < $I1 goto L1
      say "never printed"
  L1:
      say "after branch"
  .end

This example compares $I0 to $I1 and branches to the label L1 if $I0 is less than $I1. The if $I0 < $I1 goto L1 statement translates directly to the lt branch operation.

Chapter 11's "PIR Instructions" section summarizes the other comparison operators.

PIR has no special loop constructs. A combination of conditional and unconditional branches handles iteration:

  .sub 'main'
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      $I0 = $I0 * $I1
      dec $I1
      if $I1 > 0 goto REDO  # end of loop

      say $I0
  .end

This example calculates the factorial 5!. Each time through the loop it multiplies $I0 by the current value of the counter $I1, decrements the counter, and branches to the start of the loop. The loop ends when $I1 counts down to 0. This is a do while-style loop with the condition test at the end, so the code always runs the first time through.

For a while-style loop with the condition test at the start, use a conditional branch with an unconditional branch:

  .sub 'main'
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      if $I1 <= 0 goto LAST
      $I0 = $I0 * $I1
      dec $I1
      goto REDO
  LAST:                     # end of loop

      say $I0
  .end

This example tests the counter $I1 at the start of the loop. At the end of the loop, it unconditionally branches back to the start of the loop and tests the condition again. The loop ends when the counter $I1 reaches 0 and the if branches to the LAST label. If the counter isn't a positive number before the loop, the loop never executes.

You can build any high-level flow control construct from conditional and unconditional branches; the lowest level of computer hardware works this way. All modern programming languages use branching constructs to implement their most complex flow control devices.
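
For instance, an if/then/else construct can be sketched with one conditional and one unconditional branch. The label names BIG and DONE and the values are arbitrary choices for this illustration:

  .sub 'main'
      $I0 = 7
      if $I0 > 5 goto BIG       # condition holds: jump to the "then" part
      say "5 or less"           # "else" part
      goto DONE                 # skip over the "then" part
  BIG:
      say "more than 5"         # "then" part
  DONE:
      say "done"
  .end

With $I0 set to 7 this prints "more than 5" followed by "done". The unconditional goto DONE keeps the "else" part from falling through into the "then" part.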

Branch instructions transfer control to a relative offset from the current instruction. The rightmost argument to every branch opcode is a label, which the assembler converts to the integer value of the offset. You can also branch on a literal integer value, but there's rarely any need to do so. The simplest branch instruction is branch:

    branch L1                # branch 4
    print "skipped\n"
  L1:
    print "after branch\n"

This example unconditionally branches to the location of the label L1, skipping over the first print statement.

Jump instructions transfer control to an absolute address. The jump opcode doesn't calculate an address from a label, so it's used together with set_addr:

    set_addr $I0, L1
    jump $I0
    print "skipped\n"
    end
  L1:
    print "after jump\n"

The set_addr opcode takes a label or an integer offset and returns an absolute address.

You've probably noticed the end opcode as the last statement in many examples above. This terminates the execution of the current run loop. Terminating the main bytecode segment (the first run loop) stops the interpreter. Without the end statement, execution just falls off the end of the bytecode segment, with a good chance of crashing the interpreter.

Conditional Branches

Unconditional jumps and branches aren't really enough for flow control. What you need to implement the control structures of high-level languages is the ability to select different actions based on a set of conditions. PIR has opcodes that conditionally branch based on the truth of a single value or the comparison of two values. The following example has if and unless conditional branches:

    set $I0, 0
    if $I0, TRUE
    unless $I0, FALSE
    print "skipped\n"
    end
  TRUE:
    print "shouldn't happen\n"
    end
  FALSE:
    print "the value was false\n"

if branches if its first argument is a true value, and unless branches if its first argument is a false value. In this case, the if doesn't branch because $I0 is false, but the unless does branch. The comparison branching opcodes compare two values and branch if the stated relation holds true. These are eq (branch when equal), ne (when not equal), lt (when less than), gt (when greater than), le (when less than or equal), and ge (when greater than or equal). The two compared arguments must be the same register type:

    set $I0, 4
    set $I1, 4
    eq $I0, $I1, EQUAL
    print "skipped\n"
    end
  EQUAL:
    print "the two values are equal\n"

This compares two integers, $I0 and $I1, and branches if they're equal. Strings of different character sets or encodings are converted to Unicode before they're compared. PMCs have a cmp vtable method. This gets called on the left argument to perform the comparison of the two objects.

The comparison opcodes don't specify if a numeric or string comparison is intended. The type of the register selects for integers, floats, and strings. With PMCs, the vtable method cmp or is_equal of the first argument is responsible for comparing the PMC meaningfully with the other operand. If you need to force a numeric or string comparison on two PMCs, use the alternate comparison opcodes that end in the _num and _str suffixes.

  eq_str $P0, $P1, label     # always a string compare
  gt_num $P0, $P1, label     # always numerically

Finally, the eq_addr opcode branches if two PMCs or strings are actually the same object (have the same address):

  eq_addr $P0, $P1, same_pmcs_found
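
To illustrate how the register type selects the kind of comparison, here is a small sketch that applies the same lt opcode to integer and string registers. The label names and messages are invented for the example:

  .sub 'main' :main
      $I0 = 10
      $I1 = 9
      lt $I0, $I1, INT_LESS     # integer compare: 10 < 9 is false
      say "integers: 10 is not less than 9"
  INT_LESS:
      $S0 = "10"
      $S1 = "9"
      lt $S0, $S1, STR_LESS     # string compare: "10" sorts before "9"
      say "never printed"
  STR_LESS:
      say 'strings: "10" is less than "9"'
  .end

The integer comparison does not branch, but the string comparison does, because strings are compared character by character and "1" sorts before "9".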

Iteration

PIR doesn't define high-level loop constructs. These are built up from a combination of conditional and unconditional branches. A do-while style loop can be constructed with a single conditional branch:

    set $I0, 0
    set $I1, 10
  REDO:
    inc $I0
    print $I0
    print "\n"
    lt $I0, $I1, REDO

This example prints out the numbers 1 to 10. The first time through, it executes all statements up to the lt statement. If the condition evaluates as true ($I0 is less than $I1) it branches to the REDO label and runs the three statements in the loop body again. The loop ends when the condition evaluates as false.

Conditional and unconditional branches can build up quite complex looping constructs, as follows:

    # loop ($i=1; $i<=10; $i++) {
    #    print "$i\n";
    # }
  loop_init:
    set $I0, 1
    branch loop_test
  loop_body:
    print $I0
    print "\n"
    branch loop_continue
  loop_test:
    le $I0, 10, loop_body
    branch out
  loop_continue:
    inc $I0
    branch loop_test
  out:
    # ... 

This example emulates a counter-controlled loop like Perl 6's loop keyword or C's for. The first time through the loop it sets the initial value of the counter in loop_init, tests that the loop condition is met in loop_test, and then executes the body of the loop in loop_body. If the test fails on the first iteration, the loop body will never execute. The end of loop_body branches to loop_continue, which increments the counter and then goes to loop_test again. The loop ends when the condition fails, and it branches to out. The example is more complex than it needs to be just to count to 10, but it nicely shows the major components of a loop.

Subroutines

Subroutines in PIR are roughly equivalent to the subroutines or methods of a high-level language. All code in a PIR source file must occur within a subroutine. The simplest syntax for a PIR subroutine starts with the .sub directive and ends with the .end directive (the name main is only a convention):

  .sub 'main'
      say "Hello, Polly."
  .end

This example defines a subroutine named main that prints a string "Hello, Polly.". Parrot will normally execute the first subroutine it encounters in the first file it runs, but you can flag any subroutine as the first one to execute with the :main marker:

  .sub 'first'
      say "Polly want a cracker?"
  .end

  .sub 'second' :main
      say "Hello, Polly."
  .end

This code prints out "Hello, Polly." but not "Polly want a cracker?". Though the first function appears first in the source code, second has the :main flag and gets called. first is never called. Revising that program produces different results:

  .sub 'first' :main
      say "Polly want a cracker?"
  .end

  .sub 'second'
      say "Hello, Polly."
  .end

The output now is "Polly want a cracker?". Execution in PIR starts at the :main function and continues until that function ends. To perform other operations, you must call other functions explicitly. Chapter 4 describes subroutines and their uses.

The most basic building block of code reuse in PIR is the subroutine. A large program may perform a calculation like "the factorial of a number" several times. Subroutines abstract this behavior into a single, named, stand-alone unit. PIR is a subroutine-based language in that all code in PIR must exist in a subroutine. Execution starts in the :main subroutine, which itself can call other subroutines. Subroutines can fit together into more elaborate units of reusable code, such as methods and objects.

Parrot supports multiple high-level languages. Each language uses a different syntax for defining and calling subroutines. The goal of PIR is not to be a high-level language in itself, but to provide the basic tools that other languages can use to implement their own subroutines. PIR's subroutine syntax may seem very primitive for this reason.

Parrot Calling Conventions

The .sub directive defines globally accessible subroutine objects.

Subroutine objects of all kinds can be called with the invoke opcode. There is also an invoke Px instruction for calling objects held in a different register.

The invokecc opcode is like invoke, but it also creates and stores a new return continuation. When the called subroutine invokes this return continuation, it returns control to the instruction after the function call. This kind of call is known as Continuation Passing Style (CPS).

The way that Parrot calls a subroutine -- passing arguments, altering control flow, and returning results -- is the "Parrot Calling Conventions" (PCC). Parrot generally hides the details of PCC from the programmer. PIR has several constructs to gloss over these details, and the average programmer will not need to worry about them. PCC uses the Continuation Passing Style (CPS) to pass control to subroutines and back again. Again, the details are irrelevant for most uses, but the power is available to anyone who wants to take advantage of it.

Subroutine Calls

PIR's simplest subroutine call syntax looks much like a subroutine call from a high-level language. This example calls the subroutine fact with two arguments and assigns the result to $I0:

  $I0 = 'fact'(count, product)

This simple statement hides much complexity. It generates a subroutine PMC object, creates a continuation PMC object to represent the control flow up to this point, passes the arguments, looks up the subroutine by name (and by signature, if necessary), calls the subroutine, and assigns the results of the call to the appropriate integer register. This is all in addition to the computation the subroutine itself performs.

The single line subroutine call is incredibly convenient, but it isn't always flexible enough. PIR also has a more verbose call syntax that is still more convenient than manual calls. This example looks up the subroutine fact out in the global symbol table and calls it:

  find_global $P1, "fact"

  .begin_call
    .arg count
    .arg product
    .call $P1
    .result $I0
  .end_call

The whole chunk of code from .begin_call to .end_call acts as a single unit. The .arg directive sets up and passes arguments to the call. The .call directive calls the subroutine and identifies the point at which to return control flow after the subroutine has completed. The .result directive retrieves returned values from the call.

Subroutine Declarations

In addition to syntax for subroutine calls, PIR provides syntax for subroutine definitions: the .sub and .end directives shown in earlier examples. The .param directive defines input parameters and creates local named variables for them (similar to .local):

  .param int c

The .return directive allows the subroutine to return control flow to the calling subroutine, and optionally returns result output values.

Here's a complete code example that implements the factorial algorithm. The subroutine fact is a separate subroutine, assembled and processed after the main function. Parrot resolves global symbols like the fact label between different units.

  # factorial.pir
  .sub 'main' :main
     .local int count
     .local int product
     count   = 5
     product = 1

     $I0 = 'fact'(count, product)

     say $I0
  .end

  .sub 'fact'
     .param int c
     .param int p

  loop:
     if c <= 1 goto fin
     p = c * p
     dec c
     branch loop
  fin:
     .return (p)
  .end

This example defines two local named variables, count and product, and assigns them the values 5 and 1. It calls the fact subroutine with both variables as arguments. The fact subroutine uses .param to retrieve these parameters and .return to return the result. The final printed result is 120.

As usual, execution of the program starts at the :main subroutine.

Parameters and Arguments

Named Parameters

Arguments are the values a caller passes in to a subroutine; parameters are the variables inside the subroutine that receive them.

Arguments passed only by their order are positional arguments. The only differentiator between positional arguments is their position in the function call. Putting positional arguments in a different order will produce different effects, or may cause errors. Parrot also supports named parameters. Instead of being passed by position in the argument list, named arguments are passed by name and can appear in any order. Here's an example:

 .sub 'MySub'
    .param string yrs :named("age")
    .param string call :named("name")
    $S0 = "Hello " . call
    $S1 = "You are " . yrs
    $S1 = $S1 . " years old"
    say $S0
    say $S1
 .end

 .sub 'main' :main
    'MySub'("age" => 42, "name" => "Bob")
 .end

You can also pass these pairs in the opposite order:

 .sub 'main' :main
    'MySub'("name" => "Bob", "age" => 42)    # Same!
 .end

Named arguments can be a big help because you don't have to worry about the exact order of variables, especially as argument lists get very long.

Optional Parameters

Some functions have arguments with appropriate default values, so that callers don't always have to pass them. Parrot provides a mechanism to identify optional arguments. Parrot also provides a flag value to determine if the caller has passed in an optional argument.

Optional parameters appear in PIR as if they're actually two parameters: the value and its flag:

  .param string name     :optional
  .param int    has_name :opt_flag

The :optional flag specifies that the given parameter is optional. The :opt_flag flag marks an integer parameter that contains a boolean flag; this flag is true if the value was passed, and false otherwise. To provide a default value for an optional parameter, you can write:

    .param string name     :optional
    .param int    has_name :opt_flag

    if has_name goto we_have_a_name
    name = "Default value"
  we_have_a_name:

Optional parameters can be positional or named parameters. When using them with positional parameters, they must appear at the end of the list of positional parameters. Also, the :opt_flag parameter must always appear directly after the :optional parameter.

  .sub 'Foo'
    .param int optvalue :optional
    .param int hasvalue :opt_flag
    .param pmc notoptional          # WRONG!
    ...

  .sub 'Bar'
     .param int hasvalue :opt_flag
     .param int optvalue :optional  # WRONG!
     ...

  .sub 'Baz'
    .param int optvalue :optional
    .param pmc notoptional
    .param int hasvalue :opt_flag   # WRONG!
    ...

You may mix optional parameters with named parameters:

  .sub 'MySub'
    .param int value     :named("answer") :optional
    .param int has_value :opt_flag
    ...

You can call this function in two ways:

  'MySub'("answer" => 42)  # with a value
  'MySub'()                # without

Commandline Arguments

Programs written in Parrot have access to arguments passed on the command line:

  .sub 'MyMain' :main
    .param pmc all_args :slurpy
    # ...
  .end

The all_args PMC is a ResizableStringArray PMC, which means you can loop over the results, access them individually, or even modify them.

Continuations

A continuation is a subroutine that captures a complete copy of the caller's context. Invoking a continuation starts or restarts it at the entry point:

    new $P1, "Integer"
    set $P1, 5

    newsub $P0, 'Continuation', _con
  _con:
    print "in cont "
    print $P1
    print "\n"
    dec $P1
    unless $P1, done
    invoke                        # $P0
  done:
    print "done\n"

This prints:

  in cont 5
  in cont 4
  in cont 3
  in cont 2
  in cont 1
  done

Continuations are a kind of subroutine that takes a snapshot of control flow. They are frozen images of the current execution state of the VM. Once you have a continuation, you can invoke it to return to the point where the continuation was first created. It's like a magical timewarp that allows the developer to arbitrarily move control flow back to any previous point in the program.

Continuations are like any other PMC; you can create one with the new opcode:

  $P0 = new 'Continuation'

The new continuation starts off in an undefined state. If you attempt to invoke a new continuation without initializing it, Parrot will throw an exception. To prepare the continuation for use, assign it a destination label with the set_addr opcode:

    $P0 = new 'Continuation'
    set_addr $P0, my_label

  my_label:
    # ...

To jump to the continuation's stored label and return the context to the state it was in at the point of its creation, invoke the continuation:

  invoke $P0  # Explicit using "invoke" opcode
  $P0()       # Same, but nicer syntax

Even though you can use the subroutine notation $P0() to invoke the continuation, it doesn't make any sense to pass arguments or obtain return values:

  $P0 = new 'Continuation'
  set_addr $P0, my_label

  $P0(1, 2)      # WRONG!

  $P1 = $P0()    # WRONG!

Continuation Passing Style

Parrot uses continuations internally for control flow. When Parrot invokes a function, it creates a continuation representing the current point in the program. It passes this continuation as an invisible parameter to the function call. When that function returns, it invokes the continuation -- in effect, it performs a goto to the point of creation of that continuation. If you have a continuation, you can invoke it to return to its point of creation any time you want.

This type of flow control -- invoking continuations instead of performing bare jumps -- is Continuation Passing Style (CPS).

Tailcalls

In many cases, a subroutine will set up and call another subroutine, and then return the result of the second call directly. This is a tailcall, and is an important opportunity for optimization. Here's a contrived example in pseudocode:

  call add_two(5)

  subroutine add_two(value)
    value = add_one(value)
    return add_one(value)

In this example, the subroutine add_two makes two calls to add_one. The second call to add_one is the return value. add_one gets called; its result gets returned to the caller of add_two. Nothing in add_two uses that return value directly.

A simple optimization is available for this type of code. The second call to add_one can return to the same place that add_two returns; therefore, it's perfectly safe and correct to use the same return continuation that add_two uses. The two subroutine calls can share a return continuation, instead of having to create a new continuation for each call.

PIR provides the .tailcall directive to identify similar situations. Use it in place of the .return directive. .tailcall performs this optimization by reusing the return continuation of the parent function to make the tailcall:

  .sub 'main' :main
      .local int value
      value = add_two(5)
      say value
  .end

  .sub 'add_two'
      .param int value
      .local int val2
      val2 = add_one(value)
      .tailcall add_one(val2)
  .end

  .sub 'add_one'
      .param int a
      .local int b
      b = a + 1
      .return (b)
  .end

This example prints the correct value, 7.

Native Call Interface

A special version of the Parrot calling conventions is used by the Native Call Interface (NCI) for calling subroutines with a known prototype in shared libraries. This is not really portable across all libraries, but it's worth a short example. This is a simplified version of the first test in t/pmc/nci.t:

    loadlib $P1, "libnci_test"      # get library object for a shared lib
    print "loaded\n"
    dlfunc $P0, $P1, "nci_dd", "dd" # obtain the function object
    print "dlfunced\n"
    set $I0, 1                      # prototype used - unchecked
    set_args "0", 4.0               # set the argument
    get_results "0", $N5            # prepare to store the return value
    invokecc $P0                    # call nci_dd
    ne $N5, 8.0, nok_1              # the test function returns 2*arg
    print "ok 1\n"
    end
    nok_1:
    #...

This example shows two new instructions: loadlib and dlfunc. The loadlib opcode obtains a handle for a shared library. It searches for the shared library in the current directory, in runtime/parrot/dynext, and in a few other configured directories. It also tries to load the provided filename unaltered and with appended extensions like .so or .dll. Which extensions it tries depends on the OS Parrot is running on.

The dlfunc opcode gets a function object from a previously loaded library (second argument) of a specified name (third argument) with a known function signature (fourth argument). The function signature is a string in which the first character gives the return type and the remaining characters give the types of the function parameters. The characters used in NCI function signatures are listed in Table 9-5.
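
The layout of a signature string can be shown with a few lines of Python. This is only an illustration of the "first character, then the rest" rule; the helper name split_signature is made up for this sketch, and it does not interpret the individual type characters.

```python
def split_signature(signature):
    """Split an NCI-style signature into (return_code, parameter_codes)."""
    if not signature:
        raise ValueError("a signature needs at least a return type character")
    # First character: return type. Remaining characters: one per parameter.
    return signature[0], list(signature[1:])
```

For the "dd" signature used with nci_dd above, this yields "d" as the return code and a single "d" parameter code.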

For more information on callback functions, read the documentation in docs/pdds/pdd16_native_call.pod and docs/pmc/struct.pod.

Coroutines

As we mentioned in the previous chapter, coroutines are subroutines that can suspend themselves and return control to the caller--and then pick up where they left off the next time they're called, as if they never left.

In PIR, coroutines are subroutine-like objects:

  newsub P0, .Coroutine, _co_entry

The Coroutine object has its own user stack, register frame stacks, control stack, and pad stack. The pad stack is inherited from the caller. The coroutine's control stack has the caller's control stack prepended, but is still distinct. When the coroutine invokes itself, it returns to the caller and restores the caller's context (basically swapping all stacks). The next time the coroutine is invoked, it continues to execute from the point at which it previously returned:

    new_pad 0                # push a new lexical pad on stack
    new P0, "Int"            # save one variable in it
    set P0, 10
    store_lex -1, "var", P0

    newsub P0, .Coroutine, _cor
                             # make a new coroutine object
    saveall                  # preserve environment
    invoke                   # invoke the coroutine
    restoreall
    print "back\n"
    saveall
    invoke                   # invoke coroutine again
    restoreall
    print "done\n"
    pop_pad
    end

  _cor:
    find_lex P1, "var"       # inherited pad from caller
    print "in cor "
    print P1
    print "\n"
    inc P1                   # var++
    saveall
    invoke                   # yield(  )
    restoreall
    print "again "
    branch _cor              # next invocation of the coroutine

This prints out the result:

  in cor 10
  back
  again in cor 11
  done

The invoke inside the coroutine is commonly referred to as yield. The coroutine never ends. When it reaches the bottom, it branches back up to _cor and executes until it hits invoke again.

The interesting part about this example is that the coroutine yields in the same way that a subroutine is called. This means that the coroutine has to preserve its own register values. This example uses saveall but it could have only stored the registers the coroutine actually used. Saving off the registers like this works because coroutines have their own register frame stacks.

We've mentioned coroutines several times before, and we're finally going to explain what they are. Coroutines are similar to subroutines except that they have an internal notion of state (and the cool new name!). Coroutines, in addition to performing a normal .return to return control flow back to the caller and destroy the lexical environment of the subroutine, may also perform a .yield operation. .yield returns a value to the caller like .return can, but it does not destroy the lexical state of the coroutine. The next time the coroutine is called, it continues execution from the point of the last .yield, not at the beginning of the coroutine.

In a Coroutine, when we continue from a .yield, the entire lexical environment is the same as it was when .yield was called. This means that the parameter values don't change, even if we call the coroutine with different arguments later.

Coroutines are defined like any ordinary subroutine. They do not require any special flag or any special syntax to mark them as being a coroutine. However, what sets them apart is the use of the .yield directive. .yield plays several roles:

Here is a quick example of a simple coroutine:

  .sub 'MyCoro'
    .yield(1)
    .yield(2)
    .yield(3)
    .return(4)
  .end

  .sub 'main' :main
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
  .end

This is obviously a contrived example, but it demonstrates how the coroutine stores its state. The coroutine stores its state when we reach a .yield directive, and when the coroutine is called again it picks up where it last left off. Coroutines also handle parameters in a way that might not be intuitive. Here's an example of this:

  .sub 'StoredConstant'
    .param int x
    .yield(x)
    .yield(x)
    .yield(x)
  .end

  .sub 'main' :main
    $I0 = StoredConstant(5)       # $I0 = 5
    $I0 = StoredConstant(6)       # $I0 = 5
    $I0 = StoredConstant(7)       # $I0 = 5
    $I0 = StoredConstant(8)       # $I0 = 8
  .end

Notice how even though we are calling the StoredConstant coroutine with different arguments each time, the value of parameter x doesn't change until the coroutine's state resets after the last .yield. Remember that a continuation takes a snapshot of the current state, and the .yield directive takes a continuation. The next time we call the coroutine, it invokes the continuation internally, and returns us to the exact same place in the exact same condition as we were when we called the .yield. In order to reset the coroutine and enable it to take a new parameter, we must either execute a .return directive or reach the end of the coroutine.
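
The "parameters are bound once, then ignored until reset" behavior can be modeled with a Python generator wrapped in a small class. This is a sketch under assumptions, not Parrot's mechanism: the Coroutine class and the *_body names are invented here, and the model resets as soon as the body finishes, returning whatever the body returned (None if it simply ran off the end). It does not try to reproduce what Parrot does on the call that reaches the end of StoredConstant.

```python
class Coroutine:
    """Callable that resumes a suspended body instead of restarting it."""

    def __init__(self, body):
        self._body = body      # a generator function
        self._active = None    # the suspended generator, if any

    def __call__(self, *args):
        if self._active is None:
            # Fresh start: the only moment the arguments are bound.
            self._active = self._body(*args)
        try:
            return next(self._active)    # run up to the next yield
        except StopIteration as finished:
            self._active = None          # body ended: reset the coroutine
            return finished.value


def my_coro_body():
    yield 1
    yield 2
    yield 3
    return 4


def stored_constant_body(x):
    yield x
    yield x
    yield x


my_coro = Coroutine(my_coro_body)
stored_constant = Coroutine(stored_constant_body)
```

In this model, five calls to my_coro() produce 1, 2, 3, 4, and then 1 again, and calling stored_constant with 5, 6, and 7 produces 5 each time because the later arguments are never bound.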

Multiple Dispatch

Multiple dispatch is when there are multiple subroutines in a single namespace with the same name. These functions must differ, however, in their parameter list, or "signature". All subs with the same name get put into a single PMC called a MultiSub. The MultiSub is like a list of subroutines. When the multisub is invoked, the MultiSub PMC searches through its list of subroutines for the one with the closest matching signature. The best match is the sub that gets invoked.

MultiSubs are subroutines with the :multi flag applied to them. MultiSubs (also called "Multis") must all differ from one another in the number and/or type of arguments passed to the function. Having two multisubs with the same function signature could result in a parsing error, or the later function could overwrite the former one in the multi.

Multisubs are defined like this:

  .sub 'MyMulti' :multi
      # does whatever a MyMulti does
  .end

Multis belong to a specific namespace. Functions in different namespaces with the same name do not conflict with each other (this is one of the reasons for having multisubs in the first place!). It's only when multiple functions in a single namespace need to have the same name that a multi is used.

Multisubs take a special designator called a multi signature. The multi signature tells Parrot what particular combination of input parameters the multi accepts. Each multi will have a different signature, and Parrot will be able to dispatch to each one depending on the arguments passed. The multi signature is specified in the :multi directive:

  .sub 'Add' :multi(I, I)
    .param int x
    .param int y
    .return(x + y)
  .end

  .sub 'Add' :multi(N, N)
    .param num x
    .param num y
    .return(x + y)
  .end

  .sub 'Start' :main
    $I0 = Add(1, 2)      # 3
    $N0 = Add(3.14, 2.0) # 5.14
    $S0 = Add("a", "b")  # ERROR! No (S, S) variant!
  .end

Multis can take I, N, S, and P types, but they can also use _ (underscore) to denote a wildcard, and a string that can be the name of a particular PMC type:

  .sub 'Add' :multi(I, I)  # Two integers
    ...

  .sub 'Add' :multi(I, 'Float')  # An integer and Float PMC
    ...

                           # An Integer PMC and any other argument
  .sub 'Add' :multi('Integer', _)
    ...

When we call a multi PMC, Parrot will try to take the most specific best-match variant, and will fall back to more general variants if a perfect best-match cannot be found. So if we call 'Add'(1, 2), Parrot will dispatch to the (I, I) variant. If we call 'Add'(1, "hi"), Parrot will match the ('Integer', _) variant, since the string in the second argument doesn't match I or 'Float'; the integer in the first argument is autoboxed to an Integer PMC. Parrot can automatically promote an I, N, or S value to an Integer, Float, or String PMC in this way.

To decide which multi variant to call, Parrot computes a Manhattan distance between the argument signature and the multi signature of each variant. Every difference counts as one step. A difference can be an autobox from a primitive type to a PMC, or the conversion from one primitive type to another, or the matching of an argument to a _ wildcard. After Parrot calculates the distance to each variant, it calls the function with the lowest distance. Notice that it's possible to define a variant that is impossible to call: for every potential combination of arguments there is a better match. This isn't necessarily a common occurrence, but it's something to watch out for in systems with a lot of multis and a limited number of data types in use.
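
The distance rule can be sketched in Python. This is a simplified model, not Parrot's dispatcher: signatures are tuples of type names, the only primitive conversion allowed is between I and N (an assumption made so that a string never matches an integer parameter), and ties go to the variant defined first. The names BOXES, step, distance, dispatch, and ADD_VARIANTS are all invented for the sketch.

```python
# Primitive type -> the PMC type it autoboxes to.
BOXES = {"I": "Integer", "N": "Float", "S": "String"}
# Assumption: only the numeric primitives convert to one another.
NUMERIC = {"I", "N"}


def step(param, arg):
    """Cost of passing one argument type to one parameter type."""
    if param == arg:
        return 0                       # exact match
    if param == "_":
        return 1                       # wildcard match
    if BOXES.get(arg) == param:
        return 1                       # autobox primitive to PMC
    if param in NUMERIC and arg in NUMERIC:
        return 1                       # primitive-to-primitive conversion
    return None                        # this parameter cannot accept arg


def distance(signature, args):
    """Sum of per-argument steps, or None if the variant cannot match."""
    if len(signature) != len(args):
        return None
    total = 0
    for param, arg in zip(signature, args):
        cost = step(param, arg)
        if cost is None:
            return None
        total += cost
    return total


def dispatch(variants, args):
    """Return the signature with the lowest distance, or None."""
    best = None
    for signature in variants:
        d = distance(signature, args)
        if d is not None and (best is None or d < best[0]):
            best = (d, signature)
    return None if best is None else best[1]


ADD_VARIANTS = [("I", "I"), ("I", "Float"), ("Integer", "_")]
```

With these variants, the arguments ("I", "I") select ("I", "I") at distance 0, while ("I", "S") select ("Integer", "_") at distance 2: one step for the autobox and one for the wildcard.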

Sub PMCs

Subroutines are a PMC type in Parrot. You can store them in PMC registers and manipulate them just as you do the other PMC types. Look up a subroutine in the current namespace with the get_global opcode:

  $P0 = get_global "MySubName"

To find a subroutine in a different namespace, first look up the appropriate namespace PMC, then use that with get_global:

  $P0 = get_namespace "MyNamespace"
  $P1 = get_global $P0, "MySubName"

You can obviously invoke a Sub PMC:

  $P0(1, 2, 3)

You can get or even change its name:

  $S0 = $P0               # Get the current name
  $P0 = "MyNewSubName"    # Set a new name

You can get a hash of the complete metadata for the subroutine:

  $P1 = inspect $P0

The metadata fields in this hash are

Instead of getting the whole inspection hash, you can ask about individual pieces of metadata:

  $I0 = inspect $P0, "pos_required"

To get the total number of defined parameters to the Sub, call the arity method:

  $I0 = $P0.'arity'()

To fetch the namespace PMC that the Sub was defined into, call the get_namespace method:

  $P1 = $P0.'get_namespace'()

Evaluating a Code String

This isn't really a subroutine operation, but it does produce a code object that can be invoked. In this case, it's a bytecode segment object.

The first step is to get an assembler or compiler for the target language:

  compreg $P1, "PIR"

Within the Parrot interpreter there are currently three registered languages: PASM, PIR, and PASM1. The first two are for Parrot assembly language and Parrot Intermediate Representation code. The third is for evaluating single statements in PASM. Parrot automatically adds an end opcode at the end of PASM1 strings before they're compiled.

This example places a bytecode segment object into the destination register P0 and then invokes it with invoke:

  compreg P1, "PASM1"                # get compiler
  set S1, "in eval\n"
  compile P0, P1, "print S1"
  invoke                             # eval code P0
  print "back again\n"

You can register a compiler or assembler for any language inside the Parrot core and use it to compile and invoke code from that language. These compilers may be written in PIR or reside in shared libraries.

  compreg "MyLanguage", $P10

In this example the compreg opcode registers the subroutine-like object P10 as a compiler for the language "MyLanguage". See examples/compilers and examples/japh/japh16.pasm for an external compiler in a shared library.
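
The register-then-look-up pattern behind compreg can be modeled as a table from language names to compiler objects. The Python sketch below is an analogy only: register_compiler, lookup_compiler, and python_compiler are hypothetical names, and Python's built-in compile and exec stand in for a real language compiler.

```python
compilers = {}


def register_compiler(language, compiler):
    # Like the two-argument compreg that registers a compiler.
    compilers[language] = compiler


def lookup_compiler(language):
    # Like the compreg form that fetches a registered compiler.
    return compilers[language]


def python_compiler(source):
    """Compile a source string into an invokable code object."""
    code = compile(source, "<eval>", "exec")

    def invoke(env):
        exec(code, env)    # run the compiled code in the given environment
        return env

    return invoke


register_compiler("Python", python_compiler)
code_object = lookup_compiler("Python")("greeting = 'in eval'")
result = code_object({})
```

The point of the design is that callers only need the language name; any subroutine-like object that turns source text into something invokable can be plugged in behind it.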

Lexicals and Globals

So far, we've been treating Parrot registers like the variables of a high-level language. This is fine, as far as it goes, but it isn't the full picture. The dynamic nature and introspective features of languages like Perl make it desirable to manipulate variables by name, instead of just by register or stack location. These languages also have global variables, which are visible throughout the entire program. Storing a global variable in a register would either tie up that register for the lifetime of the program or require some unwieldy way to shuffle the data into and out of registers.

Parrot provides structures for storing both global and lexically scoped named variables. Lexical and global variables must be PMC values. PIR provides instructions for storing and retrieving variables from these structures so the PIR opcodes can operate on their values.

Globals

Global variables are stored in a Hash, so every variable name must be unique. PIR has two opcodes for globals, set_global and get_global:

  new P10, "Int"
  set P10, 42
  set_global "$foo", P10
  # ...
  get_global P0, "$foo"
  print P0                        # prints 42

The first two statements create an Int in the PMC register P10 and give it the value 42. In the third statement, set_global stores that PMC as the named global variable $foo. At some later point in the program, get_global retrieves the PMC from the global variable by name, and stores it in P0 so it can be printed.

The set_global opcode only stores a reference to the object. If we add an increment statement:

  inc P10

after the set_global it increments the stored global, printing 43. If that's not what you want, you can clone the PMC before you store it. Leaving the global variable as an alias does have advantages, though. If you retrieve a stored global into a register and modify it as follows:

  get_global P0, "varname"
  inc P0

the value of the stored global is directly modified, so you don't need to call set_global again.
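
The aliasing described here behaves like storing a mutable object in a dictionary. The following Python sketch is an analogy rather than Parrot code: the Int class stands in for an Int PMC, global_table stands in for the global store, and copy.copy plays the role of cloning the PMC before storing it.

```python
import copy


class Int:
    """Stand-in for a PMC holding an integer."""

    def __init__(self, value):
        self.value = value


global_table = {}

p10 = Int(42)
global_table["$foo"] = p10               # like set_global: stores a reference
p10.value += 1                           # like inc P10 after the store
aliased = global_table["$foo"].value     # the stored global changed too

p11 = Int(42)
global_table["$bar"] = copy.copy(p11)    # clone first, then store
p11.value += 1
independent = global_table["$bar"].value # the stored clone is unaffected
```

Here aliased ends up as 43 and independent stays at 42, which mirrors the difference between storing the PMC itself and storing a clone.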

The two-argument forms of set_global and get_global store or retrieve globals from the outermost namespace (what Perl users will know as the "main" namespace). A simple flat global namespace isn't enough for most languages, so Parrot also needs to support hierarchical namespaces for separating packages (classes and modules in Perl 6). The set_root_global and get_root_global opcodes add an argument to select a nested namespace:

  set_root_global ["Foo"], "var", P0 # store P0 as var in the Foo namespace
  get_root_global P1, ["Foo"], "var"  # get Foo::var

Eventually the global opcodes will have variants that take a PMC to specify the namespace, but the design and implementation of these aren't finished yet.

Lexicals

Lexical variables are stored in a lexical scratchpad. There's one pad for each lexical scope. Every pad has both a hash and an array, so elements can be stored either by name or by numeric index.

Basic instructions

To store a lexical variable in the current scope pad, use store_lex. Likewise, use find_lex to retrieve a variable from the current pad.

  new $P0, "Int"            # create a variable
  set $P0, 10               # assign value to it
  store_lex "foo", $P0      # store the var with the variable name "foo"
  # ...
  find_lex $P1, "foo"       # get the var "foo" into P1
  print $P1
  print "\n"                # prints 10

As we have seen above, we can declare a new subroutine to be a nested inner subroutine of an existing outer subroutine using the :outer flag. The :outer flag specifies the name of the outer subroutine. Where there may be multiple subroutines with the same name (as is the case with multisubs), we can use the :subid flag on the outer subroutine to give it a different--and unique--name that the lexical subroutines can reference in their :outer declarations. Within lexical subroutines, the .lex directive defines a local variable that follows these scoping rules.

LexPad and LexInfo PMCs

Information about lexical variables in a subroutine is stored in two different types of PMCs: the LexPad PMC that we already mentioned briefly, and the LexInfo PMC, which we haven't. Neither of these PMC types is really usable from PIR code; instead, Parrot uses them internally to store information about lexical variables.

LexInfo PMCs are used to store information about lexical variables at compile time. This is read-only information that is generated during compilation to represent what is known about lexical variables. Not all subroutines get a LexInfo PMC by default; you need to indicate to Parrot somehow that you require a LexInfo PMC to be created. One way to do this is with the .lex directive that we looked at above. Of course, the .lex directive only works for languages where the names of lexical variables are all known at compile time. For languages where this information isn't known, the subroutine can be flagged with :lex instead.

LexPad PMCs are used to store run-time information about lexical variables. This includes their current values and their type information. LexPad PMCs are created at runtime for subs that have a LexInfo PMC already. These are created each time the subroutine is invoked, which allows for recursive subroutine calls without overwriting variable names.

With a Subroutine PMC, you can get access to the associated LexInfo PMC by calling the 'get_lexinfo' method:

  $P0 = get_global "MySubroutine"
  $P1 = $P0.'get_lexinfo'()

Once you have the LexInfo PMC, there are a limited number of operations that you can call with it:

  $I0 = elements $P1    # Get the number of lexical variables from it
  $P0 = $P1["name"]     # Get the entry for lexical variable "name"

There really isn't much else useful to do with LexInfo PMCs; they're mostly used by Parrot internally and aren't helpful to the PIR programmer.

There is no easy way to get a reference to the current LexPad PMC in a given subroutine, but like LexInfo PMCs that doesn't matter because they aren't useful from PIR anyway. Remember that subroutines themselves can be lexical and that therefore the lexical environment of a given variable can extend to multiple subroutines and therefore multiple LexPads. The opcodes find_lex and store_lex automatically search through nested LexPads recursively to find the proper environment information about the given variables.
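
The outward search through nested pads can be modeled as a chain of dictionaries. The Python sketch below is a conceptual model, not the LexPad PMC: the class and its _owner helper are invented here, and the choice to create a missing name in the current pad on store_lex is an assumption of the sketch.

```python
class LexPad:
    """One scope's named variables plus a link to the enclosing scope."""

    def __init__(self, outer=None):
        self.names = {}
        self.outer = outer

    def _owner(self, name):
        # Walk from this pad outward until some pad defines the name.
        pad = self
        while pad is not None:
            if name in pad.names:
                return pad
            pad = pad.outer
        return None

    def find_lex(self, name):
        pad = self._owner(name)
        if pad is None:
            raise KeyError(name)
        return pad.names[name]

    def store_lex(self, name, value):
        # Update the defining pad if there is one, else define it here.
        pad = self._owner(name) or self
        pad.names[name] = value


outer = LexPad()
outer.store_lex("x", 0)
inner = LexPad(outer=outer)
inner.store_lex("z", 2)
inner.store_lex("x", 5)    # found in the outer pad and updated there
```

After these calls, both pads see x as 5, while z exists only in the inner pad and a lookup of z from the outer pad fails.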

Lexical Subroutines

Parrot offers support for lexical subroutines. You can define a subroutine by name inside a larger subroutine, where the inner subroutine is only visible and callable from the outer. The inner subroutine inherits all the lexical variables from the outer subroutine, but can itself define its own lexical variables that the outer subroutine cannot access. PIR lacks the concept of blocks or nested lexical scopes; this is how it performs the same function.

If the subroutine is lexical, you can get its :outer with the get_outer method on the Sub PMC:

  $P1 = $P0.'get_outer'()

If there is no :outer PMC, this returns a NULL PMC. Conversely, you can set the outer sub:

  $P0.'set_outer'($P1)

Scope and HLLs

As mentioned previously, High Level Languages such as Perl, Python, and Ruby allow nested scopes, or blocks within blocks that can have their own lexical variables. This construct is common even in the C programming language:

  {
      int x = 0;
      int y = 1;
      {
          int z = 2;
          /* x, y, and z are all visible here */
      }

      /* only x and y are visible here */
  }

In the inner block, all three variables are visible. The variable z is only visible inside that block. The outer block has no knowledge of z. A very direct, naive translation of this code to PIR might be:

  .local int x
  .local int y
  .local int z
  x = 0
  y = 1
  z = 2
  ...

This PIR code is similar, but the handling of the variable z is different: z is visible throughout the entire current subroutine, whereas in the C code it is visible only inside the inner block. To help approximate this effect, PIR supplies lexical subroutines to create nested lexical scopes.

PIR Scoping

Only one PIR structure supports scoping like this: the subroutine... and objects that inherit from subroutines, such as methods, coroutines, and multisubs. There are no blocks in PIR that have their own scope besides subroutines. Fortunately, we can use these lexical subroutines to simulate this behavior that HLLs require:

  .sub 'MyOuter'
      .local int x,y
      .lex 'x', x
      .lex 'y', y
      'MyInner'()
      # only x and y are visible here
  .end

  .sub 'MyInner' :outer('MyOuter')
      .local int z
      .lex 'z', z
      #x, y, and z are all "visible" here
  .end

In the example above we put the word "visible" in quotes. This is because lexically-defined variables need to be accessed with the find_lex and store_lex opcodes. These two opcodes don't just access the value of a register, where the value is stored while it's being used; they also interact with the LexPad PMC that's storing the data. If a value isn't properly stored in the LexPad, then it won't be available in nested inner subroutines, or from :outer subroutines either.

Namespaces

Namespaces provide a mechanism where names can be reused. This may not sound like much, but in large complicated systems, or systems with many included libraries, it can be very handy. Each namespace gets its own area for function names and global variables. This way you can have multiple functions named create or new or convert, for instance, without having to use Multi-Method Dispatch (MMD) which we will describe later. Namespaces are also vital for defining classes and their methods, which we already mentioned. We'll talk about all those uses here.

Namespaces are specified with the .namespace [] directive. The brackets are not optional, but the keys inside them are. Here are some examples:

  .namespace [ ]               # The root namespace
  .namespace [ "Foo" ]         # The namespace "Foo"
  .namespace [ "Foo" ; "Bar" ] # Namespace Foo::Bar
  .namespace                   # WRONG! The [] are needed

Using semicolons, namespaces can be nested to any arbitrary depth. Namespaces are special types of PMC, so we can access them and manipulate them just like other data objects. We can get the PMC for the root namespace using the get_root_namespace opcode:

  $P0 = get_root_namespace

The current namespace, which might be different from the root namespace, can be retrieved with the get_namespace opcode:

  $P0 = get_namespace             # get current namespace PMC
  $P0 = get_namespace ["Foo"]     # get PMC for namespace "Foo"

Namespaces are arranged into a large n-ary tree. There is the root namespace at the top of the tree, and in the root namespace are various special HLL namespaces. Each HLL compiler gets its own HLL namespace where it can store its data during compilation and runtime. Each HLL namespace may have a large hierarchy of other namespaces. We'll talk more about HLL namespaces and their significance in chapter 10.

The root namespace is a busy place. Everybody could be lazy and use it to store all their subroutines and global variables, and then we would run into all sorts of collisions. One library would define a function "Foo", and then another library could try to create another subroutine with the same name. This is called namespace pollution, because everybody is trying to put things into the root namespace, and those things are all unrelated to each other. Best practice requires that namespaces be used to hold private information away from public information, and to keep like things together.

As an example, the namespace Integers could be used to store subroutines that deal with integers. The namespace images could be used to store subroutines that deal with creating and manipulating images. That way, when we have a subroutine that adds two numbers together, and a subroutine that performs additive image composition, we can name them both add without any conflict or confusion. And within the images namespace we could have nested namespaces for jpeg and MRI and schematics, and each of these could have an add method without getting in each other's way.

The short version is this: use namespaces. There aren't any penalties to them, and they do a lot of work to keep things organized and separated.

Namespace PMC

The .namespace directive that we've seen sets the current namespace. In PIR code, we have multiple ways to address a namespace:

  # Get namespace "a/b/c" starting at the root namespace
  $P0 = get_root_namespace ["a" ; "b" ; "c"]

  # Get namespace "a/b/c" starting in the current HLL namespace.
  $P0 = get_hll_namespace ["a" ; "b" ; "c"]
  # Same
  $P0 = get_root_namespace ["hll" ; "a" ; "b" ; "c"]

  # Get namespace "a/b/c" starting in the current namespace
  $P0 = get_namespace ["a" ; "b" ; "c"]

Once we have a namespace PMC we can retrieve global variables and subroutine PMCs from it using the following functions:

  $P1 = get_global $S0            # Get global in current namespace
  $P1 = get_global ["Foo"], $S0   # Get global in namespace "Foo"
  $P1 = get_global $P0, $S0       # Get global in $P0 namespace PMC

Operations on the Namespace PMC

We've seen above how to find a Namespace PMC. Once you have it, there are a few things you can do with it. You can find methods and variables that are stored in the namespace, or you can add new ones:

  $P0 = get_namespace
  $P0.'add_namespace'($P1)      # Add Namespace $P1 to $P0
  $P1 = $P0.'find_namespace'("MyOtherNamespace")

  # Find namespace "MyNamespace" in $P0, create it if it
  #    doesn't exist
  $P1 = $P0.'make_namespace'("MyNamespace")

  $P0.'add_sub'("MySub", $P2)   # Add Sub PMC $P2 to the namespace
  $P1 = $P0.'find_sub'("MySub") # Find it

  $P0.'add_var'("MyVar", $P3)   # Add variable "MyVar" in $P3
  $P1 = $P0.'find_var'("MyVar") # Find it

  # Return the name of Namespace $P0 as a ResizableStringArray
  $P3 = $P0.'get_name'()

  # Find the parent namespace that contains this one:
  $P5 = $P0.'get_parent'()

  # Get the Class PMC associated with this namespace:
  $P6 = $P0.'get_class'()

There are a few other operations that can be done on Namespaces, but none as interesting as these. We'll talk about Namespaces throughout the rest of this chapter.
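
The tree structure behind these operations can be modeled in a few lines of Python. This is a sketch of the idea, not the Namespace PMC: the method names are borrowed from the listing above, but their behavior here (for example, returning None for a missing entry, and get_name omitting the unnamed root) is an assumption of the model.

```python
class Namespace:
    """A node in a namespace tree holding child namespaces and subs."""

    def __init__(self, name=None, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.subs = {}

    def make_namespace(self, name):
        # Find the child namespace, creating it if it doesn't exist.
        if name not in self.children:
            self.children[name] = Namespace(name, self)
        return self.children[name]

    def find_namespace(self, name):
        return self.children.get(name)

    def add_sub(self, name, sub):
        self.subs[name] = sub

    def find_sub(self, name):
        return self.subs.get(name)

    def get_name(self):
        # Collect names from this node up to (not including) the root.
        parts = []
        ns = self
        while ns is not None and ns.name is not None:
            parts.append(ns.name)
            ns = ns.parent
        return list(reversed(parts))


root = Namespace()
integers = root.make_namespace("Integers")
images = root.make_namespace("images")
jpeg = images.make_namespace("jpeg")
integers.add_sub("add", lambda a, b: a + b)
jpeg.add_sub("add", lambda a, b: "composite(%s, %s)" % (a, b))
```

The two add entries live in different nodes of the tree, so neither replaces the other, which is the collision-avoidance property described earlier in this section.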

Classes and Objects

This section revolves around one complete example that defines a class, instantiates objects, and uses them. The whole example is included at the end of the section.

Class declaration

The newclass opcode defines a new class. It takes two arguments, the name of the class and the destination register for the class PMC. All classes (and objects) inherit from the ParrotClass PMC, which is the core of the Parrot object system.

    newclass $P1, "Foo"

To instantiate a new object of a particular class, you first look up the integer value for the class type with the find_type opcode, then create an object of that type with the new opcode:

    find_type I1, "Foo"
    new P3, I1

The new opcode also checks to see if the class defines a method named "__init" and calls it if it exists.

Attributes

The addattribute opcode creates a slot in the class for an attribute (sometimes known as an instance variable) and associates it with a name:

    addattribute $P1, ".i"                # Foo.i

This chunk of code from the __init method looks up the position of the first attribute, creates an Int PMC, and stores it as the first attribute:

    classoffset $I0, $P2, "Foo"    # first "Foo" attribute of object P2
    new $P6, "Int"                 # create storage for the attribute
    setattribute $P2, $I0, $P6     # store the first attribute

The classoffset opcode takes a PMC containing an object and the name of its class, and returns an integer index for the position of the first attribute. The setattribute opcode uses the integer index to store a PMC value in one of the object's attribute slots. This example initializes the first attribute. The second attribute would be at I0 + 1, the third attribute at I0 + 2, etc:

    inc $I0
    setattribute $P2, $I0, $P7       # store next attribute
    #...

There is also support for accessing attributes by fully qualified attribute name (although this is a little bit slower than getting the class offset once and accessing several attributes by index):

    new $P6, "Int"
    setattribute $P2, "Foo\x0.i", $P6   # store the attribute

You use the same integer index to retrieve the value of an attribute. The getattribute opcode takes an object and an index as arguments and returns the attribute PMC at that position:

    classoffset $I0, $P2, "Foo"         # first "Foo" attribute of object P2
    getattribute $P10, $P2, $I0         # indexed get of attribute

or

    getattribute $P10, $P2, "Foo\x0.i"  # named get

To set the value of an attribute PMC, first retrieve it with getattribute and then assign to the returned PMC. Because PMC registers are only pointers to values, you don't need to store the PMC again after you modify its value:

    getattribute $P10, $P2, $I0
    set $P10, $I5

Methods

Methods in PIR are just subroutines installed in the namespace of the class. You define a method with the .pcc_sub directive before the label:

This routine returns half of the value of the first attribute of the object. Method calls use the Parrot calling conventions so they always pass the invocant object (often called self) in P2. Invoking the return continuation in P1 returns control to the caller.

The .pcc_sub directive automatically stores the subroutine as a global in the current namespace. The .namespace directive sets the current namespace:

  .namespace [ "Foo" ]

If the namespace is explicitly set to an empty string or key, then the subroutine is stored in the outermost namespace.

The callmethodcc opcode makes a method call. It follows the Parrot calling conventions, so it expects to find the invocant object in P2, the method object in P0, etc. It adds one bit of magic, though. If you pass the name of the method in S0, callmethodcc looks up that method name in the invocant object and stores the method object in P0 for you:

    set $S0, "_half"            # set method name
    set $P2, $P3                # the object
    callmethodcc                # create return continuation, call
    print $I5                   # result of method call
    print "\n"

The callmethodcc opcode also generates a return continuation and stores it in P1. The callmethod opcode doesn't generate a return continuation, but is otherwise identical to callmethodcc. Just like ordinary subroutine calls, you have to preserve and restore any registers you want to keep after a method call. Whether you store individual registers, register frames, or half register frames is up to you.

Overriding vtable functions

Every object inherits a default set of vtable functions from the ParrotObject PMC, but you can also override them with your own methods. The vtable functions have predefined names that start with a double underscore "__". The following code defines a method named __init in the Foo class that initializes the first attribute of the object with an integer:

  .sub __init:
    classoffset I0, P2, "Foo"     # lookup first attribute position
    new P6, "Int"                 # create storage for the attribute
    setattribute P2, I0, P6       # store the first attribute
    invoke P1                     # return

Ordinary methods have to be called explicitly, but the vtable functions are called implicitly in many different contexts. Parrot saves and restores registers for you in these calls. The __init method is called whenever a new object is constructed:

    find_type I1, "Foo"
    new P3, I1          # call __init if it exists

A few other vtable functions in the complete code example for this section are __set_integer_native, __add, __get_integer, __get_string, and __increment. The set opcode calls Foo's __set_integer_native vtable function when its destination register is a Foo object and the source register is a native integer:

    set $P3, 30          # call __set_integer_native method

The add opcode calls Foo's __add vtable function when it adds two Foo objects:

    new $P4, $I1          # same with P4
    set $P4, 12
    new $P5, $I1          # create a new store for add

    add $P5, $P3, $P4     # __add method

The inc opcode calls Foo's __increment vtable function when it increments a Foo object:

    inc $P3              # __increment

Foo's __get_integer and __get_string vtable functions are called whenever an integer or string value is retrieved from a Foo object:

    set $I10, $P5         # __get_integer
    #...
    print $P5            # calls __get_string, prints 'fortytwo'

Inheritance

The subclass opcode creates a new class that inherits methods and attributes from another class. It takes 3 arguments: the destination register for the new class, a register containing the parent class, and the name of the new class:

    subclass $P3, $P1, "Bar"

For multiple inheritance, the addparent opcode adds additional parents to a subclass.

  newclass $P4, "Baz"
  addparent $P3, $P4

To override an inherited method, define a method with the same name in the namespace of the subclass. The following code overrides Bar's __increment method so it decrements the value instead of incrementing it:

  .namespace [ "Bar" ]

  .sub __increment:
    classoffset I0, P2, "Foo"     # get Foo's attribute slot offset
    getattribute P10, P2, I0      # get the first Foo attribute
    dec P10                       # the evil line
    invoke P1

Notice that the attribute inherited from Foo can only be looked up with the Foo class name, not the Bar class name. This preserves the distinction between attributes that belong to the class and inherited attributes.

Object creation for subclasses is the same as for ordinary classes:

    find_type $I1, "Bar"
    new $P5, $I1

Calls to inherited methods are just like calls to methods defined in the class:

    set $P5, 42                  # inherited __set_integer_native
    inc $P5                      # overridden __increment
    print $P5                    # prints 41 as Bar's __increment decrements
    print "\n"

    set $S0, "_half"             # set method name
    set $P2, $P5                 # the object
    callmethodcc                 # create return continuation, call
    print $I5
    print "\n"

Additional Object Opcodes

The isa and can instructions are also useful when working with objects. isa checks whether an object belongs to or inherits from a particular class. can checks whether an object has a particular method. Both return a true or false value.

    $I0 = isa $P3, "Foo"         # 1
    $I0 = isa $P3, "Bar"         # 1
    $I0 = can $P3, "add"         # 1

It may seem more appropriate for a discussion of PIR's support for classes and objects to reside in its own chapter, instead of appearing in a generic chapter about PIR programming "basics". However, part of PIR's core functionality is its support for object-oriented programming. PIR doesn't have the elaborate syntax of other OO languages, and it doesn't even support all the features that most modern OO languages have. What PIR does have is support for the basic structures and abilities: the subset necessary to construct richer, higher-level object systems.

Attributes

Classes and subclasses can be given attributes, which are named data fields, in addition to methods, which we will talk about in the next chapter. Attributes are created with the addattribute opcode, and can be set and retrieved with the setattribute and getattribute opcodes respectively:

  # Create the new class with two attributes
  $P0 = newclass 'MyClass'
  addattribute $P0, 'First'
  addattribute $P0, 'Second'

  # Create a new item of type MyClass
  $P1 = new 'MyClass'

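  # Set values to the attributes (attribute values are stored as PMCs)
  $P2 = new 'String'
  $P2 = 'First Value'
  setattribute $P1, 'First', $P2
  $P3 = new 'String'
  $P3 = 'Second Value'
  setattribute $P1, 'Second', $P3

  # Get the attribute values
  $P4 = getattribute $P1, 'First'
  $P5 = getattribute $P1, 'Second'

The values stored as attributes don't need to be strings, even though both of the ones in the example are. The setattribute and getattribute opcodes work with PMCs, so an attribute can hold any PMC; integers and numbers can be stored by placing them in a PMC first.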

Methods

PIR provides syntax to simplify writing methods and method calls for object-oriented programming. We've seen some method calls in the examples above, especially when we were talking about the interfaces to certain PMC types. We've also seen a little bit of information about classes and objects in the previous chapter. PIR allows you to define your own classes, and with those classes you can define method interfaces to them. Method calls follow the same Parrot calling conventions that we have seen above, including all the various parameter configurations, lexical scoping, and other aspects we have already talked about.

The second type of class can be defined in PIR at runtime. We saw some examples of this in the last chapter using the newclass and subclass opcodes. We also talked about class attribute values. Now we're going to talk about associating subroutines with these classes; such subroutines are called methods. Methods are just like other subroutines, with two major differences: they are marked with the :method flag, and they exist in a namespace. Before we can talk about methods, we need to discuss namespaces first.

Methods are just like subroutines, except they are invoked on an object PMC, and that PMC is passed as the self parameter.

The basic syntax for a method call is similar to the single line subroutine call above. It takes a variable for the invocant PMC and a string with the name of the method:

  object."methodname"(arguments)

Notice that the name of the method must be contained in quotes. If the name of the method is not contained in quotes, it's treated as a named variable that holds the method name. Here's an example:

  .local string methname
  methname = "Foo"
  object.methname()               # Same as object."Foo"()
  object."Foo"()                  # Same

The invocant can be a variable or register, and the method name can be a literal string, string variable, or method object PMC.

Defining Methods

Methods are defined like any other subroutine except with two major differences: They must be inside a namespace named after the class they are a part of, and they must use the :method flag.

  .namespace [ "MyClass"]

  .sub "MyMethod" :method
    ...

Inside the method, the invocant object can be accessed using the self keyword. self isn't the only name available for this value, however. You can also use the :invocant flag to define a new name for the invocant object:

(See TT #483)

  .sub "MyMethod" :method
    $S0 = self                    # Already defined as "self"
    say $S0
  .end

  .sub "MyMethod2" :method
    .param pmc item :invocant     # "self" is now called "item"
    $S0 = item
    say $S0
  .end

This example defines two methods in the Foo class. It calls one from the main body of the subroutine and the other from within the first method:

  .sub main
    .local pmc class
    .local pmc obj
    newclass class, "Foo"       # create a new Foo class
    new obj, "Foo"              # instantiate a Foo object
    obj."meth"()                # call obj."meth" which is actually
    print "done\n"              # in the "Foo" namespace
  .end

  .namespace [ "Foo" ]          # start namespace "Foo"

  .sub meth :method             # define Foo::meth global
     print "in meth\n"
     $S0 = "other_meth"         # method names can be in a register too
     self.$S0()                 # self is the invocant
  .end

  .sub other_meth :method       # define another method
     print "in other_meth\n"    # as above Parrot provides a return
  .end                          # statement

Each method call looks up the method name in the object's class namespace. The .sub directive automatically makes a symbol table entry for the subroutine in the current namespace.

When a .sub is declared as a :method, it automatically creates a local variable named self and assigns it the object passed in P2. You don't need to write .param pmc self to get it; it comes free with the method.

You can pass multiple arguments to a method and retrieve multiple return values just like a single line subroutine call:

  (res1, res2) = obj."method"(arg1, arg2)
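
As a minimal sketch of passing arguments to a method and receiving several results, the following program uses only the syntax introduced above. The class name Calc and the method name divide are hypothetical and chosen purely for illustration:

  .sub 'main' :main
      .local pmc class, obj
      class = newclass 'Calc'         # hypothetical class name
      obj = new 'Calc'

      # two arguments in, two results out
      ($I0, $I1) = obj.'divide'(17, 5)
      print $I0                       # the integer quotient
      print " remainder "
      say $I1                         # the remainder
  .end

  .namespace [ 'Calc' ]

  .sub 'divide' :method               # hypothetical method name
      .param int a
      .param int b
      $I0 = a / b                     # integer division on int registers
      $I1 = a % b
      .return ($I0, $I1)
  .end

The method sits in the namespace named after the class, so the call on obj finds it without any further registration.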

Introspection

The details about various PMC classes are managed by the Class PMC. Class PMCs contain information about the class, available methods, the inheritance hierarchy of the class, and various other details. Classes can be created with the newclass opcode:

  $P0 = newclass "MyClass"

Once we have created the class PMC, we can instantiate objects of that class using the new opcode. The new opcode takes either the class name or the Class PMC as an argument:

  $P1 = new $P0        # $P0 is the Class PMC
  $P2 = new "MyClass"  # Same

The new opcode can create two different types of PMC. The first type is the built-in core PMC classes. The built-in PMCs are written in C and cannot be extended from PIR without subclassing. However, you can also create user-defined PMC types in PIR. User-defined PMCs use the Object PMC type for instantiation. Object PMCs are used for all user-defined types and keep track of the methods and VTABLE override definitions. We're going to talk about methods and VTABLE overrides in the next chapter.

Subclassing PMCs

Existing built-in PMC types can be subclassed to associate additional data and methods with that PMC type. Subclassed PMC types act like their PMC base types, by sharing the same VTABLE methods and underlying data types. However, the subclass can define additional methods and attribute data storage. If necessary new VTABLE interfaces can be defined in PIR and old VTABLE methods can be overridden using PIR. We'll talk about defining methods and VTABLE interface overrides in the next chapter.

Creating a new subclass of an existing PMC class is done using the subclass opcode:

  # create an anonymous subclass
  $P0 = subclass 'ResizablePMCArray'

  # create a subclass named "MyArray"
  $P0 = subclass 'ResizablePMCArray', 'MyArray'

This returns a Class PMC which can be used to create and modify the class by adding attributes or creating objects of that class. You can also use the new class PMC to create additional subclasses:

  $P0 = subclass 'ResizablePMCArray', 'MyArray'
  $P1 = subclass $P0, 'MyOtherArray'

Once you have created these classes, you can instantiate them as usual with the new opcode:

  $P0 = new 'MyArray'
  $P1 = new 'MyOtherArray'
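
To illustrate how a subclass keeps the behavior of its base type while gaining new methods, here is a small sketch. The method name describe is hypothetical; the push and elements opcodes are assumed to act on the subclass exactly as they would on a plain ResizablePMCArray:

  .sub 'main' :main
      $P0 = subclass 'ResizablePMCArray', 'MyArray'
      $P1 = new 'MyArray'

      push $P1, "a"                   # inherited array behavior
      push $P1, "b"
      $P1.'describe'()                # method added by the subclass
  .end

  .namespace [ 'MyArray' ]

  .sub 'describe' :method             # hypothetical method name
      $I0 = elements self             # number of stored elements
      print "MyArray with "
      print $I0
      say " elements"
  .end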

Vtable Overrides

PMCs all subscribe to a common interface of functions called VTABLEs. Every PMC implements the same set of these interfaces, which perform very specific low-level tasks on the PMC. The term VTABLE was originally a shortened form of the name "virtual function table", although that name isn't used any more by the developers or in any of the documentation. The virtual functions in the VTABLE, called VTABLE interfaces, are similar to ordinary functions and methods in many respects. VTABLE interfaces are occasionally called "VTABLE functions", "VTABLE methods", or even "VTABLE entries" in casual conversation.

A quick comparison shows that VTABLE interfaces are not really subroutines or methods in the way that those terms have been used throughout the rest of Parrot. Like methods on an object, VTABLE interfaces are defined for a specific class of PMC, and can be invoked on any member of that class. Likewise, in a VTABLE interface declaration, the self keyword is used to describe the object that it is invoked upon. That's where the similarities end, however. Unlike ordinary subroutines or methods, VTABLE interfaces cannot be invoked directly, and they are not inherited through class hierarchies the way methods are. With all this terminology discussion out of the way, we can start talking about what VTABLEs are and how they are used in Parrot.

VTABLE interfaces are the primary way that data in the PMC is accessed and modified. VTABLEs also provide a way to invoke the PMC if it's a subroutine or subroutine-like PMC. VTABLE interfaces are not called directly from PIR code, but are instead called internally by Parrot to implement specific opcodes and behaviors. For instance, the invoke opcode calls the invoke VTABLE interface of the subroutine PMC, while the inc opcode on a PMC calls the increment VTABLE interface on that PMC. In essence, VTABLE interface overrides allow the programmer to change the way Parrot accesses PMC data at the most fundamental level, and with it the way the opcodes act on that data.

PMCs, as we will look at more closely in later chapters, are typically implemented using PMC Script, a layer of syntax and macros over ordinary C code. A PMC compiler program converts the PMC files into C code for compilation as part of the ordinary build process. However, VTABLE interfaces can be written and overwritten in PIR using the :vtable flag on a subroutine declaration. This technique is used most commonly when subclassing an existing PMC class in PIR code to create a new data type with custom access methods.

VTABLE interfaces are declared with the :vtable flag:

  .sub 'set_integer' :vtable
      #set the integer value of the PMC here
  .end

In this case, the subroutine must have the same name as the VTABLE interface it is intended to implement. VTABLE interfaces all have very specific names, and you can't override one with just any arbitrary name. However, if you would like to name the function something different but still use it as a VTABLE interface, you can add a name parameter to the flag:

  .sub 'MySetInteger' :vtable('set_integer')
      #set the integer value of the PMC here
  .end

VTABLE interfaces are often given the :method flag also, so that they can be used directly in PIR code as methods, in addition to being used by Parrot as VTABLE interfaces. This means we can have the following:

  .namespace [ "MyClass" ]

  .sub 'ToString' :vtable('get_string') :method
      $S0 = "hello!"
      .return($S0)
  .end

  .namespace [ "OtherClass" ]

  .sub 'UseMyClass'                 # wrapper sub; assumes MyClass already exists
    .local pmc myclass
    myclass = new "MyClass"
    say myclass                     # say converts to string internally
    $S0 = myclass                   # Convert to a string, store in $S0
    $S0 = myclass.'ToString'()      # The same
  .end

Inside a VTABLE interface definition, the self local variable contains the PMC on which the VTABLE interface is invoked, just like in a method declaration.

Roles

As we've seen above and in the previous chapter, Class PMCs and NameSpace PMCs work to keep classes and methods together in a logical way. There is another factor to add to this mix: The Role PMC.

Roles are like classes, but don't stand on their own. They represent collections of methods and VTABLES that can be added into an existing class. Adding a role to a class is called composing that role, and any class that has been composed with a role does that role.

Roles are created as PMCs and can be manipulated through opcodes and methods like other PMCs:

  $P0 = new 'Role'
  $P1 = get_global "MyRoleSub"
  $P0.'add_method'("MyRoleSub", $P1)

Once we've created a role and added methods to it, we can add that role to a class, or even to another role:

  $P1 = new 'Role'
  $P2 = new 'Class'
  $P1.'add_role'($P0)
  $P2.'add_role'($P0)
  addrole $P2, $P0     # Same!

Now that we have added the role, we can check whether we implement it:

  $I0 = does $P2, $P0  # Yes

We can get a list of roles from our Class PMC:

  $P3 = $P2.'roles'()

Roles are very useful for ensuring that related classes all implement a common interface.

Filehandles

Like almost everything else in Parrot, input and output are handled by PMCs.

We've seen print and say. print prints the given string argument, or the stringified form of the argument, if it's not a string, to standard output. say does the same thing but also appends a trailing newline to it. Another opcode worth mentioning is the printerr opcode, which prints an argument to the standard error output instead.

We can read values from standard input using the read and readline ops. read takes an integer value and returns a string with that many characters. readline reads an entire line of input from standard input and returns the string without the trailing newline. Here is a simple echo program that reads in characters from the user and echoes them to standard output:

  .sub 'main'
    loop_top:
      $S0 = read 10
      print $S0
      goto loop_top
  .end

The ops we have seen so far are useful if all your I/O operations are limited to the standard streams. However, there are plenty of other places you might want to read data from or send data to, such as files, sockets, and databases. These are accessed through a filehandle.

Filehandles are PMCs that describe a file and keep track of an I/O operation's internal state. We can get filehandles for the standard streams using dedicated opcodes:

  $P0 = getstdin    # Standard input handle
  $P1 = getstdout   # Standard output handle
  $P2 = getstderr   # Standard error handle

If we have a file, we can create a handle to it using the open op:

  $P0 = open "my/file/name.txt"

We can also specify the exact mode that the file handle will be in:

  $P0 = open "my/file/name.txt", "wa"

The mode string at the end should be familiar to C programmers, because its values are mostly the same:

  r  : read
  w  : write
  wa : append
  p  : pipe

So if we want a handle that we can read and write to, we write the mode string "rw". If we want to be able to read and write to it, but we don't want write operations to overwrite the existing contents, we use "rwa" instead.

When we are done with a filehandle that we've created, we can shut it down with the close op. Note that the standard streams should not be closed this way.

  close $P0

With a filehandle, we can perform all the same operations as we could earlier, but we pass the filehandle as an additional argument to tell the op where to write or read the data from.

  print "hello"       # Write "hello" to STDOUT

  $P0 = getstdout
  print $P0, "hello"  # Same, but more explicit

  say $P0, " world!"  # say to STDOUT

  $P1 = open "myfile.txt", "wa"
  print $P1, "foo"    # Write "foo" to myfile.txt
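
Putting open, print, and close together, a minimal sketch of writing a file and then appending to it might look like the following. The file name example.txt is a placeholder, and the mode strings are the ones listed above:

  .sub 'main' :main
      .local pmc fh
      fh = open "example.txt", "w"    # open for writing
      print fh, "first line\n"
      close fh

      fh = open "example.txt", "wa"   # reopen the same file in append mode
      print fh, "second line\n"
      close fh
  .end

Closing the handle before reopening it keeps each open call paired with exactly one close.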

Filehandle PMCs

Let's see a little example of a program that reads in data from a file, and prints it to STDOUT.

  .sub 'main'
    $P0 = getstdout
    $P1 = open "myfile.txt", "r"
    loop_top:
      $S0 = readline $P1
      print $P0, $S0
      if $P1 goto loop_top
    close $P1
  .end

This example shows that treating a filehandle PMC like a boolean value returns whether or not we have reached the end of the file. A true return value means there is more file to read. A false return value means we are at the end. In addition to this behavior, Filehandle PMCs have a number of methods that can be used to perform various operations.

$P0.'open'(STRING filename, STRING mode)

Opens the filehandle. Takes two optional strings: the name of the file to open and the open mode. If no filename is given, the previous filename associated with the filehandle is opened. If no mode is given, the previously-used mode is used.

  $P0 = new 'FileHandle'
  $P0.'open'("myfile.txt", "r")

  $P0 = open "myfile.txt", "r"   # Same!

The open opcode internally creates a new filehandle PMC and calls the 'open'() method on it. So even though the above two code snippets act in an identical way, the latter one is a little more concise to write. The caveat is that the open opcode creates a new PMC for every call, while the 'open'() method call can reuse an existing filehandle PMC for a new file.

$P0.'isatty'()

Returns a boolean value indicating whether the filehandle is a TTY terminal.

$P0.'close'()

Closes the filehandle. It can be reopened with 'open'() later.

  $P0.'close'()

  close $P0   # Same

The close opcode calls the 'close'() method on the Filehandle PMC internally, so these two calls are equivalent.

$P0.'is_closed'()

Returns true if the filehandle is closed, false if it is open.

$P0.'read'(INTVAL length)

Reads length bytes from the filehandle.

  $S0 = read $P0, 10

  $S0 = $P0.'read'(10)

The two calls are equivalent, and the read opcode calls the 'read'() method internally.

$P0.'readline'()

Reads an entire line (up to a newline character or EOF) from the filehandle.

$P0.'readline_interactive'(STRING prompt)

Displays the string prompt and then reads a line of input.

$P0.'readall'(STRING name)

Reads the entire file name into a string. If the filehandle is closed, it will open the file given by name, read the entire file, and then close the handle. If the filehandle is already open, name should not be passed (it is an optional parameter).

$P0.'flush'()

Flushes the buffer.

$P0.'print'(PMC to_print)

Prints the given value to the filehandle. The print opcode uses the 'print'() method internally.

  print "Hello!"

  $P0 = getstdout
  print $P0, "Hello!"    # Same

  $P0.'print'("Hello!")  # Same

$P0.'puts'(STRING to_print)

Prints the given string value to the filehandle.

$P0.'buffer_type'(STRING new_type)

If new_type is given, changes the buffer to the new type. If it is not, returns the current type. Acceptable types are:

  unbuffered
  line-buffered
  full-buffered

$P0.'buffer_size'(INTVAL size)

If size is given, sets the size of the buffer. If not, returns the size of the current buffer.

$P0.'mode'()

Returns the current file access mode.

$P0.'encoding'(STRING encoding)

Sets the filehandle's string encoding to encoding if given, returns the current encoding otherwise.

$P0.'eof'()

Returns true if the filehandle is at the end of the current file, false otherwise.

$P0.'get_fd'()

Returns the integer file descriptor of the current file, but only on operating systems that use file descriptors. Returns -1 on systems that do not support this.
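
The methods above can be combined without using any of the I/O opcodes. The following is a sketch only: it assumes the PMC type is registered under the name FileHandle, that 'readall'() accepts a file name on a handle that has not been opened (as described above), and it uses example.txt as a placeholder file name:

  .sub 'main' :main
      .local pmc writer, reader
      .local string contents

      writer = new 'FileHandle'       # assumed PMC type name
      writer.'open'("example.txt", "w")
      writer.'print'("hello\n")
      writer.'close'()

      reader = new 'FileHandle'
      contents = reader.'readall'("example.txt")   # opens, reads, closes
      print contents
  .end

Using a separate, never-opened handle for 'readall'() avoids any question about reusing a handle that was previously opened in write mode.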

Exceptions

Exceptions and Exception Handlers

Exceptions provide a way of calling a piece of code outside the normal flow of control. They are mainly used for error reporting or cleanup tasks, but sometimes exceptions are just a funny way to branch from one code location to another one. The design and implementation of exceptions in Parrot isn't complete yet, but this section will give you an idea where we're headed.

Exceptions are objects that hold all the information needed to handle the exception: the error message, the severity and type of the error, etc. The class of an exception object indicates the kind of exception it is.

Exception handlers are derived from continuations. They are ordinary subroutines that follow the Parrot calling conventions, but are never explicitly called from within user code. User code pushes an exception handler onto the control stack with the set_eh opcode. The system calls the installed exception handler only when an exception is thrown (perhaps because of code that does division by zero or attempts to retrieve a global that wasn't stored.)

    newsub P20, .ExceptionHandler, _handler
    set_eh P20                  # push handler on control stack
    null P10                    # set register to null
    get_global P10, "none"     # may throw exception
    clear_eh                    # pop the handler off the stack
    #...

  _handler:                     # if not, execution continues here
    is_null P10, not_found      # test P10
    #...

This example creates a new exception handler subroutine with the newsub opcode and installs it on the control stack with the set_eh opcode. It sets the P10 register to a null value (so it can be checked later) and attempts to retrieve the global variable named none. If the global variable is found, the next statement (clear_eh) pops the exception handler off the control stack and normal execution continues. If the get_global call doesn't find none it throws an exception by pushing an exception object onto the control stack. When Parrot sees that it has an exception, it pops it off the control stack and calls the exception handler _handler.

The first exception handler in the control stack sees every exception thrown. The handler has to examine the exception object and decide whether it can handle it (or discard it) or whether it should rethrow the exception to pass it along to an exception handler deeper in the stack. The rethrow opcode is only valid in exception handlers. It pushes the exception object back onto the control stack so Parrot knows to search for the next exception handler in the stack. The process continues until some exception handler deals with the exception and returns normally, or until there are no more exception handlers on the control stack. When the system finds no installed exception handlers it defaults to a final action, which normally means it prints an appropriate message and terminates the program.

When the system installs an exception handler, it creates a return continuation with a snapshot of the current interpreter context. If the exception handler just returns (that is, if the exception is cleanly caught) the return continuation restores the control stack back to its state when the exception handler was called, cleaning up the exception handler and any other changes that were made in the process of handling the exception.

Exceptions thrown by standard Parrot opcodes (like the one thrown by get_global above or by the throw opcode) are always resumable, so when the exception handler function returns normally it continues execution at the opcode immediately after the one that threw the exception. Other exceptions at the run-loop level are also generally resumable.

  new $P10, 'Exception'    # create new Exception object
  set $P10, 'I die'        # set message attribute
  throw $P10               # throw it

Exceptions are designed to work with the Parrot calling conventions. Since the return addresses of bsr subroutine calls and exception handlers are both pushed onto the control stack, it's generally a bad idea to combine the two.

Parrot includes a robust exception mechanism that is not only used internally to implement a variety of control flow constructs, but is also available for use directly from PIR code. Exceptions, in as few words as possible, are error conditions in the program. Exceptions are thrown when an error occurs, and they can be caught by special routines called handlers. This enables Parrot to recover from errors in a controlled way, instead of crashing and terminating the process entirely.

Exceptions, like most other data objects in Parrot, are PMCs. They contain and provide access to a number of different bits of data about the error, such as the location where the error was thrown (including complete backtraces), any annotation information from the file, and other data.

Throwing Exceptions

Many exceptions are used internally in Parrot to indicate error conditions. Opcodes such as die and warn throw exceptions internally to do what they are supposed to do. Other opcodes such as div throw exceptions only when an error occurs, such as an attempted division by zero.

Exceptions can also be thrown manually using the throw opcode. Here's an example:

  $P0 = new 'Exception'
  throw $P0

This throws the exception object as an error. If there are any available handlers in scope, the interpreter will pass the exception object to the handler and continue execution there. If there are no handlers available, Parrot will exit.

Exception Attributes

Since Exceptions are PMC objects, they can contain a number of useful data items. One such data item is the message:

  $P0 = new 'Exception'
  $P1 = new 'String'
  $P1 = "this is an error message for the exception"
  $P0["message"] = $P1

Others are the severity and the type:

  $P0["severity"] = 1   # An integer value
  $P0["type"] = 2       # Also an Integer

Finally, there is a spot for additional data to be included:

  $P0["payload"] = $P2  # Any arbitrary PMC

Exception Handlers

Exception handlers are labels in PIR code that can be jumped to when an exception is thrown. To list a label as an exception handler, the push_eh opcode is used. All handlers exist on a stack. Pushing a new handler adds it to the top of the stack, and using the pop_eh opcode pops the handler off the top of the stack.

  push_eh my_handler
    # something that might cause an error

  my_handler:
    # handle the error here

Catching Exceptions

The exception PMC that was thrown can be caught using the .get_results() directive. Used inside the handler, it retrieves the Exception PMC object that was thrown:

  my_handler:
    .local pmc err
    .get_results(err)

With the exception PMC available, the various attributes of that PMC can be accessed and analyzed for additional information about the error.
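
Putting the preceding pieces together, a complete throw-and-catch cycle might be sketched as follows. This assumes that the message can be read back with the same keyed syntax used to set it, and that the handler stays installed until pop_eh removes it; details may differ between Parrot versions:

  .sub 'main' :main
      push_eh my_handler              # install the handler
      $P0 = new 'Exception'
      $P1 = new 'String'
      $P1 = "something went wrong"
      $P0["message"] = $P1
      throw $P0                       # control jumps to my_handler
      pop_eh                          # only reached if nothing is thrown
      goto done

    my_handler:
      .local pmc err
      .local string msg
      .get_results(err)               # fetch the thrown Exception PMC
      pop_eh                          # remove the handler again
      msg = err["message"]            # assumed: keyed read mirrors keyed write
      print "caught: "
      say msg
    done:
  .end

The goto done on the normal path keeps execution from falling through into the handler code when no exception occurs.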

Exception Handler PMCs

Like all other interesting data types in Parrot, exception handlers are a PMC type. When using the syntax above with push_eh LABEL, the handler PMC is created internally by Parrot. However, you can create it explicitly too if you want:

  $P0 = new 'ExceptionHandler'
  set_addr $P0, my_handler
  push_eh $P0
  ...

  my_handler:
    ...

Rethrowing Exceptions

Exception handlers are nested and are stored in a stack. This is because not all handlers are intended to handle all exceptions. If a handler cannot deal with a particular exception, it can rethrow the exception to the next outer handler. If none of the installed handlers can handle the exception, the exception is a fatal error and Parrot will exit.
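
As a rough sketch of nested handlers, the following program installs one handler in main and another in a subroutine it calls. The inner handler decides it cannot deal with the exception and passes it outward with the rethrow opcode. The subroutine name inner is hypothetical, and the placement of pop_eh reflects an assumption that a handler remains on the stack while its code runs:

  .sub 'main' :main
      push_eh outer_handler
      'inner'()
      pop_eh
      say "no exception"
      goto done

    outer_handler:
      .local pmc err
      .get_results(err)
      pop_eh
      say "outer handler: handling the exception"
    done:
  .end

  .sub 'inner'                        # hypothetical subroutine name
      push_eh inner_handler
      $P0 = new 'Exception'
      throw $P0
      pop_eh                          # only reached if nothing is thrown
      .return ()

    inner_handler:
      .local pmc err
      .get_results(err)
      pop_eh
      say "inner handler: cannot handle this, rethrowing"
      rethrow err                     # pass it to the next outer handler
  .end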

Annotations

Annotations are pieces of metadata that can be stored in a bytecode file to give some information about what the original source code looked like. This is especially important when dealing with high-level languages. We'll go into detail about annotations and their use in Chapter 10.

Annotations are created using the .annotation keyword. Annotations consist of a key/value pair, where the key is a string and the value is an integer, a number, or a string. Since annotations are stored compactly as constants in the compiled bytecode, PMCs cannot be used.

  .annotation 'file', 'mysource.lang'
  .annotation 'line', 42
  .annotation 'compiletime', 0.3456

Annotations exist, or are "in force", throughout the entire subroutine or until they are redefined. Creating a new annotation with the same name as an old one overwrites it with the new value. The current hash of annotations can be retrieved with the annotations opcode:

  .annotation 'line', 1
  $P0 = annotations # {'line' => 1}
  .annotation 'line', 2
  $P0 = annotations # {'line' => 2}

Or, to retrieve a single annotation by name, you can write:

  $I0 = annotations 'line'

Annotations in Exceptions

Exception objects contain information about the annotations that were in force when the exception was thrown. These can be retrieved with the 'annotations'() method of the exception PMC object:

  $I0 = $P0.'annotations'('line')  # only the 'line' annotation
  $P1 = $P0.'annotations'()        # hash of all annotations

Exceptions can also give out a backtrace to try and follow where the program was exactly when the exception was thrown:

  $P1 = $P0.'backtrace'()

The backtrace PMC is an array of hashes. Each element in the array corresponds to a function in the current call stack. Each hash has two elements: 'annotation' which is the hash of annotations that were in effect at that point, and 'sub' which is the Sub PMC of that function.

Events

An event is a notification that something has happened: a timer expired, an IO operation finished, a thread sent a message to another thread, or the user pressed Ctrl-C to interrupt program execution.

What all of these events have in common is that they arrive asynchronously. It's generally not safe to interrupt program flow at an arbitrary point and continue at a different position, so the event is placed in the interpreter's task queue. The run loop code regularly checks whether an event needs to be handled. An event handler may be an internal piece of code or a user-defined subroutine.

Events are still experimental in Parrot, so the implementation and design are subject to change.
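
The queueing idea can be modelled in plain PIR without any event API. The sketch below is only an analogy, not Parrot's event system: a ResizablePMCArray stands in for the interpreter's task queue, a counted loop stands in for the run loop, and the "event" is a string pushed at a fixed step. All variable and label names are invented for the illustration.

  .sub 'main' :main
      .local pmc queue, event
      .local int step

      queue = new 'ResizablePMCArray'   # stand-in for the task queue
      step = 0
    tick:
      if step >= 4 goto finished

      print "running op "               # ordinary program work
      say step

      if step != 1 goto check           # pretend an event arrives during op 1
      event = box "timer expired"
      push queue, event

    check:                              # safe point between ops
      $I0 = elements queue
      if $I0 == 0 goto advance
      event = shift queue
      print "handling event: "
      say event

    advance:
      inc step
      goto tick
    finished:
      .return ()
  .end

The point of the model is that the pending event is handled only at the check between operations, never in the middle of one.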

Timers

Timer objects are the replacement for Perl 5's alarm handlers. They are also a significant improvement. Timers can fire once or repeatedly, and multiple timers can run independently. The precision of a timer is limited by the OS Parrot runs on, but it is always more fine-grained than a whole second. The final syntax isn't yet fixed, so please consult the documentation for examples.

Signals

Signal handling is related to events. When Parrot gets a signal it needs to handle from the OS, it converts that signal into an event and broadcasts it to all running threads. Each thread independently decides if it's interested in this signal and, if so, how to respond to it.

    newsub P20, .ExceptionHandler, _handler
    set_eh P20                  # establish signal handler
    print "send SIGINT:\n"
    sleep 2                     # press ^C once the prompt appears
    print "no SIGINT\n"
    end
  _handler:
    .include "signal.pasm"      # get signal definitions
    print "caught "
    set I0, P5["type"]          # if type is negative, the ...
    neg I0, I0                  # ... negated type is the signal
    ne I0, .SIGINT, nok
    print "SIGINT\n"
  nok:
    end

This example creates a signal handler and pushes it onto the control stack. It then prompts the user to send a SIGINT from the shell (usually Ctrl-C, though this varies between shells) and waits for 2 seconds. If the user doesn't send a SIGINT within 2 seconds, the example prints "no SIGINT" and ends. If the user does send a SIGINT, the signal handler catches it, prints "caught SIGINT", and ends. Currently, only Linux installs a SIGINT sigaction handler, so this example won't work on other platforms.

Threads

Threads allow multiple pieces of code to run in parallel. This is useful when you have multiple physical CPUs to share the load of running individual threads. With a single processor, threads still provide the feeling of parallelism, but without any improvement in execution time. Even worse, sometimes using threads on a single processor will actually slow down your program.

Still, many algorithms can be expressed more easily in terms of parallel running pieces of code and many applications profit from taking advantage of multiple CPUs. Threads can vastly simplify asynchronous programs like internet servers: a thread splits off, waits for some IO to happen, handles it, and relinquishes the processor again when it's done.

Parrot compiles in thread support by default (at least, if the platform provides some kind of support for it). Unlike Perl 5, compiling with threading support doesn't impose any execution time penalty for a non-threaded program. Like exceptions and events, threads are still under development, so you can expect significant changes in the near future.

As outlined in the previous chapter, Parrot implements three different threading models. (Note: As of version 1.0, the TQueue PMC will be deprecated, rendering the following discussion obsolete.) The following example uses the third model, which takes advantage of shared data. It uses a TQueue (thread-safe queue) object to synchronize the two threads running in parallel. This is only a simple example to illustrate threads, not a typical use of them (no one really wants to spawn two threads just to print out a simple string).

    get_global P5, "_th1"              # locate thread function
    new P2, "ParrotThread"              # create a new thread
    find_method P0, P2, "thread3"       # a shared thread's entry
    new P7, "TQueue"                    # create a Queue object
    new P8, "Int"                       # and an Int
    push P7, P8                         # push the Int onto queue
    new P6, "String"                    # create new string
    set P6, "Js nte artHce\n"
    set I3, 3                           # thread function gets 3 args
    invoke                              # _th1.run(P5,P6,P7)
    new P2, "ParrotThread"              # same for a second thread
    get_global P5, "_th2"
    set P6, "utaohrPro akr"             # set string to 2nd thread's
    invoke                              # ... data, run 2nd thread too
    end                                 # Parrot joins both

  .pcc_sub _th1:                        # 1st thread function
  w1: sleep 0.001                       # wait a bit and schedule
    defined I1, P7                      # check if queue entry is ...
    unless I1, w1                       # ... defined, yes: it's ours
    set S5, P6                          # get string param
    substr S0, S5, I0, 1                # extract next char
    print S0                            # and print it
    inc I0                              # increment char pointer
    shift P8, P7                        # pull item off from queue
    if S0, w1                           # then wait again, if todo
    invoke P1                           # done with string

  .pcc_sub _th2:                        # 2nd thread function
  w2: sleep 0.001
    defined I1, P7                      # if queue entry is defined
    if I1, w2                           # then wait
    set S5, P6
    substr S0, S5, I0, 1                # if not print next char
    print S0
    inc I0
    new P8, "Int"                       # and put a defined entry
    push P7, P8                         # onto the queue so that
    if S0, w2                           # the other thread will run
    invoke P1                           # done with string

This example creates a ParrotThread object and calls its thread3 method, passing three arguments: a PMC for the _th1 subroutine in P5, a string argument in P6, and a TQueue object in P7 containing a single integer. Remember from the earlier section "Parrot calling conventions" that registers 5-15 hold the arguments for a subroutine or method call and I3 stores the number of arguments. The thread object is passed in P2.

This call to the thread3 method spawns a new thread to run the _th1 subroutine. The main body of the code then creates a second ParrotThread object in P2, stores a different subroutine in P5, sets P6 to a new string value, and then calls the thread3 method again, passing it the same TQueue object as the first thread. This method call spawns a second thread. The main body of code then ends, leaving the two threads to do the work.

At this point the two threads have already started running. The first thread (_th1) starts off by sleeping for a thousandth of a second. It then checks if the TQueue object contains a value. Since it contains a value when the thread is first called, it goes ahead and runs the body of the subroutine. The body pulls one character off a copy of the string parameter using substr, prints the character, and increments the current position (I0) in the string. It then shifts the element off the TQueue, loops back to the w1 label, and sleeps. Since the queue doesn't have any elements now, the subroutine keeps sleeping.

Meanwhile, the second thread (_th2) also starts off by sleeping for a thousandth of a second. It checks if the shared TQueue object contains a defined value but, unlike the first thread, it only continues sleeping if the queue does contain a value. Since the queue contains a value when the second thread is first called, the subroutine loops back to the w2 label and continues sleeping. It keeps sleeping until the first thread shifts the integer off the queue, then runs the body of the subroutine. The body pulls one character off a copy of the string parameter using substr, prints the character, and increments the current position in the string. It then creates a new Int, pushes it onto the shared queue, and loops back to the w2 label again to sleep. The queue has an element now, so the second thread keeps sleeping, but the first thread runs through its loop again.

The two threads alternate like this, printing a character and marking the queue so the next thread can run, until there are no more characters in either string. At the end, each subroutine invokes the return continuation in P1 which terminates the thread. The interpreter waits for all threads to terminate in the cleanup phase after the end in the main body of code.

The final printed result (as you might have guessed) is:

  Just another Parrot Hacker
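
To see why the two strings combine into that line, the alternation can be reproduced without threads. This single-threaded sketch is not how the threaded example works internally. It only takes one character from each string in turn, the way the queue forces the two threads to take turns. The variable and label names are invented for the illustration.

  .sub 'main' :main
      .local string first, second, char
      .local int pos, len1, len2

      first  = "Js nte artHce\n"        # _th1's string
      second = "utaohrPro akr"          # _th2's string
      len1 = length first
      len2 = length second
      pos = 0
    turn:
      if pos >= len1 goto finished
      char = substr first, pos, 1       # _th1's turn
      print char
      if pos >= len2 goto advance
      char = substr second, pos, 1      # _th2's turn
      print char
    advance:
      inc pos
      goto turn
    finished:
      .return ()
  .end

The first string is one character longer than the second, so the first string's trailing newline is the last character printed.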

The syntax for threads isn't carved in stone and the implementation still isn't finished, but as this example shows, threads are working now and are already useful.

Several methods are useful when working with threads. The join method belongs to the ParrotThread class. When it's called on a ParrotThread object, the calling code waits until the thread terminates.

    new $P2, "ParrotThread"       # create a new thread
    set $I5, $P2                  # get thread ID

    find_method $P0, $P2, "join"  # get the join method...
    invoke                        # ...and join (wait for) the thread
    set $P16, $P5                 # the return result of the thread

The kill and detach methods belong to the interpreter, so you have to grab the current interpreter object before you can look up the method object.

    set $I5, $P2                  # get thread ID of thread P2
    getinterp $P3                 # get this interpreter object
    find_method $P0, $P3, "kill"  # get kill method
    invoke                        # kill thread with ID I5

    find_method $P0, $P3, "detach"
    invoke                        # detach thread with ID I5

By the time you read this, some of these combinations of statements and much of the threading syntax above may be reduced to a simpler set of opcodes.

Loading Bytecode

In addition to running Parrot bytecode on the command line, you can also load pre-compiled bytecode directly into your PIR source file. The load_bytecode opcode takes a single argument: the name of the bytecode file to load. So, if you create a file named file.pir containing a single subroutine:

  # file.pir
  .sub '_sub2'                  # .sub stores a global sub
      print "in sub2\n"
  .end

and compile it to bytecode using the -o command-line switch:

  $ parrot -o file.pbc file.pir

You can then load the compiled bytecode into main.pir and directly call the subroutine defined in file.pir:

  # main.pir
  .sub 'main' :main
      load_bytecode "file.pbc"    # compiled file.pir
      $P0 = get_global "_sub2"    # look up the loaded sub by name
      $P0()                       # and call it
  .end

The load_bytecode opcode also works with source files, as long as Parrot has a compiler registered for that type of file:

  # main2.pir
  .sub 'main' :main
      load_bytecode "file.pir"    # PIR source code
      $P0 = get_global "_sub2"
      $P0()
  .end

Subroutines marked with :load run as soon as they're loaded (before load_bytecode returns), rather than waiting to be called. A subroutine marked with :main will always run first, no matter what name you give it or where you define it in the file.

  # file3.pir
  .sub '_entry' :load           # mark the sub to be run at load time
      print "file3\n"
  .end

  # main3.pir
  .sub 'first'                  # first is never invoked
      print "never\n"
  .end

  .sub 'main' :main             # because main is marked as the
      print "main\n"            # MAIN entry of program execution
      load_bytecode "file3.pir"
      print "back\n"
  .end

This example uses both :load and :main. Because the main subroutine is defined with :main, it executes first even though another subroutine comes before it in the file. main prints a line, loads the PIR source file, and then prints another line. Because _entry in file3.pir is marked with :load, it runs before load_bytecode returns, so the final output is:

  main
  file3
  back