Subroutines

Subroutines in PIR are roughly equivalent to the subroutines or methods of a high-level language. They're the most basic building block of code reuse in PIR. Each high-level language has different syntax and semantics for defining and calling subroutines, so Parrot's subroutines need to be flexible enough to handle a broad array of behaviors.

A subroutine declaration starts with the .sub directive and ends with the .end directive. This example defines a subroutine named hello that prints a string "Hello, Polly.":

  .sub 'hello'
      say "Hello, Polly."
  .end

The quotes around the subroutine name are optional as long as the name of the subroutine uses only plain alphanumeric ASCII characters. You must use quotes if the subroutine name uses Unicode characters, characters from some other character set or encoding, or is otherwise an invalid PIR identifier.

A subroutine call consists of the name of the subroutine to call followed by a list of (zero or more) arguments in parentheses. You may precede the call with a list of (zero or more) return values. This example calls the subroutine fact with two arguments and assigns the result to $I0:

  $I0 = 'fact'(count, product)

Modifiers

A modifier is an annotation to a basic subroutine declaration or parameter declaration that selects an optional feature. Modifiers all start with a colon (:). A subroutine can have multiple modifiers.

When you execute a PIR file as a program, Parrot normally runs the first subroutine it encounters, but you can mark any subroutine as the first one to run with the :main modifier:

  .sub 'first'
      say "Polly want a cracker?"
  .end

  .sub 'second' :main
      say "Hello, Polly."
  .end

This code prints "Hello, Polly." but not "Polly want a cracker?". The first subroutine is first in the source code, but second has the :main modifier. Parrot will never call first in this program. If you remove the :main modifier, the code will print "Polly want a cracker?" instead.

The :load modifier tells Parrot to run the subroutine when it loads the current file as a library. The :init modifier tells Parrot to run the subroutine only when it executes the file as a program (and not as a library). The :immediate modifier tells Parrot to run the subroutine as soon as it gets compiled. The :postcomp modifier also runs the subroutine right after compilation, but only if the subroutine was declared in the main program file (when not loaded as a library).

By default, Parrot stores all subroutines in the namespace currently active at the point of their declaration. The :anon modifier tells Parrot not to store the subroutine in the namespace. The :nsentry modifier stores the subroutine in the currently active namespace with a different name. For example, Parrot will store this subroutine in the current namespace as bar, not foo:

  .sub 'foo' :nsentry('bar')
    #...
  .end
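
As a minimal sketch of the effect, the following program looks up the namespace entry by its :nsentry name and invokes it. The message text is illustrative, and the sketch assumes both subroutines live in the same (root) namespace:

  .sub 'main' :main
      $P0 = get_global 'bar'     # the sub declared as 'foo' is stored as 'bar'
      $P0()
  .end

  .sub 'foo' :nsentry('bar')
      say "found under the name bar"
  .end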

Chapter 7 on "Classes and Objects" explains other subroutine modifiers.

Parameters and Arguments

The .param directive defines the parameters for the subroutine and creates local named variables for them (similar to .local):

  .param int c

The .return directive returns control flow to the calling subroutine. To return results, pass them as arguments to .return.

  .return($P0)

This example implements the factorial algorithm using two subroutines, main and fact:

  # factorial.pir
  .sub 'main' :main
     .local int count
     .local int product
     count   = 5
     product = 1

     $I0 = 'fact'(count, product)

     say $I0
  .end

  .sub 'fact'
     .param int c
     .param int p

  loop:
     if c <= 1 goto fin
     p = c * p
     dec c
     branch loop
  fin:
     .return (p)
  .end

This example defines two local named variables, count and product, and assigns them the values 5 and 1, respectively. It calls the fact subroutine with both variables as arguments. The fact subroutine uses the .param directive to retrieve these parameters and the .return directive to return the result. The final printed result is 120.

Positional Parameters

The default way of matching the arguments passed in a subroutine call to the parameters defined in the subroutine's declaration is by position. If you declare three parameters -- an integer, a number, and a string:

  .sub 'foo'
    .param int a
    .param num b
    .param string c
    # ...
  .end

... then calls to this subroutine must also pass three arguments -- an integer, a number, and a string:

  'foo'(32, 5.9, "bar")

Parrot will assign each argument to the corresponding parameter in order from first to last. Changing the order of the arguments or leaving one out is an error.

Named Parameters

Named parameters are an alternative to positional parameters. Instead of matching arguments to parameters by their position in the argument list, Parrot assigns arguments to parameters by their name. Consequently you may pass named arguments in any order. Declare named parameters with the :named modifier.

This example declares two named parameters in the subroutine shoutout -- name and years -- each declared with the :named modifier and followed by the name to use when passing arguments. The string name can match the parameter name (as with the name parameter), but it can also be different (as with the years parameter):

 .sub 'shoutout'
    .param string name :named("name")
    .param string years :named("age")
    $S0  = "Hello " . name
    $S1  = "You are " . years
    $S1 .= " years old"
    say $S0
    say $S1
 .end

Pass named arguments to a subroutine as a series of name/value pairs, with the elements of each pair separated by an arrow =>.

 .sub 'main' :main
    'shoutout'("age" => 42, "name" => "Bob")
 .end

The order of the arguments does not matter:

 .sub 'main' :main
    'shoutout'("name" => "Bob", "age" => 42)
 .end

Optional Parameters

Another alternative to the required positional parameters is optional parameters. Some parameters are unnecessary for certain calls. Parameters marked with the :optional modifier do not produce errors about invalid parameter counts if they are not present. A subroutine with optional parameters should gracefully handle the missing argument, either by providing a default value or by performing an alternate action that doesn't need that value.

Checking the value of the optional parameter isn't enough to know whether the call passed such an argument, because the user might have passed a null or false value intentionally. PIR therefore provides an :opt_flag modifier, which gives a boolean check of whether the caller passed an argument:

  .param string name     :optional
  .param int    has_name :opt_flag

When an integer parameter with the :opt_flag modifier immediately follows an :optional parameter, it will be true if the caller passed the argument and false otherwise.

This example demonstrates how to provide a default value for an optional parameter:

    .param string name     :optional
    .param int    has_name :opt_flag

    if has_name goto we_have_a_name
    name = "default value"
  we_have_a_name:

When the has_name parameter is true, the if control statement jumps to the we_have_a_name label, leaving the name parameter unmodified. When has_name is false (when the caller passed no argument for name) the if statement does nothing. The next line sets the name parameter to a default value.

The :opt_flag parameter never takes an argument from the passed-in argument list. It's purely for bookkeeping within the subroutine.
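
Put together, a complete program using this pattern might look like the following sketch. The subroutine name greet, the label greet_now, and the default string are hypothetical, chosen only for illustration:

  .sub 'greet'
      .param string name     :optional
      .param int    has_name :opt_flag

      if has_name goto greet_now     # caller supplied a name
      name = "stranger"              # otherwise fall back to a default
    greet_now:
      $S0 = "Hello, " . name
      say $S0
  .end

  .sub 'main' :main
      'greet'("Polly")               # prints "Hello, Polly"
      'greet'()                      # prints "Hello, stranger"
  .end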

Optional parameters can be positional or named parameters. Optional parameters must appear at the end of the list of positional parameters after all the required parameters. An optional parameter must immediately precede its :opt_flag parameter whether it's named or positional:

  .sub 'question'
    .param int value     :named("answer") :optional
    .param int has_value :opt_flag
    #...
  .end

You can call this subroutine with a named argument or with no argument:

  'question'("answer" => 42)
  'question'()

Aggregating Parameters

Another alternative to a sequence of positional parameters is an aggregating parameter which bundles a list of arguments into a single parameter. The :slurpy modifier creates a single array parameter containing all the provided arguments:

  .param pmc args :slurpy
  $P0 = args[0]           # first argument
  $P1 = args[1]           # second argument

Because an aggregating parameter consumes all remaining arguments, you may combine it with other positional parameters only if it comes after all of them:

  .param string first
  .param int second
  .param pmc the_rest :slurpy

  $P0 = the_rest[0]           # third argument
  $P1 = the_rest[1]           # fourth argument
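
The following sketch shows one way to walk a slurpy array of unknown length. It assumes the caller passes only integers; the names sum_all, nums, and total are hypothetical:

  .sub 'sum_all'
      .param pmc nums :slurpy        # every positional argument lands here
      .local int total, i, n
      total = 0
      i     = 0
      n     = elements nums          # number of arguments received
    loop:
      if i >= n goto done
      $I0 = nums[i]
      total += $I0
      inc i
      goto loop
    done:
      .return (total)
  .end

  .sub 'main' :main
      $I0 = 'sum_all'(1, 2, 3, 4)
      say $I0                        # prints 10
  .end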

When you combine :named and :slurpy on a parameter, the result is a single associative array containing the named arguments passed into the subroutine call:

  .param pmc all_named :slurpy :named

  $P0 = all_named['name']             # 'name' => 'Bob'
  $P1 = all_named['age']              # 'age'  => 42

Flattening Arguments

A flattening argument breaks up a single argument to fill multiple parameters. It's the complement of an aggregating parameter. The :flat modifier splits arguments (and return values) into a flattened list. Passing an array PMC to a subroutine with :flat:

  $P0 = new "ResizablePMCArray"
  $P0[0] = "Bob"
  $P0[1] = 42
  'foo'($P0 :flat)

... allows the elements of that array to fill the required parameters:

  .param string name  # Bob
  .param int age      # 42
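
Here is the same idea as a complete program, offered as a sketch rather than a canonical example. The subroutine name describe is hypothetical:

  .sub 'describe'
      .param string name
      .param int    age
      print name
      print " is "
      say age
  .end

  .sub 'main' :main
      $P0 = new "ResizablePMCArray"
      $P0[0] = "Bob"
      $P0[1] = 42
      'describe'($P0 :flat)          # prints "Bob is 42"
  .end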

Arguments on the Command Line

Arguments passed to a PIR program on the command line are available to the :main subroutine of that program as strings in a ResizableStringArray PMC. If you call a program args.pir, passing it three arguments:

  $ parrot args.pir foo bar baz

... they will be accessible at index 1, 2, and 3 of the PMC parameter. Index 0 is unused.

  .sub 'main' :main
    .param pmc all_args
    $S1 = all_args[1]   # foo
    $S2 = all_args[2]   # bar
    $S3 = all_args[3]   # baz
    # ...
  .end

Because all_args is a ResizableStringArray PMC, you can loop over the results, access them individually, or even modify them.
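
As a sketch of such a loop, the following program prints each command-line argument on its own line, starting at index 1. The label and variable names are illustrative only:

  .sub 'main' :main
      .param pmc all_args
      .local int i, n
      n = elements all_args          # total number of entries
      i = 1                          # skip index 0
    loop:
      if i >= n goto done
      $S0 = all_args[i]
      say $S0
      inc i
      goto loop
    done:
      .return ()
  .end

Run as parrot args.pir foo bar baz, this prints foo, bar, and baz in order.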

Compiling and Loading Libraries

In addition to running PIR files on the command-line, you can also load a library of pre-compiled bytecode directly into your PIR source file. The load_bytecode opcode takes a single argument: the name of the bytecode file to load. If you create a file named foo_file.pir containing a single subroutine:

  # foo_file.pir
  .sub 'foo_sub'              # .sub stores a global sub
     say "in foo_sub"
  .end

... and compile it to bytecode using the -o command-line switch:

  $ parrot -o foo_file.pbc foo_file.pir

... you can then load the compiled bytecode into main.pir and directly call the subroutine defined in foo_file.pir:

  # main.pir
  .sub 'main' :main
    load_bytecode "foo_file.pbc"    # compiled foo_file.pir
    foo_sub()
  .end

The load_bytecode opcode also works with source files, as long as Parrot has a compiler registered for that type of file:

  # main2.pir
  .sub 'main' :main
    load_bytecode "foo_file.pir"  # PIR source code
    foo_sub()
  .end

Sub PMC

Subroutines are a PMC type in Parrot. You can store them in PMC registers and manipulate them just as you do with other PMCs. Parrot stores subroutines in namespaces; retrieve them with the get_global opcode:

  $P0 = get_global "my_sub"

To find a subroutine in a different namespace, first look up the appropriate namespace object, then use that as the first parameter to get_global:

  $P0 = get_namespace ["My";"Namespace"]
  $P1 = get_global $P0, "my_sub"

You can invoke a Sub object directly:

  $P0(1, 2, 3)

You can get or even change its name:

  $S0 = $P0               # Get the current name
  $P0 = "my_new_sub"      # Set a new name

You can get a hash of the complete metadata for the subroutine:

  $P1 = inspect $P0

Instead of fetching the entire inspection hash, you can also request individual pieces of metadata by name:

  $P1 = inspect $P0, "pos_required"

The arity method on the sub object returns the total number of defined parameters of all varieties:

  $I0 = $P0.'arity'()

The get_namespace method on the sub object fetches the namespace PMC which contains the Sub:

  $P1 = $P0.'get_namespace'()
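
The following sketch combines a few of these operations in one program. The names greet and greeter are hypothetical. It fetches a Sub PMC from the current namespace, invokes it, and then reads its name as described above:

  .sub 'main' :main
      .local pmc greeter
      greeter = get_global "greet"   # fetch the Sub PMC from the namespace
      greeter("Polly")               # invoke it like any other subroutine
      $S0 = greeter                  # the string value of a Sub is its name
      say $S0
  .end

  .sub 'greet'
      .param string who
      $S0 = "Hello, " . who
      say $S0
  .end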

Evaluating a Code String

One way of producing a code object during a running program is by compiling a code string. In this case, it's a bytecode segment object.

The first step is to fetch a compiler object for the target language:

  $P1 = compreg "PIR"

Parrot registers a compiler for PIR by default, so it's always available. The following example fetches a compiler object for PIR and places it in the named variable compiler. It generates a code object from a string by calling compiler as a subroutine, places the resulting bytecode segment object in the named variable generated, and then invokes generated as a subroutine:

  .local pmc compiler, generated
  .local string source
  source    = ".sub foo\n$S1 = 'in eval'\nprint $S1\n.end"
  compiler  = compreg "PIR"                
  generated = compiler(source)
  generated()
  say "back again"

You can register a compiler or assembler for any language inside the Parrot core and use it to compile and invoke code from that language.

In the following example, the compreg opcode registers the subroutine-like object $P10 as a compiler for the language "MyLanguage":

  compreg "MyLanguage", $P10

Lexicals

Variables stored in a namespace are global variables. They're accessible from anywhere in the program if you specify the right namespace path. High-level languages also have lexical variables which are only accessible from the local section of code (or scope) where they appear, or in a section of code embedded within that scope. A scope is roughly equivalent to a block in C. In PIR, the section of code between a .sub and a .end defines a scope for lexical variables.

While Parrot stores global variables in namespaces, it stores lexical variables in lexical pads. Think of a pad as a box that holds a collection of lexical variables. Each lexical scope has its own pad. The store_lex opcode stores a lexical variable in the current pad. The find_lex opcode retrieves a variable from the current pad:

  $P0 = new "Integer"       # create a variable
  $P0 = 10                  # assign value to it
  store_lex "foo", $P0      # store with lexical name "foo"
  # ...
  $P1 = find_lex "foo"      # get the lexical "foo" into $P1
  say $P1                   # prints 10

The .lex directive defines a local variable that follows these scoping rules:

      .local pmc foo
      .lex 'foo', foo

LexPad and LexInfo PMCs

Parrot uses two different PMCs to store information about a subroutine's lexical variables: the LexPad PMC and the LexInfo PMC. Neither of these PMC types is usable directly from PIR code; Parrot uses them internally to store information about lexical variables.

LexInfo PMCs store information about lexical variables at compile time. Parrot generates this read-only information during compilation to represent what it knows about lexical variables. Not all subroutines get a LexInfo PMC by default; subroutines need to indicate to Parrot that they require a LexInfo PMC. One way to do this is with the .lex directive. Of course, the .lex directive only works for languages that know the names of their lexical variables at compile time. Languages where this information is not available can mark the subroutine with :lex instead.

LexPad PMCs store run-time information about lexical variables. This includes their current values and type information. Parrot creates a new LexPad PMC for subs that have a LexInfo PMC already. It does so for each invocation of the subroutine, which allows for recursive subroutine calls without overwriting lexical variables.

The get_lexinfo method on a sub retrieves its associated LexInfo PMC:

  $P0 = get_global "MySubroutine"
  $P1 = $P0.'get_lexinfo'()

The LexInfo PMC supports a few introspection operations. The elements opcode retrieves the number of elements it contains. String key access operations retrieve entries from the LexInfo PMC as if it were an associative array.

  $I0 = elements $P1    # number of lexical variables
  $P0 = $P1["name"]     # lexical variable "name"

There is no easy way to retrieve the current LexPad PMC in a given subroutine, but LexPad PMCs are of limited use in PIR.

Nested Scopes

PIR has no separate syntax for blocks or lexical scopes; subroutines define lexical scopes in PIR. Because PIR disallows nested .sub/.end declarations, it needs a way to identify which lexical scopes are the parents of inner lexical scopes. The :outer modifier declares a subroutine as a nested inner lexical scope of another existing subroutine. The modifier takes one argument, the name of the outer subroutine:

  .sub 'foo'
    # defines lexical variables
  .end

  .sub 'bar' :outer('foo')
    # can access foo's lexical variables
  .end

Sometimes a name alone isn't sufficient to uniquely identify the outer subroutine. The :subid modifier allows the outer subroutine to declare a truly unique name usable with :outer:

  .sub 'foo' :subid('barsouter')
    # defines lexical variables
  .end

  .sub 'bar' :outer('barsouter')
    # can access foo's lexical variables
  .end

The get_outer method on a Sub PMC retrieves its :outer sub.

  $P1 = $P0.'get_outer'()

If there is no :outer sub, this will return a null PMC. The set_outer method on a Sub object sets the :outer sub:

  $P0.'set_outer'($P1)

Scope and Visibility

High-level languages such as Perl, Python, and Ruby allow nested scopes, or blocks within blocks that have their own lexical variables. This construct is common even in C:

  {
      int x = 0;
      int y = 1;
      {
          int z = 2;
          /* x, y, and z are all visible here */
      }

      /* only x and y are visible here */
  }

In the inner block, all three variables are visible. The variable z is only visible inside that block. The outer block has no knowledge of z. A naïve translation of this code to PIR might be:

  .local int x
  .local int y
  .local int z
  x = 0
  y = 1
  z = 2
  #...

This PIR code is similar, but the handling of the variable z is different: z is visible throughout the entire current subroutine. It was not visible throughout the entire C function. A more accurate translation of the C scopes uses :outer PIR subroutines instead:

  .sub 'MyOuter'
      .local pmc x, y
      .lex 'x', x
      .lex 'y', y
      x = new 'Integer'
      x = 10
      'MyInner'()
      # only x and y are visible here
      say y                             # prints 20
  .end

  .sub 'MyInner' :outer('MyOuter')
      .local pmc x, new_y, z
      .lex 'z', z
      find_lex x, 'x'
      say x                            # prints 10
      new_y = new 'Integer'
      new_y = 20
      store_lex 'y', new_y
  .end

The find_lex and store_lex opcodes don't just access the value of a variable directly in the scope where it's declared; they interact with the LexPad PMC to find lexical variables within outer lexical scopes. All lexical variables from an outer lexical scope are visible from the inner lexical scope.

Note that you can only store PMCs -- not primitive types -- as lexicals.

Multiple Dispatch

Multiple dispatch subroutines (or multis) have several variants with the same name but different sets of parameters. The set of parameters for a subroutine is its signature. When a multi is called, the dispatch operation compares the arguments passed in to the signatures of all the variants and invokes the subroutine with the best match.

Parrot stores all multiple dispatch subs with the same name in a namespace within a single PMC called a MultiSub. The MultiSub is an invokable list of subroutines. When a multiple dispatch sub is called, the MultiSub PMC searches its list of variants for the best matching candidate.

The :multi modifier on a .sub declares a MultiSub:

  .sub 'MyMulti' :multi()
      # does whatever a MyMulti does
  .end

Each variant in a MultiSub must have a unique type or number of parameters declared, so the dispatcher can calculate a best match. If you had two variants that both took four integer parameters, the dispatcher would never be able to decide which one to call when it received four integer arguments.

The :multi modifier takes one or more arguments defining the multi signature. The multi signature tells Parrot what particular combination of input parameters the multi accepts:

  .sub 'Add' :multi(I, I)
    .param int x
    .param int y
    $I0 = x + y
    .return($I0)
  .end

  .sub 'Add' :multi(N, N)
    .param num x
    .param num y
    $N0 = x + y
    .return($N0)
  .end

  .sub 'Start' :main
    $I0 = Add(1, 2)      # 3
    $N0 = Add(3.14, 2.0) # 5.14
    $S0 = Add("a", "b")  # ERROR! No (S, S) variant!
  .end

Multis can take I, N, S, and P types, but they can also use _ (underscore) to denote a wildcard, and a string which names a PMC type:

  .sub 'Add' :multi(I, I)          # two integers
    #...
  .end

  .sub 'Add' :multi(I, 'Float')    # integer and Float PMC
    #...
  .end

  .sub 'Add' :multi('Integer', _)  # Integer PMC and wildcard
    #...
  .end

When you call a MultiSub, Parrot will try to take the most specific best-match variant, but will fall back to more general variants if it cannot find a perfect match. If you call Add with (1, 2), Parrot will dispatch to the (I, I) variant. If you call it with (1, "hi"), Parrot will match the ('Integer', _) variant, as the string in the second argument doesn't match I or Float; to do so, it promotes the integer 1 to an Integer PMC. Parrot can promote any of the I, N, or S values to an Integer, Float, or String PMC in this way.

To make the decision about which multi variant to call, Parrot calculates the Manhattan Distance between the argument signature and the parameter signature of each variant. Every difference between each element counts as one step. A difference can be a promotion from a primitive type to a PMC, the conversion from one primitive type to another, or the matching of an argument to a _ wildcard. After Parrot calculates the distance to each variant, it calls the one with the lowest distance. Notice that it's possible to define a variant that is impossible to call: for every potential combination of arguments there is a better match. This is uncommon, but possible in systems with many multis and a limited number of data types.

Continuations

Continuations are subroutines that take snapshots of control flow. They are frozen images of the current execution state of the VM. Once you have a continuation, you can invoke it to return to the point where the continuation was first created. It's like a magical timewarp that allows the developer to arbitrarily move control flow back to any previous point in the program.

Continuations are like any other PMC; create one with the new opcode:

  $P0 = new 'Continuation'

The new continuation starts in an undefined state. If you attempt to invoke a new continuation without initializing it, Parrot will throw an exception. To prepare the continuation for use, assign it a destination label with the set_addr opcode:

    $P0 = new 'Continuation'
    set_addr $P0, my_label

  my_label:
    # ...

To jump to the continuation's stored label and return the context to the state it was in at the point of its creation, invoke the continuation:

  $P0()

Even though you can use the subroutine call notation $P0() to invoke the continuation, you cannot pass arguments or obtain return values.

Continuation Passing Style

Parrot uses continuations internally for control flow. When Parrot invokes a subroutine, it creates a continuation representing the current point in the program. It passes this continuation as an invisible parameter to the subroutine call. To return from that subroutine, Parrot invokes the continuation to return to the point of creation of that continuation. If you have a continuation, you can invoke it to return to its point of creation any time you want.

This type of flow control -- invoking continuations instead of performing bare jumps -- is called Continuation Passing Style (CPS).

Tailcalls

Many subroutines set up and call another subroutine and then return the result of the second call directly. This is a tailcall, and is an important opportunity for optimization. Here's a contrived example in pseudocode:

  call add_two(5)

  subroutine add_two(value)
    value = add_one(value)
    return add_one(value)

In this example, the subroutine add_two makes two calls to add_one. The result of the second call is the return value of add_two: add_one gets called, and its result goes straight back to the caller of add_two. Nothing in add_two uses that return value directly.

A simple optimization is available for this type of code. The second call to add_one can return to the same place that add_two returns; it's perfectly safe and correct to use the same return continuation that add_two uses. The two subroutine calls can share a return continuation.

PIR provides the .tailcall directive to identify similar situations. Use it in place of the .return directive. .tailcall performs this optimization by reusing the return continuation of the parent subroutine to make the tailcall:

  .sub 'main' :main
      .local int value
      value = add_two(5)
      say value
  .end

  .sub 'add_two'
      .param int value
      .local int val2
      val2 = add_one(value)
      .tailcall add_one(val2)
  .end

  .sub 'add_one'
      .param int a
      .local int b
      b = a + 1
      .return (b)
  .end

This example prints the correct value 7.

Coroutines

Coroutines are similar to subroutines except that they have an internal notion of state. In addition to performing a normal .return to return control flow back to the caller and destroy the execution environment of the subroutine, coroutines may also perform a .yield operation. .yield returns a value to the caller like .return can, but it does not destroy the execution state of the coroutine. The next call to the coroutine continues execution from the point of the last .yield, not at the beginning of the coroutine.

Inside a coroutine continuing from a .yield, the entire execution environment is the same as it was when the coroutine .yielded. This means that the parameter values don't change, even if the next invocation of the coroutine had different arguments passed in.

Coroutines look like ordinary subroutines. They do not require any special modifier or any special syntax to mark them as being a coroutine. What sets them apart is the use of the .yield directive.

Here is a simple coroutine example:

  .sub 'MyCoro'
    .yield(1)
    .yield(2)
    .yield(3)
    .return(4)
  .end

  .sub 'main' :main
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
  .end

This contrived example demonstrates how the coroutine stores its state. When Parrot encounters the .yield, the coroutine stores its current execution environment. At the next call to the coroutine, it picks up where it left off.

Native Call Interface

The Native Call Interface (NCI) is a special version of the Parrot calling conventions for calling functions in shared C libraries with a known signature. This is a simplified version of the first test in t/pmc/nci.t:

    .local pmc library
    library = loadlib "libnci_test"         # library object
    say "loaded"

    .local pmc ddfunc
    ddfunc = dlfunc library, "nci_dd", "dd" # function object
    say "dlfunced"

    .local num result
    result = ddfunc( 4.0 )                  # call the function

    ne result, 8.0, nok_1
    say "ok 1"
    end
  nok_1:
    say "not ok 1"

    #...

This example shows two new opcodes: loadlib and dlfunc. The loadlib opcode obtains a handle for a shared library. It searches for the shared library in the current directory, in runtime/parrot/dynext, and in a few other configured directories. It also tries to load the provided filename unaltered and with appended extensions like .so or .dll. Which extensions it tries depends on the operating system Parrot is running on.

The dlfunc opcode gets a function object from a previously loaded library (second argument) of a specified name (third argument) with a known function signature (fourth argument). The function signature is a string in which the first character describes the return value and the remaining characters describe the function parameters. In the example above, "dd" describes a function that takes a double and returns a double. Table 6-1 lists the characters used in NCI function signatures.