Subroutines
Subroutines in PIR are roughly equivalent to the subroutines or methods of a high-level language. They're the most basic building block of code reuse in PIR. Each high-level language has different syntax and semantics for defining and calling subroutines, so Parrot's subroutines need to be flexible enough to handle a broad array of behaviors.
A subroutine declaration starts with the .sub directive and ends with the .end directive. This example defines a subroutine named hello that prints a string "Hello, Polly.":
.sub 'hello'
    say "Hello, Polly."
.end
The quotes around the subroutine name are optional as long as the name of the subroutine uses only plain alphanumeric ASCII characters. You must use quotes if the subroutine name uses Unicode characters, characters from some other character set or encoding, or is otherwise an invalid PIR identifier.
A subroutine call consists of the name of the subroutine to call followed by a list of (zero or more) arguments in parentheses.
You may precede the call with a list of (zero or more) return values.
This example calls the subroutine fact with two arguments and assigns the result to $I0:
$I0 = 'fact'(count, product)
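A call can also collect more than one return value by listing several targets in parentheses before the equals sign. The following is a minimal sketch of that form. The subroutine name divmod and its variable names are made up for illustration and are not part of Parrot; the .param and .return directives it relies on are described later in this chapter.

```pir
.sub 'main' :main
    .local int quotient, remainder

    # the two return values are assigned to the targets in order
    (quotient, remainder) = 'divmod'(17, 5)

    say quotient     # prints 3
    say remainder    # prints 2
.end

# hypothetical helper: returns the integer quotient and the remainder
.sub 'divmod'
    .param int n
    .param int d
    .local int q, r
    q = n / d        # integer division, because all operands are ints
    r = n % d
    .return (q, r)
.end
```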
Modifiers
A modifier is an annotation to a basic subroutine declaration or parameter declaration that selects an optional feature. Modifiers all start with a colon (:). A subroutine can have multiple modifiers.
When you execute a PIR file as a program, Parrot normally runs the first subroutine it encounters, but you can mark any subroutine as the first one to run with the :main modifier:
.sub 'first'
    say "Polly want a cracker?"
.end
.sub 'second' :main
    say "Hello, Polly."
.end
This code prints "Hello, Polly." but not "Polly want a cracker?". The first subroutine is first in the source code, but second has the :main modifier. Parrot will never call first in this program. If you remove the :main modifier, the code will print "Polly want a cracker?" instead.
The :load modifier tells Parrot to run the subroutine when it loads the current file as a library. The :init modifier tells Parrot to run the subroutine only when it executes the file as a program (and not as a library). The :immediate modifier tells Parrot to run the subroutine as soon as it gets compiled. The :postcomp modifier also runs the subroutine right after compilation, but only if the subroutine was declared in the main program file (when not loaded as a library).
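As a rough sketch of how these modifiers combine, the following file marks a setup subroutine with both :load and :init. Going by the descriptions above, that should make it run whether the file is executed as a program or loaded as a library. The subroutine names setup and main are invented for illustration.

```pir
# hypothetical setup code, meant to run in both situations
.sub 'setup' :load :init
    say "setting up"
.end

.sub 'main' :main
    say "running main"
.end
```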
By default, Parrot stores all subroutines in the namespace currently active at the point of their declaration. The :anon modifier tells Parrot not to store the subroutine in the namespace. The :nsentry modifier stores the subroutine in the currently active namespace with a different name. For example, Parrot will store this subroutine in the current namespace as bar, not foo:
.sub 'foo' :nsentry('bar')
    #...
.end
Chapter 7 on "Classes and Objects" explains other subroutine modifiers.
Parameters and Arguments
The .param directive defines the parameters for the subroutine and creates local named variables for them (similar to .local):
.param int c
The .return directive returns control flow to the calling subroutine. To return results, pass them as arguments to .return.
.return($P0)
This example implements the factorial algorithm using two subroutines, main and fact:
# factorial.pir
.sub 'main' :main
    .local int count
    .local int product
    count   = 5
    product = 1
    $I0 = 'fact'(count, product)
    say $I0
.end
.sub 'fact'
    .param int c
    .param int p
  loop:
    if c <= 1 goto fin
    p = c * p
    dec c
    branch loop
  fin:
    .return (p)
.end
This example defines two local named variables, count and product, and assigns them the values 5 and 1. It calls the fact subroutine with both variables as arguments. The fact subroutine uses the .param directive to retrieve these parameters and the .return directive to return the result. The final printed result is 120.
Positional Parameters
The default way of matching the arguments passed in a subroutine call to the parameters defined in the subroutine's declaration is by position. If you declare three parameters -- an integer, a number, and a string:
.sub 'foo'
    .param int a
    .param num b
    .param string c
    # ...
.end
... then calls to this subroutine must also pass three arguments -- an integer, a number, and a string:
'foo'(32, 5.9, "bar")
Parrot will assign each argument to the corresponding parameter in order from first to last. Changing the order of the arguments or leaving one out is an error.
Named Parameters
Named parameters are an alternative to positional parameters. Instead of matching arguments by their position in the argument list, Parrot assigns arguments to parameters by their name. Consequently you may pass named parameters in any order. Declare named parameters with the :named modifier.
This example declares two named parameters in the subroutine shoutout -- name and years -- each declared with the :named modifier and followed by the name to use when passing arguments. The string name can match the parameter name (as with the name parameter), but it can also be different (as with the years parameter):
.sub 'shoutout'
    .param string name  :named("name")
    .param string years :named("age")
    $S0  = "Hello " . name
    $S1  = "You are " . years
    $S1 .= " years old"
    say $S0
    say $S1
.end
Pass named arguments to a subroutine as a series of name/value pairs, with the elements of each pair separated by an arrow =>.
.sub 'main' :main
    'shoutout'("age" => 42, "name" => "Bob")
.end
The order of the arguments does not matter:
.sub 'main' :main
    'shoutout'("name" => "Bob", "age" => 42)
.end
Optional Parameters
Another alternative to the required positional parameters is optional parameters. Some parameters are unnecessary for certain calls. Parameters marked with the :optional modifier do not produce errors about invalid parameter counts if they are not present. A subroutine with optional parameters should gracefully handle the missing argument, either by providing a default value or by performing an alternate action that doesn't need that value.
Checking the value of the optional parameter isn't enough to know whether the call passed such an argument, because the user might have passed a null or false value intentionally. PIR also provides an :opt_flag modifier for a boolean check whether the caller passed an argument:
.param string name     :optional
.param int    has_name :opt_flag
When an integer parameter with the :opt_flag modifier immediately follows an :optional parameter, it will be true if the caller passed the argument and false otherwise.
This example demonstrates how to provide a default value for an optional parameter:
.param string name     :optional
.param int    has_name :opt_flag
if has_name goto we_have_a_name
name = "default value"
we_have_a_name:
When the has_name parameter is true, the if control statement jumps to the we_have_a_name label, leaving the name parameter unmodified. When has_name is false (when the caller passed no argument for name) the if statement does nothing. The next line sets the name parameter to a default value.
The :opt_flag parameter never takes an argument from the passed-in argument list. It's purely for bookkeeping within the subroutine.
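Put together as a complete program, the default-value pattern might look like the sketch below. The subroutine name greet, the label greet_now, and the default string are illustrative choices only.

```pir
.sub 'main' :main
    'greet'("Polly")    # prints "Hello, Polly"
    'greet'()           # prints "Hello, stranger"
.end

# hypothetical subroutine with one optional positional parameter
.sub 'greet'
    .param string name     :optional
    .param int    has_name :opt_flag

    if has_name goto greet_now
    name = "stranger"           # default used when no argument was passed
  greet_now:
    $S0 = "Hello, " . name
    say $S0
.end
```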
Optional parameters can be positional or named parameters. Optional parameters must appear at the end of the list of positional parameters after all the required parameters. An optional parameter must immediately precede its :opt_flag parameter whether it's named or positional:
.sub 'question'
    .param int value     :named("answer") :optional
    .param int has_value :opt_flag
    #...
.end
You can call this subroutine with a named argument or with no argument:
'question'("answer" => 42)
'question'()
Aggregating Parameters
Another alternative to a sequence of positional parameters is an aggregating parameter which bundles a list of arguments into a single parameter. The :slurpy modifier creates a single array parameter containing all the provided arguments:
.param pmc args :slurpy
$P0 = args[0]           # first argument
$P1 = args[1]           # second argument
As an aggregating parameter will consume all subsequent parameters, you may use an aggregating parameter with other positional parameters only after all other positional parameters:
.param string first
.param int second
.param pmc the_rest :slurpy
$P0 = the_rest[0]       # third argument
$P1 = the_rest[1]       # fourth argument
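Because the aggregated arguments arrive as an ordinary array, a subroutine can walk over them with an index. The sketch below adds up however many integers the caller passes. The names sum_all, values, and next_value are made up for this example.

```pir
.sub 'main' :main
    $I0 = 'sum_all'(1, 2, 3, 4)
    say $I0                      # prints 10
.end

# hypothetical subroutine: accepts any number of positional arguments
.sub 'sum_all'
    .param pmc values :slurpy
    .local int total, count, i
    total = 0
    i     = 0
    count = elements values      # how many arguments were passed
  next_value:
    if i >= count goto done
    $I1 = values[i]              # fetch element i as an integer
    total += $I1
    inc i
    goto next_value
  done:
    .return (total)
.end
```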
When you combine :named and :slurpy on a parameter, the result is a single associative array containing the named arguments passed into the subroutine call:
.param pmc all_named :slurpy :named
$P0 = all_named['name'] # 'name' => 'Bob'
$P1 = all_named['age']  # 'age'  => 42
Flattening Arguments
A flattening argument breaks up a single argument to fill multiple parameters. It's the complement of an aggregating parameter. The :flat modifier splits arguments (and return values) into a flattened list. Passing an array PMC to a subroutine with :flat:
$P0 = new "ResizablePMCArray"
$P0[0] = "Bob"
$P0[1] = 42
'foo'($P0 :flat)
... allows the elements of that array to fill the required parameters:
.param string name # Bob
.param int age     # 42
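The two fragments above can be joined into one runnable sketch. The subroutine name describe and the variable person are placeholders chosen for this example.

```pir
.sub 'main' :main
    .local pmc person
    person = new 'ResizablePMCArray'
    push person, "Bob"
    push person, 42

    # the two array elements are passed as two separate arguments
    'describe'(person :flat)
.end

# hypothetical receiver with two ordinary positional parameters
.sub 'describe'
    .param string name
    .param int    age
    print name
    print " is "
    say age                      # the whole line reads: Bob is 42
.end
```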
Arguments on the Command Line
Arguments passed to a PIR program on the command line are available to the :main subroutine of that program as strings in a ResizableStringArray PMC. If you call a program args.pir, passing it three arguments:
$ parrot args.pir foo bar baz
... they will be accessible at indexes 1, 2, and 3 of the PMC parameter. Index 0 is unused.
.sub 'main' :main
    .param pmc all_args
    $S1 = all_args[1]   # foo
    $S2 = all_args[2]   # bar
    $S3 = all_args[3]   # baz
    # ...
.end
Because all_args is a ResizableStringArray PMC, you can loop over the results, access them individually, or even modify them.
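A loop over the arguments could be sketched as follows. Following the description above, it starts at index 1 so that only the arguments typed after the file name are printed, one per line. The label names are arbitrary.

```pir
.sub 'main' :main
    .param pmc all_args
    .local int i, count
    count = elements all_args    # total number of entries in the array
    i     = 1                    # skip index 0
  next_arg:
    if i >= count goto done
    $S0 = all_args[i]
    say $S0
    inc i
    goto next_arg
  done:
    .return ()
.end
```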
Compiling and Loading Libraries
In addition to running PIR files on the command-line, you can also load a library of pre-compiled bytecode directly into your PIR source file. The load_bytecode opcode takes a single argument: the name of the bytecode file to load. If you create a file named foo_file.pir containing a single subroutine:
# foo_file.pir
.sub 'foo_sub'               # .sub stores a global sub
    say "in foo_sub"
.end
... and compile it to bytecode using the -o command-line switch:
$ parrot -o foo_file.pbc foo_file.pir
... you can then load the compiled bytecode into main.pir and directly call the subroutine defined in foo_file.pir:
# main.pir
.sub 'main' :main
    load_bytecode "foo_file.pbc"    # compiled foo_file.pir
    foo_sub()
.end
The load_bytecode opcode also works with source files, as long as Parrot has a compiler registered for that type of file:
# main2.pir
.sub 'main' :main
    load_bytecode "foo_file.pir"    # PIR source code
    foo_sub()
.end
Sub PMC
Subroutines are a PMC type in Parrot. You can store them in PMC registers and manipulate them just as you do with other PMCs. Parrot stores subroutines in namespaces; retrieve them with the get_global opcode:
$P0 = get_global "my_sub"
To find a subroutine in a different namespace, first look up the appropriate namespace object, then use that as the first parameter to get_global:
$P0 = get_namespace ["My";"Namespace"]
$P1 = get_global $P0, "my_sub"
You can invoke a Sub object directly:
$P0(1, 2, 3)
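As a small end-to-end sketch, the program below looks up a subroutine by name, keeps the Sub PMC in a named variable, and invokes it through that variable. The names greet and greeter are invented for the example.

```pir
.sub 'main' :main
    .local pmc greeter
    greeter = get_global 'greet'   # fetch the Sub PMC from the namespace
    greeter("Polly")               # invoke it; prints "Hello, Polly"
.end

# hypothetical subroutine stored in the current namespace
.sub 'greet'
    .param string name
    print "Hello, "
    say name
.end
```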
You can get or even change its name:
$S0 = $P0               # Get the current name
$P0 = "my_new_sub"      # Set a new name
You can get a hash of the complete metadata for the subroutine:
$P1 = inspect $P0
... which contains the fields:
- pos_required: The number of required positional parameters
- pos_optional: The number of optional positional parameters
- named_required: The number of required named parameters
- named_optional: The number of optional named parameters
- pos_slurpy: True if the sub has an aggregating parameter for positional args
- named_slurpy: True if the sub has an aggregating parameter for named args
Instead of fetching the entire inspection hash, you can also request individual pieces of metadata:
$P1 = inspect $P0, "pos_required"
The arity method on the sub object returns the total number of defined parameters of all varieties:
$I0 = $P0.'arity'()
The get_namespace method on the sub object fetches the namespace PMC which contains the Sub:
$P1 = $P0.'get_namespace'()
Evaluating a Code String
One way of producing a code object during a running program is by compiling a code string. In this case, it's a bytecode segment object.
The first step is to fetch a compiler object for the target language:
$P1 = compreg "PIR"
Parrot registers a compiler for PIR by default, so it's always available. The following example fetches a compiler object for PIR and places it in the named variable compiler. It then generates a code object from a string by calling compiler as a subroutine, places the resulting bytecode segment object into the named variable generated, and invokes it as a subroutine:
.local pmc compiler, generated
.local string source
source    = ".sub foo\n$S1 = 'in eval'\nprint $S1\n.end"
compiler  = compreg "PIR"
generated = compiler(source)
generated()
say "back again"
You can register a compiler or assembler for any language inside the Parrot core and use it to compile and invoke code from that language.
In the following example, the compreg opcode registers the subroutine-like object $P10 as a compiler for the language "MyLanguage":
compreg "MyLanguage", $P10
Lexicals
Variables stored in a namespace are global variables. They're accessible from anywhere in the program if you specify the right namespace path. High-level languages also have lexical variables which are only accessible from the local section of code (or scope) where they appear, or in a section of code embedded within that scope. A scope is roughly equivalent to a block in C. In PIR, the section of code between a .sub and a .end defines a scope for lexical variables.
While Parrot stores global variables in namespaces, it stores lexical variables in lexical pads. Think of a pad as a box that holds a collection of lexical variables. Each lexical scope has its own pad. The store_lex opcode stores a lexical variable in the current pad. The find_lex opcode retrieves a variable from the current pad:
$P0 = new "Integer"       # create a variable
$P0 = 10                  # assign value to it
store_lex "foo", $P0      # store with lexical name "foo"
# ...
$P1 = find_lex "foo"      # get the lexical "foo" into $P1
say $P1                   # prints 10
The .lex directive defines a local variable that follows these scoping rules:
.local pmc foo
.lex 'foo', foo
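A minimal sketch of the .lex directive in a complete subroutine follows. It binds a named variable to a lexical name, gives it a value, and then looks it up by name with find_lex, which should hand back the same value. The variable name answer is only an example.

```pir
.sub 'main' :main
    .local pmc answer
    .lex 'answer', answer        # bind the variable to the lexical name

    answer = new 'Integer'
    answer = 42

    $P0 = find_lex 'answer'      # look the lexical up by its name
    say $P0
.end
```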
LexPad and LexInfo PMCs
Parrot uses two different PMCs to store information about a subroutine's lexical variables: the LexPad PMC and the LexInfo PMC. Neither of these PMC types is usable directly from PIR code; Parrot uses them internally to store information about lexical variables.
LexInfo PMCs store information about lexical variables at compile time. Parrot generates this read-only information during compilation to represent what it knows about lexical variables. Not all subroutines get a LexInfo PMC by default; subroutines need to indicate to Parrot that they require a LexInfo PMC. One way to do this is with the .lex directive. Of course, the .lex directive only works for languages that know the names of their lexical variables at compile time. Languages where this information is not available can mark the subroutine with :lex instead.
LexPad PMCs store run-time information about lexical variables. This includes their current values and type information. Parrot creates a new LexPad PMC for subs that have a LexInfo PMC already. It does so for each invocation of the subroutine, which allows for recursive subroutine calls without overwriting lexical variables.
The get_lexinfo method on a sub retrieves its associated LexInfo PMC:
$P0 = get_global "MySubroutine"
$P1 = $P0.'get_lexinfo'()
The LexInfo PMC supports a few introspection operations. The elements opcode retrieves the number of elements it contains. String key access operations retrieve entries from the LexInfo PMC as if it were an associative array.
$I0 = elements $P1    # number of lexical variables
$P0 = $P1["name"]     # lexical variable "name"
There is no easy way to retrieve the current LexPad PMC in a given subroutine, but they are of limited use in PIR.
Nested Scopes
PIR has no separate syntax for blocks or lexical scopes; subroutines define lexical scopes in PIR. Because PIR disallows nested .sub/.end declarations, it needs a way to identify which lexical scopes are the parents of inner lexical scopes. The :outer modifier declares a subroutine as a nested inner lexical scope of another existing subroutine. The modifier takes one argument, the name of the outer subroutine:
.sub 'foo'
    # defines lexical variables
.end
.sub 'bar' :outer('foo')
    # can access foo's lexical variables
.end
Sometimes a name alone isn't sufficient to uniquely identify the outer subroutine. The :subid modifier allows the outer subroutine to declare a truly unique name usable with :outer:
.sub 'foo' :subid('barsouter')
    # defines lexical variables
.end
.sub 'bar' :outer('barsouter')
    # can access foo's lexical variables
.end
The get_outer method on a Sub PMC retrieves its :outer sub.
$P1 = $P0.'get_outer'()
If there is no :outer sub, this will return a null PMC. The set_outer method on a Sub object sets the :outer sub:
$P0.'set_outer'($P1)
Scope and Visibility
High-level languages such as Perl, Python, and Ruby allow nested scopes, or blocks within blocks that have their own lexical variables. This construct is common even in C:
{
    int x = 0;
    int y = 1;
    {
        int z = 2;
        /* x, y, and z are all visible here */
    }
    /* only x and y are visible here */
}
In the inner block, all three variables are visible. The variable z is only visible inside that block. The outer block has no knowledge of z. A naïve translation of this code to PIR might be:
.local int x
.local int y
.local int z
x = 0
y = 1
z = 2
#...
This PIR code is similar, but the handling of the variable z is different: z is visible throughout the entire current subroutine. It was not visible throughout the entire C function. A more accurate translation of the C scopes uses :outer PIR subroutines instead:
.sub 'MyOuter'
    .local pmc x, y
    .lex 'x', x
    .lex 'y', y
    x = new 'Integer'
    x = 10
    'MyInner'()
    # only x and y are visible here
    say y                # prints 20
.end
.sub 'MyInner' :outer('MyOuter')
    .local pmc x, new_y, z
    .lex 'z', z
    find_lex x, 'x'
    say x                # prints 10
    new_y = new 'Integer'
    new_y = 20
    store_lex 'y', new_y
.end
The find_lex and store_lex opcodes don't just access the value of a variable directly in the scope where it's declared; they interact with the LexPad PMC to find lexical variables within outer lexical scopes. All lexical variables from an outer lexical scope are visible from the inner lexical scope.
Note that you can only store PMCs -- not primitive types -- as lexicals.
Multiple Dispatch
Multiple dispatch subroutines (or multis) have several variants with the same name but different sets of parameters. The set of parameters for a subroutine is its signature. When a multi is called, the dispatch operation compares the arguments passed in to the signatures of all the variants and invokes the subroutine with the best match.
Parrot stores all multiple dispatch subs with the same name in a namespace within a single PMC called a MultiSub. The MultiSub is an invokable list of subroutines. When a multiple dispatch sub is called, the MultiSub PMC searches its list of variants for the best matching candidate.
The :multi modifier on a .sub declares a MultiSub:
.sub 'MyMulti' :multi()
    # does whatever a MyMulti does
.end
Each variant in a MultiSub must have a unique type or number of parameters declared, so the dispatcher can calculate a best match. If you had two variants that both took four integer parameters, the dispatcher would never be able to decide which one to call when it received four integer arguments.
The :multi modifier takes one or more arguments defining the multi signature. The multi signature tells Parrot what particular combination of input parameters the multi accepts:
.sub 'Add' :multi(I, I)
    .param int x
    .param int y
    $I0 = x + y
    .return($I0)
.end
.sub 'Add' :multi(N, N)
    .param num x
    .param num y
    $N0 = x + y
    .return($N0)
.end
.sub 'Start' :main
    $I0 = Add(1, 2)       # 3
    $N0 = Add(3.14, 2.0)  # 5.14
    $S0 = Add("a", "b")   # ERROR! No (S, S) variant!
.end
Multis can take I, N, S, and P types, but they can also use _ (underscore) to denote a wildcard, and a string which names a PMC type:
.sub 'Add' :multi(I, I)          # two integers
    #...
.end
.sub 'Add' :multi(I, 'Float')    # integer and Float PMC
    #...
.end
.sub 'Add' :multi('Integer', _)  # Integer PMC and wildcard
    #...
.end
When you call a MultiSub, Parrot will try to take the most specific best-match variant, but will fall back to more general variants if it cannot find a perfect match. If you call Add with (1, 2), Parrot will dispatch to the (I, I) variant. If you call it with (1, "hi"), Parrot will match the ('Integer', _) variant, as the string in the second argument doesn't match I or Float. This match is possible because Parrot can also promote one of the I, N, or S values to an Integer, Float, or String PMC.
To make the decision about which multi variant to call, Parrot calculates the Manhattan Distance between the argument signature and the parameter signature of each variant. Every difference between each element counts as one step. A difference can be a promotion from a primitive type to a PMC, the conversion from one primitive type to another, or the matching of an argument to a _ wildcard. After Parrot calculates the distance to each variant, it calls the one with the lowest distance. Notice that it's possible to define a variant that is impossible to call: for every potential combination of arguments there is a better match. This is uncommon, but possible in systems with many multis and a limited number of data types.
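To make the distance idea concrete, here is a hedged sketch with three single-parameter variants. Under the rules above, an Integer PMC argument is zero steps from the 'Integer' variant and one step from the wildcard variant, so the exact variant should win. A Float PMC has no exact variant, so the wildcard should be the closest candidate. The name describe and the printed strings are invented for illustration, and the comments state the intended dispatch rather than verified output.

```pir
.sub 'main' :main
    $P0 = new 'Integer'
    $P0 = 7
    $P1 = new 'String'
    $P1 = "seven"
    $P2 = new 'Float'
    $P2 = 7.5

    'describe'($P0)    # intended for the 'Integer' variant
    'describe'($P1)    # intended for the 'String' variant
    'describe'($P2)    # no exact variant, so the wildcard should apply
.end

.sub 'describe' :multi('Integer')
    .param pmc value
    say "an Integer PMC"
.end

.sub 'describe' :multi('String')
    .param pmc value
    say "a String PMC"
.end

.sub 'describe' :multi(_)
    .param pmc value
    say "something else"
.end
```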
Continuations
Continuations are subroutines that take snapshots of control flow. They are frozen images of the current execution state of the VM. Once you have a continuation, you can invoke it to return to the point where the continuation was first created. It's like a magical timewarp that allows the developer to arbitrarily move control flow back to any previous point in the program.
Continuations are like any other PMC; create one with the new opcode:
$P0 = new 'Continuation'
The new continuation starts in an undefined state. If you attempt to invoke a new continuation without initializing it, Parrot will throw an exception. To prepare the continuation for use, assign it a destination label with the set_addr opcode:
$P0 = new 'Continuation'
set_addr $P0, my_label
my_label:
# ...
To jump to the continuation's stored label and return the context to the state it was in at the point of its creation, invoke the continuation:
$P0()
Even though you can use the subroutine call notation $P0() to invoke the continuation, you cannot pass arguments or obtain return values.
Continuation Passing Style
Parrot uses continuations internally for control flow. When Parrot invokes a subroutine, it creates a continuation representing the current point in the program. It passes this continuation as an invisible parameter to the subroutine call. To return from that subroutine, Parrot invokes the continuation to return to the point of creation of that continuation. If you have a continuation, you can invoke it to return to its point of creation any time you want.
This type of flow control -- invoking continuations instead of performing bare jumps -- is called Continuation Passing Style (CPS).
Tailcalls
Many subroutines set up and call another subroutine and then return the result of the second call directly. This is a tailcall, and is an important opportunity for optimization. Here's a contrived example in pseudocode:
call add_two(5)
subroutine add_two(value)
    value = add_one(value)
    return add_one(value)
In this example, the subroutine add_two makes two calls to add_one. The second call to add_one is the return value. add_one gets called; its result gets returned to the caller of add_two. Nothing in add_two uses that return value directly.
A simple optimization is available for this type of code. The second call to add_one can return to the same place that add_two returns; it's perfectly safe and correct to use the same return continuation that add_two uses. The two subroutine calls can share a return continuation.
PIR provides the .tailcall directive to identify similar situations. Use it in place of the .return directive. .tailcall performs this optimization by reusing the return continuation of the parent subroutine to make the tailcall:
.sub 'main' :main
    .local int value
    value = add_two(5)
    say value
.end
.sub 'add_two'
    .param int value
    .local int val2
    val2 = add_one(value)
    .tailcall add_one(val2)
.end
.sub 'add_one'
    .param int a
    .local int b
    b = a + 1
    .return (b)
.end
This example prints the correct value 7.
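The same directive suits recursive subroutines whose last action is to call themselves. The sketch below is a tail-recursive factorial that carries the running product in a second parameter. The name fact_acc and the accumulator parameter are illustrative choices, not taken from the text above.

```pir
.sub 'main' :main
    $I0 = 'fact_acc'(5, 1)
    say $I0                      # prints 120
.end

# hypothetical tail-recursive factorial; acc holds the product so far
.sub 'fact_acc'
    .param int n
    .param int acc
    if n <= 1 goto done
    acc = acc * n
    dec n
    .tailcall 'fact_acc'(n, acc) # reuse the caller's return continuation
  done:
    .return (acc)
.end
```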
Coroutines
Coroutines are similar to subroutines except that they have an internal notion of state. In addition to performing a normal .return to return control flow back to the caller and destroy the execution environment of the subroutine, coroutines may also perform a .yield operation. .yield returns a value to the caller like .return can, but it does not destroy the execution state of the coroutine. The next call to the coroutine continues execution from the point of the last .yield, not at the beginning of the coroutine.
Inside a coroutine continuing from a .yield, the entire execution environment is the same as it was when the coroutine .yielded. This means that the parameter values don't change, even if the next invocation of the coroutine had different arguments passed in.
Coroutines look like ordinary subroutines. They do not require any special modifier or any special syntax to mark them as being a coroutine. What sets them apart is the use of the .yield directive. .yield plays several roles:
- Identifies coroutines: When Parrot sees a .yield, it knows to create a Coroutine PMC object instead of a Sub PMC.
- Creates a continuation: .yield creates a continuation in the coroutine and stores the continuation object in the coroutine object for later resuming from the point of the .yield.
- Returns a value: .yield can return one value, many values, or no values to the caller. It is basically the same as a .return in this regard.
Here is a simple coroutine example:
.sub 'MyCoro'
    .yield(1)
    .yield(2)
    .yield(3)
    .return(4)
.end
.sub 'main' :main
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
.end
This contrived example demonstrates how the coroutine stores its state. When Parrot encounters the .yield, the coroutine stores its current execution environment. At the next call to the coroutine, it picks up where it left off.
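Because local variables survive between calls, a coroutine can also act as a simple generator. The following sketch keeps a counter inside the coroutine and yields the next number on each call. If it behaves like the example above, the three calls in main should produce 1, 2, and 3 in turn. The name next_id and the labels are made up for this sketch.

```pir
# hypothetical generator: each call resumes after the last .yield
.sub 'next_id'
    .local int id
    id = 0
  again:
    inc id
    .yield (id)
    goto again
.end

.sub 'main' :main
    .local int i
    i = 0
  loop:
    if i >= 3 goto done
    $I0 = 'next_id'()            # resumes the coroutine where it left off
    say $I0
    inc i
    goto loop
  done:
    .return ()
.end
```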
Native Call Interface
The Native Call Interface (NCI) is a special version of the Parrot calling conventions for calling functions in shared C libraries with a known signature. This is a simplified version of the first test in t/pmc/nci.t:
.local pmc library
library = loadlib "libnci_test"          # library object
say "loaded"
.local pmc ddfunc
ddfunc = dlfunc library, "nci_dd", "dd"  # function object
say "dlfunced"
.local num result
result = ddfunc( 4.0 )                   # call the function
ne result, 8.0, nok_1
say "ok 1"
end
nok_1:
say "not ok 1"
#...
This example shows two new opcodes: loadlib and dlfunc. The loadlib opcode obtains a handle for a shared library. It searches for the shared library in the current directory, in runtime/parrot/dynext, and in a few other configured directories. It also tries to load the provided filename unaltered and with appended extensions like .so or .dll. Which extensions it tries depends on the operating system Parrot is running on.
The dlfunc opcode gets a function object from a previously loaded library (second argument) of a specified name (third argument) with a known function signature (fourth argument). The function signature is a string where the first character is the return value and the remaining characters are the function parameters. Table 6-1 lists the characters used in NCI function signatures.