PDD 19: Parrot Intermediate Representation (PIR)
Version
$Revision$
Abstract
This document outlines the architecture and core syntax of Parrot Intermediate Representation (PIR).
Description
PIR is a stable, middle-level language intended both as a target for the output of high-level language compilers and for humans developing core features and extensions for Parrot.
Basic Syntax
A valid PIR program consists of a sequence of statements, directives, comments and empty lines.
Statements
A statement starts with an optional label, contains an instruction, and is terminated by a newline (<NL>). Each statement must be on its own line.
[label:] [instruction] <NL>
An instruction may be either a low-level opcode or a higher-level PIR operation, such as a subroutine call, a method call, a directive, or PIR syntactic sugar.
Directives
A directive provides information for the PIR compiler that is outside the normal flow of executable statements. Directives are all prefixed with a ".", as in .local or .sub.
Comments
Comments start with # and last until the following newline. PIR also allows comments in Pod format. Comments, Pod content, and empty lines are ignored.
Identifiers
Identifiers start with a letter or underscore and may then contain additional letters, digits, and underscores. Identifiers currently have no length limit, but a sane-but-generous limit may be imposed in the future (256 chars, 1024 chars?). The following examples are all valid identifiers:
a _a A42
Opcode names are not reserved words in PIR and may be used as variable names. For example, you can define a local variable named print. [See RT #24251] Note that, currently, using an opcode name as a local variable name hides the opcode, effectively making the opcode unusable. This will be resolved in the future.
The PIR language is designed to have as few reserved keywords as possible. Currently, in contrast to opcode names, PIR keywords are reserved, and cannot be used as identifiers. Some opcode names are, in fact, PIR keywords, which therefore cannot be used as identifiers. This, too, will be resolved in a future re-implementation of the PIR compiler.
The following are PIR keywords, and cannot currently be used as identifiers:
goto if int null num pmc string unless
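As a quick check of these rules, the Python sketch below (an illustration only, not part of any Parrot tool; is_usable_identifier is a hypothetical name) accepts a name when it matches the identifier pattern and is not one of the keywords listed above. It assumes "letter" means an ASCII letter.

```python
import re

# A letter or underscore, followed by letters, digits, or underscores.
# Assumption: "letter" means an ASCII letter.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

# Keywords that are currently reserved in PIR (from the list above).
PIR_KEYWORDS = {"goto", "if", "int", "null", "num", "pmc", "string", "unless"}


def is_usable_identifier(name: str) -> bool:
    """Hypothetical helper: True if `name` can currently be used as a PIR identifier."""
    return IDENTIFIER.match(name) is not None and name not in PIR_KEYWORDS


for candidate in ["a", "_a", "A42", "42a", "if", "print"]:
    print(candidate, is_usable_identifier(candidate))
```

Note that print is accepted: opcode names are not reserved, even though, as described above, a variable with that name currently hides the opcode.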
Labels
A label declaration consists of a label name followed by a colon. A label name conforms to the standard requirements for identifiers. A label declaration may occur at the start of a statement, or stand alone on a line, but always within a subroutine.
A reference to a label consists of only the label name, and is generally used as an argument to an instruction or directive.
A PIR label is accessible only in the subroutine where it's defined. A label name must be unique within a subroutine, but it can be reused in other subroutines.
    goto label1
    ...
  label1:
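The scoping rule for labels can be illustrated with a simplified Python scanner (hypothetical, and not how IMCC parses PIR): it tracks which .sub it is in and reports labels declared twice within the same sub. It only recognises labels at the start of a line and ignores heredocs and string contents.

```python
import re
from collections import defaultdict

# A label declaration: an identifier followed by a colon at the start of a line.
LABEL_DEF = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*):")


def duplicate_labels(pir_source: str) -> dict:
    """Return {sub name: [labels declared more than once in that sub]}."""
    seen = defaultdict(set)
    dups = defaultdict(set)
    current = None
    for line in pir_source.splitlines():
        stripped = line.strip()
        if stripped.startswith(".sub "):
            current = stripped.split()[1]      # the sub's name
        elif stripped == ".end":
            current = None                     # labels do not outlive their sub
        elif current is not None:
            match = LABEL_DEF.match(line)
            if match:
                name = match.group(1)
                if name in seen[current]:
                    dups[current].add(name)
                seen[current].add(name)
    return {sub: sorted(names) for sub, names in dups.items()}


source = """
.sub first
    goto done
  done:
  done:
.end
.sub second
  done:
.end
"""
print(duplicate_labels(source))   # {'first': ['done']}
```

The label done is reported only for first; reusing it in second is allowed.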
Registers and Variables
There are two ways of referencing Parrot's registers. The first is through named local variables declared with .local:

    .local pmc foo
The type of a named variable can be int, num, string or pmc, corresponding to the types of registers. No other types are used.
The second way of referencing a register is through a register variable $In, $Sn, $Nn, or $Pn. The capital letter indicates the type of the register (integer, string, number, or PMC). n consists of digit(s) only, and there is no limit on its size. There is no direct correspondence between the value of n and the position of the register in the register set: $P42 may be stored in the zeroth PMC register if it is the only register in the subroutine.
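To illustrate why n says nothing about register position, here is a deliberately naive Python allocator (hypothetical; IMCC's allocator is more elaborate and may, for example, reuse registers whose values are no longer needed). It hands out slots per register type in order of first use.

```python
import re

# A register variable: $ followed by the type letter and one or more digits.
REGISTER = re.compile(r"\$([ISNP])(\d+)")


def assign_slots(pir_lines):
    """Map each register variable (e.g. '$P42') to (type, slot) in order of first use."""
    next_slot = {"I": 0, "S": 0, "N": 0, "P": 0}
    slots = {}
    for line in pir_lines:
        for kind, digits in REGISTER.findall(line):
            name = f"${kind}{digits}"
            if name not in slots:
                slots[name] = (kind, next_slot[kind])
                next_slot[kind] += 1
    return slots


print(assign_slots(["$P42 = new 'Integer'", "$P42 = 5", "$I7 = $P42"]))
# {'$P42': ('P', 0), '$I7': ('I', 0)}
```

Here $P42 lands in PMC slot 0 because it is the only PMC register used.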
Constants
Constants may be used in place of registers or variables. A constant is not allowed on the left side of an assignment, or in any other context where the variable would be modified.
- 'single-quoted string constant'
  Delimited by single quotes ('). Single-quoted strings are taken to be ASCII encoded. No escape sequences are processed.
- "double-quoted string constants"
  Delimited by double quotes ("). A " inside a string must be escaped by \. The default encoding for a double-quoted string constant is 7-bit ASCII; other character sets and encodings must be marked explicitly using a charset or encoding flag.
- <<"heredoc", <<'heredoc'
  Heredocs work like single- or double-quoted strings. All lines up to the terminating delimiter are slurped into the string. The delimiter has to be on its own line, at the beginning of the line, with no trailing whitespace.
  Assignment of a heredoc:

      $S0 = <<"EOS"
      ...
      EOS

  A heredoc as an argument:

      function(<<"END_OF_HERE", arg)
      ...
      END_OF_HERE

      .return(<<'EOS')
      ...
      EOS

      .yield(<<'EOS')
      ...
      EOS

  Although currently not possible, a future implementation of the PIR language will allow you to use multiple heredocs within a single statement or directive:

      function(<<'INPUT', <<'OUTPUT', 'some test')
      ...
      INPUT
      ...
      OUTPUT

- charset:"string constant"
  Like above, with a character set attached to the string. Valid character sets are currently: ascii (the default), binary, unicode (with UTF-8 as the default encoding), and iso-8859-1.
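The heredoc rule (collect lines until a line consisting of exactly the delimiter) can be sketched in Python. This is a model, not IMCC's lexer: it assumes each body line keeps its trailing newline, and it does not apply the escape processing that a <<"..." heredoc would get.

```python
def slurp_heredoc(lines, delimiter):
    """Collect heredoc body lines up to the terminating delimiter.

    `lines` are the source lines that follow the statement containing <<"EOS".
    Returns (body, index of the first line after the terminator).
    """
    body = []
    for index, line in enumerate(lines):
        # The terminator must be alone on its line: no indentation, no trailing whitespace.
        if line.rstrip("\n") == delimiter:
            return "".join(body), index + 1
        body.append(line if line.endswith("\n") else line + "\n")
    raise SyntaxError(f"heredoc terminator {delimiter!r} not found")


body, next_index = slurp_heredoc(["first line\n", "EOS \n", "EOS\n", "print $S0\n"], "EOS")
print(repr(body), next_index)   # 'first line\nEOS \n' 3
```

The line "EOS " with a trailing space does not end the heredoc; it becomes part of the string.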
String escape sequences
Inside double-quoted strings the following escape sequences are processed.
    \xhh        1..2 hex digits
    \ooo        1..3 oct digits
    \cX         control char X
    \x{h..h}    1..8 hex digits
    \uhhhh      4 hex digits
    \Uhhhhhhhh  8 hex digits
    \a, \b, \t, \n, \v, \f, \r, \e, \\, \"
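The table can be made concrete with a small Python decoder for the body of a double-quoted string (a sketch, not IMCC's lexer). Two behaviours are assumptions of this sketch rather than statements of the spec: \cX is computed Perl-style as the code of uppercase X XOR 0x40, and unknown escapes raise an error.

```python
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "t": "\t", "n": "\n", "v": "\v",
    "f": "\f", "r": "\r", "e": "\x1b", "\\": "\\", '"': '"',
}
HEX_DIGITS = "0123456789abcdefABCDEF"
OCT_DIGITS = "01234567"


def take(text, start, allowed, max_len):
    """Longest run (at most max_len chars) of `allowed` characters at text[start:]."""
    end = start
    while end < len(text) and end - start < max_len and text[end] in allowed:
        end += 1
    return text[start:end]


def unescape(body: str) -> str:
    """Decode the escape sequences of a double-quoted PIR string body."""
    out, i = [], 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:                        # \n, \t, \e, \\, \" ...
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == "x" and body[i + 2:i + 3] == "{":    # \x{h..h}: 1..8 hex digits
            close = body.index("}", i + 3)
            digits = body[i + 3:close]
            if not 1 <= len(digits) <= 8 or take(digits, 0, HEX_DIGITS, 8) != digits:
                raise ValueError(f"bad \\x{{...}} escape: {digits!r}")
            out.append(chr(int(digits, 16)))
            i = close + 1
        elif esc == "x":                                 # \xhh: 1..2 hex digits
            digits = take(body, i + 2, HEX_DIGITS, 2)
            if not digits:
                raise ValueError("\\x needs at least one hex digit")
            out.append(chr(int(digits, 16)))
            i += 2 + len(digits)
        elif esc in "uU":                                # \uhhhh or \Uhhhhhhhh
            width = 4 if esc == "u" else 8
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or take(digits, 0, HEX_DIGITS, width) != digits:
                raise ValueError(f"\\{esc} needs exactly {width} hex digits")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        elif esc == "c" and i + 2 < len(body):           # \cX: control character X
            out.append(chr(ord(body[i + 2].upper()) ^ 0x40))
            i += 3
        elif esc in OCT_DIGITS:                          # \ooo: 1..3 octal digits
            digits = take(body, i + 1, OCT_DIGITS, 3)
            out.append(chr(int(digits, 8)))
            i += 1 + len(digits)
        else:
            raise ValueError(f"unknown escape \\{esc}")
    return "".join(out)


print(repr(unescape(r"tab\there \x41\102")))   # 'tab\there AB'
```

Note that \x41 and \102 (octal) both decode to ASCII letters, which is why the output ends in AB.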
- encoding:charset:"string constant"
  Like above, with an extra encoding attached to the string. For example:

      set $S0, utf8:unicode:"«"

  The encoding and charset are attached to the string constant, and adopted by any string container the constant is assigned to.
  The standard escape sequences are honored within strings with an alternate encoding, so in the example above, you can include a particular Unicode character either as a literal sequence of bytes or as an escape sequence.
- numeric constants
  Both integers (42) and numbers (3.14159) may appear as constants. 0x and 0b denote hex and binary constants, respectively.

Directives
- .local <type> <identifier> [:unique_reg]
  Define a local name identifier within a subroutine with the given type. You can define multiple identifiers of the same type by separating them with commas:

      .local int i, j

  The optional :unique_reg modifier forces the register allocator to associate the identifier with a unique register for the duration of the subroutine. If the register allocator is thought of as an optimization tool that lets a register frame use fewer registers by reusing unused ones, then the :unique_reg modifier turns this optimization off. This can be important in a number of situations:
  - When a subroutine has a small fixed number of registers
  - When a named variable or named register is used throughout the entire subroutine
  - When a reference needs to be made to a register
- .lex <string constant>, <reg>
  Declare a lexical variable that is an alias for a PMC register. For example, given this preamble:

      .lex '$a', $P0
      $P1 = new 'Integer'

  These two opcodes have an identical effect:

      $P0 = $P1
      store_lex '$a', $P1

  And these two opcodes also have an identical effect:

      $P1 = $P0
      $P1 = find_lex '$a'

- .const <type> <identifier> = <const>
  Define a constant named identifier of type type and assign value const to it. The type must be int, num, string or a string constant indicating the PMC type. This allows you to create PMC constants representing subroutines; the value of the constant in that case is the name of the subroutine. If the referred subroutine has an :immediate flag and it returns a value, then that value is stored instead of the subroutine.
  .const declarations representing subroutines can only be written within a .sub. The constant is stored in the constant table of the current bytecode file.
- .globalconst <type> <identifier> = <const>
  As .const above, but the defined constant is globally accessible. .globalconst may only be used within a .sub.
- .sub

      .sub <identifier> [:<flag> ...]
      .sub <quoted string> [:<flag> ...]

  Define a subroutine. All code in a PIR source file must be defined in a subroutine. See the section "Subroutine flags" for available flags. Optional flags are given as a space-separated list.
  The name of the sub may be either a bare identifier or a quoted string constant. Bare identifiers must be valid PIR identifiers (see Identifiers above), but string sub names can contain any characters, including characters from different character sets (see Constants above).
  Always paired with .end.
- .end
  End a subroutine. Always paired with .sub.
- .namespace [ <identifier> ; <identifier> ]

      .namespace [ <key>? ]
      key: <identifier> [';' <identifier>]*

  Defines the namespace from this point onwards. By default the program is not in any namespace. If you specify more than one identifier, separated by semicolons, nested namespaces are created by storing the inner namespace object in the outer namespace's global pad.
  You can specify the root namespace by using empty brackets:

      .namespace [ ]

  The brackets are not optional, although the key inside them is.
- .loadlib 'lib_name'
  Load the given library at compile time, that is, as soon as that line is parsed. See also the loadlib opcode, which does the same at run time.
  A library loaded this way is also available at runtime, as if it had been loaded again in a :load subroutine, so there is no need to call loadlib at runtime.
- .HLL <hll_name>
  Define the HLL namespace from that point on in the file. Takes one string constant, the name of the HLL. By default, the HLL namespace is 'parrot'.
- .line <integer>
  Set the current PIR line number to the value specified. This is useful when the PIR code is generated from some source PIR files, and error messages should print the source file's line number, not the line number of the generated file. Note that line numbers increment per line of PIR; if you are trying to store high-level language debug information, you should instead be using the .annotate directive.
- .file <quoted_string>
  Set the current PIR file name to the value specified. This is useful when the PIR code is generated from some source PIR files, and error messages should print the source file's name, not the name of the generated file.
- .annotate <key>, <value>
  Makes an entry in the bytecode annotations table. This is used to store high-level language debug information. Examples:

      .annotate "file", "aardvark.p6"
      .annotate "line", 5
      .annotate "column", 24

  An annotation stays in effect until the next annotation with the same key or the end of the current compilation unit (that is, if you use a tool such as pbc_merge to link multiple bytecode files, annotations will not spill over from one mergee's bytecode to another).
  One annotation covers many PIR instructions. If the result of compiling one line of HLL code is 15 lines of PIR, you only need to emit one annotation before the first of those 15 lines to set the line number.

      .annotate "line", 42

  The key must always be a quoted string. The value may be an integer, a number or a quoted string. Note that integer values are stored most compactly; if you were to emit this instead of the directive above:

      .annotate "line", "42"

  then "42" would be stored as a string, taking up more space in the resulting bytecode file.
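How an annotation stays in effect can be modelled in a few lines of Python. This is a toy, assuming entries are (bytecode offset, key, value) triples sorted by offset; the offsets are invented for the example, and the compilation-unit boundary is not modelled.

```python
from bisect import bisect_right


class AnnotationTable:
    """Toy annotations table built from (offset, key, value) entries sorted by offset."""

    def __init__(self, entries):
        self.by_key = {}
        for offset, key, value in entries:
            self.by_key.setdefault(key, []).append((offset, value))

    def lookup(self, offset):
        """Return every annotation in effect at `offset`, as {key: value}."""
        active = {}
        for key, points in self.by_key.items():
            offsets = [o for o, _ in points]
            i = bisect_right(offsets, offset)   # entries at or before `offset`
            if i:
                active[key] = points[i - 1][1]  # the latest one wins
        return active


table = AnnotationTable([
    (0, "file", "aardvark.p6"),
    (0, "line", 5),
    (12, "line", 6),   # only "line" changes here; "file" stays in effect
])
print(table.lookup(3))    # {'file': 'aardvark.p6', 'line': 5}
print(table.lookup(12))   # {'file': 'aardvark.p6', 'line': 6}
```

Because lookups are per key, a new "line" annotation does not reset "file", matching the rule that an annotation lasts until the next one with the same key.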
Subroutine flags
- :main
  Define the "main" entry point where execution starts. If multiple subroutines are marked as :main, the last marked subroutine is used. Only the first file loaded or compiled counts; subs marked as :main are ignored by the load_bytecode op. If no :main flag is specified at all, execution starts at the first subroutine in the file.
- :load
  Run this subroutine when it is loaded by the load_bytecode op (i.e. neither in the initial program file nor compiled from memory). This is complementary to what :init does (below); to get both behaviours, use :init :load. If multiple subs have the :load pragma, the subs are run in source code order.
- :init
  Run the subroutine when the program is run directly (that is, not loaded as a module), including when it is compiled from memory. This is complementary to what :load does (above); to get both behaviours, use :init :load.
- :anon
  Do not install this subroutine in the namespace. Allows the subroutine name to be reused.
- :multi(type1, type2...)
  Engage in multiple dispatch with the listed types. See "pdds/pdd27_multi_dispatch.pod" in docs for more information on the multiple dispatch system.
- :immediate
  Execute this subroutine immediately after it is compiled, which is analogous to BEGIN in Perl 5.
  In addition, if the sub returns a PMC value, that value replaces the sub in the constant table of the bytecode file. This makes it possible to build constants at compile time, provided that (a) the generated constant can be computed at compile time (i.e. doesn't depend on the runtime environment), and (b) the constant value is of a PMC class that supports saving in a bytecode file. {{ TODO: need a freeze/thaw reference }}
  For instance, after compilation of the sub 'init', that sub is executed immediately (hence the :immediate flag). Instead of storing the sub 'init' in the constants table, the value returned by 'init' is stored, which in this example is a FixedIntegerArray.

      .sub main :main
          .const "Sub" initsub = "init"
      .end

      .sub init :immediate
          .local pmc array
          array = new 'FixedIntegerArray'
          array = 256 # set size to 256
          # code to initialize array
          .return (array)
      .end

- :postcomp
  Execute immediately after being compiled, but only if the subroutine is in the initial file (i.e. not in PIR compiled as a result of a load_bytecode instruction from another file).
  As an example, suppose file main.pir contains:

      .sub main
          load_bytecode 'foo.pir'
      .end

  and the file foo.pir contains:

      .sub foo :immediate
          print '42'
      .end

      .sub bar :postcomp
          print '43'
      .end

  Executing foo.pir will run both foo and bar. On the other hand, executing main.pir will run only foo. If foo.pir is compiled to bytecode, only foo will be run, and loading foo.pbc will run neither foo nor bar.
- :method

      .sub bar :method
      .sub bar :method('foo')

  The marked .sub is a method, added as a method in the class that corresponds to the current namespace, and not stored in the namespace. In the method body, the object PMC can be referred to with self.
  If a string argument is given to :method, the method is stored with that name instead of the .sub name.
- :vtable

      .sub bar :vtable
      .sub bar :vtable('foo')

  The marked .sub overrides a vtable function and is not stored in the namespace. By default, it overrides a vtable function with the same name as the .sub name. To override a different vtable function, use :vtable('...'). For example, to have a .sub named ToString also be the vtable function get_string, use :vtable('get_string').
  When the :vtable flag is set, the object PMC can be referred to with self, as with the :method flag.
- :outer(subname)
  The marked .sub is lexically nested within the sub known by subname.
- :subid( <string_constant> )
  Specifies a unique string identifier for the subroutine. This is useful for referring to a particular subroutine with :outer, even though several subroutines in the file may have the same name (because they are multi, or in different namespaces).
- :instanceof( <string_constant> )
  The :instanceof pragma is an experimental pragma that creates a sub as a PMC type other than 'Sub'. However, as currently implemented it doesn't work well with :outer or existing PMC types such as Closure, Coroutine, etc.
- :nsentry( <string_constant> )
  Specify the name by which the subroutine is stored in the namespace. The default name by which a subroutine is stored in the namespace (if this flag is missing) is the subroutine's name as given after the .sub directive. This flag allows you to override that name.

Directives used for Parrot calling conventions
- .begin_call and .end_call
  Directives to start and end a subroutine invocation, respectively.
- .begin_return and .end_return
  Directives to start and end a statement to return values.
- .begin_yield and .end_yield
  Directives to start and end a statement to yield values.
- .call
  Takes either two arguments, the sub and the return continuation, or the sub only. In the latter case an invokecc is emitted. Providing an explicit return continuation is more efficient if it is created outside of a loop and the call is made inside the loop.
- .invocant
  Directive to specify the object for a method call. Use it in combination with .meth_call.
- .meth_call
  Directive to do a method call. It calls the specified method on the object that was specified with the .invocant directive.
- .nci_call
  Directive to make a call through the Native Calling Interface (NCI). The specified subroutine must be loaded using the dlfunc op, which takes the library, function name and function signature as arguments. See "pdds/pdd16_native_call" in docs for details.
- .set_return <var> [:<flag>]*
  Between .begin_return and .end_return, specify one or more of the return value(s) of the current subroutine. Available flags: :flat, :named.
- .set_yield <var> [:<flag>]*
  Between .begin_yield and .end_yield, specify one or more of the yield value(s) of the current subroutine. Available flags: :flat, :named.
- .set_arg <var> [:<flag>]*
  Between .begin_call and .call, specify an argument to be passed. Available flags: :flat, :named.
- .get_result <var> [:<flag>]*
  Between .call and .end_call, specify where one or more return value(s) should be stored. Available flags: :slurpy, :named, :optional, and :opt_flag.

Directives for subroutine parameters
- .param <type> <identifier> [:<flag>]*
  At the top of a subroutine, declare a local variable, in the manner of .local, into which parameter(s) of the current subroutine should be stored. Available flags: :slurpy, :named, :optional, :opt_flag and :unique_reg.

Parameter Passing and Getting Flags
See PDD03 for a description of the meaning of the flag bits SLURPY, OPTIONAL, OPT_FLAG, and FLAT, which correspond to the calling convention flags :slurpy, :optional, :opt_flag, and :flat.
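The interaction of these flags can be sketched as a small Python binder, loosely following the descriptions in this document and PDD03. Everything here is a model: the Flat wrapper, keying named arguments by parameter name, using None for a missing :optional value, and raising TypeError are assumptions of the sketch; Parrot itself works with PMCs (arrays and hashes for slurpy parameters) and reports its own errors.

```python
class Flat:
    """Marks a positional argument passed with :flat; its elements are spread out."""

    def __init__(self, items):
        self.items = list(items)


def bind(params, args, named=None):
    """Bind call arguments to .param declarations.

    params: (name, flags) pairs in declaration order; flags is a subset of
            {"slurpy", "named", "optional", "opt_flag"}.
    args:   positional arguments (Flat(...) models an argument marked :flat).
    named:  named arguments, keyed by the parameter name (an assumption here).
    """
    positional = []
    for arg in args:
        positional.extend(arg.items if isinstance(arg, Flat) else [arg])
    named = dict(named or {})
    bound = {}
    got_optional = False
    for name, flags in params:
        if "opt_flag" in flags:                       # 1 if the preceding :optional was passed
            bound[name] = 1 if got_optional else 0
        elif {"slurpy", "named"} <= flags:            # remaining named args as a hash
            bound[name], named = named, {}
        elif "slurpy" in flags:                       # remaining positionals as an array
            bound[name], positional = positional, []
        elif "named" in flags:
            got_optional = name in named
            if not got_optional and "optional" not in flags:
                raise TypeError(f"missing named argument {name!r}")
            bound[name] = named.pop(name, None)
        else:
            got_optional = bool(positional)
            if not got_optional and "optional" not in flags:
                raise TypeError(f"too few arguments for {name!r}")
            bound[name] = positional.pop(0) if positional else None
    if positional or named:
        raise TypeError("too many arguments")
    return bound


params = [("a", set()), ("b", {"optional"}), ("has_b", {"opt_flag"}), ("rest", {"slurpy"})]
print(bind(params, [1]))                   # {'a': 1, 'b': None, 'has_b': 0, 'rest': []}
print(bind(params, [1, Flat([2, 3, 4])]))  # {'a': 1, 'b': 2, 'has_b': 1, 'rest': [3, 4]}
```

The :flat argument is spread into positionals before binding, so b receives 2 and the :slurpy parameter collects what is left.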
Catching Exceptions
Using the push_eh op you can install an exception handler. If an exception is thrown, Parrot will execute the installed exception handler. In order to retrieve the thrown exception, use the .get_results directive. This directive always takes one argument: an exception object.
    push_eh handler
    ...
  handler:
    .local pmc exception
    .get_results (exception)
    ...
This is syntactic sugar for the get_results op, but any flags set on the targets will be handled automatically by the PIR compiler. The .get_results directive must be the first instruction of the exception handler; only declarations (.lex, .local) may come first.
To resume execution after handling the exception, just invoke the continuation stored in the exception.
    ...
    .get_results (exception)
    ...
    continuation = exception['resume']
    continuation()
    ...
See PDD23 for accessing the various attributes of the exception object.
Syntactic Sugar
Any PASM opcode is a valid PIR instruction. In addition, PIR defines some syntactic shortcuts. These are provided for ease of use by humans producing and maintaining PIR code.
- goto <identifier>
  Branch to identifier (label or subroutine name). Example:

      goto END

- if <var> goto <identifier>
  If var evaluates as true, jump to the named identifier.
- unless <var> goto <identifier>
  Unless var evaluates as true, jump to the named identifier.
- if null <var> goto <identifier>
  If var evaluates as null, jump to the named identifier.
- unless null <var> goto <identifier>
  Unless var evaluates as null, jump to the named identifier.
- if <var1> <relop> <var2> goto <identifier>
  The relop can be <, <=, ==, !=, >= or >, which translate to the PASM opcodes lt, le, eq, ne, ge or gt. If var1 relop var2 evaluates as true, jump to the named identifier.
- unless <var1> <relop> <var2> goto <identifier>
  The relop can be <, <=, ==, !=, >= or >. Unless var1 relop var2 evaluates as true, jump to the named identifier.
- <var1> = <var2>
  Assign a value.
- <var1> = <unary> <var2>
  Unary operations ! (NOT), - (negation) and ~ (bitwise NOT).
- <var1> = <var2> <binary> <var3>
  Binary arithmetic operations + (addition), - (subtraction), * (multiplication), / (division), % (modulus) and ** (exponent). Binary . is concatenation and is only valid for string arguments.
  << and >> are arithmetic shifts left and right. >>> is the logical shift right.
  Binary logic operations && (AND), || (OR) and ~~ (XOR).
  Binary bitwise operations & (bitwise AND), | (bitwise OR) and ~ (bitwise XOR).
  Binary relational operations <, <=, ==, !=, >= and >.
- <var1> <op>= <var2>
  This is equivalent to <var1> = <var1> <op> <var2>, where op is called an assignment operator and can be any of the following binary operators described earlier: +, -, *, /, %, ., &, |, ~, <<, >> or >>>.
- <var> = <var> [ <var> ]
  A keyed set operation for PMCs to retrieve a value from an aggregate. This maps to:

      set <var>, <var> [ <var> ]

- <var> [ <var> ] = <var>
  A keyed set operation to set a value in an aggregate. This maps to:

      set <var> [ <var> ], <var>

- <var> = <opcode> <arguments>
  Many opcodes can use this PIR syntactic sugar. The first argument for the opcode is placed before the =, and all remaining arguments go after the opcode name. For example:

      new $P0, 'Type'

  becomes:

      $P0 = new 'Type'

  Note that this only works for opcodes that have a leading OUT parameter. [this restriction unimplemented: RT #36283]
- ([<var1> [:<flag1> ...], ...]) = <var2>([<arg1> [:<flag2> ...], ...])
  This is short for:

      .begin_call
      .set_arg <arg1> <flag2>
      ...
      .call <var2>
      .get_result <var1> <flag1>
      ...
      .end_call

- <var> = <var>([arg [:<flag> ...], ...])
- <var>([arg [:<flag> ...], ...])
- <var>."_method"([arg [:<flag> ...], ...])
- <var>.<var>([arg [:<flag> ...], ...])
  Function or method call. These notations are shorthand for a longer PCC function call. var can denote a global subroutine, a local identifier or a register.
- .return ([<var> [:<flag> ...], ...])
  Return from the current subroutine with zero or more values.
- .tailcall <var>(args)
- .tailcall <var>.'somemethod'(args)
- .tailcall <var>.<var>(args)
  Tail call: call a function or method and return from the sub with the function or method call return values.
  Internally, the call stack doesn't grow because of a tail call, so you can write recursive functions without causing stack overflows.
  Whitespace surrounding the dot ('.') that separates the object from the method is not allowed.
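Several of these rewrites are mechanical, so they can be illustrated with a small Python desugarer. It is a sketch, not IMCC's grammar: it handles one statement at a time, expects whitespace around operators and "=", treats unless with a relop as the complementary comparison (an assumption of this sketch that ignores NaN corner cases for num registers), and uses a hypothetical, incomplete list of opcodes with a leading OUT parameter.

```python
import re

RELOP_OPCODE = {"<": "lt", "<=": "le", "==": "eq", "!=": "ne", ">=": "ge", ">": "gt"}
# Assumption: "unless" is modelled with the complementary opcode (ignoring NaN subtleties).
COMPLEMENT = {"lt": "ge", "le": "gt", "eq": "ne", "ne": "eq", "ge": "lt", "gt": "le"}
# Hypothetical sample of opcodes whose first parameter is an OUT parameter.
OUT_FIRST_OPCODES = {"new", "find_lex", "get_global"}

# This sketch expects whitespace around operators and '='.
COND = re.compile(r"(if|unless)\s+(\S+)\s+(<=|>=|==|!=|<|>)\s+(\S+)\s+goto\s+(\w+)$")
OP_ASSIGN = re.compile(r"(\S+)\s+(>>>|<<|>>|[-+*/%.&|~])=\s+(\S+)$")
KEYED_SET = re.compile(r"(\S+?)\[(.+)\]\s+=\s+(\S+)$")
KEYED_GET = re.compile(r"(\S+)\s+=\s+(\S+?)\[(.+)\]$")
OPCODE_SUGAR = re.compile(r"(\S+)\s+=\s+(\w+)\s+(.+)$")


def desugar(stmt: str) -> str:
    """Rewrite one sugared PIR statement into the longer form described above."""
    stmt = stmt.strip()
    if m := COND.match(stmt):
        kind, left, relop, right, label = m.groups()
        opcode = RELOP_OPCODE[relop]
        if kind == "unless":
            opcode = COMPLEMENT[opcode]
        return f"{opcode} {left}, {right}, {label}"
    if m := OP_ASSIGN.match(stmt):
        target, op, value = m.groups()
        return f"{target} = {target} {op} {value}"
    if m := KEYED_SET.match(stmt):
        aggregate, key, value = m.groups()
        return f"set {aggregate}[{key}], {value}"
    if m := KEYED_GET.match(stmt):
        target, aggregate, key = m.groups()
        return f"set {target}, {aggregate}[{key}]"
    m = OPCODE_SUGAR.match(stmt)
    if m and m.group(2) in OUT_FIRST_OPCODES:
        target, opcode, rest = m.groups()
        return f"{opcode} {target}, {rest}"
    return stmt  # anything else is left untouched


for line in ["if $I0 < $I1 goto LOOP", "unless x == 0 goto DONE", "$S0 .= 'tail'",
             "$P0 = $P1[$I0]", "$P1['a'] = $P2", "$P0 = new 'Type'"]:
    print(f"{line:28} => {desugar(line)}")
```

The rules are tried in a fixed order so that, for example, a keyed store is not mistaken for the opcode sugar.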
Assignment and Morphing
The = syntactic sugar in PIR, when used in the simple case of:

    <var1> = <var2>

directly corresponds to the set opcode. So, between two low-level arguments (int, num, or string registers, variables, or constants) it is a direct C assignment, or a C-level conversion (int cast, float cast, a string copy, or a call to one of the conversion functions like string_to_num).
Assigning a PMC argument to a low-level argument calls the get_integer, get_number, or get_string vtable function on the PMC. Assigning a low-level argument to a PMC argument calls the set_integer_native, set_number_native, or set_string_native vtable function on the PMC (assign to value semantics). Between two PMC arguments it is a direct C assignment (assign to container semantics).
For assign-to-value semantics between two PMC arguments, use assign, which calls the assign_pmc vtable function.
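The difference between the two PMC cases can be modelled in Python, where a dict entry plays the role of a register and an object plays the role of a PMC. This is an analogy under that assumption, not Parrot's C implementation; only the vtable names come from the text above.

```python
class PMC:
    """Stand-in for a PMC: a container object that holds a value."""

    def __init__(self, value=None):
        self.value = value

    def set_integer_native(self, n):
        # "$P0 = 5": store into the existing container (assign to value)
        self.value = n

    def assign_pmc(self, other):
        # "assign $P0, $P1": copy the other PMC's value into this container
        self.value = other.value


# $P0 = $P1 (set): both registers now refer to the same container.
regs = {"$P0": PMC(1), "$P1": PMC(2)}
regs["$P0"] = regs["$P1"]
regs["$P0"].set_integer_native(5)     # "$P0 = 5" is seen through $P1 as well
print(regs["$P1"].value)              # 5

# assign $P0, $P1: $P0 keeps its own container; only the value is copied.
regs = {"$P0": PMC(1), "$P1": PMC(2)}
regs["$P0"].assign_pmc(regs["$P1"])
regs["$P1"].set_integer_native(99)    # later changes to $P1 do not affect $P0
print(regs["$P0"].value)              # 2
```

In this model, after $P0 = $P1 a later $P0 = 5 is visible through $P1 too, because both registers name one container; assign avoids that aliasing.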
Macros
This section describes the macro layer of the PIR language. The macro layer of the PIR compiler handles the following directives:
- .include '<filename>'
- .macro <identifier> [<parameters>]
- .endm
- .macro_const <identifier> (<literal>|<reg>)
The .include directive takes a string argument that contains the name of the PIR file that is included. The contents of the included file are inserted as if they were written at the point where the .include directive occurs.
The include file is searched for in the current directory and in runtime/parrot/include, in that order. The first file of that name to be found is included.
The .include directive's search order is subject to change.
The .macro directive starts a macro definition named by the specified identifier. The optional parameter list is a comma-separated list of identifiers, enclosed in parentheses. See .endm for ending the macro definition.
The .endm directive closes a macro definition.
    .macro_const PI 3.14

The .macro_const directive is a special type of macro; it allows the user to use a symbolic name for a constant value. Like .macro, the substitution occurs at compile time. It takes two arguments (not comma separated): the first is an identifier, the second a constant value or a register.
The macro layer is completely implemented in the lexical analysis phase. The parser does not know anything about what happens in the lexical analysis phase.
When the .include directive is encountered, the specified file is opened and the following tokens that are requested by the parser are read from that file.
A macro expansion is a dot-prefixed identifier. For instance, if a macro was defined as shown below:
    .macro foo(bar)
    ...
    .endm
this macro can be expanded by writing .foo(42). The body of the macro will be inserted at the point where the macro expansion is written.
A .macro_const expansion is more or less the same as a .macro expansion, except that a constant expansion cannot take any arguments, and the substitution of a .macro_const contains no newlines, so it can be used within a line of code.
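Because a .macro_const expansion contains no newlines, it can be modelled as a plain in-line substitution. The Python sketch below is a simplification under stated assumptions: it works on one line of text and does not skip string literals or method-call syntax, which a real lexer would have to handle; expand_macro_consts is a hypothetical name.

```python
import re

MACRO_CONST_USE = re.compile(r"\.([A-Za-z_]\w*)")


def expand_macro_consts(line: str, consts: dict) -> str:
    """Replace '.NAME' with the text of the .macro_const NAME, within one line."""
    def substitute(match):
        # Dotted words that are not macro constants (.local, .sub, ...) stay as they are.
        return consts.get(match.group(1), match.group(0))
    return MACRO_CONST_USE.sub(substitute, line)


consts = {"PI": "3.14"}   # from: .macro_const PI 3.14
print(expand_macro_consts("$N0 = .PI * 2.0", consts))   # $N0 = 3.14 * 2.0
```

The number 2.0 is left alone because a dot followed by a digit is not a macro reference in this model.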
Macro parameter list
The parameter list for a macro is specified in parentheses after the name of the macro. Macro parameters are not typed.
    .macro foo(bar, baz, buz)
    ...
    .endm
The number of arguments in the call to a macro must match the number of parameters in the macro's parameter list. Macros do not perform multidispatch, so you can't have two macros with the same name but different parameters. Calling a macro with the wrong number of arguments gives the user an error.
If a macro defines no parameter list, parentheses are optional on both the definition and the call. This means that a macro defined as:

    .macro foo
    ...
    .endm

can be expanded by writing either .foo or .foo(). And a macro definition written as:

    .macro foo()
    ...
    .endm

can also be expanded by writing either .foo or .foo().
Note: IMCC requires you to write parentheses if the macro was declared with (empty) parentheses. Likewise, when no parentheses were written (implying an empty parameter list), no parentheses may be used in the expansion.
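A minimal Python model of expansion and the argument-count check follows. It implements the rule stated above (parentheses optional for a macro without parameters), not IMCC's stricter current behaviour; Macro and expand are hypothetical names, and parameters are substituted textually wherever .name appears in the body.

```python
import re

PARAM_USE = re.compile(r"\.([A-Za-z_]\w*)")


class Macro:
    def __init__(self, params, body):
        self.params = params   # parameter names; [] for "no parameter list"
        self.body = body       # body text, where a parameter is written as .name


def expand(macros, name, args=None):
    """Expand a call of macro `name`; args=None models a call written without parentheses."""
    macro = macros[name]
    args = [] if args is None else list(args)
    if len(args) != len(macro.params):
        raise SyntaxError(f".{name} expects {len(macro.params)} argument(s), got {len(args)}")
    values = dict(zip(macro.params, args))
    return PARAM_USE.sub(lambda m: values.get(m.group(1), m.group(0)), macro.body)


macros = {"foo": Macro(["bar"], "print .bar\n"), "one": Macro([], "print 1\n")}
print(expand(macros, "foo", ["42"]), end="")   # print 42
```

A call with the wrong number of arguments raises an error, and a parameterless macro expands the same way with or without parentheses.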
Heredoc arguments
Heredoc arguments are not allowed when expanding a macro. The next implementation of PIR ("PIRC") will be able to handle this correctly. This means that, currently, when using IMCC, the following is not allowed:
    .macro foo(bar)
    ...
    .endm

    .foo(<<'EOS')
    This is a heredoc string.
    EOS
Using braces, { }, allows you to span multiple lines for an argument. See runtime/parrot/include/hllmacros.pir for examples and possible usage. A simple example is this:
    .macro foo(a,b)
      .a
      .b
    .endm

    .sub main
      .foo({
        print "1"
        print "2"
      }, {
        print "3"
        print "4"
      })
    .end
This will expand the macro foo, after which the input to the PIR parser is:
    .sub main
      print "1"
      print "2"
      print "3"
      print "4"
    .end
which will result in the output:

    1234
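How braces group an argument can be sketched in Python: split the text between the call's parentheses at commas that are not inside braces, and drop each argument's outermost braces. This is an illustration, not IMCC's lexer; in particular it does not treat commas or braces inside string literals specially, and split_macro_args is a hypothetical name.

```python
def split_macro_args(text):
    """Split macro-call argument text at top-level commas.

    A { ... } group counts as one argument and its outermost braces are removed.
    Simplification: commas or braces inside string literals are not special-cased.
    """
    args, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
            if depth == 1:
                continue          # drop the outer opening brace
        elif ch == "}":
            depth -= 1
            if depth == 0:
                continue          # drop the outer closing brace
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    if depth != 0:
        raise SyntaxError("unbalanced braces in macro arguments")
    args.append("".join(current).strip())
    return args


args = split_macro_args('{\nprint "1"\nprint "2"\n}, {\nprint "3"\nprint "4"\n}')
print(len(args))   # 2
```

Each braced block, including its newlines, becomes a single argument, which is what lets .a and .b in the example expand to two lines each.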
Unique local labels
Within the macro body, the user can declare a unique label identifier using the value of a macro parameter, like so:
    .macro foo(a)
    ...
    .label $a:
    ...
    .endm
Unique local variables
Note: this is not yet implemented in IMCC.
Within the macro body, the user can declare a local variable with a unique name.
    .macro foo()
    ...
    .macro_local int b
    ...
    .b = 42
    print .b # prints the value of the unique variable (42)
    ...
    .endm
The .macro_local directive declares a local variable with a unique name in the macro. When the macro .foo() is called, the resulting code that is given to the parser will read as follows:
    .sub main
      .local int local__foo__b__2
      ...
      local__foo__b__2 = 42
      print local__foo__b__2
    .end
The user can also declare a local variable with a unique name set to the symbolic value of one of the macro parameters.
    .macro foo(b)
    ...
    .macro_local int $b
    ...
    .$b = 42
    print .$b # prints the value of the unique variable (42)
    print .b  # prints the value of parameter "b", which is
              # also the name of the variable.
    ...
    .endm
So, the special $ character indicates whether the symbol is interpreted as just the value of the parameter, or that the variable by that name is meant. Obviously, the value of b should be a string.
The automatic name munging on .macro_local variables allows for using multiple macros, like so:
    .macro foo(a)
    .macro_local int $a
    .endm

    .macro bar(b)
    .macro_local int $b
    .endm

    .sub main
      .foo("x")
      .bar("x")
    .end
This will result in code for the parser as follows:
    .sub main
      .local int local__foo__x__2
      .local int local__bar__x__4
    .end
Each expansion is associated with a unique number; for labels declared with .macro_label and locals declared with .macro_local, this means that multiple expansions of a macro will not result in conflicting label or local names.
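The munging can be modelled in Python as below. The expansion number is passed in explicitly because the compiler picks it itself; this sketch does not try to reproduce the exact numbers shown in the examples above, only the local__<macro>__<name>__<n> shape and the guarantee that two expansions never collide. Stripping the quotes around a string argument before using it as a base name is an assumption of the sketch, and expand_macro_locals is a hypothetical name.

```python
import re

DECLARE = re.compile(r"\.macro_local\s+(\w+)\s+(\$?\w+)")
USE = re.compile(r"\.(\$?[A-Za-z_]\w*)")


def expand_macro_locals(macro, params, args, body, expansion_id):
    """Rewrite .macro_local declarations and their uses for one expansion."""
    values = {p: a.strip('"') for p, a in zip(params, args)}
    munged = {}

    def declare(m):
        pir_type, spelled = m.group(1), m.group(2)
        # "$b" means: use the value of parameter b as the variable's base name.
        base = values[spelled[1:]] if spelled.startswith("$") else spelled
        munged[spelled] = f"local__{macro}__{base}__{expansion_id}"
        return f".local {pir_type} {munged[spelled]}"

    body = DECLARE.sub(declare, body)
    # .b / .$b refer to the munged local; other dotted words (.local, ...) are untouched.
    return USE.sub(lambda m: munged.get(m.group(1), m.group(0)), body)


body = ".macro_local int $a\n.$a = 42\n"
print(expand_macro_locals("foo", ["a"], ['"x"'], body, 1), end="")
print(expand_macro_locals("foo", ["a"], ['"x"'], body, 2), end="")
```

The two expansions declare local__foo__x__1 and local__foo__x__2, so expanding the same macro twice in one sub does not redeclare a variable.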
Ordinary local variables
Defining a non-unique variable can still be done, using the normal syntax:
    .macro foo(b)
    .local int b
    .macro_local int $b
    .endm
When invoking the macro foo as follows:

    .foo("x")
there will be two variables: b and x. When the macro is invoked twice:
    .sub main
      .foo("x")
      .foo("y")
    .end
the resulting code that is given to the parser will read as follows:
    .sub main
      .local int b
      .local int local__foo__x
      .local int b
      .local int local__foo__y
    .end
Obviously, this will result in an error, as the variable b is defined twice. If you intend the macro to create unique variable names, use .macro_local instead of .local to take advantage of the name munging.
Examples
Subroutine Definition
A simple subroutine, marked with :main, indicating it's the entry point in the file. Other sub flags include :load, :init, etc.
    .sub sub_label :main
      .param int a
      .param int b
      .param int c
      .begin_return
        .set_return xy
      .end_return
    .end
Subroutine Call
Invocation of a subroutine. In this case a continuation subroutine is created.
    .const "Sub" $P0 = "sub_label"
    $P1 = new 'Continuation'
    set_addr $P1, ret_addr
    # ...
    .local int x
    .local num y
    .local string z
    .begin_call
      .set_arg x
      .set_arg y
      .set_arg z
      .call $P0, $P1 # r = _sub_label(x, y, z)
    ret_addr:
      .local int r # optional - new result var
      .get_result r
    .end_call
NCI Call
    loadlib $P0, "libname"
    dlfunc $P1, $P0, "funcname", "signature"
    # ...
    .begin_call
      .set_arg x
      .set_arg y
      .set_arg z
      .nci_call $P1 # r = funcname(x, y, z)
      .local int r # optional - new result var
      .get_result r
    .end_call
Subroutine Call Syntactic Sugar
Below there are three different ways to invoke the subroutine sub_label. The first retrieves a single return value, the second retrieves 3 return values, whereas the last discards any return values.
    .local int r0, r1, r2
    r0 = sub_label($I0, $I1, $I2)
    (r0, r1, r2) = sub_label($I0, $I1, $I2)
    sub_label($I0, $I1, $I2)
This also works for NCI calls, as the subroutine PMC will be an NCI sub, and on invocation will do the Right Thing.
Instead of the label, a subroutine object can be used too:
    get_global $P0, "sub_label"
    $P0(args)
Methods
    .namespace [ "Foo" ]

    .sub _sub_label :method [,Subpragma, ...]
      .param int a
      .param int b
      .param int c
      # ...
      self."_other_meth"()
      # ...
      .begin_return
        .set_return xy
      .end_return
      ...
    .end
The variable self automatically refers to the invoking object if the subroutine declaration contains the :method flag.
Calling Methods
The syntax is very similar to subroutine calls. The call is done with .meth_call, which must immediately be preceded by .invocant:
    .local int x, y, z
    .local pmc class, obj
    newclass class, "Foo"
    new obj, class
    .begin_call
      .set_arg x
      .set_arg y
      .set_arg z
      .invocant obj
      .meth_call "method" [, $P1 ] # r = obj."method"(x, y, z)
      .local int r # optional - new result var
      .get_result r
    .end_call
    ...
The return continuation is optional. The method can be a string constant or a string variable.
Returning and Yielding
    .return ( a, b )      # return the values of a and b
    .return ()            # return no value
    .tailcall func_call() # tail call function
    .tailcall o."meth"()  # tail method call
Similarly, one can yield using the .yield directive:

    .yield ( a, b ) # yield with the values of a and b
    .yield ()       # yield with no value
Implementation
There are multiple implementations of PIR, each of which will meet this specification for the syntax. Currently there are the following implementations:
- compilers/imcc
  This is the current implementation being used in Parrot. Some of the specified syntactic constructs in this PDD are not implemented in IMCC; these constructs are marked with notes saying so.
- compilers/pirc
  This is a new implementation which will fix several of IMCC's shortcomings. It will replace IMCC.
- languages/PIR
  This is a PGE-based implementation, but needs to be updated and completed.
Attachments
N/A
Footnotes
N/A
References
N/A