docs/pdds/pdd19_pir.pod - Parrot Intermediate Representation


This document describes PIR, a stable, middle-level language for both compilers and humans to target.




Comments and empty lines

Comments start with # and last until the following newline. These and empty lines are ignored.

PIR allows POD blocks.

Statements

A valid PIR program consists of a sequence of statements. A statement is terminated by a newline (<NL>). So, each statement has to be on its own line.

General statement format

Any statement can start with an optional label and is terminated by a newline:

  [label:] [instruction] <NL>

Labels

PIR code has both local and global labels. Global labels start with an underscore; local labels should not. A label is optional for any instruction and can also stand on its own line. A label must conform to the syntax of an identifier, described below.

The name of a global label has to be unique, since it can be called at any point in the program. A local label is accessible only in the compilation unit where it's defined. A local label name must be unique within a compilation unit, but it can be reused in other compilation units.


  branch L1   # local label
  bsr    _L2  # global label


Terms used here


<identifier>

Identifiers start with a letter or underscore, and may then contain letters, digits, underscores and ::. Identifiers have no length limit.

{{ REVIEW: identifier length limit }}

{{ REVIEW: can op-names be used as identifiers? See #24251. }}



<type>

Can be int, float, string or pmc.

{{ REFERENCE: RT#42769 }}


<reg>

A PASM register In, Sn, Nn, Pn, or a PIR temporary register $In, $Sn, $Nn, $Pn, where n consists of digit(s) only. n must be between 1 and 99.

{{ REVIEW: n limit }}


<var>

A local identifier, a reg or a constant (when allowed). A constant is not allowed on the left side of an assignment.

{{ REVIEW: any other places where constant is not allowed }}

Constants

'char constant'

Are delimited by '. They are taken to be ASCII encoded. No escape sequences are processed.

"string constants"

Are delimited by ". A " inside a string must be escaped by \. Only 7-bit ASCII is accepted in string constants; to use characters outside that range, specify an encoding as described below.
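
As a minimal sketch of the difference between the two quoting styles (the sub name main and the register choices are assumptions made for illustration):

  .sub main :main
    # single quotes: the backslash and the n stay as two literal characters
    $S0 = 'single-quoted: \n is not an escape here'
    # double quotes: \" and \n are processed as escape sequences
    $S1 = "double-quoted: \"quotes\" must be escaped\n"
    print $S0
    print "\n"
    print $S1
  .end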

<<"heredoc", <<'heredoc'

Heredocs work like single or double quoted strings. All lines up to the terminating delimiter are slurped into the string. The delimiter has to be on its own line, at the beginning of the line and with no trailing whitespace.

Assignment of a heredoc:
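
A minimal sketch, assuming a string register $S0 and a delimiter named EOS, both chosen for illustration. In an actual source file the closing EOS must start in the first column; it is indented here only because the whole example is.

  $S0 = <<"EOS"
  first line of the string
  second line of the string
  EOS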

A heredoc as an argument:

  function(<<"END_OF_HERE", arg)


Only one heredoc can be active per statement line.

{{ REVIEW: it would be useful to have multiple heredocs per statement, which allows for writing:

   function(<<'INPUT', <<'OUTPUT', 'some test')

}}

charset:"string constant"

Like above with a character set attached to the string. Valid character sets are currently: ascii (the default), binary, unicode (with UTF-8 as the default encoding), and iso-8859-1.

String escape sequences

Inside double-quoted strings the following escape sequences are processed.

  \xhh        1..2 hex digits
  \ooo        1..3 oct digits
  \cX         control char X
  \x{h..h}    1..8 hex digits
  \uhhhh      4 hex digits
  \Uhhhhhhhh  8 hex digits
  \a, \b, \t, \n, \v, \f, \r, \e, \\

encoding:charset:"string constant"

Like above with an extra encoding attached to the string. For example:

  set S0, utf8:unicode:"«"

The encoding and charset get attached to the string; no further processing is done. Specifically, escape sequences are not honored.

numeric constants

0x and 0b denote hex and binary constants respectively.
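
For illustration, a short sketch that uses both prefixes (the values and register numbers are arbitrary choices, not part of the specification):

  .sub main :main
    $I0 = 0xff      # hexadecimal constant, decimal 255
    $I1 = 0b1010    # binary constant, decimal 10
    $I2 = $I0 + $I1
    print $I2
    print "\n"
  .end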

Directive instructions

.pragma n_operators

Convert arithmetic infix operators to n_infix operations. The unary opcodes abs, not, bnot, bnots, and neg are also changed to use an n_ prefix.

  .pragma n_operators 1
  .sub foo
    $P0 = $P1 + $P2           # n_add $P0, $P1, $P2
    $P2 = abs $P0             # n_abs $P2, $P0
  .end

.loadlib "lib_name"

Load the given library at compile time, that is, as soon as that line is parsed. See also the loadlib opcode, which does the same at run time.

A library loaded this way is also available at runtime, as if it had been loaded again in :load, so there is no need to call loadlib at runtime.

.HLL "hll_name", "hll_lib"

Define the HLL for the current file. If the string hll_lib isn't empty, this compile-time pragma also loads the shared library for the HLL, so that integer type constants work when creating new PMCs.

.HLL_map 'CoreType', 'UserType'

Whenever Parrot has to create PMCs inside C code on behalf of the running user program it consults the current type mapping for the executing HLL and creates a PMC of type 'UserType' instead of 'CoreType', if such a mapping is defined.

E.g. with this code snippet ...

  .loadlib 'dynlexpad'

  .HLL "Foo", ""
  .HLL_map 'LexPad', 'DynLexPad'

  .sub main :main
    # ...
  .end

... all subroutines for language Foo would use a dynamic lexpad PMC.

{{ PROPOSAL: stop using integer constants for types RT#45453 }}

.sub <identifier> [:<flag> ...]

Define a compilation unit with the label identifier. All code in a PIR source file must be defined in a compilation unit. See PIR Calling Conventions for available flags. The optional flags are given as a list of flags separated by spaces, and spaces only.

{{ PROPOSAL: remove the optional comma in flag list RT#45697 }}

Always paired with .end.


.end

End a compilation unit. Always paired with .sub.
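
A minimal sketch of two compilation units, assuming the :main flag (used elsewhere in this document) marks the entry point; hello is a hypothetical sub name:

  .sub main :main
    hello()          # call the second compilation unit
  .end

  .sub hello
    print "Hello from a second compilation unit\n"
  .end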


.emit

Define a compilation unit containing PASM code. Always paired with .eom.


.eom

End a compilation unit containing PASM code. Always paired with .emit.

.local <type> <identifier> [:unique_reg]

Define a local name identifier for this compilation unit and of the given type. You can define multiple identifiers of the same type by separating them with commas:

  .local int i, j

The optional :unique_reg modifier will force the register allocator to associate the identifier with a unique register for the duration of the compilation unit.
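
A short sketch of .local in use, assuming the :unique_reg placement shown in the directive's syntax above; the names i, j and msg are illustrative only:

  .sub main :main
    .local int i, j                 # two int locals in one declaration
    .local string msg :unique_reg   # msg keeps its own register
    i = 1
    j = i + 41
    msg = "j is "
    print msg
    print j
    print "\n"
  .end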

.sym <type> <identifier> [:unique_reg]

Same as .local.

{{ PROPOSAL: remove .sym, see RT#45405 }}

.lex <identifier>, <reg>

Declare a lexical variable that is an alias for a PMC register. The PIR compiler calls this method in response to a .lex STRING, PREG directive. For example, given this preamble:

    .lex "$a", $P0
    $P1 = new 'Integer'

These two opcodes have an identical effect:

    $P0 = $P1
    store_lex "$a", $P1

And these two opcodes also have an identical effect:

    $P1 = $P0
    $P1 = find_lex "$a"

.const <type> <identifier> = <const>

Define a constant named identifier of type type and assign value const to it.

.globalconst <type> <identifier> = <const>

As .const above, but the defined constant is globally accessible.
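
As an illustrative sketch of .const (the constant names MAX_RETRIES and GREETING are hypothetical):

  .sub main :main
    .const int MAX_RETRIES = 3
    .const string GREETING = "hello"
    print GREETING       # the constant is used where a value is expected
    print " "
    print MAX_RETRIES
    print "\n"
  .end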

.namespace <identifier>

Open a new scope block. This "namespace" is not the same as the .namespace [ <identifier> ] syntax, which is used for storing subroutines in a particular namespace in the global symbol table. This directive is useful in cases such as (pseudocode):

  local x = 1;
  print(x);       # prints 1
  do              # open a new namespace/scope block
    local x = 2;  # this x hides the previous x
    print(x);     # prints 2
  end             # close the current namespace
  print(x);       # prints 1 again

Common language constructs that have nested scopes, such as if, for, while and repeat, can use this directive.

.endnamespace <identifier>

Closes the scope block that was opened with .namespace <identifier>.

.namespace [ <identifier> ; <identifier> ]

Defines the namespace from this point onwards. By default the program is not in any namespace. If you specify more than one, separated by semicolons, it creates nested namespaces, by storing the inner namespace object with a \0 prefix in the outer namespace's global pad.


.pcc_*

Directives used for Parrot Calling Conventions. These are:

.pcc_begin and .pcc_end

.pcc_begin_return and .pcc_end_return

.pcc_begin_yield and .pcc_end_yield


{{ REVIEW: Do we still want/need the "pcc_" prefix? See #45925. }}

Directives for subroutine parameters and return

.param <type> <identifier> [:<flag>]*

At the top of a subroutine, declare a local variable, in the manner of .local, into which parameter(s) of the current subroutine should be stored. Available flags: :slurpy, :optional, :opt_flag and :unique_reg.

.param <type> "<identifier>" => <identifier> [:<flag>]*

Define a named parameter. This is syntactic sugar for:

 .param <type> <identifier> :named("<identifier>")

.param <reg> [:<flag>]*

{{ Specifying a register for a parameter does not work. See RT#46455. }}

At the top of a subroutine, specify where parameter(s) of the current subroutine should be stored. Available flags: :slurpy, :optional, :opt_flag and :unique_reg.

.return <var> [:<flag> ...]

Between .pcc_begin_return and .pcc_end_return, specify one or more of the return value(s) of the current subroutine. Available flags: :flat.
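
The parameter and return directives above can be combined as in the following sketch. The sub name add3 and its parameter names are hypothetical; the pairing of :optional with a following :opt_flag parameter follows the flag descriptions in PDD03.

  .sub add3
    .param int a
    .param int b
    .param int c     :optional   # may be omitted by the caller
    .param int has_c :opt_flag   # true if c was actually passed
    $I0 = a + b
    unless has_c goto done
    $I0 = $I0 + c
  done:
    .pcc_begin_return
    .return $I0
    .pcc_end_return
  .end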

Directives for making a PCC call

.arg <var> [:<flag> ...]

Between .pcc_begin and .pcc_call, specify an argument to be passed. Available flags: :flat.

.result <var> [:<flag> ...]

Between .pcc_call and .pcc_end, specify where one or more return value(s) should be stored. Available flags: :slurpy, :optional, and :opt_flag.
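
A sketch of a complete long-form call, under two assumptions that this document does not confirm: that the get_global op is the way to look up the sub by name, and that add is a sub defined in the same file.

  .sub main :main
    $P0 = get_global "add"   # assumption: fetches the sub PMC by name
    .pcc_begin
    .arg 1
    .arg 2
    .pcc_call $P0
    .result $I0              # receives the single return value
    .pcc_end
    print $I0
    print "\n"
  .end

  .sub add
    .param int a
    .param int b
    $I0 = a + b
    .pcc_begin_return
    .return $I0
    .pcc_end_return
  .end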

Shorthand directives for PCC call and return

([<var1> [:<flag1> ...], ...]) = <var2>([<arg1> [:<flag2> ...], ...])

This is short for:

  .pcc_begin
  .arg <arg1> <flag2>
  .pcc_call <var2>
  .result <var1> <flag1>
  .pcc_end

<var> = <var>([arg [:<flag> ...], ...])

<var>([arg [:<flag> ...], ...])

<var>."_method"([arg [:<flag> ...], ...])

<var>._method([arg [:<flag> ...], ...])

Function or method call. These notations are shorthand for a longer PCC function call with .pcc_* directives. var can denote a global subroutine, a local identifier or a reg.

{{We should review the (currently inconsistent) specification of the method name. Currently it can be a bare word, a quoted string or a string register. See #45859.}}
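
A brief sketch of the shorthand call with two results. The sub name divmod and the locals q and r are hypothetical, and the callee returns its values with the parenthesized .return form described in this section.

  .sub main :main
    .local int q, r
    (q, r) = divmod(7, 2)   # two results from one call
    print q
    print " "
    print r
    print "\n"
  .end

  .sub divmod
    .param int a
    .param int b
    $I0 = a / b
    $I1 = a % b
    .return ($I0, $I1)
  .end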

.return ([<var> [:<flag> ...], ...])

Return from the current compilation unit with zero or more values.

The surrounding parentheses are mandatory. Besides making the sequence break more conspicuous, they are necessary to distinguish this syntax from other uses of the .return directive that will probably be deprecated.

.return <var>(args)

.return <var>."somemethod"(args)

.return <var>.somemethod(args)

Tail call: call a function or method and return from the sub with the function or method call return values.

Internally, the call stack doesn't increase because of a tail call, so you can write recursive functions and not have stack overflows.
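
To illustrate, a sketch of a recursive sub written with a tail call. The name sum_to and the accumulator parameter acc are assumptions made for this example.

  .sub main :main
    $I0 = sum_to(1000, 0)
    print $I0
    print "\n"
  .end

  .sub sum_to
    .param int n
    .param int acc
    if n == 0 goto done
    acc = acc + n
    n = n - 1
    .return sum_to(n, acc)   # tail call: no new frame is kept for this sub
  done:
    .return (acc)
  .end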

Parameter Passing and Getting Flags

See PDD03 for a description of the meaning of the flag bits SLURPY, OPTIONAL, OPT_FLAG, and FLAT, which correspond to the calling convention flags :slurpy, :optional, :opt_flag, and :flat.

{{ TODO: once these flag bits are solidified by long-term use, then we may choose to copy appropriate bits of the documentation to here. }}

Instructions

An instruction may be a valid PASM instruction or any of the forms listed below:

goto <identifier>

Branch to identifier (label or subroutine name).

  goto END

if <var> goto <identifier>

If var evaluates as true, jump to the named identifier. Translates to if var, identifier.

unless <var> goto <identifier>

Unless var evaluates as true, jump to the named identifier. Translates to unless var, identifier.

if null <var> goto <identifier>

If var evaluates as null, jump to the named identifier. Translates to if_null var, identifier.

unless null <var> goto <identifier>

Unless var evaluates as null, jump to the named identifier. Translates to unless_null var, identifier.

if <var1> <relop> <var2> goto <identifier>

The relop can be <, <=, ==, !=, >= or >, which translate to the PASM opcodes lt, le, eq, ne, ge and gt respectively. If var1 relop var2 evaluates as true, jump to the named identifier.

unless <var1> <relop> <var2> goto <identifier>

The relop can be <, <=, ==, !=, >= or >, which translate to the PASM opcodes lt, le, eq, ne, ge and gt respectively. Unless var1 relop var2 evaluates as true, jump to the named identifier.
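
The branching forms above are enough to write a counting loop. A minimal sketch, with the label names loop and done and the local i chosen for illustration:

  .sub main :main
    .local int i
    i = 0
  loop:
    if i >= 3 goto done   # conditional branch with a relop
    print i
    print "\n"
    i = i + 1
    goto loop             # unconditional branch back to the label
  done:
  .end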

<var1> = <var2>

Assign a value. Translates to set var1, var2.

<var1> = <unary> <var2>

The unary operators !, - and ~ generate not, neg and bnot ops.

<var1> = <var2> <binary> <var3>

The binary operators +, -, *, /, % and ** generate add, sub, mul, div, mod and pow arithmetic ops. Binary . is concat and is only valid for string arguments.

<< and >> are arithmetic shifts shl and shr. >>> is the logical shift lsr.

&&, || and ~~ are logical and, or and xor.

&, | and ~ are binary band, bor and bxor.

<var1> <op>= <var2>

This is equivalent to <var1> = <var1> <op> <var2>, where <op>= is called an assignment operator and op can be any of the following binary operators described earlier: +, -, *, /, %, ., &, |, ~, <<, >> or >>>.
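
A few of these operators in context, as a sketch (register numbers and values are arbitrary):

  .sub main :main
    $I0 = 7
    $I1 = $I0 % 4       # mod
    $I2 = $I0 << 2      # shl
    $I3 = $I1 | $I2     # bor
    $I0 += 1            # same as $I0 = $I0 + 1
    $S0 = "foo"
    $S0 .= "bar"        # same as $S0 = $S0 . "bar"
    print $I3
    print "\n"
    print $S0
    print "\n"
  .end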

<var> = <var> [ <var> ]

This generates either a keyed set operation or substr var, var, var, 1 for string arguments and an integer key.

<var> = <var> [ <key> ]

where key is:

 <var1> .. <var2>

returns a slice starting at var1 and ending at var2.

 .. <var2>

returns a slice starting at the first element and ending at var2.

 <var1> ..

returns a slice starting at var1 and running to the end of the array.

See src/pmc/slice.pmc and t/pmc/slice.t.

<var> [ <var> ] = <var>

A keyed set operation or the assign substr op with a length of 1.

<var> = new '<type>'

Create a new PMC of type type stored in var. Translates to new var, 'type'.

<var1> = new '<type>', <var2>

Create a new PMC of type type stored in var1, using var2 as a PMC containing initialization data. Translates to new var1, 'type', var2.

<var1> = defined <var2>

Assign to var1 the definedness of var2. Translates to defined var1, var2.

<var1> = defined <var2> [ <var3> ]

The keyed form of the op. Translates to defined var1, var2[var3].
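
The new, keyed set and keyed defined forms can be combined as in this sketch. It assumes the core ResizablePMCArray type is available; the local name items is illustrative.

  .sub main :main
    .local pmc items
    items = new 'ResizablePMCArray'
    items[0] = "first"          # keyed set
    $S0 = items[0]              # keyed get
    $I0 = defined items[0]      # keyed defined
    print $S0
    print "\n"
    print $I0
    print "\n"
  .end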

global "string" = <var>

{{ DEPRECATED: op store_global was deprecated }}

<var> = global "string"

{{ DEPRECATED: op find_global was deprecated }}

<var1> = clone <var2>

Assign to var1 a clone of var2. Translates to clone var1, var2.

<var> = addr <identifier>

Assign to var the address of the label identified by identifier. Translates to set_addr var, identifier.

<var> = null

Set var to null. Translates to null var.


Return the address of a label.