NAME

docs/pdds/pdd19_pir.pod - Parrot Intermediate Representation

VERSION

$Revision$

ABSTRACT

This document outlines the architecture and core syntax of Parrot Intermediate Representation (PIR).

DESCRIPTION

PIR is a stable, middle-level language intended both as a target for the generated output from high-level language compilers, and for human use developing core features and extensions for Parrot.

Basic Syntax

A valid PIR program consists of a sequence of statements, directives, comments and empty lines.

Statements

A statement starts with an optional label, contains an instruction, and is terminated by a newline (<NL>). Each statement must be on its own line.

  [label:] [instruction] <NL>

An instruction may be either a low-level opcode or a higher-level PIR operation, such as a subroutine call, a method call, a directive, or PIR syntactic sugar.

Directives

A directive provides information for the PIR compiler that is outside the normal flow of executable statements. Directives are all prefixed with a ".", as in .local or .sub.

Comments

Comments start with # and last until the following newline. PIR also allows comments in Pod format. Comments, Pod content, and empty lines are ignored.
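To make the pieces above concrete, here is a minimal sketch of a complete PIR file that combines a Pod block, directives, statements with and without labels, and comments. The sub name main and the label print_it are arbitrary choices for illustration. The sketch assumes the say opcode (print followed by a newline) is available.

```pir
=pod

Pod blocks like this one are ignored by the PIR compiler.

=cut

# A comment runs from # to the end of the line.
.sub 'main' :main             # .sub and .end are directives
    .local string greeting    # .local is a directive, too
    greeting = "hello, PIR"   # a statement without a label

  print_it:                   # a label standing alone on its line
    say greeting              # an opcode used as an instruction
.end
```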

Identifiers

Identifiers start with a letter or underscore and may then contain any combination of letters, digits, and underscores. At the moment identifiers have no length limit, but a sane-but-generous limit (perhaps 256 or 1024 characters) may be imposed in the future. The following examples are all valid identifiers.

    a
    _a
    A42

Opcode names are not reserved words in PIR, and may be used as variable names. For example, you can define a local variable named print. [See RT #24251] Note, however, that an opcode name currently used as a local variable name hides the opcode, effectively making the opcode unusable. This will be resolved in the future.

The PIR language is designed to have as few reserved keywords as possible. Currently, in contrast to opcode names, PIR keywords are reserved, and cannot be used as identifiers. Some opcode names are, in fact, PIR keywords, which therefore cannot be used as identifiers. This, too, will be resolved in a future re-implementation of the PIR compiler.

The following are PIR keywords, and cannot be used as identifiers:

 goto      if       int         null
 num       pmc      string      unless

Labels

A label declaration consists of a label name followed by a colon. A label name conforms to the standard requirements for identifiers. A label declaration may occur at the start of a statement, or stand alone on a line, but always within a subroutine.

A reference to a label consists of only the label name, and is generally used as an argument to an instruction or directive.

A PIR label is accessible only in the subroutine where it's defined. A label name must be unique within a subroutine, but it can be reused in other subroutines.

  goto label1
     ...
  label1:
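The following sketch is slightly fuller and uses hypothetical sub names. It shows a label used as the target of both goto and a conditional branch. It also reuses the same label name in a second subroutine, which is allowed because labels are local to the sub that defines them.

```pir
.sub 'count_to_three' :main
    .local int i
    i = 1
  loop:
    if i > 3 goto done    # conditional branch to a label
    say i
    inc i
    goto loop             # unconditional branch back
  done:
    'other'()
.end

.sub 'other'
    goto loop             # refers to the label below, not the one in count_to_three
    say "skipped"
  loop:
    .return ()
.end
```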

Registers and Variables

There are two ways of referencing Parrot's registers. The first is through named local variables declared with .local.

  .local pmc foo

The type of a named variable can be int, num, string or pmc, corresponding to the types of registers. No other types are used.

The second way of referencing a register is through a register variable $In, $Sn, $Nn, or $Pn. The capital letter indicates the type of the register (integer, string, number, or PMC), and n consists of digits only. There is no limit on the size of n. There is also no direct correspondence between the value of n and the position of the register in the register set: $P42 may be stored in the zeroth PMC register if it is the only PMC register used in the subroutine.
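The following sketch mixes both styles of register reference in one sub. The names count and list are hypothetical. The register numbers are arbitrary; as the paragraph above explains, writing $P99 does not imply that 100 PMC registers are allocated.

```pir
.sub 'main' :main
    .local int count            # named variable, int register
    .local pmc list             # named variable, PMC register
    count = 2

    $I0 = count + 40            # int register variable
    $N0 = 1.5                   # num register variable
    $S0 = "forty-two"           # string register variable
    $P99 = new 'String'         # PMC register variable; 99 is just a name
    $P99 = $S0

    list = new 'ResizablePMCArray'
    push list, $I0
    push list, $P99
    $I1 = elements list         # number of elements pushed so far (2)
    say $I0
    say $P99
    say $I1
.end
```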

Constants

Constants may be used in place of registers or variables. A constant is not allowed on the left side of an assignment, or in any other context where the variable would be modified.

  'single-quoted string constant'
  "double-quoted string constant"
  <<"heredoc", <<'heredoc'
  charset:"string constant"
  encoding:charset:"string constant"
  numeric constants (integer and floating-point)

String escape sequences

Inside double-quoted strings the following escape sequences are processed. Single-quoted strings are taken literally; no escape sequences are processed in them.

  \xhh        1..2 hex digits
  \ooo        1..3 oct digits
  \cX         control char X
  \x{h..h}    1..8 hex digits
  \uhhhh      4 hex digits
  \Uhhhhhhhh  8 hex digits
  \a, \b, \t, \n, \v, \f, \r, \e, \\, \"
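The following sketch shows the constant forms listed above, each assigned to a register variable. The string contents and the heredoc delimiter EOS are arbitrary choices. The sketch also assumes that unicode is one of the charset names accepted by the charset: prefix.

```pir
.sub 'main' :main
    $S0 = 'single-quoted: \t stays a backslash and a t'
    say $S0
    $S1 = "double-quoted: \t becomes a tab"
    say $S1

    $S2 = <<"EOS"
Everything up to the terminating delimiter
is slurped into the string.
EOS
    print $S2

    $S3 = unicode:"caf\x{e9}"      # charset prefix on a string constant
    say $S3

    $I0 = 42                       # integer constant
    $N0 = 2.5e3                    # floating-point constant
    say $I0
    say $N0
.end
```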

Directives

.local <type> <identifier> [:unique_reg]
.lex <string constant>, <reg>
.const <type> <identifier> = <const>
.globalconst <type> <identifier> = <const>
.sub
.end
.namespace [ <identifier> ; <identifier> ]
.loadlib 'lib_name'
.HLL <hll_name>
.HLL_map <core_type> = <user_type>
.line <integer>
.file <quoted_string>
.annotate <key>, <value>
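The following sketch shows a few of these directives working together:

- .namespace places a sub in a namespace.
- .const declares a compile-time constant.
- .local declares a named variable.

The namespace Geometry and the sub circle_area are hypothetical. Namespace names are written as quoted strings, as in the Methods example later in this document. Looking the sub up with get_hll_global is only one possible way to reach it.

```pir
.namespace [ 'Geometry' ]

.sub 'circle_area'
    .param num r
    .const num PI = 3.14159    # compile-time constant
    .local num area            # named num variable
    area = r * r
    area *= PI
    .return (area)
.end

.namespace [ ]                 # back to the root namespace

.sub 'main' :main
    .local pmc area_sub
    area_sub = get_hll_global [ 'Geometry' ], 'circle_area'
    $N0 = area_sub(2.0)
    say $N0
.end
```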

Subroutine flags

:main
:load
:init
:anon
:multi(Type1, Type2...)
:immediate
:postcomp
:method
:vtable
:outer(subname)
:subid( <string_constant> )
:instanceof( <string_constant> )
:nsentry( <string_constant> )
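The following sketch combines several of these flags around a hypothetical class named Greeter:

- An :anon :init :load sub creates the class, so it exists before the :main sub runs.
- A :vtable sub overrides get_string.
- A :method sub receives its invocant in self.

All sub and class names are illustrative.

```pir
.sub 'create_classes' :anon :init :load
    $P0 = newclass 'Greeter'
.end

.namespace [ 'Greeter' ]

.sub 'get_string' :vtable      # overrides the get_string vtable function
    .return ("a Greeter")
.end

.sub 'greet' :method
    .param string name
    print "hello, "
    say name
.end

.namespace [ ]

.sub 'main' :main
    .local pmc obj
    obj = new 'Greeter'
    $S0 = obj                  # assigning to a string calls get_string
    say $S0
    obj.'greet'("Parrot")
.end
```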

Directives used for Parrot calling conventions.

.begin_call and .end_call
.begin_return and .end_return
.begin_yield and .end_yield
.call
.invocant
.meth_call
.nci_call
.set_return <var> [:<flag>]*
.set_yield <var> [:<flag>]*
.set_arg <var> [:<flag>]*
.get_result <var> [:<flag>]*

Directives for subroutine parameters

.param <type> <identifier> [:<flag>]*

Parameter Passing and Getting Flags

See PDD03 for a description of the meaning of the flag bits SLURPY, OPTIONAL, OPT_FLAG, and FLAT, which correspond to the calling convention flags :slurpy, :optional, :opt_flag, and :flat.
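The following sketch shows these flags on both the receiving and the sending side. The sub greet and its parameters are hypothetical:

- name is :optional, and has_name is its :opt_flag companion.
- rest collects any remaining positional arguments via :slurpy.
- The caller uses :flat to spread an array into separate arguments.

```pir
.sub 'greet'
    .param string name     :optional
    .param int    has_name :opt_flag
    .param pmc    rest     :slurpy
    if has_name goto have_name
    name = "world"              # default when no name was passed
  have_name:
    $I0 = elements rest         # how many extra arguments arrived
    print "hello, "
    print name
    print " (extra arguments: "
    print $I0
    say ")"
.end

.sub 'main' :main
    'greet'()                   # name defaults, no extras
    'greet'("Parrot", 1, 2)     # name plus two slurped extras
    $P0 = new 'ResizablePMCArray'
    push $P0, "flat"
    push $P0, "args"
    'greet'($P0 :flat)          # the array's elements become the arguments
.end
```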

Catching Exceptions

Using the push_eh op, you can install an exception handler. If an exception is thrown, Parrot will execute the installed exception handler. To retrieve the thrown exception, use the .get_results directive. This directive always takes one argument: an exception object.

   push_eh handler
   ...
 handler:
   .local pmc exception
   .get_results (exception)
   ...

This is syntactic sugar for the get_results op, but any flags set on the targets will be handled automatically by the PIR compiler. The .get_results directive must be the first instruction of the exception handler; only declarations (.lex, .local) may come first.
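The following is a complete, runnable version of the handler pattern above. Here die is used as a convenient way to throw an exception that carries a message. The message is read back through the exception's 'message' key; see PDD23 for the full set of attributes.

```pir
.sub 'main' :main
    push_eh handler                 # install the handler
    die "something went wrong"      # throw an exception with a message
    pop_eh                          # would remove the handler; not reached here
    .return ()

  handler:
    .local pmc exception            # declarations may come first
    .local string message
    .get_results (exception)        # first instruction of the handler
    message = exception['message']
    print "caught: "
    say message
.end
```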

To resume execution after handling the exception, just invoke the continuation stored in the exception.

   ...
   .get_results(exception)
   ...
   continuation = exception['resume']
   continuation()
   ...
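The following sketch puts the resume step into a full program. As the paragraph above describes, it assumes that a thrown Exception carries a continuation under the 'resume' key. The printed messages are arbitrary.

```pir
.sub 'main' :main
    push_eh handler
    $P0 = new 'Exception'
    say "before throw"
    throw $P0
    say "after throw: execution resumed here"
    .return ()

  handler:
    .local pmc exception, continuation
    .get_results (exception)
    say "in the handler"
    continuation = exception['resume']
    continuation()                  # resume just after the throw
.end
```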

See PDD23 for accessing the various attributes of the exception object.

Syntactic Sugar

Any PASM opcode is a valid PIR instruction. In addition, PIR defines some syntactic shortcuts. These are provided for ease of use by humans producing and maintaining PIR code.

goto <identifier>
if <var> goto <identifier>
unless <var> goto <identifier>
if null <var> goto <identifier>
unless null <var> goto <identifier>
if <var1> <relop> <var2> goto <identifier>
unless <var1> <relop> <var2> goto <identifier>
<var1> = <var2>
<var1> = <unary> <var2>
<var1> = <var2> <binary> <var3>
<var1> <op>= <var2>
<var> = <var> [ <var> ]
<var> [ <var> ] = <var>
<var> = <opcode> <arguments>
([<var1> [:<flag1> ...], ...]) = <var2>([<arg1> [:<flag2> ...], ...])
<var> = <var>([arg [:<flag> ...], ...])
<var>([arg [:<flag> ...], ...])
<var>."_method"([arg [:<flag> ...], ...])
<var>.<var>([arg [:<flag> ...], ...])
.return ([<var> [:<flag> ...], ...])
.tailcall <var>(args)
.tailcall <var>.'somemethod'(args)
.tailcall <var>.<var>(args)
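The following sketch uses several of the shortcuts above. Where the mapping is straightforward, a comment notes the opcode each shortcut roughly corresponds to. The variable names are hypothetical.

```pir
.sub 'main' :main
    .local int a, b
    .local pmc table
    a = 10                        # set
    b = 3
    a += b                        # add a, b
    $I0 = a * b                   # mul $I0, a, b
    $I1 = -b                      # neg $I1, b
    if a > b goto bigger          # compare and branch
    say "not reached"
  bigger:
    table = new 'Hash'
    table['answer'] = 42          # keyed store
    $I2 = table['answer']         # keyed fetch
    $S0 = substr "Parrot", 0, 3   # <var> = <opcode> <arguments>
    unless null table goto done   # branch unless the PMC is null
    say "table is null"
  done:
    say $I0
    say $I2
    say $S0
.end
```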

Assignment and Morphing

The = syntactic sugar in PIR, when used in the simple case of:

  <var1> = <var2>

directly corresponds to the set opcode. With two low-level arguments (int, num, or string registers, variables, or constants), set performs a direct C assignment or a C-level conversion (int cast, float cast, a string copy, or a call to one of the conversion functions like string_to_num).

Assigning a PMC argument to a low-level argument calls the get_integer, get_number, or get_string vtable function on the PMC. Assigning a low-level argument to a PMC argument calls the set_integer_native, set_number_native, or set_string_native vtable function on the PMC (assign to value semantics). With two PMC arguments, set is a direct C assignment (assign to container semantics): both names then refer to the same PMC.

For assign to value semantics with two PMC arguments, use the assign opcode, which calls the assign_pmc vtable function.
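The following sketch contrasts the container semantics of = between two PMCs with the value semantics of assign. It uses the core Integer PMC type.

```pir
.sub 'main' :main
    $P0 = new 'Integer'
    $P0 = 5             # set_integer_native: assign to value
    $P1 = $P0           # set: $P1 now names the same PMC as $P0
    $P1 = 7             # so this changes the shared PMC
    say $P0             # prints 7

    $P2 = new 'Integer'
    assign $P2, $P0     # assign_pmc: copies the value into $P2's own PMC
    $P2 = 9
    say $P0             # still prints 7

    $I0 = $P0           # get_integer
    $S0 = $P0           # get_string
    say $S0
.end
```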

Macros

This section describes the macro layer of the PIR language. The macro layer of the PIR compiler handles the following directives:

.include '<filename>'

The .include directive takes a string argument that contains the name of the PIR file that is included. The contents of the included file are inserted as if they were written at the point where the .include directive occurs.

The include file is searched for in the current directory and in runtime/parrot/include, in that order. The first file of that name to be found is included.

{{ NOTE: the .include directive's search order is subject to change. }}

.macro <identifier> [<parameters>]

The .macro directive starts a macro definition named by the specified identifier. The optional parameter list is a comma-separated list of identifiers, enclosed in parentheses. See .endm for ending the macro definition.

.endm

Closes a macro definition.

.macro_const <identifier> (<literal>|<reg>)

 .macro_const   PI  3.14

The .macro_const directive is a special type of macro; it allows the user to use a symbolic name for a constant value. Like .macro, the substitution occurs at compile time. It takes two arguments, which are not comma-separated: the first is an identifier, the second a constant value or a register.

The macro layer is completely implemented in the lexical analysis phase. The parser does not know anything about what happens in the lexical analysis phase.

When the .include directive is encountered, the specified file is opened and the following tokens that are requested by the parser are read from that file.

A macro expansion is a dot-prefixed identifier. For instance, if a macro was defined as shown below:

 .macro foo(bar)
 ...
 .endm

this macro can be expanded by writing .foo(42). The body of the macro will be inserted at the point where the macro expansion is written.

A .macro_const expansion is more or less the same as a .macro expansion, except that a constant expansion cannot take any arguments, and the substitution of a .macro_const contains no newlines, so it can be used within a line of code.
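The following sketch shows both kinds of expansion. The constant ANSWER and the macro print_twice are hypothetical. ANSWER expands inside a line of code, while print_twice expands to several statements.

```pir
.macro_const ANSWER 42

.macro print_twice(text)
    print .text
    print .text
.endm

.sub 'main' :main
    $I0 = .ANSWER            # a .macro_const expands inside a line
    say $I0
    .print_twice("ab")       # expands to two print statements
    say ""
.end
```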

Macro parameter list

The parameter list for a macro is specified in parentheses after the name of the macro. Macro parameters are not typed.

 .macro foo(bar, baz, buz)
 ...
 .endm

The number of arguments in the call to a macro must match the number of parameters in the macro's parameter list; calling a macro with the wrong number of arguments results in an error. Macros do not perform multiple dispatch, so you can't have two macros with the same name but different parameters.

If a macro defines no parameter list, parentheses are optional on both the definition and the call. This means that a macro defined as:

 .macro foo
 ...
 .endm

can be expanded by writing either .foo or .foo(). And a macro definition written as:

 .macro foo()
 ...
 .endm

can also be expanded by writing either .foo or .foo().

Note: IMCC requires you to write parentheses if the macro was declared with (empty) parentheses. Likewise, when no parentheses were written (implying an empty parameter list), no parentheses may be used in the expansion.

Heredoc arguments

IMCC does not allow heredoc arguments when expanding a macro; the next implementation of PIR ("PIRC") will be able to handle this correctly. This means that, currently, the following is not allowed when using IMCC:

   .macro foo(bar)
   ...
   .endm

   .foo(<<'EOS')
 This is a heredoc
    string.

 EOS

Using braces, { }, allows you to span multiple lines for an argument. See runtime/parrot/include/hllmacros.pir for examples and possible usage. A simple example is this:

 .macro foo(a,b)
   .a
   .b
 .endm

 .sub main
   .foo({ print "1"
          print "2"
        }, {
          print "3"
          print "4"
        })
 .end

This will expand the macro foo, after which the input to the PIR parser is:

 .sub main
   print "1"
   print "2"
   print "3"
   print "4"
 .end

which will result in the output:

  1234

Unique local labels

Within the macro body, the user can declare a unique label identifier using the value of a macro parameter, like so:

  .macro foo(a)
  ...
 .label $a:
  ...
  .endm
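The following sketch shows why unique labels matter when a macro is expanded more than once. It assumes IMCC's .label $name: spelling for the declaration and .$name for a reference to it, as used in runtime/parrot/include/hllmacros.pir. The macro print_if_positive is hypothetical.

```pir
.macro print_if_positive(n)
    $I0 = .n
    if $I0 <= 0 goto .$skip   # jump to this expansion's own label
    say $I0
.label $skip:
.endm

.sub 'main' :main
    .print_if_positive(5)
    .print_if_positive(-3)    # a second expansion gets a distinct label
    .print_if_positive(7)
    say "done"
.end
```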

Unique local variables

Note: this is not yet implemented in IMCC.

Within the macro body, the user can declare a local variable with a unique name.

  .macro foo()
  ...
  .macro_local int b
  ...
  .b = 42
  print .b # prints the value of the unique variable (42)
  ...
  .endm

The .macro_local directive declares a local variable with a unique name in the macro. When the macro .foo() is called, the resulting code that is given to the parser will read as follows:

  .sub main
    .local int local__foo__b__2
    ...
    local__foo__b__2 = 42
    print local__foo__b__2

  .end

The user can also declare a local variable with a unique name set to the symbolic value of one of the macro parameters.

  .macro foo(b)
  ...
  .macro_local int $b
  ...
  .$b = 42
  print .$b # prints the value of the unique variable (42)
  print .b  # prints the value of parameter "b", which is
            # also the name of the variable.
  ...
  .endm

So the special $ character determines how the symbol is interpreted: .b expands to the value of the parameter b, while .$b refers to the unique variable whose name is derived from that value. For this reason, the argument passed for b should be a string.

The automatic name munging on .macro_local variables allows for using multiple macros, like so:

  .macro foo(a)
  .macro_local int $a
  .endm

  .macro bar(b)
  .macro_local int $b
  .endm

  .sub main
    .foo("x")
    .bar("x")
  .end

This will result in code for the parser as follows:

  .sub main
    .local int local__foo__x__2
    .local int local__bar__x__4
  .end

Each expansion is associated with a unique number. For labels declared with .macro_label and locals declared with .macro_local, this means that multiple expansions of a macro will not result in conflicting label or local names.

Ordinary local variables

Defining a non-unique variable can still be done, using the normal syntax:

  .macro foo(b)
  .local int b
  .macro_local int $b
  .endm

When invoking the macro foo as follows:

  .foo("x")

there will be two variables: b, and a unique variable whose name is derived from x. When the macro is invoked twice:

  .sub main
    .foo("x")
    .foo("y")
  .end

the resulting code that is given to the parser will read as follows:

  .sub main
    .local int b
    .local int local__foo__x
    .local int b
    .local int local__foo__y
  .end

Obviously, this will result in an error, as the variable b is defined twice. If you intend the macro to create unique variable names, use .macro_local instead of .local to take advantage of the name munging.

EXAMPLES

Subroutine Definition

A simple subroutine that takes three integer parameters and returns their sum. A sub can also carry flags such as :main (marking the entry point of the file), :load, and :init.

    .sub sub_label
      .param int a
      .param int b
      .param int c
      .local int xy

      xy = a + b
      xy += c

      .begin_return
        .set_return xy
      .end_return

    .end

Subroutine Call

Invocation of a subroutine. In this case, a return continuation is created explicitly and passed to .call.

    .const "Sub" $P0 = "sub_label"
    $P1 = new 'Continuation'
    set_addr $P1, ret_addr
    # ...
    .local int x
    .local num y
    .local string z
    .begin_call
      .set_arg x
      .set_arg y
      .set_arg z
      .call $P0, $P1    # r = sub_label(x, y, z)
  ret_addr:
      .local int r      # optional - new result var
      .get_result r
    .end_call

NCI Call

    loadlib $P0, "libname"
    dlfunc $P1, $P0, "funcname", "signature"
    # ...
    .begin_call
      .set_arg x
      .set_arg y
      .set_arg z
      .nci_call $P1 # r = funcname(x, y, z)
      .local int r  # optional - new result var
      .get_result r
    .end_call

Subroutine Call Syntactic Sugar

Below there are three different ways to invoke the subroutine sub_label. The first retrieves a single return value, the second retrieves 3 return values, whereas the last discards any return values.

  .local int r0, r1, r2
  r0 = sub_label($I0, $I1, $I2)
  (r0, r1, r2) = sub_label($I0, $I1, $I2)
  sub_label($I0, $I1, $I2)

This also works for NCI calls, as the subroutine PMC will be an NCI sub, and on invocation will do the Right Thing.

Instead of the sub name, a subroutine object can also be used:

   get_global $P0, "sub_label"
   $P0(args)

Methods

  .namespace [ "Foo" ]

  .sub _sub_label :method   # further sub flags may follow
    .param int a
    .param int b
    .param int c
    .local int xy
    # ...
    self."_other_meth"()
    # ...
    xy = a + b
    xy += c
    .begin_return
    .set_return xy
    .end_return
  .end

The variable "self" automatically refers to the invoking object if the subroutine declaration carries the :method flag.

Calling Methods

The syntax is very similar to that of subroutine calls. The call is made with .meth_call, which must be immediately preceded by .invocant:

   .local int x, y, z
   .local pmc class, obj
   newclass class, "Foo"
   new obj, class
   .begin_call
   .set_arg x
   .set_arg y
   .set_arg z
   .invocant obj
   .meth_call "method" [, $P1 ] # r = obj."method"(x, y, z)
   .local int r  # optional - new result var
   .get_result r
   .end_call
   ...

The return continuation is optional. The method can be a string constant or a string variable.

Returning and Yielding

  .return ( a, b )      # return the values of a and b

  .return ()            # return no value

  .tailcall func_call()   # tail call function

  .tailcall o."meth"()    # tail method call

Similarly, one can yield using the .yield directive:

  .yield ( a, b )      # yield with the values of a and b

  .yield ()            # yield with no value
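The following sketch shows both forms in context, with hypothetical sub names. It assumes that a sub containing .yield is compiled as a coroutine, so each call resumes it after its last .yield. It also uses .tailcall so that fact hands its result straight back to its caller.

```pir
.sub 'counter'
    .yield (1)
    .yield (2)
    .yield (3)
    .return (0)
.end

.sub 'fact'
    .param int n
    .param int acc
    if n > 1 goto recurse
    .return (acc)
  recurse:
    acc *= n
    dec n
    .tailcall 'fact'(n, acc)     # result goes directly to fact's caller
.end

.sub 'main' :main
    .const 'Sub' gen = 'counter'
    $I0 = gen()                  # runs until the first .yield
    say $I0
    $I0 = gen()                  # resumes after it
    say $I0

    $I1 = 'fact'(5, 1)
    say $I1                      # 120
.end
```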

IMPLEMENTATION

There are multiple implementations of PIR, each of which will meet this specification for the syntax. Currently there are the following implementations:

compilers/imcc

This is the current implementation being used in Parrot. Some of the specified syntactic constructs in this PDD are not implemented in IMCC; these constructs are marked with notes saying so.

compilers/pirc

This is a new implementation which will fix several of IMCC's shortcomings. It will replace IMCC in the not too distant future.

languages/PIR

This is a PGE-based implementation, but needs to be updated and completed.

ATTACHMENTS

N/A

FOOTNOTES

N/A

REFERENCES

N/A
