docs/pdds/pdd19_pir.pod - Parrot Intermediate Representation
This document outlines the architecture and core syntax of the Parrot Intermediate Representation (PIR).
This document describes PIR, a stable, middle-level language that both compilers and humans can target.
PIR is a stable, middle-level language intended both as a target for the generated output from high-level language compilers, and for human use developing core features and extensions for Parrot.
A valid PIR program consists of a sequence of statements, directives, comments and empty lines.
A statement starts with an optional label, contains an instruction, and is terminated by a newline (<NL>). Each statement must be on its own line.
[label:] [instruction] <NL>
An instruction may be either a low-level opcode or a higher-level PIR operation, such as a subroutine call, a method call, a directive, or PIR syntactic sugar.
A directive provides information for the PIR compiler that is outside the normal flow of executable statements. Directives are all prefixed with a ".", as in .local or .sub.
Comments start with # and last until the following newline. PIR also allows comments in Pod format. Comments, Pod content, and empty lines are ignored.
Identifiers start with a letter or underscore, and may then contain any number of letters, digits, and underscores. Identifiers don't have any limit on length at the moment, but some sane-but-generous length limit may be imposed in the future (256 chars, 1024 chars?). The following examples are all valid identifiers.
a
_a
A42
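The identifier rule above can be checked mechanically. The following is a minimal sketch in Python (not PIR, so that it can be run without Parrot); the pattern follows the rule as stated, and restricting "letter" to ASCII letters is an assumption of this sketch, as is the helper name is_identifier.

```python
import re

# A letter or underscore first, then any mix of letters, digits, and
# underscores. ASCII-only letters are an assumption of this sketch.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(name: str) -> bool:
    """Return True if the whole of `name` matches the identifier rule."""
    return IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["a", "_a", "A42", "42A", "foo-bar"]:
        print(candidate, is_identifier(candidate))
```

The three examples listed above are accepted, while a name that starts with a digit or contains a hyphen is rejected.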
Opcode names are not reserved words in PIR, and may be used as variable names. For example, you can define a local variable named print. [See #24251.]
NOTE: The use of :: in identifiers is deprecated.
A label declaration consists of a label name followed by a colon. A label name conforms to the standard requirements for identifiers. A label declaration may occur at the start of a statement, or stand alone on a line, but always within a compilation unit.
A reference to a label consists of only the label name, and is generally used as an argument to an instruction or directive.
A PIR label is accessible only in the compilation unit where it's defined. A label name must be unique within a compilation unit, but it can be reused in other compilation units.
goto label1
...
label1:
There are three ways of referencing Parrot's registers. The first is direct access to a specific register by name In, Sn, Nn, Pn. The second is through a temporary register variable $In, $Sn, $Nn, $Pn. n consists of digit(s) only. There is no limit on the size of n.
The third syntax for accessing registers is through named local variables declared with .local.
.local pmc foo
The type of a named variable can be int, num, string or pmc, corresponding to the types of registers. No other types are used. [See RT#42769]
The difference between direct register access and register variables or local variables is largely a matter of allocation. If you directly reference P99, Parrot will blindly allocate 100 registers for that compilation unit. If you reference $P99 or a named variable foo, on the other hand, Parrot will intelligently allocate a literal register in the background. So, $P99 may be stored in P0, if it is the only register in the compilation unit.
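To make the allocation difference concrete, here is a small Python model. It is only a sketch under stated assumptions: symbolic registers and named variables are numbered in order of first use within their register set, and the function names allocate and frame_size_for_direct are hypothetical. The real register allocator may behave differently, for example by reusing registers.

```python
def allocate(references):
    """Map symbolic registers and named variables to literal registers.

    `references` is a sequence of (kind, name) pairs, where kind is one of
    "I", "S", "N" or "P". Each new name gets the next free number in its
    register set (an assumption of this model).
    """
    next_free = {"I": 0, "S": 0, "N": 0, "P": 0}
    mapping = {}
    for kind, name in references:
        if name not in mapping:
            mapping[name] = f"{kind}{next_free[kind]}"
            next_free[kind] += 1
    return mapping


def frame_size_for_direct(register):
    """Registers reserved when a literal register such as P99 is named."""
    return int(register[1:]) + 1


if __name__ == "__main__":
    print(allocate([("P", "$P99"), ("P", "foo"), ("I", "$I7"), ("P", "$P99")]))
    print(frame_size_for_direct("P99"))
```

In this model $P99 lands in P0 when it is the first PMC register used, whereas naming P99 directly reserves P0 through P99.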
Constants may be used in place of registers or variables. A constant is not allowed on the left side of an assignment, or in any other context where the variable would be modified.
Single-quoted string constants are delimited by single quotes ('). They are taken to be ASCII encoded. No escape sequences are processed.
Double-quoted string constants are delimited by double quotes ("). A " inside a string must be escaped by \. Only 7-bit ASCII is accepted in string constants; to use characters outside that range, specify an encoding in the way below.
$S0 = <<"EOS"
...
EOS
function(<<"END_OF_HERE", arg)
...
END_OF_HERE
.return(<<'EOS')
...
EOS
.yield(<<'EOS')
...
EOS
function(<<'INPUT', <<'OUTPUT', 'some test')
...
INPUT
...
OUTPUT
Valid character sets are ascii (the default), binary, unicode (with UTF-8 as the default encoding), and iso-8859-1.
Inside double-quoted strings the following escape sequences are processed.
\xhh 1..2 hex digits
\ooo 1..3 oct digits
\cX control char X
\x{h..h} 1..8 hex digits
\uhhhh 4 hex digits
\Uhhhhhhhh 8 hex digits
\a, \b, \t, \n, \v, \f, \r, \e, \\
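The escape table above can be read as a small decoding procedure. The Python sketch below models it; it is not Parrot's implementation. Two points are assumptions: \cX is taken to mean the conventional control character (the code of X with bit 0x40 flipped), and \" is accepted because the text says a " inside a string must be escaped by \. The function name unescape is hypothetical.

```python
import re

# Single-character escapes from the table, plus \" (see lead-in).
SIMPLE = {"a": "\a", "b": "\b", "t": "\t", "n": "\n", "v": "\v",
          "f": "\f", "r": "\r", "e": "\x1b", "\\": "\\", '"': '"'}

ESCAPE = re.compile(
    r"\\(?:x\{([0-9A-Fa-f]{1,8})\}"  # \x{h..h}    1..8 hex digits
    r"|x([0-9A-Fa-f]{1,2})"          # \xhh        1..2 hex digits
    r"|u([0-9A-Fa-f]{4})"            # \uhhhh      4 hex digits
    r"|U([0-9A-Fa-f]{8})"            # \Uhhhhhhhh  8 hex digits
    r"|([0-7]{1,3})"                 # \ooo        1..3 oct digits
    r"|c(.)"                         # \cX         control char X
    r"|(.))",                        # single-character escapes
    re.DOTALL,
)


def unescape(body):
    """Decode the escape sequences in the body of a double-quoted string."""
    def replace(match):
        braced, hex2, u4, u8, octal, ctrl, simple = match.groups()
        for digits in (braced, hex2, u4, u8):
            if digits is not None:
                return chr(int(digits, 16))
        if octal is not None:
            return chr(int(octal, 8))
        if ctrl is not None:
            return chr(ord(ctrl.upper()) ^ 0x40)  # assumed meaning of \cX
        if simple in SIMPLE:
            return SIMPLE[simple]
        raise ValueError(f"unknown escape sequence \\{simple}")
    return ESCAPE.sub(replace, body)
```

The braced form is tried before the two-digit form so that \x{263A} is not read as \x followed by literal text.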
set S0, utf8:unicode:"«"
0x and 0b denote hex and binary constants respectively.
.local int i, j
The :unique_reg modifier will force the register allocator to associate the identifier with a unique register for the duration of the compilation unit.
.sym <type> <identifier> [:unique_reg]
[...] .local.
.lex "$a", $P0
$P1 = new 'Integer'
These two opcodes have an identical effect:
$P0 = $P1
store_lex "$a", $P1
And these two opcodes also have an identical effect:
$P1 = $P0
$P1 = find_lex "$a"
Like .const above, but the defined constant is globally accessible.
local x = 1;
print(x); # prints 1
do # open a new namespace/scope block
local x = 2; # this x hides the previous x
print(x); # prints 2
end # close the current namespace
print(x); # prints 1 again
{{ NOTE: .namespace and .endnamespace are deprecated. They were a hackish attempt at implementing scopes in Parrot, but didn't actually turn out to be useful. }}
[...] abs, not, bnot, bnots, and neg are also changed to use an n_ prefix.
.pragma n_operators 1
.sub foo
...
$P0 = $P1 + $P2 # n_add $P0, $P1, $P2
$P2 = abs $P0 # n_abs $P2, $P0
[...] loadlib opcode, which does the same at run time.
[...] :load, so there is no need to call loadlib at runtime.
.loadlib 'dynlexpad'
.HLL "Foo", ""
.HLL_map 'LexPad', 'DynLexPad'
.sub main :main
...
.sub <identifier> [:<flag> ...]
.sub <quoted string> [:<flag> ...]
.end
[...] .sub.
[...] .eom.
[...] .emit.
{{ DEPRECATED: the "pcc_" prefix. See #45925. }}
Between .begin_return and .end_return, specify one or more of the return value(s) of the current subroutine. Available flags: :flat, :named.
Between .begin_call and .call, specify an argument to be passed. Available flags: :flat, :named.
Between .call and .end_call, specify where one or more return value(s) should be stored. Available flags: :slurpy, :named, :optional, and :opt_flag.
[...] .local, into which parameter(s) of the current subroutine should be stored. Available flags: :slurpy, :named, :optional, :opt_flag and :unique_reg.
.param <type> <identifier> :named("<identifier>")
See PDD03 for a description of the meaning of the flag bits SLURPY, OPTIONAL, OPT_FLAG, and FLAT, which correspond to the calling convention flags :slurpy, :optional, :opt_flag, and :flat.
{{ TODO: once these flag bits are solidified by long-term use, then we may choose to copy appropriate bits of the documentation to here. }}
Any PASM opcode is a valid PIR instruction. In addition, PIR defines some syntactic shortcuts. These are provided for ease of use by humans producing and maintaining PIR code.
Branch to identifier (label or subroutine name).
goto END
if var, identifier
unless var, identifier
if_null var, identifier
unless_null var, identifier
The relop is one of <, <=, ==, !=, >= and >, which translate to the PASM opcodes lt, le, eq, ne, ge and gt respectively. If var1 relop var2 evaluates as true, jump to the named identifier.
The relop is one of <, <=, ==, !=, >= and >, which translate to the PASM opcodes lt, le, eq, ne, ge and gt respectively. Unless var1 relop var2 evaluates as true, jump to the named identifier.
set var1, var2
!, - and ~ generate not, neg and bnot ops.
+, -, *, /, % and ** generate add, sub, mul, div, mod and pow arithmetic ops. Binary . is concat and is only valid for string arguments.
<< and >> are arithmetic shifts shl and shr. >>> is the logical shift lsr.
&&, || and ~~ are logic and, or and xor.
&, | and ~ are binary band, bor and bxor.
<var1> = <var1> <op> <var2>, where op is called an assignment operator and can be any of the following binary operators described earlier: +, -, *, /, %, ., &, |, ~, <<, >> or >>>.
[...] set operation or substr var, var, var, 1 for string arguments and an integer key.
{{ NOTE: the .. notation in keys is deprecated, so this syntactic sugar for slices is also deprecated. See the (currently experimental) slice opcode instead. }}
key is:
<var1> .. <var2>: a slice starting at var1 and ending at var2.
.. <var2>: a slice ending at var2.
<var1> ..: a slice from var1 to the end of the array.
[...] set operation.
{{ [...] substr op with a length of 1. }}
The first argument of the opcode is placed before the =, and all remaining arguments go after the opcode name. For example:
new $P0, 'Type'
$P0 = new 'Type'
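The operator-to-opcode correspondences listed above can be collected into lookup tables. The Python sketch below does that and rewrites two simple statement shapes into opcode form. It is a model, not the PIR compiler: the helper names are hypothetical, tokens must be separated by whitespace, and the "if <var1> <relop> <var2> goto <identifier>" statement shape is assumed rather than quoted from the text.

```python
# Infix operator -> opcode name, as listed in the text above.
BINARY_OPS = {
    "+": "add", "-": "sub", "*": "mul", "/": "div", "%": "mod", "**": "pow",
    ".": "concat",
    "<<": "shl", ">>": "shr", ">>>": "lsr",
    "&&": "and", "||": "or", "~~": "xor",
    "&": "band", "|": "bor", "~": "bxor",
}

# Relational operator -> opcode name.
RELATIONAL_OPS = {
    "<": "lt", "<=": "le", "==": "eq", "!=": "ne", ">=": "ge", ">": "gt",
}


def desugar_infix(statement):
    """Rewrite '<dest> = <lhs> <op> <rhs>' as '<opcode> <dest>, <lhs>, <rhs>'."""
    dest, equals, lhs, op, rhs = statement.split()
    if equals != "=" or op not in BINARY_OPS:
        raise ValueError(f"not a binary infix statement: {statement!r}")
    return f"{BINARY_OPS[op]} {dest}, {lhs}, {rhs}"


def desugar_branch(statement):
    """Rewrite 'if <a> <relop> <b> goto <label>' as '<opcode> <a>, <b>, <label>'."""
    keyword, lhs, relop, rhs, goto, label = statement.split()
    if keyword != "if" or goto != "goto" or relop not in RELATIONAL_OPS:
        raise ValueError(f"not a relational branch: {statement!r}")
    return f"{RELATIONAL_OPS[relop]} {lhs}, {rhs}, {label}"


if __name__ == "__main__":
    print(desugar_infix("$I0 = $I1 + $I2"))
    print(desugar_branch("if $I0 <= 10 goto LOOP"))
```

The destination register is placed first in the opcode form, matching the rule that the first opcode argument is the one written before the = sign.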
.begin_call
.arg <arg1> <flag2>
...
.call <var2>
.result <var1> <flag1>
...
.end_call
[...] .return directive that will probably be deprecated.
This section describes the macro layer of the PIR language. The macro layer of the PIR compiler handles the following directives:
.include
The .include directive takes a string argument that contains the name of the PIR file to include.
.macro
The .macro directive starts the definition of a macro.
.macro_const
The .macro_const directive is a special type of macro; it allows the user to use a symbolic name for a constant value. Like .macro, the substitution occurs at compile time.
{{ NOTE: .constant is deprecated, replaced by .macro_const. }}
The macro layer is completely implemented in the lexical analysis phase. The parser does not know anything about what happens in the lexical analysis phase.
When the .include directive is encountered, the specified file is opened and the following tokens that are requested by the parser are read from that file.
A macro expansion is a dot-prefixed identifier. For instance, if a macro was defined as shown below:
.macro foo(bar)
...
.endm
this macro can be expanded by writing .foo(42). The body of the macro will be inserted at the point where the macro expansion is written.
A .macro_const expansion is more or less the same as a .macro expansion, except that a constant expansion cannot take any arguments, and the substitution of a .macro_const contains no newlines, so it can be used within a line of code.
The parameter list for a macro is specified in parentheses after the name of the macro. Macro parameters are not typed.
.macro foo(bar, baz, buz)
...
.endm
The number of arguments in the call to a macro must match the number of parameters in the macro's parameter list. Macros do not perform multidispatch, so you can't have two macros with the same name but different parameters. Calling a macro with the wrong number of arguments gives the user an error.
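The argument-count rule can be sketched as a small macro table in Python. This is a toy model, not the lexer-level implementation: the class name MacroTable is hypothetical, rejecting a second definition of the same name is an assumption drawn from the "no multidispatch" remark, and parameters are assumed to be referenced in the body with a leading dot.

```python
import re


class MacroTable:
    """Toy macro store: one definition per name, fixed parameter count."""

    def __init__(self):
        self._macros = {}

    def define(self, name, params, body):
        if name in self._macros:
            # No multidispatch: a name identifies exactly one macro.
            raise ValueError(f"macro {name!r} is already defined")
        self._macros[name] = (list(params), body)

    def expand(self, name, args=()):
        params, body = self._macros[name]
        if len(args) != len(params):
            raise TypeError(
                f".{name} expects {len(params)} argument(s), got {len(args)}")
        if not params:
            return body
        values = dict(zip(params, args))
        # Replace each ".param" reference in the body with its argument.
        pattern = re.compile(
            r"\.(" + "|".join(map(re.escape, params)) + r")\b")
        return pattern.sub(lambda match: values[match.group(1)], body)


if __name__ == "__main__":
    table = MacroTable()
    table.define("foo", ["bar"], "print .bar\n")
    print(table.expand("foo", ["42"]), end="")
```

Calling table.expand("foo", []) in this model raises an error, mirroring the rule that a call with the wrong number of arguments is rejected.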
If a macro defines no parameter list, parentheses are optional on both the definition and the call. This means that a macro defined as:
.macro foo
...
.endm
can be expanded by writing either .foo or .foo(). And a macro definition written as:
.macro foo()
...
.endm
can also be expanded by writing either .foo or .foo().
{{ NOTE: this is a change from the current implementation, which requires the definition and call of a zero-parameter macro to match in the use of parentheses. }}
Heredoc arguments are not allowed when expanding a macro. This means that the following is not allowed:
.macro foo(bar)
...
.endm
.foo(<<'EOS')
This is a heredoc
string.
EOS
{{ NOTE: This is likely because the parsing of heredocs happens later than the preprocessing of macros. Might be nice if we could parse heredocs at the macro level, but not a high priority. }}
Within the macro body, the user can declare a unique label identifier using the value of a macro parameter, like so:
.macro foo(a)
...
.label $a:
...
.endm
{{ NOTE: Currently, IMCC still allows for writing .local to declare a local label, but that is deprecated. Use .label instead. }}
Within the macro body, the user can declare a local variable with a unique name.
.macro foo()
...
.macro_local int b
...
.b = 42
print .b # prints the value of the unique variable (42)
...
.endm
The .macro_local directive declares a local variable with a unique name in the macro. When the macro .foo() is called, the resulting code that is given to the parser will read as follows:
.sub main
.local int local__foo__b
...
local__foo__b = 42
print local__foo__b
.end
The user can also declare a local variable with a unique name set to the symbolic value of one of the macro parameters.
.macro foo(b)
...
.macro_local int $b
...
.$b = 42
print .$b # prints the value of the unique variable (42)
print .b # prints the value of parameter "b", which is
# also the name of the variable.
...
.endm
So, the special $ character indicates whether the symbol is interpreted as just the value of the parameter, or that the variable by that name is meant. Obviously, the value of b should be a string.
The automatic name munging on .macro_local variables allows for using multiple macros, like so:
.macro foo(a)
.macro_local int $a
.endm
.macro bar(b)
.macro_local int $b
.endm
.sub main
.foo("x")
.bar("x")
.end
This will result in code for the parser as follows:
.sub main
.local int local__foo__x
.local int local__bar__x
.end
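The munged names in the expansions above follow a visible pattern: local__, the macro name, two underscores, and the variable name. A minimal Python sketch of that pattern is shown below; the function names are hypothetical, and the pattern is inferred from the examples in this section rather than from the implementation.

```python
def munge(macro_name, local_name):
    """Build the munged name used for a .macro_local variable."""
    return f"local__{macro_name}__{local_name}"


def expand_macro_locals(macro_name, declarations):
    """Turn (type, name) pairs into the '.local' lines seen by the parser."""
    return [f".local {type_} {munge(macro_name, name)}"
            for type_, name in declarations]


if __name__ == "__main__":
    for line in (expand_macro_locals("foo", [("int", "x")])
                 + expand_macro_locals("bar", [("int", "x")])):
        print(line)
```

Because the macro name is part of the result, two different macros can use the same local name without clashing, while two expansions of the same macro with the same name produce the same munged identifier.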
{{ PROPOSAL: should .macro_local also add a random value to the munged name, to allow multiple calls to the same macro from within the same compilation unit? May not be used often enough to be worth adding it. The same effect can be achieved by using a symbolic parameter name for the macro local, it's just slightly less convenient. }}
Defining a non-unique variable can still be done, using the normal syntax:
.macro foo(b)
.local int b
.macro_local int $b
.endm
When invoking the macro foo as follows:
.foo("x")
there will be two variables: b and x. When the macro is invoked twice:
.sub main
.foo("x")
.foo("y")
.end
the resulting code that is given to the parser will read as follows:
.sub main
.local int b
.local int local__foo__x
.local int b
.local int local__foo__y
.end
Obviously, this will result in an error, as the variable b is defined twice. If you intend the macro to create unique variable names, use .macro_local instead of .local to take advantage of the name munging.
The = syntactic sugar in PIR, when used in the simple case of:
<var1> = <var2>
directly corresponds to the set opcode. So, two low-level arguments (int, num, or string registers, variables, or constants) are a direct C assignment, or a C-level conversion (int cast, float cast, a string copy, or a call to one of the conversion functions like string_to_num).
A PMC source with a low-level destination calls the get_integer, get_number, or get_string vtable function on the PMC. A low-level source with a PMC destination calls the set_integer_native, set_number_native, or set_string_native vtable function on the PMC (assign to value semantics). Two PMC arguments are a direct C assignment (assign to container semantics).
For assign to value semantics for two PMC arguments use assign, which calls the assign_pmc vtable function.
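The difference between the two semantics for PMC arguments can be modelled in a few lines of Python. This is only an analogy under stated assumptions: the Integer class and the op_set and op_assign helpers are hypothetical stand-ins, and a dictionary plays the role of the register file.

```python
class Integer:
    """Hypothetical stand-in for an Integer PMC."""

    def __init__(self, value):
        self.value = value

    def assign_pmc(self, other):
        # Copy the value into this PMC; the PMC itself stays in place.
        self.value = other.value


def op_set(registers, dest, src):
    """Model of `dest = src`: the register now refers to the source PMC."""
    registers[dest] = registers[src]


def op_assign(registers, dest, src):
    """Model of `assign dest, src`: the destination PMC takes the value."""
    registers[dest].assign_pmc(registers[src])


if __name__ == "__main__":
    registers = {"$P0": Integer(1), "$P1": Integer(2)}
    original = registers["$P0"]

    op_assign(registers, "$P0", "$P1")
    # The value was copied; $P0 still holds the same PMC as before.
    print(original.value, registers["$P0"] is original)

    op_set(registers, "$P0", "$P1")
    # Both registers now refer to one PMC.
    print(registers["$P0"] is registers["$P1"])
```

In this model, anything else that still refers to the original PMC sees the new value after op_assign, but is unaffected by op_set.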
{{ NOTE: response to the question:
<pmichaud> I don't think that 'morph' as a method call is a good idea
<pmichaud> we need something that says "assign to value" versus "assign to container"
<pmichaud> we can't eliminate the existing 'morph' opcode until we have a replacement
}}
See docs/imcc/macros.pod