IMCC - syntax
- 0.1 initial
This document describes the IMCC syntax.
Comments start with # and last until the following newline.
These and empty lines are ignored.
A valid imcc program consists of a sequence of statements.
A statement is terminated by a newline (<NL>).
[label:] [instruction] <NL>
An optional label for the given instruction; a label can also stand on its own line. Global labels start with an underscore; local labels shouldn't. A label must conform to the identifier syntax described below.
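As a small illustration of the statement form above, the following sketch shows a comment, a label on its own line, and a label sharing a line with an instruction. The sub name _main, the label names, and the choice of the print and dec opcodes are illustrative assumptions, not part of the syntax description.

```pir
# Comments run to the end of the line; empty lines are ignored.

.sub _main :main
    $I0 = 3                 # a statement without a label
loop:                       # a local label on its own line
    print $I0
    print "\n"
    dec $I0
    if $I0 > 0 goto loop
done: print "done\n"        # label and instruction on one line
.end
```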
- <identifier>
- Starts with a letter or underscore; subsequent characters may also include digits and ::.
- Example:
a
_a
A42
a::b_c
- <type>
- int, float, string, pmc or a valid parrot PMC type like Array.
- <reg>
- A PASM register In, Sn, Nn, Pn, or an IMCC temporary register $In, $Sn, $Nn, $Pn, where n consists of digit(s) only.
- <var>
- A local identifier or a reg or a constant (when allowed).
- 'char constant'
- Are delimited by '. They are taken to be ascii encoded. No escape sequences are processed.
- "string constants"
- Are delimited by ". A " inside a string must be escaped by \". Only 7-bit ASCII is accepted in string constants; to use characters outside that range, specify an encoding as described below.
- <<"heredoc", <<'heredoc'
- Heredocs work like single or double quoted strings. All lines up to the terminating delimiter are slurped into the string. The delimiter has to be on its own line with no trailing whitespace.
$S0 = <<'EOT'
...
EOT
function(<<"END_OF_HERE", arg)
...
END_OF_HERE
- Only one heredoc can be active per statement line.
- charset:"string constant"
- Like above with a character set attached to the string. Valid character sets are currently: ascii (the default), binary, unicode (with UTF-8 as the default encoding), and iso-8859-1.
Inside double-quoted strings the following escape sequences are processed.
\xhh 1..2 hex digits
\ooo 1..3 oct digits
\cX control char X
\x{h..h} 1..8 hex digits
\uhhhh 4 hex digits
\Uhhhhhhhh 8 hex digits
\a, \b, \t, \n, \v, \f, \r, \e, \\
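The quoting rules above can be sketched as follows. This is only an illustration: the string contents and register numbers are arbitrary, and the output is assumed to go to a terminal that can display UTF-8.

```pir
.sub main :main
    $S0 = "col1\tcol2\n"                    # \t and \n are expanded
    $S1 = 'col1\tcol2\n'                    # single quotes: backslashes stay literal
    $S2 = unicode:"\u00ab quoted \u00bb\n"  # non-ASCII characters via \uhhhh
    print $S0
    print $S1
    print "\n"
    print $S2
.end
```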
- encoding:charset:"string constant"
- Like above with an extra encoding attached to the string. For example:
set S0, utf8:unicode:"«"
- The encoding and charset get attached to the string and no further processing is done; specifically, escape sequences are not honored.
- numeric constants
- 0x and 0b denote hex and binary constants.
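A minimal sketch of the two prefixes, with arbitrary values chosen for illustration:

```pir
.sub main :main
    $I0 = 0xff        # hexadecimal constant, decimal 255
    $I1 = 0b101       # binary constant, decimal 5
    $I2 = $I0 + $I1
    print $I2
    print "\n"
.end
```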
- .pragma n_operators
- Convert arithmetic infix operators to n_infix operations. The unary opcodes abs, not, bnot, bnots, and neg are also changed to use an n_ prefix.
.pragma n_operators 1
.sub foo
...
$P0 = $P1 + $P2 # n_add $P0, $P1, $P2
$P2 = abs $P0 # n_abs $P2, $P0
- .loadlib "lib_name"
- Load the given library at compile time, that is, as soon as that line is parsed. See also the loadlib opcode, which does the same at run time.
- A library loaded this way is also available at runtime, as if it had been loaded again in :load, so there is no need to call loadlib at runtime.
- .HLL "hll_name", "hll_lib"
- Define the HLL for the current file. If the string hll_lib isn't empty, this compile-time pragma also loads the shared lib for the HLL, so that integer type constants work when creating new PMCs.
- .HLL_map .CoreType, .UserType
- Whenever Parrot has to create PMCs inside C code on behalf of the running user program it consults the current type mapping for the executing HLL and creates a PMC of type .UserType instead of .CoreType, if such a mapping is defined.
- E.g. with this code snippet ...
.loadlib 'dynlexpad'
.HLL "Foo", ""
.HLL_map .LexPad, .DynLexPad
.sub main :main
...
- ... all subroutines for language Foo would use a dynamic lexpad pmc.
- .sub <identifier> [:<flag> ...]
- .end
- Define a compilation unit with the label identifier:. See PIR Calling Conventions for available flags.
- .emit
- .eom
- Define a compilation unit containing PASM code.
- .local <type> <identifier> [:unique_reg]
- .sym <type> <identifier> [:unique_reg]
- Define a local name identifier for this compilation unit and of the given type. You can define multiple identifiers of the same type by separating them with commas:
.sym int i, j
- The optional :unique_reg modifier will force the register allocator to associate the identifier with a unique register for the duration of the compilation unit.
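The declarations above might be used as in the following sketch. The variable names and values are made up for illustration.

```pir
.sub main :main
    .local int count, limit          # two int locals in one directive
    .local string name :unique_reg   # keeps its own register for the whole sub
    count = 1
    limit = 10
    name  = "sum"
    count = count + limit
    print name
    print ": "
    print count
    print "\n"
.end
```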
- .lex <identifier>, <reg>
- Declare a lexical variable that is an alias for a PMC register. The PIR compiler calls this method in response to a .lex STRING, PREG directive. For example, given this preamble:
.lex "$a", $P0
$P1 = new Integer
These two opcodes have an identical effect:
$P0 = $P1
store_lex "$a", $P1
And these two opcodes also have an identical effect:
$P1 = $P0
$P1 = find_lex "$a"
- .const <type> <identifier> = <const>
- .globalconst <type> <identifier> = <const>
- Define a named constant of the given type with the value const, restricted to one sub (.const) or visible globally (.globalconst). If type denotes a PMC type, const must be a string constant.
- .namespace <identifier>
- Open a new scope block. This "namespace" is not the same as the .namespace [ <identifier> ] syntax, which is used for storing subroutines in a particular namespace in the global symboltable. This directive is useful in cases such as (pseudocode):
local x = 1;
print(x); # prints 1
do # open a new namespace/scope block
local x = 2; # this x hides the previous x
print(x); # prints 2
end # close the current namespace
print(x); # prints 1 again
- Any common language construct that has a nested scope, such as if, for, while or repeat, can use this directive.
- .endnamespace <identifier>
- Closes the scope block that was opened with .namespace <identifier>.
- .namespace [ <identifier> ]
- .namespace [ <identifier> ; <identifier> ]
- Defines the namespace from this point onwards. By default the program is not in any namespace. If you specify more than one, separated by semicolons, it creates nested namespaces, by storing the inner namespace object with a \0 prefix in the outer namespace's global pad.
- .pcc_*
- Directives used for Parrot Calling Conventions.
- .param <type> <identifier> [:<flag> ...]
- At the top of a subroutine, declare a local variable, in the manner of .local, into which parameter(s) of the current subroutine should be stored. Available flags: :slurpy, :optional, :opt_flag and :unique_reg.
- .param <reg> [:<flag> ...]
- At the top of a subroutine, specify where parameter(s) of the current subroutine should be stored. Available flags: :slurpy, :optional, :opt_flag and :unique_reg.
- .return <var> [:<flag> ...]
- Between .pcc_begin_return and .pcc_end_return, specify one or more of the return value(s) of the current subroutine. Available flags: :flat.
- .arg <var> [:<flag> ...]
- Between .pcc_begin and .pcc_call, specify an argument to be passed. Available flags: :flat.
- .result <var> [:<flag> ...]
- Between .pcc_call and .pcc_end, specify where one or more return value(s) should be stored. Available flags: :slurpy, :optional, and :opt_count.
- ([<var> [:<flag> ...], ...]) = <var>([arg [:<flag> ...], ...])
- <var> = <var>([arg [:<flag> ...], ...])
- <var>([arg [:<flag> ...], ...])
- <var>."_method"([arg [:<flag> ...], ...])
- <var>._method([arg [:<flag> ...], ...])
- Function or method call. These notations are shorthand for a longer PCC function call with .pcc_* directives. var can denote a global subroutine, a local identifier or a reg.
- .return ([<var> [:<flag> ...], ...])
- Return from the current compilation unit with zero or more values.
- The surrounding parentheses are mandatory. Besides making the sequence break more conspicuous, they are necessary to distinguish this syntax from other uses of the .return directive, which will probably be deprecated.
- .return <var>(args)
- .return <var>."somemethod"(args)
- .return <var>.somemethod(args)
- Tail call: call a function or method and return from the sub with the function or method call return values.
- Internally, the call stack doesn't increase because of a tail call, so you can write recursive functions and not have stack overflows.
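The shorthand call, .param, .return and tail call forms described above might fit together as in this sketch. The subs add_two and sum_to and their parameters are hypothetical names chosen for the example.

```pir
.sub main :main
    .local int r
    r = add_two(1, 2)            # <var> = <var>(args)
    print r
    print "\n"
    r = sum_to(4, 0)
    print r
    print "\n"
.end

.sub add_two
    .param int a
    .param int b
    .local int sum
    sum = a + b
    .return (sum)                # parentheses are mandatory
.end

.sub sum_to
    .param int n
    .param int acc
    if n <= 0 goto finished
    acc = acc + n
    dec n
    .return sum_to(n, acc)       # tail call: the call stack does not grow
finished:
    .return (acc)
.end
```

Passing the running total as a second parameter is what lets the recursive call sit in tail position.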
See PDD03 for a description of the meaning of the flag bits SLURPY, OPTIONAL, OPT_FLAG, and FLAT, which correspond to the calling convention flags :slurpy, :optional, :opt_flag, and :flat.
[TODO - once these flag bits are solidified by long-term use, then we may choose to copy appropriate bits of the documentation to here.]
Instructions may be a valid PASM instruction or anything listed here below:
- goto <identifier>
- branch <identifier>.
- if <var> goto <identifier>
- unless <var> goto <identifier>
- Translate to if x, identifier or unless ...
- if null <var> goto <identifier>
- unless null <var> goto <identifier>
- Translate to if_null x, identifier or unless_null ...
- if <var> <relop> <var> goto <identifier>
- The relops <, <=, ==, !=, >= and > translate to the PASM opcodes lt, le, eq, ne, ge and gt respectively, with the operands var, var, identifier.
- unless <var> <relop> <var> goto <identifier>
- Like above, but branch if condition isn't met.
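The branching forms above can be combined as in the following sketch, which counts with an unless test and then checks a register for null. Label and variable names are illustrative only.

```pir
.sub main :main
    .local int i
    i = 0
again:
    unless i < 3 goto finished     # branch once the comparison is false
    print i
    print "\n"
    inc i
    goto again
finished:
    null $P0
    unless null $P0 goto not_null  # unless_null $P0, not_null
    print "null\n"
    .return ()
not_null:
    print "not null\n"
.end
```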
- <var> = <var>
- set var, var
- <var> = <unary> <var>
- The unary operators !, - and ~ generate the not, neg and bnot ops.
- <var> = <var> <binary> <var>
- The binary operators +, -, *, /, % and ** generate the add, sub, mul, div, mod and pow arithmetic ops. Binary . is concat and is valid for string arguments.
- << and >> are arithmetic shifts shl and shr. >>> is the logical shift lsr.
- &&, || and ~~ are logical and, or and xor.
- &, | and ~ are binary band, bor and bxor.
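A few of the operators listed above, applied to integer and string temporaries. The values are arbitrary, and the comments name the op each line is expected to generate according to the descriptions above.

```pir
.sub main :main
    $I0 = 6
    $I1 = 4
    $I2 = $I0 % $I1       # mod
    $I3 = $I0 << 2        # shl
    $I4 = $I0 & $I1       # band
    $I5 = - $I0           # neg
    $S0 = "foo"
    $S1 = $S0 . "bar"     # concat
    print $I2
    print "\n"
    print $S1
    print "\n"
.end
```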
- <var> = <var> [ <var> ]
- This generates either a keyed set operation or substr var, var, var, 1 for string arguments and an integer key.
- <var> [ <var> ] = <var>
- A keyed set operation or the assign substr op with a length of 1.
- <var> = new <type>
- new var, .type
- <var> = new <type>, <var>
- new var, .type, var
- <var> = defined <var>
- defined var, var
- <var> = defined <var> [ <var> ]
- defined var, var[var] the keyed op.
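The keyed forms, new and defined might be used together as sketched below. Hash is assumed here to be an available PMC type, and the key name is made up for the example.

```pir
.sub main :main
    $S0 = "parrot"
    $S1 = $S0[2]                   # per the text: substr $S1, $S0, 2, 1
    print $S1
    print "\n"

    $P0 = new Hash                 # assumption: Hash is a valid PMC type
    $P0["answer"] = 42             # keyed set
    $I0 = $P0["answer"]            # keyed get
    $I1 = defined $P0["answer"]    # keyed defined
    print $I0
    print "\n"
    print $I1
    print "\n"
.end
```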
- global "string" = <var>
- store_global "string", var
- <var> = global "string"
- find_global var, "string"
- <var> = clone <var>
- clone var, var
- <var> = addr <var>
- set_addr var, var
- <var> = null
- null <var>
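As a rough sketch of the global, clone and null forms above: one sub stores a PMC under a global name and another looks it up. The global name "answer" and the sub show are hypothetical.

```pir
.sub main :main
    $P0 = new Integer
    $P0 = 42
    global "answer" = $P0      # store_global "answer", $P0
    $P1 = clone $P0            # clone $P1, $P0
    $P0 = null                 # null $P0; the global still refers to the PMC
    show()
    print $P1
    print "\n"
.end

.sub show
    $P2 = global "answer"      # find_global $P2, "answer"
    print $P2
    print "\n"
.end
```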
parsing.pod, calling_conventions.pod
imcc.l, imcc.y
Leopold Toetsch <lt@toetsch.at>