NAME

pirgrammar.pod - The Grammar of languages/PIR

DESCRIPTION

This document provides a more readable grammar of languages/PIR. The actual specification for PIR is a bit more complex. This grammar for humans does not contain error handling and other issues unimportant for this PIR reference.

STATUS

For a list of bugs and issues, see the section KNOWN ISSUES AND BUGS.

The grammar includes some constructs that are in the IMCC parser, but are not implemented. An example of this is the .global directive.

Please note that languages/PIR is not the official definition of the PIR language. The reference implementation of PIR is IMCC, located in parrot/compilers/IMCC. However, languages/PIR tries to be as close to IMCC as possible. IMCC's grammar could use some cleaning up; languages/PIR might be a basis to start with a clean reimplementation of PIR in C (using Lex/Yacc).

VERSION

0.1.4

LEXICAL CONVENTIONS

PIR Directives

PIR has a number of directives. All directives start with a dot. Macro identifiers (when using a macro, on expansion) also start with a dot (see below). Therefore, it is important not to use any of the PIR directives as a macro identifier. The PIR directives are:

  .arg            .invocant          .pcc_call
  .const          .lex               .pcc_end_return
  .emit           .line              .pcc_end_yield
  .end            .loadlib           .pcc_end
  .endnamespace   .local             .pcc_sub
  .eom            .meth_call         .pragma
  .get_results    .namespace         .return
  .global         .nci_call          .result
  .globalconst    .param             .sub
  .HLL_map        .pcc_begin_return  .sym
  .HLL            .pcc_begin_yield   .yield
  .include        .pcc_begin

Registers

PIR has two types of registers: real registers and symbolic or temporary (or virtual if you like) registers. Real registers are actual registers in the Parrot VM. The symbolic, or temporary registers are mapped to those actual registers. Real registers are written like:

  [S|N|I|P]n, where n is a non-negative integer.

whereas symbolic registers have a $ prefix, like this: $P10.

Symbolic registers can be thought of as local variable identifiers that don't need a declaration. This saves you from writing .local directives if you're in a hurry. Of course, the code would be more self-documenting if .local declarations were used.
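The two register forms can be told apart with a small recognizer. The Python snippet below is only a sketch of the lexical rule described above, not code from languages/PIR; the name classify_register is hypothetical.

```python
import re

# Real registers: one of S, N, I, P followed by digits (e.g. P0, I12).
REAL_REGISTER = re.compile(r"[SNIP][0-9]+\Z")
# Symbolic registers: the same form with a "$" prefix (e.g. $P10).
SYMBOLIC_REGISTER = re.compile(r"\$[SNIP][0-9]+\Z")


def classify_register(text):
    """Return 'real', 'symbolic', or None if text is not a register."""
    if REAL_REGISTER.match(text):
        return "real"
    if SYMBOLIC_REGISTER.match(text):
        return "symbolic"
    return None


print(classify_register("P0"))    # real
print(classify_register("$P10"))  # symbolic
print(classify_register("x1"))    # None
```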

Constants

An integer constant is a string of one or more digits. Examples: 0, 42.

A floating-point constant is a string of one or more digits, followed by a dot and one or more digits. Examples: 1.1, 42.567.

A string constant is a single or double quoted series of characters. Examples: 'hello world', "Parrot".

TODO: PMC constants.

Identifiers

An identifier starts with a character from [_a-zA-Z], followed by zero or more characters from [_a-zA-Z0-9].

Examples: x, x1, _foo

Labels

A label is an identifier with a colon attached to it.

Examples: LABEL:

Macro identifiers

A macro identifier is an identifier prefixed with a dot. A macro identifier is used when expanding the macro (on usage), not in the macro definition.

Examples: .myMacro
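The three token forms above differ only in their decoration, which a short Python sketch can make explicit. It is an illustration, not the languages/PIR lexer; token_kind is a hypothetical name, and the sketch does not check whether a macro identifier clashes with a PIR directive.

```python
import re

# An identifier: a letter or underscore, then letters, digits or underscores.
IDENTIFIER = r"[_a-zA-Z][_a-zA-Z0-9]*"


def token_kind(text):
    """Classify text as an identifier, a label or a macro identifier."""
    if re.fullmatch(IDENTIFIER, text):
        return "identifier"
    if re.fullmatch(IDENTIFIER + ":", text):
        return "label"
    if re.fullmatch(r"\." + IDENTIFIER, text):
        return "macro identifier"
    return None


print(token_kind("_foo"))      # identifier
print(token_kind("LABEL:"))    # label
print(token_kind(".myMacro"))  # macro identifier
```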

GRAMMAR RULES

Compilation Units

A PIR program consists of one or more compilation units. A compilation unit is a global, sub, constant or macro definition, or a pragma or emit block. PIR is a line-oriented language, which means that each statement ends in a newline (indicated as "nl"). Moreover, compilation units are always separated by a newline. Each of the different compilation units is discussed in this document.

  program:
    compilation_unit [ nl compilation_unit ]*

  compilation_unit:
      global_def
    | sub_def
    | const_def
    | expansion
    | pragma
    | emit

Subroutine definitions

  sub_def:
    [ ".sub" | ".pcc_sub" ] sub_id sub_pragmas nl body

  sub_id:
    identifier | string_constant

  sub_pragmas:
    sub_pragma [ ","? sub_pragma ]*


  sub_pragma:
      ":load"
    | ":init"
    | ":immediate"
    | ":postcomp"
    | ":main"
    | ":method"
    | ":anon"
    | ":lex"
    | wrap_pragma
    | vtable_pragma
    | multi_pragma
    | outer_pragma

  wrap_pragma:
    ":wrap" parenthesized_string

  vtable_pragma:
    ":vtable" parenthesized_string?

  parenthesized_string:
    "(" string_constant ")"

  multi_pragma:
    ":multi" "(" multi_types? ")"

  outer_pragma:
    ":outer" "(" sub_id ")"

  multi_types:
    multi_type [ "," multi_type ]*

  multi_type:
      type
    | "_"
    | keylist
    | identifier
    | string_constant

  body:
    param_decl*
    labeled_pir_instr*
    ".end"

  param_decl:
    ".param"  [ [ type identifier ] | register ] [ get_flags | ":unique_reg" ]* nl

  get_flags:
    [ ":slurpy"
    | ":optional"
    | ":opt_flag"
    | named_flag
    ]+

  named_flag:
    ":named" parenthesized_string?

Examples subroutine

The simplest example for a subroutine definition looks like:

  .sub foo
  # PIR instructions go here
  .end

The body of the subroutine can contain PIR instructions. The subroutine can be given one or more flags, indicating the sub should behave in a special way. Below is a list of these flags and their meaning. The flag :unique_reg is discussed in the section defining local declarations.

 :load

Run this subroutine during the load_library opcode. :load is ignored if another subroutine in that file is marked with :main. If multiple subs have the :load pragma, the subs are run in source code order.

 :init

Run the subroutine when the program is run directly (that is, not loaded as a module). This is different from :load, which runs a subroutine when a library is being loaded. To get both behaviours, use :init :load.

 :postcomp

Same as :immediate.

 :immediate

This subroutine is executed immediately after being compiled. (Analogous to BEGIN in Perl 5.)

 :main

Indicates that the sub being defined is the entry point of the program. It can be compared to the main function in C.

 :method

Indicates the sub being defined is an instance method. The method belongs to the class whose namespace is currently active. (So, to define a method for a class 'Foo', the 'Foo' namespace should be currently active.) In the method body, the object PMC can be referred to with self.

 :vtable or :vtable('x')

Indicates the sub being defined replaces a vtable entry. This flag can only be used when defining a method.

 :multi(type [, type]*)

Engage in multiple dispatch with the listed types.

 :outer('bar')

Indicates the sub being defined is lexically nested within the subroutine 'bar'.

 :anon

Do not install this subroutine in the namespace. Allows the subroutine name to be reused.

 :lex

Indicates the sub being defined needs to store lexical variables. This flag is not necessary if the sub contains lexical declarations (see below); the PIR compiler will figure this out by itself. Otherwise, the :lex flag is needed to tell Parrot that the subroutine will store or find lexicals.

 :wrap('bar')

This flag is not (yet?) implemented in IMCC. It would indicate that this sub is wrapping the sub "bar". That means that when "bar" is invoked, this sub is called before and after. It is undecided yet whether this flag will be implemented. If so, its syntax may change.

The sub flags are listed after the sub name. They may be separated by a comma, but this is not necessary. The subroutine name can also be a string instead of a bareword, as is shown in this example:

  .sub 'foo' :load, :init :anon
  # PIR body
  .end
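To illustrate that the comma between flags is optional, the following Python sketch splits a sub header into its name and flags. It is a lenient illustration, not the languages/PIR parser: parse_sub_header is a hypothetical name, flag arguments are kept as raw text, and the flags themselves are not validated.

```python
import re

# Directive, then a quoted or bareword sub name, then the rest of the line.
HEADER = re.compile(
    r"""\.(?:sub|pcc_sub)\s+('[^']*'|"[^"]*"|[_a-zA-Z][_a-zA-Z0-9]*)(.*)\Z"""
)
# A flag is a colon, a name, and an optional parenthesized argument.
FLAG = re.compile(r":([a-z_]+)(?:\(([^)]*)\))?")


def parse_sub_header(line):
    """Split a '.sub' line into (name, [(flag, argument-or-None), ...])."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("not a sub header: " + line)
    name = match.group(1).strip("'\"")
    # Commas between flags are simply skipped, so both styles are accepted.
    flags = [(m.group(1), m.group(2) or None) for m in FLAG.finditer(match.group(2))]
    return name, flags


print(parse_sub_header(".sub 'foo' :load, :init :anon"))
# ('foo', [('load', None), ('init', None), ('anon', None)])
```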

Parameter definitions have the following syntax:

  .sub main
    .param int argc :optional
    .param int has_argc :opt_flag
    .param num nParam
    .param pmc argv :slurpy
    .param string sParam :named('foo')
    .param $P0 :named('bar')
    # body
  .end

As shown, parameter definitions may take flags as well. These flags are listed here:

 :slurpy

The parameter should be of type pmc and acts like a container that slurps up all remaining arguments. Details can be found in PDD03 - Parrot Calling Conventions.

 :named('x')

The parameter is known in the called sub by name 'x'. The :named flag can also be used without an identifier, in combination with the :flat or :slurpy flag, i.e. on a container holding several values:

  .param pmc args :slurpy :named

and

  .arg args :flat :named

 :optional

Indicates the parameter being defined is optional.

 :opt_flag

This flag can be given to a parameter defined after an optional parameter. During runtime, the parameter is automatically given a value, and is not passed by the caller. The value of this parameter indicates whether the previous (optional) parameter was present.

The correct order of the parameters depends on the flag they have.
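The interplay of :optional, :opt_flag and :slurpy can be modelled roughly in Python. This is a sketch of the behaviour described above for positional arguments only, not Parrot's actual argument-passing code (see PDD03 for that). The function bind_params, the use of None for an absent optional value, and the use of 1 and 0 for the :opt_flag value are all assumptions made for the illustration.

```python
def bind_params(params, args):
    """Bind positional args to params.

    params is a list of (name, flag) pairs, where flag is None,
    "optional", "opt_flag" or "slurpy". Returns a dict of bound values.
    """
    bound = {}
    remaining = list(args)
    previous_present = False
    for name, flag in params:
        if flag == "opt_flag":
            # Filled in automatically; never taken from the caller's arguments.
            bound[name] = 1 if previous_present else 0
        elif flag == "slurpy":
            # Takes every argument that is still left over.
            bound[name] = remaining
            remaining = []
        elif flag == "optional":
            previous_present = bool(remaining)
            bound[name] = remaining.pop(0) if previous_present else None
        else:
            if not remaining:
                raise TypeError("missing argument for " + name)
            bound[name] = remaining.pop(0)
    return bound


signature = [("argc", "optional"), ("has_argc", "opt_flag"), ("argv", "slurpy")]
print(bind_params(signature, [3, "a", "b"]))
# {'argc': 3, 'has_argc': 1, 'argv': ['a', 'b']}
print(bind_params(signature, []))
# {'argc': None, 'has_argc': 0, 'argv': []}
```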

PIR instructions

  labeled_pir_instr:
    label? instr nl

  labeled_pasm_instr:
    label? pasm_instr nl

  instr:
    pir_instr | pasm_instr

NOTE: the rule 'pasm_instr' is not included in this reference grammar. pasm_instr defines the syntax for pure PASM instructions.

  pir_instr:
      local_decl
    | lexical_decl
    | const_def
    | globalconst_def
    | conditional_stat
    | assignment_stat
    | open_namespace
    | close_namespace
    | return_stat
    | sub_invocation
    | macro_invocation
    | jump_stat
    | source_info

Local declarations

  local_decl:
    [ ".local" | ".sym" ] type local_id_list

  local_id_list:
    local_id [ "," local_id ]*

  local_id:
    identifier ":unique_reg"?

Examples local declarations

Local temporary variables can be declared by the directives .local or .sym. There is no difference between these directives, except within macro definitions. (See Macros).

  .local int i
  .local num a, b, c
  .sym string s1, s2
  .sym pmc obj

The optional :unique_reg modifier will force the register allocator to associate the identifier with a unique register for the duration of the compilation unit.

  .local int j :unique_reg

Lexical declarations

  lexical_decl:
    ".lex" string_constant "," target

Example lexical declarations

The declaration

  .lex 'i', $P0

indicates that the value in $P0 is stored as a lexical variable, named by 'i'. Once the above lexical declaration is written, and given the following statement:

  $P1 = new .Integer

then the following two statements have an identical effect:

  $P0 = $P1

  store_lex "i", $P1

Likewise, these two statements also have an identical effect:

  $P1 = $P0

  $P1 = find_lex "i"

Instead of a register, one can also specify a local variable, like so:

  .local pmc p
  .lex 'i', p

The same is true when a parameter should be stored as a lexical:

  .param pmc p
  .lex 'i', p

So, now it is also clear why .lex 'i', p is not a declaration of p: it needs a separate declaration, because it may either be a .local or a .param. The .lex directive merely is a shortcut for saving and retrieving lexical variables.

Global definitions

  global_def:
    ".global" identifier

Example global declarations

This syntax is defined in the parser of IMCC, but its functionality is not implemented. The goal is to allow for global definitions outside of subroutines. That way, the variable can be accessed by all subroutines without doing a global lookup. It is unclear whether this feature will be implemented.

An example is:

  .global my_global_var

Constant definitions

  const_def:
    ".const" type identifier "=" constant_expr

Example constant definitions

  .const int answer = 42

defines an integer constant by name 'answer', giving it a value of 42. Note that the constant type and the value type should match, i.e. you cannot assign a floating point number to an integer constant. The PIR parser will check for this.

Global constant definitions

  globalconst_def:
    ".globalconst" type identifier "=" constant_expr

Example global constant definitions

This directive is similar to const_def, except that once a global constant has been defined, it is accessible from all subroutines.

  .sub main :main
    .globalconst int answer = 42
    foo()
  .end

  .sub foo
    print answer # prints 42
  .end

Conditional statements

  conditional_stat:
      [ "if" | "unless" ]
    [ [ "null" target ]
    | [ simple_expr [ relational_op simple_expr ]? ]
    ] "goto" identifier

Examples conditional statements

The syntax for if and unless statements is the same, except for the keyword itself. Therefore the examples will use either.

  if null $P0 goto L1

Checks whether $P0 is null; if it is, flow of control jumps to label L1.

  unless $P0 goto L2
  unless x   goto L2
  unless 1.1 goto L2

Unless $P0, x or 1.1 are 'true', flow of control jumps to L2. When the argument is a PMC (like the first example), true-ness depends on the PMC itself. For instance, in some languages, the number 0 is defined as 'true', in others it is considered 'false' (like C).

  if x < y goto L1
  if y != z  goto L1

are examples that check for the logical expression after if. Any of the relational operators may be used here.

Branching statements

  jump_stat:
    "goto" identifier

Examples branching statements

  goto MyLabel

The program will continue running at label 'MyLabel:'.

Operators

  relational_op:
      "==" | "!=" | "<=" | "<" | ">=" | ">"

  binary_op:
      "+"  | "-"   | "/"  | "**"
    | "*"  | "%"   | "<<" | ">>>"
    | ">>" | "&&"  | "||" | "~~"
    | "|"  | "&"   | "~"  | "."

  assign_op:
      "+=" | "-=" | "/=" | "%="  | "*="  | ".="
    | "&=" | "|=" | "~=" | "<<=" | ">>=" | ">>>="

  unary_op:
      "!" | "-" | "~"

Expressions

  expression:
      simple_expr
    | simple_expr binary_op simple_expr
    | unary_op simple_expr

  simple_expr:
      float_constant
    | int_constant
    | string_constant
    | target

Example expressions

  42
  42 + x
  1.1 / 0.1
  "hello" . "world"
  str1 . str2
  -100
  ~obj
  !isSomething

Arithmetic operators are only allowed on floating-point numbers and integer values (or variables of that type). Likewise, string concatenation (".") is only allowed on strings. These checks are not done by the PIR parser.

Assignments

  assignment_stat:
      target "=" short_sub_call
    | target "=" target keylist
    | target "=" expression
    | target "=" "new" [ int_constant | string_constant | macro_id ]
    | target "=" "new" keylist
    | target "=" "find_type" [ string_constant | string_reg | id ]
    | target "=" heredoc
    | target "=" "global" string_constant
    | target assign_op simple_expr
    | target keylist "=" simple_expr
    | "global" string_constant "=" target
    | result_var_list "=" short_sub_call

NOTE: the definition of assignment statements is not complete yet. As languages/PIR evolves, this will be completed.

  keylist:
    "[" keys "]"

  keys:
    key [ sep key ]*

  sep:
    "," | ";"

  key:
      simple_expr
    | simple_expr ".."
    | ".." simple_expr
    | simple_expr ".." simple_expr

  result_var_list:
    "(" result_vars ")"

  result_vars:
    result_var [ "," result_var ]*

  result_var:
    target get_flags?

Examples assignment statements

  $I1 = 1 + 2
  $I1 += 1
  $P0 = foo()
  $I0 = $P0[1]
  $I0 = $P0[12.34]
  $I0 = $P0["Hello"]
  $P0 = new 42 # but this is really not very clear, better use identifiers

  $S0 = <<'HELLO'
  ...
  HELLO

  $P0 = global "X"
  global "X" = $P0

  .local int a, b, c
  (a, b, c) = foo()

Heredoc

NOTE: the heredoc rules are not complete or tested. Some work is required here.

  heredoc:
    "<<" string_constant nl
    heredoc_string
    heredoc_label

  heredoc_label:
    ^^ identifier

  heredoc_string:
    [ \N | \n ]*

Example Heredoc

  .local string str
  str = <<'ENDOFSTRING'
    this text
         is stored in the
               variable
      named 'str'. Whitespace and newlines
    are                  stored as well.
  ENDOFSTRING

Note that the heredoc identifier should be at the beginning of the line; no whitespace in front of it is allowed. Printing str would print:

    this text
         is stored in the
               variable
      named 'str'. Whitespace and newlines
    are                  stored as well.
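The rule that the terminating identifier must start at the beginning of the line can be sketched in Python as follows. This is an illustration of the assignment form shown above, where the heredoc marker ends the line; it is not the languages/PIR implementation, and read_heredoc is a hypothetical helper.

```python
def read_heredoc(lines, start):
    """Collect the heredoc text that follows lines[start].

    lines holds source lines without trailing newlines, and lines[start]
    must end with <<'LABEL' or <<"LABEL". Returns the collected text and
    the index of the line holding the terminating label.
    """
    opener = lines[start].rstrip()
    label = opener[opener.rindex("<<") + 2:].strip("'\"")
    body = []
    for index in range(start + 1, len(lines)):
        # Only an exact match in column 0 terminates the heredoc.
        if lines[index] == label:
            return "".join(line + "\n" for line in body), index
        body.append(lines[index])
    raise ValueError("unterminated heredoc: " + label)


source = [
    "str = <<'ENDOFSTRING'",
    "  this text",
    "  ENDOFSTRING is indented here, so it is still text",
    "ENDOFSTRING",
]
text, end = read_heredoc(source, 0)
print(repr(text))
print(end)  # 3
```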

In IMCC, a heredoc identifier can be specified as an argument, like this:

    foo(42, "hello", <<'EOS')

    This is a heredoc text argument.

  EOS

In IMCC, only one such argument can be specified. The languages/PIR implementation aims to allow for any number of heredoc arguments, like this:

    foo(<<'STR1', <<'STR2')

    argument 1
  STR1
    argument 2
  STR2

Currently, this is not working.

Invoking subroutines and methods

  sub_invocation:
    long_sub_call | short_sub_call

  long_sub_call:
    ".pcc_begin" nl
    arguments
    [ method_call | non_method_call] target nl
    [ local_decl nl ]*
    result_values
    ".pcc_end"

  non_method_call:
    ".pcc_call" | ".nci_call"

  method_call:
    ".invocant" target nl
    ".meth_call"

  parenthesized_args:
    "(" args? ")"

  args:
    arg [ "," arg ]*

  arg:
    [ float_constant
    | int_constant
    | string_constant [ "=>" target ]?
    | target
    ]
    set_flags?


  arguments:
    [ ".arg" simple_expr set_flags? nl ]*

  result_values:
    [ ".result" target get_flags? nl ]*

  set_flags:
    [ ":flat"
    | named_flag
    ]+

Example long subroutine call

The long subroutine call syntax is very suitable to be generated by a language compiler targeting Parrot. Its syntax is rather verbose, but easy to read. The minimal invocation looks like this:

  .pcc_begin
  .pcc_call $P0
  .pcc_end

Invoking instance methods is a simple variation:

  .pcc_begin
  .invocant $P0
  .meth_call $P1
  .pcc_end

Passing arguments and retrieving return values is done like this:

  .pcc_begin
  .arg 42
  .pcc_call $P0
  .local int res
  .result res
  .pcc_end

Arguments can take flags as well. The following argument flags are defined:

 :flat

Flatten the (aggregate) argument. This argument can only be of type pmc.

 :named('x')

Pass the denoted argument into the named parameter that is denoted by 'x', like so:

 .param int myX :named('x')   # the type 'int' is just an example

As was mentioned in the section on parameter declarations, the :named flag can be used on an aggregate value in combination with the :flat flag.

 .arg myArgs :flat :named

  .local pmc arr
  arr = new .Array
  arr = 2
  arr[0] = 42
  arr[1] = 43
  .pcc_begin
  .arg arr :flat
  .arg $I0 :named('intArg')
  .pcc_call foo
  .pcc_end

The Native Calling Interface (NCI) allows for calling C routines, in order to talk to the world outside of Parrot. Its syntax is a slight variation; it uses .nci_call instead of .pcc_call.

  .pcc_begin
  .nci_call $P0
  .pcc_end

Short subroutine invocation

  short_sub_call:
    invocant? [ target | string_constant ] parenthesized_args

  invocant:
    [ target"." | target "->" ]

Example short subroutine call

The short subroutine call syntax is useful when manually writing PIR code. Its simplest form is:

  foo()

Or a method call:

  obj.'toString'() # call the method 'toString'
  obj.x() # call the method whose name is stored in 'x'.

Note that no spaces are allowed between the invocant and the dot; "obj . 'toString'" is not valid. IMCC also allows the "->" instead of a dot, to make it readable for C++ programmers:

  obj->'toString'()

And of course, using the short version, passing arguments can be done as well, including all flags that were defined for the long version. The same example from the 'long subroutine invocation' is now shown in its short version:

  .local pmc arr
  arr = new .Array
  arr = 2
  arr[0] = 42
  arr[1] = 43
  foo(arr :flat, $I0 :named('intArg'))

In order to do a Native Call Interface invocation, the subroutine to be invoked needs to be referenced from a PMC register, as its name is not visible from Parrot. An NCI call looks like this:

  .local pmc nci_sub, nci_lib
  .local string c_function, signature

  nci_lib = loadlib "myLib"

  # name of the C function to be called
  c_function = "sayHello"

  # set signature to "void" (no arguments)
  signature  = "v"

  # get a PMC representing the C function
  nci_sub = dlfunc nci_lib, c_function, signature

  # and invoke
  nci_sub()

Return values from subroutines

  return_stat:
      long_return_stat
    | short_return_stat
    | long_yield_stat
    | short_yield_stat
    | tail_call

  long_return_stat:
    ".pcc_begin_return" nl
    return_directive*
    ".pcc_end_return"

  return_directive:
    ".return" simple_expr set_flags? nl

Example long return statement

Returning values from a subroutine is in fact similar to passing arguments to a subroutine. Therefore, the same flags can be used:

  .pcc_begin_return
  .return 42 :named('answer')
  .return $P0 :flat
  .pcc_end_return

In this example, the value 42 is passed to the named return value known as 'answer'. The aggregate value in $P0 is flattened, and each of its values is passed as a return value.

Short return statement

  short_return_stat:
    ".return" parenthesized_args

Example short return statement

  .return(myVar, "hello", 2.76, 3.14)

Just as the return values in the long return statement could take flags, the short return statement may as well:

  .return(42 :named('answer'), $P0 :flat)

Long yield statements

  long_yield_stat:
    ".pcc_begin_yield" nl
    return_directive*
    ".pcc_end_yield"

Example long yield statement

A yield statement works the same as a normal return statement, except that the point where the subroutine was left is saved, so that the subroutine resumes from that point the next time it is invoked. Returning values is identical to normal return statements.

  .sub foo
    .pcc_begin_yield
    .return 42
    .pcc_end_yield

    # and later in the sub, one could return another value:

    .pcc_begin_yield
    .return 43
    .pcc_end_yield
  .end

  # when invoking twice:
  foo() # returns 42
  foo() # returns 43
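For readers who know Python, generators offer a loose analogy to this resume-where-you-left-off behaviour. The analogy is not exact: in the PIR example the sub itself is invoked again, whereas in Python the values are requested from a generator object.

```python
def foo():
    # Each yield hands a value back and remembers where the function stopped.
    yield 42
    # Execution resumes here when the next value is requested.
    yield 43


resumable = foo()
print(next(resumable))  # 42
print(next(resumable))  # 43
```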

NOTE: IMCC allows for writing:

  .pcc_begin_yield
  ...
  .pcc_end_return

which is of course not consistent; languages/PIR does not allow this.

Short yield statements

  short_yield_stat:
    ".yield" parenthesized_args

Example short yield statement

The short yield statement has the same form as the short return statement.

  .yield("hello", 42)

Tail calls

  tail_call:
    ".return" short_sub_call

Example tail call

  .return foo()

Returns the return values from foo. This is implemented by a tail call, which is more efficient than:

  .local pmc results
  results = foo()
  .return(results)

The call to foo can be considered a normal function call with respect to parameters: it can take the exact same format using argument flags. The tail call can also be a method call, like so:

  .return obj.'foo'()

Symbol namespaces

  open_namespace:
    ".namespace" identifier

  close_namespace:
    ".endnamespace" identifier

Example open/close namespaces

  .sub main
    .local int x
    x = 42
    say x
    .namespace NESTED
    .local int x
    x = 43
    say x
    .endnamespace NESTED
    say x
  .end

Will print:

  42
  43
  42

Please note that it is not necessary to pair these statements; it is acceptable to open a .namespace without closing it. The scope of the .namespace is limited to the subroutine.

Emit blocks

  emit:
    ".emit" nl
    labeled_pasm_instr*
    ".eom"

Example Emit block

An emit block only allows PASM instructions, not PIR instructions.

  .emit
     set I0, 10
     new P0, .Integer
     ret
   _foo:
     print "This is PASM subroutine 'foo'"
     ret
  .eom

Expansions

  expansion:
      macro_def
    | include
    | pasm_constant


  include:
    ".include" string_constant

  pasm_constant:
    ".constant" identifier [ constant_value | register ]

Macros

  macro_def:
    ".macro" identifier macro_parameters? nl
    macro_body

  macro_parameters:
    "(" id_list? ")"

  macro_body:
    labeled_pir_instr*
    ".endm" nl

  macro_invocation:
    macro_id parenthesized_args?

Note that before a macro body will be parsed, some grammar rules will be changed. In a macro body, local variable declaration can only be done using the .sym directive. The .local directive is only available for declaring labels.

  macro_label:
    ".local" "$"identifier":"

Example Macros

When the following macro is defined:

  .macro add2(n)
    inc .n
    inc .n
  .endm

then one can write in a subroutine:

  .sub foo
    .local int myNum
    myNum = 42
    .add2(myNum)
    print myNum  # prints 44
  .end
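The expansion in this example can be pictured as textual substitution of the macro arguments into the macro body. The Python sketch below is a naive model of that idea under this assumption, not the way IMCC or languages/PIR implement macros; expand_macro is a hypothetical name, and the sketch would also substitute inside string constants.

```python
import re

# A dot followed by an identifier, e.g. ".n" inside a macro body.
DOTTED_NAME = re.compile(r"\.([_a-zA-Z][_a-zA-Z0-9]*)")


def expand_macro(params, body_lines, args):
    """Substitute arguments for '.param' references in a macro body."""
    if len(params) != len(args):
        raise ValueError("wrong number of macro arguments")
    mapping = dict(zip(params, args))

    def substitute(match):
        # Leave dotted names that are not macro parameters untouched.
        return mapping.get(match.group(1), match.group(0))

    return [DOTTED_NAME.sub(substitute, line) for line in body_lines]


print(expand_macro(["n"], ["inc .n", "inc .n"], ["myNum"]))
# ['inc myNum', 'inc myNum']
```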

PIR Pragmas

  pragma:
      new_operators
    | loadlib
    | namespace
    | hll_mapping
    | hll_specifier
    | source_info

  new_operators:
    ".pragma" "n_operators" int_constant

  loadlib:
    ".loadlib" string_constant

  namespace:
    ".namespace" [ "[" namespace_id "]" ]?

  hll_specifier:
    ".HLL" string_constant "," string_constant

  hll_mapping:
    ".HLL_map" int_constant "," int_constant

  namespace_id:
    string_constant [ ";" string_constant ]*

  source_info:
    ".line" int_constant [ "," string_constant ]?

  id_list:
    identifier [ "," identifier ]*

Examples pragmas

  .include "myLib.pir"

includes the source from the file "myLib.pir" at the point of this directive.

  .pragma n_operators 1

makes Parrot automatically create new PMCs when using arithmetic operators, like:

  $P1 = new .Integer
  $P2 = new .Integer
  $P1 = 42
  $P2 = 43
  $P0 = $P1 * $P2
  # now, $P0 is automatically assigned a newly created PMC.


  .line 100
  .line 100, "myfile.pir"

NOTE: currently, the line directive is implemented in IMCC as #line. See the PROPOSALS document for more information on this.

  .namespace ['Foo'] # namespace Foo
  .namespace ['Object';'Foo'] # nested namespace

  .namespace # no [ id ] means the root namespace is activated

The first of these directives opens the namespace 'Foo'. When doing Object Oriented programming, this would indicate that sub or method definitions belong to the class 'Foo'. Of course, you can also define namespaces without doing OO-programming.

Please note that this .namespace directive is different from the .namespace directive that is used within subroutines.

  .HLL "Lua", "lua_group"

is an example of specifying the High Level Language (HLL) for which the PIR is being generated. It is a shortcut for setting the namespace to 'Lua', and for loading the PMCs in the lua_group library.

  .HLL_map .Integer, .LuaNumber

is a way of telling Parrot that whenever an Integer is created somewhere in the system (C code), a LuaNumber object is created instead.

  .loadlib "myLib"

is a shortcut for telling Parrot that the library "myLib" should be loaded when running the program. In fact, it is a shortcut for:

  .sub _load :load :anon
    loadlib "myLib"
  .end

TODO: check flags and syntax for this.

Tokens, types and targets

  string_constant:
    [ encoding_specifier? charset_specifier ]?  quoted_string

  encoding_specifier:
    "utf8:"

  charset_specifier:
      "ascii:"
    | "binary:"
    | "unicode:"
    | "iso-8859-1:"

  type:
      "int"
    | "num"
    | "pmc"
    | "object"
    | "string"
    | "Array"
    | "Hash"

  target:
    identifier | register

Notes on Tokens, types and targets

A string constant can be written like:

  "Hello world"

but if desirable, the character set can be specified:

  unicode:"Hello world"

When using the "unicode" character set, one can also specify an encoding specifier; currently only utf8 is allowed:

  utf8:unicode:"hello world"
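The prefix structure of the string_constant rule can be sketched as a Python regular expression. This follows the grammar rule as written, so an encoding specifier is only accepted together with a charset specifier, but it does not enforce that utf8 is paired with unicode. It is an illustration only: split_string_constant is a hypothetical name, and escape sequences inside the quotes are not handled.

```python
import re

STRING_CONSTANT = re.compile(
    r"""(?:
            (?:(?P<encoding>utf8):)?      # optional encoding specifier
            (?P<charset>ascii|binary|unicode|iso-8859-1):
        )?                                # the whole prefix is optional
        (?P<quoted>"[^"]*"|'[^']*')\Z     # the quoted string itself
    """,
    re.VERBOSE,
)


def split_string_constant(text):
    """Return (encoding, charset, contents), or None if text does not match."""
    match = STRING_CONSTANT.match(text)
    if match is None:
        return None
    return match.group("encoding"), match.group("charset"), match.group("quoted")[1:-1]


print(split_string_constant('utf8:unicode:"hello world"'))
# ('utf8', 'unicode', 'hello world')
print(split_string_constant('"Hello world"'))
# (None, None, 'Hello world')
```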

IMCC currently allows identifiers to be used as types. During the parse, the parser checks whether the identifier names a defined class. The built-in types int, num, pmc and string are always available.

A target is something that can be assigned to; it is an L-value (but of course may be read just like an R-value). It is either an identifier or a register.

AUTHOR

Klaas-Jan Stol [parrotcode@gmail.com]

KNOWN ISSUES AND BUGS

Some work should be done on:

Heredoc parsing

The rule 'type' currently does not include custom (user-defined) types. It probably needs an alternative "identifier", but this is not yet certain.

Clean up grammar, remove never-used features.

Test. A lot.

Bugs or improvements may be sent to the author, and are of course greatly appreciated. Moreover, if you find any missing constructs that are in IMCC, indications of these would be appreciated as well.

Please see the PROPOSALS document for some proposals of the author to clean up the official grammar of PIR (as defined by the IMCC compiler).

REFERENCES

languages/PIR/lib/pir.pg - The actual PIR grammar implementation

PDD03 - Parrot Calling Conventions

PDD20 - Lexically scoped variables in Parrot

docs/imcc/calling_conventions.pod - definition of sub flags (:init etc)

docs/imcc/syntax.pod - official syntax for IMCC/PIR.

CHANGES

0.1.4

Added expansion rule, moved include and macro_def rules to that rule. Added pasm_constant definition.

Removed newlines in operator definition to save some lines for readability.

0.1.3

Updated short sub invocation for NCI invocations.

Added an example for .globalconst.

Added some remarks at section for Macros.

Added some remarks here and there, and fixed some style issues.

0.1.2

Removed .immediate, it is :immediate, and thus not a PIR directive, but a flag. This was a mistake.

Added .globalconst

Added macro parsing example (it is now fixed in languages/PIR).

Added reference to official doc for IMCC syntax.

Added :unique_reg to allowed flags for incoming parameters.

0.1.1

Switch to x.y.z version number; many fixes will follow.

Added more examples.

Fixed some errors.

0.1

Initial version having a version number.
