IMCC - operation


0.1 initial

0.2 uninitialized warning; optimizations

0.3 merged parsing.pod into this file.


This document describes the principles of IMCC operation.


The main features of imcc are:

Source file parsing

Register allocation


Code generation

Running the code


Overview ^

IMCC parses and generates code in terms of compilation units. These are self-contained blocks of code very similar to subroutines.

Code for a compilation unit is created as soon (or not earlier) as the end of the unit is reached.

{{ Is this true? one sub calling another not-yet compiled sub would not work in that case. }}

Symbols, constants and labels ^

Compilation units maintain their own symbol table containing local labels and variable symbols. This symbol table, hash, is not visible to code in different units.

If you need global variables, please use the get_{hll,root}_global opcodes.

Global labels and constants are kept in the global symbol table ghash. This allows for global constant folding beyond the scope of individual subroutines.

This also means that you currently can't use the same global symbol (e.g. subroutine name) in different namespaces. The following creates invalid code:

  .sub main

   .namespace ["main"]
  .sub main

Local labels in different compilation units with the same name are allowed, though assembling the generated PASM doesn't work. However, running this code inside imcc is ok. This will probably change in the future so that local labels are mangled to be unique.


Register allocation is done per compilation unit.

IMCC identifiers and temporary variables e.g. $I0 are assigned a physical parrot register depending on the life range of these variables. If the life range of one variable doesn't overlap the range of another variable, they might get the same parrot register. For instance:

   $I0 = 10
   $I1 = 20

will translate to

   set I0, 10
   set I0, 20

provided that $I0 is not used after these lines. In this case, the assignment to $I0 is redundant and will be optimized away if IMCC is run with optimization level -O2.

PASM registers keep their register. During the usage of a PASM register this register will be not get assigned to. Therefore, they should be used only when absolutely necessary, and you should try to avoid using them within long pieces of code.

Basic blocks ^

To determine the life range of variables, the code gets separated into pieces, called basic blocks. A basic block starts at a label, which can get jumped to, and ends at a branch instruction.

Call graph ^

All connections between the basic blocks are calculated. This allows for:

Loop detection ^

where the range and depth of loops is calculated.

Life analysis ^

Whenever an operand is marked as an OUT argument, this operand starts with a new value. This means that at this point the life range of the symbol ends and a new life range is started, which allows the allocation of a different register to the same variable or the same register to a different variable.

Variables used as IN parameters must keep their parrot register over their usage range.

When imcc detects a register usage, where the first operation is using (reading) a register (and warnings are enabled), imcc emits an appropriate message.

Consider these two code snippets (block numbers are attached):

  .sub main :main
 0      $I0 = 0     # initialized
 0      if $I0 goto l1
 1      $I1 = 1     # init in block 1
 1      goto l2
 2  l1:
 2      $I1 = 2     # init in block 2
 3  l2:
 3      print $I0
 3      print $I1   # all paths leading here do init
 3      print "\n"
 3      end


  .sub main :main
 0      $I0 = 0     # initialized
 0      if $I0 goto l1  # branch to bb 1 or 2
 1      $I1 = 1     # init only in block 1
 2  l1:
 2      print $I0
 2      print $I1   # no init in code path from block 0
 2      print "\n"
 2      end

The latter of these emits the warning:

  warning:imcc:propagate_need: '$I1' might be used \
  uninitialized in _main:7

Interference graph ^

Once the above information is calculated, the next step is to look at which variables interfere with which others. Non-interfering variables can be given the same parrot register.

Register allocation ^

imcc then starts allocating registers according to a variable's score. Variables deeply nested inside loops have the highest score and get a parrot register first. Variables with a long life range (i.e. with many interferences) get allocated last.

Optimization ^

Optimizations are only done when enabled with the -O switch. Please consult t/imcpasm/*.t for examples. They occur between various stages and may be repeatedly done: e.g. after converting a conditional branch to an absolute one, unreachable code will be removed then, which might cause unused labels ...


Constant substitution ^

Constant arguments to many ops are evaluated. Conditional branches with constant conditions are converted to unconditional branches. Integer arguments to float operations are converted to float operands.

If-branch optimization ^

A sequence of code:

    if cond, L1
    branch L2

will be converted to

    unless cond, L2

The same is done for other conditional branches gt, ge, eq and their reverse meanings.

Branches to branches ^

Unconditional branch sequences get optimized to jumps to the final label.

Unused labels ^

Unreferenced labels are deleted.

Dead code removal ^

Code not reachable after an unconditional branch instruction and basic blocks that are not entered from somewhere get removed.


Note: These are currently experimental and might not do the Right Thing.

Used LHS once ^

For a sequence of code

   $I0 = 10
   $I1 = 20

where $I0 is not used again, the first assignment will be tossed, resulting in code like:

   set I0, 20

Loop optimization ^

Instructions which are invariant to a loop are pulled out of the loop and inserted in front of the loop entry.

Code generation ^

imcc either generates PASM or else directly generates a PBC file for running with parrot.

Running code ^

Additionally the generated code can be run immediately inside imcc. All parrot runtime options like -j or -t are available.


imc.c, cfg.c, optimizer.c, pbc.c


Leopold Toetsch <>