Dynamic Opcodes

The smallest executable component in Parrot is not the compilation unit or even the subroutine, but the opcode. Opcodes in Parrot, like opcodes in other machines (both virtual and physical), are individual instructions that implement low-level operations in the machine. In the world of microprocessors, the word "opcode" typically refers to the numeric identifier for each instruction. The human-readable word used in the associated assembly language is called the "mnemonic". An assembler, among other tasks, is responsible for converting mnemonics into opcodes for execution. In Parrot, instead of referring to an instruction by different names depending on what form it's in, we just call them all "opcodes". Of course, the operations that qualify as "low-level" in Parrot can be quite advanced compared to the functionality supplied by ordinary assembly language opcodes.

Opcodes

Opcodes are the smallest logical execution element in Parrot. An individual opcode corresponds, in an abstract kind of way, to a single machine code instruction for a particular hardware processor architecture. The difference is that Parrot's opcodes can perform complex, high-level tasks, each of which may take many execution cycles on the average hardware processor. Also, Parrot's opcodes can be loaded dynamically from a special library file called a dynop library. We'll talk about dynops a little bit later.

Opcode naming

To PIR and PASM programmers, opcodes appear to be polymorphic. That is, some opcodes appear to have multiple argument formats. This is just an illusion, however. Parrot opcodes are not polymorphic, although certain features make them appear that way. Different argument list formats are detected during parsing and translated into separate, unique opcode names.

Opcode Multiple Dispatch

Writing Opcodes

Writing opcodes, like writing PMCs, is done in a C-like language which is later compiled into C code by the opcode compiler. The opcode script is a thin overlay on top of ordinary C code: all valid C code is valid opcode script. There are a few neat additions that make writing opcodes easier. This script is very similar to the one used to define PMCs. The INTERP constant, for instance, is always available in opcodes, just as it is in VTABLE and METHOD declarations. Unlike VTABLEs and METHODs, opcodes are defined with the op keyword.

Opcodes are written in files with the .ops extension. The core operation files are stored in the src/ops/ directory.

Opcode Parameters

Each opcode can take any fixed number of input and output arguments. These arguments can be any of the four primary data types (INTVALs, FLOATVALs, STRINGs and PMCs) but can also be one of several other types of values, including LABELs, KEYs and INTKEYs.

Each parameter can be an input, an output or both, using the in, out, and inout keywords respectively. Here is an example:

  op Foo (out INT, in NUM)

This opcode could be called like this:

  $I0 = Foo $N0     # in PIR syntax
  Foo $I0, $N0      # in PASM syntax

When Parrot parses through the file and sees the Foo operation, it converts it to the real name Foo_i_n. The real name of an opcode is its name followed by an underscore-separated ordered list of the parameters to that opcode. This is how Parrot appears to use polymorphism: It translates the overloaded opcode common names into longer unique names depending on the parameter list of that opcode. Here is a list of some of the variants of the add opcode:

  add_i_i      # $I0 += $I1
  add_n_n      # $N0 += $N1
  add_p_p      # $P0 += $P1
  add_i_i_i    # $I0 = $I1 + $I2
  add_p_p_i    # $P0 = $P1 + $I0
  add_p_p_n    # $P0 = $P1 + $N0

This isn't a complete list, but you should get the picture. Each different combination of parameters translates to a different unique operation, and each operation is remarkably simple to implement. In some cases, Parrot can even use its multi-method dispatch system to call opcodes which are heavily overloaded, or for which there is no exact fit but the parameters could be coerced into different types to complete the operation. For instance, attempting to add a STRING to a PMC might coerce the string into a numerical type first, and then dispatch to the add_p_p_n opcode. This is just an example, and the exact mechanisms may change as more opcodes are added or old ones are deleted.
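The naming rule described above can be sketched in plain C. This is only an illustration of the rule, not Parrot's actual parser code: the arg_kind enum, the full_op_name function and the 's' suffix for strings are assumptions made for this example, and other argument forms are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument kinds. The one-letter suffixes follow the
 * add_i_i / add_p_p_n names listed above; 's' for strings is assumed. */
typedef enum { ARG_INT, ARG_NUM, ARG_STR, ARG_PMC } arg_kind;

static char kind_suffix(arg_kind k)
{
    switch (k) {
    case ARG_INT: return 'i';
    case ARG_NUM: return 'n';
    case ARG_STR: return 's';
    case ARG_PMC: return 'p';
    }
    return '?';
}

/* Build "<short>_<a>_<b>..." into out.
 * Returns 0 on success, or -1 if out is too small. */
int full_op_name(const char *short_name, const arg_kind *args, size_t nargs,
                 char *out, size_t out_size)
{
    size_t len, needed, i;

    assert(short_name != NULL && out != NULL);

    len = strlen(short_name);
    needed = len + 2 * nargs + 1;      /* name, "_x" per argument, NUL */
    if (needed > out_size)
        return -1;

    memcpy(out, short_name, len);
    for (i = 0; i < nargs; i++) {
        out[len++] = '_';
        out[len++] = kind_suffix(args[i]);
    }
    out[len] = '\0';
    return 0;
}
```

Called with the short name "Foo" and the kinds ARG_INT, ARG_NUM, this sketch produces "Foo_i_n", matching the example above.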

Parameters can be one of the following types:

INT
NUM
STR
PMC
KEY
INTKEY
LABEL

In addition to these types, you need to specify the direction that data is moving through that parameter:

in
out
inout
invar

Opcode Control Flow

Some opcodes have the ability to alter the control flow of the program they are in. There are a number of control behaviors that can be implemented, such as an unconditional jump in the goto opcode, a subroutine call in the call opcode, or the conditional behavior implemented by if.

At the end of each opcode you can call a goto operation to jump to the next opcode to execute. If no goto is performed, control flow will continue like normal to the next operation in the program. In this way, opcodes can easily manipulate control flow. Opcode script provides a number of keywords to alter control flow:

NEXT()

NEXT contains the address of the next opcode in memory. You don't need to call goto NEXT() explicitly, however, because the default behavior for all opcodes is to automatically jump to the next opcode in the program. You can do so if you really want to, but it gains you nothing. The NEXT keyword is frequently used in places like the invoke opcode to create a continuation to the next opcode to return to after the subroutine returns.

ADDRESS()

Jumps execution to the given address.

  goto ADDRESS(x);

Here, x should be an opcode_t * value of the opcode to jump to.

OFFSET()

Jumps to the address given as an offset from the current address.

  goto OFFSET(x);

Here, x is an offset in size_t units that represents how far forward (positive) or how far backwards (negative) to jump to.
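To make the three jump styles concrete, here is a minimal sketch in plain C of a toy interpreter in which every op returns the address of the op to run next. It is not Parrot's generated code: toy_interp, toy_run, the three toy ops and the use of long for opcode_t are all assumptions made for illustration. Falling through plays the role of goto NEXT(), a relative jump plays the role of goto OFFSET(x), and an absolute jump plays the role of goto ADDRESS(x).

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;                 /* stand-in for Parrot's opcode_t */

typedef struct {
    long int_reg[4];                   /* a handful of integer registers */
    opcode_t *code_start;              /* first word of the running program */
} toy_interp;

enum { OP_END, OP_INC, OP_BRANCH_LT, OP_JUMP, OP_COUNT };

/* Every op receives the address of its own opcode word and returns the
 * address of the next op to run; NULL stops the loop. */
typedef opcode_t *(*op_func)(opcode_t *cur, toy_interp *interp);

static opcode_t *op_end(opcode_t *cur, toy_interp *interp)
{
    (void)cur; (void)interp;
    return NULL;
}

/* inc REG: falls through, like goto NEXT() */
static opcode_t *op_inc(opcode_t *cur, toy_interp *interp)
{
    interp->int_reg[cur[1]]++;
    return cur + 2;                    /* opcode word + one operand */
}

/* branch_lt REG, LIMIT, OFF: relative jump, like goto OFFSET(x) */
static opcode_t *op_branch_lt(opcode_t *cur, toy_interp *interp)
{
    if (interp->int_reg[cur[1]] < cur[2])
        return cur + cur[3];           /* the offset may be negative */
    return cur + 4;                    /* condition false: fall through */
}

/* jump INDEX: absolute jump, like goto ADDRESS(x) */
static opcode_t *op_jump(opcode_t *cur, toy_interp *interp)
{
    return interp->code_start + cur[1];
}

/* Order must match the enum above. */
static const op_func op_table[OP_COUNT] = {
    op_end, op_inc, op_branch_lt, op_jump
};

/* Run code until an op returns NULL. */
void toy_run(toy_interp *interp, opcode_t *code)
{
    opcode_t *pc = code;
    interp->code_start = code;
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);
        pc = op_table[*pc](pc, interp);
    }
}
```

For example, the program { OP_INC, 0, OP_BRANCH_LT, 0, 3, -2, OP_END } increments register 0 and branches two words backwards while the register is below 3, so the loop stops with the register holding 3.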

POP()

POP pops the next opcode address off the control stack. To put an address onto the control stack, use the PUSH keyword instead. PUSH takes a single opcode_t * argument to store, and POP returns a single opcode_t * value.
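The way PUSH and POP cooperate can be sketched with a small address stack in plain C. This is a toy model under stated assumptions, not Parrot's control stack: control_stack, cs_push, cs_pop, the toy call and ret ops, and the fixed stack depth are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;                 /* stand-in for Parrot's opcode_t */

#define CONTROL_STACK_MAX 16

typedef struct {
    opcode_t *slots[CONTROL_STACK_MAX];
    size_t depth;
} control_stack;

/* Models PUSH(x): remember an opcode address. */
static void cs_push(control_stack *cs, opcode_t *addr)
{
    assert(cs->depth < CONTROL_STACK_MAX);
    cs->slots[cs->depth++] = addr;
}

/* Models POP(): take back the most recently pushed address. */
static opcode_t *cs_pop(control_stack *cs)
{
    assert(cs->depth > 0);
    return cs->slots[--cs->depth];
}

enum { OP_END, OP_INC, OP_CALL, OP_RET };

/* Run a program made of the four toy ops; returns the counter value. */
long run_with_control_stack(opcode_t *code)
{
    control_stack cs = { {NULL}, 0 };
    opcode_t *pc = code;
    long counter = 0;

    while (pc != NULL) {
        switch (*pc) {
        case OP_INC:                   /* inc: bump counter, fall through */
            counter++;
            pc += 1;
            break;
        case OP_CALL:                  /* call INDEX */
            cs_push(&cs, pc + 2);      /* push the address of the next op */
            pc = code + pc[1];         /* then jump to the target */
            break;
        case OP_RET:                   /* ret */
            pc = cs_pop(&cs);          /* pop and resume there */
            break;
        default:                       /* OP_END or unknown: stop */
            pc = NULL;
            break;
        }
    }
    return counter;
}
```

With the program { OP_CALL, 5, OP_CALL, 5, OP_END, OP_INC, OP_RET }, the two-word routine at index 5 is entered twice and returns each time to the op after its caller, so the counter ends at 2.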

The Opcode Compiler

As we've seen in our discussions above, ops have a number of transformations to go through before they can become C code and be compiled into Parrot. The various special variables like $1, INTERP and ADDRESS need to be converted to normal variable values. Also, each runcore requires the ops to be compiled into a different format: the slow and fast cores need the ops to be compiled into individual subroutines. The switch core needs all the ops to be compiled into a single function using a large switch statement. The computed goto cores require the ops to be compiled into a large function with a large array of label addresses.

Parrot's opcode compiler is a tool that's tasked with taking raw opcode files with a .ops extension and converting them into several different formats, all of which need to be syntactically correct C code for compilation.
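The difference between two of these output shapes can be sketched in plain C. The toy ops, the accumulator and the function names below are assumptions for illustration and are much simpler than what the opcode compiler actually emits. The computed goto shape is left out because it relies on a compiler extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;                 /* stand-in for Parrot's opcode_t */

enum { OP_END, OP_ADD, OP_MUL, OP_COUNT };

/* Shape 1: one C function per op, driven by a table lookup. */
typedef opcode_t *(*op_func)(opcode_t *cur, long *acc);

static opcode_t *fn_end(opcode_t *cur, long *acc)
{
    (void)cur; (void)acc;
    return NULL;
}

static opcode_t *fn_add(opcode_t *cur, long *acc)
{
    *acc += cur[1];
    return cur + 2;
}

static opcode_t *fn_mul(opcode_t *cur, long *acc)
{
    *acc *= cur[1];
    return cur + 2;
}

/* Order must match the enum above. */
static const op_func op_table[OP_COUNT] = { fn_end, fn_add, fn_mul };

long run_function_core(opcode_t *code)
{
    long acc = 0;
    opcode_t *pc = code;
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);
        pc = op_table[*pc](pc, &acc);
    }
    return acc;
}

/* Shape 2: the same op bodies inlined into one function with a switch. */
long run_switch_core(opcode_t *code)
{
    long acc = 0;
    opcode_t *pc = code;
    for (;;) {
        switch (*pc) {
        case OP_ADD: acc += pc[1]; pc += 2; break;
        case OP_MUL: acc *= pc[1]; pc += 2; break;
        case OP_END: return acc;
        default:     assert(!"unknown opcode"); return acc;
        }
    }
}
```

Both functions execute the same program and give the same result. The op bodies are identical; only the dispatch around them changes, which is why a single .ops source can feed several runcores.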

Dynops

Parrot has about 1200 built-in opcodes. These represent operations which are sufficiently simple and fundamental, but at the same time are very common. Of course, not all of those 1200 ops are unique; many of them are overloaded variants of one another. As an example, there are about 36 variants of the set opcode, to account for all the different types of values you may want to set to all the various kinds of registers. The number of unique operations is therefore much smaller than 1200. However, the built-in ops do not cover all the possible operations that some programmers are going to want to use.

This is where dynops come in. Dynops are dynamically-loadable libraries of ops that can be written and compiled separately from Parrot and loaded in at runtime. Dynops, along with dynpmcs and runtime libraries, are some of the primary ways that Parrot can be extended.

Parrot ships with a small number of example dynops libraries in the "dynoplibs/" directory in src. These are small libraries of mostly nonsensical but demonstrative opcodes that can be used as examples to follow.

Dynops can be written in a .ops file like the normal built-in ops are. The ops file should use #include "parrot/extend.h" in addition to any other libraries the ops need. They can be compiled into C using the opcode compiler, then compiled into a shared library using a normal C compiler. Once compiled, the dynops can be loaded into Parrot using the .loadlib directive.
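The idea of extending the op set at runtime can be modelled with a small registry in plain C. This is only a conceptual sketch: op_info, op_lib, op_table_load, op_table_find and the max_i_i_i op are hypothetical names invented here, and the sketch appends an in-memory table instead of opening a real shared library the way Parrot does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long (*op_body)(long a, long b);

typedef struct {
    const char *name;                  /* full op name, e.g. "add_i_i_i" */
    op_body     body;
} op_info;

/* What a (hypothetical) op library hands over when it is loaded. */
typedef struct {
    const char    *lib_name;
    const op_info *ops;
    size_t         op_count;
} op_lib;

#define OP_TABLE_MAX 32

typedef struct {
    op_info entries[OP_TABLE_MAX];
    size_t  count;
} op_table;

/* Append every op from lib to the table.
 * Returns the number of ops added, or -1 if there is no room. */
int op_table_load(op_table *t, const op_lib *lib)
{
    size_t i;

    assert(t != NULL && lib != NULL);
    if (t->count + lib->op_count > OP_TABLE_MAX)
        return -1;
    for (i = 0; i < lib->op_count; i++)
        t->entries[t->count++] = lib->ops[i];
    return (int)lib->op_count;
}

/* Look an op up by its full name; NULL if it has not been loaded. */
const op_info *op_table_find(const op_table *t, const char *name)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return &t->entries[i];
    return NULL;
}

static long core_add(long a, long b) { return a + b; }
static long dyn_max(long a, long b)  { return a > b ? a : b; }

static const op_info core_ops[] = { { "add_i_i_i", core_add } };
static const op_info demo_ops[] = { { "max_i_i_i", dyn_max } };

const op_lib core_lib = { "core", core_ops, 1 };
const op_lib demo_lib = { "demo_ops", demo_ops, 1 };
```

Loading core_lib first and demo_lib later mirrors the order of events in the text: the built-in ops exist from the start, and max_i_i_i only becomes findable once its library has been loaded.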