Dynamic Opcodes
The smallest executable component is not the compilation unit or even the subroutine, but the opcode. Opcodes in Parrot, like opcodes in other machines (both virtual and physical), are individual instructions that implement low-level operations in the machine. In the world of microprocessors, the word "opcode" typically refers to the numeric identifier for each instruction. The human-readable word used in the associated assembly language is called the "mnemonic". An assembler, among other tasks, is responsible for converting mnemonics into opcodes for execution. In Parrot, instead of referring to an instruction by different names depending on what form it's in, we just call them all "opcodes". Of course, the list of things that qualify as "low-level" in Parrot can be pretty advanced compared to the functionality supplied by regular assembly language opcodes.
Opcodes
Opcodes are the smallest logical execution element in Parrot. An individual opcode corresponds, in an abstract kind of way, with a single machine code instruction for a particular hardware processor architecture. The difference is that Parrot's opcodes can perform some very complex and high-level tasks that each may take many execution cycles for the average hardware processor. Also, Parrot's opcodes can be dynamically loaded in from a special library file called a dynop library. We'll talk about dynops a little bit later.
Opcode naming
To the PIR and PASM programmers, opcodes appear to be polymorphic. That is, some opcodes appear to have multiple argument formats. This is just an illusion, however. Parrot opcodes are not polymorphic, although certain features enable it to appear that way. Different argument list formats are detected during parsing and translated into separate, and unique, opcode names.
Opcode Multiple Dispatch
Writing Opcodes
Writing opcodes, like writing PMCs, is done in a C-like language which is later compiled into C code by the opcode compiler. The opcode script represents a thin overlay on top of ordinary C code: all valid C code is valid opcode script. There are a few neat additions that make writing opcodes easier. This script is very similar to that used to define PMCs. The INTERP constant, for instance, is always available in opcodes, just as it is in VTABLE and METHOD declarations. Unlike VTABLEs and METHODs, opcodes are defined with the op keyword.
Opcodes are written in files with the .ops extension. The core operation files are stored in the src/ops/ directory.
Opcode Parameters
Each opcode can take any fixed number of input and output arguments. These arguments can be any of the four primary data types--INTVALs, PMCs, NUMBERS and STRINGs--but can also be one of several other types of values including LABELs, KEYs and INTKEYs.
Each parameter can be an input, an output or both, using the in, out, and inout keywords respectively.
Here is an example:
op Foo (out INT, in NUM)
This opcode could be called like this:
$I0 = Foo $N0    # in PIR syntax
Foo $I0, $N0     # in PASM syntax
When Parrot parses through the file and sees the Foo operation, it converts it to the real name Foo_i_n. The real name of an opcode is its name followed by an underscore-separated ordered list of the parameters to that opcode. This is how Parrot appears to use polymorphism: it translates the overloaded opcode common names into longer unique names depending on the parameter list of that opcode. Here is a list of some of the variants of the add opcode:
add_i_i      # $I0 += $I1
add_n_n      # $N0 += $N1
add_p_p      # $P0 += $P1
add_i_i_i    # $I0 = $I1 + $I2
add_p_p_i    # $P0 = $P1 + $I0
add_p_p_n    # $P0 = $P1 + $N0
This isn't a complete list, but you should get the picture. Each different combination of parameters translates to a different unique operation, and each operation is remarkably simple to implement. In some cases, Parrot can even use its multi-method dispatch system to call opcodes which are heavily overloaded, or for which there is no exact fit but the parameters could be coerced into different types to complete the operation. For instance, attempting to add a STRING to a PMC might coerce the string into a numerical type first, and then dispatch to the add_p_p_n opcode. This is just an example, and the exact mechanisms may change as more opcodes are added or old ones are deleted.
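The naming rule described above can be sketched in plain C. This is only an illustration, not Parrot's own code: the param_kind enum and the kind_suffix and op_full_name helpers are hypothetical names invented here, only four of the parameter types are covered, and while the i, n and p codes appear in the examples above, the s code for STR is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter kinds; only a subset of the types Parrot supports. */
typedef enum { P_INT, P_NUM, P_STR, P_PMC } param_kind;

/* One-letter code per kind. "i", "n" and "p" appear in the text;
 * "s" for strings is an assumption made for this sketch. */
static const char *kind_suffix(param_kind k)
{
    switch (k) {
    case P_INT: return "i";
    case P_NUM: return "n";
    case P_STR: return "s";
    case P_PMC: return "p";
    }
    return "?";
}

/* Build "<name>_<k1>_<k2>..." in buf.
 * Returns 0 on success, or -1 if buf is too small. */
static int op_full_name(char *buf, size_t size, const char *name,
                        const param_kind *params, size_t nparams)
{
    size_t used;
    size_t i;

    assert(buf != NULL && name != NULL);

    used = strlen(name);
    if (used + 1 > size)
        return -1;
    memcpy(buf, name, used + 1);            /* copies the trailing NUL too */

    for (i = 0; i < nparams; i++) {
        const char *suffix = kind_suffix(params[i]);
        size_t need = 1 + strlen(suffix);   /* '_' plus the type code */

        if (used + need + 1 > size)
            return -1;
        buf[used] = '_';
        memcpy(buf + used + 1, suffix, strlen(suffix) + 1);
        used += need;
    }
    return 0;
}
```

With these definitions, the name Foo with the kinds P_INT and P_NUM produces Foo_i_n, and add with P_PMC, P_PMC, P_INT produces add_p_p_i, matching the names listed above.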
Parameters can be one of the following types:
- INT
- NUM
- STR
- PMC
- KEY
- INTKEY
- LABEL
In addition to these types, you need to specify the direction that data is moving through that parameter:
- in
- out
- inout
- invar
Opcode Control Flow
Some opcodes have the ability to alter control flow of the program they are in. There are a number of control behaviors that can be implemented, such as an unconditional jump in the goto opcode, or a subroutine call in the call opcode, or the conditional behavior implemented by if.
At the end of each opcode you can call a goto operation to jump to the next opcode to execute. If no goto is performed, control flow will continue like normal to the next operation in the program. In this way, opcodes can easily manipulate control flow. Opcode script provides a number of keywords to alter control flow:
- NEXT()
- ADDRESS()
- OFFSET()
- POP()
- PUSH()
NEXT contains the address of the next opcode in memory. You don't need to call goto NEXT(), however, because the default behavior for all opcodes is to automatically jump to the next opcode in the program. You can do this if you really want to, but it wouldn't help you any. The NEXT keyword is frequently used in places like the invoke opcode to create a continuation to the next opcode to return to after the subroutine returns.
ADDRESS jumps execution to the given address.
ADDRESS(x);
Here, x should be an opcode_t * value of the opcode to jump to.
OFFSET jumps to the address given as an offset from the current address.
OFFSET(x)
Here, x is an offset in size_t units that represents how far forward (positive) or how far backwards (negative) to jump.
POP pops the next opcode address off the control stack. To put an address onto the control stack, use the PUSH keyword instead. PUSH takes a single opcode_t * argument to store, and POP returns a single opcode_t * value.
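The control-flow keywords can be pictured with a toy interpreter in plain C. This is a sketch under loud assumptions, not Parrot's implementation: the toy_interp struct, the five toy opcodes and the run function are all invented here, and opcode_t is just a stand-in typedef. Each op is a function that returns the address of the next opcode to execute, which mimics the implicit jump to NEXT, a relative jump like OFFSET, and a call and return built on PUSH and POP. ADDRESS is not modelled separately; returning a stored pointer in the ret op is the closest analogue.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* toy stand-in, not Parrot's real definition */

/* Hypothetical toy opcodes: jump and call take one offset operand. */
enum { OP_END, OP_INC, OP_JUMP, OP_CALL, OP_RET };

#define STACK_MAX 16

typedef struct {
    long      counter;            /* a single toy register */
    opcode_t *stack[STACK_MAX];   /* toy control stack */
    int       depth;
} toy_interp;

typedef opcode_t *(*op_func)(opcode_t *pc, toy_interp *interp);

/* end: returning NULL stops the run loop. */
static opcode_t *op_end(opcode_t *pc, toy_interp *interp)
{
    (void)pc; (void)interp;
    return NULL;
}

/* inc: no explicit jump, so continue with the following op (like NEXT). */
static opcode_t *op_inc(opcode_t *pc, toy_interp *interp)
{
    interp->counter++;
    return pc + 1;
}

/* jump <offset>: relative jump from the current op (like OFFSET). */
static opcode_t *op_jump(opcode_t *pc, toy_interp *interp)
{
    (void)interp;
    return pc + pc[1];
}

/* call <offset>: save the return address (like PUSH of NEXT), then jump. */
static opcode_t *op_call(opcode_t *pc, toy_interp *interp)
{
    assert(interp->depth < STACK_MAX);
    interp->stack[interp->depth++] = pc + 2;
    return pc + pc[1];
}

/* ret: resume at the most recently saved address (like POP). */
static opcode_t *op_ret(opcode_t *pc, toy_interp *interp)
{
    (void)pc;
    assert(interp->depth > 0);
    return interp->stack[--interp->depth];
}

static const op_func op_table[] = { op_end, op_inc, op_jump, op_call, op_ret };

/* Run a program until the end op and report the counter value. */
static long run(opcode_t *program)
{
    toy_interp interp = {0};
    opcode_t *pc = program;

    while (pc != NULL) {
        assert(*pc >= OP_END && *pc <= OP_RET);   /* reject unknown opcodes */
        pc = op_table[*pc](pc, &interp);
    }
    return interp.counter;
}
```

In this sketch a program is simply an array of opcode_t values, such as { OP_INC, OP_JUMP, 3, OP_INC, OP_END }, where the jump skips the second inc and run returns 1.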
The Opcode Compiler
As we've seen in our discussions above, ops have a number of transformations to go through before they can become C code and be compiled into Parrot. The various special variables like $1, INTERP and ADDRESS need to be converted to normal variable values. Also, each runcore requires the ops be compiled into a different format: the slow and fast cores need the ops to be compiled into individual subroutines. The switch core needs all the ops to be compiled into a single function using a large switch statement. The computed goto cores require the ops be compiled into a large function with a large array of label addresses.
Parrot's opcode compiler is a tool that's tasked with taking raw opcode files with a .ops extension and converting them into several different formats, all of which need to be syntactically correct C code for compilation.
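To make the difference between these output formats concrete, here is a small C sketch of the same two toy ops emitted in two shapes: one function per op, and one big switch statement. Everything in it is hypothetical (the ops inc and add, the accumulator, the function names); it is not what Parrot's opcode compiler actually generates. The computed goto form is left out because it relies on a compiler extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* toy stand-in, not Parrot's real definition */

/* Hypothetical toy ops: "inc" takes no operand, "add" takes one constant. */
enum { OP_END, OP_INC, OP_ADD };

/* Form 1: each op is its own C function returning the next opcode address. */
typedef opcode_t *(*op_func)(opcode_t *pc, long *acc);

static opcode_t *fn_inc(opcode_t *pc, long *acc) { (*acc)++;      return pc + 1; }
static opcode_t *fn_add(opcode_t *pc, long *acc) { *acc += pc[1]; return pc + 2; }

static const op_func fn_table[] = { NULL, fn_inc, fn_add };

static long run_functions(opcode_t *pc)
{
    long acc = 0;

    while (*pc != OP_END) {
        assert(*pc > OP_END && *pc <= OP_ADD);   /* reject unknown opcodes */
        pc = fn_table[*pc](pc, &acc);
    }
    return acc;
}

/* Form 2: the same op bodies pasted into the cases of one switch. */
static long run_switch(opcode_t *pc)
{
    long acc = 0;

    for (;;) {
        switch (*pc) {
        case OP_INC: acc++;        pc += 1; break;
        case OP_ADD: acc += pc[1]; pc += 2; break;
        case OP_END:
        default:     return acc;   /* stop on end or on an unknown opcode */
        }
    }
}
```

Both runners execute the same program array and compute the same result; only the dispatch mechanism differs, which is why a single .ops source has to be translated more than once.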
Dynops
Parrot has about 1200 built-in opcodes. These represent operations which are sufficiently simple and fundamental, but at the same time are very common. However, these do not represent all the possible operations that some programmers are going to want to use. Of course, not all of those 1200 ops are unique; many of them are overloaded variants of one another. As an example, there are about 36 variants of the set opcode, to account for all the different types of values you may want to set to all the various kinds of registers. The number of unique operations is therefore much smaller than 1200.
This is where dynops come in. Dynops are dynamically-loadable libraries of ops that can be written and compiled separately from Parrot and loaded in at runtime. Dynops, along with dynpmcs and runtime libraries, are some of the primary ways that Parrot can be extended.
Parrot ships with a small number of example dynops libraries in the "dynoplibs/" directory in src. These are small libraries of mostly nonsensical but demonstrative opcodes that can be used as an example to follow.
Dynops can be written in a .ops file like the normal built-in ops are. The ops file should use #include "parrot/extend.h" in addition to any other libraries the ops need. They can be compiled into C using the opcode compiler, then compiled into a shared library using a normal C compiler. Once compiled, the dynops can be loaded into Parrot using the .loadlib directive.