[DRAFT] PDD 6: Parrot Assembly Language (PASM)
Abstract
The format of Parrot's bytecode assembly language.
Description
Parrot's bytecode can be thought of as a form of machine language for a virtual super CISC machine. It makes sense, then, to define an assembly language for it for those people who may need to generate bytecode directly, rather than indirectly through a high-level language.
{{ NOTE: out-of-date and incomplete. It seems that it would be more useful as a specification of the format of PASM than as a comprehensive listing of all opcodes. }}
Questions
- Can we get rid of PASM? (IRC discussion)
  <barney> Can we get rid of PASM?
  <spinclad> conversely, does PASM need to be kept up to date?
  <allison> PASM is just a text form of PBC, so it should be kept
  <allison> are there specific PBC features that can't currently be represented in PASM?
  <particle> besides hll and :outer?
  <chromatic> :init
  <mdiep> lexicals?
  <chromatic> :vtable
  <mdiep> I'm a bit rusty, but anything that starts with a '.' or ':' is suspect
  <allison> things that start with '.' are just directives to IMCC, equally applicable to PASM and PIR
  <mdiep> isn't PASM separate from IMCC?
  <allison> mdiep: it used to be separate
  <mdiep> so to say that PASM can have directives is a major architectural change
  <allison> perhaps the biggest thing we need is a definition of what PASM actually is
  <allison> the line has grown quite fuzzy over the years
  <barney> PASM could be defined as stringified PBC
  <particle> compilable stringified pbc
  <mdiep> it should be defined that way if we're going to call it assembly.
  <allison> barney: that's the most likely direction, and if so, it has some implications for how PASM behaves
  <particle> allison: which is what we want, anyway, right?
  <allison> particle: yup
  <barney> yes
  <particle> good, looks like we're in agreement and headed in the proper direction on that topic.
Implementation
Parrot opcodes take the format of:
code destination[dest_key], source1[source1_key], source2[source2_key]
The brackets are literal brackets, not notation for optional arguments, although a key (together with its brackets) may be left out entirely. If any argument of an instruction has a key, the assembler substitutes the null key for the arguments that lack one.
Conditional branches take the format:
code boolean[bool_key], true_dest
The key parameters are optional, and may be either an integer or a string. If either is passed they are associated with the parameter to their left, and are assumed to be either an array/list entry number, or a hash key. Any time a source or destination can be a PMC register, there may be a key.
Destinations for conditional branches are an integer offset from the current PC.
All registers have a type prefix of P, S, I, or N, for PMC, string, integer, and number respectively.
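The operand rules above can be sketched in Python. This is an illustrative model only, not the assembler's implementation: the names parse_operand, fill_null_keys, and NULL_KEY are hypothetical, and the sketch assumes that keys are written either as integer literals or as double-quoted strings. It does not check that a keyed operand is actually a PMC register.

```python
import re

# Hypothetical sentinel standing in for the "null key".
NULL_KEY = object()

# A register is a type prefix (P, S, I, or N) plus a number, optionally
# followed by a key in literal brackets, e.g. P0[5] or P1["name"].
_OPERAND = re.compile(r'^([PSIN])(\d+)(?:\[(\d+|"[^"]*")\])?$')


def parse_operand(text):
    """Return (register_type, register_number, key); key is None if absent."""
    match = _OPERAND.match(text.strip())
    if match is None:
        raise ValueError(f"not a register operand: {text!r}")
    reg_type, number, key = match.group(1), int(match.group(2)), match.group(3)
    if key is not None:
        # Integer keys are array indexes; string keys are hash keys.
        key = int(key) if key.isdigit() else key[1:-1]
    return reg_type, number, key


def fill_null_keys(operands):
    """If any operand has a key, give the null key to those without one."""
    if any(key is not None for _, _, key in operands):
        return [(t, n, NULL_KEY if k is None else k) for t, n, k in operands]
    return list(operands)
```

For instance, parsing the operands P0[5] and P1 and passing them through fill_null_keys leaves P0 with the integer key 5 and gives P1 the null key, while an instruction with no keyed operands is left untouched.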
Assembly Syntax
All assembly opcodes contain only ASCII lowercase letters, digits, and the underscore.
Assembler directives are prefixed with a dot. These directives are instructions for the assembler and may or may not translate to a PASM instruction.
Labels all end with a colon. They may have ASCII letters, numbers, and underscores in them.
Namespaces are noted with the .namespace directive. It takes a single parameter, the name of the namespace, in the form of a multi-dimensional key.
Constants can be declared with the .macro_const directive. It takes two parameters: the name of the constant and the value.
Subroutine names are noted with the .pcc_sub directive. It takes a single parameter, the name of the subroutine, which is added to the namespace's symbol table. Sub names may contain any valid Unicode alphanumeric characters and the underscore. The .pcc_sub directive may take flags to indicate when the sub should be invoked. The following flags are available:
- :main Execution should start at the specified subroutine.
- :immediate or :postcomp The sub should be run immediately after compilation.
- :load The sub should be executed when its bytecode segment is loaded.
- :init The sub should be run when the file is run directly.
Constants don't need to be named and put in a separate section of the assembly source. The assembler will take care of putting them in the appropriate part of the generated bytecode.
Below is an overview of the grammar of a PASM file.
{{ See compilers/pirc/src for a bison-based implementation of PASM }}
 pasm_file:
   [ pasm_line '\n' ]*
 pasm_line:
     pasm_instruction
   | constant_directive
   | namespace_directive
 pasm_instruction:
   [ [ sub_directive ]? label ]? instruction
 sub_directive:
   ".pcc_sub" [ sub_flag ]?
 sub_flag:
   ":init" | ":main" | ":load" | ":postcomp" | ":immediate" | ":anon"
 label:
   identifier ":"
 constant_directive:
   ".macro_const" identifier literal
 namespace_directive:
   ".namespace" "[" multi_dimensional_key "]"
 multi_dimensional_key:
   quoted_string [ ";" quoted_string ]*
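The grammar overview above can be approximated with a small line classifier. The following Python sketch is an assumption-laden illustration, not the pirc or IMCC parser: the function name classify and the regular expressions are hypothetical, identifiers are restricted to ASCII and may not start with a digit, and opcode names are matched as lowercase letters, digits, and underscores as stated under Assembly Syntax.

```python
import re

SUB_FLAGS = {":init", ":main", ":load", ":postcomp", ":immediate", ":anon"}
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"  # assumption: ASCII-only identifiers

_CONSTANT = re.compile(rf"^\.macro_const\s+({_IDENT})\s+(\S.*)$")
_NAMESPACE = re.compile(
    r'^\.namespace\s*\[\s*("[^"]*"(?:\s*;\s*"[^"]*")*)\s*\]$'
)
# [ [ sub_directive ]? label ]? instruction
_INSTRUCTION = re.compile(
    rf"^(?:(?:\.pcc_sub(?:\s+(:\w+))?\s+)?({_IDENT}):\s*)?"
    r"([a-z0-9_]+)(?:\s+(.*))?$"
)


def classify(line):
    """Classify one PASM source line according to the grammar overview."""
    line = line.strip()
    match = _CONSTANT.match(line)
    if match:
        return ("constant", match.group(1), match.group(2))
    match = _NAMESPACE.match(line)
    if match:
        # Pull each quoted component out of the multi-dimensional key.
        return ("namespace", re.findall(r'"([^"]*)"', match.group(1)))
    match = _INSTRUCTION.match(line)
    if match:
        flag, label, opcode, args = match.groups()
        if flag is not None and flag not in SUB_FLAGS:
            raise ValueError(f"unknown sub flag: {flag}")
        return ("instruction", flag, label, opcode, args)
    raise ValueError(f"not a valid PASM line: {line!r}")
```

As in the grammar, a .pcc_sub directive is only accepted in front of a label, and a label must be followed by an instruction on the same line.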
Opcode List
In the following list, there may be multiple (but unlisted) versions of an opcode. If an opcode takes a register that might be keyed, the keyed version of the opcode has a _k suffix. If an opcode might take multiple types of registers for a single parameter, the opcode function really has a _x suffix, where x is either P, S, I, or N, depending on whether a PMC, string, integer, or numeric register is involved. The suffix is not required (though supplying it is not an error), as the assembler can infer the information from the code.
In those cases where an opcode can take several types of registers, and more than one of the sources or destinations are of variable type, then the register is passed in extended format. An extended format register number is of the form:
register_number | register_type
where register_type is 0x100, 0x200, 0x400, or 0x800 for PMC, string, integer, or number respectively. So N19 would be 0x813 (0x800 | 0x13), and I19 would be 0x413.
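The extended register format can be worked through in a few lines of Python. This is a sketch under stated assumptions: the helper names encode_register and decode_register are hypothetical, and the decoder assumes that register numbers fit in the low eight bits (below the 0x100 type flag). With the type flags listed above, N19 encodes as 0x800 | 0x13 = 0x813 and I19 as 0x400 | 0x13 = 0x413.

```python
# Type flags from the text: PMC, string, integer, number.
REGISTER_TYPE = {"P": 0x100, "S": 0x200, "I": 0x400, "N": 0x800}


def encode_register(name):
    """Encode a register name such as 'N19' as register_number | register_type."""
    reg_type, number = name[0], int(name[1:])
    return number | REGISTER_TYPE[reg_type]


def decode_register(value):
    """Inverse of encode_register; assumes the number fits in the low 8 bits."""
    for prefix, flag in REGISTER_TYPE.items():
        if value & flag:
            return f"{prefix}{value & 0xFF}"
    raise ValueError(f"no register type flag set in {value:#x}")
```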
Note: Instructions tagged with a * will call a vtable method to handle the instruction if used on PMC registers.
In all cases, the letters x, y, and z refer to register numbers. The letter t refers to a generic register (P, S, I, or N). A lowercase p, s, i, or n means either a register or constant of the appropriate type (PMC, string, integer, or number).
Control flow
The control flow opcodes check conditions and manage program flow.
- if tx, ix Check register tx. If true, branch by the offset ix.
- unless tx, ix Check register tx. If false, branch by the offset ix.
- jump tx Jump to the address held in register x (Px, Sx, or Ix).
- branch tx Branch forward or backward by the amount in register x. (X may be either Ix, Nx, or Px) Branch offset may also be an integer constant.
- jsr tx Jump to the location specified by register X. Push the current location onto the call stack for later returning.
- bsr ix Branch to the location specified by X (either register or label). Push the current location onto the call stack for later returning.
- ret Pop the location off the top of the stack and go there.
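The difference between relative branches, absolute jumps, and the bsr/ret pair can be modelled with a program counter and a call stack. The Python sketch below is only a model of the descriptions above: the class ControlFlow and the op_size parameter (the length of the current instruction) are hypothetical, and it assumes that the location pushed by bsr is the instruction following the bsr.

```python
class ControlFlow:
    """Toy model of the PC and call stack used by the control flow ops."""

    def __init__(self):
        self.pc = 0
        self.call_stack = []

    def branch(self, offset):
        # Relative: move forward or backward from the current PC.
        self.pc += offset

    def jump(self, address):
        # Absolute: go to the given address.
        self.pc = address

    def if_(self, value, offset, op_size):
        # Conditional branch: take the offset if true, else fall through.
        self.pc += offset if value else op_size

    def bsr(self, offset, op_size):
        # Assumption: the return location is the instruction after the bsr.
        self.call_stack.append(self.pc + op_size)
        self.pc += offset

    def ret(self):
        # Pop the saved location and resume there.
        self.pc = self.call_stack.pop()
```

Starting at PC 10, a bsr with offset 20 and an instruction length of 2 moves the PC to 30 and saves 12; a subsequent ret returns to 12.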
Data manipulation
These ops handle manipulating the data in registers
- new Px, iy Create a new PMC of class y stored in PMC register x.
- destroy Px Destroy the PMC in register X, leaving it undef
- set tx, ty Copies y into x. Note that strings and PMCs are referred to by pointer, so if you do something like:
     set S0, S1
  this will copy the pointer in S1 into S0, leaving both registers pointing at the same string.
- exchange tx, ty Exchange the contents of registers X and Y, which must be of the same type. (Generally cheaper than using the stack as an intermediary when setting up registers for function calls)
- assign Px, ty Takes the contents of Y and assigns them into the existing PMC in X. While set just copies pointers from one register to another, assign actually does a value assignment, as:
     $foo = $bar;
  X's assign vtable method is invoked and it does whatever is appropriate.
- clone Px, Py
- clone Sx, Sy Performs a "deeper" copy of y into x, using the vtable appropriate to the class of Py if cloning a PMC.
- tostring Sx, ty, Iz Take the value in register y and convert it to a string of type z, storing the result in string register x.
- add tx, ty, tz * Add registers y and z and store the result in register x. (x = y + z) The registers must all be the same type, PMC, integer, or number.
- sub tx, ty, tz * Subtract register z from register y and store the result in register x. (x = y - z) The registers must all be the same type, PMC, integer, or number.
- mul tx, ty, tz * Multiply register y by register z and store the results in register x. The registers must be the same type.
- div tx, ty, tz * Divide register y by register z, and store the result in register x.
- inc tx, nn * Increment register x by nn. nn is an integer constant. If nn is omitted, the increment is 1.
- dec tx, nn * Decrement register x by nn. nn is an integer constant. If nn is omitted, the decrement is 1.
- length Ix, Sy Put the length of string y into integer register x.
- concat Sx, Sy Add string y to the end of string x.
- repeat Sx, Sy, iz Copies string y z times into string x.
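The distinction between set (copying a pointer) and assign (value assignment into an existing PMC) can be illustrated with Python object references. This is an analogy rather than Parrot code: IntegerPMC, op_set, and op_assign are hypothetical names, and the assign method stands in for the PMC's assign vtable method.

```python
class IntegerPMC:
    """Hypothetical PMC whose assign method plays the role of the vtable entry."""

    def __init__(self, value=0):
        self.value = value

    def assign(self, source):
        # Value assignment: the existing object is updated in place.
        self.value = source.value if isinstance(source, IntegerPMC) else int(source)


def op_set(registers, x, y):
    # set: both registers now refer to the same object.
    registers[x] = registers[y]


def op_assign(registers, x, y):
    # assign: the PMC already in x keeps its identity but takes y's value.
    registers[x].assign(registers[y])


registers = {"P0": IntegerPMC(1), "P1": IntegerPMC(2)}
op_assign(registers, "P0", "P1")   # P0 and P1 are distinct objects, both 2
op_set(registers, "P0", "P1")      # P0 and P1 are now the same object
```

After op_assign the two registers hold equal but separate PMCs, so later changes to one do not affect the other; after op_set any change made through one register is visible through the other.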
Transcendental operations
These opcodes handle the transcendental math functions. The destination register here must always be either a numeric or a PMC register.
- sin nx, ty Return the sine of the number in Y
- cos nx, ty Return the cosine of the number in Y
- tan nx, ty Return the tangent of the number in Y
- sec nx, ty Return the secant of the number in Y
- atan nx, ty Return the arctangent of Y
- atan2 nx, ty Return the result of atan2 of Y
- asin nx, ty Return the arcsine of y
- acos nx, ty Return the arccosine of y
- asec nx, ty Return the arcsecant of y
- cosh nx, ty Return the hyperbolic cosine of y
- sinh nx, ty Return the hyperbolic sine of y
- tanh nx, ty Return the hyperbolic tangent of y
- sech nx, ty Return the hyperbolic secant of y
- log2 nx, ty Return the base 2 log of y
- log10 nx, ty Return the base 10 log of y
- ln Nx, ty Return the base e log of y
- log nx, ty, tz Return the base Z log of Y
- pow nx, ty, tz Return Y to the Z power
- exp nx, ty Return e to the Y power
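Several of the functions listed above (sec, asec, sech, and the arbitrary-base log) have no direct counterpart in many math libraries. The Python sketch below expresses them through standard identities; it is a reference for the mathematical meaning only and makes no claim about how Parrot computes them. The function names are chosen for this example.

```python
import math


def sec(y):
    """Secant: the reciprocal of the cosine."""
    return 1.0 / math.cos(y)


def sech(y):
    """Hyperbolic secant: the reciprocal of the hyperbolic cosine."""
    return 1.0 / math.cosh(y)


def asec(y):
    """Arcsecant: the angle whose secant is y (requires |y| >= 1)."""
    return math.acos(1.0 / y)


def log_base(y, z):
    """Base-z logarithm of y, via the change-of-base identity."""
    return math.log(y) / math.log(z)
```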
Register and stack ops
These opcodes deal with registers and stacks
- clearp Clean out the current set of PMC registers, setting them to NULL
- cleari Clean out the current set of I registers, setting them to 0
- clears Clean out the current set of S registers, setting them to NULL
- clearn Clean out the current set of N registers, setting them to 0
- null tx Set register X to a null value; for S and P registers, this will be NULL, while for I and N registers it is 0
- save tx Push register or constant X onto the generic stack
- restore tx Restore register X from the generic stack by popping off the topmost entry. The type of this entry must match the register type.
- entrytype Ix, iy Put the type of generic stack entry Y into integer register X
- depth Ix Get the current depth of the generic stack
- lookback tx, iy Fetch the entry that's at position Y from the top of the generic stack. This does not remove an entry from the stack, merely fetches the entry off it. 0 is the entry at the top of the stack, 1 is the entry immediately previous to that, and so on. Entry -1 is the very bottom-most entry in the stack. (While the stack may be a tree when looked at from the bottom up, you don't have access to any other branches when looking this way).
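The indexing convention used by lookback can be sketched with a Python list whose last element is the top of the stack. The function below is a hypothetical model, not the engine's code. The text only defines entry -1 as the bottom-most entry; treating -2 as the entry just above the bottom is an assumption of this sketch.

```python
def lookback(stack, position):
    """Return the entry at `position` without removing it.

    `stack` is a list whose last element is the top of the stack.
    0 is the top, 1 the entry below it, and -1 the bottom-most entry.
    """
    if position >= 0:
        return stack[-1 - position]
    # Assumption: negative positions count upward from the bottom.
    return stack[-position - 1]
```

An out-of-range position raises IndexError in this model; how the real op reports that case is not stated in this document.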
Names, pads, and globals
These operations are responsible for finding names in lexical or global scopes, as well as storing data into those slots. A static scope is captured by a scratchpad. The current dynamic scope is represented by the state of the lexical stack (which contains scratchpads). For more detail on these ops see the inline POD documentation in ops/var.ops.
- store_lex sx, Py
- find_lex Px, sy Instructions for storing in, and retrieving from, the scratchpad associated with the current context.
- find_global Px, sy, sz Find the PMC for the global variable sy from the table sz and store it in register X. {{ DEPRECATED: op find_global was deprecated }}
- find_global Px, sy Find the PMC for the global in the default table and put it in X. {{ DEPRECATED: op find_global was deprecated }}
- find_global_table Px, sy Find the global symbol table Y and store its PMC in X
- find_global_slot ix, Py, sz Find the slot in the global table Y for the global named Z, and store its slot in register X.
- fetch_global Px, Py, iz Fetch the global in slot Z of the symbol table pointed to by Y
- store_global Px, sy Store X in the default global symbol table with a name of Y. {{ DEPRECATED: op store_global was deprecated }}
Exceptions
These opcodes deal with exception handling at the lowest level. Exception handlers are dynamically scoped, and any exception handler set in a scope will be removed when that scope is exited.
- set_eh Px Sets an exception handler in place. The code referred to by register Px will get called if an exception is thrown while the exception handler is in scope.
- pop_eh Pop the most recently placed exception handler off the handler stack.
- throw Px Throw an exception represented by the object in PMC register x.
- rethrow Px Only valid inside an exception handler. Rethrow the exception represented by the object in PMC register x. This object may have been altered by the exception handler.
Object things
These opcodes deal with PMCs as objects, rather than as opaque data items.
- find_method Px, Py, tz Find the method Z for object Y, and return a PMC for it in X.
- callmethod Px, ty
- set_attribute Px, ty, tz
- can Ix, Py, sz Sets X to TRUE if object Y can perform method Z; otherwise, X is set to FALSE.
- does Ix, Py, sz Sets X to TRUE if object Y implements interface Z; otherwise, X is set to FALSE.
- isa Px, ty
Module handling
These opcodes deal with loading in bytecode or executable code libraries, and fetching info about those libraries. This is all dealing with precompiled bytecode or shared libraries.
- load_bytecode sx Load in the bytecode in file X. Search the library path if need be.
- load_opcode_lib sx, iy Load in the opcode library X, starting at opcode number Y. Search the path if needed.
- load_string_lib sx Load in the string handling library named X
- get_op_count sx Return the number of opcodes in opcode library X
- get_string_name sx Get the name of the string encoding that the library X handles
- find_string_lib sx, sy Find the string library that handles strings of type Y. Return its name in X.
I/O operations
Reads and writes read and write records, for some value of record.
- new_fh px Create a new filehandle px
- open px, sy Open the file Y on filehandle X
- read px, py, pz Issue a read on the filehandle in y, and put the result in PMC X. PMC Z is the sync object.
- write px, sy, pz Write the string Y to filehandle X. PMC Z is the sync object.
- wait px Wait for the I/O operation represented by sync object X to finish
- readw px, py Read from filehandle Y and put the results in PMC X. Blocks until the read completes.
- writew px, sy Write string Y to filehandle X, waiting for the write to complete.
- seek px, ty Seek filehandle X to position Y.
- tell tx, py Return the current position of filehandle Y and put it in X. Returns -1 for filehandles where this can't be determined. (Such as stream connections)
- status px, py, tz Get informational item Z for filehandle Y and put the result in X. This fetches things like the number of entries in the IO pipe, number of outstanding I/O ops, number of ops on the filehandle, and so forth.
Threading ops
- lock Px Take out a high-level lock on the PMC in register X
- unlock Px Unlock the PMC in register X
- pushunlock Px Push an unlock request on the stack
Interpreter ops
- newinterp Px, flags Create a new interpreter in X, using the passed flags.
- runinterp Px, iy Jump into interpreter X and run the code starting at offset Y from the current location. (This is temporary until we get something better)
- callout Pw, Px, sy, pz Call routine Y in interpreter x, passing it the list of parameters Z. W is a synchronization object returned. It can be waited on like the sync objects returned from async I/O routines.
- interpinfo Ix, iy Get information item Y and put it in register X. Currently defined are:
- 1 TOTAL_MEM_ALLOC The total amount of system memory allocated for later parceling out to Buffers. Doesn't include any housekeeping memory, memory for Buffer or PMC structs, or things of that nature.
- 2 GC_MARK_RUNS The total number of garbage collection mark runs that have been made.
- 3 GC_COLLECT_RUNS The total number of garbage collection sweep runs that have been made.
- 4 ACTIVE_PMCS The number of PMCs considered active. This means the GC scan hasn't noted them as dead.
- 5 ACTIVE_BUFFERS The number of Buffers (usually STRINGs but could be other things) considered active.
- 6 TOTAL_PMCS The total number of PMCs the interpreter has available. Includes both active and free PMCs
- 7 TOTAL_BUFFERS The total number of Buffer structs the interpreter has available.
- 8 HEADERS_ALLOC_SINCE_COLLECT The number of new Buffer header block allocations that have been made since the last GC mark run. (Buffers, when allocated, are allocated in chunks)
- 9 MEM_ALLOCS_SINCE_COLLECT The number of times we've requested a block of memory from the system for allocation to Buffers since the last time we compacted the memory heap.
Garbage collection
- sweep Fire off a dead object sweep
- collect Fire off a garbage collection sweep
- pausecollect Pause the garbage collector. No collections will be done for this interpreter until the collector is unpaused.
- resumecollect Unpause the collector. This doesn't necessarily do a GC run, merely allows the interpreter to fire one off when it deems it necessary.
Key operations
Keys are used to get access to individual elements of an aggregate variable. This is done to allow for opaque, packed, and multidimensional aggregate types.
A key entry may be an integer, string, or PMC. Integers are used for array lookups, strings for hash lookups, and PMCs for either.
- new_key Sx Create a new key structure and put a pointer to it in register X.
- clone_key Sx, ky Make a copy of the key Y and put a pointer to it in register X. Y may be either an S register or a constant.
- size_key Sx, iy Make the key structure X large enough to hold Y key entries
- key_size Ix, ky Put the number of elements in key Y into integer register X.
- toss_key Sx Nuke key X. Throws the structure away and invalidates the register.
- ke_type Ix, ky, iz Put the type of key Y's entry Z in register X. Current values are 0, 1, and 2 for Integer, String, and PMC, respectively.
- ke_value tx, ky, iz Put the value from key Y, entry Z into register X.
- chop_key Sx Toss the topmost entry from key X.
- inc_key Sx, iy Increment entry Y of key X by one.
- set_key Sw, [isp]x, iy[, iz] Set key W, offset Y, to value X. If X is a PMC, then the fourth operand must be specified. It can have a value of 0, 1, or 2, corresponding to integer, string, or object. Aggregates use this to figure out how to treat the key entry.
Properties
Properties are a sort of runtime note attached to a PMC. Any PMC can have properties on it. Properties live in a flat namespace, and they are not in any way associated with the class of the PMC that they are attached to.
Properties may be used for runtime notes on variables, or other metadata that may change. They are not for object attributes.
- setprop Px, sy, Pz Set the property named Y of PMC X to the PMC in Z
- getprop Px, sy, Pz Get the property named Y from PMC Z and put the result in register X. Returns a NULL if the property doesn't exist.
- delprop Px, sy Delete the property Y from PMC X
- prophash Px, Py Fetch the properties from Y, put them in a Hash, and put the Hash in X.
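The property ops above behave like a per-object dictionary that is independent of the object's class. The Python sketch below models that behaviour under stated assumptions: the PMC class and the function names mirror the op names but are hypothetical, None stands in for NULL, and deleting a property that does not exist is treated as a no-op, which the text does not specify.

```python
class PMC:
    """Hypothetical PMC carrying a flat, per-object property table."""

    def __init__(self):
        self._props = {}


def setprop(pmc, name, value):
    pmc._props[name] = value


def getprop(pmc, name):
    # None stands in for the NULL returned when the property is missing.
    return pmc._props.get(name)


def delprop(pmc, name):
    # Assumption: deleting a missing property does nothing.
    pmc._props.pop(name, None)


def prophash(pmc):
    # Return a copy so the caller's hash is separate from the PMC's table.
    return dict(pmc._props)
```

Because the table lives on the individual object, two PMCs of the same class can carry entirely different properties.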
Symbolic support for HLLs
- setline ix Sets the 'current line' marker.
- setfile sx Sets the 'current file' marker.
- setpackage sx Sets the 'current package' marker.
- getline ix Fetches the 'current line' marker.
- getfile sx Fetches the 'current file' marker.
- getpackage sx Fetches the 'current package' marker.
Foreign library access
These are the ops we use to load in and interface to non-parrot libraries.
- loadlib Px, Sy Load in the library whose name is specified by y, and put a handle to it into P register x.
- dlfunc Pw, Px, Sy, Sz Find a routine named Y, in library X (which you did, of course, open with loadlib), and put a sub PMC into W for it. You can call this sub as if it were any other parrot subroutine.
  Z has the function signature, which tells Parrot how to build the interface from parrot (and parrot's calling conventions) to the calling conventions of the library routine. Yes, this does mean that you must know the function signature, but if you don't know that why the heck would you be invoking the function, right?
  The signature is a series of 1 or more characters, representing the types for the call. The first character is the return type, while the rest are the parameters. The types are:
- v Void. As a return type indicates that there is no return value. As a parameter indicates that there are no parameters. Can't be mixed with other parameter types.
- c Char. This is an integer type, taken from (or put into) an I register.
- s short. An integer type, taken from or put into an I register.
- i int. An integer type.
- l long. An integer type. You know the drill.
- f float. N register denizen.
- d double. N register, double-precision floating point type.
- p PMC thingie. A generic pointer, taken from or stuck into a PMC's data pointer. If this is a return type, parrot will create a new UnManagedStruct PMC type, which is just a generic "pointer to some damn thing or other" PMC type which Parrot does no management of.
- t string pointer. Taken from, or stuck into, a string register. (Converted to a null-terminated C string before passing in)
For example, the signature for the function:
   int SDL_BlitSurface(SDL_Surface *src,
                       SDL_Rect    *srcrect,
                       SDL_Surface *dst,
                       SDL_Rect    *dstrect);
would be ipppp, since it returns an integer and takes four pointers. Presumably previous calls would have set those pointers up properly.
Do note that parrot makes no guarantees as to the behaviour of the libraries, and currently does no type checking on the input parameters. We will fix that later.
The generated routine follows the calling conventions in PDD03. Note that int, string, pmc, and float parameters are counted separately. So if you have a signature of ippiidd the return goes into I5, and the parameters come from P5, P6, I5, I6, N5, and N6, respectively. A signature of ipdiidp uses the identical set of registers (and in the same order within each register type).
- invoke Invoke a subroutine in P0. Presumes that all the registers are set up right for the call. The invoked subroutine must preserve any registers that are not explicitly return parameters or calling convention metadata parameters. (Such as the number of I reg parameters, for example)
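The way a dlfunc signature maps onto registers, as described above, can be sketched in Python. This is a model of the counting rule only, not Parrot's NCI code: the function name assign_registers and the table _REGISTER_FOR are hypothetical, the starting register number 5 is taken from the ippiidd example in the text, and float and double are assumed to use N registers.

```python
# Which register type each signature character uses (f and d assumed to be N).
_REGISTER_FOR = {
    "c": "I", "s": "I", "i": "I", "l": "I",
    "f": "N", "d": "N",
    "p": "P",
    "t": "S",
}


def assign_registers(signature, first=5):
    """Return (return_register, [parameter_registers]) for a signature.

    Each register type is counted separately, starting at `first`.
    """
    if not signature:
        raise ValueError("signature needs at least a return type")
    ret, params = signature[0], signature[1:]
    return_reg = None if ret == "v" else f"{_REGISTER_FOR[ret]}{first}"
    if params == "v":          # 'v' alone means "no parameters"
        params = ""
    counters = {}
    param_regs = []
    for ch in params:
        if ch not in _REGISTER_FOR:
            # Covers unknown characters and 'v' mixed with other parameters.
            raise ValueError(f"bad parameter type: {ch!r}")
        bank = _REGISTER_FOR[ch]
        number = counters.get(bank, first)
        param_regs.append(f"{bank}{number}")
        counters[bank] = number + 1
    return return_reg, param_regs
```

For the signature ippiidd this yields I5 for the return value and P5, P6, I5, I6, N5, N6 for the parameters. For ipdiidp the same six registers are used, listed in parameter order as P5, N5, I5, I6, N6, P6.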
Runtime compilation
These opcodes deal with runtime creation of bytecode and compilation of source code.
- compile Px, Py, Sz Compile source string Z, with compiler unit Y, and stick a handle to a subroutine for the resulting bytecode segment (already loaded into the current interpreter) into X.
  Y is an assembler/compiler object of some sort, as registered with the compreg opcode or the Parrot_compreg function. This will be something like "Perl5", "Perl6", "Perl5RE", "Perl6RE", "Python", "Ruby"... you get the picture.
  Parrot knows of a "PASM1" compiler, i.e. a one statement PASM compiler implemented as PDB_eval. Imcc registers "PASM" and "PIR" compilers.
  This is a high-level op, with the assumption that the resulting sub will be called. It's the equivalent of perl 5's string eval, except for the actual execution of the resulting code.
- compreg Px, Sy Get a compiler for source type Y.
- compreg Sx, Py Register the sub Y as a parser/compiler function named X. It will be called whenever anyone invokes the compile op with the name X.
Attachments
None.
References
None.
Version
1.9
Current
    Maintainer: Dan Sugalski
    Class: Internals
    PDD Number: 6
    Version: 1.9
    Status: Developing
    Last Modified: 28 February 2007
    PDD Format: 1
    Language: English
History
- Version 1.9 February 28, 2007
- Version 1.8 December 11, 2002
- Version 1.7 December 02, 2002
- Version 1.6 November 05, 2001
- Version 1.5 October 12, 2001
- Version 1.4 September 24, 2001
- Version 1.3 September 12, 2001
- Version 1.2 August 25, 2001
- Version 1.1 August 8, 2001
- Version 1 None. First version
Changes
- Version 1.9
- Removed remark on "upper case names reserved for directives"
- Fixed ".sub" directive, should be ".pcc_sub"
- Added constant directive in description.
- Added grammar overview.
- Version 1.8
- Added property ops
- Fixed some bad register designations
- Opened up opcode name character list to include numbers
- Version 1.7
- Fixed stack ops; push, pop, and clear properly documented according to the engine's behaviour now.
- Version 1.6
- Added GC opcodes
- Version 1.5
- Now have a bsr in addition to a jsr
- return is now ret
- Added save and restore ops for saving and restoring individual registers
- Version 1.4
- Conditional branches have just a true destination now
- Added the I/O ops
- Added in the threading ops
- Added in the interpreter ops
- Version 1.3
- Added in the low-level module loading ops
- Added in transcendental functions and modulo
- Finished the pad/global variable fetching bits
- Version 1.2 We have an interpreter now! Yay! (Okay, a simple one, but still...) Changes made to reflect that.
- Version 1.1
- Added in object
- Changed remnants of "perl" to "Parrot"
- Branch destination may be integer constant
- Added "Assembly Syntax" section
- Version 1.0 None. First version
