Indexing Aggregate PMCs
Abstract
This PDD aims to clear up the confusion regarding the implementation of keyed access to PMCs in Parrot.
Description
First,
let's define some terminology.
An aggregate PMC
is one which stores or references other values as elements.
The aggregate PMC allows indexed access to element by implementing some of the _keyed
variants of VTABLE functions.
These variants are called indexing
operations,
as they act on a specific indexed element of an aggregate PMC.
Examples of a aggregate PMCs include Hash
,
FixedIntegerArray
and ResizablePMCArray
.
Non-aggregates may also support _keyed
variants of the VTABLE functions,
but they not do anything particularly clever.
For instance,
PMC types implementing Perl references will merely pass the index on to the referent.
These aren't aggregates because they don't directly store or reference elements.
Indexing operations take one or more aggregate keys.
At runtime these operations will index into the aggregate using the key
and return a value.
Here is a well-known indexing operation in Perl 6:
@a[12] = $b;
The key here is the constant integer 12
The aggregate is the Perl6Array
@a
. In the process of this assignment, Parrot will have to extract the PMC in element 12 of the array, producing a value
. $b
is then assigned to this value.
Now, how does this all get implemented?
Implementation
The key structure
The key structure must bundle multiple keys. This is to allow indexing into multidimensional aggregate PMCs. These keys may be specified as integer, string, number or PMC.
For this reason the following structure was produced. Individual keys (e.g. a single Integer
or String
) are stored in a Key
PMC. The type of the key is encoded in the private flags of the PMC as speficied below. The value of the Key
PMC is stored in the PMC's cache (i.e. PMC_pmc_val(key)
, PMC_int_val(key)
).
For example, indexing the multidimensional array @foo[$a,12;"hi"]
produces three PMCs; one with a PMC type, one with an integer type and one with a string type.
The key type is encoded in the PMC flags using 8 bits based on the following scheme (from includes/parrot/key.h):
typedef enum { KEY_integer_FLAG = PObj_private0_FLAG, KEY_number_FLAG = PObj_private1_FLAG, KEY_hash_iterator_FLAGS = PObj_private0_FLAG | PObj_private1_FLAG, KEY_string_FLAG = PObj_private2_FLAG, KEY_pmc_FLAG = PObj_private3_FLAG, KEY_register_FLAG = PObj_private4_FLAG, KEY_start_slice_FLAG = PObj_private5_FLAG, KEY_end_slice_FLAG = PObj_private6_FLAG, KEY_inf_slice_FLAG = PObj_private7_FLAG, KEY_type_FLAGS = KEY_integer_FLAG | KEY_number_FLAG | KEY_string_FLAG | KEY_pmc_FLAG | KEY_register_FLAG | KEY_hash_iterator_FLAGS } KEY_flags
The KEY_register_FLAG
is used to indicate that value if the key is in a register. In this case, PMC_int_val(key)
contains the number of a register of the appropriate type that contains the value.
Parrot must also have a way to combine multiple keys into a key that can be treated as a single unit. This is done by forming a singly linked list such that each key points at the next. Within a single Key PMC, the pointer to the next key is stored in PMC_data(key)
. The linked list structure allows the use of partial keys in multidimentional lookups, since the next key can be generated while the aggregate PMC is being traversed.
These definitions, along with declarations of support routines used to manipulate keys, can be found in include/parrot/key.h
Aggregate and non-aggregate PMCs
We've already said that what separates the aggregate PMCs from the non-aggregates is their implementation of the _keyed
vtable methods. So it is Hereby Decreed that the default vtable which everyone inherits from defines the _keyed
forms to throw an exception.
- Todo Need discussion on whether
EXCEPTION_OUT_OF_BOUNDS
is a good exception for this, or whether something else should be used. It's really a compiler screw-up, since code which indexes a non-aggregate shouldn't be generated._keyed
vtable methods
So what of these magical _keyed
vtable methods? They are generated when you add the keyed
tag to the appropriate entry in src/vtable.tbl. They are constructed by following every PMC
argument with a second PMC
argument which acts as the key for that argument; the name of the second PMC
argument is formed by adding _key
onto the end of the first PMC
argument.
The reason why every PMC argument has an associated key is twofold. Firstly, it means that
@a[$b] = $c
and
$a = @b[$c]
use the same vtable method, reducing the multiplicity of methods. Secondly, a three-argument assign
as suggested by the code above would be ambiguous - the code above uses 3 PMCs in different ways.
Also, operations which take an aggregate key for one of their arguments should take aggregate keys for all of their arguments. This is to avoid the following:
void foo_keyed_i(PMC* x, PMC* x_key, INT a) void foo_keyed_n(PMC* x, PMC* x_key, NUM a) void foo_keyed_p(PMC* x, PMC* x_key, PMC a) void foo_keyed_p_keyed(PMC* x, PMC* x_key, PMC* a, PMC* a_key)
These are all replaced with the single entry
void foo_keyed(PMC* x, PMC* a_key, PMC* a, PMC* a_key)
(Think how much worse it gets when there are three or more PMCs in an entry...)
Yes. This means that you may need to turn some things into PMC
s that you didn't want to. Since the alternative is mega pollution and duplication in the vtable table, and since the majority of things that you'll deal with in a real world situation are expected to be PMC
s anyway, this shouldn't be too much of a problem.
So, if you have a PMC in a _keyed
method which you don't want to index, pass in NULL
instead of a real key. Code implementing these methods should understand PMC *foo, PMC *NULL
as meaning the entirety of foo
in some sense; this is trivial to understand if foo
is non-aggregate, and implementation-defined if foo
is aggregate. If you remember that a key PMC is really a linked list, you'll notice that after traversing down through the list, you'll reach a NULL
which again means the entirety of whatever object you traversed to.
Similarly, non-_keyed
methods on aggregates are implementation defined; for instance, a set_integer
on a PerlArray
may be understood as setting @array.length
.
Historically, we first implemented keys as two separate keyed methods per applicable method - ..._index
and ..._index_s
for integer and string indexing respectively. However, this didn't give us the flexibility and scalability that key structures give us.
Input to the assembler
There are several different valid specifications of an aggregate key to the assembler. These are:
op arg, P1[1234] # Constant integer key op arg, P1[I1] # Integer key op arg, P1[12.34] # Constant number key - handled as constant key op arg, P1["foo"] # Constant string key - handled as constant key op arg, P1[I1;I2] # Multi-level key - handled as constant key op arg, P1[P1] # Register key
(Rationale: fits programmer's expectation, easier to understand at a glance than op P1, P2, P3
. Also, is op P1, P2, P3
the same as op P1[P2], P3
or op P1, P2[P3]
, or are these three separate PMCs?)
In all there are four types of key. The first two are integer keys and constant integer keys which are optimisations for the common case of single level integer keys.
The other two are constant keys, which can handle any combination of constants and registers with any number of levels; and register keys which are represented by a single PMC register that is assumed to contain a PMC of the Key class.
What the assembler did next
When the assembler sees an aggregate key, it "detaches" the key to form a separate argument. It then decides on the type of key. For integer keys (both constant and register) the data is encoded in the same way as an ordinary integer argument. For register keys the data is encoded as for an ordinary PMC register argument, while for constant keys a key constant is generated that encodes the list of constants and registers that make up the key and an appropriate index into the constant table is encoded as the argument.
Next it selects the appropriate op. Register keys have the signature k
and constant keys have the signature kc
. Integer register and constant keys are encoded as ki
and kic
respectively.
set P1["hi"], 1234
finds an op named set_p_kc_i
. On the other hand,
set P1[P1], 1234
produces an op named set_p_k_i
. Likewise, this:
set P1[1], 1234
produces an op named set_p_kic
, and this:
set P1[I1], 1234
produces an op named set_p_ki
.
Bytecode representation
The bytecode representation of these keys are as follows: constant keys are treated just like another constant, and are an index into the packfile's constant table.
Each key in that constant table consists of one word specifying its length in terms of number of keys. For instance, ["hi"]
has length 1; ["hi";P1;S1;123]
has length 4. Next, each key is specified using two words. The first word is a type specifier:
1 - Integer constant 2 - Number constant 4 - String constant 7 - Integer register 8 - Number register 9 - PMC register 10 - String register
and the second word is either a value (for integer constants), a register number (for registers) or an index into the appropriate constant table.
The type values shown above are actually the PARROT_ARG_*
values taken from include/parrot/op.h.
Version
Current
Maintainer: Simon Cozens <simon@netthink.co.uk> Class: Internals PDD Number: 8 Version: 1.3 Status: Developing Last Modified: 25 August, 2002 PDD Format: 1 Language: English
History
- Sun Aug 25 11:14:43 GMT 2002 : Version 1.3 Updated to reflect Dan's decision to change keys to use PMCs instead of a custom data structure. Also corrects documentation of multi-level keys and how they are compiled and work. tom@compton.nu.
- Thu Apr 25 18:30:36 UTC 2002 : Version 1.2 Renamed
- Fri Mar 8 18:47:34 GMT 2002 : Version 1.1 updated to reflect Dan's comments that non-aggregates also support
KEY_PAIR
to KEY_ATOM
, updated to reflect changeover to linked list. - steve@fink.com
_keyed
variant vtable methods.References
To come.