DESCRIPTION

This is the eighth episode in a tutorial series on building a compiler with the Parrot Compiler Tools.

Episode 8: Hashtables and Arrays

Welcome to Episode 8! This is the second-last episode in this tutorial. After this episode, we'll have a complete implementation of our Squaak language. This episode focuses on aggregate data structures: arrays and hashtables. We'll discuss the syntax to assign to them and to construct them. We'll see that implementing the action methods is really easy, almost trivial. After that, we'll make some notes on aggregates as arguments, and how they differ from the basic data types when passing them around as subroutine arguments.

Arrays and Hashtables

Besides basic data types such as integer, floating-point and string, Squaak has two aggregate data types: array and hashtable. An array is an object that can store a sequence of values. The values in this sequence can be of different types, unlike some languages that require all elements of an array to be the same type. An example of using arrays is shown below:

 grades[0] = "A"
 grades[1] = "A+"
 grades[2] = "B+"
 grades[3] = "C+"

A hashtable stores key-value pairs; the key is used as index to store a value. Keys must be string constants, but the value can be of any type. An example is shown below:

 lastnames{"larry"}   = "wall"
 lastnames{"allison"} = "randal"

Array constructors

Just as there are integer literals (42) and string literals ("hello world") that can be assigned to variables, you can have array literals. Below is the grammar rule for this:

 rule circumfix:sym<[ ]> {
     '[' [<EXPR> ** ',']? ']'
 }

Some examples are shown below:

 foo = []
 bar = [1, "hi", 3.14]
 baz = [1, [2, 3, 4] ]

The first example creates an empty array and assigns this to foo. The second example shows the construction of three elements, assigning the array to bar. Note that the elements of one array can be of different types. The third example shows the construction of nested arrays. This means that element baz[1][0] evaluates to the value 2 (indexing starts at 0).

Hashtable constructors

Besides array literals, Squaak supports hashtable literals, that can be constructed through a hashtable constructor. The syntax for this is expressed below:

 rule circumfix:sym<{ }> {
     '{' [<named_field> ** ',']? '}'
 }

 rule named_field {
     <string_constant> '=>' <EXPR>
 }

 # We need to rename our existing string_constant term to a separate rule
 # so that we can use it specifically.
 token term:sym<string_constant> { <string_constant> }

 # Don't forget to rename the action method.
 token string_constant { <quote> }

Some examples are shown below:

    foo = {}
    bar = { "larry" => "wall", "allison" => "randal" }
    baz = { "a" => { "b" => 42} }

The first line creates an empty hashtable and assigns this to foo. The second creates a hashtable with two fields: "larry" and "allison". Their respective values are: "wall" and "randal". The third line shows that hashtables can be nested, too. There, a hashtable is constructed that has one field, called "a", and its value is another hashtable, containing a field "b" that has the value 42.

Implementation

You might think implementing support for arrays and hashtables looks rather difficult. Well, it's not. Actually, the implementation is rather straightforward. First, we're going to update the grammar rule for primary:

 rule primary {
     <identifier> <postfix_expression>*
 }

 proto rule postfix_expression { <...> }

 rule postfix_expression:sym<index> { '[' <EXPR> ']' }

 rule postfix_expression:sym<key> { '{' <EXPR> '}' }

A primary object is now an identifier followed by any number of postfix-expressions. A postfix expression is either a hashtable key or an array index. Allowing any number of postfix expressions allows to nest arrays and hashtables in each other, allowing us to write, for instance:

 foo{"key"}[42][0]{"hi"}

Of course, you as a Squaak programmer must make sure that foo is actually a hashtable, and that foo{"key"} yields an array, and so forth. Implementing this is actually quite simple. First, let us see how to implement the action method index.

 method postfix_expression:sym<index>($/) {
     my $index := $<EXPR>.ast;
     my $past  := PAST::Var.new( $index,
                                 :scope('keyed'),
                                 :viviself('Undef'),
                                 :vivibase('ResizablePMCArray'),
                                 :node($/) );

     make $past;
 }

First, we retrieve the PAST node for EXPR. Then, we create a keyed variable access operation, by creating a PAST::Var node and setting its scope to keyed. If a PAST::Var node has keyed scope, then the first child is evaluated as the aggregate object, and the second child is evaluated as the index on that aggregate.

But wait! The PAST::Var node we just created has only one child!

Here's where the updated action method for primary comes in. This is shown below.

 method primary($/) {
     my $past := $<identifier>.ast;

     for $<postfix_expression> {
         my $expr := $_.ast;
         $expr.unshift( $past );
         $past := $expr;
     }

     make $past;
 }

First, the PAST node for identifier is retrieved. Then, for each postfix-expression, we get the PAST node, and unshift the (current) $past onto it. Effectively, the (current) $past is set as the first child of $expr.

And you know what $expr contains: that's the keyed variable access node, that was created in the action method index. After that, $past is set to $expr; either there's another postfix-expression, in which case this $past will be set as the first child of that next postfix-expression, or, the current $past is set as the result object.

Implementing Constructors

To implement the array and hashtable constructors, we're going to take advantage of Parrot's Calling Conventions (PCC). The PCC supports, amongst others, optional parameters, named parameters and slurpy parameters. If you're Dutch, you might think that slurpy parameters make a lot of noise ("slurpen" is a Dutch verb meaning drinking carefully, which you usually do if your beverage is hot, making noise in the process), but you would be wrong. Slurpy parameters will store all remaining arguments that have not yet been stored in other parameters (implying that there can only be one slurpy (positional) parameter, and it should come after all normal (positional) parameters). Parrot will automatically create an aggregate to store these remaining arguments. Besides positional slurpy parameters, you can also define a named slurpy parameter, which will store all remaining named parameters, after all normal (named) arguments have been stored.

You might be confused by now.

Let's look at an example, as this issue is worth a few brain cells to store.

 .sub foo
     .param pmc a
     .param pmc b
     .param pmc c :slurpy
     .param pmc k :named('x')
     .param pmc l :named('y')
     .param pmc m :named :slurpy

 .end

 foo(1, 2, 3, 4, 6 :named('y'), 5 :named('x'), 7 :named('p'), 8 :named('q') )

This will result in the following mapping:

 a: 1
 b: 2
 c: {3, 4}
 k: 5
 l: 6
 m: {"p"=>7, "q"=>8}

So, after the positional parameters (a, b), c is declared as a slurpy parameters, storing all remaining positional parameters. Parameters k and l are declared as named parameters, which have the respective names "x" and "y". Using these names, values can be passed. After the named parameters, there's the parameter m, which is both flagged as named and slurpy. This parameter will store all remaining named arguments that have not yet been stored by the normal named parameters.

The interesting parameters for us are "c" and "m". For the positional slurpy parameter, Parrot creates an array, while for the named slurpy parameter a hashtable is created. This happens to be exactly what we need! Implementing the array and hash constructors becomes trivial:

 # Inset this in src/Squaak/Runtime.pm

 {
     my sub array (*@args) { @args; }
     my sub hash (*%args) { %args; }

     Q:PIR {
         $P0 = find_lex 'array'
         set_global '!array', $P0
         $P0 = find_lex 'hash'
         set_global '!hash', $P1
     }
 }

Array and hashtable constructors can then be compiled into subroutine calls to the respective Parrot subroutines, passing all fields as arguments. (Note that these names start with a "!", which is not a valid Squaak identifier. This prevents us from calling these subs in normal Squaak code).

Basic data types and Aggregates as arguments

All data types, both basic and aggregate data types are represented by Polymorphic Containers (PMCs). The PMC is one of the four built-in data types that Parrot can handle; the others are integer, floating-point and string. Currently, the PCT can only generate code to handle PMCs, not the other basic data types. Parrot has registers for each its four built-in data types. The integer, floating-point and string registers store the actual data value, while PMC registers store a reference to the PMC object. This has consequences for how PMCs are handled when passing them as arguments. When passing a PMC as an argument, the invoked subroutine gets access to the PMC reference; in other words, PMCs are passed by reference. This means that the subroutine can change the original argument that was passed by the caller. Of course, it depends what instructions are being generated, what the invoked subroutine does to the references. In Squaak, when passing basic data values, these cannot be changed by the invoked subroutine. When assigning a new value to a parameter, a whole new object is created and bound to the parameter identifier. No changes are made to the original argument. Aggregate data types are handled differently, however. When an invoked subroutine assigns to an index or hashtable field of a parameter, then the original argument is affected. In other words, basic data types have by value semantics, while aggregate data types have by reference semantics. A short example to demonstrate this:

 sub foo(a,b,c)
     a       = 42
     b[0]    = 1
     c{"hi"} = 2
 end

 var a = 0
 var b = []
 var c = {}
 foo(a,b,c)
 print(a, b[0], c{"hi"} ) # prints 0, 1, 2

What's Next?

This was the last episode to discuss implementation details to make Parrot (run) Squaak. After doing this episode's exercises, your implementation should be fairly complete. Next episode will be the last of this series, in which we'll recap what we did, and demonstrate our language with a nice demo program.

Exercises

We've shown how to implement keyed variable access for arrays, by implementing the action method for index. The same principle can be applied to keyed access for hashtables. Implement the action method for key.
Implement the action methods for circumfix:sym<[ ]> and circumfix:sym<{ }>. Use a PAST::Op node and set the pasttype to 'call'. Use the "name" attribute to specify the names of the subs to be invoked (e.g., :name("!array") ). Note that all hash fields must be passed as named arguments. Check out PDD26 for doing this, and look for a "named " method.
We'd like to add a little bit of syntactic sugar for accessing hashtable keys. Instead of writing foo{"key"}, I'd like to write foo.key. Of course, this only works for keys that do not contain spaces and such. Add the appropriate grammar rule (call it "member") that enables this syntax, and write the associated action method. Make sure this member name is converted to a string.

Hint: use a PAST::Val node for the string conversion.

Solutions to the exercises

    method postfix_expression:sym<key>($/) {
        my $key := $<expression>.ast;

        make PAST::Var.new( $key, :scope('keyed'),
                                  :vivibase('Hash'),
                                  :viviself('Undef'),
                                  :node($/) );
    }

    method term:sym<string_constant>($/) { make $<string_constant>.ast; }

    method named_field($/) {
        my $past := $<EXPR>.ast;
        my $name := $<string_constant>.ast;
        ## the passed expression is in fact a named argument,
        ## use the named() accessor to set that name.
        $past.named($name);
        make $past;
    }

    method circumfix:sym<[ ]>($/) {
        ## use the parrot calling conventions to
        ## create an array,
        ## using the "anonymous" sub !array
        ## (which is not a valid Squaak name)
        my $past := PAST::Op.new( :name('!array'),
                                  :pasttype('call'),
                                  :node($/) );
        for $<EXPR> {
            $past.push($_.ast);
        }
        make $past;
    }

    method circumfix:sym<{ }>($/) {
        ## use the parrot calling conventions to
        ## create a hash, using the "anonymous" sub
        ## !hash (which is not a valid Squaak name)
        my $past := PAST::Op.new( :name('!hash'),
                                  :pasttype('call'),
                                  :node($/) );
        for $<named_field> {
            $past.push($_.ast);
        }
        make $past;
    }

    rule postfix_expression:sym<member> {
        '.' <identifier>
    }

    method postfix_expression:sym<member>($/) {
        my $member := $<identifier>.ast;
        ## x.y is syntactic sugar for x{"y"},
        ## so stringify the identifier:
        my $key := PAST::Val.new( :returns('String'),
                                  :value($member.name),
                                  :node($/) );

        ## the rest of this method is the same
        ## as method key() above.
        make PAST::Var.new( $key, :scope('keyed'),
                                  :vivibase('Hash'),
                                  :viviself('Undef'),
                                  :node($/) );
    }