parrotcode: Lexical variables | |
Contents | Documentation |
docs/pdds/pdd20_lexical_vars.pod - Lexical variables
$Revision$
This document defines the requirements and implementation strategy for lexically scoped variables.
.sub foo
.lex "$a", P0
P1 = new Integer
P1 = 13013
store_lex "$a", P1
print P0 # prints 13013
.end
.sub bar :outer(foo)
P0 = find_lex "$a" # may succeed; depends on closure creation
.end
.sub baz
P0 = find_lex "$a" # guaranteed to fail: no .lex, no :outer()
.end
.sub corge
print "hi"
.end # no .lex and no :lex, thus: no LexInfo, no LexPad
# Lexical behavior varies by HLL. For example,
# Tcl's lexicals are not declared at compile time.
.HLL "Tcl", "tcl_group"
.sub grault :lex # without ":lex", Tcl subs have no lexicals
P0 = find_lex "x" # FAILS
P0 = new Integer # really TclInteger
P0 = 42
store_lex "x", P0 # creates lexical "x"
P0 = find_lex "x" # SUCCEEDS
.end
For Parrot purposes, "lexical variables" are variables stored in a hash (or hash-like) PMC associated with a subroutine invocation, a.k.a. a call frame.
LexInfo PMCs contain what is known at compile time about lexical variables of a given subroutine: their names (for most languages), perhaps their types, etc. They are the interface through which the PIR compiler stores and validates compile-time information about lexical variables.
At compile time, each newly created Subroutine (or Subroutine derivative, e.g. Closure) that uses lexical variables will be populated with a PMC of HLL-mapped type LexInfo. (Note that this type may actually be Null in some HLLs, e.g. Tcl.)
LexPads hold what becomes known at run time about lexical variables of a given invocation of a given subroutine: their values, of course, and for some languages (e.g. Tcl) their names. They are the interface through which the Parrot runtime stores and fetches lexical variables.
At run time, each call frame for a Subroutine (or Subroutine derivative) that uses lexical variables will be populated with a PMC of HLL-mapped type LexPad. Note that call frames for subroutines without lexical variables will omit the LexPad.
From the interface perspective, LexPads are basically Hashes, with strings as keys and PMCs as values. They extend the basic Hash interface with specialized initialization (requiring a reference to an associated LexInfo) and the query METHOD get_lexinfo()
(to return it).
LexPad keys are unique. Therefore, in each subroutine, there can be only one lexical variable with a given name.
In the normal use case, LexPads are not exposed to user code (not for any special reason; it just worked out that way). Instead, specialized opcodes implement the common use cases. Specialized opcodes are particularly a Good Idea here because most lexical usage involves searching more than one LexPad, so a single LexPad reference wouldn't be as useful as one might expect. And, of course, opcodes can cheat ... er, can be written in optimized C. :-)
TODO: Describe how lexical naming system interacts with non-ASCII character sets.
If Parrot is asked to access a lexical variable named $var, Parrot follows the following strategy. Note that fetch and store use the exact same approach.
Parrot starts with the currently executing subroutine $sub, then loops through these steps:
1. Starting at the current call frame, walk back until an active frame is
found that is executing $sub. Call it $frame.
(NOTE: The first time through, $sub is the current subroutine and $frame
is the currently live frame.)
2. Look for $var in $frame.get_lexpad using standard Hash methods.
3. If the given pad contains $var, fetch/store it and REPORT SUCCESS.
4. Set $sub to $sub.outer. (That is, the textually enclosing subroutine.)
But if $sub has no outer sub, REPORT FAILURE.
Parrot does not assume that every subroutine needs lexical variables. Therefore, Parrot defaults to not creating LexInfo or LexPad PMCs. It only creates a LexInfo when it first encounters a ".lex" directive in the subroutine. If no such directive is found, Parrot does not create a LexInfo for it at compile time, and therefore cannot create a LexPad for it at run time.
However, an absence of ".lex" directives is normal for some languages (e.g. Tcl) which lack compile-time knowledge of lexicals. For these languages, the additional Subroutine attribute ":lex" should be specified. It forces Parrot to create LexInfo and LexPads.
NOTE: This section should be taken using the "as-if" rule: Parrot behaves as if this section were literally true. As always, short cuts (development and runtime) may be taken.
Closures are specialized Subroutines that carry their lexical environment along with them. A lexical environment, which we will call a "LexEnv" for brevity, is a list of LexPads to be searched when looking for lexical variables. Its implementation may be as simple as a basic PMC array, but any ordered integer-indexed collection will do.
The newclosure
op creates a Closure from a Subroutine and gives that Closure a new LexEnv attribute. The LexEnv is then populated with pointers to the current enclosing LexPads. The definition of "enclosing" is not obvious, though.
The algorithm used to find "enclosing" LexPads is a loop of the following steps, starting with $sub set to the running Subroutine (which is a Closure):
1. Starting at the current call frame, walk back until an active frame is
found that is executing $sub. Call it $frame.
(NOTE: The first time through, $sub is the current subroutine and $frame
is the currently live frame.)
2. Append $frame's LexPad to the LexEnv.
3. If $sub has a LexEnv, append $sub's LexEnv to the LexEnv being built,
and END LOOP. Otherwise:
4. Set $sub to $sub.outer. (That is, the textually enclosing subroutine.)
But if $sub has no outer sub, END LOOP.
NOTE: The newclosure
opcode should check to make sure that the target Subroutine has an :outer()
attribute that points back to the currently running Subroutine. This is a requirement for closures.
At runtime, the find_lex
opcode behaves differently in closures. It has no need to walk the call stack finding LexPads - they have all already been collected conveniently together in the LexEnv. Therefore, in a Closure, find_lex
ignores the call stack, and instead searches (1) the current call frame's LexPad - i.e. the Closure's own lexicals -- and then (2) the LexPads in the LexEnv.
The implementation of lexical variables in the PIR compiler depends on two new PMCs: LexPad and LexInfo. However, the default Parrot LexPad and LexInfo PMCs will not meet the needs of all languages. They should suit Perl 6, for example, but not Tcl.
Therefore, it is expected that HLLs will map the LexPad and LexInfo types to something more appropriate (e.g. TclLexPad and TclLexInfo). That mapping will automatically occur when the appropriate ".HLL" directive is in force.
Using Tcl as an extreme example: TclLexPad will likely be a thin veneer on PMCHash. Meanwhile, TclLexInfo will likely map to Null: Tcl provides no reliable compile-time information about lexicals; without any compile-time information to store, there's no need for TclLexInfo to do anything interesting.
For HLLs that support nested subroutines, Parrot provides a way to denote that a given subroutine is conceptually "inside" another. Lookup for lexical variables starts at the current call frame and proceeds through call frames that invoke "outer" subroutines. The specific meaning of "outer" is defined below, but it's designed to support the common linguistic structure of nested subroutines where inner subs refer to lexical variables contained in outer blocks.
Note that "outer" and "caller" are very different concepts! For example, given the Perl 6 code:
sub foo {
my $a = 1;
my sub a { eval '$a' }
return &a;
}
The &foo
subroutine is the outer subroutine of &a
, but it is not the caller of &a
.
In the above example, the definition of the Parrot subroutine implementing &a must include a notation that it is textually enclosed within &foo
. This is a static attribute of a Subroutine, set at compile time and never changed thereafter. (Unless you're evil, or Damian. But I repeat myself.) This information is given through an :outer()
subroutine attribute, e.g.:
.sub a :outer(foo)
Note that the "foo" sub must be compiled first; in other words, "foo" must appear before "a" in the source text. Compilers can easily do this via preorder traversal of lexically-nested subs.
Below are the standard LexInfo methods that all HLL LexInfo PMCs may support. Each LexInfo PMC should only define the methods that it can usefully implement, so the compiler can use method lookup failure to generate useful diagnostics (e.g. "register aliasing not supported by Tcl lexicals").
Each language's LexInfo will implement methods that are helpful to that language's LexPad. In the extreme case, LexInfo can be Null -- but if it is, the given HLL should not generate any ".lex*" directives.
.lex STRING, PREG
directive. For example, given this preamble: .lex "$a", $P0
$P1 = new Integer
$P0 = $P1
store_lex "$a", $P1
$P1 = $P0
$P1 = find_lex "$a"
LexPads start by implementing the Hash interface: variable names are string keys, and variable values are PMCs.
In addition, LexPads must implement the following methods:
init_pmc()
time.For debugging and introspection, the Closure PMC should support:
newclosure
time.The default LexInfo supports lexicals only as aliases for PMC registers. It therefore implements declare_lex_preg()
. (Internally, it could be a Hash of some kind, where keys are String variable names and values are integer register numbers.)
The default LexPad (like all LexPads) implements the Hash interface. When asked to look up a variable, it finds the corresponding register number by querying its associated LexInfo. It then gets or sets the given numbered register in its associated Parrot Context structure.
Due to implementation concerns, it will not be until late in Parrot development -- if ever -- that call frames will be available to user code as PMCs. Until then, the interpreter and continuation PMCs will be the interface to use to get frame info.
For example, to get the immediate caller's LexPad, use:
$P0 = getinterp
$P1 = $P0["lexpad"; 1]
It's likely that this interface will continue to be available even once call frames become visible as PMCs.
TODO: Full interpreter introspection interface.
None.
None.
None.
|