NOTE

This document is probably no longer current.

PARSING

This document explains the algorithm used by parse.imc to parse a hunk of Tcl. This is a from-scratch implementation based on the Tcl man page. This was, btw, a heck of a lot easier when I had perl5's regexps to do things with. =-)

First, in __main, we read the input file (or stdin; we're not picky) into a string, which is then passed to the __parse sub.

Bracketed footnotes such as [8] refer to the numbered rules in the Tcl(n) man page.

STATE_MACHINE

There are several states that our parser can be in:

BEGIN_SCOPE

This is where we begin: create a lexical scope in which to store variables.

Backslash-newline substitution is performed on the whole string. [8]
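The following is a minimal Python sketch of this first pass. parse.imc itself is PIR, and the function name collapse_backslash_newlines is hypothetical. Per rule [8], a backslash, the newline, and any spaces or tabs at the start of the next line collapse into a single space. The sketch scans character by character rather than using a regex, so that an escaped backslash just before a newline is not mistaken for a line continuation.

```python
def collapse_backslash_newlines(script: str) -> str:
    """Replace each backslash-newline, plus any spaces or tabs that start
    the next line, with a single space (Tcl rule [8])."""
    out = []
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if ch == "\\" and i + 1 < n:
            if script[i + 1] == "\n":
                i += 2
                # Swallow the indentation of the continued line.
                while i < n and script[i] in " \t":
                    i += 1
                out.append(" ")
            else:
                # Any other escape (including an escaped backslash) is copied
                # through untouched; backslash substitution handles it later.
                out.append(script[i:i + 2])
                i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(collapse_backslash_newlines("set x a\\\n    b"))  # set x a b
```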

BEGIN_COMMAND

Clear out the array that holds the words of the current command.

BEGIN_WORD

Skip any leading spaces and tabs. If a newline or a ; is found, go to END_COMMAND. [1]

If the first character of the first word is a #, it's a comment: ignore all characters up to the next newline and go to BEGIN_COMMAND. [9]

If the first character of a word is a double quote, the word consists of all the characters up to the next unescaped double quote (an escaped \" does not end the word). Append it to the command array and go to BEGIN_WORD. [4] Any character other than whitespace or a command separator immediately after the closing quote is an error.
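The double-quote rule can be sketched in Python as follows. The name scan_quoted_word and its (text, next index) return convention are illustrative assumptions, not parse.imc's actual interface. The raw text is returned with its escapes intact, because backslash substitution happens later, in RUN_COMMAND.

```python
from __future__ import annotations

SEPARATORS = " \t\n;"


def scan_quoted_word(script: str, pos: int) -> tuple[str, int]:
    """Scan a double-quoted word whose opening quote is at script[pos].

    Returns the raw text between the quotes (escapes still unprocessed)
    and the index just past the closing quote.
    """
    if script[pos] != '"':
        raise ValueError("word does not start with a double quote")
    i = pos + 1
    while i < len(script):
        ch = script[i]
        if ch == "\\":
            i += 2  # an escaped character, such as \", never ends the word
            continue
        if ch == '"':
            end = i + 1
            # Only whitespace or a command separator may follow the close quote.
            if end < len(script) and script[end] not in SEPARATORS:
                raise ValueError("extra characters after close-quote")
            return script[pos + 1:i], end
        i += 1
    raise ValueError("missing close-quote")


print(scan_quoted_word('puts "a \\"b\\" c";', 5))
```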

If the first character of a word is a {, the word consists of all the characters between the { and its matching }. Unescaped { and } characters inside the word must be balanced. Append the word to the command array and go to BEGIN_WORD. [5] Any character other than whitespace or a command separator immediately after the closing brace is an error.
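Here is a matching sketch for braced words, under the same assumptions (hypothetical name, Python instead of PIR). A depth counter tracks nesting, and a backslash skips the next character so that an escaped brace does not count toward it.

```python
from __future__ import annotations

SEPARATORS = " \t\n;"


def scan_braced_word(script: str, pos: int) -> tuple[str, int]:
    """Scan a braced word whose opening { is at script[pos].

    Returns the text between the outer braces, verbatim, and the index
    just past the matching close brace.
    """
    if script[pos] != "{":
        raise ValueError("word does not start with {")
    depth = 0
    i = pos
    while i < len(script):
        ch = script[i]
        if ch == "\\":
            i += 2  # escaped braces do not count toward nesting
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                end = i + 1
                if end < len(script) and script[end] not in SEPARATORS:
                    raise ValueError("extra characters after close-brace")
                return script[pos + 1:i], end
        i += 1
    raise ValueError("missing close-brace")


print(scan_braced_word("{a {b} \\} c} d", 0))
```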

If there are no more characters, go to END_SCOPE.

On any other character, fall through to MIDDLE_WORD:

MIDDLE_WORD

We're in the middle of getting a word. A space or tab indicates END_WORD; a ; or \n indicates END_COMMAND.

If an unescaped [ is present, the word extends at least as far as the matching ] (brackets can nest, so this is not always the next ]). Grab these characters and go to MIDDLE_WORD.

If a ${ appears, the word extends at least to the next }. Grab these characters and go to MIDDLE_WORD.
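The MIDDLE_WORD rules might look like this in Python. Both function names are hypothetical. The sketch assumes a variable name inside ${...} cannot contain }, so the next } closes it, while [...] is matched with a nesting counter.

```python
from __future__ import annotations


def _skip_brackets(script: str, pos: int) -> int:
    """Return the index just past the ] that matches the [ at script[pos]."""
    depth = 0
    i = pos
    while i < len(script):
        ch = script[i]
        if ch == "\\":
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("missing close-bracket")


def scan_bare_word(script: str, pos: int) -> tuple[str, int]:
    """Scan an unquoted word starting at script[pos].

    Returns the raw word and the index of the character that ended it
    (a space or tab means END_WORD; ; or newline means END_COMMAND).
    """
    i = pos
    n = len(script)
    while i < n:
        ch = script[i]
        if ch in " \t\n;":
            break
        if ch == "\\":
            i += 2  # escaped characters never end the word
        elif ch == "[":
            i = _skip_brackets(script, i)  # whitespace inside [...] is kept
        elif script.startswith("${", i):
            close = script.find("}", i + 2)
            if close == -1:
                raise ValueError("missing close-brace for variable name")
            i = close + 1  # ${...} ends at the next }
        else:
            i += 1
    end = min(i, n)
    return script[pos:end], end


print(scan_bare_word("foo[bar [baz] x]y z", 0))
```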

END_WORD

We've reached the end of a word. Append it to the command array and go to BEGIN_WORD.

END_COMMAND

We've reached the end of a command. Append any outstanding word to the command array.

Append the command array to the array of commands and go to BEGIN_COMMAND.
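To show how the states hand off to one another, here is a deliberately simplified Python sketch of the loop from BEGIN_COMMAND through END_COMMAND. It handles only bare words, separators, and comments; quoted, braced, and bracketed words are left out. It also assumes that empty commands, such as those produced by blank lines, are simply dropped. The name split_commands is hypothetical.

```python
from __future__ import annotations


def split_commands(script: str) -> list[list[str]]:
    """Split a script into commands, each a list of raw words.

    Simplified sketch: only bare words, separators and comments are handled.
    """
    commands: list[list[str]] = []
    command: list[str] = []  # BEGIN_COMMAND: empty command array
    i, n = 0, len(script)
    while True:
        # BEGIN_WORD: skip leading spaces and tabs.
        while i < n and script[i] in " \t":
            i += 1
        if i >= n:  # no more characters: END_SCOPE
            if command:
                commands.append(command)
            return commands
        ch = script[i]
        if ch in ";\n":  # END_COMMAND
            if command:
                commands.append(command)
            command = []
            i += 1
        elif ch == "#" and not command:
            # A comment (first word only) runs up to the next newline.
            while i < n and script[i] != "\n":
                i += 1
        else:
            # MIDDLE_WORD: take characters up to whitespace or a separator.
            start = i
            while i < n and script[i] not in " \t;\n":
                i += 1
            command.append(script[start:i])  # END_WORD


print(split_commands("set a 1; puts $a\n# note\nincr a"))
```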

END_SCOPE

We now have an array of arrays holding the raw text of each command's words. Next, we need to perform the various substitutions on those words. (In a future version, this is where we'd compile the code; for now, we just interpret it.)

RUN_COMMAND

Shift the next command array off the front of the array of commands, so that commands run in source order. For each word in the command array, we must make sure each character of text is processed only once. To do this, we keep a linked list of { state, start, len } segments, where the state is either raw or cooked. Each round of substitution operates only on raw segments. When a substitution occurs, the list is segmented further: a raw segment may be split into several alternating raw and cooked segments. Substitutions are NOT done on words that were {} words.
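Below is a minimal Python sketch of the { state, start, len } bookkeeping. It uses a Python list rather than a linked list, stores each cooked segment's replacement text in a value field, and uses regular expressions for the substitution patterns. The Segment, substitute_round, and join_segments names are all hypothetical. The example shows the point of the scheme: a variable's value that happens to contain \n is left alone by the later backslash round.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Segment:
    state: str                   # "raw" or "cooked"
    start: int                   # offset into the original word
    length: int
    value: Optional[str] = None  # replacement text (cooked segments only)


def substitute_round(word: str, segments: list[Segment], pattern: str,
                     replace: Callable[[re.Match], str]) -> list[Segment]:
    """Run one kind of substitution over the raw segments only."""
    regex = re.compile(pattern)
    result: list[Segment] = []
    for seg in segments:
        if seg.state != "raw":
            result.append(seg)  # cooked text is never scanned again
            continue
        pos, end = seg.start, seg.start + seg.length
        for m in regex.finditer(word, pos, end):
            if m.start() > pos:
                result.append(Segment("raw", pos, m.start() - pos))
            result.append(Segment("cooked", m.start(), m.end() - m.start(), replace(m)))
            pos = m.end()
        if pos < end:
            result.append(Segment("raw", pos, end - pos))
    return result


def join_segments(word: str, segments: list[Segment]) -> str:
    """Put a word back together into a single string."""
    return "".join(word[s.start:s.start + s.length] if s.state == "raw" else s.value
                   for s in segments)


word = r"$x\t"
segments = [Segment("raw", 0, len(word))]
# Variable round: $x is cooked to a value that itself contains a backslash.
segments = substitute_round(word, segments, r"\$(\w+)", lambda m: {"x": r"a\nb"}[m.group(1)])
# Backslash round: only the remaining raw \t is touched, not the \n in $x's value.
segments = substitute_round(word, segments, r"\\([nt])",
                            lambda m: {"n": "\n", "t": "\t"}[m.group(1)])
print(repr(join_segments(word, segments)))  # 'a\\nb\t'
```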

Command substitution

All characters between a [ and its matching ] are treated as a Tcl script and run through the parser; the script's result replaces the brackets and everything between them. [6]
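Command substitution could be sketched in Python like this, with the interpreter passed in as a callback. The evaluate parameter and toy_eval are hypothetical stand-ins; in parse.imc the inner text would go back through the parser itself. Whatever the script returns is inserted verbatim and is not rescanned.

```python
from __future__ import annotations

from typing import Callable


def command_substitute(text: str, evaluate: Callable[[str], str]) -> str:
    """Replace each [script] in text with evaluate(script); brackets may nest."""
    out: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i:i + 2])  # left for backslash substitution
            i += 2
            continue
        if ch != "[":
            out.append(ch)
            i += 1
            continue
        # Find the matching close bracket, honouring nesting and escapes.
        depth, j = 0, i
        while j < len(text):
            if text[j] == "\\":
                j += 2
                continue
            if text[j] == "[":
                depth += 1
            elif text[j] == "]":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        else:
            raise ValueError("missing close-bracket")
        # The result is inserted as-is and never rescanned.
        out.append(evaluate(text[i + 1:j]))
        i = j + 1
    return "".join(out)


def toy_eval(script: str) -> str:
    """Hypothetical stand-in for the interpreter: expand nested brackets,
    then join the script's words with '+'."""
    return "+".join(command_substitute(script, toy_eval).split())


print(command_substitute("x[a [b c]]y", toy_eval))  # xa+b+cy
```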

Variable substitution

If a $ appears, text of one of the following forms is replaced with the corresponding variable's value: $name, $name(index), and ${name}. The index has command, variable, and backslash substitution performed on it before it is used to look up a value. [7]
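A Python sketch of the three variable forms follows. It makes several assumptions: scalars and arrays live in plain dictionaries, names consist of letters, digits, and underscores (namespace qualifiers like :: are ignored), and an index ends at the first ). It also applies only variable substitution to the index, leaving out the command and backslash substitution described above. The name variable_substitute is hypothetical.

```python
from __future__ import annotations

import re

# Assumption: names are letters, digits and underscores ("::" is ignored).
_NAME = re.compile(r"[A-Za-z0-9_]+")


def variable_substitute(text: str, scalars: dict[str, str],
                        arrays: dict[str, dict[str, str]]) -> str:
    """Expand $name, $name(index) and ${name} in text."""
    out: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i:i + 2])  # an escaped $ is not a variable
            i += 2
        elif ch != "$":
            out.append(ch)
            i += 1
        elif text.startswith("${", i):
            close = text.find("}", i + 2)
            if close == -1:
                raise ValueError("missing close-brace for variable name")
            out.append(scalars[text[i + 2:close]])  # any characters but }
            i = close + 1
        else:
            m = _NAME.match(text, i + 1)
            if m is None:
                out.append("$")  # a $ with no name after it stays literal
                i += 1
                continue
            name, i = m.group(), m.end()
            if i < len(text) and text[i] == "(":
                close = text.find(")", i + 1)  # simplification: first ")"
                if close == -1:
                    raise ValueError("missing )")
                # Substitute inside the index first (variables only, here).
                index = variable_substitute(text[i + 1:close], scalars, arrays)
                out.append(arrays[name][index])
                i = close + 1
            else:
                out.append(scalars[name])
    return "".join(out)


print(variable_substitute("$a-${b c}-$arr($k)-$",
                          {"a": "1", "b c": "2", "k": "x"}, {"arr": {"x": "3"}}))
```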

Backslash substitution

The various \ escape sequences are replaced, except for backslash-newline, which was already handled before anything else when we first read the script (see BEGIN_SCOPE). [8]
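The escape sequences from rule [8] can be sketched in Python as follows. The function name is hypothetical. \x is limited to two hex digits and \u to four, following Tcl 8.6's documented behaviour (older releases treated \x differently). Leaving a trailing lone backslash alone is an assumption.

```python
SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
                  "r": "\r", "t": "\t", "v": "\v"}
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def _digits_end(text: str, start: int, allowed: str, limit: int) -> int:
    """Index just past up to `limit` consecutive characters from `allowed`."""
    end = start
    while end < len(text) and end - start < limit and text[end] in allowed:
        end += 1
    return end


def backslash_substitute(text: str) -> str:
    """Replace Tcl backslash escapes (other than backslash-newline)."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "\\" or i + 1 == n:
            out.append(text[i])  # ordinary character, or a trailing lone \
            i += 1
            continue
        c = text[i + 1]
        if c in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[c])
            i += 2
        elif c in OCTAL:  # \o, \oo or \ooo
            end = _digits_end(text, i + 1, OCTAL, 3)
            out.append(chr(int(text[i + 1:end], 8)))
            i = end
        elif c in "xu":  # \xhh (up to 2 digits) or \uhhhh (up to 4)
            end = _digits_end(text, i + 2, HEX, 2 if c == "x" else 4)
            out.append(chr(int(text[i + 2:end], 16)) if end > i + 2 else c)
            i = end
        else:  # \\, \$, \[, \" and any other \c become c
            out.append(c)
            i += 2
    return "".join(out)


print(repr(backslash_substitute(r"a\tb\101\x41\$")))  # 'a\tbAA$'
```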

EXECUTE_COMMAND

At this point, each word is as cooked as it's going to be. Join each word's list of segments back into a single string. Call the command named by the first word, passing the rest of the array as its arguments.

Save the return value (only the most recent one is kept).

While there are commands left, go to RUN_COMMAND.

Return the last saved return value. (XXX: what should we return if no command was executed? The empty string?)
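Finally, here is the RUN_COMMAND/EXECUTE_COMMAND loop, sketched in Python. It assumes the words are already cooked and that commands live in a dictionary mapping names to callables (command_table and tcl_set are hypothetical). It answers the XXX above by returning the empty string when nothing ran, which is also what Tcl returns for an empty script.

```python
from __future__ import annotations

from typing import Callable


def run_commands(commands: list[list[str]],
                 command_table: dict[str, Callable[..., str]]) -> str:
    """Run each cooked command in order; return the last command's result."""
    result = ""  # assumption: a script with no commands returns ""
    for words in commands:
        if not words:
            continue
        name, *args = words
        if name not in command_table:
            raise NameError(f'invalid command name "{name}"')
        # Only the most recent return value is kept.
        result = command_table[name](*args)
    return result


variables: dict[str, str] = {}


def tcl_set(name: str, value: str) -> str:
    """Hypothetical two-argument `set`, used only for this example."""
    variables[name] = value
    return value


table = {"set": tcl_set, "upper": str.upper}
print(run_commands([["set", "a", "1"], ["upper", "hi"]], table))  # HI
```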
