This document is probably no longer current.
This document explains the algorithm used by parse.imc to parse a hunk of tcl.
This is a from-scratch implementation based on the tcl man page.
This was, btw, a heck of a lot easier when I had perl5's regexps to do things with. =-)
First, in __main, we read in the input file and shove it in a string. This then gets passed to the __parse sub. (Or we take stdin, we're not picky.)
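As a rough illustration of that entry point (in Python rather than the IMC the real parse.imc is written in), the file-or-stdin choice could look like the following. The name read_source and the argv handling are hypothetical, not taken from parse.imc.

```python
import sys

def read_source(argv):
    """Return the whole script as one string (hypothetical helper).

    Assumption: the first command-line argument, if present, names the
    input file; otherwise the script is read from stdin.
    """
    if len(argv) > 1:
        with open(argv[1]) as handle:
            return handle.read()
    return sys.stdin.read()
```

The returned string is what would then be handed to the parsing step.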
Bracketed footnote numbers, such as [8], refer to the numbered rules in the tcl man page.
There are several states that our parser can be in:
Where we begin: create a lexical scope in which to store variables.
Newline/backslash substitution is performed on the string. [8]
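A minimal Python sketch of that pre-pass, assuming "newline/backslash substitution" means the Tcl rule that a backslash, the newline after it, and any spaces or tabs following the newline are replaced by a single space. The function name join_continuations is hypothetical.

```python
import re

# An unescaped backslash directly before a newline, plus the spaces and
# tabs that follow. The leading group keeps any preceding pairs of
# backslashes, so an escaped backslash at end of line is left alone.
_CONTINUATION = re.compile(r"(?<!\\)((?:\\\\)*)\\\n[ \t]*")

def join_continuations(source):
    """Replace each backslash-newline sequence with a single space."""
    return _CONTINUATION.sub(lambda m: m.group(1) + " ", source)
```

Doing this once over the whole string, before any words are split, means the later states never have to treat a continued line specially.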
BEGIN_COMMAND: We clear out the Array that is holding our command.
BEGIN_WORD: Skip any leading whitespace. If a newline or a ; is found, goto END_COMMAND. [1]
If the first character of the first word is a #, then it's a comment: ignore all characters until the next newline, and goto BEGIN_COMMAND. [9]
If the first character of a word is a double quote, the word consists of all the characters between the two double quotes. (Escaped \"'s do not terminate the word.) Append it to the command Array and goto BEGIN_WORD. [4] (Any character other than whitespace or a command separator directly after the closing quote is an error.)
If the first character of a word is a {, the word consists of all the characters between the { and the matching }. There must be a matched number of unescaped { and } chars. Append the word to the command Array and goto BEGIN_WORD. [5] (Any character other than whitespace or a command separator directly after the closing } is an error.)
If there are no more characters, goto END_SCOPE.
If any other character is found, fall through:
MIDDLE_WORD: We're in the middle of getting a word. Any whitespace indicates END_WORD. A ; or \n indicates END_COMMAND.
If an unescaped [ is present, then the word extends to at least the next ]. Grab these characters, goto MIDDLE_WORD.
If a ${ appears, the word extends to at least the next }. Grab these characters, goto MIDDLE_WORD.
END_WORD: We've reached the end of a word. Add it to the array of words. Goto BEGIN_WORD.
END_COMMAND: We've reached the end of a command. Append any outstanding word to the command array.
Append the command array to the array of commands. Goto BEGIN_COMMAND.
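The word and command states above can be sketched as a single loop. This is a simplified Python illustration, not a port of parse.imc: the names split_commands, matching and take_delimited are hypothetical, and the sketch keeps only the raw text of each word (it does not remember which words were {} words).

```python
def split_commands(src):
    """Split Tcl-like source into commands, each a list of raw words."""
    commands, command, word = [], [], None  # word is None between words
    i, n = 0, len(src)

    def matching(open_ch, close_ch, start):
        """Index of the close_ch balancing src[start]; escapes are skipped."""
        depth, j = 0, start
        while j < n:
            if src[j] == "\\":
                j += 2
                continue
            if src[j] == open_ch:
                depth += 1
            elif src[j] == close_ch:
                depth -= 1
                if depth == 0:
                    return j
            j += 1
        raise SyntaxError("unmatched " + open_ch)

    def take_delimited(end):
        """Append the text between the delimiters at i..end as one word."""
        follower = src[end + 1:end + 2]
        if follower and follower not in " \t;\n":
            raise SyntaxError("extra characters after close delimiter")
        command.append(src[i + 1:end])
        return end + 1

    def end_word():
        nonlocal word
        if word is not None:
            command.append(word)
        word = None

    def end_command():
        nonlocal command
        end_word()
        if command:
            commands.append(command)
        command = []

    while i < n:
        c = src[i]
        if word is None:                      # BEGIN_WORD
            if c in " \t":
                i += 1
            elif c in ";\n":
                end_command()
                i += 1
            elif c == "#" and not command:    # comment, first word only
                while i < n and src[i] != "\n":
                    i += 1
            elif c == '"':
                j = i + 1
                while j < n and src[j] != '"':
                    j += 2 if src[j] == "\\" else 1
                if j >= n:
                    raise SyntaxError("unmatched quote")
                i = take_delimited(j)
            elif c == "{":
                i = take_delimited(matching("{", "}", i))
            else:
                word = ""                     # fall through to MIDDLE_WORD
        elif c in " \t":                      # MIDDLE_WORD from here on
            end_word()
            i += 1
        elif c in ";\n":
            end_command()
            i += 1
        elif c == "[":
            j = matching("[", "]", i)
            word += src[i:j + 1]
            i = j + 1
        elif src.startswith("${", i):
            j = src.find("}", i)
            if j < 0:
                raise SyntaxError("unmatched ${")
            word += src[i:j + 1]
            i = j + 1
        else:
            step = 2 if c == "\\" else 1      # keep an escaped char attached
            word += src[i:i + step]
            i += step
    end_command()                             # END_SCOPE: flush what is left
    return commands
```

Running out of input plays the role of END_SCOPE here: the final end_command() flushes any outstanding word and command before the array of commands is returned.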
END_SCOPE: We now have an array of arrays, which correspond to the raw text of the words in the code. Now we need to perform various substitutions on the words. (In a future version, this is where we'd compile the code. For now, we'll just interpret it.)
RUN_COMMAND: Pop an array off the array of commands. For each of the words in the command array, we need to make sure we only process each character of text once. To do this, we keep a linked list of { state, start, len } segments for each word, where state is either raw or cooked. Each round of substitution can only happen on raw segments. Once a substitution occurs, the list is further segmented: the raw segment is broken up into possibly multiple alternating raw/cooked segments. Substitutions are NOT done on words that were {} words.
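The raw/cooked bookkeeping can be illustrated with a short Python sketch. Several things here are assumptions made for the example: segments are stored as (cooked, text) pairs in a plain list instead of a linked list of { state, start, len }, only two rounds are shown (backslash, then $name variables), and the round order and the names substitute_round and cook_word are hypothetical.

```python
import re

ESCAPES = {"n": "\n", "t": "\t"}  # small illustrative subset of escapes

def substitute_round(segments, pattern, replace):
    """Apply one substitution round, touching raw segments only."""
    out = []
    for cooked, text in segments:
        if cooked:
            out.append((True, text))          # never reprocessed
            continue
        pos = 0
        for match in pattern.finditer(text):
            if match.start() > pos:
                out.append((False, text[pos:match.start()]))  # still raw
            out.append((True, replace(match)))                # now cooked
            pos = match.end()
        if pos < len(text):
            out.append((False, text[pos:]))
    return out

def cook_word(word, variables, braced=False):
    """Return the fully substituted word; {} words are returned untouched."""
    if braced:
        return word
    segments = [(False, word)]
    segments = substitute_round(
        segments, re.compile(r"\\(.)", re.S),
        lambda m: ESCAPES.get(m.group(1), m.group(1)))
    segments = substitute_round(
        segments, re.compile(r"\$(\w+)"),
        lambda m: variables[m.group(1)])
    return "".join(text for _, text in segments)
```

For example, with x set to a$y, the word pre-$x-\$y cooks to pre-a$y-$y: the $y that arrived inside the value of x is already cooked, and the escaped $ was cooked in the backslash round, so neither is substituted again.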
At this point, each of the words is as cooked as it's going to be. Put the list for each word back together into a single string. Call the command associated with the first cooked word and pass in the rest of the array as the parameters.
Save the return value (but only the last one).
While there are commands left, goto RUN_COMMAND.
Return the last return value saved. (XXX: what to return if there was no command executed? empty string?)
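The run loop itself is small. In this Python sketch, run_commands and command_table are hypothetical names, the words are assumed to be already cooked, and returning an empty string when nothing was executed is only one possible answer to the open question above.

```python
from collections import deque

def run_commands(commands, command_table):
    """Run each command in order and return the last return value."""
    pending = deque(commands)
    last_result = ""  # assumption: empty string if no command ran
    while pending:
        words = pending.popleft()         # pop the next command array
        handler = command_table[words[0]] # first cooked word names the command
        last_result = handler(*words[1:]) # remaining words are the parameters
    return last_result
```

A command table for trying it out can be an ordinary dict of callables, for instance {"join": lambda *args: "-".join(args)}.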