NAME ^

design.pod - description PIRC's design.

DESCRIPTION ^

This document describes the design and implementation of PIRC, a PIR Compiler.

OVERVIEW ^

PIRC currently consists of a PIR parser, together with a lexer. It also has the beginning of semantic actions in the parser. Through the use of a vtable, several back-ends can be implemented, leaving the parser untouched.

Documentation of the lexer and the parser can be generated by running:

 make docs

which will generate html files in the doc directory.

This document will only provide a high-level overview.

THE LEXER ^

The lexer is defined in pirlexer.c. The header file lists all tokens that may be returned by the lexer.

The lexer reads the complete file contents into a buffer, from which it reads the individual words, or tokens. A buffer is much faster than using getc() for each character, as I/O is relatively slow.

THE PARSER ^

The parser is defined in pirparser.c. The header file only predeclares the parser_state structure, but its definition is written in the C file, to hide the implementation details from other files. Access to specific fields is done through accessor functions, defined in the header file as well.

The parser communicates with the lexer through the lexer's accessor function. Of these, the next_token() function is most important: it requests the next token from the lexer.

The parser does not know anything about the spelling of tokens, although it can request these through find_keyword().

SEMANTIC ACTIONS ^

The parser calls at a number of places emit functions. These are hooks to which a function can be hooked, that will be called when the parser calls that function. This is implemented using vtables. Of course, not all hooks need to be used. If a hook is not assigned a function by the user, the default empty function is invoked. This is done to prevent NULL checks; let's just hope the optimizer sees the invoked function is empty, so the overhead of calling it is removed.

Example Vtable methods ^

This section gives a simple example to show how things are used. Let's consider a simplified version of a long_invocation. Syntactically, it looks like this (this is the simplified version):

 long-invocation -> '.begin_call' '\n'
                    arguments
                    '.call' invokable '\n'
                    results
                    '.end_call' '\n'

 invocant -> IDENTIFIER | PREG

The parsing routine for long-invocation (again, its simplified version) looks as follows:

 static void
 long_invocation(parser_state *p) {
      emit_invocation_start(p); /* indicate start of invocation */
      match(p, T_PCC_BEGIN);
      match(p, T_NEWLINE);
      arguments(p);

      match(p, T_PCC_CALL); /* check for token '.call' */
      match(p, T_NEWLINE);  /* check for a newline */

      /* get current token from lexer and store it as the invokable object */
      emit_invokable(p, get_current_token(p->lexer));

      /* check whether it was an invokable object and get next token */
      switch (p->curtoken) {
          case T_IDENTIFIER:
          case T_PREG:
              emit_invokable(p, get_current_token(p->lexer));
              break;
          default:
              syntax_error(p, 1, "invokable object expected");
              break;
      }

      results(p);

      match(p, T_PCC_END); /* accept token '.end_call' */
      match(p, T_NEWLINE); /* accept the newline token */
      emit_invocation_end(p); /* close down invocation sequence */

 }

 static void
 arguments(parser_state *p) {
      emit_args_start(p);   /* start sequence of arguments */
      /* handle arguments */
      emit_args_end(p); /* stop sequence of arguments */
 }

 static void
 results(p) {
      emit_results_start(p); /* start sequence of results */
      /* handle results */
      emit_results_end(p); /* stop sequence of results */
 }

To each of the emit_* function calls, the writer of the back-end can hook a custom function, that does the Appropiate Thing. What is appropiate, depends on the back-end. Some back-ends need to construct a data structure (AST) (for example the PBC backend would need this), others can just spit out what they get (like the PIR back-end).

Supported back-ends ^

Currently, there are the following back-end targets:

See src/pirvtable.{c,h} for details.

Please note that none of the back-ends is complete.

VTable Methods ^

Now, you might ask yourself, who or what decides where these hooks, or vtable method calls are done. "Why is there a hook over here?" Well, that's done by figuring out at what moment a back-end might need to get some information of the parser. As the parser continues, the tokens being read are lost, if they're not stored anywhere. So, every once and a while the back-end needs to be able to store stuff, so it can do its job properly.

The vtable is not complete yet. There are a number of parsing routines that do not have associated vtable methods (invocations). Of course, we don't want the parser to do too much vtable invocations. On the other hand, if the parser does too few, constructing a back-end might be impossible. It's a bit of a trade-off.

OVERVIEW ^

This section gives an overview of what functionality is in what file:

WHAT NEEDS TO BE DONE ^

There are some major TODOs:

AUTHOR ^

Klaas-Jan Stol <parrotcode at gmail dot com>


parrot