This is a complete rewrite of the PIR lexical analyzer, as defined in IMCC. The goal is to fix the issues with the current implementation of the PIR language.

The current approach is to create a three-pass compiler. If this scheme can be optimized, that is preferable, but it needs more experimentation.

The first pass is the heredoc pre-processor, which converts all heredoc strings into normal strings (they are "flattened"). This phase also strips all comments, both POD and line comments.

The second pass is the macro pre-processor, which handles the .macro, .macro_const and .include directives. The resulting output is the file that can be fed into the actual PIR parser.

The third pass is the PIR parsing phase. It takes the output of the macro pre-processor, which contains neither heredoc strings nor macros. For that reason, the PIR lexer is very simple and straightforward.

Each phase is easy to implement on its own. When they must be combined, the complexity grows quickly. Therefore this approach, although probably not the most efficient, is easier to maintain and is preferred.


The C89 standard does not define strdup() in the C library, so we define our own. Function names beginning with "str" followed by a lowercase letter are reserved for future library extensions, so it is named dupstr, as that is what it does: duplicate a string.
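
A minimal sketch of what such a function might look like. The use of plain malloc() and the NULL-on-failure behaviour are assumptions of this example; the real code may use a different allocator or error policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a NUL-terminated string into freshly allocated memory.
 * Returns NULL if the allocation fails. The caller owns the result
 * and must free() it. (Allocator choice is an assumption.) */
char *
dupstr(char const * const source) {
    size_t  len;
    char   *copy;

    assert(source != NULL);

    len  = strlen(source) + 1;      /* +1 for the terminating NUL */
    copy = (char *)malloc(len);

    if (copy != NULL)
        memcpy(copy, source, len);  /* copies the NUL as well */

    return copy;
}
```

The sketch sticks to C89: declarations come first and only block comments are used.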


See dupstr, except that this version takes the number of characters to be copied. Handy for copying a string without its surrounding quotes.
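
A possible shape for the length-limited variant, shown here under the hypothetical name dupstrn. It assumes the caller guarantees that source holds at least num_chars characters; the name, the signature and the use of malloc() are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy exactly num_chars characters from source into a new,
 * NUL-terminated buffer. Returns NULL if the allocation fails.
 * Precondition (assumed): source has at least num_chars characters. */
char *
dupstrn(char const * const source, size_t num_chars) {
    char *copy;

    assert(source != NULL);

    copy = (char *)malloc(num_chars + 1);   /* +1 for the NUL */

    if (copy != NULL) {
        memcpy(copy, source, num_chars);
        copy[num_chars] = '\0';
    }

    return copy;
}
```

To drop the quotes from a token such as "hello" (quotes included, 7 characters), skip the first character and copy two characters fewer than the token length: dupstrn(token + 1, strlen(token) - 2).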


Constructor for a lexer. It is very important to initialize all fields.
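
To illustrate why every field is set explicitly, here is a sketch of such a constructor. The struct layout, the field names and the name new_lexer are all hypothetical and chosen only for this example; the actual lexer structure is not shown in this text.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lexer state; the real structure will differ. */
typedef struct lexer_state {
    char const *filename;      /* name of the file being lexed     */
    int         line_nr;       /* current line, counted from 1     */
    int         parse_errors;  /* number of errors reported so far */
    void       *symbols;       /* placeholder for a symbol table   */
} lexer_state;

/* Allocate a lexer and give every field a defined value.
 * malloc() returns uninitialized memory, so a field that is not
 * assigned here would hold an indeterminate value.
 * Returns NULL if the allocation fails. */
lexer_state *
new_lexer(char const * const filename) {
    lexer_state *lexer;

    assert(filename != NULL);

    lexer = (lexer_state *)malloc(sizeof (lexer_state));
    if (lexer == NULL)
        return NULL;

    lexer->filename     = filename;  /* borrowed, not copied */
    lexer->line_nr      = 1;
    lexer->parse_errors = 0;
    lexer->symbols      = NULL;

    return lexer;
}
```

Assigning each field by hand, rather than relying on calloc(), also keeps pointer fields portable, since an all-zero bit pattern is not guaranteed by the C standard to be a null pointer.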


static int is_parrot_op(char const *const spelling)