This is a complete rewrite of the PIR lexical analyzer, as defined in IMCC. The goal is to fix the issues with the current implementation of the PIR language.

The current approach is to build a three-pass compiler. If this pass schedule can be optimized, that is preferred, but doing so needs more experimentation.

The first pass is the heredoc pre-processor, which converts all heredoc strings into normal strings (they are "flattened"). This pass also strips all comments, both POD and line comments.
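Below is a minimal sketch of the line-comment half of this pass. It assumes that PIR line comments start with `#` and run to the end of the line, and that a `#` inside a double- or single-quoted string is not a comment. The function name `strip_line_comments` is hypothetical, and POD blocks and heredocs are not handled. Newlines are kept on purpose so that line numbers stay valid for the later passes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src with every
 * '#' line comment removed. A '#' inside "..." or '...' is kept.
 * The newline that ends a comment is preserved. */
static char *
strip_line_comments(char const * const src)
{
    size_t const len = strlen(src);
    char *out = (char *)calloc(len + 1, sizeof (char)); /* output never grows */
    char *dst = out;
    char quote = '\0';      /* current quote character, '\0' outside a literal */
    char const *p = src;

    assert(out != NULL);

    while (*p != '\0') {
        if (quote != '\0') {                    /* inside a string literal */
            if (*p == '\\' && quote == '"' && p[1] != '\0')
                *dst++ = *p++;                  /* keep backslash; escaped char copied below */
            else if (*p == quote)
                quote = '\0';                   /* closing quote */
            *dst++ = *p++;
        }
        else if (*p == '"' || *p == '\'') {
            quote = *p;                         /* opening quote */
            *dst++ = *p++;
        }
        else if (*p == '#') {
            while (*p != '\0' && *p != '\n')    /* drop comment text, keep '\n' */
                ++p;
        }
        else {
            *dst++ = *p++;
        }
    }
    return out;   /* NUL-terminated because calloc zeroed the buffer */
}
```

For example, `set $I0, 1 # one` becomes `set $I0, 1 ` (trailing space kept), while `say "#x"` is left untouched.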

The second pass is the macro pre-processor, which handles the .macro, .macro_const and .include directives. Its output is what gets fed into the actual PIR parser.
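The .macro_const part of this pass can be sketched as a plain text substitution. The sketch makes three assumptions: a constant defined with `.macro_const PI 3.14` is referenced as `.PI`, all definitions have already been collected into a table, and string literals need no special treatment. The `macro_const` type and `expand_macro_consts` function are hypothetical. Parameterized `.macro` bodies and `.include` are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one definition such as ".macro_const PI 3.14". */
typedef struct macro_const {
    char const *name;   /* "PI", without the leading '.' */
    char const *value;  /* "3.14" */
} macro_const;

/* Return a newly allocated copy of src in which every ".NAME" that matches
 * an entry of defs is replaced by its value. Other ".name" words, such as
 * the directive .sub, are copied unchanged. */
static char *
expand_macro_consts(char const *src, macro_const const *defs, size_t ndefs)
{
    size_t ndots = 0, maxval = 0, i;
    char const *p;
    char *out, *dst;

    /* Worst case for the buffer size: every '.' expands to the longest value. */
    for (p = src; *p != '\0'; ++p)
        if (*p == '.')
            ++ndots;
    for (i = 0; i < ndefs; ++i)
        if (strlen(defs[i].value) > maxval)
            maxval = strlen(defs[i].value);

    out = (char *)calloc(strlen(src) + ndots * maxval + 1, sizeof (char));
    assert(out != NULL);
    dst = out;

    for (p = src; *p != '\0'; ) {
        if (*p == '.' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
            char const *end = p + 1;
            size_t namelen;
            int found = 0;

            while (isalnum((unsigned char)*end) || *end == '_')
                ++end;                          /* end of the identifier */
            namelen = (size_t)(end - (p + 1));

            for (i = 0; i < ndefs && !found; ++i) {
                if (strlen(defs[i].name) == namelen
                    && strncmp(defs[i].name, p + 1, namelen) == 0) {
                    size_t const vlen = strlen(defs[i].value);
                    memcpy(dst, defs[i].value, vlen);
                    dst += vlen;
                    found = 1;
                }
            }
            if (!found) {                       /* not a constant: copy ".name" as-is */
                memcpy(dst, p, (size_t)(end - p));
                dst += end - p;
            }
            p = end;
        }
        else {
            *dst++ = *p++;                      /* includes dots in numbers like 1.5 */
        }
    }
    return out;   /* NUL-terminated because calloc zeroed the buffer */
}
```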

The third pass is the PIR parsing phase. It takes the output of the macro pre-processor, which contains no heredoc strings or macros. As a result, the PIR lexer is simple and straightforward.

Each phase is easy to implement on its own, but combining them makes the complexity grow quickly. This approach is therefore probably not the most efficient, but it is easier to maintain, which makes it preferable.
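The sketch below shows one way a driver might chain the passes. It assumes each pass takes the complete source text and returns a newly allocated, transformed copy. The names `pass_fn`, `copy_pass` and `run_passes` are hypothetical, and `copy_pass` only stands in for the real heredoc and macro passes. Giving every pass the same signature is what lets each one be developed and tested on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A pass maps a NUL-terminated source text to a newly allocated one. */
typedef char *(*pass_fn)(char const *input);

/* Placeholder pass (hypothetical): returns an unchanged copy of its input. */
static char *
copy_pass(char const *input)
{
    char *out = (char *)calloc(strlen(input) + 1, sizeof (char));
    assert(out != NULL);
    strcpy(out, input);
    return out;
}

/* Run passes[0..npasses) in order. Each intermediate text is freed as soon
 * as the next pass has produced its output, so at most two buffers are live
 * at a time. The caller owns the result; NULL is returned if npasses == 0. */
static char *
run_passes(char const *source, pass_fn const *passes, size_t npasses)
{
    char *current = NULL;
    size_t i;

    for (i = 0; i < npasses; ++i) {
        char *next = passes[i](current != NULL ? current : source);
        free(current);          /* free(NULL) is a no-op on the first pass */
        current = next;
    }
    return current;
}
```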


Update the line number and column. The yytext buffer is scanned for '\n' characters; for each one, the line number is incremented and the column is reset. This cannot be done in the rule that matches newlines, because consecutive newlines are matched together and returned as a single newline token.

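*/
static void
update_location(void *yyscanner, lexer_state * const lexer) {
    char const *iter = yyget_text(yyscanner);
    assert(lexer != NULL);

    /* flex NUL-terminates yytext before a rule's action (and YY_USER_ACTION)
     * runs, so scanning up to '\0' covers exactly the matched text. */
    while (*iter != '\0') {
        if (*iter == '\n') {
            ++lexer->line_nr;     /* next line */
            lexer->line_pos = 1;  /* reset column */
        }
        else {
            ++lexer->line_pos;    /* next column */
        }
        ++iter;
    }
}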


The C89 standard library does not provide strdup(), so we define our own. Function names that begin with "str" followed by a lowercase letter are reserved by the C standard for future library extensions. This function is therefore named dupstr, which describes what it does: duplicate a string.

*/
static char *
dupstr(char const * const source) {
    char *newstring = (char *)calloc(strlen(source) + 1, sizeof (char));
    assert(newstring);
    strcpy(newstring, source);
    return newstring;
}



Like dupstr, except that this version copies at most num_chars characters. This makes it easy to copy a string literal without its surrounding quotes.

*/
static char *
dupstrn(char const * const source, size_t num_chars) {
    char *newstring = (char *)calloc(num_chars + 1, sizeof (char));
    assert(newstring);
    /* only copy num_chars characters; calloc already supplied the final NUL,
     * which strncpy would not write if source is num_chars or longer */
    strncpy(newstring, source, num_chars);
    return newstring;
}
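To illustrate the quote-stripping use case, here is a self-contained helper in the same style as dupstrn. The name `unquote` is hypothetical. Inside a string rule, the equivalent call would presumably be `dupstrn(yytext + 1, yyleng - 2)`, which drops the first and last character of the match.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a quoted literal such as "hello" or 'hello'
 * without its surrounding quotes. Escape sequences are left untouched. */
static char *
unquote(char const * const literal)
{
    size_t const len = strlen(literal);
    char *result;

    /* must start and end with the same quote character */
    assert(len >= 2);
    assert((literal[0] == '"' || literal[0] == '\'') && literal[len - 1] == literal[0]);

    result = (char *)calloc(len - 1, sizeof (char));  /* len - 2 chars + NUL */
    assert(result != NULL);
    memcpy(result, literal + 1, len - 2);             /* calloc supplied the NUL */
    return result;
}
```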

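/* Run update_location() after each rule matches, before its action runs.
 * The trailing ';' is intentional: flex places YY_USER_ACTION directly in
 * front of each rule's action without a semicolon of its own. */
#define YY_USER_ACTION                                  \
    do {                                                \
        lexer_state *my_lexer = yyget_extra(yyscanner); \
        update_location(yyscanner, my_lexer);           \
    }                                                   \
    while (0);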
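A small mock shows why this macro carries its own trailing semicolon. The macros `MY_USER_ACTION` and `MY_RULE_SETUP` and the counters are hypothetical stand-ins. In a flex-generated scanner, each rule's case begins with `YY_RULE_SETUP`, which ends by expanding `YY_USER_ACTION`, and the rule's action follows with no semicolon in between.

```c
#include <assert.h>

/* Hypothetical counters standing in for the lexer state. */
static int actions_run = 0;
static int tokens_seen = 0;

/* Same shape as YY_USER_ACTION above, including the trailing ';'. */
#define MY_USER_ACTION do { ++actions_run; } while (0);

/* Mimics flex, where YY_RULE_SETUP ends with YY_USER_ACTION. */
#define MY_RULE_SETUP MY_USER_ACTION

static int
run_rule(void)
{
    MY_RULE_SETUP
    {
        ++tokens_seen;   /* the rule's own action */
    }
    return tokens_seen;
}
```

Without the `;` inside `MY_USER_ACTION`, the expansion would read `do { ... } while (0) { ... }`, which does not compile.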



Constructor for a lexer. It is very important to initialize all fields, because malloc() leaves the allocated memory uninitialized.

*/
lexer_state *
new_lexer(char * const filename) {
    lexer_state *lexer = (lexer_state *)malloc(sizeof (lexer_state));
    assert(lexer != NULL);

    lexer->filename      = filename;
    lexer->line_nr       = 1;
    lexer->line_pos      = 1;
    lexer->parse_errors  = 0;

    lexer->subs          = NULL;
    lexer->is_instr      = 0;

    printdebug(stderr, "Constructing new lexer\n");

    return lexer;