NAME ^

lib/lpeg.pir - Parsing Expression Grammar for Lua, version 0.8

DESCRIPTION ^

See original on http://www.inf.puc-rio.br/~roberto/lpeg.html

Introduction ^

See on http://www.inf.puc-rio.br/~roberto/lpeg.html#intro

Functions ^

lpeg.match (pattern, subject [, init])

The matching function. It attempts to match the given pattern against the subject string. If the match succeeds, returns the index in the subject of the first character after the match, or the values of captured values (if the pattern captured any value).

An optional numeric argument init makes the match starts at that position in the subject string. As usual in Lua libraries, a negative value counts from the end.

Unlike typical pattern-matching functions, match works only in anchored mode; that is, it tries to match the pattern with a prefix of the given subject string (at position init), not with an arbitrary substring of the subject. So, if we want to find a pattern anywhere in a string, we must either write a loop in Lua or write a pattern that matches anywhere. This second approach is easy and quite efficient; see examples.

NOT YET IMPLEMENTED.

lpeg.print (pattern)

UNDOCUMENTED.

lpeg.span (string)

NOT YET IMPLEMENTED.

lpeg.type (value)

If the given value is a pattern, returns the string "pattern". Otherwise returns nil.

lpeg.version ()

Returns a string with the running version of LPEG.

Basic Constructions ^

The following operations build patterns. All operations that expect a pattern as an argument may receive also strings, tables, numbers, booleans, or functions, which are translated to patterns according to the rules of function lpeg.P.

lpeg.P (value)

Converts the given value into a proper pattern, according to the following rules:

* If the argument is a pattern, it is returned unmodified.

* If the argument is a string, it is translated to a pattern that matches literally the string.

* If the argument is a number, it is translated as follows. A non-negative number n gives a pattern that matches exactly n characters; a negative number -n gives a pattern that succeeds only if the input string does not have n characters. It is (as expected) equivalent to the unary minus operation (see below) applied over the absolute value of n.

* If the argument is a boolean, the result is a pattern that always succeeds or always fails (according to the boolean value), without consuming any input.

* If the argument is a table, it is interpreted as a grammar (see "Grammars").

* If the argument is a function, returns a pattern equivalent to a match-time capture over the empty string.

If the function is called with parameters s and i, its result is valid if it is in the range [i, len(s) + 1].

NOT YET IMPLEMENTED (see getpatt).

lpeg.R ({range})

Returns a pattern that matches any single character belonging to one of the given ranges. Each range is a string xy of length 2, representing all characters with code between the codes of x and y (both inclusive).

As an example, the pattern lpeg.R("09") matches any digit, and lpeg.R("az", "AZ") matches any ASCII letter.

NOT YET IMPLEMENTED.

lpeg.S (string)

Returns a pattern that matches any single character that appears in the given string. (The S stands for Set.)

As an example, the pattern lpeg.S("+-*/") matches any arithmetic operator.

Note that, if s is a character (that is, a string of length 1), then lpeg.P(s) is equivalent to lpeg.S(s) which is equivalent to lpeg.R(s..s). Note also that both lpeg.S("") and lpeg.R() are patterns that always fail.

NOT YET IMPLEMENTED.

lpeg.V (v)

This operation creates a non-terminal (a variable) for a grammar. The created non-terminal refers to the rule indexed by v in the enclosing grammar. (See "Grammars" for details.)

NOT YET IMPLEMENTED.

#patt

Returns a pattern equivalent to &patt in the original PEG notation. This is a pattern that matches only if the input string does match patt, but without consuming any input, independently of success or failure.

When it succeeds, #patt produces all captures produced by patt.

NOT YET IMPLEMENTED.

-patt

Returns a pattern equivalent to !patt in the original PEG notation. This pattern matches only if the input string does not match patt. It does not consume any input, independently of success or failure.

As an example, the pattern -1 matches only the end of string.

This pattern never produces any captures, because either patt fails or -patt fails. (A failing pattern produces no captures.)

NOT YET IMPLEMENTED (see __sub).

patt1 + patt2

Returns a pattern equivalent to an ordered choice of patt1 and patt2. (This is denoted by patt1 / patt2 in the original PEG notation, not to be confused with the / operation in LPeg.) It matches either patt1 or patt2 (with no backtracking once one of them succeeds). The identity element for this operation is the pattern lpeg.P(false), which always fails.

If both patt1 and patt2 are character sets, this operation is equivalent to set union:

 lower = lpeg.R("az")
 upper = lpeg.R("AZ")
 letter = lower + upper
NOT YET IMPLEMENTED.

patt1 - patt2

Returns a pattern equivalent to !patt2 patt1. This pattern asserts that the input does not match patt2 and then matches patt1.

If both patt1 and patt2 are character sets, this operation is equivalent to set difference. Note that -patt is equivalent to "" - patt (or 0 - patt). If patt is a character set, 1 - patt is its complement.

NOT YET IMPLEMENTED.

patt1 *patt2

Returns a pattern that matches patt1 and then matches patt2, starting where patt1 finished. The identity element for this operation is the pattern lpeg.P(true), which always succeeds.

(LPeg uses the * operator [instead of the more obvious ..] both because it has the right priority and because in formal languages it is common to use a dot for denoting concatenation.)

NOT YET IMPLEMENTED.

patt^n

If n is nonnegative, this pattern is equivalent to pattn patt*. It matches at least n occurrences of patt.

Otherwise, when n is negative, this pattern is equivalent to (patt?)-n. That is, it matches at most -n occurrences of patt.

In particular, patt^0 is equivalent to patt*, patt^1 is equivalent to patt+, and patt^-1 is equivalent to patt? in the original PEG notation.

In all cases, the resulting pattern is greedy with no backtracking. That is, it matches only the longest possible sequence of matches for patt.

NOT YET IMPLEMENTED.

Grammars ^

With the use of Lua variables, it is possible to define patterns incrementally, with each new pattern using previously defined ones. However, this technique does not allow the definition of recursive patterns. For recursive patterns, we need real grammars.

LPeg represents grammars with tables, where each entry is a rule.

The call lpeg.V(v) creates a pattern that represents the nonterminal (or variable) with index v in a grammar. Because the grammar still does not exist when this function is evaluated, the result is an open reference to the respective rule.

A table is fixed when it is converted to a pattern (either by calling lpeg.P or by using it wherein a pattern is expected). Then every open reference created by lpeg.V(v) is corrected to refer to the rule indexed by v in the table.

When a table is fixed, the result is a pattern that matches its initial rule. The entry with index 1 in the table defines its initial rule. If that entry is a string, it is assumed to be the name of the initial rule. Otherwise, LPeg assumes that the entry 1 itself is the initial rule.

As an example, the following grammar matches strings of a's and b's that have the same number of a's and b's:

 equalcount = lpeg.P{
  "S";   -- initial rule name
  S = "a" * lpeg.V"B" + "b" * lpeg.V"A" + "",
  A = "a" * lpeg.V"S" + "b" * lpeg.V"A" * lpeg.V"A",
  B = "b" * lpeg.V"S" + "a" * lpeg.V"B" * lpeg.V"B",
 } * -1

Captures ^

Captures specify what a match operation should return (the so called semantic information). LPeg offers several kinds of captures, which produces values based on matches and combine them to produce new values.

A capture pattern produces its values every time it succeeds. For instance, a capture inside a loop produces as many values as matched by the loop. A capture produces a value only when it succeeds. For instance, the pattern lpeg.C(lpeg.P"a"^-1) produces the empty string when there is no "a" (because the pattern "a"? succeeds), while the pattern lpeg.C("a")^-1 does not produce any value when there is no "a" (because the pattern "a" fails).

Usually, LPEG evaluates all captures only after (and if) the entire match succeeds. At match time it only gathers enough information to produce the capture values later. As a particularly important consequence, most captures cannot affect the way a pattern matches a subject. The only exception to this rule is the so-called match-time capture. When a match-time capture matches, it forces the immediate evaluation of all its nested captures and then calls its corresponding function, which tells whether the match succeeds and also what values are produced.

lpeg.C (patt)

Creates a simple capture, which captures the substring of the subject that matches patt. The captured value is a string. If patt has other captures, their values are returned after this one.

NOT YET IMPLEMENTED (see capture_aux).

lpeg.Ca (patt)

Creates an accumulator capture. This capture assumes that patt should produce at least one captured value of any kind, which becomes the initial value of an accumulator. Pattern patt then may produce zero or more function captures. Each of these functions in these captures is called having the accumulator as its first argument (followed by any other arguments provided by its own pattern), and the value returned by the function becomes the new value of the accumulator. The final value of this accumulator is the sole result of the whole capture.

As an example, the following pattern matches a list of numbers separated by commas and returns their addition:

 -- matches a numeral and captures its value
 local number = lpeg.R"09"^1 / tonumber
 --
 -- auxiliary function to add two numbers
 local function add (acc, newvalue) return acc + newvalue end
 --
 list = lpeg.Ca(number * ("," * number / add)^0)
 --
 -- example of use
 print(list:match("10,30,43"))   --> 83
NOT YET IMPLEMENTED (see capture_aux).

lpeg.Carg (n)

Creates an argument capture. This pattern matches the empty string and produces the value given as the nth extra argument given in the call to lpeg.match.

NOT YET IMPLEMENTED (see emptycap_aux).

lpeg.Cb (n)

Creates a back capture. This pattern matches the empty string and produces the values produced by the nth previous capture.

Captures are numbered dynamically. So, the first previous capture is the last capture to match before the current one. The numbering includes only complete captures; so, if the back capture is inside another capture, this enclosing capture is ignored (because it is not complete when the back capture is seen). Numbering does not count nested captures. Numbering counts captures, not the values produced by them; it does not matter whether a capture produces zero or many values, it counts as one.

This is an experimental feature. It probably will be changed or even removed in future releases.

NOT YET IMPLEMENTED (see emptycap_aux).

lpeg.Cc ({value})

Creates a constant capture. This pattern matches the empty string and produces all given values as its captured values.

NOT YET IMPLEMENTED.

lpeg.Cp ()

Creates a position capture. It matches the empty string and captures the position in the subject where the match occurs. The captured value is a number.

NOT YET IMPLEMENTED.

lpeg.Cs (patt)

Creates a substitution capture, which captures the substring of the subject that matches patt, with substitutions. For any capture inside patt, the substring that matched the capture is replaced by the capture value (which should be a string). The capture values from patt are not returned independently (only as substrings in the resulting string).

NOT YET IMPLEMENTED (see capture_aux).

lpeg.Ct (patt)

Creates a table capture. This capture creates a table and puts all captures made by patt inside this table in successive integer keys, starting at 1.

The captured value is only this table. The captures made by patt are not returned independently (only as table elements).

NOT YET IMPLEMENTED (see capture_aux).

patt / string

Creates a string capture. It creates a capture string based on string. The captured value is a copy of string, except that the character % works as an escape character: any sequence in string of the form %n, with n between 1 and 9, stands for the match of the n-th capture in patt. (Currently these nested captures can be only simple captures.) The sequence %0 stands for the whole match. The sequence %% stands for a single %.

patt / table

Creates a query capture. It indexes the given table using as key the value of the first capture of patt, or the whole match if patt made no capture. The value at that index is the final value of the capture. If the table does not have that key, there is no captured value. Everything works as if there was no capture.

patt / function

Creates a function capture. It calls the given function passing all captures made by patt as arguments, or the whole match if patt made no capture. The values returned by the function are the final values of the capture. (This capture may create multiple values.) In particular, if function returns no value, there is no captured value; everything works as if there was no capture.

NOT YET IMPLEMENTED (see capture_aux).

lpeg.Cmt (patt, function)

Creates a match-time capture. Unlike all other captures, this one is evaluated immediately when a match occurs. It forces the immediate evaluation of all its nested captures and then calls function.

The function gets as arguments the entire subject, the current position (after the match of patt), plus any capture values produced by patt.

The first value returned by function defines how the match happens. If the call returns a number, the match succeeds and the returned number becomes the new current position. (Assuming a subject s and current position i, the returned number must be in the range [i, len(s) + 1].) If the call returns false, nil, or no value, the match fails.

Any extra values returned by the function become the values produced by the capture.

NOT YET IMPLEMENTED.

Some Examples ^

http://www.inf.puc-rio.br/~roberto/lpeg.html#ex

LINKS ^

Parsing Expression Grammars

http://pdos.csail.mit.edu/%7Ebaford/packrat/

Wikipedia Entry for PEG

http://en.wikipedia.org/wiki/Parsing_expression_grammar

Parsing Expression Grammars: A Recognition-Based Syntactic Foundation

http://pdos.csail.mit.edu/%7Ebaford/packrat/popl04/peg-popl04.pdf

AUTHORS ^

Francois Perrad


parrot