lib/lpeg.pir - Parsing Expression Grammar for Lua, version 0.9
See original on http://www.inf.puc-rio.br/~roberto/lpeg.html
See on http://www.inf.puc-rio.br/~roberto/lpeg.html#intro
lpeg.match (pattern, subject [, init])
Attempts to match the given pattern against the subject string. If the match succeeds, returns the index in the subject of the first character after the match, or the captured values (if the pattern captured any value). The optional numeric argument init makes the match start at that position in the subject string. As usual in Lua libraries, a negative value counts from the end.
Unlike typical pattern-matching functions, match works only in anchored mode; that is, it tries to match the pattern with a prefix of the given subject string (at position init), not with an arbitrary substring of the subject. So, if we want to find a pattern anywhere in a string, we must either write a loop in Lua or write a pattern that matches anywhere. This second approach is easy and quite efficient; see examples.
NOT YET IMPLEMENTED.
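A minimal sketch of anchored matching and of a pattern that matches anywhere, assuming the reference LPeg module for standard Lua (loaded with require "lpeg") rather than this Parrot port. The names digits and anywhere are illustrative only.

```lua
local lpeg = require "lpeg"

-- Anchored matching: the pattern must match at position init (default 1).
digits = lpeg.R("09")^1
print(digits:match("123abc"))      --> 4    (index just after the match)
print(digits:match("abc123"))      --> nil  (no digit at position 1)
print(digits:match("abc123", 4))   --> 7    (matching starts at position 4)

-- Matching anywhere: skip characters until digits can match, then capture.
anywhere = (1 - digits)^0 * lpeg.C(digits)
print(anywhere:match("abc123def")) --> 123
```

The second pattern keeps the search inside the pattern itself, so no Lua loop is needed.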
lpeg.print (pattern)
lpeg.span (string)
lpeg.type (value)
If the given value is a pattern, returns the string "pattern". Otherwise returns nil.
lpeg.version ()
The following operations build patterns. All operations that expect a pattern as an argument may also receive strings, tables, numbers, booleans, or functions, which are translated to patterns according to the rules of function lpeg.P.
lpeg.P (value)
lpeg.R ({range})
Returns a pattern that matches any single character belonging to one of the given ranges. Each range is a string xy of length 2, representing all characters with code between the codes of x and y (both inclusive).
As an example, the pattern lpeg.R("09") matches any digit, and lpeg.R("az", "AZ") matches any ASCII letter.
NOT YET IMPLEMENTED.
lpeg.S (string)
Returns a pattern that matches any single character that appears in the given string. (The S stands for Set.)
As an example, the pattern lpeg.S("+-*/") matches any arithmetic operator.
Note that, if s is a character (that is, a string of length 1), then lpeg.P(s) is equivalent to lpeg.S(s), which is equivalent to lpeg.R(s..s). Note also that both lpeg.S("") and lpeg.R() are patterns that always fail.
NOT YET IMPLEMENTED.
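The equivalences stated for single characters can be checked with a short sketch. It assumes the reference LPeg module for standard Lua; the variable names are made up for the illustration.

```lua
local lpeg = require "lpeg"

operator = lpeg.S("+-*/")
print(operator:match("*3"))   --> 2
print(operator:match("3*"))   --> nil

-- For a single character, P, S and R build equivalent patterns.
from_p = lpeg.P("x"):match("xyz")
from_s = lpeg.S("x"):match("xyz")
from_r = lpeg.R("xx"):match("xyz")
print(from_p, from_s, from_r) --> 2  2  2

-- An empty set and an empty list of ranges always fail.
empty_set   = lpeg.S(""):match("abc")
empty_range = lpeg.R():match("abc")
print(empty_set, empty_range) --> nil  nil
```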
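lpeg.V (v)
Creates a nonterminal (a variable) for a grammar. The created nonterminal refers to the rule indexed by v in the enclosing grammar. (See "Grammars" for details.)
NOT YET IMPLEMENTED.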
locale ([table])
Returns a table with patterns for matching some character classes according to the current locale. The table has fields named alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit, each one containing a corresponding pattern. Each pattern matches any single character that belongs to its class.
If called with an argument table, then it creates those fields inside the given table and returns that table.
NOT YET IMPLEMENTED.
#patt
Returns a pattern that matches only if the input string matches patt, but without consuming any input, independently of success or failure. (This pattern is equivalent to &patt in the original PEG notation.)
When it succeeds, #patt produces all captures produced by patt.
NOT YET IMPLEMENTED.
-patt
Returns a pattern that matches only if the input string does not match patt. It does not consume any input, independently of success or failure. (This pattern is equivalent to !patt in the original PEG notation.)
As an example, the pattern -lpeg.P(1) matches only the end of string.
This pattern never produces any captures, because either patt fails or -patt fails. (A failing pattern never produces captures.)
NOT YET IMPLEMENTED.
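As a rough illustration of the two predicates, the following sketch shows that neither consumes input. It assumes the reference LPeg module for standard Lua; peek, eos and whole are hypothetical names.

```lua
local lpeg = require "lpeg"

-- #patt succeeds where patt would succeed, but the position does not move.
peek = #lpeg.P("ab")
print(peek:match("abc"))     --> 1
print(peek:match("xbc"))     --> nil

-- -lpeg.P(1) succeeds only when no character is left.
eos = -lpeg.P(1)
print(eos:match(""))         --> 1
print(eos:match("x"))        --> nil

-- A lowercase word that must extend to the end of the subject.
whole = lpeg.R("az")^1 * eos
print(whole:match("hello"))  --> 6
print(whole:match("hello!")) --> nil
```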
patt1 + patt2
Returns a pattern equivalent to an ordered choice of patt1 and patt2. (This is denoted by patt1 / patt2 in the original PEG notation, not to be confused with the / operation in LPeg.) It matches either patt1 or patt2, with no backtracking once one of them succeeds. The identity element for this operation is the pattern lpeg.P(false), which always fails.
If both patt1 and patt2 are character sets, this operation is equivalent to set union:
    lower = lpeg.R("az")
    upper = lpeg.R("AZ")
    letter = lower + upper
NOT YET IMPLEMENTED.
patt1 - patt2
Returns a pattern that asserts that the input does not match patt2 and then matches patt1.
If both patt1 and patt2 are character sets, this operation is equivalent to set difference. Note that -patt is equivalent to "" - patt (or 0 - patt). If patt is a character set, 1 - patt is its complement.
NOT YET IMPLEMENTED.
patt1 * patt2
Returns a pattern that matches patt1 and then matches patt2, starting where patt1 finished. The identity element for this operation is the pattern lpeg.P(true), which always succeeds.
(LPeg uses the * operator [instead of the more obvious ..] both because it has the right priority and because in formal languages it is common to use a dot for denoting concatenation.)
NOT YET IMPLEMENTED.
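A small sketch of choice, difference and sequence on character sets, assuming the reference LPeg module for standard Lua. The pattern names are illustrative.

```lua
local lpeg = require "lpeg"

lower     = lpeg.R("az")
upper     = lpeg.R("AZ")
letter    = lower + upper             -- ordered choice, here a set union
consonant = lower - lpeg.S("aeiou")   -- set difference
nonletter = 1 - letter                -- complement of a set

print(letter:match("Q"))      --> 2
print(consonant:match("t"))   --> 2
print(consonant:match("e"))   --> nil
print(nonletter:match("7"))   --> 2

-- Sequence: a letter followed by a digit.
pair = letter * lpeg.R("09")
print(pair:match("a1"))       --> 3
print(pair:match("1a"))       --> nil
```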
patt^n
If n is nonnegative, this pattern is equivalent to n copies of patt followed by patt* (in the original PEG notation). It matches at least n occurrences of patt.
Otherwise, when n is negative, this pattern is equivalent to -n copies of patt? in sequence. That is, it matches at most -n occurrences of patt.
In particular, patt^0 is equivalent to patt*, patt^1 is equivalent to patt+, and patt^-1 is equivalent to patt? in the original PEG notation.
In all cases, the resulting pattern is greedy with no backtracking (also called a possessive repetition). That is, it matches only the longest possible sequence of matches for patt.
NOT YET IMPLEMENTED.
With the use of Lua variables, it is possible to define patterns incrementally, with each new pattern using previously defined ones. However, this technique does not allow the definition of recursive patterns. For recursive patterns, we need real grammars.
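Defining patterns incrementally with Lua variables might look like the sketch below, which also exercises the repetition operator. It assumes the reference LPeg module for standard Lua; digit, sign, int, frac and number are names chosen for the example.

```lua
local lpeg = require "lpeg"

digit  = lpeg.R("09")
sign   = lpeg.S("+-")^-1         -- at most one sign
int    = sign * digit^1          -- one or more digits
frac   = ("." * digit^1)^-1      -- optional fractional part
number = int * frac              -- built from the patterns above

print(number:match("-12.5x"))    --> 6
print(number:match("42"))        --> 3
print(number:match("+"))         --> nil

-- patt^n needs at least n matches; patt^-n takes at most n.
print((digit^2):match("7"))      --> nil
print((digit^-2):match("12345")) --> 3
```

None of these definitions can refer to itself, which is the limitation that grammars remove.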
LPeg represents grammars with tables, where each entry is a rule.
The call lpeg.V(v) creates a pattern that represents the nonterminal (or variable) with index v in a grammar. Because the grammar still does not exist when this function is evaluated, the result is an open reference to the respective rule.
A table is fixed when it is converted to a pattern (either by calling lpeg.P or by using it where a pattern is expected). Then every open reference created by lpeg.V(v) is corrected to refer to the rule indexed by v in the table.
When a table is fixed, the result is a pattern that matches its initial rule. The entry with index 1 in the table defines its initial rule. If that entry is a string, it is assumed to be the name of the initial rule. Otherwise, LPeg assumes that the entry 1 itself is the initial rule.
As an example, the following grammar matches strings of a's and b's that have the same number of a's and b's:
    equalcount = lpeg.P{
      "S";   -- initial rule name
      S = "a" * lpeg.V"B" + "b" * lpeg.V"A" + "",
      A = "a" * lpeg.V"S" + "b" * lpeg.V"A" * lpeg.V"A",
      B = "b" * lpeg.V"S" + "a" * lpeg.V"B" * lpeg.V"B",
    } * -1
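The grammar above can be tried as follows. This is only a usage sketch, assuming the reference LPeg module for standard Lua; the grammar is repeated so that the block runs on its own.

```lua
local lpeg = require "lpeg"

equalcount = lpeg.P{
  "S";   -- initial rule name
  S = "a" * lpeg.V"B" + "b" * lpeg.V"A" + "",
  A = "a" * lpeg.V"S" + "b" * lpeg.V"A" * lpeg.V"A",
  B = "b" * lpeg.V"S" + "a" * lpeg.V"B" * lpeg.V"B",
} * -1

print(equalcount:match("abab"))  --> 5
print(equalcount:match("aabb"))  --> 5
print(equalcount:match(""))      --> 1
print(equalcount:match("aab"))   --> nil
```

The trailing -1 is what rejects "aab": rule S can still match the empty prefix, but the end-of-string test then fails.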
Captures specify what a match operation should return (the so-called semantic information). LPeg offers several kinds of captures, which produce values based on matches and combine them to produce new values.
A capture pattern produces its values every time it succeeds. For instance, a capture inside a loop produces as many values as matched by the loop. A capture produces a value only when it succeeds. For instance, the pattern lpeg.C(lpeg.P"a"^-1) produces the empty string when there is no "a" (because the pattern "a"? succeeds), while the pattern lpeg.C("a")^-1 does not produce any value when there is no "a" (because the pattern "a" fails).
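The difference between the two patterns just described can be observed directly. The sketch assumes the reference LPeg module for standard Lua; the variable names are hypothetical.

```lua
local lpeg = require "lpeg"

inside  = lpeg.C(lpeg.P"a"^-1)   -- the capture always succeeds
outside = lpeg.C("a")^-1         -- the capture itself may fail

print(inside:match("b") == "")   --> true  (captures the empty string)
print(outside:match("b"))        --> 1     (no capture, so the position is returned)

-- A capture inside a loop produces one value per iteration.
letters = lpeg.C(lpeg.R("az"))^0
print(letters:match("abc"))      --> a  b  c
```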
Usually, LPeg evaluates all captures only after (and if) the entire match succeeds. At match time it only gathers enough information to produce the capture values later. As a particularly important consequence, most captures cannot affect the way a pattern matches a subject. The only exception to this rule is the so-called match-time capture. When a match-time capture matches, it forces the immediate evaluation of all its nested captures and then calls its corresponding function, which tells whether the match succeeds and also what values are produced.
lpeg.C (patt)
Creates a simple capture, which captures the substring of the subject that matches patt. The captured value is a string. If patt has other captures, their values are returned after this one.
NOT YET IMPLEMENTED (see capture_aux).
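lpeg.Carg (n)
Creates an argument capture. This pattern matches the empty string and produces the value given as the nth extra argument in the call to lpeg.match.
NOT YET IMPLEMENTED.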
lpeg.Cb (name)
Creates a back capture. This pattern matches the empty string and produces the values produced by the most recent group capture named name.
Most recent means the last complete outermost group capture with the given name. A complete capture means that the entire pattern corresponding to the capture has matched. An outermost capture means that the capture is not inside another complete capture.
NOT YET IMPLEMENTED.
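A brief sketch of argument and back captures, assuming the reference LPeg module for standard Lua. The names args, word and backref are invented for the example.

```lua
local lpeg = require "lpeg"

-- Carg(n) produces the n-th extra argument passed to match (after init).
args = lpeg.Carg(2) * lpeg.Carg(1)
print(args:match("", 1, "first", "second"))  --> second  first

-- Cb("word") produces again the values of the most recent group named "word".
word    = lpeg.C(lpeg.R("az")^1)
backref = lpeg.Cg(word, "word") * " " * lpeg.Cb("word")
print(backref:match("hi there"))             --> hi
```

The named group itself returns no value here; only the back capture does.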
lpeg.Cc ({value})
lpeg.Cf (patt, func)
Creates a fold capture. If patt produces a list of captures C1 C2 ... Cn, this capture will produce the value func(...func(func(C1, C2), C3)..., Cn), that is, it will fold (or accumulate, or reduce) the captures from patt using function func.
This capture assumes that patt should produce at least one capture with at least one value (of any type), which becomes the initial value of an accumulator. (If you need a specific initial value, you may prefix a constant capture to patt.) For each subsequent capture LPeg calls func with this accumulator as the first argument and all values produced by the capture as extra arguments; the value returned by this call becomes the new value for the accumulator. The final value of the accumulator becomes the captured value.
As an example, the following pattern matches a list of numbers separated by commas and returns their addition:
    -- matches a numeral and captures its value
    number = lpeg.R"09"^1 / tonumber
    -- matches a list of numbers, captures their values
    list = number * ("," * number)^0
    -- auxiliary function to add two numbers
    function add (acc, newvalue) return acc + newvalue end
    -- folds the list of numbers adding them
    sum = lpeg.Cf(list, add)
    -- example of use
    print(sum:match("10,30,43"))   --> 83
NOT YET IMPLEMENTED (see capture_aux).
lpeg.Cg (patt [, name])
Creates a group capture. It groups all values returned by patt into a single capture. The group may be anonymous (if no name is given) or named with the given name.
An anonymous group serves to join values from several captures into a single capture. A named group has a different behavior. In most situations, a named group returns no values at all. Its values are only relevant for a following back capture or when used inside a table capture.
NOT YET IMPLEMENTED (see capture_aux).
lpeg.Cp ()
lpeg.Cs (patt)
Creates a substitution capture, which captures the substring of the subject that matches patt, with substitutions. For any capture inside patt with a value, the substring that matched the capture is replaced by the capture value (which should be a string). The final captured value is the string resulting from all replacements.
NOT YET IMPLEMENTED (see capture_aux).
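One possible use of a substitution capture is sketched below, assuming the reference LPeg module for standard Lua. The pattern name upper_vowels is hypothetical.

```lua
local lpeg = require "lpeg"

-- Each vowel is replaced by the result of string.upper;
-- every other character is copied unchanged.
upper_vowels = lpeg.Cs((lpeg.S("aeiou") / string.upper + 1)^0)
print(upper_vowels:match("hello world"))  --> hEllO wOrld
```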
lpeg.Ct (patt)
Creates a table capture. This capture creates a table and puts all values from all anonymous captures made by patt inside this table in successive integer keys, starting at 1. Moreover, for each named capture group created by patt, the first value of the group is put into the table with the group name as its key. The captured value is only the table.
NOT YET IMPLEMENTED (see capture_aux).
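The two ways a table capture fills its table can be sketched as follows, assuming the reference LPeg module for standard Lua. The field names key and value and the pattern names are chosen only for the illustration.

```lua
local lpeg = require "lpeg"

name = lpeg.C(lpeg.R("az")^1)

-- Anonymous captures go to integer keys 1, 2, 3, ...
list  = lpeg.Ct((name * lpeg.P(",")^-1)^0)
items = list:match("a,b,c")
print(#items, table.concat(items, " "))  --> 3  a b c

-- Named groups go to string keys.
record = lpeg.Ct(lpeg.Cg(name, "key") * "=" *
                 lpeg.Cg(lpeg.R("09")^1 / tonumber, "value"))
entry = record:match("width=80")
print(entry.key, entry.value)            --> width  80
```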
patt / string
Creates a string capture. It creates a capture string based on string. The captured value is a copy of string, except that the character % works as an escape character: any sequence in string of the form %n, with n between 1 and 9, stands for the match of the n-th capture in patt. The sequence %0 stands for the whole match. The sequence %% stands for a single %.
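The escape sequences of a string capture could be exercised as in this sketch, which assumes the reference LPeg module for standard Lua. The names kv and percent are illustrative.

```lua
local lpeg = require "lpeg"

word   = lpeg.C(lpeg.R("az")^1)
digits = lpeg.C(lpeg.R("09")^1)

-- %1 and %2 are the two captures, %0 is the whole match.
kv = (word * "=" * digits) / "%2 -> %1 [%0]"
print(kv:match("x=10"))      --> 10 -> x [x=10]

-- %% yields a literal percent sign.
percent = lpeg.R("09")^1 / "%0%%"
print(percent:match("50"))   --> 50%
```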
patt / table
Creates a query capture. It indexes the given table using as key the first value captured by patt, or the whole match if patt produced no value. The value at that index is the final value of the capture. If the table does not have that key, there is no captured value.
patt / function
Creates a function capture. It calls the given function passing all captures made by patt as arguments, or the whole match if patt made no capture. The values returned by the function are the final values of the capture. In particular, if function returns no value, there is no captured value.
NOT YET IMPLEMENTED (see capture_aux).
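Query and function captures might be used as in the sketch below, assuming the reference LPeg module for standard Lua. The table colors and the pattern names are made up for the example.

```lua
local lpeg = require "lpeg"

-- Query capture: the whole match is used as the key into the table.
colors = { red = "#f00", green = "#0f0" }
color  = lpeg.R("az")^1 / colors
print(color:match("red"))    --> #f00
print(color:match("blue"))   --> 5    (no such key, so no captured value)

-- Function capture: the captures become the arguments of the function.
swap = (lpeg.C(1) * lpeg.C(1)) / function (a, b) return b, a end
print(swap:match("xy"))      --> y  x
```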
lpeg.Cmt (patt, function)
Creates a match-time capture. Unlike all other captures, this one is evaluated immediately when a match occurs. It forces the immediate evaluation of all its nested captures and then calls function.
The function gets as arguments the entire subject, the current position (after the match of patt), plus any capture values produced by patt.
The first value returned by function defines how the match happens. If the call returns a number, the match succeeds and the returned number becomes the new current position. (Assuming a subject s and current position i, the returned number must be in the range [i, len(s) + 1].) If the call returns false, nil, or no value, the match fails.
Any extra values returned by the function become the values produced by the capture.
NOT YET IMPLEMENTED.
See on http://www.inf.puc-rio.br/~roberto/lpeg.html#ex
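As one illustration of a match-time capture, the sketch below accepts a run of digits only when its value is at most 255. It assumes the reference LPeg module for standard Lua; octet and the limit are choices made for the example, and the nested lpeg.C is there so that the function receives the digits as a capture value.

```lua
local lpeg = require "lpeg"

octet = lpeg.Cmt(lpeg.C(lpeg.R("09")^1), function (subject, pos, digits)
  local n = tonumber(digits)
  if n <= 255 then
    return pos, n   -- succeed at the current position and produce n
  end
  return false      -- make the match fail
end)

print(octet:match("200"))   --> 200
print(octet:match("300"))   --> nil
```

Because the function runs during matching, its result decides whether the pattern matches at all, which ordinary captures cannot do.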