Convert the native perl6 compiler's representation of a regex parse tree into languages/regex's format.
- Convert a P6C parse tree to a regex tree.
Main entry point.
Calls convert() to do the dirty work.
- Recursively perform the conversion.
For any node type
calls a method named
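
The recursive dispatch described above can be sketched roughly as follows. This is a minimal illustration in Python, not the module's actual Perl: the `Node` and `Converter` classes, the `convert_<type>` naming scheme, and the tuple output format are all assumptions made for the example.

```python
# Illustrative sketch only: class names, the "convert_<type>" naming scheme,
# and the tuple-based output are assumptions, not the module's real API.
class Node:
    def __init__(self, kind, children=()):
        self.kind = kind              # e.g. "rx_alt", "rx_seq", "literal"
        self.children = list(children)


class Converter:
    def convert(self, node):
        # Look up a handler named after the node type and delegate to it.
        handler = getattr(self, "convert_" + node.kind, None)
        if handler is None:
            raise NotImplementedError("no converter for node type " + node.kind)
        return handler(node)

    def convert_rx_alt(self, node):
        # An alternation converts each branch recursively.
        return ("alternate", [self.convert(c) for c in node.children])

    def convert_rx_seq(self, node):
        # A sequence converts each element recursively.
        return ("seq", [self.convert(c) for c in node.children])

    def convert_literal(self, node):
        # Leaf node: the single child is the literal text itself.
        return ("match", node.children[0])


tree = Node("rx_alt", [
    Node("rx_seq", [Node("literal", ["a"]), Node("literal", ["b"])]),
    Node("literal", ["c"]),
])
result = Converter().convert(tree)
```

Dispatching on the node type keeps each conversion rule in its own small method, so supporting a new node type only means adding one more handler.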
- Grab the parse tree out of a P6C::rule and convert it,
then wrap it in a scan node unless there was a
^ within it.
This test is likely too coarse, since there are probably ways of having a
^ that does not apply to the entire expression.
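
A rough sketch of the wrapping rule, in Python rather than the module's Perl. The nested-tuple tree shape and the node names `bos` (for ^), `scan`, `seq`, `alternate`, and `match` are assumptions chosen for the illustration.

```python
# Illustrative sketch only: trees are nested tuples of the form
# (kind, child, child, ...); the node names are assumed for the example.
def contains_anchor(tree):
    """Return True if a ("bos",) node appears anywhere in the tree."""
    if tree[0] == "bos":
        return True
    return any(contains_anchor(child)
               for child in tree[1:]
               if isinstance(child, tuple))


def convert_rule(tree):
    # Unanchored patterns are wrapped so the match may start at any offset.
    return tree if contains_anchor(tree) else ("scan", tree)


unanchored = ("seq", ("match", "a"), ("match", "b"))
anchored = ("seq", ("bos",), ("match", "a"))
# Only the first branch is anchored, yet the whole rule escapes the scan.
partly_anchored = ("alternate", ("seq", ("bos",), ("match", "a")), ("match", "b"))
```

The `partly_anchored` case shows the weakness noted above: a ^ in one branch of an alternation suppresses the scan node for the entire expression, so the unanchored branch can no longer match at a later offset.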
- P6C::rx_alt(T) -> Regex::Ops::Tree::alternate(branches of T)
- P6C::rx_seq(T) -> Regex::Ops::Tree::seq(elements of T)
- Convert P6C::rx_atom(T) depending on what kind of atom T is.
A P6C::rx_val is converted directly; an ARRAY is treated as a code block; if T has a type field that is of type
then it is matched as an array literal (as if it were an alternation of each of its elements); otherwise,
it is assumed to be a string.
- This also plays around with group captures.
It increments the group id ($1 -> $2, etc.), which is overly simplistic and stupid.
- Utility routine to extract an integer value out of a P6C tree.
- P6C::rx_repeat(T) -> Regex::Ops::Tree::multi_match(T.min,
- Convert a metacharacter (backslashed character),
such as \s or \x34.
- Convert P6C::rx_any, which represents . in a regex and simply means to skip ahead one character.
- Convert a zero-width assertion.
I think the
$ implementation may be fine.
It does not implement anything else.
- Generate an inclusion/exclusion list out of a string representing a character class.
An inc/exc list L is a sequence of code points representing a character class,
which can also be thought of as a set of code points.
Each element marks a boundary at which membership flips: anything less than the first element L[0] is not in the set; anything greater than or equal to L[0] but less than L[1] is in the set; anything greater than or equal to L[1] but less than L[2] is not in the set; and so on. Examples:
() - the empty set
(0) - the universal set
(5) - anything 5 or greater
(2,4) - 2 or 3
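
A membership test for such a list can be sketched as below. This is a Python illustration, not the module's code, and the function name `in_class` is hypothetical. It assumes the half-open reading implied by the examples, where (2,4) contains 2 and 3 but not 4.

```python
from bisect import bisect_right


def in_class(incexc, code_point):
    """Return True if code_point belongs to the set described by incexc.

    incexc is a sorted sequence of boundaries. Membership starts out
    False and flips at each boundary, so a code point is in the set
    exactly when an odd number of boundaries are <= code_point.
    """
    return bisect_right(incexc, code_point) % 2 == 1


empty = ()
universal = (0,)
five_and_up = (5,)
two_or_three = (2, 4)
```

Because the list is sorted, the lookup is a binary search, so the cost grows with the logarithm of the number of boundaries rather than with the size of the character class.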
- FIXME: makes no attempt to handle Unicode.
- Generate an inclusion/exclusion list from a single code point. Unless it is negated, it is kind of silly to use this instead of a simple 'match' op.
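
As a sketch of the single-code-point case, assuming boundaries are half-open so that a class holding only code point c is written (c, c+1): negation then amounts to toggling a leading 0. The function names `negate` and `from_code_point` are hypothetical, and the code is Python rather than the module's Perl.

```python
def negate(incexc):
    """Complement an inclusion/exclusion list.

    Complementing flips membership everywhere, which is the same as
    adding a boundary at 0, or removing one that is already there.
    """
    if incexc and incexc[0] == 0:
        return incexc[1:]
    return [0] + incexc


def from_code_point(code_point, negated=False):
    # A single code point c covers the half-open range [c, c + 1).
    incexc = [code_point, code_point + 1]
    return negate(incexc) if negated else incexc


only_a = from_code_point(ord("a"))
not_a = from_code_point(ord("a"), negated=True)
```

Here `only_a` is `[97, 98]` and `not_a` is `[0, 97, 98]`, which reads as: in the set from 0, out of it at 97, back in from 98 onward.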
- Convert the P6C compiler's notion of a character class into languages/regex's.
- Placeholder for assertions.
$tree - A P6C::rx_call object representing a call to a nested rule within a regex match tree
- Match a literal value by breaking it up into individual characters. This seems pretty stupid as I document it, considering that I have code to match a whole string. Oh well; with optimization, it should boil down to pretty much the same thing. Maybe there is some brilliant reason why I chose to do it this way, but I doubt it; I probably just did this one first.
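
To illustrate the point about optimization, here is a small Python sketch (not the module's Perl) of the per-character expansion, together with a pass that folds adjacent single-character matches back into one string match. The function names and the tuple-based node format are assumptions for the example.

```python
def literal_to_seq(text):
    """Expand a literal into a sequence of one-character match nodes."""
    return ("seq", [("match", ch) for ch in text])


def collapse_matches(seq_node):
    """Merge runs of adjacent match nodes into a single string match."""
    kind, items = seq_node
    merged = []
    for item in items:
        if item[0] == "match" and merged and merged[-1][0] == "match":
            # Extend the previous match instead of emitting a new node.
            merged[-1] = ("match", merged[-1][1] + item[1])
        else:
            merged.append(item)
    return (kind, merged)


expanded = literal_to_seq("abc")
collapsed = collapse_matches(expanded)
```

After the collapsing pass, the expanded form is equivalent to matching the whole string at once, which is the sense in which the two approaches should come out the same.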