Convert the native Perl 6 compiler's (P6C) representation of a regex parse tree into the format used by languages/regex.
- convert_p6tree
- Convert a P6C parse tree to a regex tree. Main entry point; calls convert() to do the dirty work.
- convert
- Recursively perform the conversion.
For any node type TYPE, calls a method named convert_TYPE.
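A minimal sketch of this name-based dispatch, written in Python rather than the module's Perl. The node classes `rx_seq` and `rx_lit` and the tuple-shaped output nodes are hypothetical stand-ins for the real P6C and Regex::Ops::Tree objects.

```python
from dataclasses import dataclass


@dataclass
class rx_seq:  # hypothetical stand-in for P6C::rx_seq
    items: list


@dataclass
class rx_lit:  # hypothetical single-literal node
    text: str


class Converter:
    def convert(self, node):
        # Build the handler name from the node's type, e.g. "convert_rx_seq".
        name = "convert_" + type(node).__name__
        handler = getattr(self, name, None)
        if handler is None:
            raise NotImplementedError(name)
        return handler(node)

    def convert_rx_seq(self, node):
        # Recurse into each element; convert() picks the right handler.
        return ("seq", [self.convert(item) for item in node.items])

    def convert_rx_lit(self, node):
        return ("match", node.text)
```

With this layout, supporting a new node type means adding one `convert_TYPE` method, and `Converter().convert(rx_seq([rx_lit("a"), rx_lit("b")]))` yields `("seq", [("match", "a"), ("match", "b")])`.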
- convert_rule
- Grab the parse tree out of a P6C::rule and convert it, then wrap it in a scan node unless there was a ^ anywhere within it.
Probably buggy, since a ^ does not always apply to the entire expression (in ^a|b, for instance, it anchors only the first alternative).
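The wrapping heuristic, and the case where it goes wrong, can be sketched as follows. This is Python rather than the module's Perl; the tuple node shapes (`"seq"`, `"alt"`, `"match"`, `"bol"` for ^, `"scan"`) are hypothetical, and "scan" is assumed to mean "retry the match at each start position".

```python
def contains_bol(node):
    """True if a start-of-string anchor (^) appears anywhere in the tree."""
    op = node[0]
    if op == "bol":
        return True
    if op in ("seq", "alt"):
        return any(contains_bol(child) for child in node[1])
    return False


def wrap_rule(tree):
    # Assumed semantics: "scan" retries the match at every start position.
    # Any ^ anywhere in the tree suppresses the wrapping.
    return tree if contains_bol(tree) else ("scan", tree)


# ^a|b: the ^ only anchors the first branch, but the whole rule is left
# unwrapped, so the "b" branch is never tried past the first position.
anchored_first_branch = ("alt", [("seq", [("bol",), ("match", "a")]),
                                 ("match", "b")])
```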
- convert_rx_alt
- P6C::rx_alt(T) -> Regex::Ops::Tree::alternate(branches of T)
- convert_rx_seq
- P6C::rx_seq(T) -> Regex::Ops::Tree::seq(elements of T)
- convert_rx_atom
- Convert P6C::rx_atom(T) depending on what kind of atom T is.
A P6C::rx_val is converted directly; an ARRAY is treated as a code block; if T has a type field of type PerlArray, it is matched as an array literal (as if it were an alternation of its elements); otherwise, it is assumed to be a string.
- This also plays around with group captures: it increments the group id ($1 -> $2, etc.), which is overly simplistic and stupid.
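The PerlArray and string cases can be sketched as follows. This is Python rather than the module's Perl; `convert_atom_value`, its `declared_type` argument (standing in for T's type field), and the tuple output nodes are hypothetical, and the rx_val and code-block cases are left out.

```python
def convert_atom_value(value, declared_type=None):
    """Hypothetical sketch of two of convert_rx_atom's cases."""
    if declared_type == "PerlArray":
        # An array matches as if it were an alternation of its elements.
        return ("alternate", [("literal", str(v)) for v in value])
    # Anything else is matched as a literal string.
    return ("literal", str(value))
```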
- intvalue
- Utility routine to extract an integer value out of a P6C tree.
- convert_rx_repeat
- P6C::rx_repeat(T) -> Regex::Ops::Tree::multi_match(T.min, T.max, T.greedyflag, T.expr)
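To show what the min, max, and greedy flag control, here is a simplified backtracking matcher in Python. It is not the Regex::Ops::Tree implementation. It assumes the repeated expression matches exactly one character (a predicate `atom`), that `hi=None` means unbounded, and that the rest of the pattern is passed in as a continuation `rest` returning an end position or None.

```python
from typing import Callable, Optional


def multi_match(lo: int, hi: Optional[int], greedy: bool,
                atom: Callable[[str], bool], text: str, pos: int,
                rest: Callable[[int], Optional[int]]) -> Optional[int]:
    # Count how many consecutive characters the atom accepts, capped at hi.
    limit = len(text) - pos if hi is None else min(hi, len(text) - pos)
    n = 0
    while n < limit and atom(text[pos + n]):
        n += 1
    if n < lo:
        return None
    # Greedy tries the most repetitions first; non-greedy tries the fewest.
    counts = range(n, lo - 1, -1) if greedy else range(lo, n + 1)
    for k in counts:
        end = rest(pos + k)  # backtrack into fewer/more repetitions on failure
        if end is not None:
            return end
    return None
```

With an accept-anything continuation (`lambda p: p`), `a+` on "aaab" ends at 3 when greedy and at 1 when non-greedy.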
- convert_rx_meta
- Convert a metacharacter (backslashed character), such as \s or \x34.
Fun stuff. Lots is still missing.
- convert_rx_any
- Convert P6C::rx_any, which represents . in a regex and just means to skip ahead one character.
- convert_rx_any
- Convert a zero-width assertion. Currently mishandles ^ and $ (actually, I think the $ implementation may be fine). Doesn't implement any other assertions.
- string_to_incexc
- Generate an inclusion/exclusion list out of a string representing a character class.
An inc/exc list L is a sorted sequence of code points representing a character class, which can also be thought of as a set of code points.
Anything less than the first element L[0] is not in the set; anything greater than or equal to L[0] but less than L[1] is in the set; anything greater than or equal to L[1] but less than L[2] is not in the set, and so on, alternating.
- Examples:
() - the empty set
(0) - the universal set
(5) - anything 5 or greater
(2,4) - 2 or 3
- FIXME: makes no attempt to handle Unicode.
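A sketch of the membership rule and of building a list from a simple class string, in Python rather than the module's Perl. A code point is in the set exactly when an odd number of boundaries are less than or equal to it, which matches the examples above. The parser is a hypothetical simplification: it understands only single characters and x-y ranges, with no escapes, negation, or Unicode handling.

```python
from bisect import bisect_right


def in_incexc(lst, cp):
    # Odd number of boundaries <= cp means we are inside an "include" run.
    return bisect_right(lst, cp) % 2 == 1


def string_to_incexc(spec):
    """Hypothetical sketch: "a-cx" -> [97, 100, 120, 121]."""
    intervals = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            lo, hi = ord(spec[i]), ord(spec[i + 2])
            i += 3
        else:
            lo = hi = ord(spec[i])
            i += 1
        intervals.append((lo, hi + 1))  # half-open interval [lo, hi + 1)
    intervals.sort()
    result = []
    for lo, hi in intervals:
        if result and lo <= result[-1]:
            result[-1] = max(result[-1], hi)  # merge overlapping or adjacent runs
        else:
            result.extend([lo, hi])
    return result
```

Merging adjacent runs keeps the list minimal, so "ab" becomes [97, 99] rather than [97, 98, 98, 99].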
- ord_to_incexc
- Generate an inclusion/exclusion list from a single code point. Unless it is negated, it is kind of silly to use this instead of a simple 'match' op.
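Under the half-open convention of the examples above, a single code point and its negation can be encoded as follows. This Python sketch is illustrative only; the `negated` parameter is a hypothetical stand-in for however the module signals negation.

```python
def ord_to_incexc(cp, negated=False):
    """Hypothetical sketch: inc/exc list for one code point or its complement."""
    if not negated:
        return [cp, cp + 1]  # exactly cp
    if cp == 0:
        return [1]  # everything from 1 upward
    return [0, cp, cp + 1]  # everything except cp
```

The negated form is where this earns its keep: [0, 5, 6] means "anything but 5", which a plain 'match' op cannot express.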
- convert_rx_oneof
- Convert the P6C compiler's notion of a character class into languages/regex's.
- convert_rx_assertion
- Placeholder for assertions.
- convert_rx_call
- Argument: $tree, a P6C::rx_call object representing a call to a nested rule within a regex match tree.
- convert_sv_literal
- Match a literal value by breaking it up into individual characters. This seems pretty stupid as I document it, considering that I have code to match a whole string. Oh well; with optimization, it should boil down to pretty much the same thing. And maybe there's some brilliant reason why I chose to do it this way instead. (But I doubt it; I probably just did this one first.)
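Why the two approaches should end up equivalent can be seen in a tiny Python sketch. Here a literal becomes a sequence of one-character matches, and a toy evaluator runs it side by side with a hypothetical whole-string op (`"match_string"`). The tuple node shapes and the evaluator are assumptions, not the module's code.

```python
def convert_sv_literal(text):
    # One single-character match per character, wrapped in a sequence.
    return ("seq", [("match", ch) for ch in text])


def run(node, s, pos):
    """Return the position after matching node at pos, or None on failure."""
    op = node[0]
    if op in ("match", "match_string"):
        return pos + len(node[1]) if s.startswith(node[1], pos) else None
    if op == "seq":
        for child in node[1]:
            pos = run(child, s, pos)
            if pos is None:
                return None
        return pos
    raise ValueError("unknown op: " + op)
```

On "xabcx" at position 1, both `convert_sv_literal("abc")` and `("match_string", "abc")` end at position 4.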