P6C::IMCC::ExtRegex::Adapter

Convert the native Perl 6 compiler's representation of a regex parse tree into the format used by languages/regex.

convert_p6tree

Convert a P6C parse tree to a regex tree. Main entry point. Calls convert() to do the dirty work.

convert

Recursively perform the conversion. For any node type TYPE, calls a method named convert_TYPE.
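The name-based dispatch described above can be sketched as follows. This is an illustrative Python sketch, not the module's Perl code: the Node class, the tuple output format, and the two handlers shown are assumptions made only for the example.

```python
class Node:
    """Hypothetical stand-in for a P6C parse-tree node."""
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)


class Adapter:
    def convert(self, node):
        # Look up a method named convert_<kind> and delegate to it.
        handler = getattr(self, "convert_" + node.kind, None)
        if handler is None:
            raise NotImplementedError("no converter for node type " + node.kind)
        return handler(node)

    def convert_rx_seq(self, node):
        # Recurse into each element of the sequence.
        return ("seq", [self.convert(child) for child in node.children])

    def convert_rx_any(self, node):
        # '.' advances past one character.
        return ("advance", 1)


tree = Node("rx_seq", [Node("rx_any"), Node("rx_any")])
result = Adapter().convert(tree)
```

Adding support for a new node type then only requires adding one more convert_TYPE method; convert itself does not change.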

convert_rule

Extract the parse tree from a P6C::rule and convert it, then wrap the result in a scan node unless the tree contains a ^. Probably buggy: a ^ can appear in a position where it does not anchor the entire expression (for example, inside only one branch of an alternation), and the scan node would still be omitted.
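To make the caveat concrete, here is a small Python sketch of the "wrap in scan unless a ^ appears anywhere" rule. The tuple-based tree shape and the node names bol, match, seq, alternate, and scan are assumptions for illustration only and are not taken from the module.

```python
def contains_anchor(tree):
    """True if a ('bol',) node appears anywhere in the tuple tree."""
    if tree[0] == "bol":
        return True
    return any(isinstance(child, tuple) and contains_anchor(child)
               for child in tree[1:])


def wrap_rule(tree):
    # Naive rule: any ^ anywhere suppresses the scan wrapper.
    return tree if contains_anchor(tree) else ("scan", tree)


unanchored = ("match", "a")
# Corresponds to something like  ^a | b : only the first branch is anchored.
partly_anchored = ("alternate",
                   ("seq", ("bol",), ("match", "a")),
                   ("match", "b"))

wrapped = wrap_rule(unanchored)
not_wrapped = wrap_rule(partly_anchored)
```

In the second case the b branch should still be allowed to match at any position, but the naive check drops the scan node for the whole rule.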

convert_rx_alt

P6C::rx_alt(T) -> Regex::Ops::Tree::alternate(branches of T)

convert_rx_seq

P6C::rx_seq(T) -> Regex::Ops::Tree::seq(elements of T)

convert_rx_atom

Convert P6C::rx_atom(T) depending on what kind of atom T is. A P6C::rx_val is converted directly; an ARRAY is treated as a code block; if T has a type field that is of type PerlArray, then it is matched as an array literal (as if it were an alternation of each of its elements); otherwise, it is assumed to be a string.

This also handles group captures by incrementing the group id ($1 -> $2, etc.), which is overly simplistic and stupid.

intvalue

Utility routine to extract an integer value out of a P6C tree.

convert_rx_repeat

P6C::rx_repeat(T) -> Regex::Ops::Tree::multi_match(T.min, T.max, T.greedyflag, T.expr)

convert_rx_meta

Convert a metacharacter (a backslashed character) such as \s or \x34. Fun stuff. Many metacharacters are not handled yet.

convert_rx_any

Convert P6C::rx_any, which represents . in a regex, which just means to skip ahead one character.

convert_rx_zerowidth

Convert a zero-width assertion. Currently mishandles ^ and $, although the $ implementation may actually be fine. No other assertions are implemented.

string_to_incexc

Generate an inclusion/exclusion list out of a string representing a character class. An inc/exc list L is a sorted sequence of code points representing a character class, which can also be thought of as a set of code points. Anything less than the first element L[0] is not in the set; anything greater than or equal to L[0] but less than L[1] is in the set; anything greater than or equal to L[1] but less than L[2] is not in the set, etc. If the list has an odd number of elements, everything from the last element upward is in the set.

Examples:

  ()    - the empty set
  (0)   - the universal set
  (5)   - anything 5 or greater
  (2,4) - 2 or 3

FIXME: makes no attempt to handle Unicode.
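The boundary rule can be checked with a short Python sketch. The function names in_incexc and chars_to_incexc are hypothetical and are not part of the module; chars_to_incexc takes a plain set of characters and does not parse range syntax such as a-z.

```python
from bisect import bisect_right


def in_incexc(incexc, code_point):
    # Count the boundaries at or below code_point; an odd count means
    # we are inside an "included" run.
    return bisect_right(incexc, code_point) % 2 == 1


def chars_to_incexc(chars):
    # Build a list from individual characters, merging adjacent code points.
    incexc = []
    for cp in sorted({ord(c) for c in chars}):
        if incexc and incexc[-1] == cp:
            incexc[-1] = cp + 1          # extend the current run
        else:
            incexc += [cp, cp + 1]       # start a new run [cp, cp + 1)
    return incexc


abc = chars_to_incexc("cab")
ac = chars_to_incexc("ac")
```

A binary search keeps the membership test logarithmic in the number of boundaries, which is one reason to prefer this representation over an explicit set of code points.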

ord_to_incexc

Generate an inclusion/exclusion list from a single code point. Unless it is negated, it is kind of silly to use this instead of a simple 'match' op.
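As a rough Python illustration, assuming the half-open boundary convention shown in the string_to_incexc examples: a single code point c becomes the list (c, c+1), and negation toggles the boundary at 0. The function name single_incexc and its negated parameter are hypothetical, not the module's signature.

```python
def single_incexc(code_point, negated=False):
    # The set {code_point} is the half-open run [code_point, code_point + 1).
    single = [code_point, code_point + 1]
    if not negated:
        return single
    # Complementing a list toggles the boundary at 0:
    # drop a leading 0 if present, otherwise prepend one.
    return single[1:] if single[0] == 0 else [0] + single


only_a = single_incexc(ord("a"))
not_a = single_incexc(ord("a"), negated=True)
not_nul = single_incexc(0, negated=True)
```

The negated form is where the list earns its keep: "everything except a" needs two included runs, which a single 'match' op cannot express.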

convert_rx_oneof

Convert the P6C compiler's notion of a character class into languages/regex's.

convert_rx_assertion

Placeholder for assertions.

convert_rx_call

Argument: $tree - A P6C::rx_call object representing a call to a nested rule within a regex match tree

convert_sv_literal

Match a literal value by breaking it up into individual characters. This seems pretty stupid at the time I'm documenting it, considering that I have code to match a whole string. Oh well; with optimization, it should boil down to pretty much the same thing. And maybe there's some brilliant reason why I chose to do it this way instead. (But I doubt it; I probably just did this one first.)
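The character-by-character approach amounts to something like the following Python sketch. The tuple node names seq and match, and the use of code points as match operands, are assumptions for illustration rather than the actual languages/regex tree classes.

```python
def literal_to_seq(text):
    # One single-character match op per character, in order.
    return ("seq", [("match", ord(ch)) for ch in text])


def seq_to_string(tree):
    # What an optimizer could recover: the original whole-string literal.
    kind, ops = tree
    return "".join(chr(code_point) for _, code_point in ops)


tree = literal_to_seq("ab")
recovered = seq_to_string(tree)
```

Because the sequence preserves order and content, collapsing it back into a single string match loses nothing, which is why the two approaches should be equivalent after optimization.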

