src/string.c - Parrot Strings
This file implements the non-ICU parts of the Parrot string subsystem.
Note that bufstart and buflen are used by the memory subsystem. The string functions may only use buflen to determine if there is some space left beyond bufused. This is the only valid usage of these two data members, besides setting bufstart/buflen for external strings.
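The rule about buflen can be pictured with a small stand-alone sketch. The struct and function names below are hypothetical stand-ins, not Parrot's actual STRING header, which has more members and different types.

```c
#include <stddef.h>

/* Hypothetical stand-in for the buffer fields named above. */
typedef struct {
    char  *bufstart; /* start of the allocated storage (memory subsystem) */
    size_t buflen;   /* bytes allocated (memory subsystem) */
    size_t bufused;  /* bytes occupied by string data */
} sketch_buffer;

/* The one permitted read of buflen: how many bytes remain past bufused. */
static size_t sketch_space_left(const sketch_buffer *b)
{
    return b->buflen > b->bufused ? b->buflen - b->bufused : 0;
}

/* Setting bufstart/buflen directly is reserved for external strings,
 * where the storage belongs to the caller rather than the allocator.
 * Marking the whole region as used is an assumption of this sketch. */
static void sketch_wrap_external(sketch_buffer *b, char *data, size_t len)
{
    b->bufstart = data;
    b->buflen   = len;
    b->bufused  = len;
}
```

A string function that wants to append in place would compare the number of bytes it needs against sketch_space_left() and grow the string otherwise.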
void Parrot_unmake_COW
STRING *Parrot_make_COW_reference
STRING *Parrot_reuse_COW_reference
STRING *string_set
void string_free
Creation, enlargement, etc.
void string_init
void string_deinit
UINTVAL string_capacity
STRING *string_make_empty
const CHARSET *string_rep_compatible
ascii <op> utf8 => utf8
                => ascii, if STRING *b has ascii chars only.
Returns NULL if no compatible string representation can be found.
STRING *string_concat
If either string is NULL, then a copy of the non-NULL string is returned. If both strings are NULL, then a new zero-length string is created and returned.
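The NULL handling described here can be sketched on plain C strings. This is only an illustration of the three cases; sketch_dup and sketch_concat are hypothetical names, and the real function operates on Parrot STRINGs with charset and encoding handling that is not modelled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of s. */
static char *sketch_dup(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);
    if (out == NULL) return NULL;
    memcpy(out, s, n + 1);
    return out;
}

/* Concatenates a and b into a new allocation, following the NULL rules. */
static char *sketch_concat(const char *a, const char *b)
{
    size_t la, lb;
    char *out;

    if (a == NULL && b == NULL) return sketch_dup("");  /* new zero-length string */
    if (a == NULL) return sketch_dup(b);                /* copy of the non-NULL one */
    if (b == NULL) return sketch_dup(a);

    la = strlen(a);
    lb = strlen(b);
    out = malloc(la + lb + 1);
    if (out == NULL) return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);                        /* includes the terminator */
    return out;
}
```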
STRING *string_append
STRING *string_from_cstring
const char *string_primary_encoding_for_representation
STRING *const_string
STRING *string_make
Creates a Parrot string from len bytes of string data read from buffer.
The value of charset_name specifies the string's representation. The currently recognised values are:
    'iso-8859-1'
    'ascii'
    'binary'
    'unicode'
The encoding is implicitly guessed; unicode implies the utf-8 encoding, and the other three assume fixed-8 encoding.
If charset_name is unspecified, the default charset 'ascii' will be used.
The value of flags is optionally one or more PObj_* flags OR-ed together.
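The implicit charset-to-encoding guess can be written as a small lookup. The function name is hypothetical, and returning NULL for an unrecognised charset is a choice made for this sketch; the text does not say how the real function reports that case.

```c
#include <stddef.h>
#include <string.h>

/* Returns the encoding implied by a charset name, or NULL if the name
 * is not one of the four recognised values. A NULL name falls back to
 * the default charset, 'ascii'. */
static const char *sketch_encoding_for_charset(const char *charset_name)
{
    if (charset_name == NULL)
        charset_name = "ascii";
    if (strcmp(charset_name, "unicode") == 0)
        return "utf-8";
    if (strcmp(charset_name, "iso-8859-1") == 0 ||
        strcmp(charset_name, "ascii") == 0 ||
        strcmp(charset_name, "binary") == 0)
        return "fixed-8";
    return NULL;
}
```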
STRING *string_make_direct
STRING *string_grow
UINTVAL string_length
INTVAL string_index
INTVAL string_str_index
Searches for the second Parrot string in the first, beginning at start. The return value is a (0 based) offset in characters, not bytes. If the second string is not found in the first string, returns -1.
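The difference between character and byte offsets only shows up with multi-byte encodings. The sketch below assumes NUL-terminated, valid UTF-8 input and uses hypothetical names; it is not how Parrot dispatches through its charset and encoding tables.

```c
#include <stddef.h>
#include <string.h>

/* Counts UTF-8 characters in s[0..nbytes) by counting the bytes that
 * are not continuation bytes (10xxxxxx). */
static long sketch_count_chars(const char *s, size_t nbytes)
{
    long n = 0;
    size_t i;
    for (i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Returns the 0-based character offset of s2 in s at or after the
 * character offset start, or -1 if it is not found. */
static long sketch_str_index(const char *s, const char *s2, long start)
{
    const char *p = s, *hit;
    long skipped = 0;

    if (start < 0) return -1;
    while (*p && skipped < start) {             /* skip `start` characters */
        p++;
        while (((unsigned char)*p & 0xC0) == 0x80)
            p++;
        skipped++;
    }
    if (skipped < start) return -1;             /* start is past the end */

    hit = strstr(p, s2);
    if (hit == NULL) return -1;
    return skipped + sketch_count_chars(p, (size_t)(hit - p));
}
```

For the UTF-8 bytes of "naïve café", searching for "caf" yields 6 even though the match begins at byte 7.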
INTVAL string_ord
STRING *string_chr
STRING *string_copy
INTVAL string_compute_strlen
INTVAL string_max_bytes
STRING *string_repeat
STRING *string_substr
Copies the substring of length length from offset from the specified Parrot string and stores it in **d, allocating memory if necessary. The substring is also returned.
STRING *string_replace
Replaces length characters from offset in the first Parrot string with the second Parrot string, returning what was replaced.
This follows the Perl semantics for:
    substr EXPR, OFFSET, LENGTH, REPLACEMENT
Replacing a sequence of characters with a longer string grows the string; a shorter string shrinks it.
Replacing 2 past the end of the string is undefined. However, replacing 1 past the end of the string concatenates the two strings.
A negative offset is allowed to replace from the end.
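The four-argument substr behaviour can be illustrated on a heap-allocated C string. This sketch assumes single-byte characters and uses a hypothetical name; returning NULL for an offset beyond the end is this sketch's way of handling the case the text leaves undefined.

```c
#include <stdlib.h>
#include <string.h>

/* Replaces `length` bytes at `offset` in *s with rep and returns a newly
 * allocated copy of the bytes that were replaced. *s must be a malloc'd
 * string; it is freed and reassigned. Returns NULL on an out-of-range
 * offset, a negative length, or allocation failure. */
static char *sketch_replace(char **s, long offset, long length, const char *rep)
{
    size_t slen = strlen(*s), rlen = strlen(rep), off, len;
    char *removed, *result;

    if (offset < 0) offset += (long)slen;        /* count from the end */
    if (offset < 0 || (size_t)offset > slen || length < 0) return NULL;
    off = (size_t)offset;
    len = (size_t)length;
    if (len > slen - off) len = slen - off;      /* clamp to the end */

    removed = malloc(len + 1);
    result  = malloc(slen - len + rlen + 1);
    if (removed == NULL || result == NULL) {
        free(removed);
        free(result);
        return NULL;
    }

    memcpy(removed, *s + off, len);
    removed[len] = '\0';

    memcpy(result, *s, off);                     /* head */
    memcpy(result + off, rep, rlen);             /* replacement */
    memcpy(result + off + rlen, *s + off + len,  /* tail plus terminator */
           slen - off - len + 1);

    free(*s);
    *s = result;
    return removed;
}
```

With an offset equal to the string length and a length of 0, the call appends rep, which matches the "1 past the end" concatenation case.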
STRING *string_chopn
Chops off the last n characters of the specified Parrot string. If n is negative, cuts the string after +n characters. The returned string is a copy of the one passed in.
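The two directions of n are easy to mix up, so here is a sketch on plain C strings. It assumes single-byte characters, the name is hypothetical, and the clamping at both ends (chopping more than the length gives an empty string, keeping more than the length gives the whole string) is an assumption of the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of s with the last n bytes removed when
 * n >= 0, or cut after the first |n| bytes when n < 0. */
static char *sketch_chopn(const char *s, long n)
{
    size_t len = strlen(s), keep;
    char *out;

    if (n >= 0) {
        keep = (size_t)n >= len ? 0 : len - (size_t)n;
    } else {
        size_t want = (size_t)(-(n + 1)) + 1;   /* |n|, safe for LONG_MIN */
        keep = want > len ? len : want;
    }

    out = malloc(keep + 1);
    if (out == NULL) return NULL;
    memcpy(out, s, keep);
    out[keep] = '\0';
    return out;
}
```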
void string_chopn_inplace
Chops off the last n characters of the specified Parrot string. If n is negative, cuts the string after +n characters. The string passed in is modified in place.
INTVAL string_compare
INTVAL string_equal
static void make_writable
Makes the specified Parrot string writable with a minimum length of len. The representation argument is required in case a new Parrot string has to be created.
STRING *string_bitwise_and
Performs a bitwise AND on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.
STRING *string_bitwise_or
Performs a bitwise OR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.
STRING *string_bitwise_xor
Performs a bitwise XOR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.
STRING *string_bitwise_not
Performs a bitwise NOT on a Parrot string. If the second string is not NULL, then it is reused. Otherwise a new Parrot string is created.
INTVAL string_bool
A string is true if it is equal to anything other than 0, "" or "0".
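Read as Perl-style truthiness, the rule is that only three values are false: a NULL string, the empty string, and the one-character string "0". The sketch below shows that reading on C strings; the function name is hypothetical and the interpretation of the bare 0 as a NULL string is an assumption.

```c
#include <stddef.h>

/* Returns 0 for NULL, "" and "0"; returns 1 for everything else. */
static int sketch_string_bool(const char *s)
{
    if (s == NULL) return 0;                      /* no string at all */
    if (s[0] == '\0') return 0;                   /* "" */
    if (s[0] == '0' && s[1] == '\0') return 0;    /* "0" */
    return 1;
}
```

Under this reading, strings such as "00", "0.0" and " " are all true, because only the exact string "0" is singled out.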
STRING *string_printf
INTVAL string_to_int
    sign           = '+' | '-'
    digit          = "Any code point considered a digit by the chartype"
    indicator      = 'e' | 'E'
    digits         = digit [digit]...
    decimal-part   = digits '.' [digits] | ['.'] digits
    exponent-part  = indicator [sign] digits
    numeric-string = [sign] decimal-part [exponent-part]
The integer value is the appropriate integer representation of such a number, rounding towards zero.
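The grammar can be turned into a small recogniser followed by a truncating conversion. The sketch below makes several assumptions that the text does not confirm: only ASCII digits count as digits, the grammar is matched against a prefix of the input, and input with no numeric prefix converts to 0. The function names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns the length of the longest prefix of s that matches
 * numeric-string, or 0 if there is none. */
static size_t sketch_numeric_prefix(const char *s)
{
    size_t i = 0, int_digits = 0, frac_digits = 0;

    if (s[i] == '+' || s[i] == '-') i++;                 /* [sign] */
    while (isdigit((unsigned char)s[i])) { i++; int_digits++; }
    if (s[i] == '.') {                                   /* decimal-part */
        size_t j = i + 1;
        while (isdigit((unsigned char)s[j])) { j++; frac_digits++; }
        if (int_digits || frac_digits) i = j;
    }
    if (int_digits + frac_digits == 0) return 0;         /* no digits at all */

    if (s[i] == 'e' || s[i] == 'E') {                    /* [exponent-part] */
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        if (isdigit((unsigned char)s[j])) {
            while (isdigit((unsigned char)s[j])) j++;
            i = j;
        }
    }
    return i;
}

/* Converts the numeric prefix of s to an integer, rounding toward zero. */
static long sketch_to_int(const char *s)
{
    char buf[64];
    size_t n = sketch_numeric_prefix(s);

    if (n == 0 || n >= sizeof buf) return 0;
    memcpy(buf, s, n);
    buf[n] = '\0';
    return (long)strtod(buf, NULL);   /* double-to-long conversion truncates */
}
```

Values outside the range of long are not handled by this sketch.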
FLOATVAL string_to_num
STRING *string_from_int
STRING *string_from_num
char *string_to_cstring
Use string_cstring_free() to free the string. Failure to do this will result in a memory leak.
char *string_to_cstring_nullable
Use string_cstring_free() to free the string. Failure to do this will result in a memory leak.
void string_cstring_free
Frees a string created by string_to_cstring().
TODO - Hopefully this can go away at some point, as it's got all sorts of leak potential otherwise.
void string_pin
void string_unpin
Undoes a string_pin() so that the string once again uses managed memory.
size_t string_hash
Returns the hash value for the specified Parrot string, caching it in s->hashval.
STRING *string_escape_string
Control chars that string_unescape_cstring can handle are escaped as \x, as well as a double quote character. Other control chars and codepoints < 0x100 are escaped as \xhh, codepoints up to 0xffff as \uhhhh, and codepoints greater than this as \x{hh...hh}.
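The tiers of escaping can be sketched over an array of code points. The names are hypothetical, the set of single-letter escapes is taken from the string_unescape_cstring list (\a, \b, \t, \n, \v, \f, \r, \e), and escaping the backslash itself is an assumption added so the output stays unambiguous.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the escaped form of cp[0..n) into out (capacity cap, including
 * the terminator). Returns the number of chars written, or -1 if out is
 * too small. */
static int sketch_escape(const uint32_t *cp, size_t n, char *out, size_t cap)
{
    size_t used = 0, i;

    if (cap == 0) return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        uint32_t c = cp[i];
        char letter = 0;
        int w;

        switch (c) {                     /* single-character escapes */
            case 7:    letter = 'a';  break;
            case 8:    letter = 'b';  break;
            case 9:    letter = 't';  break;
            case 10:   letter = 'n';  break;
            case 11:   letter = 'v';  break;
            case 12:   letter = 'f';  break;
            case 13:   letter = 'r';  break;
            case 27:   letter = 'e';  break;
            case '"':  letter = '"';  break;
            case '\\': letter = '\\'; break;   /* assumption, see lead-in */
            default:   break;
        }

        if (letter)
            w = snprintf(out + used, cap - used, "\\%c", letter);
        else if (c >= 0x20 && c < 0x7f)        /* printable ASCII, unchanged */
            w = snprintf(out + used, cap - used, "%c", (char)c);
        else if (c < 0x100)
            w = snprintf(out + used, cap - used, "\\x%02x", (unsigned)c);
        else if (c <= 0xffff)
            w = snprintf(out + used, cap - used, "\\u%04x", (unsigned)c);
        else
            w = snprintf(out + used, cap - used, "\\x{%x}", (unsigned)c);

        if (w < 0 || (size_t)w >= cap - used) return -1;   /* truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```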
STRING *string_escape_string_delimited
STRING *string_unescape_cstring
    \xhh        1..2 hex digits
    \ooo        1..3 oct digits
    \cX         control char X
    \x{h..h}    1..8 hex digits
    \uhhhh      4 hex digits
    \Uhhhhhhhh  8 hex digits
    \a, \b, \t, \n, \v, \f, \r, \e
STRING *string_upcase
void string_upcase_inplace
STRING *string_downcase
void string_downcase_inplace
STRING *string_titlecase
void string_titlecase_inplace
STRING *string_increment
const char *Parrot_string_cstring
INTVAL Parrot_string_is_cclass
Returns 1 if the codepoint of string s at the given offset is in the given character class flags. See also include/parrot/cclass.h for possible character classes. Returns 0 otherwise, or if the string is empty or NULL.
INTVAL Parrot_string_find_cclass
Finds the first occurrence of the given character class in flags in the string, and returns its glyph-wise index.
INTVAL Parrot_string_find_not_cclass
Finds the first occurrence of a character not in the given character class in flags in the string, starting from offset and looking at count positions, and returns its glyph-wise index. Returns offset + count if not found.
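The "offset + count when not found" convention is worth seeing in code, because it lets callers treat the result as the end of a run. The flag values and function names below are hypothetical (the real classes live in include/parrot/cclass.h), and the sketch handles ASCII only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class flags for this sketch. */
enum {
    SKETCH_CCLASS_WHITESPACE = 1,
    SKETCH_CCLASS_NUMERIC    = 2
};

/* Returns 1 if c belongs to any of the classes in flags. */
static int sketch_is_cclass(int flags, unsigned char c)
{
    if ((flags & SKETCH_CCLASS_WHITESPACE) && isspace(c)) return 1;
    if ((flags & SKETCH_CCLASS_NUMERIC) && isdigit(c)) return 1;
    return 0;
}

/* Returns the index of the first character in s[offset, offset + count)
 * that is not in flags, or offset + count if every position matches. */
static size_t sketch_find_not_cclass(int flags, const char *s,
                                     size_t offset, size_t count)
{
    size_t len = strlen(s), end = offset + count, i;

    if (end > len) end = len;            /* never read past the string */
    for (i = offset; i < end; i++)
        if (!sketch_is_cclass(flags, (unsigned char)s[i]))
            return i;
    return offset + count;
}
```

Skipping leading whitespace then becomes a single call: the return value is the index of the first non-space character, or the end of the scanned range.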
STRING *Parrot_string_trans_charset
If dest == NULL, converts src to the given charset or encoding in place. Otherwise returns a copy of src with the charset/encoding in dest.
STRING *Parrot_string_trans_encoding
If dest == NULL, converts src to the given charset or encoding in place. Otherwise returns a copy of src with the charset/encoding in dest.
STRING *string_compose
STRING *string_join
Joins the elements of the array ar as strings with the string j between them, returning the result.
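A join over an array of C strings shows the shape of the operation: the separator goes between elements, never before the first or after the last. The name and signature are hypothetical; the real function takes a Parrot array PMC and stringifies each element.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd string containing ar[0..n) joined by j. */
static char *sketch_join(const char *j, const char **ar, size_t n)
{
    size_t jl = strlen(j), total = 0, i;
    char *out, *p;

    for (i = 0; i < n; i++)
        total += strlen(ar[i]);
    if (n > 1)
        total += jl * (n - 1);           /* one separator per gap */

    out = malloc(total + 1);
    if (out == NULL) return NULL;

    p = out;
    for (i = 0; i < n; i++) {
        size_t l = strlen(ar[i]);
        if (i > 0) {
            memcpy(p, j, jl);
            p += jl;
        }
        memcpy(p, ar[i], l);
        p += l;
    }
    *p = '\0';
    return out;
}
```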
PMC *string_split
Splits the string str at the delimiter delim, returning a ResizableStringArray of results.
PMC *Parrot_string_split
STRING *uint_to_str
Returns num converted to a Parrot STRING.
Note that base must be defined (a default of 10 is not assumed). The caller has to verify that base >= 2 && base <= 36.
The buffer tc must be at least sizeof (UHUGEINTVAL)*8 + 1 chars big.
If minus is true, then - is prepended to the string representation.
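The buffer size requirement follows from the worst case: base 2 needs one char per bit, plus one for the sign. The sketch below fills the buffer from the right and reports where the text starts. It uses uint64_t as a stand-in for UHUGEINTVAL, lowercase letters for digits above 9, and a start/length result instead of a Parrot STRING; all of these are assumptions, as are the names.

```c
#include <stddef.h>
#include <stdint.h>

/* One char per bit in base 2, plus one for the optional '-'. */
#define SKETCH_TC_SIZE (sizeof(uint64_t) * 8 + 1)

/* Writes num in the given base into the tail of tc (which must hold
 * SKETCH_TC_SIZE chars), stores the first char's address in *start and
 * returns the number of chars written. The text is not NUL-terminated.
 * The caller must already have checked 2 <= base <= 36. */
static size_t sketch_uint_to_str(char *tc, uint64_t num, unsigned base,
                                 int minus, const char **start)
{
    char *end = tc + SKETCH_TC_SIZE;
    char *p = end;

    do {                                         /* least significant first */
        unsigned d = (unsigned)(num % base);
        *--p = (char)(d < 10 ? '0' + d : 'a' + (d - 10));
        num /= base;
    } while (num != 0);

    if (minus)
        *--p = '-';

    *start = p;
    return (size_t)(end - p);
}
```

A signed wrapper in the style of int_to_str would pass the magnitude of num and set minus when num < 0.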
STRING *int_to_str
Returns num converted to a Parrot STRING.
Note that base must be defined (a default of 10 is not assumed).
If num < 0, then - is prepended to the string representation.