parrotcode: Parrot Strings
src/string.c - Parrot Strings
This file implements the non-ICU parts of the Parrot string subsystem.
Note that bufstart and buflen are used by the memory subsystem. The string functions may only use buflen to determine whether there is some space left beyond bufused. This is the only valid use of these two data members, besides setting bufstart/buflen for external strings.
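Put differently, the only question a string function may ask of buflen is "how much room is left after bufused?". The following is a minimal sketch of that check. It assumes a hypothetical demo_string struct that carries just the three fields named above; it is not Parrot's actual string header, and the field types are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified header holding only the three fields named in
 * the text; Parrot's real string header has more members. */
typedef struct {
    char   *bufstart; /* start of the buffer; managed by the memory subsystem */
    size_t  buflen;   /* allocated size in bytes; managed by the memory subsystem */
    size_t  bufused;  /* bytes currently occupied by string data */
} demo_string;

/* Bytes still free beyond bufused. buflen is only read here, never written. */
size_t demo_spare_bytes(const demo_string *s)
{
    assert(s->bufused <= s->buflen);
    return s->buflen - s->bufused;
}

/* Nonzero if `extra` more bytes fit into the existing buffer. */
int demo_has_room(const demo_string *s, size_t extra)
{
    return extra <= demo_spare_bytes(s);
}
```

Anything beyond this read-only check, such as changing bufstart or buflen, is left to the memory subsystem.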
Parrot_unmake_COW
Parrot_make_COW_reference
Parrot_reuse_COW_reference
string_set
Creation, enlargement, etc.
string_init
string_deinit
string_capacity
string_make_empty
string_rep_compatible
Chooses a representation compatible with both strings, for example:
  ascii <op> utf8 => utf8
                  => ascii, if STRING *b has ascii chars only.
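The ascii/utf8 rule above can be read as a small decision function. This is a hypothetical sketch, not Parrot's string_rep_compatible. It assumes a two-value demo_rep enum and passes b as a raw byte buffer. The result for a utf8 a combined with an ascii b is this sketch's own assumption, because the text only covers ascii <op> utf8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation tags; Parrot uses charset/encoding objects. */
typedef enum { DEMO_ASCII, DEMO_UTF8 } demo_rep;

/* Nonzero if every byte is 7-bit ASCII. */
int demo_is_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] > 0x7F)
            return 0;
    return 1;
}

/* ascii <op> utf8 => utf8, or ascii if b holds ASCII characters only. */
demo_rep demo_rep_compatible(demo_rep a_rep, demo_rep b_rep,
                             const unsigned char *b, size_t b_len)
{
    if (a_rep == b_rep)
        return a_rep;
    if (a_rep == DEMO_ASCII && b_rep == DEMO_UTF8)
        return demo_is_ascii(b, b_len) ? DEMO_ASCII : DEMO_UTF8;
    /* Assumption: a is utf8 and b is ascii, and utf8 can already hold b. */
    return DEMO_UTF8;
}
```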
string_append
string_from_cstring
string_primary_encoding_for_representation
const_string
string_make
Creates and returns a new Parrot string using len bytes of string data read from buffer. The value of charset_name specifies the string's representation. The currently recognised values are: 'iso-8859-1', 'ascii', 'binary' and 'unicode'. The value 'unicode' implies the utf-8 encoding; the other three assume the fixed-8 encoding. If charset_name is unspecified, the default charset 'ascii' is used. The flags argument is optionally one or more PObj_* flags OR-ed together.
string_grow
string_length
string_index
string_str_index
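Returns the character position of the second Parrot string within the first, searching at or after start. The return value is a 0-based offset in characters, not bytes. If the second string is not specified, -1 is returned.
string_ord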
string_chr
string_copy
string_compute_strlen
string_max_bytes
string_concat
Concatenates two Parrot strings. If either string is NULL, then a copy of the non-NULL string is returned. If both strings are NULL, then a new zero-length string is created and returned.
string_repeat
string_substr
Copies the substring of the specified Parrot string that starts at offset and is length long, stores it in **d (allocating memory if necessary), and also returns it.
string_replace
Follows the Perl semantics of:
    substr EXPR, OFFSET, LENGTH, REPLACEMENT
Replaces length characters, starting at offset, in the first Parrot string with the second Parrot string, and returns what was replaced.
string_chopn
Chops off the last n characters of the specified Parrot string. If n is negative, the string is instead cut after -n characters, so only its first -n characters are kept. The result is returned as a copy; the string passed in is not modified.
string_chopn_inplace
Chops off the last n characters of the specified Parrot string. If n is negative, the string is instead cut after -n characters, so only its first -n characters are kept. The string passed in is modified and returned.
string_equal
make_writable
Makes the specified Parrot string writable, with a minimum length of len. The representation argument is required in case a new Parrot string has to be created.
nonnull_encoding_name(STRING *s)
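Returns the name of the encoding of s while tolerating a NULL s. This is needed because the string that real_exception uses to print the exception message could potentially be null.
string_bitwise_and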
Performs a bitwise AND on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused; otherwise a new Parrot string is created.
string_bitwise_or
Performs a bitwise OR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused; otherwise a new Parrot string is created.
string_bitwise_xor
Performs a bitwise XOR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused; otherwise a new Parrot string is created.
string_bitwise_not
Performs a bitwise NOT on a Parrot string. If the second string is not NULL, then it is reused; otherwise a new Parrot string is created.
string_bool
A string is true if it is anything other than 0, "" or "0".
string_nprintf
Like Parrot_snprintf(), except that it writes to and returns a Parrot string. Note that bytelen does not include space for a (non-existent) trailing '\0'. The dest argument may be a NULL pointer, in which case a new native string will be created. If bytelen is 0, the behaviour is more like sprintf than snprintf. The bytelen limit is measured in the encoding of *dest.
string_printf
string_to_int
Converts a numeric Parrot string to an integer value. A numeric string has the following form:
  sign           = '+' | '-'
  digit          = "Any code point considered a digit by the chartype"
  indicator      = 'e' | 'E'
  digits         = digit [digit]...
  decimal-part   = digits '.' [digits] | ['.'] digits
  exponent-part  = indicator [sign] digits
  numeric-string = [sign] decimal-part [exponent-part]
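The grammar above can be checked by a small hand-written recognizer that follows the productions in order. This is a sketch, not Parrot's parser. It assumes NUL-terminated C strings and treats only ASCII '0' to '9' as digits, whereas the grammar admits any code point the chartype considers a digit. It only validates the text and does not compute a value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skips a run of digits at *p and returns how many were skipped.
 * Assumption: only ASCII '0'-'9' count as digits in this sketch. */
static size_t demo_skip_digits(const char **p)
{
    size_t n = 0;
    while (isdigit((unsigned char)**p)) {
        (*p)++;
        n++;
    }
    return n;
}

/* Returns 1 if all of s matches numeric-string, 0 otherwise. */
int demo_is_numeric_string(const char *s)
{
    const char *p = s;

    if (*p == '+' || *p == '-')          /* [sign] */
        p++;

    /* decimal-part = digits '.' [digits] | ['.'] digits */
    size_t int_digits = demo_skip_digits(&p);
    size_t frac_digits = 0;
    if (*p == '.') {
        p++;
        frac_digits = demo_skip_digits(&p);
    }
    if (int_digits == 0 && frac_digits == 0)
        return 0;                        /* "." or "" alone is not a number */

    /* [exponent-part] = indicator [sign] digits */
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (demo_skip_digits(&p) == 0)
            return 0;
    }

    return *p == '\0';                   /* nothing may follow */
}
```

For example, "-3.5e+10", ".5" and "7." are accepted, while ".", "1e" and "abc" are rejected.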
string_to_num
Same as string_to_int(), except that a floating-point value is returned.
string_from_int
string_from_num
string_to_cstring
Returns a C string for the specified Parrot string. Use string_cstring_free() to free the string; failing to do so results in a memory leak.
string_cstring_free
Frees a C string created by string_to_cstring().
string_pin
string_unpin
Undoes a string_pin() so that the string once again uses managed memory.
string_hash
Returns the hash value for the specified Parrot string, storing it in s->hashval.
string_escape_string
Control characters that string_unescape_cstring can handle, as well as the double quote character, are escaped as \x, that is, a backslash followed by a single character (for example \n). Other control characters and code points below 0x100 are escaped as \xhh, code points up to 0xffff as \uhhhh, and code points above that as \x{hh...hh}.
string_escape_string_delimited
string_unescape_cstring
Unescapes the specified C string. The following escape sequences are recognized:
\xhh 1..2 hex digits
\ooo 1..3 oct digits
\cX control char X
\x{h..h} 1..8 hex digits
\uhhhh 4 hex digits
\Uhhhhhhhh 8 hex digits
\a, \b, \t, \n, \v, \f, \r, \e
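The following is a reduced sketch of a decoding loop for part of the table above. It is hypothetical: demo_unescape is not Parrot's function, and it decodes to single bytes only. It handles \xhh, \ooo and the single-letter escapes, and omits \cX, \x{h..h}, \uhhhh and \Uhhhhhhhh. The handling of unknown escapes and of a bare \x is this sketch's own choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int demo_hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decodes in into out (which must hold strlen(in) + 1 bytes) and returns
 * the number of bytes written, excluding the terminating NUL. */
size_t demo_unescape(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (*in != '\\' || in[1] == '\0') {
            out[n++] = *in++;          /* ordinary char, or a trailing lone backslash */
            continue;
        }
        in++;                          /* skip the backslash */
        int c = (unsigned char)*in++;  /* the character after it */

        switch (c) {
        case 'a': out[n++] = '\a'; break;
        case 'b': out[n++] = '\b'; break;
        case 't': out[n++] = '\t'; break;
        case 'n': out[n++] = '\n'; break;
        case 'v': out[n++] = '\v'; break;
        case 'f': out[n++] = '\f'; break;
        case 'r': out[n++] = '\r'; break;
        case 'e': out[n++] = 0x1B; break;    /* ESC */
        case 'x': {                          /* \xhh: 1 or 2 hex digits */
            int v = 0, digits = 0, h;
            while (digits < 2 && (h = demo_hexval((unsigned char)*in)) >= 0) {
                v = v * 16 + h;
                in++;
                digits++;
            }
            out[n++] = digits ? (char)v : 'x';  /* bare \x keeps 'x' */
            break;
        }
        default:
            if (c >= '0' && c <= '7') {      /* \ooo: 1 to 3 octal digits */
                int v = c - '0', digits = 1;
                while (digits < 3 && *in >= '0' && *in <= '7') {
                    v = v * 8 + (*in++ - '0');
                    digits++;
                }
                out[n++] = (char)(unsigned char)v;  /* values above 0377 wrap */
            } else {
                out[n++] = (char)c;          /* unknown escape: keep the char */
            }
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

Because every escape consumes at least two input characters and emits one byte, the output never grows beyond the input, which is why strlen(in) + 1 bytes suffice.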
string_upcase
string_upcase_inplace
string_downcase
string_downcase_inplace
string_titlecase
string_titlecase_inplace
string_increment
Parrot_string_cstring
Parrot_string_is_cclass
Returns a nonzero value if the code point of string s at the given offset is in the given character class flags. See include/parrot/cclass.h for the possible character classes. Returns 0 otherwise, or if the string is empty or NULL.
Parrot_string_trans_charset
If dest is NULL, converts src to the given charset or encoding in place; otherwise, returns a copy of src with the charset/encoding in dest.
Parrot_string_trans_encoding
If dest is NULL, converts src to the given charset or encoding in place; otherwise, returns a copy of src with the charset/encoding in dest.
uint_to_str
Returns num converted to a Parrot STRING. Note that base must be given explicitly; a default of 10 is not assumed. The caller has to verify that base >= 2 && base <= 36. The buffer tc must be at least sizeof (UHUGEINTVAL)*8 + 1 chars big. If minus is true, then - is prepended to the string representation.
int_to_str
Returns num converted to a Parrot STRING. Note that base must be given explicitly; a default of 10 is not assumed. If num < 0, then - is prepended to the string representation.
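The uint_to_str and int_to_str contracts above can be illustrated with a buffer filled from its end. This is a hedged sketch and not Parrot's code. The demo_uhuge type stands in for UHUGEINTVAL (assumed here to be the widest unsigned type), the lowercase 0-9a-z digit alphabet is an assumption, and demo_int_to_str delegating to demo_uint_to_str is this sketch's design. The buffer size sizeof (UHUGEINTVAL)*8 + 1 presumably covers the base-2 worst case of one digit per bit plus one char for the '-'. The sketch writes no trailing NUL and returns a pointer and a length instead.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for UHUGEINTVAL (assumption). */
typedef unsigned long long demo_uhuge;

/* Base 2 worst case: one digit per bit, plus one char for '-'. */
#define DEMO_TC_SIZE (sizeof(demo_uhuge) * CHAR_BIT + 1)

/* Writes num in the given base into the END of tc (DEMO_TC_SIZE chars),
 * prepending '-' if minus is nonzero. Returns the first char, length in *len. */
const char *demo_uint_to_str(char *tc, demo_uhuge num, int base, int minus,
                             size_t *len)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char *p = tc + DEMO_TC_SIZE;

    /* Per the text, verifying the base is the caller's job; asserted here. */
    assert(base >= 2 && base <= 36);

    do {
        *--p = digits[num % (demo_uhuge)base];
        num /= (demo_uhuge)base;
    } while (num != 0);

    if (minus)
        *--p = '-';

    *len = (size_t)(tc + DEMO_TC_SIZE - p);
    return p;
}

/* Signed variant: negates in unsigned arithmetic so LLONG_MIN is safe. */
const char *demo_int_to_str(char *tc, long long num, int base, size_t *len)
{
    demo_uhuge mag = num < 0 ? 0 - (demo_uhuge)num : (demo_uhuge)num;
    return demo_uint_to_str(tc, mag, base, num < 0, len);
}
```

Filling the buffer from its end avoids a separate reversal pass. A caller would then build the string from the returned pointer and length.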