NAME
src/string/api.c - Parrot Strings
DESCRIPTION
This file implements the non-ICU parts of the Parrot string subsystem.
Note that bufstart and buflen are used by the memory subsystem.
The string functions may only use buflen to determine if there is some space left beyond bufused.
This is the only valid usage of these two data members, besides setting bufstart/buflen for external strings.
Functions
void Parrot_str_write_COW
If the specified Parrot string is copy-on-write then the memory is copied over and the copy-on-write flag is cleared.
STRING *Parrot_str_new_COW
Creates a copy-on-write string,
cloning a string header without allocating a new buffer.
STRING *Parrot_str_reuse_COW
Creates a copy-on-write string by cloning a string header without allocating a new buffer.
Doesn't allocate a new string header; instead uses the one passed in and returns it.
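The copy-on-write behaviour described for Parrot_str_new_COW and Parrot_str_write_COW can be illustrated with a minimal, self-contained sketch. The types and functions here (SharedBuf, DemoString, demo_new, demo_new_cow, demo_write_cow) are hypothetical and do not reflect Parrot's actual STRING layout; the reference count is an assumption made so the example can stand alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer with a reference count (not Parrot's layout). */
typedef struct {
    int    refs;
    size_t len;
    char  *data;
} SharedBuf;

/* Hypothetical string header: a COW flag plus a pointer to the buffer. */
typedef struct {
    int        is_cow;
    SharedBuf *buf;
} DemoString;

/* Allocate a private buffer holding a copy of len bytes plus a NUL. */
static SharedBuf *demo_buf_copy(const char *src, size_t len) {
    SharedBuf *b = malloc(sizeof *b);
    assert(b != NULL);
    b->refs = 1;
    b->len  = len;
    b->data = malloc(len + 1);
    assert(b->data != NULL);
    memcpy(b->data, src, len);
    b->data[len] = '\0';
    return b;
}

/* Build a fresh, uniquely owned string from a C string. */
static DemoString *demo_new(const char *src) {
    DemoString *s = malloc(sizeof *s);
    assert(s != NULL);
    s->is_cow = 0;
    s->buf    = demo_buf_copy(src, strlen(src));
    return s;
}

/* Clone only the header; both headers now share one buffer. */
static DemoString *demo_new_cow(DemoString *orig) {
    DemoString *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    orig->is_cow = 1;
    copy->is_cow = 1;
    copy->buf    = orig->buf;
    orig->buf->refs++;
    return copy;
}

/* Before writing: take a private copy if shared, then clear the flag. */
static void demo_write_cow(DemoString *s) {
    if (!s->is_cow)
        return;
    if (s->buf->refs > 1) {
        SharedBuf *priv = demo_buf_copy(s->buf->data, s->buf->len);
        s->buf->refs--;
        s->buf = priv;
    }
    s->is_cow = 0;
}
```

In this sketch, a caller would invoke demo_write_cow on a string before modifying its bytes, so that other headers sharing the buffer never observe the change.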
STRING *Parrot_str_set
Makes the contents of the first Parrot string a copy of the contents of the second.
void Parrot_str_free
Frees the given STRING's header,
accounting for reference counts for the STRING's buffer &c.
Use this only if you know that nothing else has stored the STRING elsewhere.
Basic String Functions
Creation, enlargement, etc.
void Parrot_str_init
Initializes the Parrot string subsystem.
void Parrot_str_finish
Deinitializes the Parrot string subsystem.
UINTVAL string_capacity
Returns the capacity of the specified Parrot string in bytes,
that is how many bytes can be appended onto strstart.
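As a rough illustration of the capacity idea, the sketch below reads "capacity" as the distance from strstart to the end of the allocated buffer. That reading, the DemoHeader struct, and the assumption that bufused counts bytes from strstart are all guesses made for the example; only the field names come from this file's description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header; only the field names are taken from the text. */
typedef struct {
    char  *bufstart;  /* start of the allocated buffer */
    size_t buflen;    /* total allocated bytes */
    char  *strstart;  /* first byte of string data, inside the buffer */
    size_t bufused;   /* assumed: bytes of string data in use from strstart */
} DemoHeader;

/* Bytes available from strstart to the end of the allocation. */
static size_t demo_capacity(const DemoHeader *s) {
    assert(s->strstart >= s->bufstart);
    assert(s->strstart <= s->bufstart + s->buflen);
    return (size_t)((s->bufstart + s->buflen) - s->strstart);
}

/* Free bytes beyond the data already in use (under the assumption above). */
static size_t demo_space_left(const DemoHeader *s) {
    size_t cap = demo_capacity(s);
    assert(s->bufused <= cap);
    return cap - s->bufused;
}
```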
STRING *Parrot_str_new_noinit
Creates and returns an empty Parrot string.
const CHARSET *string_rep_compatible
Finds the "lowest" possible charset and encoding for the given string. E.g.
  ascii <op> utf8 => utf8
                  => ascii, if STRING *b has ascii chars only.
Returns NULL if no compatible string representation can be found.
STRING *Parrot_str_concat
Concatenates two Parrot strings. If necessary, converts the second string's encoding and/or type to match those of the first string. If either string is NULL, then a copy of the non-NULL string is returned. If both strings are NULL, then a new zero-length string is created and returned.
STRING *Parrot_str_append
Takes two Parrot strings and appends the second to the first. NOTE THAT THE RETURN VALUE MAY NOT BE THE FIRST STRING if the first string is COW'd or read-only, so make sure to _use_ the return value.
STRING *Parrot_str_new
Makes a Parrot string from a specified C string.
const char *string_primary_encoding_for_representation
Returns the primary encoding for the specified representation. This is needed for packfile unpacking, unless we just always use UTF-8 or BOCU.
STRING *Parrot_str_new_constant
Creates and returns a constant Parrot string.
STRING *string_make
Creates and returns a new Parrot string using len bytes of string data read from buffer.
The value of charset_name specifies the string's representation. The currently recognised values are:
  'iso-8859-1'
  'ascii'
  'binary'
  'unicode'
The encoding is implicitly guessed; unicode implies the utf-8 encoding, and the other three assume fixed-8 encoding.
If charset is unspecified, the default charset 'ascii' will be used.
The value of flags is optionally one or more PObj_* flags OR-ed together.
STRING *Parrot_str_new_init
Given a buffer, its length, an encoding, a character set, and STRING flags, creates and returns a new string. Don't call this directly.
STRING *Parrot_str_resize
Grows the Parrot string's buffer by the specified number of characters.
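The charset-to-encoding rule stated for string_make above (unicode implies utf-8, the other three recognised charsets imply fixed-8, and an unspecified charset defaults to ascii) can be sketched as a small lookup. The function name demo_encoding_for_charset is hypothetical, and returning NULL for an unrecognised name is an assumption of this sketch, not documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the implied encoding name for a charset name.
 * A NULL charset_name stands for "unspecified" and falls back to ascii.
 * Returning NULL for unknown names is this sketch's own choice. */
static const char *demo_encoding_for_charset(const char *charset_name) {
    if (charset_name == NULL)
        charset_name = "ascii";
    if (strcmp(charset_name, "unicode") == 0)
        return "utf-8";
    if (strcmp(charset_name, "iso-8859-1") == 0 ||
        strcmp(charset_name, "ascii") == 0 ||
        strcmp(charset_name, "binary") == 0)
        return "fixed-8";
    return NULL;
}
```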
Ordinary user-visible string operations
UINTVAL Parrot_str_byte_length
Returns the number of characters in the specified Parrot string.
INTVAL Parrot_str_indexed
Returns the character (or glyph, depending upon the string's encoding). This abstracts the process of finding the Nth character in a (possibly Unicode or JIS-encoded) string, the idea being that once the encoding functions are fleshed out, this function can do the right thing.
Note that this is not range-checked.
INTVAL Parrot_str_find_index
Returns the character position of the second Parrot string in the first at or after start. The return value is a (0 based) offset in characters, not bytes. If the second string is not found in the first string, returns -1.
INTVAL string_ord
Returns the codepoint at a given index into a string. Negative indexes are treated as counting from the end of the string.
STRING *string_chr
Returns a single-character Parrot string.
TODO - Allow this to take an array of characters?
STRING *Parrot_str_copy
Creates and returns a copy of the specified Parrot string.
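The return contract described for Parrot_str_find_index above (a 0-based offset at or after the start position, or -1 when nothing is found) can be sketched on plain C strings. The function demo_find_index is hypothetical; it assumes one byte per character, so character and byte offsets coincide, and it treats an empty search string as "not found", which is an assumption rather than documented behaviour.

```c
#include <assert.h>
#include <string.h>

/* Returns the 0-based offset of needle in haystack at or after start,
 * or -1 if it does not occur. Assumes a one-byte-per-character encoding. */
static long demo_find_index(const char *haystack, const char *needle, long start) {
    size_t hlen = strlen(haystack);
    const char *hit;

    if (start < 0 || (size_t)start >= hlen)
        return -1;
    if (*needle == '\0')
        return -1;  /* assumption of this sketch: empty needle is never found */

    hit = strstr(haystack + start, needle);
    return hit ? (long)(hit - haystack) : -1;
}
```

Note that the offset is measured from the beginning of the first string, not from start.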
Vtable Dispatch Functions
INTVAL Parrot_str_length
Calculates and returns the number of characters in the specified Parrot string.
INTVAL string_max_bytes
Returns the number of bytes required to safely contain the specified number of characters in the specified Parrot string's representation.
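A worst-case byte count of this kind is typically the character count multiplied by the largest number of bytes one character can occupy in the encoding. The sketch below is an illustration under that assumption; the DemoEncoding enum and both functions are hypothetical, and only the two encodings named elsewhere in this file are modelled (fixed-8 at 1 byte per character, UTF-8 at up to 4 bytes per code point).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags for the two encodings named in this file. */
typedef enum { DEMO_FIXED_8, DEMO_UTF_8 } DemoEncoding;

/* Largest number of bytes a single character can need. */
static size_t demo_max_bytes_per_char(DemoEncoding enc) {
    return enc == DEMO_UTF_8 ? 4 : 1;
}

/* Bytes that are always enough to hold nchars characters. */
static size_t demo_max_bytes(DemoEncoding enc, size_t nchars) {
    return demo_max_bytes_per_char(enc) * nchars;
}
```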
STRING *Parrot_str_repeat
Repeats the specified Parrot string num times and returns the result.
STRING *Parrot_str_substr
Copies the substring of length length from offset from the specified Parrot string and stores it in **d, allocating memory if necessary. The substring is also returned.
STRING *Parrot_str_replace
Replaces a sequence of length characters from offset in the first Parrot string with the second Parrot string, returning what was replaced.
This follows the Perl semantics for:
  substr EXPR, OFFSET, LENGTH, REPLACEMENT
Replacing a sequence of characters with a longer string grows the string; a shorter string shrinks it.
Replacing 2 past the end of the string is undefined. However, replacing 1 past the end of the string concatenates the two strings.
A negative offset is allowed to replace from the end.
STRING *Parrot_str_chopn
Removes the last n characters of the specified Parrot string. If n is negative, cuts the string after +n characters. The returned string is a copy of the one passed in.
void Parrot_str_chopn_inplace
Removes the last n characters of the specified Parrot string. If n is negative, cuts the string after +n characters. The string passed in is modified in place.
INTVAL Parrot_str_compare
Compares two strings to each other. If s1 is less than s2, returns -1. If the strings are equal, returns 0. If s1 is greater than s2, returns 1. This comparison uses the character set collation order of the strings.
INTVAL Parrot_str_not_equal
Compares two Parrot strings, performing type and encoding conversions if necessary. Returns 1 if the strings are not equal, and 0 otherwise.
INTVAL Parrot_str_equal
Compares two Parrot strings, performing type and encoding conversions if necessary. Returns 1 if the strings are equal, and 0 otherwise.
static void make_writable
Makes the specified Parrot string writable with minimum length len. The representation argument is required in case a new Parrot string has to be created.
STRING *Parrot_str_bitwise_and
Performs a bitwise AND on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.
STRING *Parrot_str_bitwise_or
Performs a bitwise OR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.
STRING *Parrot_str_bitwise_xor
Performs a bitwise XOR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.
STRING *Parrot_str_bitwise_not
Performs a bitwise NOT on a Parrot string. If the second string is not NULL, then it is reused. Otherwise a new Parrot string is created.
INTVAL Parrot_str_boolean
Returns whether the specified Parrot string is true. A string is true if it is equal to anything other than 0, "" or "0".
STRING *Parrot_str_format_data
Writes and returns a Parrot string.
INTVAL Parrot_str_to_int
Converts a numeric Parrot string to an integer value.
A number is such that:
  sign           = '+' | '-'
  digit          = "Any code point considered a digit by the chartype"
  indicator      = 'e' | 'E'
  digits         = digit [digit]...
  decimal-part   = digits '.' [digits] | ['.'] digits
  exponent-part  = indicator [sign] digits
  numeric-string = [sign] decimal-part [exponent-part]
The integer value is the appropriate integer representation of such a number, rounding towards zero.
FLOATVAL Parrot_str_to_num
Converts a numeric Parrot STRING to a floating point number.
STRING *Parrot_str_from_int
Returns a Parrot string representation of the specified integer value.
STRING *Parrot_str_from_num
Returns a Parrot string representation of the specified floating-point value.
char *Parrot_str_to_cstring
Returns a C string for the specified Parrot string. Use Parrot_str_free_cstring() to free the string. Failure to do this will result in a memory leak.
char *string_to_cstring_nullable
Returns a C string for the specified Parrot string. Use Parrot_str_free_cstring() to free the string. Failure to do this will result in a memory leak.
void Parrot_str_free_cstring
Frees a string created by Parrot_str_to_cstring().
TODO - Hopefully this can go away at some point, as it's got all sorts of leak potential otherwise.
void Parrot_str_pin
Replaces the specified Parrot string's managed buffer memory by system memory.
void Parrot_str_unpin
Undoes a Parrot_str_pin() so that the string once again uses managed memory.
size_t Parrot_str_to_hashval
Returns the hash value for the specified Parrot string, caching it in s->hashval.
STRING *Parrot_str_escape
Escapes all non-ASCII chars to backslash sequences. Control chars that Parrot_str_unescape can handle are escaped as \x, as well as a double quote character. Other control chars and codepoints < 0x100 are escaped as \xhh, codepoints up to 0xffff as \uhhhh, and codepoints greater than this as \x{hh...hh}.
STRING *Parrot_str_escape_truncate
Escapes all non-ASCII characters in the given string with backslashed versions, but limits the length of the output (used for trace output of strings).
STRING *Parrot_str_unescape
Unescapes the specified C string. These sequences are covered:
  \xhh        1..2 hex digits
  \ooo        1..3 oct digits
  \cX         control char X
  \x{h..h}    1..8 hex digits
  \uhhhh      4 hex digits
  \Uhhhhhhhh  8 hex digits
  \a, \b, \t, \n, \v, \f, \r, \e
STRING *Parrot_str_upcase
Returns a copy of the specified Parrot string converted to upper case. Non-caseable characters are left unchanged.
void Parrot_str_upcase_inplace
Converts the specified Parrot string to upper case.
STRING *Parrot_str_downcase
Returns a copy of the specified Parrot string converted to lower case. Non-caseable characters are left unchanged.
void Parrot_str_downcase_inplace
Converts the specified Parrot string to lower case.
STRING *Parrot_str_titlecase
Returns a copy of the specified Parrot string converted to title case. Non-caseable characters are left unchanged.
void Parrot_str_titlecase_inplace
Converts the specified Parrot string to title case.
STRING *string_increment
Increments the string in the Perl 5 fashion, where incrementing aa gives you ab and so on. Currently single char only.
const char *Parrot_string_cstring
Returns a C string from a Parrot string. Both sides are treated as constants -- i.e. do not resize the result.
INTVAL Parrot_str_is_cclass
Returns 1 if the codepoint of string s at given offset is in the given character class flags. See also include/parrot/cclass.h for possible character classes. Returns 0 otherwise, or if the string is empty or NULL.
INTVAL Parrot_str_find_cclass
Finds the first occurrence of the given character class in flags in the string, and returns its glyph-wise index.
INTVAL Parrot_str_find_not_cclass
Finds the first occurrence of a character not in the given character class in flags in the string starting from offset and looking at count positions, and returns its glyph-wise index. Returns offset + count, if not found.
STRING *Parrot_str_change_charset
If dest == NULL, converts src to the given charset or encoding in place. Otherwise returns a copy of src with the charset/encoding in dest.
STRING *Parrot_str_change_encoding
If dest == NULL, converts src to the given charset or encoding in place. Otherwise returns a copy of src with the charset/encoding in dest.
STRING *Parrot_str_compose
Normalizes the string.
STRING *Parrot_str_join
Joins the elements of the array ar as strings with the string j between them, returning the result.
PMC *Parrot_str_split
Splits the string str at the delimiter delim, returning a ResizableStringArray of results. Returns PMCNULL if the string or the delimiter is NULL.
STRING *Parrot_str_from_uint
Returns num converted to a Parrot STRING.
Note that base must be defined (a default of 10 is not assumed). The caller has to verify that base >= 2 && base <= 36. The buffer tc must be at least sizeof (UHUGEINTVAL)*8 + 1 chars big.
If minus is true, then - is prepended to the string representation.
STRING *Parrot_str_from_int_base
Returns num converted to a Parrot STRING.
Note that base must be defined (a default of 10 is not assumed).
If num < 0, then - is prepended to the string representation.
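The base-conversion contract described for Parrot_str_from_uint and Parrot_str_from_int_base (explicit base between 2 and 36, a caller-supplied buffer sized from the bit width of the widest unsigned type, and an optional leading minus sign) can be sketched in plain C. The names demo_from_uint, demo_from_int, and DEMO_BUF_SIZE are hypothetical. The sketch uses uintmax_t in place of UHUGEINTVAL, emits lower-case digits, and sizes its buffer for the sign and a terminating NUL as well; all three are assumptions of this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* One char per bit (worst case, base 2), plus a sign and a NUL. */
#define DEMO_BUF_SIZE (sizeof(uintmax_t) * CHAR_BIT + 2)

/* Writes digits right-to-left into tc (DEMO_BUF_SIZE chars) and returns
 * a pointer to the first character of the NUL-terminated result. */
static char *demo_from_uint(char *tc, uintmax_t num, unsigned base, int minus) {
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char *p = tc + DEMO_BUF_SIZE - 1;

    assert(base >= 2 && base <= 36);  /* the caller must guarantee this */
    *p = '\0';
    do {
        *--p = digits[num % base];
        num /= base;
    } while (num != 0);
    if (minus)
        *--p = '-';
    return p;
}

/* Signed wrapper: converts the magnitude and asks for a leading '-'. */
static char *demo_from_int(char *tc, intmax_t num, unsigned base) {
    int minus = num < 0;
    /* Negate in unsigned arithmetic so the most negative value is safe. */
    uintmax_t mag = minus ? (uintmax_t)0 - (uintmax_t)num : (uintmax_t)num;
    return demo_from_uint(tc, mag, base, minus);
}
```

Filling the buffer from the end means no reversal pass is needed, which is why the buffer has to be large enough for the worst case up front.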