NAME

src/string.c - Parrot Strings

DESCRIPTION

This file implements the non-ICU parts of the Parrot string subsystem.

Note that bufstart and buflen are used by the memory subsystem. The string functions may only use buflen to determine whether there is some space left beyond bufused. This is the only valid usage of these two data members, besides setting bufstart/buflen for external strings.

Functions

PARROT_API void Parrot_unmake_COW(PARROT_INTERP, NOTNULL(STRING *s))

If the specified Parrot string is copy-on-write then the memory is copied over and the copy-on-write flag is cleared.

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_WARN_UNUSED_RESULT STRING *Parrot_make_COW_reference(PARROT_INTERP, NOTNULL(STRING *s))

Creates a copy-on-write string by cloning a string header without allocating a new buffer.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *Parrot_reuse_COW_reference(SHIM_INTERP, NOTNULL(STRING *s), NOTNULL(STRING *d))

Creates a copy-on-write string by cloning a string header without allocating a new buffer. Unlike Parrot_make_COW_reference(), this does not allocate a new string header; it reuses the header passed in as d and returns it.
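The copy-on-write idea behind these functions can be illustrated with a toy model. This is only a sketch under stated assumptions, not Parrot's implementation: ToyString and the toy_* functions are hypothetical names, the header layout is invented for illustration, and buffer reclamation is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string header: NOT Parrot's STRING layout (hypothetical). */
typedef struct ToyString {
    char  *buf;     /* character data, possibly shared with another header */
    size_t len;     /* bytes in use, excluding the trailing '\0' */
    int    is_cow;  /* non-zero while buf may be shared */
} ToyString;

/* Create a header that owns a private copy of text. */
static ToyString *toy_new(const char *text) {
    ToyString *s = malloc(sizeof *s);
    assert(s != NULL);
    s->len = strlen(text);
    s->buf = malloc(s->len + 1);
    assert(s->buf != NULL);
    memcpy(s->buf, text, s->len + 1);
    s->is_cow = 0;
    return s;
}

/* Clone only the header; both headers now point at the same buffer. */
static ToyString *toy_make_cow_reference(ToyString *s) {
    ToyString *d = malloc(sizeof *d);
    assert(d != NULL);
    s->is_cow = 1;
    *d = *s;            /* copies buf pointer, len and the flag */
    return d;
}

/* If s is flagged copy-on-write, give it a private buffer and clear the flag. */
static void toy_unmake_cow(ToyString *s) {
    if (s->is_cow) {
        char *copy = malloc(s->len + 1);
        assert(copy != NULL);
        memcpy(copy, s->buf, s->len + 1);
        s->buf    = copy;
        s->is_cow = 0;
    }
}
```

In this model a caller would invoke toy_unmake_cow() on a header before writing through its buf pointer, so that the other header still sees the original data.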

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_set(PARROT_INTERP, NULLOK(STRING *dest), NOTNULL(STRING *src))

Makes the contents of the first Parrot string a copy of the contents of the second.

Basic String Functions

Creation, enlargement, etc.

PARROT_API void string_init(PARROT_INTERP)

Initializes the Parrot string subsystem.

PARROT_API void string_deinit(PARROT_INTERP)

Deinitializes the Parrot string subsystem.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_PURE_FUNCTION UINTVAL string_capacity(SHIM_INTERP, NOTNULL(const STRING *s))

Returns the capacity of the specified Parrot string in bytes, that is, how many bytes can be appended onto strstart.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_make_empty(PARROT_INTERP, parrot_string_representation_t representation, UINTVAL capacity)

Creates and returns an empty Parrot string.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CAN_RETURN_NULL const CHARSET *string_rep_compatible(SHIM_INTERP, NOTNULL(const STRING *a), NOTNULL(const STRING *b), ARGOUT(const ENCODING **e))

Finds the "lowest" possible charset and encoding for the given strings. E.g.

  ascii <op> utf8 => utf8
                  => ascii, if STRING *b has ascii chars only.

Returns NULL if no compatible string representation can be found.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *string_append(PARROT_INTERP, NULLOK(STRING *a), NULLOK(STRING *b))

Takes two Parrot strings and appends the second to the first. Note that the return value may not be the first string if the first string is COW'd or read-only, so make sure to use the return value.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_MALLOC PARROT_CANNOT_RETURN_NULL STRING *string_from_cstring(PARROT_INTERP, NULLOK(const char *const buffer), const UINTVAL len)

Make a Parrot string from a specified C string.

PARROT_API PARROT_CANNOT_RETURN_NULL const char *string_primary_encoding_for_representation(PARROT_INTERP, parrot_string_representation_t representation)

Returns the primary encoding for the specified representation.

This is needed for packfile unpacking, unless we just always use UTF-8 or BOCU.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *const_string(PARROT_INTERP, NOTNULL(const char *buffer))

Creates and returns a constant Parrot string.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *string_make(PARROT_INTERP, NULLOK(const char *buffer), UINTVAL len, NULLOK(const char *charset_name), UINTVAL flags)

Creates and returns a new Parrot string using len bytes of string data read from buffer.

The value of charset_name specifies the string's representation. The currently recognised values are:

    'iso-8859-1'
    'ascii'
    'binary'
    'unicode'

The encoding is implicitly guessed; unicode implies the utf-8 encoding, and the other three assume fixed-8 encoding.

If charset_name is unspecified, the default charset 'ascii' will be used.

The value of flags is optionally one or more PObj_* flags OR-ed together.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *string_make_direct(PARROT_INTERP, NULLOK(const char *buffer), UINTVAL len, NOTNULL(const ENCODING *encoding), NOTNULL(const CHARSET *charset), UINTVAL flags)

TODO: Not yet documented!!!

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_grow(PARROT_INTERP, NOTNULL(STRING *s), INTVAL addlen)

Grows the Parrot string's buffer by the specified number of characters.

Ordinary user-visible string operations

PARROT_API PARROT_PURE_FUNCTION UINTVAL string_length(SHIM_INTERP, NOTNULL(const STRING *s))

Returns the number of characters in the specified Parrot string.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_index(PARROT_INTERP, NOTNULL(const STRING *s), UINTVAL idx)

Returns the character (or glyph, depending upon the string's encoding) at index idx. This abstracts the process of finding the Nth character in a (possibly Unicode or JIS-encoded) string, the idea being that once the encoding functions are fleshed out, this function can do the right thing.

Note that this is not range-checked.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_str_index(PARROT_INTERP, NOTNULL(const STRING *s), NOTNULL(const STRING *s2), INTVAL start)

Returns the character position of the second Parrot string in the first at or after start. The return value is a (0-based) offset in characters, not bytes. If the second string is not specified, returns -1.
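The search behaviour can be sketched on plain C strings. This is an illustration only, assuming single-byte characters so that byte offsets and character offsets coincide; toy_str_index is a hypothetical name, and treating an empty needle or an out-of-range start as "not found" is an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the 0-based offset of needle in haystack at or after start,
 * or -1 if needle is missing, empty, or not found (sketch assumptions).
 */
static long toy_str_index(const char *haystack, const char *needle, long start) {
    size_t      hay_len;
    const char *hit;

    assert(haystack != NULL);
    if (needle == NULL || *needle == '\0')
        return -1;

    hay_len = strlen(haystack);
    if (start < 0 || (size_t)start >= hay_len)
        return -1;

    hit = strstr(haystack + start, needle);
    return hit ? (long)(hit - haystack) : -1;
}
```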

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_ord(PARROT_INTERP, NOTNULL(const STRING *s), INTVAL idx)

Returns the codepoint at a given index into a string. Negative indexes are treated as counting from the end of the string.
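The negative-index rule can be sketched as follows. This is a minimal illustration on single-byte C strings; toy_ord is a hypothetical name, and returning -1 for an out-of-range index is an assumption of the sketch, since the text does not say what happens in that case.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the byte value at idx. A negative idx counts from the end,
 * so -1 is the last character. Returns -1 when idx is out of range
 * (sketch assumption).
 */
static long toy_ord(const char *s, long idx) {
    long len;

    assert(s != NULL);
    len = (long)strlen(s);

    if (idx < 0)
        idx += len;             /* -1 becomes len - 1, and so on */
    if (idx < 0 || idx >= len)
        return -1;

    return (unsigned char)s[idx];
}
```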

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_WARN_UNUSED_RESULT STRING *string_chr(PARROT_INTERP, UINTVAL character)

Returns a single character Parrot string.

TODO - Allow this to take an array of characters?

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_WARN_UNUSED_RESULT STRING *string_copy(PARROT_INTERP, NOTNULL(STRING *s))

Creates and returns a copy of the specified Parrot string.

Vtable Dispatch Functions

PARROT_API PARROT_IGNORABLE_RESULT INTVAL string_compute_strlen(PARROT_INTERP, NOTNULL(STRING *s))

Calculates and returns the number of characters in the specified Parrot string.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_max_bytes(SHIM_INTERP, NOTNULL(const STRING *s), INTVAL nchars)

Returns the number of bytes required to safely contain the specified number of characters in the specified Parrot string's representation.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_concat(PARROT_INTERP, NULLOK(STRING *a), NULLOK(STRING *b), UINTVAL Uflags)

Concatenates two Parrot strings. If necessary, converts the second string's encoding and/or type to match those of the first string. If either string is NULL, then a copy of the non-NULL string is returned. If both strings are NULL, then a new zero-length string is created and returned.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_repeat(PARROT_INTERP, NOTNULL(const STRING *s), UINTVAL num, ARGOUT_NULLOK(STRING **d))

Repeats the specified Parrot string num times and stores the result in the second string, and returns it. The second string is created if necessary.

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_WARN_UNUSED_RESULT STRING *string_substr(PARROT_INTERP, NOTNULL(STRING *src), INTVAL offset, INTVAL length, ARGOUT_NULLOK(STRING **d), int replace_dest)

Copies the substring of length length starting at offset from the specified Parrot string and stores it in *d, allocating memory if necessary. The substring is also returned.

PARROT_API PARROT_CAN_RETURN_NULL STRING *string_replace(PARROT_INTERP, NOTNULL(STRING *src), INTVAL offset, INTVAL length, NOTNULL(STRING *rep), ARGOUT_NULLOK(STRING **d))

This should follow the Perl semantics for:

    substr EXPR, OFFSET, LENGTH, REPLACEMENT

Replaces a sequence of length characters from offset in the first Parrot string with the second Parrot string, returning what was replaced.

Replacing a sequence of characters with a longer string grows the string; a shorter string shrinks it.

Replacing 2 past the end of the string is undefined. However, replacing 1 past the end of the string concatenates the two strings.

A negative offset is allowed to replace from the end.
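The semantics described above can be sketched on heap-allocated C strings. This is an illustration under stated assumptions, not the Parrot code: toy_replace and toy_dup are hypothetical names, characters are single bytes, a negative length is treated as zero, and an out-of-range offset is refused by returning NULL rather than left undefined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Heap-allocate a copy of text (helper for the sketch). */
static char *toy_dup(const char *text) {
    char *copy = malloc(strlen(text) + 1);
    assert(copy != NULL);
    strcpy(copy, text);
    return copy;
}

/*
 * Replace length bytes of *src starting at offset with rep.
 * *src must be heap-allocated; it is swapped for a new allocation.
 * Returns the removed bytes as a new heap string, or NULL when the
 * offset is out of range (sketch assumption).
 */
static char *toy_replace(char **src, long offset, long length, const char *rep) {
    long  src_len, rep_len;
    char *removed, *result;

    assert(src != NULL && *src != NULL && rep != NULL);
    src_len = (long)strlen(*src);
    rep_len = (long)strlen(rep);

    if (offset < 0)
        offset += src_len;                  /* negative: count from the end */
    if (offset < 0 || offset > src_len)
        return NULL;
    if (length < 0)
        length = 0;
    if (length > src_len - offset)
        length = src_len - offset;          /* clamp to the end of the string */

    removed = malloc((size_t)length + 1);
    result  = malloc((size_t)(src_len - length + rep_len) + 1);
    assert(removed != NULL && result != NULL);

    memcpy(removed, *src + offset, (size_t)length);
    removed[length] = '\0';

    memcpy(result, *src, (size_t)offset);
    memcpy(result + offset, rep, (size_t)rep_len);
    /* The + 1 also copies the terminating '\0' of the tail. */
    memcpy(result + offset + rep_len, *src + offset + length,
           (size_t)(src_len - offset - length) + 1);

    free(*src);
    *src = result;
    return removed;
}
```

With an offset equal to the string's length, nothing is removed and rep is appended, which mirrors the concatenation case mentioned above.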

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_chopn(PARROT_INTERP, NOTNULL(STRING *s), INTVAL n)

Chops off the last n characters of the specified Parrot string. If n is negative, cuts the string after the first |n| characters. The returned string is a copy of the one passed in.
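The two cases (positive and negative n) can be sketched on plain C strings. This is only an illustration with single-byte characters; toy_chopn is a hypothetical name, and clamping an n larger than the string to the empty or full string is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a new heap string holding s without its last n bytes.
 * For negative n, keep only the first -n bytes instead.
 */
static char *toy_chopn(const char *s, long n) {
    long  len, keep;
    char *out;

    assert(s != NULL);
    len = (long)strlen(s);

    if (n >= 0)
        keep = (n > len) ? 0 : len - n;     /* chop n from the end */
    else
        keep = (n < -len) ? len : -n;       /* keep the first -n bytes */

    out = malloc((size_t)keep + 1);
    assert(out != NULL);
    memcpy(out, s, (size_t)keep);
    out[keep] = '\0';
    return out;
}
```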

PARROT_API void string_chopn_inplace(PARROT_INTERP, NOTNULL(STRING *s), INTVAL n)

Chops off the last n characters of the specified Parrot string. If n is negative, cuts the string after the first |n| characters. The string passed in is modified in place; nothing is returned.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_compare(PARROT_INTERP, NULLOK(const STRING *s1), NULLOK(const STRING *s2))

TODO: Not yet documented!!!

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_equal(PARROT_INTERP, NULLOK(const STRING *s1), NULLOK(const STRING *s2))

Compares two Parrot strings, performing type and encoding conversions if necessary.

Note that this function returns 0 if the strings are equal and 1 otherwise.

static void make_writable(PARROT_INTERP, ARGINOUT(STRING **s), const size_t len, parrot_string_representation_t representation)

Makes the specified Parrot string writable with minimum length len. The representation argument is required in case a new Parrot string has to be created.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_bitwise_and(PARROT_INTERP, NULLOK(STRING *s1), NULLOK(STRING *s2), ARGOUT_NULLOK(STRING **dest))

Performs a bitwise AND on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL then it is reused, otherwise a new Parrot string is created.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_bitwise_or(PARROT_INTERP, NULLOK(STRING *s1), NULLOK(STRING *s2), ARGOUT_NULLOK(STRING **dest))

Performs a bitwise OR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL then it is reused, otherwise a new Parrot string is created.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_bitwise_xor(PARROT_INTERP, NULLOK(STRING *s1), NULLOK(STRING *s2), ARGOUT_NULLOK(STRING **dest))

Performs a bitwise XOR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL then it is reused, otherwise a new Parrot string is created.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_bitwise_not(PARROT_INTERP, NULLOK(STRING *s), ARGOUT_NULLOK(STRING **dest))

Performs a bitwise NOT on a Parrot string. If the second string is not NULL then it is reused, otherwise a new Parrot string is created.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_bool(PARROT_INTERP, NOTNULL(const STRING *s))

Returns whether the specified Parrot string is true. A string is true if it is equal to anything other than 0, "" or "0".
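The truth rule can be sketched on C strings. This is a minimal illustration, reading the bare 0 in the description as a NULL pointer, which is an assumption; toy_string_bool and toy_string_bool_examples are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* NULL, "" and "0" are false; every other string is true. */
static int toy_string_bool(const char *s) {
    if (s == NULL || s[0] == '\0')
        return 0;
    return strcmp(s, "0") != 0;
}

/* A few cases worth noting: only the exact string "0" is false. */
static void toy_string_bool_examples(void) {
    assert(!toy_string_bool(NULL));
    assert(!toy_string_bool(""));
    assert(!toy_string_bool("0"));
    assert(toy_string_bool("00"));
    assert(toy_string_bool("0.0"));
    assert(toy_string_bool(" "));
}
```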

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_nprintf(PARROT_INTERP, NULLOK(STRING *dest), INTVAL bytelen, NOTNULL(const char *format), ...)

This is like Parrot_snprintf() except that it writes to and returns a Parrot string.

Note that bytelen does not include space for a (non-existent) trailing '\0'. dest may be a NULL pointer, in which case a new native string will be created. If bytelen is 0, the behaviour becomes more sprintf-ish than snprintf-like. bytelen is measured in the encoding of *dest.

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_printf(PARROT_INTERP, NOTNULL(const char *format), ...)

Formats the arguments according to format, and writes and returns the result as a Parrot string.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL string_to_int(SHIM_INTERP, NOTNULL(const STRING *s))

Converts a numeric Parrot string to an integer value.

A number is such that:

    sign            =  '+' | '-'
    digit           =  "Any code point considered a digit by the chartype"
    indicator       =  'e' | 'E'
    digits          =  digit [digit]...
    decimal-part    =  digits '.' [digits] | ['.'] digits
    exponent-part   =  indicator [sign] digits
    numeric-string  =  [sign] decimal-part [exponent-part]

The integer value is the appropriate integer representation of such a number, rounding towards zero.

PARROT_API PARROT_WARN_UNUSED_RESULT FLOATVAL string_to_num(PARROT_INTERP, NOTNULL(const STRING *s))

Same as string_to_int() except that a floating-point value is returned.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *string_from_int(PARROT_INTERP, INTVAL i)

Returns a Parrot string representation of the specified integer value.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *string_from_num(PARROT_INTERP, FLOATVAL f)

Returns a Parrot string representation of the specified floating-point value.

PARROT_API PARROT_MALLOC PARROT_CANNOT_RETURN_NULL char *string_to_cstring(SHIM_INTERP, NOTNULL(const STRING *s))

Returns a C string for the specified Parrot string. Use string_cstring_free() to free the string. Failure to do this will result in a memory leak.

PARROT_API void string_cstring_free(NULLOK(char *p))

Free a string created by string_to_cstring().

TODO - Hopefully this can go away at some point, as it's got all sorts of leak potential otherwise.

PARROT_API void string_pin(PARROT_INTERP, NOTNULL(STRING *s))

Replace the specified Parrot string's managed buffer memory by system memory.

PARROT_API void string_unpin(PARROT_INTERP, NOTNULL(STRING *s))

Undo a string_pin() so that the string once again uses managed memory.

PARROT_API PARROT_WARN_UNUSED_RESULT size_t string_hash(PARROT_INTERP, NULLOK(STRING *s), size_t seed)

Returns the hash value for the specified Parrot string, caching it in s->hashval.

PARROT_API PARROT_CAN_RETURN_NULL STRING *string_escape_string(PARROT_INTERP, NULLOK(const STRING *src))

Escapes all non-ASCII chars to backslash sequences. Control chars that string_unescape_cstring() can handle are escaped as \x (a backslash followed by the corresponding letter, e.g. \n), as is the double quote character. Other control chars and codepoints < 0x100 are escaped as \xhh, codepoints up to 0xffff as \uhhhh, and codepoints greater than this as \x{hh...hh}.
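The escaping rules can be sketched for an array of code points. This is an illustration only: toy_escape_codepoint and toy_escape are hypothetical names, lowercase hex digits are an assumption, and because the description does not say how a literal backslash is treated, the sketch passes it through unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write the escaped form of one code point; return the chars needed. */
static int toy_escape_codepoint(uint32_t cp, char *out, size_t cap) {
    const char *named = NULL;

    switch (cp) {
        case '\a': named = "\\a";  break;
        case '\b': named = "\\b";  break;
        case '\t': named = "\\t";  break;
        case '\n': named = "\\n";  break;
        case '\v': named = "\\v";  break;
        case '\f': named = "\\f";  break;
        case '\r': named = "\\r";  break;
        case 0x1B: named = "\\e";  break;
        case '"':  named = "\\\""; break;
        default:   break;
    }
    if (named)
        return snprintf(out, cap, "%s", named);
    if (cp >= 0x20 && cp < 0x7F)            /* printable ASCII: unchanged */
        return snprintf(out, cap, "%c", (int)cp);
    if (cp < 0x100)                         /* other control chars, Latin-1 */
        return snprintf(out, cap, "\\x%02x", (unsigned)cp);
    if (cp <= 0xFFFF)
        return snprintf(out, cap, "\\u%04x", (unsigned)cp);
    return snprintf(out, cap, "\\x{%x}", (unsigned)cp);
}

/* Escape n code points into out; the caller supplies a large enough buffer. */
static void toy_escape(const uint32_t *cps, size_t n, char *out, size_t cap) {
    size_t used = 0, i;

    assert(cps != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        int w = toy_escape_codepoint(cps[i], out + used, cap - used);
        assert(w > 0 && (size_t)w < cap - used);
        used += (size_t)w;
    }
}
```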

PARROT_API PARROT_CAN_RETURN_NULL STRING *string_escape_string_delimited(PARROT_INTERP, NULLOK(const STRING *src), UINTVAL limit)

Like string_escape_string(), but limits the output to limit chars (used for trace output of strings).

PARROT_API PARROT_CANNOT_RETURN_NULL STRING *string_unescape_cstring(PARROT_INTERP, NOTNULL(const char *cstring), char delimiter, NULLOK(const char *enc_char))

Unescapes the specified C string. These sequences are covered:

  \xhh        1..2 hex digits
  \ooo        1..3 oct digits
  \cX         control char X
  \x{h..h}    1..8 hex digits
  \uhhhh      4 hex digits
  \Uhhhhhhhh  8 hex digits
  \a, \b, \t, \n, \v, \f, \r, \e

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_MALLOC STRING *string_upcase(PARROT_INTERP, NOTNULL(const STRING *s))

Returns a copy of the specified Parrot string converted to upper case. Non-caseable characters are left unchanged.

TODO - implemented only for ASCII.

PARROT_API void string_upcase_inplace(PARROT_INTERP, NOTNULL(STRING *s))

Converts the specified Parrot string to upper case.

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_MALLOC STRING *string_downcase(PARROT_INTERP, NOTNULL(const STRING *s))

Returns a copy of the specified Parrot string converted to lower case. Non-caseable characters are left unchanged.

PARROT_API void string_downcase_inplace(PARROT_INTERP, NOTNULL(STRING *s))

Converts the specified Parrot string to lower case.

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_MALLOC STRING *string_titlecase(PARROT_INTERP, NOTNULL(const STRING *s))

Returns a copy of the specified Parrot string converted to title case. Non-caseable characters are left unchanged.

PARROT_API void string_titlecase_inplace(PARROT_INTERP, NOTNULL(STRING *s))

Converts the specified Parrot string to title case.

PARROT_API PARROT_CANNOT_RETURN_NULL PARROT_WARN_UNUSED_RESULT STRING *string_increment(PARROT_INTERP, NOTNULL(const STRING *s))

Increments the string in the style of Perl 5. Currently works for single-character strings only.

PARROT_API PARROT_PURE_FUNCTION PARROT_CANNOT_RETURN_NULL const char *Parrot_string_cstring(SHIM_INTERP, NOTNULL(const STRING *str))

Returns a C string from a Parrot string. Both sides are treated as constants -- i.e. do not resize the result.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL Parrot_string_is_cclass(PARROT_INTERP, INTVAL flags, NOTNULL(STRING *s), UINTVAL offset)

Returns 1 if the codepoint of string s at given offset is in the given character class flags. See also include/parrot/cclass.h for possible character classes. Returns 0 otherwise, or if the string is empty or NULL.

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL Parrot_string_find_cclass(PARROT_INTERP, INTVAL flags, NOTNULL(STRING *s), UINTVAL offset, UINTVAL count)

TODO: Not yet documented!!!

PARROT_API PARROT_WARN_UNUSED_RESULT INTVAL Parrot_string_find_not_cclass(PARROT_INTERP, INTVAL flags, NULLOK(STRING *s), UINTVAL offset, UINTVAL count)

TODO: Not yet documented!!!

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CAN_RETURN_NULL STRING *Parrot_string_trans_charset(PARROT_INTERP, NULLOK(STRING *src), INTVAL charset_nr, NULLOK(STRING *dest))

If dest == NULL, converts src to the given charset or encoding in place, else returns a copy of src with the charset/encoding in dest.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CAN_RETURN_NULL STRING *Parrot_string_trans_encoding(PARROT_INTERP, NULLOK(STRING *src), INTVAL encoding_nr, NULLOK(STRING *dest))

If dest == NULL, converts src to the given charset or encoding in place, else returns a copy of src with the charset/encoding in dest.

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CAN_RETURN_NULL STRING *string_compose(PARROT_INTERP, NULLOK(STRING *src))

TODO: Not yet documented!!!

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *string_join(PARROT_INTERP, NULLOK(STRING *j), NOTNULL(PMC *ar))

TODO: Not yet documented!!!

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL PMC *string_split(PARROT_INTERP, NOTNULL(STRING *delim), NOTNULL(STRING *str))

TODO: Not yet documented!!!

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *uint_to_str(PARROT_INTERP, NOTNULL(char *tc), UHUGEINTVAL num, char base, int minus)

Returns num converted to a Parrot STRING.

Note that base must be defined; a default of 10 is not assumed. The caller has to verify that base >= 2 && base <= 36. The buffer tc must be at least sizeof (UHUGEINTVAL)*8 + 1 chars big.

If minus is true then - is prepended to the string representation.
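The buffer requirement can be illustrated with a sketch of the conversion. This is not the Parrot code: toy_uhuge stands in for UHUGEINTVAL, toy_uint_to_str is a hypothetical name, lowercase digits are an assumption, and the result is returned as a pointer into tc plus a length rather than as a Parrot STRING. Filling the buffer from the end shows why one char per bit plus one for the sign is enough in base 2.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-in for UHUGEINTVAL. */
typedef unsigned long long toy_uhuge;

/* One char per bit (worst case, base 2) plus one for the minus sign. */
#define TOY_BUF_SIZE (sizeof(toy_uhuge) * CHAR_BIT + 1)

/*
 * Convert num to text in the given base, writing backwards from the end
 * of tc (which must hold TOY_BUF_SIZE chars). Returns a pointer to the
 * first char and stores the length in *len; no '\0' is written.
 */
static const char *toy_uint_to_str(char *tc, toy_uhuge num, int base,
                                   int minus, size_t *len) {
    char *end = tc + TOY_BUF_SIZE;
    char *p   = end;

    assert(tc != NULL && len != NULL);
    assert(base >= 2 && base <= 36);        /* the caller's responsibility */

    do {
        unsigned digit = (unsigned)(num % (unsigned)base);
        *--p = (char)(digit < 10 ? '0' + digit : 'a' + (digit - 10));
        num /= (unsigned)base;
    } while (num != 0);

    if (minus)
        *--p = '-';

    *len = (size_t)(end - p);
    return p;
}
```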

PARROT_API PARROT_WARN_UNUSED_RESULT PARROT_CANNOT_RETURN_NULL STRING *int_to_str(PARROT_INTERP, NOTNULL(char *tc), HUGEINTVAL num, char base)

Returns num converted to a Parrot STRING.

Note that base must be defined; a default of 10 is not assumed.

If num < 0 then - is prepended to the string representation.

SEE ALSO

src/string_primitives.c

include/parrot/string.h

include/parrot/string_funcs.h

docs/strings.pod

