NAME

src/string.c - Parrot Strings

DESCRIPTION

This file implements the non-ICU parts of the Parrot string subsystem.

Note that bufstart and buflen are used by the memory subsystem. The string functions may only use buflen to determine whether there is some space left beyond bufused. This is the only valid usage of these two data members, besides setting bufstart/buflen for external strings.

Functions

void Parrot_unmake_COW

If the specified Parrot string is copy-on-write then the memory is copied over and the copy-on-write flag is cleared.

STRING *Parrot_make_COW_reference

Creates a copy-on-write string by cloning a string header without allocating a new buffer.

STRING *Parrot_reuse_COW_reference

Creates a copy-on-write string by cloning a string header without allocating a new buffer. Does not allocate a new string header; instead it reuses the one passed in and returns it.
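
The copy-on-write idea described for the three functions above can be sketched in plain C. This is only an illustration under stated assumptions: DemoString, demo_make_cow_reference and demo_unmake_cow are hypothetical names, the header layout is far simpler than Parrot's real STRING, and the buffer is assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified string header; not Parrot's real STRING. */
typedef struct DemoString {
    char  *buf;     /* character data, possibly shared with another header */
    size_t len;     /* bytes in use, excluding the trailing NUL */
    int    is_cow;  /* non-zero while buf may be shared */
} DemoString;

/* Clone the header only: both headers point at the same buffer
 * and both are flagged copy-on-write. */
static DemoString *demo_make_cow_reference(DemoString *s)
{
    DemoString *d = malloc(sizeof *d);
    assert(d != NULL);
    s->is_cow = 1;
    *d = *s;
    return d;
}

/* Before writing: give this header a private copy of the data
 * and clear its copy-on-write flag. */
static void demo_unmake_cow(DemoString *s)
{
    if (s->is_cow) {
        char *copy = malloc(s->len + 1);
        assert(copy != NULL);
        memcpy(copy, s->buf, s->len + 1);
        s->buf    = copy;
        s->is_cow = 0;
    }
}
```

In this sketch the other header keeps its flag set after the copy, because nothing tracks how many headers still share the buffer; how the real memory subsystem handles that is not covered here.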

STRING *string_set

Makes the contents of the first Parrot string a copy of the contents of the second.

Basic String Functions

Creation, enlargement, etc.

void string_init

Initializes the Parrot string subsystem.

void string_deinit

Deinitializes the Parrot string subsystem.

UINTVAL string_capacity

Returns the capacity of the specified Parrot string in bytes, that is, how many bytes can be appended onto strstart.

STRING *string_make_empty

Creates and returns an empty Parrot string.

const CHARSET *string_rep_compatible

Finds the "lowest" possible charset and encoding for the given string. E.g.

  ascii <op> utf8 => utf8
                  => ascii, if STRING *b has ascii chars only.

Returns NULL if no compatible string representation can be found.

STRING *string_concat

Concatenates two Parrot strings. If necessary, converts the second string's encoding and/or type to match those of the first string. If either string is NULL, then a copy of the non-NULL string is returned. If both strings are NULL, then a new zero-length string is created and returned.

STRING *string_append

Take in two Parrot strings and append the second to the first. NOTE THAT RETURN VALUE MAY NOT BE THE FIRST STRING, if the first string is COW'd or read-only. So make sure to _use_ the return value.

STRING *string_from_cstring

Make a Parrot string from a specified C string.

const char *string_primary_encoding_for_representation

Returns the primary encoding for the specified representation.

This is needed for packfile unpacking, unless we just always use UTF-8 or BOCU.

STRING *const_string

Creates and returns a constant Parrot string.

STRING *string_make

Creates and returns a new Parrot string using len bytes of string data read from buffer.

The value of charset_name specifies the string's representation. The currently recognised values are:

    'iso-8859-1'
    'ascii'
    'binary'
    'unicode'

The encoding is implicitly guessed; unicode implies the utf-8 encoding, and the other three assume fixed-8 encoding.

If charset is unspecified, the default charset 'ascii' will be used.

The value of flags is optionally one or more PObj_* flags OR-ed together.

STRING *string_make_direct

Given a buffer, its length, an encoding, a character set, and STRING flags, creates and returns a new string. Don't call this directly.

STRING *string_grow

Grows the Parrot string's buffer by the specified number of characters.

Ordinary user-visible string operations

UINTVAL string_length

Returns the number of characters in the specified Parrot string.

INTVAL string_index

Returns the character (or glyph, depending upon the string's encoding) at the specified index. This abstracts the process of finding the Nth character in a (possibly Unicode or JIS-encoded) string, the idea being that once the encoding functions are fleshed out, this function can do the right thing.

Note that this is not range-checked.

INTVAL string_str_index

Returns the character position of the second Parrot string in the first at or after start. The return value is a (0 based) offset in characters, not bytes. If the second string is not found in the first string, returns -1.

INTVAL string_ord

Returns the codepoint at a given index into a string. Negative indexes are treated as counting from the end of the string.
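
The negative-index rule can be sketched as follows. The helper names are hypothetical, the string is modelled as a plain array of codepoints rather than an encoded STRING, and returning -1 for an out-of-range index is an assumption of this sketch, since the text above does not say what the real function does in that case.

```c
#include <assert.h>

/* Hypothetical helper: map a possibly negative index onto 0..len-1.
 * Returns -1 if the index falls outside the string. */
static long demo_normalize_index(long idx, long len)
{
    assert(len >= 0);
    if (idx < 0)
        idx += len;              /* -1 is the last character, -len the first */
    if (idx < 0 || idx >= len)
        return -1;
    return idx;
}

/* Codepoint lookup over an array of codepoints (an assumption; the real
 * function works on encoded string data). Returns -1 when out of range. */
static long demo_ord(const unsigned long *cps, long len, long idx)
{
    long i = demo_normalize_index(idx, len);
    return i < 0 ? -1 : (long)cps[i];
}
```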

STRING *string_chr

Returns a single-character Parrot string.

TODO - Allow this to take an array of characters?

STRING *string_copy

Creates and returns a copy of the specified Parrot string.

Vtable Dispatch Functions

INTVAL string_compute_strlen

Calculates and returns the number of characters in the specified Parrot string.

INTVAL string_max_bytes

Returns the number of bytes required to safely contain the specified number of characters in the specified Parrot string's representation.

STRING *string_repeat

Repeats the specified Parrot string num times and stores the result in the second string, and returns it. The second string is created if necessary (that is, if you pass in a NULL value).

STRING *string_substr

Copies the substring of length length from offset from the specified Parrot string and stores it in **d, allocating memory if necessary. The substring is also returned.

STRING *string_replace

Replaces a sequence of length characters from offset in the first Parrot string with the second Parrot string, returning what was replaced.

This follows the Perl semantics for:

    substr EXPR, OFFSET, LENGTH, REPLACEMENT

Replacing a sequence of characters with a longer string grows the string; a shorter string shrinks it.

Replacing 2 past the end of the string is undefined. However, replacing 1 past the end of the string concatenates the two strings.

A negative offset is allowed to replace from the end.
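
A minimal sketch of these replacement semantics on ordinary NUL-terminated C strings, working in bytes rather than characters. demo_replace is a hypothetical name. Returning NULL for an offset beyond the end and clamping an over-long length are choices made for this sketch only, since the text above leaves those cases undefined or unstated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replaces len bytes of *s, starting at offset, with rep.
 * *s must be a malloc'ed NUL-terminated string; it is replaced by a newly
 * allocated result. Returns a malloc'ed copy of the removed bytes, or NULL
 * if the offset is out of range (in which case *s is left untouched). */
static char *demo_replace(char **s, long offset, long len, const char *rep)
{
    long slen = (long)strlen(*s);
    long rlen = (long)strlen(rep);
    char *removed, *result;

    assert(len >= 0);
    if (offset < 0)
        offset += slen;                  /* negative offset counts from the end */
    if (offset < 0 || offset > slen)     /* offset == slen means "append" */
        return NULL;
    if (len > slen - offset)
        len = slen - offset;             /* clamp to the end of the string */

    removed = malloc((size_t)len + 1);
    result  = malloc((size_t)(slen - len + rlen) + 1);
    assert(removed != NULL && result != NULL);

    memcpy(removed, *s + offset, (size_t)len);
    removed[len] = '\0';

    memcpy(result, *s, (size_t)offset);                 /* head */
    memcpy(result + offset, rep, (size_t)rlen);         /* replacement */
    strcpy(result + offset + rlen, *s + offset + len);  /* tail and NUL */

    free(*s);
    *s = result;
    return removed;
}
```

Calling it with an offset equal to the string length and a length of 0 appends the replacement, which mirrors the concatenation case described above.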

STRING *string_chopn

Removes the last n characters of the specified Parrot string. If n is negative, cuts the string after +n characters. The returned string is a copy of the one passed in.

void string_chopn_inplace

Removes the last n characters of the specified Parrot string. If n is negative, cuts the string after +n characters. The string passed in is modified in place.
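
The effect of the sign of n on the resulting length can be sketched like this. demo_chopn_length is a hypothetical helper that only computes the new length in characters. Clamping when n is larger than the string is an assumption of the sketch.

```c
#include <assert.h>

/* Length left after chopping a string of len characters.
 * n >= 0 removes the last n characters; n < 0 keeps the first -n characters.
 * Out-of-range values are clamped to 0..len (an assumption of this sketch). */
static long demo_chopn_length(long len, long n)
{
    assert(len >= 0);
    if (n >= 0)
        return n >= len ? 0 : len - n;
    return n <= -len ? len : -n;
}
```

For a five-character string, n = 2 leaves three characters, while n = -2 leaves the first two.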

INTVAL string_compare

Compares two strings to each other. If s1 is less than s2, returns -1. If the strings are equal, returns 0. If s1 is greater than s2, returns 1. This comparison uses the character set collation order of the strings.

INTVAL string_equal

Compares two Parrot strings, performing type and encoding conversions if necessary.

Note that this function returns 0 if the strings are equal, and non-zero otherwise.

static void make_writable

Makes the specified Parrot string writable with minimum length len. The representation argument is required in case a new Parrot string has to be created.

STRING *string_bitwise_and

Performs a bitwise AND on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.

STRING *string_bitwise_or

Performs a bitwise OR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.

STRING *string_bitwise_xor

Performs a bitwise XOR on two Parrot strings, performing type and encoding conversions if necessary. If the third string is not NULL, then it is reused. Otherwise a new Parrot string is created.

STRING *string_bitwise_not

Performs a bitwise NOT on a Parrot string. If the second string is not NULL then it is reused, otherwise a new Parrot string is created.

INTVAL string_bool

Returns whether the specified Parrot string is true. A string is true if it is equal to anything other than 0, "" or "0".

STRING *string_printf

Writes and returns a Parrot string.

INTVAL string_to_int

Converts a numeric Parrot string to an integer value.

A number is such that:

    sign            =  '+' | '-'
    digit           =  "Any code point considered a digit by the chartype"
    indicator       =  'e' | 'E'
    digits          =  digit [digit]...
    decimal-part    =  digits '.' [digits] | ['.'] digits
    exponent-part   =  indicator [sign] digits
    numeric-string  =  [sign] decimal-part [exponent-part]

The integer value is the appropriate integer representation of such a number, rounding towards zero.

FLOATVAL string_to_num

Converts a numeric Parrot STRING to a floating point number.

STRING *string_from_int

Returns a Parrot string representation of the specified integer value.

STRING *string_from_num

Returns a Parrot string representation of the specified floating-point value.

char *string_to_cstring

Returns a C string for the specified Parrot string. Use string_cstring_free() to free the string. Failure to do this will result in a memory leak.

void string_cstring_free

Free a string created by string_to_cstring().

TODO - Hopefully this can go away at some point, as it's got all sorts of leak potential otherwise.

void string_pin

Replaces the specified Parrot string's managed buffer memory by system memory.

void string_unpin

Undoes a string_pin() so that the string once again uses managed memory.

size_t string_hash

Returns the hash value for the specified Parrot string, caching it in s->hashval.

STRING *string_escape_string

Escapes all non-ASCII chars to backslash sequences. Control chars that string_unescape_cstring can handle are escaped as \x, where x is a single character (for example \n), as is a double quote character. Other control chars and codepoints < 0x100 are escaped as \xhh, codepoints up to 0xffff as \uhhhh, and codepoints greater than this as \x{hh...hh}.
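
The escaping tiers described above might look roughly like this in C. All names are hypothetical and the input is modelled as an array of codepoints. Escaping a backslash as \\ and using lowercase hex digits are assumptions of this sketch, not something the text above states.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Letter used for control characters that have a short escape, or 0. */
static char demo_short_escape(unsigned long cp)
{
    switch (cp) {
    case '\a': return 'a';
    case '\b': return 'b';
    case '\t': return 't';
    case '\n': return 'n';
    case '\v': return 'v';
    case '\f': return 'f';
    case '\r': return 'r';
    case 27:   return 'e';   /* ESC */
    default:   return 0;
    }
}

/* Escapes n codepoints into out (capacity cap bytes, including the NUL).
 * Returns out. */
static char *demo_escape(const unsigned long *cps, size_t n,
                         char *out, size_t cap)
{
    size_t used = 0, i;

    for (i = 0; i < n; i++) {
        unsigned long cp = cps[i];
        char piece[32];
        size_t plen;

        if (cp == '"' || cp == '\\')              /* backslash: assumption */
            snprintf(piece, sizeof piece, "\\%c", (int)cp);
        else if (demo_short_escape(cp))           /* \n, \t, ... */
            snprintf(piece, sizeof piece, "\\%c", demo_short_escape(cp));
        else if (cp >= 0x20 && cp < 0x7f)         /* printable ASCII */
            snprintf(piece, sizeof piece, "%c", (int)cp);
        else if (cp < 0x100)                      /* other controls, Latin-1 */
            snprintf(piece, sizeof piece, "\\x%02lx", cp);
        else if (cp <= 0xffff)
            snprintf(piece, sizeof piece, "\\u%04lx", cp);
        else
            snprintf(piece, sizeof piece, "\\x{%lx}", cp);

        plen = strlen(piece);
        assert(used + plen < cap);                /* leave room for the NUL */
        memcpy(out + used, piece, plen);
        used += plen;
    }
    assert(used < cap);
    out[used] = '\0';
    return out;
}
```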

STRING *string_escape_string_delimited

Escapes all non-ASCII characters in the given string with backslashed versions, but limits the length of the output (used for trace output of strings).

STRING *string_unescape_cstring

Unescapes the specified C string. These sequences are covered:

  \xhh        1..2 hex digits
  \ooo        1..3 oct digits
  \cX         control char X
  \x{h..h}    1..8 hex digits
  \uhhhh      4 hex digits
  \Uhhhhhhhh  8 hex digits
  \a, \b, \t, \n, \v, \f, \r, \e

STRING *string_upcase

Returns a copy of the specified Parrot string converted to upper case. Non-caseable characters are left unchanged.

void string_upcase_inplace

Converts the specified Parrot string to upper case.

STRING *string_downcase

Returns a copy of the specified Parrot string converted to lower case. Non-caseable characters are left unchanged.

void string_downcase_inplace

Converts the specified Parrot string to lower case.

STRING *string_titlecase

Returns a copy of the specified Parrot string converted to title case. Non-caseable characters are left unchanged.

void string_titlecase_inplace

Converts the specified Parrot string to title case.

STRING *string_increment

Increments the string in the Perl 5 fashion, where incrementing aa gives you ab and so on. Currently single char only.

const char *Parrot_string_cstring

Returns a C string from a Parrot string. Both sides are treated as constants -- i.e. do not resize the result.

INTVAL Parrot_string_is_cclass

Returns 1 if the codepoint of string s at given offset is in the given character class flags. See also include/parrot/cclass.h for possible character classes. Returns 0 otherwise, or if the string is empty or NULL.

INTVAL Parrot_string_find_cclass

Finds the first occurrence of the given character class in flags in the string, and returns its glyph-wise index.

INTVAL Parrot_string_find_not_cclass

Finds the first occurrence of a character not in the given character class in flags in the string, starting from offset and looking at count positions, and returns its glyph-wise index. Returns offset + count if not found.
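
The "returns offset + count if not found" convention can be illustrated on a plain C string. In this sketch the character class is a ctype-style predicate such as isdigit instead of the flags from include/parrot/cclass.h, demo_find_not_cclass is a hypothetical name, and positions past the end of the string are simply not examined (an assumption).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* A ctype-style predicate standing in for a character class. */
typedef int (*demo_class_fn)(int);

/* Scans at most count positions of s starting at offset and returns the
 * index of the first character NOT in the class. Returns offset + count
 * when every examined character is in the class. */
static long demo_find_not_cclass(demo_class_fn in_class, const char *s,
                                 long offset, long count)
{
    long len = (long)strlen(s);
    long end, i;

    assert(in_class != NULL && offset >= 0 && count >= 0);

    end = offset + count;
    if (end > len)
        end = len;                       /* never read past the string */

    for (i = offset; i < end; i++)
        if (!in_class((unsigned char)s[i]))
            return i;
    return offset + count;
}
```

A caller skipping leading digits can use the return value directly as the index where the non-digit part starts, without a separate "not found" check.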

STRING *Parrot_string_trans_charset

If dest == NULL, converts src to the given charset or encoding in place. Otherwise returns a copy of src with the charset/encoding in dest.

STRING *Parrot_string_trans_encoding

If dest == NULL, converts src to the given charset or encoding in place. Otherwise returns a copy of src with the charset/encoding in dest.

STRING *string_compose

Normalizes the string.

STRING *string_join

Joins the elements of the array ar as strings with the string j between them, returning the result.

PMC *string_split

Splits the string str at the delimiter delim, returning a ResizableStringArray of results.

STRING *uint_to_str

Returns num converted to a Parrot STRING.

Note that base must be defined (a default of 10 is not assumed). The caller has to verify that base >= 2 && base <= 36. The buffer tc must be at least sizeof (UHUGEINTVAL)*8 + 1 chars big.

If minus is true, then - is prepended to the string representation.
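
The buffer-size requirement follows from base 2, where every bit of the value becomes one digit. A sketch of the conversion, filling the buffer from its end: demo_uint_to_str and DEMO_BUF_SIZE are hypothetical names, unsigned long long stands in for UHUGEINTVAL, lowercase digits are an assumption, and the sketch reserves one extra byte because it NUL-terminates its result as a C string.

```c
#include <assert.h>
#include <stddef.h>

/* One character per bit covers base 2, the longest case;
 * plus one for '-' and one for the NUL added by this sketch. */
#define DEMO_BUF_SIZE (sizeof(unsigned long long) * 8 + 2)

/* Writes num in the given base at the end of tc (DEMO_BUF_SIZE bytes)
 * and returns a pointer to the first character of the result. */
static char *demo_uint_to_str(char *tc, unsigned long long num,
                              int base, int minus)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char *p = tc + DEMO_BUF_SIZE;

    assert(base >= 2 && base <= 36);     /* the caller's responsibility */

    *--p = '\0';
    do {                                 /* least significant digit first */
        *--p = digits[num % (unsigned)base];
        num /= (unsigned)base;
    } while (num != 0);
    if (minus)
        *--p = '-';
    return p;
}
```

Writing digits backwards from the end of the buffer avoids a separate reversal step, at the cost of returning a pointer into the middle of tc rather than tc itself.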

STRING *int_to_str

Returns num converted to a Parrot STRING.

Note that base must be defined (a default of 10 is not assumed).

If num < 0, then - is prepended to the string representation.

SEE ALSO

src/string_primitives.c

include/parrot/string.h

include/parrot/string_funcs.h

docs/strings.pod

