NAME ^

string.ops - String Operations

DESCRIPTION ^

Operations that work on strings, whether constructing, modifying or examining them.

ord(out INT, in STR)

The codepoint in the current character set of the first character of string $2 is returned in integer $1. If $2 is empty, an exception is thrown.

ord(out INT, in STR, in INT)

The codepoint in the current character set of the character at integer index $3 of string $2 is returned in integer $1. If $2 is empty, an exception is thrown. If $3 is greater than the length of $2, an exception is thrown. If $3 is less than zero but greater than the negative of the length of $2, the index counts backwards through $2, such that -1 is the last character, -2 is the second-to-last character, and so on. If $3 is less than the negative of the length of $2, an exception is thrown.
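The index rules above can be modelled in a few lines of Python. This is only a sketch of the described behaviour, not Parrot's implementation: the name ord_at is hypothetical, and the two boundary cases the text leaves open (an index equal to the length, and an index equal to the negative of the length) are assumed here to be out of range and the first character, respectively.

```python
def ord_at(s: str, index: int = 0) -> int:
    """Model of ord: codepoint of the character at `index` (default: first)."""
    if not s:
        raise ValueError("ord: empty string")
    n = len(s)
    # Assumption: valid indices run from -n to n - 1 inclusive.
    if index >= n or index < -n:
        raise IndexError("ord: index out of range")
    # Python's negative indexing also counts backwards from the end.
    return ord(s[index])


print(ord_at("abc"))      # 97, the first character
print(ord_at("abc", -1))  # 99, the last character
```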

chr(out STR, in INT)

The character specified by codepoint integer $2 in the current character set is returned in string $1.

chopn(inout STR, in INT)

Remove the number of characters given by integer $2 from the tail of string $1. If $2 is negative, cut the string after -$2 characters.

chopn(out STR, in STR, in INT)

Remove the number of characters given by integer $3 from the tail of string $2, and return the characters not chopped in string $1. If $3 is negative, cut the string after -$3 characters.
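A small Python model may make the two cases clearer. It is a sketch of the behaviour described above, not the op itself; the text does not say what happens when the count exceeds the string length, so the model assumes the result is an empty string.

```python
def chopn(s: str, n: int) -> str:
    """Model of the three-argument chopn: return what is left of `s`."""
    if n < 0:
        # Negative count: keep only the first -n characters.
        return s[:-n]
    if n >= len(s):
        # Assumption: chopping more than the whole string leaves nothing.
        return ""
    # Positive count: drop n characters from the tail.
    return s[:len(s) - n]


print(chopn("hello", 2))   # hel
print(chopn("hello", -2))  # he
```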

concat(inout STR, in STR)

concat(in PMC, in STR)

concat(in PMC, in PMC)

Modify string $1 in place, appending string $2. The PMC versions are MMD operations.

concat(out STR, in STR, in STR)

concat(in PMC, in PMC, in STR)

concat(in PMC, in PMC, in PMC)

n_concat(out PMC, in PMC, in STR)

n_concat(out PMC, in PMC, in PMC)

Append string $3 to string $2 and place the result into string $1. The PMC versions are MMD operations. The n_ variants create a new PMC $1 to store the result. See src/ops/math.ops for the general infix and n_infix syntax.

repeat(out STR, in STR, in INT)

repeat(in PMC, in PMC, in INT)

repeat(in PMC, in PMC, in PMC)

n_repeat(out PMC, in PMC, in INT)

n_repeat(out PMC, in PMC, in PMC)

Repeat string $2 integer $3 times and return result in string $1. The PMC versions are MMD operations.

length(out INT, in STR)

Calculate the length (in characters) of string $2 and return as integer $1. If $2 is NULL or zero length, zero is returned.

bytelength(out INT, in STR)

Calculate the length (in bytes) of string $2 and return as integer $1. If $2 is NULL or zero length, zero is returned.
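The difference between the two counts only shows up with multi-byte characters. The following Python sketch illustrates it under the assumption of a UTF-8 encoded string; None stands in for a NULL string, and the function names simply mirror the op names.

```python
from typing import Optional


def length(s: Optional[str]) -> int:
    """Length in characters; a NULL (None) or empty string gives 0."""
    return 0 if s is None else len(s)


def bytelength(s: Optional[str], encoding: str = "utf-8") -> int:
    """Length in bytes; the encoding is an assumption of this sketch."""
    return 0 if s is None else len(s.encode(encoding))


print(length("héllo"))      # 5 characters
print(bytelength("héllo"))  # 6 bytes, because "é" takes two bytes in UTF-8
```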

pin(inout STR)

Make the memory in string $1 immobile. This memory will not be moved by the garbage collector, and may be safely passed to external libraries (as long as they don't free it). Pinning a string will move the contents.

$1 should be unpinned if it is used after pinning is no longer necessary.

unpin(inout STR)

Make the memory in string $1 movable again. This will make the memory in $1 move.

substr(out STR, in STR, in INT)

substr(out STR, in STR, in INT, in INT)

substr(out STR, inout STR, in INT, in INT, in STR)

substr(inout STR, in INT, in INT, in STR)

substr(out STR, invar PMC, in INT, in INT)

Set $1 to the portion of $2 starting at (zero-based) character position $3 and having length $4. If no length ($4) is provided, it is equivalent to passing in the length of $2. This creates a COW copy of $2.

Optionally pass in string $5 for replacement. If the length of $5 is different from the length specified in $4, then $2 will grow or shrink accordingly. If $3 is one character position larger than the length of $2, then $5 is appended to $2 (and the empty string is returned); this is essentially the same as

  concat $2, $5

Finally, if $3 is negative, then it is taken to count backwards from the end of the string (i.e. an offset of -1 corresponds to the last character).

The form substr(inout STR, in INT, in INT, in STR) is optimized for replace only: it ignores the replaced substring and so does not waste a register on it.
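The extraction and replacement behaviour described above can be sketched in Python as follows. This is a model under stated assumptions, not the op's implementation: substr_replace is a hypothetical name, out-of-range offsets are not modelled, and the append case is taken to mean an offset equal to the length of the string.

```python
from typing import Optional, Tuple


def substr(s: str, offset: int, length: Optional[int] = None) -> str:
    """Model of the extracting forms: `length` characters from `offset`."""
    if offset < 0:
        offset += len(s)  # negative offsets count from the end
    if length is None:
        length = len(s)   # omitted length behaves like the full length
    return s[offset:offset + length]


def substr_replace(s: str, offset: int, length: int, repl: str) -> Tuple[str, str]:
    """Model of the replacing form: returns (removed part, modified string)."""
    if offset < 0:
        offset += len(s)
    removed = s[offset:offset + length]
    return removed, s[:offset] + repl + s[offset + length:]


print(substr("hello world", 6))                         # world
print(substr_replace("hello world", 0, 5, "goodbye"))   # ('hello', 'goodbye world')
print(substr_replace("abc", 3, 0, "def"))               # ('', 'abcdef'), an append
```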

  inline op substr(out STR, in STR, in INT) :base_core {
      const INTVAL len = string_length(interp, $2);
      $1 = string_substr(interp, $2, $3, len, &$1, 0);
      goto NEXT();
  }

  inline op substr(out STR, in STR, in INT, in INT) :base_core {
      $1 = string_substr(interp, $2, $3, $4, &$1, 0);
      goto NEXT();
  }

  inline op substr(out STR, inout STR, in INT, in INT, in STR) :base_core {
      $1 = string_replace(interp, $2, $3, $4, $5, &$1);
      goto NEXT();
  }

  inline op substr(inout STR, in INT, in INT, in STR) :base_core {
      (void)string_replace(interp, $1, $2, $3, $4, NULL);
      goto NEXT();
  }

  inline op substr(out STR, invar PMC, in INT, in INT) :base_core {
      $1 = $2->vtable->substr_str(interp, $2, $3, $4);
      goto NEXT();
  }

index(out INT, in STR, in STR)

index(out INT, in STR, in STR, in INT)

The index op searches for a substring within a target string, without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of substring $3 in target string $2 at or after zero-based position $4. If $4 is omitted, the search starts from the beginning of the string. The returned position is zero-based. If the target string is null, or the substring is null or not found, index returns -1.
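As an illustration, the search rules can be modelled on top of Python's str.find. This is a sketch only: None stands in for a null string, an empty substring is assumed to be treated like a null one, and negative start positions are not covered by the text and are not modelled here.

```python
from typing import Optional


def index(target: Optional[str], sub: Optional[str], start: int = 0) -> int:
    """Model of index: first position of `sub` in `target` at or after `start`."""
    # Assumption: empty strings are treated the same way as null strings.
    if not target or not sub:
        return -1
    # str.find already returns -1 when the substring is not found.
    return target.find(sub, start)


print(index("hello world", "o"))     # 4
print(index("hello world", "o", 5))  # 7
print(index("hello world", "z"))     # -1
```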

sprintf(out STR, in STR, invar PMC)

sprintf(out PMC, invar PMC, invar PMC)

sprintf(out STR, in STR) [unimplemented] [[what is this op supposed to do? --jrieks]]

sprintf(out PMC, invar PMC) [unimplemented] [[what is this op supposed to do? --jrieks]]

Sets $1 to the result of calling Parrot_psprintf with the given format ($2) and arguments ($3, which should be an ordered aggregate PMC). In the (unimplemented) versions that don't include $3, arguments are popped off the user stack.

The result is quite similar to using the system sprintf, but is protected against buffer overflows and the like. There are some differences, especially concerning sizes (which are largely ignored); see misc.c for details.

new(out STR)

new(out STR, in INT)

Allocate a new empty string $1, of length $2 (optional), encoding $3 (optional), and type $4 (optional).

stringinfo(out INT, in STR, in INT)

Extract some information about string $2 and store it in $1. If a null string is passed, $1 is always set to 0. If an invalid $3 is passed, an exception is thrown. Possible values for $3 are:

1 The location of the string buffer header.

2 The location of the start of the string.

3 The length of the string buffer (in bytes).

4 The flags attached to the string (if any).

5 The amount of the string buffer used (in bytes).

6 The length of the string (in characters).

upcase(out STR, in STR)

Uppercase $2 and put the result in $1.

upcase(inout STR)

Uppercase $1 in place.

downcase(out STR, in STR)

Downcase $2 and put the result in $1.

downcase(inout STR)

Downcase $1 in place.

titlecase(out STR, in STR)

Titlecase $2 and put the result in $1.

titlecase(inout STR)

Titlecase $1 in place.

join(out STR, in STR, invar PMC)

Create a new string $1 by joining array elements from array $3 with string $2.

split(out PMC, in STR, in STR)

Create a new Array PMC $1 by splitting the string $3 into pieces delimited by the string $2. If $2 does not appear in $3, then $3 is returned as the sole element of the Array PMC. Delimiters at the beginning and end of $3 produce empty strings in the result.

Note: the string $2 is just a string. If you want a Perl-style split on a regular expression, use PGE::Util's split from the standard library.
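Python's str.split with an explicit delimiter happens to follow the same rules, so it can serve as a rough model of split, with str.join as the counterpart for join. This is a sketch for illustration: the argument order mirrors the ops (delimiter first), a Python list stands in for the Array PMC, and the empty-delimiter case is not described above and is not modelled.

```python
from typing import List


def split(delim: str, s: str) -> List[str]:
    """Model of split: pieces of `s` separated by the literal string `delim`."""
    return s.split(delim)


def join(delim: str, items: List[str]) -> str:
    """Model of join: elements of `items` glued together with `delim`."""
    return delim.join(items)


print(split(",", "a,b,c"))     # ['a', 'b', 'c']
print(split(",", ",a,"))       # ['', 'a', ''], empty strings at both ends
print(split(";", "abc"))       # ['abc'], delimiter not present
print(join("-", ["a", "b"]))   # a-b
```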

charset(out INT, in STR)

Return the charset number $1 of string $2.

charsetname(out STR, in INT)

Return the name $1 of charset number $2. If charset number $2 is not found, name $1 is set to null.

find_charset(out INT, in STR)

Return the charset number of the charset named $2. If the charset doesn't exist, throw an exception.

trans_charset(inout STR, in INT)

Change string $1 to have the charset with number $2.

trans_charset(out STR, in STR, in INT)

Create a string $1 from $2 with the charset with number $3.

Both ops may throw an exception on information loss.

encoding(out INT, in STR)

Return the encoding number $1 of string $2.

encodingname(out STR, in INT)

Return the name $1 of encoding number $2. If encoding number $2 is not found, name $1 is set to null.

find_encoding(out INT, in STR)

Return the encoding number of the encoding named $2. If the encoding doesn't exist, throw an exception.

trans_encoding(inout STR, in INT)

Change string $1 to have the encoding with number $2.

trans_encoding(out STR, in STR, in INT)

Create a string $1 from $2 with the encoding with number $3.

Both ops may throw an exception on information loss.
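The phrase "information loss" refers to characters that the target encoding cannot represent. As a loose analogy only, Python's strict encoding behaves the same way; the sketch below uses Python codec names rather than Parrot encoding numbers, and the function name merely echoes the op.

```python
def trans_encoding(text: str, encoding: str) -> bytes:
    """Analogy for transcoding: fails instead of silently dropping characters."""
    # Strict error handling is the default: unrepresentable characters raise.
    return text.encode(encoding)


print(trans_encoding("abc", "ascii"))     # b'abc', nothing is lost
print(trans_encoding("é", "latin-1"))     # b'\xe9', representable in Latin-1

try:
    trans_encoding("é", "ascii")
except UnicodeEncodeError as err:
    print("information loss:", err.reason)
```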

is_cclass(out INT, in INT, in STR, in INT)

Set $1 to 1 if the codepoint of $3 at position $4 is in the character class(es) given by $2.

find_cclass(out INT, in INT, in STR, in INT, in INT)

Set $1 to the offset of the first codepoint matching the character class(es) given by $2 in string $3, starting at offset $4 for up to $5 codepoints. If no matching character is found, set $1 to ($4 + $5).

find_not_cclass(out INT, in INT, in STR, in INT, in INT)

Set $1 to the offset of the first codepoint not matching the character class(es) given by $2 in string $3, starting at offset $4 for up to $5 codepoints. If the substring consists entirely of matching characters, set $1 to ($4 + $5).
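The scanning behaviour of the three character-class ops can be sketched in Python. Several things here are assumptions made for illustration: a predicate function replaces the integer class flags, is_cclass is assumed to yield 0 when the character is not in the class, and the not-found result is offset + count exactly as stated above, without any clamping to the string length.

```python
from typing import Callable

CharClass = Callable[[str], bool]  # stand-in for the integer class flags


def is_cclass(cls: CharClass, s: str, pos: int) -> int:
    """1 if the character at `pos` is in the class (0 otherwise is assumed)."""
    return 1 if cls(s[pos]) else 0


def find_cclass(cls: CharClass, s: str, offset: int, count: int) -> int:
    """Offset of the first character in the class, else offset + count."""
    for i in range(offset, min(offset + count, len(s))):
        if cls(s[i]):
            return i
    return offset + count


def find_not_cclass(cls: CharClass, s: str, offset: int, count: int) -> int:
    """Offset of the first character not in the class, else offset + count."""
    return find_cclass(lambda ch: not cls(ch), s, offset, count)


print(find_cclass(str.isdigit, "abc123", 0, 6))      # 3
print(find_not_cclass(str.isdigit, "123abc", 0, 6))  # 3
print(find_cclass(str.isdigit, "abcdef", 1, 4))      # 5, nothing found
```

A typical use is skipping a run of characters: find_not_cclass with a whitespace class returns the position of the first non-whitespace character.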

escape(out STR, invar STR)

Escape all non-ASCII characters to backslashed escape sequences. The result is a new string with the ascii charset.
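The general idea can be illustrated with Python's backslashreplace error handler. This is an analogy only: the escape sequences Python produces are not claimed to match the exact format the escape op emits.

```python
def escape(s: str) -> str:
    """Analogy for escape: replace non-ASCII characters with backslash escapes."""
    # backslashreplace turns unencodable characters into \xNN or \uNNNN text.
    return s.encode("ascii", "backslashreplace").decode("ascii")


escaped = escape("café")
print(escaped)            # caf\xe9
print(escaped.isascii())  # True, the result contains only ASCII characters
```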

compose(out STR, in STR)

Compose (normalize) a string.
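The text does not say which normalization form is used. Assuming canonical composition (Unicode NFC), the effect would look like the following Python sketch, in which a base letter followed by a combining accent becomes a single precomposed character.

```python
import unicodedata


def compose(s: str) -> str:
    """Sketch of compose, assuming Unicode NFC (canonical composition)."""
    return unicodedata.normalize("NFC", s)


decomposed = "e\u0301"           # "e" followed by COMBINING ACUTE ACCENT
composed = compose(decomposed)
print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00e9")            # True, the precomposed "é"
```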

COPYRIGHT ^

Copyright (C) 2001-2007, The Perl Foundation.

LICENSE ^

This program is free software. It is subject to the same license as the Parrot interpreter itself.

