
# NAME

string.ops - String Operations

# DESCRIPTION

Operations that work on strings, whether constructing, modifying or examining them. See also rx.ops.

ord(out INT, in STR)

Two-argument form returns the ordinal value of the 0th character of string \$2 in register \$1. If \$2 is empty, throws an exception.

ord(out INT, in STR, in INT)

Three-argument form returns character \$3 of string \$2 in register \$1. If \$2 is empty, throws an exception. If \$3 is greater than the length of string \$2, throws an exception. If \$3 is less than zero but greater than the negative of the length, counts backwards through the string, such that -1 is the last character, -2 is the second-to-last character, and so on. If \$3 is less than the negative of the length, throws an exception.
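The offset rules above can be sketched in Python. This is a model of the described behaviour, not Parrot code: the name `ord_op` and the exception types are hypothetical, and two boundary cases the text leaves open are decided by assumption (an offset equal to the length is rejected, and an offset equal to the negative of the length selects the first character).

```python
def ord_op(s: str, index: int = 0) -> int:
    """Model of ord: ordinal value of the character at `index`."""
    if not s:
        raise ValueError("cannot take ord of an empty string")
    n = len(s)
    # Assumption: index == n is out of range, index == -n is the first character.
    if index >= n or index < -n:
        raise IndexError("character index out of range")
    if index < 0:
        index += n  # -1 is the last character, -2 the second-to-last, ...
    return ord(s[index])


first = ord_op("abc")       # two-argument form: 0th character
last = ord_op("abc", -1)    # negative offsets count from the end
```

Under these assumptions, `first` is the ordinal of "a" and `last` is the ordinal of "c".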

chr(out STR, in INT)

Returns the character specified by the number \$2 as a string in \$1.

chopn(inout STR, in INT)

Remove \$2 characters from the end of the string in \$1. If \$2 is negative, cut the string after -\$2 characters.

chopn(out STR, in STR, in INT)

Removes \$3 characters from the end of the string in \$2 and returns the result in \$1. If \$3 is negative, cut the string after -\$3 characters.
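To make the sign convention concrete, here is a small Python model of the three-argument form. It is a sketch only: the function name mirrors the op for readability, and the handling of a count larger than the string (returning the empty string) is an assumption that the text above does not state.

```python
def chopn(s: str, n: int) -> str:
    """Model of chopn: drop n characters from the end, or keep -n from the start."""
    if n >= 0:
        # Assumption: chopping more characters than exist yields "".
        return s[:max(len(s) - n, 0)]
    # Negative count: cut the string after -n characters.
    return s[:-n]


trimmed = chopn("abcdef", 2)    # drops "ef"
prefix = chopn("abcdef", -2)    # keeps only "ab"
```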

concat(inout STR, in STR)

Append the string in \$2 to the string in \$1.

concat(out STR, in STR, in STR)

Appends the string \$3 to \$2 and places the result into \$1.

repeat(out STR, in STR, in INT)

Repeats string \$2 \$3 times and stores result in \$1.

length(out INT, in STR)

Set \$1 to the length (in characters) of the string in \$2.

bytelength(out INT, in STR)

Set \$1 to the length (in bytes) of the string in \$2.
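The difference between the two counts shows up as soon as a string contains characters that need more than one byte. The Python sketch below illustrates the idea; it assumes UTF-8 as the encoding, whereas in Parrot the byte count depends on the encoding the string actually uses.

```python
def length(s: str) -> int:
    """Number of characters in the string."""
    return len(s)


def bytelength(s: str, encoding: str = "utf-8") -> int:
    """Number of bytes the string occupies in `encoding` (UTF-8 assumed here)."""
    return len(s.encode(encoding))


word = "na\u00efve"          # "naive" with a diaeresis on the i
chars = length(word)         # 5 characters
octets = bytelength(word)    # 6 bytes in UTF-8, since U+00EF takes two bytes
```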

pin(inout STR)

Make the memory in \$1 immobile. This memory will not be moved by the GC, and may be safely passed to external libraries (as long as they do not free it). Pinning a string will move its contents.

The memory only need be unpinned if you plan on using it for any length of time after its pinning is no longer necessary.

unpin(inout STR)

Make the memory in \$1 movable again. This will make the memory in \$1 move.

substr(out STR, in STR, in INT)

substr(out STR, in STR, in INT, in INT)

substr(out STR, inout STR, in INT, in INT, in STR)

substr(inout STR, in INT, in INT, in STR)

substr(out STR, in PMC, in INT, in INT)

Set \$1 to the portion of \$2 starting at (zero-based) character position \$3 and having length \$4. If no length (\$4) is provided, it is equivalent to passing in the length of \$2. This creates a copy-on-write (COW) copy of \$2.

Optionally pass in string \$5 for replacement. If the length of \$5 is different from the length specified in \$4, then \$2 will grow or shrink accordingly. If \$3 is one character position larger than the length of \$2, then \$5 is appended to \$2 (and the empty string is returned); this is essentially the same as

`  concat \$2, \$5`

Finally, if \$3 is negative, then it is taken to count backwards from the end of the string (i.e. an offset of -1 corresponds to the last character).

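The replace-only form, substr(inout STR, in INT, in INT, in STR), is optimized for replacement: it discards the replaced substring and so does not use a register to hold it. In this form the string, offset, length and replacement are \$1 to \$4.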
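The extraction and replacement rules for substr can be modelled in Python as follows. This is a sketch under stated assumptions, not the Parrot implementation: the function returns both the extracted piece and the resulting source string as a tuple, the phrase "one character position larger than the length" is read as an offset equal to the length, and negative lengths and out-of-range offsets are rejected because the text does not describe them.

```python
from typing import Optional, Tuple


def substr(s: str, offset: int, length: Optional[int] = None,
           replacement: Optional[str] = None) -> Tuple[str, str]:
    """Return (extracted substring, source string after any replacement)."""
    if offset < 0:
        offset += len(s)              # -1 corresponds to the last character
    if offset < 0 or offset > len(s):
        raise IndexError("offset outside string")   # assumption
    if length is None:
        length = len(s)               # same as passing the full length
    if length < 0:
        raise ValueError("negative length is not modelled")  # assumption
    end = min(offset + length, len(s))
    piece = s[offset:end]
    if replacement is None:
        return piece, s
    # The source grows or shrinks when the replacement length differs.
    return piece, s[:offset] + replacement + s[end:]


tail, _ = substr("hello world", 6)                    # no length: to the end
old, new = substr("hello world", -5, 5, "there")      # replace the last word
empty, appended = substr("abc", 3, 0, "def")          # offset at the end appends
```

In the last call the extracted piece is empty and the source becomes the concatenation of the two strings, matching the concat equivalence described above.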

substr_r(out STR, in STR, in INT, in INT)

Make \$1 refer to the given part of \$2, basically like above, but reusing the given destination string and without regard for later changes to the source string. Such changes include GC runs, which will move the referenced string. This also means that \$1 has to be reset before any GC may happen.

This opcode should really be used only to quickly refer to a substring of another string, e.g. for printing; it is a temporary hack.

Handle with care.

index(out INT, in STR, in STR)

index(out INT, in STR, in STR, in INT)

The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of \$3 in \$2 at or after \$4. If \$4 is omitted, starts searching from the beginning of the string. The return value is based at "0". If the substring is not found, returns "-1".
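A common use of the four-argument form is to walk through every occurrence of a substring by restarting the search just past the previous hit. The Python sketch below models that pattern with `str.find`, which has the same zero-based result and the same -1 "not found" value. The helper `all_positions` is hypothetical, and rejecting a negative start position is an assumption, since the text does not say how the op treats one.

```python
from typing import List


def index(haystack: str, needle: str, start: int = 0) -> int:
    """Position of the first occurrence of needle at or after start, else -1."""
    if start < 0:
        raise ValueError("negative start is not modelled")  # assumption
    return haystack.find(needle, start)


def all_positions(haystack: str, needle: str) -> List[int]:
    """Collect every match position by repeatedly searching past the last hit."""
    positions = []
    pos = index(haystack, needle)
    while pos != -1:
        positions.append(pos)
        pos = index(haystack, needle, pos + 1)
    return positions


hits = all_positions("banana", "an")   # matches at positions 1 and 3
```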

sprintf(out STR, in STR, in PMC)

sprintf(out PMC, in PMC, in PMC)

sprintf(out STR, in STR) [unimplemented]

sprintf(out PMC, in PMC) [unimplemented]

Sets \$1 to the result of calling `Parrot_psprintf` with the given format (\$2) and arguments (\$3, which should be an ordered aggregate PMC). In the (unimplemented) versions that don't include \$3, arguments are popped off the user stack.

The result is quite similar to using the system `sprintf`, but is protected against buffer overflows and the like. There are some differences, especially concerning sizes (which are largely ignored); see misc.c for details.

new(out STR)

new(out STR, in INT)

Allocate a new empty string, of length \$2 (optional), encoding \$3 (optional) and type \$4 (optional).

stringinfo(out INT, in STR, in INT)

Extract some information about string \$2 and store it in \$1. Possible values for \$3 are:

1. The location of the string buffer header.

2. The location of the start of the string.

3. The length of the string buffer (in bytes).

4. The flags attached to the string (if any).

5. The amount of the string buffer used (in bytes).

6. The length of the string (in characters).

upcase(out STR, in STR)

Uppercase \$2 and put the result in \$1.

upcase(inout STR)

Uppercase \$1 in place.

downcase(out STR, in STR)

Downcase \$2 and put the result in \$1.

downcase(inout STR)

Downcase \$1 in place.

titlecase(out STR, in STR)

Titlecase \$2 and put the result in \$1.

titlecase(inout STR)

Titlecase \$1 in place.

join(out STR, in STR, in PMC)

Create a new string \$1 by joining array elements from array \$3 with string \$2.

split(out PMC, in STR, in STR)

Create a new Array PMC \$1 by splitting the string \$3 into pieces delimited by the string \$2. If \$2 does not appear in \$3, then return \$3 as the sole element of the Array PMC. Returns empty strings for delimiters at the beginning and end of \$3.
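Both join and split take the delimiter as \$2 and the data as \$3, which is easy to get backwards. The Python sketch below keeps that argument order and shows the edge cases the text mentions. It is a model, not Parrot code: a plain list stands in for the Array PMC, and an empty delimiter is not covered by the description, so this sketch simply lets Python reject it.

```python
from typing import List


def join(delimiter: str, items: List[str]) -> str:
    """Model of join: delimiter first, then the array of pieces."""
    return delimiter.join(items)


def split(delimiter: str, s: str) -> List[str]:
    """Model of split: delimiter first, then the string to split."""
    # str.split raises ValueError for an empty delimiter; that case is not modelled.
    return s.split(delimiter)


pieces = split(",", ",a,b,")     # leading and trailing delimiters give empty strings
whole = split(",", "abc")        # delimiter absent: the string is the sole element
rebuilt = join(",", pieces)      # joining the pieces restores the original string
```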

charset(out INT, in STR)

Return the charset number of string \$2.

charsetname(out STR, in INT)

Return the name of charset numbered \$2.

find_charset(out INT, in STR)

Return the charset number of the charset named \$2. If the charset doesn't exist, throw an exception.

trans_charset(inout STR, in INT)

Change the string to have the specified charset.

trans_charset(out STR, in STR, in INT)

Create a string \$1 from \$2 with the specified charset.

Both functions may throw an exception on information loss.

encoding(out INT, in STR)

Return the encoding number of string \$2.

encodingname(out STR, in INT)

Return the name of encoding numbered \$2.

find_encoding(out INT, in STR)

Return the encoding number of the encoding named \$2. If the encoding doesn't exist, throw an exception.

trans_encoding(inout STR, in INT)

Change the string to have the specified encoding.

trans_encoding(out STR, in STR, in INT)

Create a string \$1 from \$2 with the specified encoding.

Both functions may throw an exception on information loss.

is_cclass(out INT, in INT, in STR, in INT)

Set \$1 to 1 if the codepoint of \$3 at position \$4 is in the character class(es) given by \$2.

find_cclass(out INT, in INT, in STR, in INT, in INT)

Set \$1 to the offset of the first codepoint matching the character class(es) given by \$2 in string \$3, starting at offset \$4 for up to \$5 codepoints. If no matching character is found, set \$1 to offset + count (\$4 + \$5).

find_not_cclass(out INT, in INT, in STR, in INT, in INT)

Set \$1 to the offset of the first codepoint not matching the character class(es) given by \$2 in string \$3, starting at offset \$4 for up to \$5 codepoints. If the substring consists entirely of matching characters, set \$1 to offset + count (\$4 + \$5).
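The two scanning ops are typically used as a pair: skip over one class of characters, then find where the next run ends. The Python sketch below models that. It makes one deliberate simplification, which is an assumption and not how the ops are specified: the character class is passed as a predicate function rather than as the integer class value \$2. The "not found" result follows the text literally and returns offset + count.

```python
from typing import Callable

CharClass = Callable[[str], bool]  # stand-in for the integer class value


def find_cclass(in_class: CharClass, s: str, offset: int, count: int) -> int:
    """Offset of the first character in the class, or offset + count if none."""
    for i in range(offset, min(offset + count, len(s))):
        if in_class(s[i]):
            return i
    return offset + count


def find_not_cclass(in_class: CharClass, s: str, offset: int, count: int) -> int:
    """Offset of the first character not in the class, or offset + count if none."""
    for i in range(offset, min(offset + count, len(s))):
        if not in_class(s[i]):
            return i
    return offset + count


text = "  foo42 bar"
start = find_not_cclass(str.isspace, text, 0, len(text))         # skip leading blanks
end = find_cclass(str.isspace, text, start, len(text) - start)   # end of first word
word = text[start:end]
```

Here `word` ends up as the first whitespace-delimited token of `text`.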

escape(out STR, invar STR)

Escape all non-ASCII characters as backslash escape sequences. The result is a new string with the ascii charset.
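As a rough analogy, Python's `backslashreplace` error handler performs the same kind of transformation: ASCII characters pass through unchanged and everything else becomes a backslash escape, so the result is pure ASCII. This is only an illustration of the idea. The exact escape syntax Parrot emits is not given in the text above and may differ from Python's.

```python
def escape(s: str) -> str:
    """Replace every non-ASCII character with a backslash escape (Python syntax)."""
    return s.encode("ascii", "backslashreplace").decode("ascii")


plain = escape("hello")           # ASCII input is returned unchanged
escaped = escape("caf\u00e9")     # the accented e becomes a backslash escape
ascii_only = escaped.isascii()    # the result contains only ASCII characters
```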

compose(out STR, in STR)

Compose (normalize) a string.