docs/pdds/pdd22_io.pod - Parrot I/O
Parrot's I/O subsystem.
$Revision$
A "stream" allows input or output operations on a source/destination such as a file, keyboard, or text console. Streams are also called "filehandles", though only some of them have anything to do with files.
Currently, the Parrot I/O subsystem uses a per-interpreter stack to provide a layer-based approach to I/O. Each layer implements a subset of the ParrotIOLayerAPI vtable. To find an I/O function, the layer stack is searched downwards until a non-NULL function pointer is found for that particular slot.
This implementation will be replaced with a composition model. Rather than living in a stack, the module fragments that make up the ParrotIO class will be composed and any conflicts resolved when the class is loaded. This strategy eliminates the need to search a stack on each I/O call, while still allowing a "layered" combination of functionality for different platforms.
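The difference between the two lookup strategies can be sketched in a few lines of Python. This is only an illustration: the slot names, the two layers, and the precedence rule used to resolve conflicts are assumptions made for the example, not Parrot identifiers or Parrot's actual conflict-resolution policy.

```python
# Hypothetical slot names and layers; none of these are Parrot identifiers.
SLOTS = ("open", "read", "write")

# Each layer fills in only some slots; None plays the role of a NULL pointer.
unix_layer = {
    "open": lambda path: f"unix-open({path})",
    "read": lambda n: f"unix-read({n})",
    "write": lambda data: f"unix-write({data})",
}
buffer_layer = {
    "open": None,
    "read": lambda n: f"buffered-read({n})",
    "write": None,
}

layer_stack = [buffer_layer, unix_layer]   # top of the stack first


def find_in_stack(stack, slot):
    """Current model: walk down the stack on every I/O call."""
    for layer in stack:
        func = layer.get(slot)
        if func is not None:
            return func
    raise LookupError(f"no layer implements {slot!r}")


def compose(stack):
    """Composition model: resolve every slot once, when the class is loaded.

    Conflicts are resolved here by stack precedence (an assumption).
    """
    return {slot: find_in_stack(stack, slot) for slot in SLOTS}


composed_io = compose(layer_stack)

# Both approaches select the same function; only the per-call cost differs.
stack_result = find_in_stack(layer_stack, "read")(4)
composed_result = composed_io["read"](4)
```

After `compose` has run, each call is a single table lookup instead of a walk over the layers.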
Currently, Parrot only implements synchronous I/O operations. For the 1.0 release the asynchronous operations will be implemented separately from the synchronous ones. There may be an implementation that uses one variant to implement the other someday, but it's not an immediate priority.
Synchronous opcodes are differentiated from asynchronous opcodes by the presence of a callback argument in the asynchronous calls. Asynchronous calls that don't supply callbacks (perhaps if the user wants to manually check later if the operation succeeded) are enough of a fringe case that they don't need opcodes. They can access the functionality via methods on ParrotIO objects.
The asynchronous I/O implementation will use the composition model to allow some platforms to take advantage of their built-in asynchronous operations, layered behind Parrot's asynchronous I/O interface.
Asynchronous operations use a lightweight concurrency model. At the user level, Parrot follows the callback function model of asynchronous I/O. At the interpreter level, each asynchronous operation registers a task with the interpreter's concurrency scheduler. The registered task could represent a simple Parrot asynchronous I/O operation, a platform-native asynchronous I/O call, or even synchronous code in a full Parrot thread (rare but possibly useful for prototyping new features, or for mock objects in testing).
Communication between the calling code and the asynchronous operation task is handled by a shared status object. The operation task updates the status object whenever the status changes, and the calling code can check the status object at any time. The status object contains a reference to the returned result of an asynchronous I/O call. In order to allow sharing of the status object, asynchronous ops both pass the status object to the callback PMC, and return it to the calling code.
The lightweight tasks typically used by the asynchronous I/O system capture no state other than the arguments passed to the I/O call, and share no variables with the calling code other than the status object.
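A minimal Python sketch of this arrangement follows. The `Status`, `Scheduler`, and `async_read` names, the "pending"/"done" states, and the in-memory data source are all assumptions chosen for illustration; the document does not define them.

```python
from collections import deque


class Status:
    """Shared between the calling code and the operation task."""

    def __init__(self):
        self.state = "pending"   # hypothetical states: "pending", "done"
        self.result = None       # reference to the operation's result


class Scheduler:
    """Stand-in for the interpreter's concurrency scheduler."""

    def __init__(self):
        self.tasks = deque()

    def register(self, task):
        self.tasks.append(task)

    def run(self):
        while self.tasks:
            self.tasks.popleft()()


def async_read(scheduler, data, nbytes, callback):
    """Register a read task and return the status object immediately."""
    status = Status()

    def task():
        # The task captures only the call's arguments and the status object.
        status.result = data[:nbytes]
        status.state = "done"
        callback(status)         # the callback receives the same status object

    scheduler.register(task)
    return status


scheduler = Scheduler()
seen = []
status = async_read(scheduler, b"hello world", 5, seen.append)
state_before_run = status.state   # nothing has run yet
scheduler.run()
```

The caller and the callback hold the same object, so either side can inspect the state and the result once the task has run.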
[See http://en.wikipedia.org/wiki/Asynchronous_I/O for a relatively comprehensive list of asynchronous I/O implementation options.]
Methods
[Over and over again throughout this section, I keep wanting an API that isn't possible with current low-level PMCs. This could mean that low-level PMCs need a good bit of work to gain the same argument passing capabilities as higher-level Parrot objects (which is true, long-term). It could mean that Parrot I/O objects would be better off defined in a higher-level syntax, with embedded C (via NCI, or a lighter-weight embedding mechanism) for those pieces that really are direct C access. Or, it could mean that I'll come back and rip this interface down to a bare minimum.]
$P0 = new ParrotIO
open opcode.]
$P0 = $P1.open()
$P0 = $P1.open($S2)
$P0 = $P1.open($S2, $S3)
open opcode: 'r' for read, 'w' for write, 'a' for append, and 'p' for pipe. When the mode is set to write or append, a file is created without warning if none exists. When the mode is read (without write), a nonexistent file is an error.
$P0 = $P1.close()
$P0 = $P1.close($P2)
close method returns a PMC status object.
$P0 = $P1.print($I2)
$P0 = $P1.print($N2)
$P0 = $P1.print($S2)
$P0 = $P1.print($P2)
$P0 = $P1.print($I2, $P3)
$P0 = $P1.print($N2, $P3)
$P0 = $P1.print($S2, $P3)
$P0 = $P1.print($P2, $P3)
$S0 = $P1.read($I2)
$P0 = $P1.read($I2, $P3)
utf8 or similar role to the object [the syntax for applying a role to an object has yet to be defined in PDD 15]. If there are fewer bytes remaining in the stream than specified in the read request, it returns the remaining bytes (with no error).
$S0 = $P1.readline()
$P0 = $P1.readline($P2)
readline flags the stream as operating in line-buffer mode (see the buffer_type method below). The readline operation respects the read mode of the I/O object the same as read does. Newlines are not removed from the end of the string.
$S0 = $P1.record_separator()
$P0.record_separator($S1)
$I0 = $P1.buffer_type()
$S0 = $P1.buffer_type()
$P0.buffer_type($I1)
$P0.buffer_type($S1)
0 PIO_NONBUF
Unbuffered I/O. Bytes are sent as soon as possible.
1 PIO_LINEBUF
Line buffered I/O. Bytes are sent when a record separator is
encountered.
2 PIO_FULLBUF
Fully buffered I/O. Bytes are sent when the buffer is full.
[Note, the constant was called "BLKBUF" because bytes are
sent as a block, but line buffering also sends them as a
block, so changed to "FULLBUF".]
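How the three buffer types differ can be shown with a small Python model. The `BufferedWriter` class, its `sink` list, and its default sizes are hypothetical; only the constant names and values come from the list above.

```python
PIO_NONBUF, PIO_LINEBUF, PIO_FULLBUF = 0, 1, 2   # values from the list above


class BufferedWriter:
    """Toy writer that records each block it would send to the stream."""

    def __init__(self, sink, buffer_type=PIO_FULLBUF, buffer_size=8,
                 record_separator="\n"):
        self.sink = sink                  # a list collecting the sent blocks
        self.buffer_type = buffer_type
        self.buffer_size = buffer_size
        self.record_separator = record_separator
        self.pending = ""

    def print(self, text):
        self.pending += text
        if self.buffer_type == PIO_NONBUF:
            self.flush()
        elif self.buffer_type == PIO_LINEBUF:
            # Send everything up to and including the last record separator.
            cut = self.pending.rfind(self.record_separator)
            if cut != -1:
                cut += len(self.record_separator)
                self.sink.append(self.pending[:cut])
                self.pending = self.pending[cut:]
        else:
            # Send full blocks; keep the remainder buffered.
            while len(self.pending) >= self.buffer_size:
                self.sink.append(self.pending[:self.buffer_size])
                self.pending = self.pending[self.buffer_size:]

    def flush(self):
        if self.pending:
            self.sink.append(self.pending)
            self.pending = ""


unbuffered, by_line, by_block = [], [], []

w = BufferedWriter(unbuffered, PIO_NONBUF)
w.print("a")
w.print("b")

line_writer = BufferedWriter(by_line, PIO_LINEBUF)
line_writer.print("ab")
line_writer.print("c\nde")

block_writer = BufferedWriter(by_block, PIO_FULLBUF, buffer_size=4)
block_writer.print("abcdef")
block_writer.print("gh")
```

In this model the line-buffered writer still holds "de" after the second call, because no record separator has followed it yet.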
$I0 = $P1.buffer_size()
$P0.buffer_size($I1)
$I0 = $P1.get_fd()
get_fd retrieves the Unix integer file descriptor of the object. This method doesn't exist on stream objects that aren't Unix filehandles, so check does for the appropriate role, or can for the method, before calling it.
$I0 = $P1
if $P0 goto ...
true for successful completion or while still running, false for an error.
$P0 = $P1.return()
$P0 = $P1.error()
$P0.throw()
[Implementation NOTE: this may either be the default Iterator object applied to a ParrotIO object, a separate Iterator object for I/O objects, or an Iterator role applied to I/O objects.]
new $P0, .Iterator, $P1
shift $S0, $P1
buffer_type setting: unbuffered, line-buffered, or fully-buffered.
unless $P0 goto iter_end
true if there is more data to pull from the I/O object, false if the iterator has reached the end of the data. [NOTE: this means that an iterator always checks for the next line/block of data when it retrieves the current one.]
The signatures for the asynchronous operations are nearly identical to the synchronous operations, but the asynchronous operations take an additional argument for a callback, and the only return value from the asynchronous operations is a status object. When the callbacks are invoked, they are passed the status object as their sole argument. Any return values from the operation are stored within the status object.
The listing below says little about whether the opcodes return error information. For now assume that they can either return a status object, or return nothing. Error handling is discussed more thoroughly below in "Error Handling".
$P0 = open $S1
$P0 = open $S1, $S2
$P0 = open $P1
$P0 = open $P1, $S2
close $P0
close $P0, $P1
These opcodes do not have asynchronous variants.
getstdin, getstdout, and getstderr return a stream object for standard input, standard output, and standard error, respectively.
fdopen converts an existing and already open UNIX integer file descriptor into a stream object. It also takes a string argument to specify the mode.
print $I0
print $N0
print $S0
print $P0
print $P0, $I1
print $P0, $N1
print $P0, $S1
print $P0, $P1
print $P0, $I1, $P2
print $P0, $N1, $P2
print $P0, $S1, $P2
print $P0, $P1, $P2
printerr $I0
printerr $N0
printerr $S0
printerr $P0
printerr. [It's just a shortcut. If they want an asynchronous version, they can use print.]
$S0 = read $I1
$S0 = read $P1, $I2
$P0 = read $P1, $I2, $P3
$S0 = readline $P1
$P0 = readline $P1, $P2
readline flags the stream as operating in line-buffer mode (see pioctl below).
$S0 = peek
$S0 = peek $P1
peek retrieves the next byte from a stream into a string, but doesn't remove it from the stream. By default it reads from standard input, but it also takes a stream object argument for an alternate source.
peek. [Does anyone have a line of reasoning why one might be needed? The concept of "next byte" seems to be a synchronous one.]
seek $P0, $I1, $I2
seek $P0, $I1, $I2, $I3
seek $P0, $I1, $I2, $P3
seek $P0, $I1, $I2, $I3, $P4
$I0 = tell $P1
($I0, $I1) = tell $P2
$I0 = poll $P1, $I2, $I3, $I4
poll.]
poll to see the constants for event types and return status.
write prints to standard output but it cannot select another stream. It only accepts a PMC value to write. This is redundant with the print opcode, so it will be deprecated.
getfd retrieves the UNIX integer file descriptor of a stream object. The opcode has been replaced by a 'get_fd' method on the ParrotIO object.
pioctl provides low-level access to the attributes of a stream object. It takes a stream object, an integer flag to select a command, and a single integer argument for the command. It returns an integer indicating the success or failure of the command.
This opcode has been replaced with methods on the ParrotIO object, but is kept here for reference.
The following constants are defined for the commands that pioctl can execute:
0 PIOCTL_CMDRESERVED
No documentation available.
1 PIOCTL_CMDSETRECSEP
Set the record separator. [This doesn't actually work at the
moment.]
2 PIOCTL_CMDGETRECSEP
Get the record separator.
3 PIOCTL_CMDSETBUFTYPE
Set the buffer type.
4 PIOCTL_CMDGETBUFTYPE
Get the buffer type.
5 PIOCTL_CMDSETBUFSIZE
Set the buffer size.
6 PIOCTL_CMDGETBUFSIZE
Get the buffer size.
The following constants are defined as argument/return values for the buffer-type commands:
0 PIOCTL_NONBUF
Unbuffered I/O. Bytes are sent as soon as possible.
1 PIOCTL_LINEBUF
Line buffered I/O. Bytes are sent when a newline is
encountered.
2 PIOCTL_BLKBUF
Fully buffered I/O. Bytes are sent when the buffer is full.
[Okay, I'm seriously considering moving most of these to methods on the ParrotIO object. More than that, moving them into a role that is composed into the ParrotIO object when needed. For the ones that have the form 'opcodename parrotIOobject, arguments', I can't see that it's much less effort than 'parrotIOobject.methodname(arguments)' for either manually writing PIR or generating PIR. The slowest thing about I/O is I/O, so I can't see that we're getting much speed gain out of making them opcodes. The ones to keep as opcodes are 'unlink', 'rmdir', and 'opendir'.]
stat retrieves information about a file on the filesystem. It takes a string filename or an integer argument of a UNIX file descriptor [or an already opened stream object?], and an integer flag for the type of information requested. It returns an integer containing the requested information. The following constants are defined for the type of information requested (see runtime/parrot/include/stat.pasm):
0 STAT_EXISTS
Whether the file exists.
1 STAT_FILESIZE
The size of the file.
2 STAT_ISDIR
Whether the file is a directory.
3 STAT_ISDEV
Whether the file is a device such as a terminal or a disk.
4 STAT_CREATETIME
The time the file was created.
(Currently just returns -1.)
5 STAT_ACCESSTIME
The last time the file was accessed.
6 STAT_MODIFYTIME
The last time the file data was changed.
7 STAT_CHANGETIME
The last time the file metadata was changed.
8 STAT_BACKUPTIME
The last time the file was backed up.
(Currently just returns -1.)
9 STAT_UID
The user ID of the file.
10 STAT_GID
The group ID of the file.
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the stat operation is complete, it invokes the callback, passing it a status object and an integer containing the status information.
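As a rough illustration of the synchronous form, the flags above can be mapped onto the fields of a POSIX-style stat result. The `stat_info` helper below is hypothetical and written in Python rather than PIR; it uses only the constant values listed above and the standard `os.stat` call, which accepts either a path or an integer file descriptor. The meaning of `st_ctime` differs between platforms, so the STAT_CHANGETIME mapping is an assumption that holds on Unix.

```python
import os
import stat as stat_mode

# Constant values as listed above.
STAT_EXISTS, STAT_FILESIZE, STAT_ISDIR, STAT_ISDEV = 0, 1, 2, 3
STAT_CREATETIME, STAT_ACCESSTIME, STAT_MODIFYTIME = 4, 5, 6
STAT_CHANGETIME, STAT_BACKUPTIME, STAT_UID, STAT_GID = 7, 8, 9, 10


def stat_info(path_or_fd, flag):
    """Return one integer of file information (hypothetical helper)."""
    try:
        info = os.stat(path_or_fd)
    except FileNotFoundError:
        if flag == STAT_EXISTS:
            return 0
        raise
    if flag in (STAT_CREATETIME, STAT_BACKUPTIME):
        return -1                     # the text says these just return -1
    fields = {
        STAT_EXISTS: 1,
        STAT_FILESIZE: info.st_size,
        STAT_ISDIR: int(stat_mode.S_ISDIR(info.st_mode)),
        STAT_ISDEV: int(stat_mode.S_ISCHR(info.st_mode)
                        or stat_mode.S_ISBLK(info.st_mode)),
        STAT_ACCESSTIME: int(info.st_atime),
        STAT_MODIFYTIME: int(info.st_mtime),
        STAT_CHANGETIME: int(info.st_ctime),
        STAT_UID: info.st_uid,
        STAT_GID: info.st_gid,
    }
    return fields[flag]
```

For example, `stat_info("/tmp", STAT_ISDIR)` would return 1 on a system where `/tmp` is a directory.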
unlink deletes a file from the filesystem. It takes a single string argument of a filename (including the path).
The asynchronous version takes an additional final PMC callback argument. When the unlink operation is complete, it invokes the callback, passing it a status object.
rmdir deletes a directory from the filesystem if that directory is empty. It takes a single string argument of a directory name (including the path).
The asynchronous version takes an additional final PMC callback argument. When the rmdir operation is complete, it invokes the callback, passing it a status object.
opendir opens a stream object for a directory. It takes a single string argument of a directory name (including the path) and returns a stream object.
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the opendir operation is complete, it invokes the callback, passing it a status object and a newly created stream object.
readdir reads a single item from an open directory stream object. It takes a single stream object argument and returns a string containing the path and filename/directory name of the current item. (i.e. the directory stream object acts as an iterator.)
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the readdir operation is complete, it invokes the callback, passing it a status object and the string result.
telldir returns the current position of readdir operations on a directory stream object.
No asynchronous version.
seekdir sets the current position of readdir operations on a directory stream object. It takes a stream object argument and an integer for the position. [The system seekdir requires that the position argument be the result of a previous telldir operation.]
The asynchronous version takes an additional final PMC callback argument. When the seekdir operation is complete, it invokes the callback, passing it a status object and the directory stream object it was called on.
rewinddir sets the current position of readdir operations on a directory stream object back to the beginning of the directory. It takes a stream object argument.
No asynchronous version.
closedir closes a directory stream object. It takes a single stream object argument.
The asynchronous version takes an additional final PMC callback argument. When the closedir operation is complete, it invokes the callback, passing it a status object.
Most of these opcodes conform to the standard UNIX interface, but the layer API allows alternate implementations for each.
[These I'm also considering moving to methods in a role for the ParrotIO object. Keep 'socket' as an opcode, or maybe just make 'socket' an option on creating a new ParrotIO object.]
socket returns a new socket object from a given address family, socket type, and protocol number (all integers). The socket object's boolean value can be tested for whether the socket was created.
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the socket operation is complete, it invokes the callback, passing it a status object and a new socket object.
sockaddr returns an object representing a socket address, generated from a port number (integer) and an address (string).
No asynchronous version.
connect connects a socket object to an address.
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the socket operation is complete, it invokes the callback, passing it a status object and the socket object it was called on. [If you want notification when a connect operation is completed, you probably want to do something with that connected socket object.]
recv receives a message from a connected socket object. It returns the message in a string.
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the recv operation is complete, it invokes the callback, passing it a status object and a string containing the received message.
send sends a message string to a connected socket object.
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the send operation is complete, it invokes the callback, passing it a status object.
sendto sends a message string to an address specified in an address object (first connecting to the address).
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the sendto operation is complete, it invokes the callback, passing it a status object.
bind binds a socket object to the port and address specified by an address object (the packed result of sockaddr).
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the bind operation is complete, it invokes the callback, passing it a status object and the socket object it was called on. [If you want notification when a bind operation is completed, you probably want to do something with that bound socket object.]
listen specifies that a socket object is willing to accept incoming connections. The integer argument gives the maximum size of the queue for pending connections.
There is no asynchronous version. listen marks a set of attributes on the socket object.
accept accepts a new connection on a given socket object, and returns a newly created socket object for the connection.
The asynchronous version takes an additional final PMC callback argument, and only returns a status object. When the accept operation receives a new connection, it invokes the callback, passing it a status object and a newly created socket object for the connection. [While the synchronous accept has to be called repeatedly in a loop (once for each connection received), the asynchronous version is only called once, but continues to send new connection events until the socket is closed.]
shutdown closes a socket object for reading, for writing, or for all I/O. It takes a socket object argument and an integer argument for the type of shutdown:
0 PIOSHUTDOWN_READ
Close the socket object for reading.
1 PIOSHUTDOWN_WRITE
Close the socket object for writing.
2 PIOSHUTDOWN
Close the socket object.
Currently some of the networking opcodes (connect, recv, send, poll, bind, and listen) return an integer indicating the status of the call, -1 or a system error code if unsuccessful. Other I/O opcodes (such as getfd and accept) have various different strategies for error notification, and others have no way of marking errors at all. We want to unify all I/O opcodes so they use a consistent strategy for error notification.
Synchronous I/O operations return an integer status code indicating success or failure in addition to their ordinary return value(s). This approach has the advantage of being lightweight: returning a single additional integer is cheap.
[Discuss: should synchronous operations take the same error handling strategy as asynchronous ones?]
Asynchronous I/O operations return a status object. The status object contains an integer status code, string status/error message, and boolean success value.
An error callback may be set on a status object, though it isn't required. This callback will be invoked if the asynchronous operation terminates in an error condition. The error callback takes one argument, which is the status object containing all information about the failed call. If no error callback is set, then the standard callback will be invoked, and the user will need to check for error conditions in the status object as the first operation of the handler code.
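The dispatch rule described here can be modelled briefly in Python. The `AsyncStatus` class, its attribute names, and the convention that a status code of 0 means success are assumptions made for the sketch; the text only specifies that the status object carries an integer code, a string message, and a boolean success value.

```python
class AsyncStatus:
    """Status object for one asynchronous operation (hypothetical model)."""

    def __init__(self, callback):
        self.callback = callback
        self.error_callback = None    # optional, set by the calling code
        self.code = 0                 # integer status code
        self.message = ""             # string status/error message
        self.success = True           # boolean success value

    def complete(self, code=0, message=""):
        """Called by the operation task when it finishes."""
        self.code, self.message = code, message
        self.success = (code == 0)    # assumption: 0 means success
        if not self.success and self.error_callback is not None:
            self.error_callback(self)
        else:
            # No error callback set: the standard callback must check
            # the status object for errors itself.
            self.callback(self)


log = []


def standard(status):
    log.append(("callback", status.success))


ok = AsyncStatus(standard)
ok.complete()

failed = AsyncStatus(standard)
failed.error_callback = lambda status: log.append(("error", status.message))
failed.complete(2, "no such file")

unhandled = AsyncStatus(standard)
unhandled.complete(13, "permission denied")
```

The third case shows why a handler without an error callback needs to test the status object before using any result.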
At some point in the future, I/O objects may also provide a way to throw exceptions on error conditions. This feature will be enabled by calling a method on the I/O object to set an internal flag. The exception throwing will be implemented as a method call on the status object.
Note that exception handlers for asynchronous I/O operations will likely have to be set at a global scope because execution will have left the dynamic scope of the I/O call by the time the error occurs.
The transition from IPv4 to IPv6 is in progress, though not likely to be complete anytime soon. Most operating systems today offer at least dual-stack IPv6 implementations, so they can use either IPv4 or IPv6, depending on what's available. Parrot also needs to support either protocol. For the most part, the network I/O opcodes should internally handle either addressing scheme, without requiring the user to specify which scheme is being used.
IETF recommends defaulting to IPv6 connections and falling back to IPv4 connections when IPv6 fails. This would give us more solid testing of Parrot's compatibility with IPv6, but may be too slow. Either way, it's a good idea to make setting the default (or selecting one exclusively) an option when compiling Parrot.
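The "IPv6 first, then IPv4" strategy might look like the following Python sketch, which uses the standard `socket.getaddrinfo` call. The helper names `order_ipv6_first` and `connect_with_fallback` are hypothetical, and the sketch makes no claim about how Parrot would implement the fallback internally.

```python
import socket


def order_ipv6_first(addrinfos):
    """Stable sort: AF_INET6 entries first, original order otherwise."""
    return sorted(addrinfos, key=lambda info: info[0] != socket.AF_INET6)


def connect_with_fallback(host, port):
    """Try each candidate address in turn, IPv6 before IPv4."""
    candidates = order_ipv6_first(
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM))
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as err:        # e.g. the family is not supported
            last_error = err
            continue
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as err:        # fall through to the next address
            last_error = err
            sock.close()
    raise last_error or OSError("no addresses found")
```

Each failed attempt has to time out or be refused before the next address is tried, which is where the cost mentioned above comes from.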
The most important issues for Parrot to consider with IPv6 are:
20a:95ff:fef5:7e5e.
[20a:95ff:fef5:7e5e]:80 and [20a:95ff::]/64.
sockaddr opcode, should be passed around as an object (or at least a structure) rather than as a string.
See the relevant IETF RFCs: "Application Aspects of IPv6 Transition" (http://www.ietf.org/rfc/rfc4038.txt) and "Basic Socket Interface Extensions for IPv6" (http://www.ietf.org/rfc/rfc3493.txt).
None.
None.
src/io/io.c
src/ops/io.ops
include/parrot/io.h
runtime/parrot/library/Stream/*
src/io/io_unix.c
src/io/io_win32.c
Perl 5's IO::AIO
Perl 5's POE