PDD 10: Embedding

Abstract

Parrot, or more precisely libparrot, can be embedded into applications to provide a dynamic language runtime. The Parrot executable itself is an example of this kind of embedding: it is a thin wrapper around libparrot.

Version 1

Description

Difference Between Embedding and Extending

Embedding and Extending (PDD 11) are similar concepts. In both, we write code that interfaces with libparrot. In an embedding situation, we write an application which loads and calls libparrot. In an extending situation, libparrot loads and calls our module.

Extending gives libparrot more features, and allows your code to execute from inside libparrot. From that location, the extending application has full access to the available power and features of libparrot. This includes knowledge about internal structure definitions, and internal-only functions and subsystems. Because extending code is so closely tied to the internals of libparrot, it will be more affected by changes in libparrot itself. Also, the stability of extending code is tied to the stability of libparrot: If either crashes, the other will likely crash with it.

Embedding, on the other hand, has much more limited access to libparrot. All embedding applications must use the official embedding API, which is limited and abstracted by design. Embedding applications must treat all pointers and structures returned from the API as being opaque. This abstraction buys stability. Changes to the internals of libparrot are unlikely to cause changes in embedding code. If libparrot crashes or suffers an unrecoverable error, it can return control to the embedding application more gracefully.
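The opacity requirement can be pictured with the usual C opaque-handle idiom. The following is a minimal sketch and is not libparrot code: the names vm_handle, vm_create, vm_eval_count, and vm_destroy are hypothetical. In a real project the struct definition would live in a private source file of the library, and the embedder would see only the forward declaration.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What the embedder sees (public header, hypothetical) ---- */
typedef struct vm_state *vm_handle;   /* incomplete type: layout is hidden */

vm_handle vm_create(void);
int       vm_eval_count(vm_handle vm);
void      vm_destroy(vm_handle vm);

/* ---- What only the library sees (private source, hypothetical) ---- */
struct vm_state {
    int evals;   /* can be changed freely without breaking embedders */
};

vm_handle vm_create(void)
{
    vm_handle vm = calloc(1, sizeof *vm);
    return vm;   /* NULL on allocation failure */
}

int vm_eval_count(vm_handle vm)
{
    assert(vm != NULL);
    return vm->evals;
}

void vm_destroy(vm_handle vm)
{
    free(vm);
}
```

Because the embedder never dereferences the handle, the fields of the struct can be added, removed, or reordered without requiring changes in embedding code.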

The Embedding API

The Embedding API is a special set of functions found in the src/embed/ directory. These functions may not be used internally by libparrot, and embedding applications may not use any other functions. Breaking either of these rules can have serious implications for application stability.

Prior to the implementation of the new API, when libparrot had an unhandled exception it would call the C exit() library function to close the application. This is undesirable because embedding applications want the ability to handle errors and recover from problems in libparrot. The new API provides error handling capabilities for cases of unhandled exceptions, including both expected EXCEPT_exit and other types of error-related exceptions.
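One common way for a C library to return control to its caller instead of calling exit() is to record a recovery point at the API boundary and jump back to it when an unrecoverable error occurs. The sketch below illustrates that general technique with setjmp and longjmp. It is an assumption made for illustration and is not taken from the libparrot source; the names api_call, api_last_error, internal_panic, STATUS_OK, and STATUS_ERROR are all hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { STATUS_ERROR = 0, STATUS_OK = 1 };

static jmp_buf     recovery_point;   /* set at the API boundary */
static const char *last_error;       /* message kept for the embedder */

/* Internal code calls this instead of exit() on an unhandled error. */
static void internal_panic(const char *msg)
{
    assert(msg != NULL);
    last_error = msg;
    longjmp(recovery_point, 1);
}

/* Stand-in for work done deep inside the library. */
static void internal_work(int should_fail)
{
    if (should_fail)
        internal_panic("unhandled exception");
}

/* API boundary: always returns to the caller with a status code. */
int api_call(int should_fail)
{
    last_error = NULL;
    if (setjmp(recovery_point) != 0)
        return STATUS_ERROR;         /* reached through longjmp */
    internal_work(should_fail);
    return STATUS_OK;
}

/* Lets the embedder inspect what went wrong after a failed call. */
const char *api_last_error(void)
{
    return last_error;
}
```

With this shape, the embedding application decides what to do after a failure: report it, retry, or shut down cleanly.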

The embedding API also makes sure certain details are in place, including stack markers for the GC. Calling into libparrot without setting a valid stack marker could cause serious (and difficult to diagnose) errors.

The embedding API provides relatively limited interaction with libparrot, at least from the point of view of an internals developer or an extension developer. There are many reasons for this. First and foremost, the full power of libparrot is almost always available through the runcore. If you want to do something with Parrot, it is almost always easier and preferred to write your code in a language which targets Parrot, compile it down to bytecode, and load that bytecode into Parrot to execute. Almost all applications of libparrot will involve bytecode execution at some level, and this is where most operations become possible.

The API also provides a powerful abstraction layer between the libparrot internals developers and the embedding application developers. The API is abstracted and detached enough that even large changes to the internals of libparrot are unlikely to require any changes in the embedding application. For instance, libparrot could completely change its object model implementation without causing any change to the API.

While limited, the API is not static. If embedders need new features or functionality, those can usually be added with relative ease.

Using the Embedding API

The embedding API follows certain guidelines that should be understood by users, and followed by developers:

Implementation

The embedding API has two goals: To allow access to libparrot as a dynamic language runtime and bytecode interpreter, and to encapsulate implementation details internal to libparrot from the embedding application.

There are several guidelines for the embedding API implementation that developers of it should follow:

Working with Interpreters

It is the external code's duty to create, manage, and destroy interpreters.

Parrot_api_make_interpreter returns an opaque pointer to a new interpreter, with some options set in it. The definition of Parrot_api_make_interpreter is as follows:

    Parrot_Int
    Parrot_api_make_interpreter(Parrot_PMC parent, Parrot_Int flags,
            Parrot_Init_Args *args, Parrot_PMC * interp);

A common usage pattern for making an interpreter is:

    Parrot_PMC interp = NULL;
    Parrot_Init_Args *args = NULL;
    GET_INIT_ARGS(args);
    if (!Parrot_api_make_interpreter(NULL, 0, args, &interp)) {
        fprintf(stderr, "Could not create interpreter\n");
        exit(EXIT_FAILURE);
    }

parent can be NULL for the first interpreter created, or where the interpreter does not have a logical parent. If a parent is provided, the new interpreter will have a child/parent relationship with the parent interp.

The flags parameter contains a bit-wise combination of certain startup flags that govern interpreter creation. It is safe to set this to 0 unless special needs require it to be otherwise.

The args parameter is a structure containing a series of options that are set on the interpreter during initialization. These options, many of which deal with the memory subsystem and other deep internals, can typically be ignored. args can be NULL if no special options need to be set.

The new interpreter PMC is returned in the last parameter.

Parrot_api_destroy_interpreter(interp) destroys an interpreter and frees its resources.

    Parrot_Int Parrot_api_destroy_interpreter(Parrot_PMC interp);

It is a good idea to destroy child interpreters before destroying their parents.
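The lifecycle described above, including destroying a child before its parent, can be sketched as follows. The typedefs and the two function bodies are stand-ins so that the example compiles without Parrot installed; a real embedder would include the Parrot embedding header and link against libparrot instead. The stub's refusal to destroy a parent that still has children is a stub-only check, and real libparrot behavior may differ. The helper run_parent_and_child is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- Stand-ins (assumptions, not the real libparrot definitions) ---- */
typedef long Parrot_Int;
typedef struct stub_interp {
    struct stub_interp *parent;
    int                 children;
} *Parrot_PMC;
typedef struct Parrot_Init_Args Parrot_Init_Args;

static Parrot_Int
Parrot_api_make_interpreter(Parrot_PMC parent, Parrot_Int flags,
        Parrot_Init_Args *args, Parrot_PMC *interp)
{
    Parrot_PMC p = calloc(1, sizeof *p);
    (void)flags;
    (void)args;
    if (p == NULL)
        return 0;
    p->parent = parent;
    if (parent != NULL)
        parent->children++;
    *interp = p;
    return 1;
}

static Parrot_Int
Parrot_api_destroy_interpreter(Parrot_PMC interp)
{
    /* Stub-only policy: a parent with live children is not destroyed. */
    if (interp == NULL || interp->children != 0)
        return 0;
    if (interp->parent != NULL)
        interp->parent->children--;
    free(interp);
    return 1;
}

/* ---- Embedder code: create parent and child, destroy child first ---- */
int run_parent_and_child(void)
{
    Parrot_PMC parent = NULL;
    Parrot_PMC child  = NULL;
    int child_ok = 0;

    if (!Parrot_api_make_interpreter(NULL, 0, NULL, &parent))
        return 0;
    if (Parrot_api_make_interpreter(parent, 0, NULL, &child)) {
        /* ... use both interpreters here ... */
        child_ok = Parrot_api_destroy_interpreter(child) != 0;
    }
    /* The parent is destroyed last, after its child is gone. */
    return (Parrot_api_destroy_interpreter(parent) != 0) && child_ok;
}
```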

Working with Source Code and PBC Files

libparrot natively executes .pbc bytecode files. These are manipulated in Parrot through a PMC interface. PBC PMCs can be obtained in a number of ways: they can be returned from a compiler, they can be loaded from a .pbc file, or they can be constructed on the fly.

Note: There are PackFile PMCs which represent bytecode, but the PBC object returned or consumed by any individual compiler or API function may have a different type. Treat these objects as opaque until a good interface for them is worked out.

Once a PBC PMC is obtained, several things can be done with it: It can be loaded into libparrot as a library and individual calls can be made into it. It can also be executed directly as an application, which will trigger the :main function, if any. The PMC can also be written out to a .pbc file for later use.

Currently there are two functions to get a bytecode PMC.

    Parrot_api_load_bytecode(interp, filename, *pbc)
    Parrot_api_load_bytecode_bytes(interp, bytecode, size, *pbc)

The first function loads bytecode from a file. The second loads bytecode from an in-memory byte array. Both return a bytecode PMC through their last parameter. That PMC can be passed as an argument to any of the following functions:

    Parrot_api_ready_bytecode(interp, pbc, *main_sub)
    Parrot_api_run_bytecode(interp, pbc, args)
    Parrot_api_disassemble_bytecode(interp, pbc, outfilename, opts)

Parrot_api_ready_bytecode loads the bytecode into memory and returns a reference to the :main Sub PMC, if any. It does not automatically execute the bytecode. Parrot_api_run_bytecode loads bytecode and automatically executes :main with the given arguments, and sets these arguments in the IGLOBALS array for later access. Parrot_api_disassemble_bytecode is used primarily by the pbc_disassemble frontend.
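A typical load-then-run sequence might look like the sketch below. The source gives only abbreviated argument lists for these functions, so the parameter types and the status-code return convention used here are assumptions modeled on Parrot_api_make_interpreter. The typedefs and function bodies are stubs so that the example compiles without Parrot installed, and the helper load_and_run is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins (assumptions, not the real libparrot definitions) ---- */
typedef long Parrot_Int;
typedef struct stub_pmc { int loaded; } *Parrot_PMC;

static struct stub_pmc stub_bytecode = { 1 };

/* Stub: "loads" any non-empty filename and hands back a bytecode PMC. */
static Parrot_Int
Parrot_api_load_bytecode(Parrot_PMC interp, const char *filename,
        Parrot_PMC *pbc)
{
    (void)interp;
    if (filename == NULL || filename[0] == '\0')
        return 0;
    *pbc = &stub_bytecode;
    return 1;
}

/* Stub: "runs" :main if the bytecode handle is valid. */
static Parrot_Int
Parrot_api_run_bytecode(Parrot_PMC interp, Parrot_PMC pbc, Parrot_PMC args)
{
    (void)interp;
    (void)args;
    return (pbc != NULL && pbc->loaded) ? 1 : 0;
}

/* ---- Embedder code: load a file, then execute its :main ---- */
int load_and_run(Parrot_PMC interp, const char *filename, Parrot_PMC args)
{
    Parrot_PMC pbc = NULL;

    assert(filename != NULL);
    if (!Parrot_api_load_bytecode(interp, filename, &pbc))
        return 0;   /* loading failed; nothing to run */
    return Parrot_api_run_bytecode(interp, pbc, args) ? 1 : 0;
}
```

An embedder that wants to call :main itself, or to call other subs in the file, would use Parrot_api_ready_bytecode in place of Parrot_api_run_bytecode and keep the returned Sub PMC.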

Settings and Configuration

The interpreter is configured in several ways. When the interpreter is created, the Parrot_Init_Args structure can optionally be used to change some of the low-level internal options of the interpreter. Thereafter, we can set configuration using the configuration hash or a series of API calls.

The configuration hash is a hash of named settings that can be set in the interpreter. Its primary purpose is to supply information such as standard search paths. The configuration hash is also available from the interpreter's IGLOBALS array, and may be used by various tools and utilities to inform certain decisions. To set a configuration hash, call

    Parrot_api_set_configuration_hash

This function is intended to be called only once per interpreter. It is possible to set a new configuration hash at any time, but settings from the old hash will not be removed first. Currently, the configuration hash is set as a global, and any interpreter created after a configuration hash is set will automatically inherit the most recently set configuration hash. Setting a new configuration hash on a child interpreter does not affect any existing interpreters.

In addition to the configuration hash, search paths can be extended with the following functions:

    Parrot_api_add_library_search_path
    Parrot_api_add_include_search_path
    Parrot_api_add_dynext_search_path

There is currently no way to remove search paths once set, or to examine the complete list of search paths. This may be added later.

Strings and PMCs

Embedding API functions which perform general operations on any PMC are named Parrot_api_pmc_* and are located in the file src/embed/pmc.c. Functions which perform general operations on a Parrot String are named Parrot_api_string_* and are located in the file src/embed/strings.c.

PMC functions are general and explicitly limited. The API does not, and is not intended to, provide complete access to the entire suite of internal operations and VTABLEs. The API does not provide convenience methods to do all operations in a single call. Some common operations may take multiple API calls to perform.

String operations are broken down into two types. The first is the set of functions used to marshal between C-level strings and Parrot Strings. Import functions take a C string (char* or wchar_t*) and copy its contents into a new String buffer. Export functions perform the opposite operation: the internal buffer is copied to a freshly allocated memory block and returned. When you are done with the exported string, there is an associated free function to deallocate that memory.
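The copy-in, copy-out, and free pattern for string marshaling can be sketched as follows. This is an illustration of the ownership rules only: the names vm_string, vm_string_import, vm_string_export, vm_string_free_exported, and vm_string_destroy are hypothetical and do not correspond to actual Parrot_api_string_* functions. In the real API the string type is opaque; it is a visible struct here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime string; opaque to embedders in a real API. */
typedef struct vm_string {
    size_t len;
    char  *buf;
} vm_string;

/* Import: copy the C string into a new runtime-owned buffer. */
vm_string *vm_string_import(const char *cstr)
{
    vm_string *s;

    assert(cstr != NULL);
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->len = strlen(cstr);
    s->buf = malloc(s->len + 1);
    if (s->buf == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->buf, cstr, s->len + 1);   /* includes the trailing NUL */
    return s;
}

/* Export: copy the internal buffer into a fresh block owned by the caller. */
char *vm_string_export(const vm_string *s)
{
    char *out;

    assert(s != NULL);
    out = malloc(s->len + 1);
    if (out != NULL)
        memcpy(out, s->buf, s->len + 1);
    return out;
}

/* Free function paired with vm_string_export. */
void vm_string_free_exported(char *cstr)
{
    free(cstr);
}

/* Sketch-only cleanup; a runtime with a GC would reclaim this itself. */
void vm_string_destroy(vm_string *s)
{
    if (s != NULL) {
        free(s->buf);
        free(s);
    }
}
```

Because both directions copy, the embedder never holds a pointer into memory that the runtime may move or collect, and the runtime never depends on the lifetime of the caller's buffer.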

The second type consists of functions that perform operations on strings, such as string analysis or string manipulation.

References