[DRAFT] PDD 31: Inter-Language Calling

Abstract

This PDD describes Parrot's conventions and support for communication between high-level languages (HLLs). It is focused mostly on what implementors should do in order to provide this capability to their users.

Version

$Revision$

Description

The ability to mix different high-level languages at runtime has always been an important design goal of Parrot. Another important goal, that of supporting all dynamic languages, makes language interoperability especially interesting -- where "interesting" means the same as it does in the Chinese curse, "May you live in interesting times." It is expected that language implementers, package authors, and package users will have to be aware of language boundaries when writing their code. It is hoped that this will not become too burdensome.

None of what follows is binding on language implementors, who may do whatever they please. Nevertheless, we hope they will at least follow the spirit of this document so that the code they produce can be used by the rest of the Parrot community, and save the fancy footwork for intra-language calling. However, this PDD is binding on Parrot implementors, who must provide a stable platform for language interoperability to the language implementors.

Ground rules

In order to avoid N**2 complexity and the resulting coordination headaches, each language compiler provides an interface, as a target for other languages, that should be designed to require a minimum of translation. In the general case, some translation may be required by both the calling language and the called language:

{{ There seems to be an implied basic assumption here that language interoperability is the responsibility of the language implementor. It is not. We cannot require that language implementors design and implement their languages according to some global specification. Any interoperability infrastructure must be provided by Parrot, and must work for all languages. --allison }}

        |
        |
        |                        Calling sub
        |                             |
        |   Language X                |
        |                             V
        |                        Calling stub
        +================             |
                                      |
          "plain Parrot"              |
                                      |
        +================             |
        |                              V
        |                        Called wrapper
        |                             |
        |                             |
        |   Language Y                V
        |                         Called sub
        |

Where necessary, a language may provide a "wrapper" sub that adapts external calls to the language's internal calling conventions and data representation. Such wrappers are free to do whatever translation is required.

Similarly, the caller may need to emit a stub that converts an internal call into a more generic, "plain Parrot" call.

{{ Of course, "stub" is really too close to "sub", so we should find a better word. Doesn't the C community call these "bounce routines"? Or something? -- rgr, 31-Jul-08.

The language will never provide a wrapper for its subs. For the most part, wrappers will be unnecessary. Where a wrapper is desired to make a library from some other language act more like a "native" library, the person who desires the native behavior can implement the wrapper and make it publicly available. --allison }}

{{ I am discovering that there are five different viewpoints here, corresponding to the five layers (including "plain Parrot") of the diagram above. I need to make these viewpoints clearer, and describe the responsibilities of each of these parties to each other. -- rgr, 31-Jul-08. }}

Languages are free to implement the stub and wrapper layers (collectively called "glue") as they see fit. In particular, they may be inlined in the caller, or integral to the callee.
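To make the layers concrete, here is a minimal Python model of the diagram above. Every name is hypothetical: language X's internal convention is assumed to pass all arguments as one tuple, "plain Parrot" is modeled as ordinary positional arguments holding plain integers, and language Y is assumed to box its integers in a `YInt` type. It sketches the division of labor only, not how any Parrot compiler actually emits glue.

```python
from dataclasses import dataclass


@dataclass
class YInt:
    """Hypothetical internal integer representation used by language Y."""
    value: int


def y_called_sub(a: YInt, b: YInt) -> YInt:
    """Language Y's internal sub: it only understands YInt values."""
    return YInt(a.value + b.value)


def y_called_wrapper(a: int, b: int) -> int:
    """Called wrapper: plain values in, Y's representation inside, plain value out."""
    return y_called_sub(YInt(a), YInt(b)).value


def x_calling_stub(args: tuple) -> int:
    """Calling stub: turns X's internal tuple convention into a plain positional call."""
    return y_called_wrapper(*args)


# The calling sub in language X only ever uses its own convention.
result = x_calling_stub((2, 3))  # 5
```

If X and Y both used plain integers and positional arguments, both glue functions would reduce to direct calls.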

Ideally, of course, the "plain Parrot" layer will be close enough to the semantics of both languages that glue code is unnecessary, and the call can be made directly. Language implementors are encouraged to dispense with glue whenever possible, even if glue is sometimes required for the general case.

In summary:

Each HLL gets its own namespace subtree, within which get_hll_global and set_hll_global operate. In order to make external calls, the HLL must provide a means of identifying the language, the function, and enough information about the arguments and return values for the calling language to generate the call correctly. This is necessarily language-dependent, and is beyond the scope of this document.
When calling across languages, both the caller and the callee should try to use "plain Parrot semantics" to the extent possible. This is explained in more detail below, but essentially means to use the simplest calling conventions and PMC classes possible. Ideally, if an API uses only PMCs that are provided by a "bare Parrot" (i.e. one without any HLL runtime code), then it should be possible to use this API from any other language.
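A toy model of per-HLL namespace subtrees, assuming namespaces are nested dictionaries keyed by a lowercased HLL name. The functions borrow the names of the get_hll_global and set_hll_global opcodes only for readability; their signatures here are invented and are not Parrot's.

```python
# Root namespace with one subtree per HLL (a simplification of Parrot's tree).
ROOT = {}


def set_hll_global(hll, path, name, value):
    """Store `value` as `name` under `path` inside the given HLL's subtree."""
    ns = ROOT.setdefault(hll.lower(), {})
    for part in path:
        ns = ns.setdefault(part, {})
    ns[name] = value


def get_hll_global(hll, path, name):
    """Look up `name` under `path` inside the given HLL's subtree."""
    ns = ROOT[hll.lower()]
    for part in path:
        ns = ns[part]
    return ns[name]


# Two languages can each define Utils::foo without clashing...
set_hll_global("perl6", ["Utils"], "foo", lambda: "perl6 foo")
set_hll_global("tcl", ["Utils"], "foo", lambda: "tcl foo")

# ...so an external caller must name the language as well as the function.
tcl_foo = get_hll_global("tcl", ["Utils"], "foo")
```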

It is acceptable for languages to define subs for internal calling that are not suitable for external calling. Such subs should be marked accordingly, and other languages should respect those distinctions. (Or, if they choose to call intra-language subs, they should be very sure they understand that language's calling conventions.)

{{ It's not possible to define a sub that can't be called externally --allison }}

Half-Baked Ideas

{{ Every draft PDD should have one of these. ;-} -- rgr, 28-Jul-08. }}

Common syntax for declaring exported functions?

I assume we will need some additional namespace support. It is not yet clear whether it's better to mark the subs that are OK for external calling, or the ones that are not.

(As you can guess, I don't have a strong suggestion for what to call these functions yet. Do we call them "external"? Would that get confused with intra-language public interfaces?)

Beyond that, we probably need additional metainformation on the external subs so that calling compilers will know what code to emit. Putting them on the subs means that the calling compiler just needs to load the PBC in order to access the module API (though it may need additional hints). Of course, that also requires a PIR API for accessing this metainformation . . .

{{ Exporting is very much a Perl idea; it has little applicability outside of Perl. --allison }}

Crazy idea: This is more or less the same information (typing) required for multimethods. If we encourage the export of multisubs, then the exporting language could provide multiple interfaces, and the calling compiler could query the set of methods for the one most suitable.

{{ Proposal rejected, because we aren't going with "external" and "internal" subroutine variants, so it's not needed. --allison }}

More namespace complexity?

{{ Proposal rejected, because we aren't going with "external" and "internal" subroutine variants, so it's not needed. --allison }}

It might be good to have some way for HLLs to define a separate external definition for a given sub (i.e. one that provides the wrapper) without too much namespace hair. For example,

        .sub foo :extern

defines the version that is used by interlanguage calling, and

        .sub foo

defines the version that is seen by other code written in that language (i.e. via get_hll_global). If there is no plain foo, the :extern version is used for internal calls. That way, the compiler can emit both wrapper code and internal code without having to do anything special (much), even if different calling conventions and/or data conversions are required.
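The lookup rule of this proposal can be modeled in a few lines of Python. Everything here is hypothetical, since `:extern` is only proposed: the `define` and `lookup` helpers are invented, and what an external caller gets when no `:extern` variant exists is an assumption (it falls back to the plain sub), because the text above does not say.

```python
# Hypothetical per-language sub table: name -> {"plain": sub, "extern": sub}.
SUBS = {}


def define(name, sub, extern=False):
    """Register `.sub name` (plain) or `.sub name :extern`."""
    SUBS.setdefault(name, {})["extern" if extern else "plain"] = sub


def lookup(name, from_other_language):
    variants = SUBS[name]
    if from_other_language:
        # Interlanguage calls use the :extern version; falling back to the
        # plain sub when there is none is an assumption.
        return variants.get("extern", variants.get("plain"))
    # Stated rule: with no plain foo, internal calls use the :extern version.
    return variants.get("plain", variants.get("extern"))


define("foo", lambda: "internal foo")
define("foo", lambda: "wrapper foo", extern=True)
define("bar", lambda: "extern-only bar", extern=True)
```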

{{ Of course, this wouldn't be necessary if all external subs were multisubs. -- rgr, 31-Jul-08. }}

Multiple type hierarchies?

Different languages will have to "dress up" the Parrot type/class hierarchy differently. For example, Common Lisp specifies that STRING is a subtype of VECTOR, which in turn is a subtype of ARRAY. This is not likely to be acceptable to other languages, so Lisp needs its own view of type relationships, and that view must affect multimethod dispatch for Lisp generic functions: a method defined for VECTOR, for instance, must be applicable when a string is passed as a parameter.

{{ Common Lisp (for example) will have its own set of type relationships, because it will have its own set of types. There will be no "remapping" of core types --allison }}

The language that owns the multisub gets to define the type hierarchy and dispatch rules used when it gets called. In order to handle objects from foreign languages, the "owning" language must decide where to graft the foreign class inheritance graph into its own graph. {{ It would be nice if some Parrot class, e.g. Object, could be defined as the conventional place to root language-specific object class hierarchies; that way, a language would only have to include Object in order to incorporate objects from all other conforming languages. -- rgr, 26-Aug-08. }}

{{ The language that owns the multisub does get to define the dispatch rules for the multisub. But, it doesn't get to alter the type hierarchy of objects from other languages. --allison }}

Note that common Parrot classes will in general appear in different places in different languages' dispatch hierarchies, so it is important to bear in mind which language "owns" the dispatch.
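A small Python sketch of dispatch owned by one language. Following the comments' suggestion that Common Lisp has its own types, the hypothetical `LispString`, `LispVector`, and `LispArray` classes carry Lisp's subtype relations, and the hypothetical `LispGeneric` applies Lisp's rules, so a method specialized on VECTOR is chosen for a string. Where foreign objects are grafted in is Lisp's decision; here, as an assumption, they dispatch as type T.

```python
class LispT:
    """Root of the (hypothetical) Lisp type hierarchy."""


class LispArray(LispT):
    pass


class LispVector(LispArray):
    pass


class LispString(LispVector):
    def __init__(self, text):
        self.text = text


class LispGeneric:
    """A generic function whose dispatch rules belong to Lisp."""

    def __init__(self):
        self.methods = {}  # specializer class -> function

    def defmethod(self, cls, fn):
        self.methods[cls] = fn

    def __call__(self, arg):
        # Graft point (assumption): foreign objects are treated as type T.
        cls = type(arg) if isinstance(arg, LispT) else LispT
        # Most specific applicable method: first class in the MRO with a method.
        for klass in cls.__mro__:
            if klass in self.methods:
                return self.methods[klass](arg)
        raise TypeError("no applicable method")


lisp_describe = LispGeneric()
lisp_describe.defmethod(LispVector, lambda x: "vector")
lisp_describe.defmethod(LispT, lambda x: "t")
```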

Definitions

{{ Collect definitions of new jargon words here, once we figure out what they should be. -- rgr, 29-Jul-08. }}

Implementation

Plain Parrot Semantics

Fortunately, "plain Parrot" is pretty powerful, so the "common denominator" is not in fact the lowest possible. For example, Parrot supports named, optional, and repeated arguments, even though not all Parrot languages do. This is never a problem for the called language; a calling module written in a language without these features can only use the corresponding subset of the API anyway. Implementers of such subset calling languages are encouraged to provide their users with an extended API for the interlanguage call; typically, this is only required for named arguments.
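As an illustration, here is a callee that uses named and optional parameters, and one possible shape for the extended API a positional-only calling language might offer: named arguments passed as a flat list of alternating names and values. Python keyword arguments stand in for Parrot's named parameters; `format_name` and `call_foreign` are hypothetical.

```python
def format_name(first, last, *, title=None, upper=False):
    """Callee in a language with named and optional arguments."""
    name = f"{first} {last}"
    if title is not None:
        name = f"{title} {name}"
    return name.upper() if upper else name


def call_foreign(sub, positional, named_pairs=()):
    """Caller-side helper for a language without named arguments.

    `named_pairs` alternates names and values, e.g. ["title", "Dr"].
    """
    names, values = named_pairs[0::2], named_pairs[1::2]
    return sub(*positional, **dict(zip(names, values)))


plain = call_foreign(format_name, ["Ada", "Lovelace"])
titled = call_foreign(format_name, ["Ada", "Lovelace"], ["title", "Countess"])
```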

Strings

    {{ I am probably not competent to write this section.  At the very least,
    it requires discussion of languages that expect strings to be mutable
    versus . . . Java.  -- rgr, 28-Jul-08. }}

Other scalar data types

All Parrot language implementations should stick to native Parrot PMC types for scalar data, except in case of dire need. To see why this is so, take the particular case of integer division, which differs significantly between languages.

{{ No, this is completely backwards. Languages are heartily encouraged to create their own PMCs for any and all common variable types found in the language. --allison }}

In Tcl, "the integer three divided by the integer five" produces the integer value 0.

In Perl 5 and Lua, this division produces the floating-point value 0.6. (This happens to be Parrot's native behavior as well.)

In Common Lisp, this division produces "3/5", a number of type RATIO with numerator 3 and denominator 5 that represents the mathematically-exact result.
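For reference, the three expected results can be reproduced with Python operators and the standard `fractions` module. This only pins down the values; it says nothing about how these languages implement division on Parrot.

```python
from fractions import Fraction

tcl_style = 3 // 5               # integer division: 0
perl_lua_style = 3 / 5           # floating-point division: 0.6
lisp_style = Fraction(3, 5)      # exact ratio with numerator 3, denominator 5
```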

Furthermore, no Perl 5 code, when given two integers to divide, will expect a Common Lisp ratio as a result. Any Perl 5 implementation that does this has a bug, even if both those integers happen to come from Common Lisp. Ditto for a floating-point result from Common Lisp code that happens to get two integers from Perl or Lua (or both!).

{{ Not a bug, it's the expected result. Divide operations are multi-dispatched. If you pass two Common Lisp integers into a divide operation in Perl 5, it'll search for the best matching multi, and if it finds one for Common Lisp integers (an exact match), it'll run that and return a Common Lisp ratio. --allison }}

Even though these languages all use "/" to represent division, they do not all mean the same thing by it, and similarly for most (if not all) other built-in arithmetic operators. However, they pretty clearly do mean the same thing by (e.g.) "the integer with value five," so there is no need to represent the inputs to these operations differently; they can all be represented by the same Integer PMC class.

{{ The whole point of having sets of PMCs in different languages is to handle the case where "it's an integer, but has a different division operation than other languages" --allison}}
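The multiple-dispatch reading in the comments can be sketched in Python. The integer subclasses `TclInt`, `PerlInt`, and `LispInt` and the `divide` dispatcher are hypothetical; dispatch here uses exact type matches only, and the fallback follows the default multi that a later comment in this document describes (convert both operands to numbers and divide as floating point).

```python
from fractions import Fraction


class TclInt(int):
    """Hypothetical Tcl integer: same value space, different division."""


class PerlInt(int):
    """Hypothetical Perl 5 integer."""


class LispInt(int):
    """Hypothetical Common Lisp integer."""


DIVIDE_MULTIS = {}


def divide_multi(left, right):
    """Register a divide variant for an exact pair of operand types."""
    def register(fn):
        DIVIDE_MULTIS[(left, right)] = fn
        return fn
    return register


@divide_multi(TclInt, TclInt)
def tcl_divide(a, b):
    return TclInt(a // b)


@divide_multi(LispInt, LispInt)
def lisp_divide(a, b):
    return Fraction(int(a), int(b))


def divide(a, b):
    fn = DIVIDE_MULTIS.get((type(a), type(b)))
    if fn is not None:
        return fn(a, b)
    # Default: treat both operands as plain numbers and divide.
    return float(a) / float(b)
```

With this table, two Lisp integers divide to a ratio no matter which language's code performs the division, while Perl or mixed operands fall through to the floating-point default.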

{{ Must also discuss morphing: if some languages do it and others do not, then care must be taken at the boundaries. -- rgr, 31-Jul-08. }}

Defining new scalar data types

There will be cases where existing Parrot PMC classes cannot represent a primitive HLL scalar type, and so a new PMC class is required. In this case, interoperability cannot be guaranteed, since it may not be possible to define behavior for such objects in other languages. But the choice of a new PMC is forced, so we must make the best of it.

{{ Yes, except this is the common case, and interoperability will still work --allison }}

A good case in point is that of complex rational numbers in Common Lisp. The Complex type provided by Parrot assumes that its components are floating-point numbers. This is a suitable representation type for (COMPLEX REAL), but CL partitions "COMPLEX" into (COMPLEX REAL) and (COMPLEX RATIONAL), with the latter being further divided into (COMPLEX RATIO), (COMPLEX INTEGER), etc. The straightforward way to provide this functionality is to define a ComplexRational PMC that is built on Complex and has real and imaginary PMC components that are constrained to be Integer, BigInt, or Ratio PMCs.

So how do we make (COMPLEX RATIONAL) arithmetic work as broadly as possible?

The first aspect is defining how the new type actually works within its own language. The Lisp arithmetic operators will usually return a ComplexRational when given one, but must return a RATIONAL subtype when the imaginary part of the result is zero. That behavior may not suit other languages, so Lisp needs its own set of basic arithmetic operators. We must therefore define methods on these multis that specialize on ComplexRational (and probably make the generic arithmetic redispatch on the types of the real and imaginary parts; you know the drill). But in case we are also passed an operand that is another language's exotic type, we should take care to specialize the other operands on the most general possible class, in the hope that other exotics are subclasses of it.
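A Python sketch of the behavior described above. `ComplexRational` and `lisp_add` are hypothetical; `numbers.Rational` plays the role of the Integer/BigInt/Ratio constraint, and the result collapses to a plain rational when the imaginary part is zero, as the text requires of Lisp's operators.

```python
from fractions import Fraction
from numbers import Rational


class ComplexRational:
    """Complex number whose parts must be exact rationals."""

    def __init__(self, real, imag):
        for part in (real, imag):
            if not isinstance(part, Rational):
                raise TypeError("components must be exact rationals")
        self.real = Fraction(real)
        self.imag = Fraction(imag)


def as_complex_rational(x):
    """Promote a plain rational operand so both operands share one shape."""
    return x if isinstance(x, ComplexRational) else ComplexRational(x, 0)


def lisp_add(a, b):
    a, b = as_complex_rational(a), as_complex_rational(b)
    real = a.real + b.real
    imag = a.imag + b.imag
    if imag == 0:
        return real  # a RATIONAL, not a complex
    return ComplexRational(real, imag)
```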

{{ It is perfectly fine for a Lisp arithmetic operator to return a RATIONAL subtype. Please don't define methods for a pile of operations that already have vtable functions --allison }}

The other aspect is extending other languages' arithmetic to do something reasonable with our exotic types. If we're lucky, Parrot will provide a basic multisub that takes care of most cases, and we just need to add method(s) to that. If not, we will have to add specialized methods on the other language's multisub, trying to redispatch to the other language's arithmetic ops passing the (hopefully more generic) component PMCs. Doing so is still the responsibility of the language that defines the exotic class, since it is in charge of its internal representation.

{{ The default multi for a common operation like division will call the PMC's get_number vtable function, perform a standard division operation, and return a standard Integer/Number/BigNum. --allison }}

{{ We can define multimethods on another language without loading it, can't we? If not, then making this work may require negotiation between language implementors, if it is feasible at all. -- rgr, 31-Jul-08. }}

{{ I'm not sure what you mean by defining multimethods on another language. Perhaps you're asking if it's possible to declare a multisub for a type that doesn't exist yet? --allison }}

This brings us to a number of guidelines for defining language-specific arithmetic so as to maximize interoperability:

Define language-specific operations using multimethods (to avoid conflict with other languages).

Define them on the highest (most general) possible PMC classes (in order that they continue to work if passed a subclass by a call from a different language).

{{ Define them on the class that makes sense. There's no point in targeting any particular level of the inheritance hierarchy. --allison }}

Don't define a language-specific PMC class unless there is clear need for a different internal representation. (And even then, you might consider donating it to become part of the Parrot core.)

{{ The fundamental rule is to implement your language in the way that makes the most sense for your language. Language implementors don't have to think about interoperability. --allison }}

The rest of this section details exceptions and caveats in dealing with scalar data types.

"Fuzzy" scalars

Some languages are willing to coerce strings to numbers and vice versa without any special action on the part of the programmer, and others are not. The problem arises when such "fuzzy" scalars are passed (or returned) to languages that do not support "fuzzy" coercion . . .

{{ This section is meant to answer Geoffrey's "What does Lisp do with a Perl 5 Scalar?" question. I gotta think about this more. -- rgr, 29-Jul-08. }}

{{ The scalar decides when to morph, not the language. All the languages that have morphing scalars implement them in such a way that they know how to handle, for example, morphing when a string value is assigned to an integer scalar, and what to do if that value is later used as an integer again. --allison }}
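A toy morphing scalar in Python, to illustrate the comment above: the scalar, not the calling language, decides how to convert itself. The class is hypothetical and the rule is simplified (a string that does not parse as a number becomes 0; real Perl scalars also honor leading numeric prefixes and can warn). A strict language only ever asks for a number through the standard conversion, here `float()`.

```python
class FuzzyScalar:
    """Hypothetical Perl-like scalar that morphs on demand."""

    def __init__(self, value):
        self.value = value

    def __float__(self):
        if isinstance(self.value, str):
            try:
                return float(self.value)
            except ValueError:
                return 0.0  # simplified: non-numeric strings count as 0
        return float(self.value)

    def __str__(self):
        return str(self.value)


def strict_double(x):
    """Code in a language without fuzzy coercion: it just asks for a number."""
    return 2 * float(x)
```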

`Complex` numbers

Not all languages support complex numbers, so if an exported function requires a complex argument, it should either throw a suitable error or coerce an acceptable numeric argument. In the latter case, be sure to advertise this in the documentation, so that callers whose languages lack complex numbers know which numeric types they can pass instead.
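Both options fit in one small Python entry point. `require_complex` is a hypothetical exported function's argument check: it accepts a complex value, coerces other real numbers, and raises `TypeError` for anything else.

```python
def require_complex(x):
    """Argument check for a (hypothetical) exported function needing a complex."""
    if isinstance(x, complex):
        return x
    # bool is a subclass of int in Python; reject it rather than coerce it.
    if isinstance(x, (int, float)) and not isinstance(x, bool):
        return complex(x)  # documented coercion: real numbers become x + 0j
    raise TypeError(f"expected a number, got {type(x).__name__}")
```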

{{ All documentation for a library should state what argument types it accepts and what results it returns, there's nothing unique about complex numbers. --allison }}

`Ratio` numbers

Not all languages support ratios (rather few, actually), so if an exported function requires a ratio as an argument, it should either throw a suitable error, or convert an acceptable numeric value.

However, since ratios are rare (and it is rather eccentric for a program to insist on a ratio as a parameter), it is strongly advised to accept a floating point or integer value, and convert it in the wrapper.
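A Python sketch of that advice, using the standard `fractions` module. `scale_exact` stands in for a hypothetical internal sub that insists on a ratio, and `scale_exact_wrapper` accepts integers and floats and converts them. Note that `Fraction(float)` is exact with respect to the binary float, so `0.1` becomes a large-denominator ratio; `Fraction.limit_denominator()` is one way to get a simpler value if that matters.

```python
from fractions import Fraction


def scale_exact(r, factor):
    """Hypothetical internal sub that requires a ratio argument."""
    if not isinstance(r, Fraction):
        raise TypeError("ratio required")
    return r * factor


def scale_exact_wrapper(x, factor):
    """Wrapper: accept int, float, or Fraction and convert before calling."""
    if isinstance(x, bool) or not isinstance(x, (int, float, Fraction)):
        raise TypeError(f"expected a number, got {type(x).__name__}")
    return scale_exact(Fraction(x), factor)
```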

{{ All documentation for a library should state what argument types it accepts and what results it returns, there's nothing unique about ratios. --allison }}

    {{ Parrot does not support these yet, so this is not a current issue.  --
    rgr, 28-Jul-08. }}

Aggregate data types

{{ I probably haven't done these issues justice; I don't know enough Java or Tcl to grok this part of the list discussion. -- rgr, 28-Jul-08. }}

Aggregates (hashes, arrays, and struct-like thingies) can either be passed directly, or mapped by wrapper or caller code into something different. The problem with mapping, besides being slow, is that if either the caller or the callee does this, the aggregate is effectively read-only. (It is possible for the wrapper to stuff the changes back in the original structure by side effect, but this has its own set of problems.)

{{ Mapping is generally discouraged, but I don't see any reason it would make the aggregate read-only. You can certainly convert a Python dictionary to a Perl hash, use it in your Perl code, and then either return it as a Perl hash, or convert it back to a Python dictionary. --allison }}

In other words, if the mapping is not straightforward, it may not be possible. If the mapping is straightforward it may not be necessary -- and an unnecessary mapping may limit use of the called module's API.
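The trade-off can be sketched in Python. The callee's language is assumed (hypothetically) to represent maps as lists of key/value pairs, so the caller's dictionary is mapped on the way in. The callee's change lands in the copy; it reaches the caller only if the caller explicitly converts back, as the comment above describes.

```python
def to_pairs(mapping):
    """Map a caller-side dict into the callee's pair-list representation."""
    return [[key, value] for key, value in mapping.items()]


def from_pairs(pairs):
    """Convert the callee's representation back into a dict."""
    return {key: value for key, value in pairs}


def callee_set(pairs, key, value):
    """Callee code that updates the aggregate it was given."""
    pairs.append([key, value])


caller_map = {"a": 1}
foreign_map = to_pairs(caller_map)
callee_set(foreign_map, "b", 2)
copy_was_isolated = "b" not in caller_map  # the change went to the copy
caller_map = from_pairs(foreign_map)       # explicit conversion back
```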

Struct-like objects are problematic. They are normally considered low-level and language-specific, and are handled by emitting special code for slot accessor and setter functions, which other language compilers won't necessarily know how to emit. The choices are therefore to (a) treat them like black boxes in the other language, or (b) provide a separate functional or OO API (or both) for calling from other languages.
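Option (b) might look like the following Python sketch: the struct-like type stays private to its language, and a few plain functions expose it. `Point`, `make_point`, `point_x`, and `set_point_x` are hypothetical names. A caller using option (a) would simply hold the object and pass it back without looking inside.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Struct-like type normally accessed through compiler-emitted slot code."""
    x: float
    y: float


def make_point(x, y):
    """Constructor exported for other languages."""
    return Point(x, y)


def point_x(p):
    """Accessor exported for other languages."""
    return p.x


def set_point_x(p, value):
    """Setter exported for other languages."""
    p.x = value
```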

Several questions arise for languages with multiple representations for aggregate types. Typically, this is because these types are more restricted in some fashion. [finish. -- rgr, 29-Jul-08.]

Functional data types

In a sense, functional types (i.e. callable objects) are the easiest things to pass across languages, since they require no mapping at all. On the other hand, if a language doesn't support functional arguments, then there is no hope of using an API written in another language that requires them.
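In Python terms, the point is that a callable passes through unchanged: a higher-order sub in one (hypothetical) language can invoke a sub from another with no mapping step. Both function names below are illustrative.

```python
def y_apply_to_each(fn, items):
    """Higher-order sub in language Y; fn may come from any language."""
    return [fn(item) for item in items]


def x_square(n):
    """A sub defined in language X."""
    return n * n


squares = y_apply_to_each(x_square, [1, 2, 3])  # [1, 4, 9]
```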

{{ Hmmm? They're just subs, how would they not be callable from another language? --allison }}

Datum vs. object

Some languages present everything to the programmer as an object; in such languages, code only exists in methods. A few languages have no methods, only functions (and/or subroutines) and "passive" data. The remainder have both, and pose no problem calling into the others.

But how does an obligate OO language call a non-OO language, or vice versa? An extreme case would be Ruby (which has only objects) and Scheme (which, as far as Ruby is concerned, has none). What good is a Ruby object as a datum to a Scheme program if Scheme can't access any of its methods? Similarly, what could Ruby do with a Scheme list when it can't even get to the Scheme car function?

{{ Except that Ruby would never even get a Scheme list in the first place if it hadn't loaded a Scheme library of some sort. And, being a list, the Scheme list would still support the standard vtable functions for lists. --allison }}

{{ Methinks the right thing would be to define a common introspection API (a good thing in its own right). Scheme and Ruby should each define their own implementation of the same in "plain Parrot semantics" terms, independently. The caller can then use his/her language's binding of the introspection API to poke around in the other module, and find the necessary tools to call the other. For Scheme, this would mean functions for finding Ruby classes and providing functional wrappers around methods. For Ruby, I admit this would probably be even weirder. In any case, it is important that the calling user not need anything out of the ordinary, from either language or the called module author. -- rgr, 29-Jul-08. }}

{{ There is a common introspection API, the 'inspect' vtable function. But what you're describing here isn't introspection, it's actually the standard vtable functions. --allison }}
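A Python sketch of the comment's point: a Scheme list can be used from Ruby through generic container operations, without Ruby knowing about car. The `SchemePair` class is hypothetical, and Python's `len()`, indexing, and iteration protocols stand in for standard vtable functions such as `elements` and `get_pmc_keyed_int`; the correspondence is only approximate.

```python
class SchemePair:
    """Hypothetical cons cell; cdr is another SchemePair or None."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

    def __iter__(self):
        node = self
        while node is not None:
            yield node.car
            node = node.cdr

    def __len__(self):  # roughly like an "elements" vtable function
        return sum(1 for _ in self)

    def __getitem__(self, index):  # roughly like keyed integer access
        for position, item in enumerate(self):
            if position == index:
                return item
        raise IndexError(index)


# "Ruby-side" code uses only the generic protocol.
scheme_list = SchemePair(1, SchemePair(2, SchemePair(3)))
```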

Defining methods across language boundaries

{{ Is the term "unimethod" acceptable here? -- rgr, 29-Jul-08.

They're just methods or subroutines, and it's just "single dispatch". --allison }}

There will be cases where a module user wants to extend that module by defining a new method on an externally-defined class, or add a multimethod to an externally-defined multisub. Since a class with unimethod dispatch belongs wholly to the external language, the calling language (i.e. the one adding the method) must use the semantics of the external language. If the external language uses a significantly different metamodel, simply adding the :method pragma may not cut it.

{{ No, the :method flag is always all you need to define a method. The class object you add the method to determines what it does with that method. --allison }}

There are two cases: (1) The calling language is adding a new method, which cannot therefore interfere with existing usage in the called language; and (2) the calling language is attempting to extend an existing interface provided by the called language. In the first case, the calling compiler has the option of treating the new method as part of the calling language, and dispensing with the glue altogether. In the second case, the compiler must treat the new method as part of the foreign language, and provide both glue layers (as necessary) around it. It is therefore not expected that all compilers will provide a way to define methods on all foreign classes for all language pairs.

{{ These should generally be handled by subclassing the parent language class, and adding your method to the subclass. Monkeypatching is certainly possible, but not encouraged. And, there really isn't any distinction between "treating the new method as part of the calling language" and "treat[ing] the new method as part of the foreign language". It's a method, you call it on an object, the class of the object determines how it's found and invoked. --allison }}
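The subclassing approach the comment recommends, sketched in Python. `ForeignStack` stands in for a class defined by another language and `MyStack` for the caller's subclass; both are hypothetical. The foreign class itself is left untouched.

```python
class ForeignStack:
    """Stands in for a class defined in another language."""

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()


class MyStack(ForeignStack):
    """Caller's subclass: adds a method instead of monkeypatching."""

    def peek(self):
        return self.items[-1]
```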

Multimethods are easier; although the multisub does belong conceptually to one language (from whose namespace the caller must find the multisub), multis are more loosely coupled to their original language.

{{ Well, the semantics of the language that defined the multisub also determine how it is found and invoked. --allison }}

The cases for multimethods are similar, though: (1) If the calling language method is specialized to classes that appear only in the calling module, then other uses of the multisub will never call the new method, and the calling language can choose to treat it as internal. (2) If the calling method is specialized only on Parrot or called-language classes, then the compiler should take care to make it generally usable.
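Case (1) in a Python sketch: a multisub owned by language Y dispatches on exact argument type through a hypothetical variant table, and the calling module adds a variant specialized only on its own class `XWidget`. Existing callers in Y never pass an `XWidget`, so their behavior cannot change.

```python
# Multisub owned by language Y: variant table keyed by exact argument type.
DESCRIBE_VARIANTS = {
    int: lambda v: "integer",
    str: lambda v: "string",
}


def y_describe(value):
    return DESCRIBE_VARIANTS[type(value)](value)


class XWidget:
    """Class that exists only in the calling module (language X)."""


# The new variant matches only XWidget, so other uses are unaffected.
DESCRIBE_VARIANTS[XWidget] = lambda v: "x widget"
```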

Subclassing across language boundaries

{{ This is an important feature, but requires compatible metamodels. -- rgr, 29-Jul-08.

Or Proxy PMCs, which is how we're currently handling inheritance across metamodel boundaries. --allison }}
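The delegation idea behind a proxy, as a Python sketch. This is not how Parrot's Proxy PMC is implemented; it only shows a local class inheriting behavior from an object built under a different metamodel by forwarding lookups it cannot satisfy itself. All names are hypothetical.

```python
class ForeignObject:
    """Object created under another language's metamodel."""

    def greet(self):
        return "hello from foreign"


class Proxy:
    """Forwards attribute lookups that fail locally to the wrapped object."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Called only when normal lookup on the proxy fails.
        return getattr(self._target, name)


class LocalChild(Proxy):
    """Local 'subclass' of the foreign class, built on the proxy."""

    def __init__(self):
        super().__init__(ForeignObject())

    def shout(self):
        return self.greet().upper()
```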

Method vs. multimethod

{{ This is the issue where some languages (e.g. Common Lisp) use only multimethods, where others (e.g. Ruby) use only unimethods. (S04 says something about MMD "falling back" to unimethods, but so far this is not described in Parrot.) Calling is easy; multimethods look like functions, so the MM language just has to create a function (or MM) wrapper for the UM language, and a UM language can similarly treat a MM call as a normal function call. (Which will require the normal "make the function look like a method" hack for obligate OO languages like Ruby.) Defining methods across the boundary is harder, and may not be worth the trouble. -- rgr, 29-Jul-08. }}

{{ That's "multiple dispatch" and "single dispatch". In general, defining code in one language and injecting it into the namespace of another language isn't the primary focus of language interoperability. Using libraries from other languages is. --allison }}
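The two calling directions discussed above, sketched in Python with standard-library helpers: `operator.methodcaller` turns a method into an ordinary function for the multiple-dispatch side, and `types.MethodType` binds a plain function to an object so single-dispatch code can call it as a method. `RubyString` and `lisp_length` are hypothetical.

```python
import operator
import types


class RubyString:
    """Object from a single-dispatch, method-only language."""

    def __init__(self, text):
        self.text = text

    def reverse(self):
        return RubyString(self.text[::-1])


def lisp_length(obj):
    """Plain function from a multiple-dispatch language."""
    return len(obj.text)


s = RubyString("abc")

# Multiple-dispatch side: call the method as if it were a function.
rb_reverse = operator.methodcaller("reverse")
reversed_s = rb_reverse(s)

# Single-dispatch side: make the function look like a method of s.
length_method = types.MethodType(lisp_length, s)
length = length_method()
```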

References

None.