docs/faq.pod - Parrot FAQ
Parrot is the new interpreter being designed from scratch to support the upcoming Perl 6 language. It is being designed as a standalone virtual machine that can execute bytecode-compiled dynamic languages such as Perl 6, but also Perl 5. Ideally, Parrot can also support other dynamic, bytecode-compiled languages such as Python, Ruby and Tcl.
The name "Parrot" relates to Simon Cozens's April Fool's Joke where Larry Wall and Guido van Rossum announced the merger of the Perl and Python languages.
As penance, Simon spent time as Parrot's lead developer, but he's gotten better.
No. Parrot is an implementation that is expected to be used to execute Perl 6 programs. The Perl 6 language definition is currently being crafted by Larry Wall. While the true nature of Perl 6 is still unknown, it will be substantially similar to Perl as we know it today, and will need a runtime system. For more information on the nascent Perl 6 language definition, check out Larry's apocalypses.
Yes.
Parrot is in the early phases of its implementation. The primary way to use Parrot is to write Parrot assembly code, described in PDD6.
You can also create dynamic content within Apache using Ask Bjorn Hansen's mod_parrot module. You are strongly advised that mod_parrot is a toy, and should not be used with any production code.
Lots of reasons, actually. :^)
Seriously, though, programming in Parrot assembly language is an interesting challenge. It's also one of the best ways to write test cases for Parrot.
It depends on what you mean by real. :^)
C.
Because it's the best we've got.
So true. Regardless, C's available pretty much everywhere. Perl 5's in C, so we can potentially build any place Perl 5 builds.
Because of one of:
The most common issues are:
Parrot has an odd license -- it currently uses the same license as Perl 5, which is the disjunction of the GNU GPL and the Artistic License, which can be written (Artistic|GPL) for short. Thus, Parrot's license is compatible with the GNU GPL, which means you can combine Parrot with GPL'ed code.
Code accepted into the core interpreter must fall under the same terms as Parrot. Library code that we link into the interpreter (for example, the ICU library we're using for Unicode) can be covered by other licenses so long as their terms don't prohibit this.
Parrot has to work on most of Perl 5's platforms, as well as a few of its own. Perl 5 runs on eighty platforms; Parrot must run on Unix, Windows, Mac OS (X and Classic), VMS, Crays, Windows CE, and Palm OS, just to name a few. Among its processor architectures will be x86, SPARC, Alpha, IA-64, ARM, and 68x00 (Palms and old Macs). If something doesn't work on all of these, we can't use it in Parrot.
Not only does Parrot have to run on all those platforms, but it must also run efficiently. Parrot's core size is currently between 250K and 700K, depending on compiler. That's pushing it on the handheld platforms. Any library used by Parrot must be fast enough to have a fairly small performance impact, small enough to have little impact on core size, and flexible enough to handle the varying demands of Perl, Python, Tcl, Ruby, Scheme, and whatever else some clever or twisted hacker throws at Parrot.
These tests are very hard to pass; currently we're expecting we'll probably have to write everything but the Unicode stuff.
Those VMs are designed for statically typed languages. That's fine, since Java, C#, and lots of other languages are statically typed. Perl isn't. For a variety of reasons, that means Perl would run more slowly there than on an interpreter geared towards dynamic languages.
The .NET VM didn't even exist when we started development, or at least we didn't know about it when we were working on the design. We do now, though it's still not suitable.
Sure we will. They're just not our first target. We build our own interpreter/VM, then when that's working we start in on the JVM and/or .NET back ends.
While I'm sure that's a perfectly nice, fast VM, it's probably got the same issues as the languages in the "Why not something besides C" question. I realize that the Scheme-48 interpreter's darned fast, for example, but we're looking at the same sort of portability and talent pool problems that we'd have with, say, Erlang or Haskell as an implementation language.
The mailing list precedes the Parrot joke and subsequent unveiling of the True Grand Project by a number of months. We've just not gotten around to renaming the mailing list. We will.
Audrey Tang, the lead on the Pugs project, notes that an unoptimized Parrot is already 30% faster than the Haskell-based interpreter. Add compiler optimization and a few planned optimizations and Parrot will beat Pugs for speed hands down. Audrey thinks that Pugs could be made faster with some Haskell compiler tricks, but it's harder work and less effective than the Parrot optimizations we already know how to do.
Perl 5 is highly portable, and builds on around 50 different systems, many far removed from Unix or MS Windows. We'd like Perl 6 to be able to run everywhere that Perl 5 runs, so we need to keep Parrot as portable as possible. The Glasgow Haskell Compiler is a pain to build on minor systems, and downright impossible on small systems. So by going with Pugs and GHC we'd be sacrificing portability.
As well, other languages apart from Perl 6 are being targeted to Parrot. Significant parts of Python, Tcl, Perl 5, and Basic have already been implemented and others are on the way. Running multiple languages on the same Parrot engine allows them to be cross-language compatible: in other words, one targeted language could directly invoke the methods of another at the bytecode level.
Finally, there is a reason the Parrot design keeps talking about running bytecode directly from disk rather than relying on compiling (from Perl or with a JIT) in memory. It's all very well doing such operations when running one program, but think what happens on a multi-user system when 300 people fire up "parrot order.pbc": 300 parrot processes all fighting for resources. To quote Dan,
non-jit vss/rss is 29784 17312, JIT vss/rss is 122032 108916. A not insignificant difference :)
With read-only bytecode shared between processes, much of that "non-jit" resident memory is going to be shared. So much less swapping. And don't think that this won't matter to you because you don't have 300 users all running the same program: consider what happens if each Perl 6 module is compiled to bytecode. With read-only bytecode, 300 different Perl scripts all share the same memory for Carp.pbc, warnings.pbc, etc. Without it, they're all swapping like crazy...
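The sharing described here comes from the operating system: when several processes map the same file read-only, the pages can be served from one copy in the page cache. Here is a minimal sketch of a read-only mapping, written in Python for brevity. The file name and contents are made-up stand-ins for a compiled bytecode file, and this is not a description of how Parrot itself loads .pbc files.

```python
import mmap
import os
import tempfile


def map_bytecode_readonly(path):
    """Map a file read-only; returns (file object, mmap object).

    Pages of a read-only file mapping are backed by the file itself, so
    the OS can let every process that maps the same file share them.
    """
    f = open(path, "rb")
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return f, mapped


def demo():
    # Hypothetical stand-in for a compiled bytecode file such as order.pbc.
    fd, path = tempfile.mkstemp(suffix=".pbc")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"\x01\x02\x03\x04" * 1024)
        f, mapped = map_bytecode_readonly(path)
        try:
            first_word = bytes(mapped[:4])
            size = len(mapped)
        finally:
            mapped.close()
            f.close()
        return first_word, size
    finally:
        os.remove(path)
```

Nothing is copied into a private buffer here: the interpreter would read ops straight out of the mapping, which is what lets many processes running the same bytecode share it.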
Yes (though at this time, that's in the early stages). Still, the ultimate goal is for Perl 6 to be self-hosting (that is, written in itself) in order to improve introspection, debugger capabilities, compile-time semantic modulation, etc. For this reason, Pugs-on-Haskell will probably be the compiler that first compiles the ultimate Perl 6 compiler, but thereafter the Haskell-based interpreter will no longer be the primary reference implementation. This is documented by the Pugs team at http://svn.perl.org/perl6/pugs/trunk/docs/01Overview.html
Good question.
At The Perl Conference 4.0, in the summer of 2000, Larry Wall announced that it was time to recreate Perl from the ground up. This included the Perl language, the implementation of that language, the community of open source developers who volunteer to implement and maintain the language, and the larger community of programmers who use Perl.
A variety of reasons were given for embarking on this project:
system should return true instead of false on success, and localtime should return the year, not the year - 1900.
Sure. Why not? C, Java, Lisp, Scheme, and practically every other language is self-hosting. Why not?
No, not really. Don't forget that we can use Perl 5 to run Perl 5 programs, such as a Perl 5 to Parrot compiler.
We don't know yet, since it depends on the Perl 6 language definition. But we could use the more appropriate of two Perl compilers, depending on whether we're compiling Perl 5 or Perl 6. Larry has mumbled something about a package statement declaring that the file is Perl 5, but we're still not quite sure how that fits in.
Probably.
No, Parrot won't be twisted enough for Damian. Perhaps when Parrot is ported to a pair of supercooled calcium ions, though...
You had to be there.
Not much, why do you ask?
No, in fact, I don't.
Like what? There's just the JVM.
What others? That's it, unless you count Perl, Python, or Ruby.
Yeah, right. You never thought of them as VMs, admit it. :^)
Seriously, we're already running with a faster opcode dispatch than any of them are, and having registers just decreases the amount of stack thrash we get.
The 68K emulator Apple ships with all its PPC-enabled versions of Mac OS.
Reference counting has three big issues.
Well... no. It's all or nothing. If we were going to do a partial scheme we might as well do a full scheme. (A partial refcounting scheme is actually more expensive, since partial schemes check to see whether refcounts need twiddling, and checks are more expensive than you might think.)
Whether we have a lot or not actually depends on how you count. In absolute, unique op numbers we have more than pretty much any other processor, but that is in part because we have *no* runtime op variance.
It's also important to note that there's no less code involved (or, for the hardware, complexity) doing it our way or the decode-at-runtime way -- all the code is still there in every case, since we all have to do the same things (add a mix of ints, floats, and objects, with a variety of ways of finding them) so there's no real penalty to doing it our way. It actually simplifies the JIT some (no need to puzzle out the parameter types), so in that we get a win over other platforms since JIT expenses are paid by the user every run, while our form of decoding's only paid when you compile.
Finally, there's the big "does it matter, and to whom?" question. To someone actually writing Parrot assembly, it looks like Parrot only has one "add" op -- when emitting PASM or PIR you use the "add" mnemonic. That it gets qualified and assembles down to one variant or another based on the (fixed at assemble time) parameters is just an implementation detail. For those of us writing op bodies, it just looks like we've got an engine with full signature-based dispatching (which, really, we do -- it's just a static variant), so rather than having to have a big switch statement or chain of ifs at the beginning of the add op we just write the specific variants identified by function prototype and leave it to the engine to choose the right variant.
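To make the "static variant" idea concrete, here is a rough sketch of an assembler step that turns the single "add" mnemonic into a fully qualified op name by looking at the operand types. It is written in Python for brevity. The variant names, the operand syntax, and the helper functions are illustrative assumptions, not Parrot's actual op list or assembler code.

```python
# Illustrative op table: one entry per (mnemonic, operand-type signature).
# The variant names follow the idea in the text; they are assumptions,
# not a copy of Parrot's real op list.
OP_VARIANTS = {
    ("add", ("I", "I", "I")): "add_i_i_i",
    ("add", ("I", "I", "IC")): "add_i_i_ic",
    ("add", ("N", "N", "N")): "add_n_n_n",
    ("add", ("N", "N", "NC")): "add_n_n_nc",
}


def operand_type(operand):
    """Classify an operand by how it is written in the assembly source."""
    if operand[0] in "IN" and operand[1:].isdigit():
        return operand[0]            # register such as I0 or N3
    try:
        int(operand)
        return "IC"                  # integer constant
    except ValueError:
        float(operand)               # raises ValueError if not a number
        return "NC"                  # floating-point constant


def assemble(line):
    """Resolve a line like 'add I0, I1, 5' to a qualified op name."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [o.strip() for o in rest.split(",")]
    signature = tuple(operand_type(o) for o in operands)
    return OP_VARIANTS[(mnemonic, signature)], operands
```

The lookup happens once, when the code is assembled. At run time the engine only ever sees the qualified op, so there is nothing left to decode.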
Heck, we could, if we chose, switch over to a system with a single add op with tagged parameter types and do runtime decoding without changing the source for the ops at all -- the op preprocessor could glob them all together and autogenerate the big switch/if ladder at the head of the function. (We're not going to, of course, but we could.)
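For contrast, a single "add" op with tagged parameters and runtime decoding might look something like the sketch below (Python for brevity). The per-signature bodies stay exactly as they were; only the if ladder at the top is new. The register layout and function names are hypothetical, and the ladder is written by hand here rather than generated by an op preprocessor.

```python
def add_i_i_i(regs, dest, a, b):
    """Integer variant: I[dest] = I[a] + I[b]."""
    regs["I"][dest] = regs["I"][a] + regs["I"][b]


def add_n_n_n(regs, dest, a, b):
    """Floating-point variant: N[dest] = N[a] + N[b]."""
    regs["N"][dest] = regs["N"][a] + regs["N"][b]


def add_tagged(regs, tagged_operands):
    """A single 'add' op that decodes its operand tags on every call.

    tagged_operands is a list of (tag, register index) pairs.
    """
    tags = tuple(tag for tag, _ in tagged_operands)
    indexes = [index for _, index in tagged_operands]
    if tags == ("I", "I", "I"):
        add_i_i_i(regs, *indexes)
    elif tags == ("N", "N", "N"):
        add_n_n_n(regs, *indexes)
    else:
        raise ValueError("no add variant for signature %r" % (tags,))
```

The tag comparison runs every time the op executes, which is the cost the static scheme pays only once, at assemble time.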
As for what the rationale is... well, it's a combination of whim and necessity for adding them, and brutal reality for deleting them.
Our ops fall into two basic categories. The first, like add, are just basic operations that any engine has to perform. The second, like time, are low-level library functions.
For something like hardware, splitting standard library from the CPU makes sense -- often the library requires resources that the hardware doesn't have handy. Hardware is also often bit-limited -- opcodes need to fit in 8 or 9 bits.
Parrot, on the other hand, *isn't* bit-limited, since our ops are 32 bits. (A more efficient design on RISC systems where byte-access is expensive.) That opens things up a bunch.
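As a small illustration of what 32-bit ops buy you, the sketch below (Python for brevity) packs an op stream as 32-bit words. The op number 1000 and the little-endian byte order are arbitrary choices for the example, not Parrot's actual bytecode format.

```python
import struct


def encode_ops(words):
    """Pack a list of op numbers and operands as 32-bit unsigned words."""
    return struct.pack("<%dI" % len(words), *words)


def decode_ops(data):
    """Unpack a byte string produced by encode_ops back into integers."""
    count = len(data) // 4
    return list(struct.unpack("<%dI" % count, data))
```

An op number like 1000 fits comfortably in one word, whereas an 8-bit opcode field tops out at 255. Every op and operand also sits on a 4-byte boundary, so decoding needs no byte-at-a-time access.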
If you think about it, the core opcode functions and the core low-level libraries are *always* available. Always. The library functions also have a very fixed parameter list. Fixed parameter list, guaranteed availability... looks like an opcode function to me. So they are. We could make them library functions instead, but all that'd mean would be that they'd be more expensive to call (our sub/method call is a bit heavyweight) and that you'd have to do more work to find and call the functions. Seemed silly.
Or, I suppose, you could think of it as if we had *no* opcodes at all other than end and loadoplib. Heck, we've a loadable opcode system -- it'd not be too much of a stretch to consider all the opcode functions other than those two as just functions with a fast-path calling system. The fact that a whole bunch of 'em are available when you start up's just a convenience for you.
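That way of looking at it can be sketched as a toy interpreter whose only built-in ops are end and loadoplib, with everything else (including a library-style op like time) arriving through a loaded op library. The sketch is in Python for brevity. The class, the op library layout, and the program encoding are all hypothetical and say nothing about how Parrot's real loadable opcode system is implemented.

```python
import time


class TinyVM:
    """Toy interpreter whose only built-in ops are 'end' and 'loadoplib'."""

    def __init__(self, oplibs):
        self.oplibs = oplibs          # library name -> {op name: function}
        self.regs = {}
        self.running = False
        self.ops = {"end": self.op_end, "loadoplib": self.op_loadoplib}

    def op_end(self):
        self.running = False

    def op_loadoplib(self, name):
        # Loaded ops go into the same table as the built-in ones, so they
        # are dispatched in exactly the same way afterwards.
        for op_name, func in self.oplibs[name].items():
            self.ops[op_name] = lambda *args, f=func: f(self, *args)

    def run(self, program):
        self.running = True
        for op_name, *args in program:
            if not self.running:
                break
            self.ops[op_name](*args)
        return self.regs


def op_set(vm, dest, value):
    vm.regs[dest] = value


def op_add(vm, dest, a, b):
    vm.regs[dest] = vm.regs.get(a, 0) + vm.regs.get(b, 0)


def op_time(vm, dest):
    # A low-level library function exposed as an ordinary op.
    vm.regs[dest] = int(time.time())


CORE_OPLIB = {"set": op_set, "add": op_add, "time": op_time}


def demo():
    vm = TinyVM({"core": CORE_OPLIB})
    program = [
        ("loadoplib", "core"),
        ("set", "I1", 2),
        ("set", "I2", 3),
        ("add", "I0", "I1", "I2"),
        ("time", "I3"),
        ("end",),
        ("set", "I0", 99),            # never reached: 'end' stops the run
    ]
    return vm.run(program)
```

Once the library is loaded, add and time are looked up and called the same way as end, which is the sense in which they are "just functions with a fast-path calling system".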
See http://www.nntp.perl.org/group/perl.perl6.internals/22003 for more details.
April Fool's Joke: http://www.perl.com/pub/a/2001/04/01/parrot.htm
apocalypses: http://dev.perl.org/perl6/apocalypse/
exegeses: http://dev.perl.org/perl6/exegesis/
synopses: http://dev.perl.org/perl6/synopsis/
Java bytecode to Parrot bytecode: http://archive.develooper.com/perl6-internals@perl.org/msg03864.html
http://www.perl.com/pub/a/2000/10/23/soto2000.html
be there: http://www.csse.monash.edu.au/~damian/papers/#Superpositions
Really.: http://developer.apple.com/techpubs/mac/PPCSoftware/PPCSoftware-13.html
The FAQ is now in version control and "Revision" isn't really being tracked. The most recent SVN ID is $Id$