docs/pdds/pdd07_codingstd.pod - Conventions and Guidelines for Parrot Source Code
$Revision$
This document describes the various rules, guidelines and advice for those wishing to contribute to the source code of Parrot, in such areas as code structure, naming conventions, comments etc.
One of the criticisms of Perl 5 is that its source code is impenetrable to newcomers, due to such things as inconsistent or obscure variable naming conventions, lack of comments in the source code, and so on. We don't intend to make the same mistake when writing Parrot. Hence this document.
We define three classes of conventions:
Note that since Parrot is substantially implemented in C, these rules apply to C language source code unless otherwise specified.
In addition, C code may assume that any pointer value can be coerced to an integral type (no smaller than typedef INTVAL in Parrot), then back to its original type, without loss.
C code that makes assumptions beyond these must depend on the configuration system, either to not compile an entire non-portable source where it will not work, or to provide an appropriate #ifdef macro.
{{ TODO: Enumerate all other non-C89 assumptions that Parrot depends on. }}
Perl code may use features not available in Perl 5.6.1 only if it is not vital to Parrot, and if it uses $^O and $] to degrade or fail gracefully when it is run where the features it depends on are not available.
The following must apply:
The } that closes an if must line up with the if.
Cuddled elses are forbidden: i.e. avoid } else {.
The following should apply:
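Write pointer declarations as Interp *foo, but not e.g. Interp* foo.
Put a single space after keywords that are followed by parentheses, e.g. return (x+y)*2. There should be no space between function names and following open parentheses, e.g. z = foo(x+y)*2.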
Binary operators (except . and ->) should have at least one space on either side; there should be no space between unary operators and their operands; parentheses should not have space immediately after the opening parenthesis nor immediately before the closing parenthesis; commas should have at least one space after, but not before; e.g.: x = (a-- + b) * f(c, d / e.f)
Use vertical alignment to show parallelism. Compare this (bad):
foo = 1 + 100;
x = 100 + 1;
whatever = 100 + 100;
... to this (good):
foo      =   1 + 100;
x        = 100 +   1;
whatever = 100 + 100;
(Note that formatting consistency trumps this rule. For example, a long if/else if chain is easier to read if all (or none) of the conditional code is in blocks.)
{{ TODO: Modify parrot.el so this rule is no longer required. }}
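Assignments inside a conditional take extra parentheses, e.g. if (a && (b = c)) ... or if ((a = b)) ....
When a long line is split at a binary operator, the operator starts the continued line, e.g.:
z = foo(bar + baz(something_very_long_here
                  * something_else_very_long),
        corge);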
The following must apply:
C code must use C-style comments only, i.e. /* comment */. (Not all C compilers handle C++-style comments.)
The following should apply:
Structure types should have typedefs with the same name as their tags, e.g.:
typedef struct Foo {
    ...
} Foo;
Avoid double negatives, e.g. #ifndef NO_FEATURE_FOO.
Write tests as boolean expressions, e.g. if (!foo) ....
(Note: PMC * values should be checked for nullity with the PMC_IS_NULL macro, unfortunately leading to violations of the double-negative rule.)
All developers using Emacs must ensure that their Emacs instances load the elisp source file editor/parrot.el before opening Parrot source files. See "README.pod" in editor for instructions.
All source files must end with an editor instruction coda:
/*
 * Local variables:
 *   c-file-style: "parrot"
 * End:
 * vim: expandtab shiftwidth=4:
 */

# Local Variables:
#   mode: cperl
#   cperl-indent-level: 4
#   fill-column: 100
# End:
# vim: expandtab shiftwidth=4:
{{ XXX - Proper formatting and syntax coloring of C code under Emacs requires that Emacs know about typedefs. We should provide a simple script to update a list of typedefs, and parrot.el should read it or contain it. }}
Parrot runs on many, many platforms, and will no doubt be ported to ever more bizarre and obscure ones over time. You should never assume an operating system, processor architecture, endian-ness, size of standard type, or anything else that varies from system to system.
Since most of Parrot's development uses GNU C, you might accidentally depend on a GNU feature without noticing. To avoid this, know what features of gcc are GNU extensions, and use them only when they're protected by #ifdefs.
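As a minimal sketch of such a guard, the fragment below wraps one GNU extension, the noreturn function attribute, in a macro that expands to nothing on other compilers. SKETCH_NORETURN, sketch_fatal() and sketch_checked_div() are hypothetical names chosen for illustration and are not taken from the Parrot tree; a real build would probably key the test off the configuration system rather than off __GNUC__ directly.

```c
#include <stdio.h>
#include <stdlib.h>

/* SKETCH_NORETURN is a hypothetical macro, not part of Parrot.
 * It carries the GNU attribute only when a GNU-compatible compiler is in use. */
#ifdef __GNUC__
#  define SKETCH_NORETURN __attribute__((noreturn))
#else
#  define SKETCH_NORETURN
#endif

static void sketch_fatal(const char *msg) SKETCH_NORETURN;

/* Report an unrecoverable error and terminate the program. */
static void
sketch_fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Integer division that refuses to divide by zero. */
static int
sketch_checked_div(int a, int b)
{
    if (b == 0)
        sketch_fatal("division by zero");
    return a / b;
}
```

The rest of the code uses only the macro, so a compiler without the extension sees plain C.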
C arrays, including strings, are very sharp tools without safety guards, and Parrot is a large program maintained by many people. Therefore:
Don't use a char * when a Parrot STRING would suffice. Don't use a C array when a Parrot array PMC would suffice. If you do use a char * or C array, check and recheck your code for even the slightest possibility of buffer overflow or memory leak.
Note that efficiency of some low-level operations may be a reason to break this rule. Be prepared to justify your choices to a jury of your peers.
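Where a raw char * buffer really is needed, one way to keep the overflow check visible is to pass the capacity alongside the pointer and refuse any copy that does not fit. The helper below is only a sketch; sketch_copy_checked() is a hypothetical name, not a Parrot function.

```c
#include <string.h>

/* Copy src into dst, whose capacity is dst_size bytes including the
 * terminating NUL. Returns 1 on success. Returns 0, leaving dst untouched,
 * when src (plus its terminator) does not fit. */
static int
sketch_copy_checked(char *dst, size_t dst_size, const char *src)
{
    const size_t len = strlen(src);

    if (len >= dst_size)
        return 0;

    memcpy(dst, src, len + 1);    /* len + 1 also copies the NUL */
    return 1;
}
```

The caller has to handle the failure case explicitly, which is the point: truncation or overflow can no longer happen silently.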
Pass only unsigned char to isxxx() and toxxx()
Pass only values in the range of unsigned char (and the special value -1, a.k.a. EOF) to the isxxx() and toxxx() library functions. Passing signed characters to these functions is a very common error and leads to incorrect behavior at best and crashes at worst. And under most of the compilers Parrot targets, char is signed.
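A short sketch of the usual remedy, which is to cast each char to unsigned char at the call site. The function names sketch_count_alpha() and sketch_upcase() are hypothetical and exist only for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the alphabetic characters in a NUL-terminated string. */
static size_t
sketch_count_alpha(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        /* Cast first: a plain char holding a byte above 127 may be negative. */
        if (isalpha((unsigned char)*s))
            count++;
    }
    return count;
}

/* Upper-case a NUL-terminated string in place. */
static void
sketch_upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Without the cast, a byte such as 0xE9 stored in a signed char would reach isalpha() as a negative value that is neither EOF nor representable as unsigned char, which the C standard leaves undefined.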
The const keyword on arguments
Use the const keyword as often as possible on pointers. It lets the compiler know when you intend to modify the contents of something. For example, take this definition:
int strlen(const char *p);
The const qualifier tells the compiler that the argument will not be modified. The compiler can then tell you that this is an uninitialized variable:
char *p;
int n = strlen(p);
Without the const, the compiler has to assume that strlen() is actually initializing the contents of p.
The const keyword on variables
If you're declaring a temporary pointer, declare it const, with the const to the right of the *, to indicate that the pointer should not be modified.
Wango * const w = get_current_wango();
w->min = 0;
w->max = 14;
w->name = "Ted";
This prevents you from modifying w inadvertently.
new_wango = w++; /* Error */
If you're not going to modify the target of the pointer, put a const to the left of the type, as in:
const Wango * const w = get_current_wango();
if (n < w->min || n > w->max) {
    /* do something */
}
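The two placements of const can be seen side by side in a compilable sketch. Sketch_wango, sketch_init() and sketch_in_range() are hypothetical stand-ins for the Wango examples above, whose real definitions this document does not give.

```c
/* Hypothetical stand-in for the Wango type used in the examples above. */
typedef struct Sketch_wango {
    int         min;
    int         max;
    const char *name;
} Sketch_wango;

/* const to the right of the *: the pointer is fixed, the target is writable. */
static void
sketch_init(Sketch_wango * const w)
{
    w->min  = 0;
    w->max  = 14;
    w->name = "Ted";
}

/* const on both sides: neither the pointer nor its target may be modified. */
static int
sketch_in_range(const Sketch_wango * const w, const int n)
{
    return n >= w->min && n <= w->max;
}
```

Adding w++ to either function, or an assignment to w->min in sketch_in_range(), would be rejected at compile time.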
Declare variables in the innermost scope possible.
if (foo) {
    int i;
    for (i = 0; i < n; i++)
        do_something(i);
}
Don't reuse unrelated variables. Localize as much as possible, even if the variables happen to have the same names.
if (foo) {
    int i;
    for (i = 0; i < n; i++)
        do_something(i);
}
else {
    int i;
    for (i = 14; i > 0; i--)
        do_something_else(i * i);
}
You could hoist the int i; outside the test, but then you'd have an i that's visible after it's used, which is confusing at best.
+-------------------------------------------------------+
Everything below this point must still be reviewed
+-------------------------------------------------------+
PERL_IN_CORE.
PERL_IN_FOO so that code knows when it is being used within that subsystem. The file will also contain all the 'convenience' macros used to define shorter working names for functions without the perl prefix (see below).
/* file header comments */
#if !defined(PARROT_<FILENAME>_H_GUARD)
#define PARROT_<FILENAME>_H_GUARD
/* body of file */
#endif /* PARROT_<FILENAME>_H_GUARD */
new_foo_bar rather than NewFooBar or (gasp) newfoobar.
create_foo_from_bar() in preference to ct_foo_bar(). Avoid cryptic abbreviations wherever possible.
pmc_foo(), struct io_bar. They should be further prefixed with the word 'perl' if they have external visibility or linkage, namely, non-static functions, plus macros and typedefs etc which appear in public header files. (Global variables are handled specially; see below.) For example:
perlpmc_foo()
struct perlio_bar
typedef struct perlio_bar Perlio_bar
#define PERLPMC_readonly_TEST ...
In the specific case of the use of global variables and functions within a subsystem, convenience macros will be defined (in foo_private.h) that allow use of the shortened name in the case of functions (i.e. pmc_foo() instead of perlpmc_foo()), and hide the real representation in the case of global variables.
pmc_foo.
Foo_bar. The exception to this is when the first component is a short abbreviation, in which case the whole first component may be made uppercase for readability purposes, eg IO_foo rather than Io_foo. Structures should generally be typedefed.
PMC_foo_FLAG, PMC_bar_FLAG, ....
_FLAG, eg PMC_readonly_FLAG (although you probably want to use an enum instead.)
_TEST, eg if (PMC_readonly_TEST(foo)) ...
_SET, eg PMC_readonly_SET(foo);
_CLEAR, eg PMC_readonly_CLEAR(foo);
_MASK, eg foo &= ~PMC_STATUS_MASK (but see notes on extensibility below).
_SETALL, _CLEARALL, _TESTALL or _TESTANY suffixes as appropriate, to indicate aggregate bits, eg PMC_valid_CLEARALL(foo)
HAS_, eg HAS_BROKEN_FLOCK, HAS_EBCDIC.
IN_, eg PERL_IN_CORE, PERL_IN_PMC, PERL_IN_X2P. Individual include file visitations should be marked with PERL_IN_FOO_H for file foo.h.
USE_, eg PERL_USE_STDIO, USE_MULTIPLICITY.
DECL_, eg DECL_SAVE_STACK. Note that macros which implicitly declare and then use variables are strongly discouraged, unless it is essential for portability or extensibility. The following are in decreasing preference style-wise, but increasing preference extensibility-wise.
{ Stack sp = GETSTACK; x = POPSTACK(sp) ... /* sp is an auto variable */
{ DECL_STACK(sp); x = POPSTACK(sp); ... /* sp may or may not be auto */
{ DECL_STACK; x = POPSTACK; ... /* anybody's guess */
GLOBAL_foo (the name being deliberately clunky). So we might for example have the following macros:
/* perl_core.h or similar */
#ifdef HAS_THREADS
# define GLOBALS_BASE (aTHX_->globals)
#else
# define GLOBALS_BASE (Perl_globals)
#endif
/* pmc_private.h */
#define GLOBAL_foo GLOBALS_BASE.pmc.foo
#define GLOBAL_bar GLOBALS_BASE.pmc.bar
... etc ...
The importance of good code documentation cannot be stressed enough. To make your code understandable by others (and indeed by yourself when you come to make changes a year later :-), the following conventions apply to all source files.
del_I_foo()
/*
=item C<function(arguments)>
Description.
=cut
*/
% perl tools/docs/write_docs.pl -s
/* The loop is partially unrolled here as it makes it a lot faster.
 * See the .dev file for the full details
 */
if (FOO_bar_BAZ(**p+*q) <= (r-s[FOZ & FAZ_MASK]) || FLOP_2(z99)) {
    /* we're in foo mode: clean up lexicals */
    ... (20 lines of gibberish) ...
}
else if (...) {
    /* we're in bar mode: clean up globals */
    ... (20 more lines of gibberish) ...
}
else {
    /* we're in baz mode: self-destruct */
    ....
}
If Perl 5 is anything to go by, the lifetime of Perl 6 will be at least seven years. During this period, the source code will undergo many major changes never envisaged by its original authors, cf. threads and Unicode in Perl 5. To this end, your code should balance the assumptions that make things possible, fast or small against the assumptions that make it difficult to change things in future. This is especially important for parts of the code which are exposed through APIs: the requirements of source or binary compatibility for such things as extensions can make it very hard to change things later on.
For example, if you define suitable macros to set/test flags in a struct, then you can later add a second word of flags to the struct without breaking source compatibility. (Although you might still break binary compatibility if you're not careful.) Of the following two methods of setting a common combination of flags, the second doesn't assume that all the flags are contained within a single field:
foo->flags |= (FOO_int_FLAG | FOO_num_FLAG | FOO_str_FLAG);
FOO_valid_value_SETALL(foo);
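To make the second form concrete, here is one way such macros might be defined, following the suffix scheme from the naming section. Sketch_foo, the three FOO_..._FLAG values and every macro below are hypothetical; the original document names FOO_valid_value_SETALL but does not show its definition.

```c
/* Hypothetical structure: today all flags live in a single word. */
typedef struct Sketch_foo {
    unsigned int flags;
} Sketch_foo;

/* Flag bits, declared as an enum as the naming section suggests. */
enum {
    FOO_int_FLAG = 1 << 0,
    FOO_num_FLAG = 1 << 1,
    FOO_str_FLAG = 1 << 2
};

#define FOO_valid_value_MASK (FOO_int_FLAG | FOO_num_FLAG | FOO_str_FLAG)

/* Single-bit accessors. */
#define FOO_int_TEST(f)   (((f)->flags & FOO_int_FLAG) != 0)
#define FOO_int_SET(f)    ((f)->flags |= FOO_int_FLAG)
#define FOO_int_CLEAR(f)  ((f)->flags &= ~(unsigned int)FOO_int_FLAG)

/* Aggregate accessors: callers never mention the flags field itself. */
#define FOO_valid_value_SETALL(f) \
    ((f)->flags |= FOO_valid_value_MASK)
#define FOO_valid_value_CLEARALL(f) \
    ((f)->flags &= ~(unsigned int)FOO_valid_value_MASK)
#define FOO_valid_value_TESTALL(f) \
    (((f)->flags & FOO_valid_value_MASK) == FOO_valid_value_MASK)
#define FOO_valid_value_TESTANY(f) \
    (((f)->flags & FOO_valid_value_MASK) != 0)
```

If a second flags word were added later, only these definitions would need to change; code written against the macros would recompile unmodified.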
Similarly, avoid using a char* (or {char*,length}) if it is feasible to later use a PMC * at the same point: cf UTF-8 hash keys in Perl 5.
Of course, private code hidden behind an API can play more fast and loose than code which gets exposed.
We want Perl to be fast. Very fast. But we also want it to be portable and extensible. Based on the 90/10 principle (or 80/20, or 95/5, depending on who you speak to), most performance is gained or lost in a few small but critical areas of code. Concentrate your optimization efforts there.
Note that the most overwhelmingly important factor in performance is in choosing the correct algorithms and data structures in the first place. Any subsequent tweaking of code is secondary to this. Also, any tweaking that is done should as far as possible be platform independent, or at least likely to cause speed-ups in a wide variety of environments, and do no harm elsewhere. Only in exceptional circumstances should assembly ever even be considered, and then only if generic fallback code is made available that can still be used by all other non-optimized platforms.
Probably the dominant factor (circa 2001) that affects processor performance is the cache. Processor clock rates have increased far in excess of main memory access rates, and the only way for the processor to proceed without stalling is for most of the data items it needs to be found to hand in the cache. It is reckoned that even a 2% cache miss rate can cause a slowdown in the region of 50%. It is for this reason that algorithms and data structures must be designed to be 'cache-friendly'.
A typical cache may have a block size of anywhere between 4 and 256 bytes. When a program attempts to read a word from memory and the word is already in the cache, then processing continues unaffected. Otherwise, the processor is typically stalled while a whole contiguous chunk of main memory is read in and stored in a cache block. Thus, after incurring the initial time penalty, you then get all the memory adjacent to the initially read data item for free. Algorithms that make use of this fact can experience quite dramatic speedups. For example, the following pathological code ran four times faster on my machine by simply swapping i and j.
int a[1000][1000];
... (a gets populated) ...
int i, j, k = 0;
for (i = 0; i < 1000; i++) {
    for (j = 0; j < 1000; j++) {
        k += a[j][i];
    }
}
This all boils down to: keep things near to each other that get accessed at around the same time. (This is why the important optimizations occur in data structure and algorithm design rather than in the detail of the code.) This rule applies both to the layout of different objects relative to each other, and to the relative positioning of individual fields within a single structure.
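For anyone who wants to try the comparison, the sketch below places the two traversal orders side by side. All names are hypothetical, and the array is given static storage (an assumption made here so that roughly 4 MB does not land on the stack). Both functions compute the same sum; only the order of memory accesses differs.

```c
#define SKETCH_N 1000

static int sketch_a[SKETCH_N][SKETCH_N];

/* Populate the array with small, deterministic values. */
static void
sketch_fill(void)
{
    int i, j;

    for (i = 0; i < SKETCH_N; i++)
        for (j = 0; j < SKETCH_N; j++)
            sketch_a[i][j] = (i + j) % 7;
}

/* Pathological order: consecutive reads are a whole row apart in memory. */
static long
sketch_sum_column_first(void)
{
    long k = 0;
    int  i, j;

    for (i = 0; i < SKETCH_N; i++)
        for (j = 0; j < SKETCH_N; j++)
            k += sketch_a[j][i];
    return k;
}

/* Cache-friendly order: consecutive reads are adjacent in memory. */
static long
sketch_sum_row_first(void)
{
    long k = 0;
    int  i, j;

    for (i = 0; i < SKETCH_N; i++)
        for (j = 0; j < SKETCH_N; j++)
            k += sketch_a[i][j];
    return k;
}
```

Wrapping each call in clock() from <time.h> gives a rough timing. The size of the gap depends on the machine, the compiler and the optimization level, so no particular ratio should be expected.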
If you do put an optimization in, time it on as many architectures as you can, and be suspicious of it if it slows down on any of them! Perhaps it will be slow on other architectures too (current and future). Perhaps it wasn't so clever after all? If the optimization is platform specific, you should probably put it in a platform-specific function in a platform-specific file, rather than cluttering the main source with zillions of #ifdefs.
And remember to document it.
Loosely speaking, Perl tends to optimize for speed rather than space, so you may want to code for speed first, then tweak to reclaim some space while not affecting performance.
The section on coding style is based on Perl5's Porting/patching.pod by Daniel Grisinger. The section on naming conventions grew from some suggestions by Paolo Molaro <lupus@lettere.unipd.it>. Other snippets came from various P5Pers. The rest of it is probably my fault.
Maintainer: Dave Mitchell <davem@fdgroup.com>
Class: Internals
PDD Number: 7
Version: 1
Status: Developing
Last Modified: 6 August 2001
PDD Format: 1
Language: English
Based on an earlier draft which covered only code comments.
None. First version