Cage Cleaner Guide
From docs/project/roles_responsibilities.pod:
Fixes failing tests, makes sure coding standards are implemented, reviews documentation and examples. A class of tickets in the tracking system (Trac) has been created for use by this group. This is an entry level position, and viewed as a good way to get familiar with parrot internals.
Testing Parrot after making a code cleaning change
To be really, really sure you're not breaking anything after doing code cleaning or attending to the newspaper at the bottom of our Parrot's cage, here is the process I (ptc) go through before committing a new change:
make realclean > make_realclean.out 2>&1
perl Configure.pl > perl_configure.out 2>&1
make buildtools_tests > buildtools_tests.out 2>&1
make test > make_test.out 2>&1
Then I diff the *.out files with copies of the *.out files I made on a previous test run. If the diffs show nothing nasty is happening, you can be more sure that you've not broken anything and can commit the change. Then rename the *.out files to something like *.out.old so that you maintain reasonably up to date references for the diffs.
This process should be put into a script and stored somewhere...
Parrot Cage Cleaners high-level goals
Smoke testing on many platforms with many compilers
The more platforms we have, the more likely we are to find portability problems. Parrot has to be the most portable thing we've created.
More platforms also means more compilers. Maybe your DEC compiler is more picky than gcc, and spews more warnings. Good! More opportunities for cleaning!
icc
icc is the Intel C/C++ Compiler and is available for free for non-commercial use. To use icc to build parrot, use the following arguments to Configure.pl:
perl Configure.pl --cc=icc --ld=icc
(courtesy of Steve Peters, steve at fisharerojo dot org).
Compiler pickiness
Use as many compiler warnings as we possibly can. The more warnings we enable, the less likely something will pass the watchful eye of the compiler.
Note that warnings may not just be -W flags. Some warnings in gcc only show up when optimization is enabled.
splint
Splint (http://www.splint.org) is a very very picky lint tool, and setup and configuration is a pain. Andy has tried to get Perl 5 running under it nicely, but has met with limited success. Maybe the Parrot will be nicer.
Solaris lint
Sun has made its dev tools freely available at http://developers.sun.com/prodtech/cc/. Its lint is the best one out there, except for Gimpel's FlexeLint (http://www.gimpel.com/html/flex.htm), which costs many dollars.
Enforcing coding standards, naming conventions, etc
- Automatic standards checking
- const checking
The docs in filename here explain what our code should look like. Write something that automatically validates it in a .t file.
Declaring variables as const wherever possible lets the compiler do lots of checking that wouldn't normally be possible. Walk the source code adding the const qualifier wherever possible. The biggest bang is always in passing pointers into functions.
Why consting is good
In Perl, we have the use constant pragma to define unchanging values. The Readonly module extends this to allow arrays and hashes to be non-modifiable as well.
In C, we have const numbers and pointers, and using them wherever possible lets us put safety checks in our code, and the compiler will watch over our shoulders.
const numbers
The easiest way to use the const qualifier is by flagging numbers that are set at the top of a block. For example:
int max_elements;
max_elements = nusers * ELEMENTS_PER_USER;
...
array[max_elements++] = n;
/* but you really meant array[max_elements] = n++; */
Adding a const qualifier means you can't accidentally modify max_elements.
const int max_elements = nusers * ELEMENTS_PER_USER;
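As a minimal sketch, the same idea can be shown as a complete function. The function name `last_slot_index` and the value given to `ELEMENTS_PER_USER` are assumptions made for illustration, not Parrot code.

```c
#include <assert.h>

#define ELEMENTS_PER_USER 4  /* hypothetical value, for illustration only */

/* Returns the index of the last usable slot for nusers users.
 * max_elements is computed once and qualified const, so any later
 * attempt to modify it is rejected at compile time. */
int
last_slot_index(int nusers)
{
    const int max_elements = nusers * ELEMENTS_PER_USER;

    assert(nusers > 0);

    /* max_elements++;   <- would not compile: max_elements is read-only */

    return max_elements - 1;
}
```

Uncommenting the increment should turn the typo from the earlier example into a compile error rather than a silent bug.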
const pointers
If a pointer is declared as pointing to const data, then the data it points to cannot be modified through that pointer. This lets the compiler protect you from doing naughty things to yourself.
Here are two examples of functions you're familiar with:
size_t strlen( const char *str );
void *memset( void *ptr, int value, size_t length );
In the case of strlen, the caller is guaranteed that any string passed in won't be modified. How terrible it would be if it was possible for strlen to modify what gets passed in!
The const on strlen's parameter also lets the compiler know that strlen can't be initializing what's passed in. For example:
char buffer[ MAX_LEN ];
int n = strlen( buffer );
The compiler knows that buffer hasn't been initialized, and that strlen can't be initializing it, so the call to strlen is on an uninitialized value.
Without the const, the compiler assumes that the contents of any pointer are getting initialized or modified.
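The following sketch shows what a const pointer parameter looks like in a function of our own. The function `count_char` is hypothetical and only illustrates the pattern; it is not part of Parrot.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often ch occurs in the NUL-terminated string str.
 * The const on str promises callers that the string is only read. */
size_t
count_char(const char *str, char ch)
{
    size_t count = 0;

    assert(str != NULL);

    for (; *str != '\0'; str++) {
        /* *str = ' ';   <- would not compile: str points to const char */
        if (*str == ch)
            count++;
    }
    return count;
}
```

Note that the pointer itself may still be advanced with str++; only the characters it points to are protected.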
const arrays
Consting arrays makes all the values in the array non-modifiable.
const int days_per_month[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
You don't want to be able to do days_per_month[1] = 4;, right? (We'll ignore that about 25% of the time you want days_per_month[1] to be 29.)
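One way to handle the leap-year case without giving up the const array is to leave the table untouched and special-case February in a lookup function. This is only a sketch; the function name `days_in_month` and its 1-based month argument are assumptions for illustration.

```c
#include <assert.h>

static const int days_per_month[] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Returns the number of days in the given month (1..12) of the given
 * Gregorian year. The const table is only ever read. */
int
days_in_month(int year, int month)
{
    const int is_leap =
        (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);

    assert(month >= 1 && month <= 12);

    if (month == 2 && is_leap)
        return 29;
    return days_per_month[month - 1];
}
```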
Mixing consts
Combining const
s on a pointer and its contents can get confusing. It's important to know on which side of the asterisk that the const
lies.
To the left of the asterisk, the characters are constant. To the right of the asterisk, the pointer is constant.
Note the difference between a pointer to constant characters:
/* Pointer to constant characters */
const char *str = "Don't change me.";
str++; /* legal, now points at 'o' */
*str = 'x'; /* not legal */
and a constant pointer to characters:
/* Constant pointer to characters */
char * const str = buffer;
str++; /* not legal */
*str = 'x'; /* buffer[0] is now 'x' */
Note which side of the asterisk the const is on in each case.
You can also combine the two, with a constant pointer to constant characters:
const char * const str = "Don't change me";
or even an array of constant pointers to constant characters:
const char * const days[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
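As a sketch of how such a fully const table is used, the hypothetical function `day_name` below (not Parrot code) hands out entries of the table. Its return type has to be a pointer to const char, because the strings themselves are read-only.

```c
#include <assert.h>
#include <stddef.h>

static const char * const days[] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};

/* Returns the abbreviated name for weekday 0..6, where 0 is Sunday. */
const char *
day_name(int weekday)
{
    assert(weekday >= 0 && weekday < (int)(sizeof days / sizeof days[0]));

    /* days[weekday] = "Xyz";    <- would not compile: the pointers are const   */
    /* days[weekday][0] = 'x';   <- would not compile: the characters are const */

    return days[weekday];
}
```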
If you see a declaration you don't understand, use cdecl. It's standard in many C compiler suites, and is freely available around the net.
$ cdecl
Type `help' or `?' for help
cdecl> explain const char * str;
declare str as pointer to const char
cdecl> explain char * const str;
declare str as const pointer to char
Decreasing the amount of repeated code
PMD (http://pmd.sourceforge.net/) has been used on C code, even though it's a Java tool. It looks for repeated strings of tokens that are candidates for either functions or macros.
PMD usage
General usage:
pmd [directory] [report format] [ruleset file]
To generate html output of unused code within parrot use:
pmd . html rulesets/unusedcode.xml > unused_code.html
Also distributed with PMD is the CPD (Copy/Paste Detector), which finds duplicate code. An easy way to get started with this tool is to use the GUI (cpdgui). Set the root source directory to your parrot working directory, and choose the by extension... option of the Language: menu. Then put .c in the Extension: box and click Go.
Automated source macros
Perl5 has a lot of good source management techniques that we can use.
- Macro for interp argument
- Parrot_xxx macros
A macro for declaring the interpreter argument, and maybe a macro for passing it.
BTW, our Perl experience teaches us that somebody is going to want to make the interpreter a C++ object for Windows environments, and it wouldn't hurt to make that possible, or at least work in that direction, as long as clarity doesn't suffer.
Automated processing that would make a macro to let us write
somefunc(interp,a,b,c)
while the linkage is
Parrot_somefunc(interp,a,b,c)
for namespace cleanup. This is straight out of embed.fnc and proto.h in Perl5.
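A hand-written sketch of what such generated output might amount to is shown below. The `Interp` structure, its `call_count` field, and `example_caller` are hypothetical stand-ins invented for illustration; in the scheme described above the macro would be produced automatically rather than typed in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real interpreter structure. */
typedef struct Interp {
    int call_count;
} Interp;

/* The function is defined and linked under its prefixed name ... */
int
Parrot_somefunc(Interp *interp, int a, int b, int c)
{
    assert(interp != NULL);
    interp->call_count++;
    return a + b + c;
}

/* ... while a macro lets internal code keep using the short name. */
#define somefunc(interp, a, b, c) Parrot_somefunc((interp), (a), (b), (c))

/* Internal code writes the short form; the linker only ever sees
 * the Parrot_ prefixed symbol. */
int
example_caller(Interp *interp)
{
    return somefunc(interp, 1, 2, 3);
}
```

Only the prefixed symbol is exported, so the short name cannot clash with symbols in programs that embed the library.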
Automated generation of C headers
This has started significantly with the headerizer.pl program. Right now, it extracts the function headers correctly, but now I have to have it create the .h files.
Creating automated code checking tools
Documenting function behavior and structure members
Developing coverage tools
Automatically running the coverage tools
Run on many different C compilers
Most of Andy's work right now is with GCC 4.2 on Linux. We need many more.
Run under valgrind
Valgrind (http://valgrind.org/) is a profiler/debugger most notable for the way it magically monitors memory accesses and management.
To run parrot under Valgrind, the following argument set should be helpful:
valgrind --num-callers=500 \
    --leak-check=full --leak-resolution=high --show-reachable=yes \
    parrot --leak-test
(adapted from a post to parrot-porters by chromatic).
IMCC cleanup
From #parrot:
vsoni: there seems to be some dead code/feature....I had a chat with leo and I am going to send and email to p6i for deprecation of certain old features
Help other contributors hack their patches into Parrot-style industrial-strength C code.
From chip's comment at http://www.oreillynet.com/onlamp/blog/2006/07/calling_for_parrot_janitors.html
We've just had contributed an improved register allocation implementation, but since the contributor is new to Parrot, there are some style and coding standards issues that need to be worked out. It'd be great if a Cage Cleaner could step up and help our new contributor bang the code into Parrotish form.
Remove usage of deprecated features
The api.yaml file lists features that are deprecated but not yet removed, as well as experimental features. A Trac ticket will document how this deprecated feature is to be replaced. Help prepare for the actual removal of the feature by replacing its usage.
Clean up skipped tests
Parrot has too many skipped tests. Pick a test file with a skipped test, disable the skip() line, then make it pass. The Parrot code may not compile, or you may have to modify it to bring it up to date. The test may not even be useful anymore; we won't know until you try.
If you can make it pass, great!
If you can make it run, great! Make it a TODO test instead.
If neither, please report your findings so that everyone can decide what to do.
Handy configuration tips
Displaying trailing whitespace in vim and emacs
Vim
Add this to your .vimrc:
set list
set listchars=trail:-,tab:\.\ 
NOTE: there is a space character after the last backslash. It is very important!
Contributed by Jerry Gay <jerry dot gay at gmail dot com>.
Emacs
Add this to your .emacs:
(setq-default show-trailing-whitespace t)
Emacs 22 users can highlight tabs like this:
(global-hi-lock-mode 1)
(highlight-regexp "\t")
Contributed by Eric Hanchrow <offby1 at blarg dot net>.
AUTHOR
Paul Cochrane a.k.a. ptc; original document by Andy Lester
SEE ALSO
docs/project/roles_responsibilities.pod, RESPONSIBLE_PARTIES and the list of Cage items in Trac http://trac.parrot.org.