NAME
docs/dev/c_functions.pod - C function decoration guidelines
Overview
Compilers have the ability to detect a wide class of potential errors in code during the compilation phase, especially if certain metadata is provided by the programmer to indicate details about specific operations. This metadata is typically compiler-dependent, but using a system of macros and our existing configuration system, we can instruct the compiler to search for and prevent certain types of errors.
The net result is that errors (or potential errors) can be detected early, at compile time, instead of later at runtime or during "make test".
Headerizer creates function declarations based on function definitions. It scans the source files passed to it and extracts the function declarations. Then it puts them into the appropriate .h file or, in the case of static functions, back into the source file itself.
The headerizer also adds function attributes as specified by the decorations on the source. It's important to decorate functions properly as they are written, so that programs like headerizer can pass on important metadata to the compiler.
Notice that not all of these decorations will have a real effect for all compilers. In some cases, the various macros might be empty placeholders. Also, where it says "compiler", it could also mean "lint or any other static analysis tool like splint."
Function Parameter Decorators
What's a shim?
Think of "shim" as shorthand for "placeholder". It's 64% shorter.
GCC (and lint and splint and other analysis tools) likes to complain if you pass an argument into a function and don't use it. If we know that we're not going to use an argument, we can either remove the argument from the function declaration, or mark it as unused.
Throwing the argument away is not always possible. Usually, it's because the function is one that gets referred to by a function pointer, and all functions of this group must have the same signature.
Consider a function with three args: Interp, Foo and Bar. Maybe a given function doesn't use Foo, but we still have to accept Foo because all the other functions like ours do. In this case we can use the UNUSED(Foo) macro in the body of the function to silence any compiler warnings. UNUSED lets the compiler know that we know the parameter isn't used, and that we haven't just forgotten about it.
UNUSED is for cases where we don't currently use a particular parameter, but we might in the future. If we never will use it, mark it as a SHIM(Foo) in the declaration.
Here's an example:
    void MyFunction(PARROT_INTERP,
                    SHIM(int Foo),  /* Never using Foo */
                    int Bar)
    {
        UNUSED(Bar);  /* We aren't using Bar YET */
        ...
    }
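The function-pointer case described above can be sketched in a few self-contained lines. The SHIM and UNUSED definitions here are stand-ins written for this sketch (Parrot's real definitions live in its own headers and may differ), and Interp, handler_fn, add_handler, negate_handler and handlers are hypothetical names.

```c
#include <assert.h>

/* Stand-in definitions, assumed for this sketch only. */
#if defined(__GNUC__)
#  define SHIM(a) a __attribute__((unused))
#else
#  define SHIM(a) a
#endif
#define UNUSED(a) (void)(a)

typedef struct Interp { int calls; } Interp;  /* hypothetical interpreter */

/* Every handler must share this signature so it fits in the table below. */
typedef int (*handler_fn)(Interp *interp, int foo, int bar);

/* Uses all of its arguments, so nothing needs decorating. */
int add_handler(Interp *interp, int foo, int bar)
{
    assert(interp != 0);
    interp->calls++;
    return foo + bar;
}

/* Never uses foo (a shim) and does not use interp yet (unused for now). */
int negate_handler(Interp *interp, SHIM(int foo), int bar)
{
    UNUSED(interp);
    return -bar;
}

const handler_fn handlers[] = { add_handler, negate_handler };
```

Both handlers can sit in the same table because the decorations do not change the function's type; they only tell the compiler which arguments are ignored on purpose.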
If the interpreter structure in a function is a shim, there is a special macro for that.
Passing Interpreter Pointers
Most of the time, if you need an interpreter in your function, define that argument as PARROT_INTERP. If your interpreter is a shim, then use SHIM_INTERP, not SHIM(PARROT_INTERP).
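As a rough illustration of the two interpreter macros side by side, here is a sketch with stand-in definitions. The struct layout, the macro bodies, and the functions count_op and twice are all assumptions made for this example; check Parrot's headers for the real definitions.

```c
#include <assert.h>

/* Hypothetical interpreter type, for this sketch only. */
typedef struct parrot_interp_t { int op_count; } *Parrot_Interp;

/* Stand-in macro bodies, assumed for this sketch. */
#define PARROT_INTERP Parrot_Interp interp
#if defined(__GNUC__)
#  define SHIM_INTERP Parrot_Interp interp_unused __attribute__((unused))
#else
#  define SHIM_INTERP Parrot_Interp interp_unused
#endif

/* Uses the interpreter, so it takes PARROT_INTERP. */
int count_op(PARROT_INTERP, int weight)
{
    assert(interp != 0);
    interp->op_count += weight;
    return interp->op_count;
}

/* Keeps the interpreter slot in its signature but never reads it. */
int twice(SHIM_INTERP, int n)
{
    return 2 * n;
}
```

Callers pass an interpreter to both functions in the same way; only the second one promises not to touch it.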
What are input and output arguments?
Pointers are dangerous because they are so versatile. You can pass a pointer to a function only to have that function modify the data that the pointer is pointing to. In Parrot, we decorate all our pointer parameters with keywords like ARGIN, ARGOUT, and ARGMOD to specify whether the pointer is an input only, an output only, or is modified.
Input pointers are pointers which are read, but the data they point to is not changed. The data after the function call is the same as the data you had before it. If you specify a parameter is an input parameter, and the function tries to modify its contents anyway, you'll get a warning. Also, if you pass in an uninitialized pointer, the compiler will throw a warning.
Output pointers are pointers that are passed into a function that ignores their existing contents and creates new contents for them. It's called an output because the data in the pointed-to structure is populated inside the function and passed back out to the caller. If you have a pointer that points to valid data and you pass it as an ARGOUT parameter, the compiler will throw a warning. Unlike input arguments, you can typically pass an uninitialized pointer as an ARGOUT parameter.
Modifiable, or "in-out", parameters have both behaviors: some fields of the pointed-to data are read, and some are changed.
Please note that these are only to be used on pointer types. If you're not absolutely sure that the argument is a pointer, don't use them. (The "va_list" builtin type is sometimes a pointer, sometimes a struct, depending on platform and compiler.)
Here's a simple example of a function that uses these modifiers:
void MyFunction(PARROT_INTERP, ARGIN(char *Foo), ARGOUT(int *Bar), ARGMOD(float *Baz));
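To make the three roles concrete, here is a small sketch in which each pointer parameter plays exactly one of them. The macros are defined as empty placeholders for this example (an assumption; real builds may attach compiler or lint annotations), and Stats and sum_values are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins: placeholders that leave the declaration unchanged. */
#define ARGIN(x)  x
#define ARGOUT(x) x
#define ARGMOD(x) x

typedef struct Stats { size_t calls; } Stats;  /* hypothetical */

void sum_values(ARGIN(const int *values), size_t count,
                ARGOUT(long *total), ARGMOD(Stats *stats))
{
    long sum = 0;
    size_t i;

    assert(values != NULL && total != NULL && stats != NULL);

    for (i = 0; i < count; i++)
        sum += values[i];   /* input: only read, never written */

    *total = sum;           /* output: old contents ignored and overwritten */
    stats->calls++;         /* in-out: existing value read, then updated */
}
```

Note that the caller may leave the ARGOUT target uninitialized before the call, while the ARGMOD target must already hold valid data.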
NOTNULL(x)
For function arguments and variables that must never have NULL assigned to them, or passed into them. For example, if we were defining strlen() in Parrot, we'd do it as strlen(NOTNULL(const char *p)). All the previous pointer decorations, ARGIN, ARGOUT and ARGMOD, imply NOTNULL. The compiler will throw a warning if it detects a null value being passed to a NOTNULL parameter.
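One way a NOTNULL promise can reach the compiler is through GCC's nonnull function attribute. The sketch below assumes that mapping; ATTR_NONNULL is a hypothetical helper, NOTNULL is reduced to a placeholder, and my_strlen and notnull_usage are illustrative names rather than Parrot functions.

```c
#include <assert.h>
#include <stddef.h>

#define NOTNULL(x) x  /* stand-in: documents intent only */

/* Hypothetical helper that applies the GCC/Clang attribute if available. */
#if defined(__GNUC__)
#  define ATTR_NONNULL(n) __attribute__((nonnull(n)))
#else
#  define ATTR_NONNULL(n)
#endif

/* The declaration carries the attribute, so every caller is checked. */
size_t my_strlen(NOTNULL(const char *p)) ATTR_NONNULL(1);

size_t my_strlen(NOTNULL(const char *p))
{
    size_t n = 0;
    while (p[n] != '\0')
        n++;
    return n;
}

/* A valid string is fine; passing a literal NULL instead would draw a
 * warning from GCC or Clang when -Wnonnull is enabled. */
void notnull_usage(void)
{
    assert(my_strlen("shim") == 4);
}
```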
NULLOK(x)
For function arguments and variables where it's OK to pass in NULL. For example, if we wrote free() in Parrot, it would be free(NULLOK(void *p)). There are variants of ARGIN, ARGOUT, and ARGMOD that allow NULL values: ARGIN_NULLOK, ARGOUT_NULLOK, and ARGMOD_NULLOK. These have the same semantics as their non-NULLOK counterparts, except the compiler will not throw warnings if a null value is passed.
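A function that accepts NULL has to handle it, which is the practical difference the NULLOK variants signal. The sketch below uses placeholder macro definitions (an assumption) and the hypothetical functions length_or_zero and nullok_usage.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins: placeholders that leave the declaration unchanged. */
#define ARGIN_NULLOK(x)  x
#define ARGOUT_NULLOK(x) x

/* Returns the length of s, or 0 when s is NULL. If was_null is not NULL,
 * it is set to 1 when s was NULL and to 0 otherwise. */
size_t length_or_zero(ARGIN_NULLOK(const char *s),
                      ARGOUT_NULLOK(int *was_null))
{
    size_t n = 0;

    if (was_null != NULL)
        *was_null = (s == NULL);
    if (s == NULL)
        return 0;
    while (s[n] != '\0')
        n++;
    return n;
}

void nullok_usage(void)
{
    int was_null = 0;
    assert(length_or_zero("abc", NULL) == 3);      /* optional output omitted */
    assert(length_or_zero(NULL, &was_null) == 0);  /* NULL input is accepted */
    assert(was_null == 1);
}
```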
Function Decorators
In addition to the SHIM, ARGIN, ARGOUT and ARGMOD decorators for parameters, and their variants, there are a number of helpful modifiers that can be applied directly to the function declaration itself.
PARROT_WARN_UNUSED_RESULT
Tells the compiler to warn if the function is called, but the result is ignored. For instance, on a memory allocation function you would want to keep track of the result so that you could free it later and not cause a memory leak.
PARROT_IGNORABLE_RESULT
Tells the compiler that it's OK to ignore the function's return value.
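The contrast between the two result decorators can be sketched with GCC's warn_unused_result attribute. The macro definitions below are stand-ins assumed for this example, and make_counter and bump are hypothetical functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in definitions, assumed for this sketch only. */
#if defined(__GNUC__)
#  define PARROT_WARN_UNUSED_RESULT __attribute__((warn_unused_result))
#else
#  define PARROT_WARN_UNUSED_RESULT
#endif
#define PARROT_IGNORABLE_RESULT  /* documentation only in this sketch */

/* Dropping this result would leak the allocation. */
PARROT_WARN_UNUSED_RESULT
int *make_counter(int start)
{
    int *c = malloc(sizeof *c);
    if (c != NULL)
        *c = start;
    return c;
}

/* The new value is returned as a convenience; callers may ignore it. */
PARROT_IGNORABLE_RESULT
int bump(int *counter)
{
    assert(counter != NULL);
    return ++*counter;
}
```

With these definitions, a bare `make_counter(0);` statement should draw a warning under GCC or Clang, while a bare `bump(c);` should not.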
PARROT_MALLOC
Functions marked with this are flagged as returning newly malloced memory. This lets the compiler do analysis on memory leaks.
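As a sketch of a function that fits this description, here is a string copier whose result is always fresh heap memory. Mapping PARROT_MALLOC to GCC's malloc attribute is an assumption of this example, and copy_string is a hypothetical function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in definition, assumed for this sketch only. */
#if defined(__GNUC__)
#  define PARROT_MALLOC __attribute__((malloc))
#else
#  define PARROT_MALLOC
#endif

/* Returns a freshly allocated copy of s, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
PARROT_MALLOC
char *copy_string(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;      /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```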
PARROT_CONST_FUNCTION
The function is a deterministic one that will always return the same value if given the same arguments, every time. Examples include functions like mod or max. An anti-example is rand(), which returns a different value every time. Some compilers can do optimizations by replacing constant functions with lookup tables, if the results are always going to be the same.
PARROT_PURE_FUNCTION
Less stringent than PARROT_CONST_FUNCTION, these functions only operate on their arguments and the data they point to. These functions have no other side effects to worry about, and clever compilers may find ways to optimize these functions. Examples include strlen() or strchr().
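The difference between a const function and a pure function is easiest to see side by side. This sketch assumes the two macros map to GCC's const and pure attributes; max_int, count_char and purity_usage are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definitions, assumed for this sketch only. */
#if defined(__GNUC__)
#  define PARROT_CONST_FUNCTION __attribute__((const))
#  define PARROT_PURE_FUNCTION  __attribute__((pure))
#else
#  define PARROT_CONST_FUNCTION
#  define PARROT_PURE_FUNCTION
#endif

/* Const: the result depends on the argument values and nothing else. */
PARROT_CONST_FUNCTION
int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Pure: no side effects, but it reads the memory that s points to, so
 * the result can change if that memory changes between calls. */
PARROT_PURE_FUNCTION
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

void purity_usage(void)
{
    assert(max_int(3, 7) == 7);
    assert(count_char("parrot", 'r') == 2);
}
```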
PARROT_DOES_NOT_RETURN
For functions that can't return, like Parrot_exit() or functions that cause exceptions to be thrown. This helps the compiler's flow analysis, which can help detect unreachable code or opportunities for optimization.
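Here is a sketch of how the flow analysis benefits. It assumes PARROT_DOES_NOT_RETURN maps to GCC's noreturn attribute; fatal and checked_div are hypothetical functions, not part of Parrot.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in definition, assumed for this sketch only. */
#if defined(__GNUC__)
#  define PARROT_DOES_NOT_RETURN __attribute__((noreturn))
#else
#  define PARROT_DOES_NOT_RETURN
#endif

/* Prints a message and terminates the process; control never comes back. */
PARROT_DOES_NOT_RETURN
void fatal(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

int checked_div(int num, int den)
{
    if (den != 0)
        return num / den;
    fatal("division by zero");
    /* No return statement is needed here: with the attribute in place,
     * the compiler knows this point is unreachable. */
}
```

Without the decoration, a compiler would typically warn that control reaches the end of a non-void function.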
PARROT_CANNOT_RETURN_NULL
For functions that return a pointer, but the pointer is guaranteed to not be NULL. The compiler can help to detect null pointer dereferences, and this hint will simplify the process.
PARROT_CAN_RETURN_NULL
For functions that return a pointer that could be null. These return values should be tested for null values before they are used or dereferenced.
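The caller-side consequence of these two decorators can be sketched as follows. Both macros are defined as empty placeholders here (an assumption for this example), and type_name, find_suffix and suffix_length are hypothetical functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins: empty placeholders in this sketch. */
#define PARROT_CANNOT_RETURN_NULL
#define PARROT_CAN_RETURN_NULL

/* Always returns a string literal, so callers may use it directly. */
PARROT_CANNOT_RETURN_NULL
const char *type_name(int is_string)
{
    return is_string ? "String" : "Integer";
}

/* Returns a pointer to the last '.' in path, or NULL if there is none. */
PARROT_CAN_RETURN_NULL
const char *find_suffix(const char *path)
{
    assert(path != NULL);
    return strrchr(path, '.');
}

size_t suffix_length(const char *path)
{
    const char *dot = find_suffix(path);
    return dot != NULL ? strlen(dot) : 0;  /* test before dereferencing */
}
```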
PARROT_INLINE
For functions that could be inlined by the compiler for optimization. This is more of a hint than a command, and many compilers might ignore it entirely. Use this instead of the inline keyword.
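One plausible benefit of a macro over the bare keyword is that the macro can expand to whatever spelling the compiler accepts, or to nothing at all. The selection logic below is an assumption made for this sketch, not Parrot's actual configuration, and clamp_int and clamp_percent are hypothetical functions.

```c
#include <assert.h>

/* Stand-in selection logic, assumed for this sketch only. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define PARROT_INLINE inline
#elif defined(__GNUC__)
#  define PARROT_INLINE __inline__
#else
#  define PARROT_INLINE
#endif

/* Small helper that is a good candidate for inlining. */
static PARROT_INLINE int clamp_int(int value, int low, int high)
{
    assert(low <= high);
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

int clamp_percent(int value)
{
    return clamp_int(value, 0, 100);
}
```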
PARROT_EXPORT
For functions that are important API functions.
{{TODO: More detail is needed on this}}
Examples
    PARROT_EXPORT
    PARROT_WARN_UNUSED_RESULT
    INTVAL
    Parrot_str_find_index(PARROT_INTERP, NOTNULL(const STRING *s),
            NOTNULL(const STRING *s2), INTVAL start)
Parrot_str_find_index is part of the Parrot API, and returns an INTVAL. The interpreter is used somewhere in the function. Strings s and s2 cannot be NULL. If the calling function ignores the return value, the compiler warns, because you'd never want to call Parrot_str_find_index() without wanting to know its value.
    PARROT_EXPORT
    PARROT_PURE_FUNCTION
    INTVAL
    parrot_hash_size(SHIM_INTERP, NOTNULL(const Hash *hash))
    {
        return hash->entries;
    }
This function is a pure function because it only looks at its parameters or global memory. The interpreter doesn't get used, but it needs to be passed because all PARROT_EXPORT functions have interpreters passed, so it is flagged as a SHIM_INTERP.
We could put PARROT_WARN_UNUSED_RESULT on this function, but since all PARROT_PURE_FUNCTIONs and PARROT_CONST_FUNCTIONs get flagged that way anyway, there's no need.