Update (2014-12-19): The advice provided in this blog post is questionable and, in fact, probably incorrect. The bug described below must have happened for some unrelated reason (like, maybe, reuse of ap), but at this point (three years later!) I do not really remember what was going on here nor have much interest in retrying.

A long time ago, while I was preparing an ATF release, I faced many failing tests and crashes in one of the platforms under test. My memory told me this was a problem in OpenSolaris, but the repository logs say that the problem really happened in Fedora 8 x86_64.
The problem manifested itself as segmentation faults pretty much everywhere, and I could trace such crashes down to pieces of code like the following, of which the C code of ATF is full of:
void
foo_fmt(const char *fmt, ...)
{
va_list ap;

va_start(ap, fmt);
foo_ap(fmt, ap);
va_end(ap);
}

void
foo_ap(const char *fmt, va_list ap)
{
char buf[128];

vsnprintf(buf, sizeof(buf), fmt, ap);

... now, do something with buf ...
}
The codebase of ATF provides _fmt and _ap variants for many functions to give more flexibility to the caller and, as shown above, the _fmt variant just relies on the _ap variant to do the real work.
Now, the crashes that appeared from the code above seemed to come from the call that consumes the ap argument, which in this case is vsnprintf. Interestingly, though, all the tests in other platforms but Linux x86_64 worked just fine, and this included OpenSolaris, other Linux distributions, some BSDs and even different hardware platforms.
As it turned out, you cannot blindly pass ap arguments around because they are not "normal" parameters (even though, unfortunately, they look like so!). In most platforms, the ap element will be just an "absolute" pointer to the stack, so passing the variable to an inner function calls is fine because the caller's stack has not been destroyed yet and, therefore, the pointer is still valid. But... the ap argument can have other representations. It'd be an offset to the stack instead of a pointer, or it'd be a data structure that holds all the variable parameters. If, for example, the ap argument held an offset, passing it to an inner function call would make such offset point to "garbage" because the stack would have been grown due to the new call frame. (I haven't investigated what specific representation is x86_64 using.)
The solution is to use the va_copy function to generate a new ap object that is valid for the current stack frame. This is easy, so as an example, we have to rewrite the foo_ap function above as follows:
void
foo_ap(const char *fmt, va_list ap)
{
char buf[128];
va_list ap2;

va_copy(ap2, ap);
vsnprintf(buf, sizeof(buf), fmt, ap2);
va_end(ap2);

... now, do something with buf ...
}
This duplication of the ap argument pointing to the variable list of arguments ensures that ap2 can be safely used from the new stack frame.

Subscribe via RSS · Go to posts index

   Delivered by FeedBurner

Comments from the original Blogger-hosted post: