GCC __builtin_apply size parameter

GCC contains some very useful builtin functions in the form of __builtin_apply() and friends, which basically let you do the same as you can with apply() in JavaScript, i.e. package up the list of arguments passed to a function and then use that to call another function. This is dead handy for writing wrappers and loading stubs and shit. But there is one small problem...

You grab the argument list by calling void *__builtin_apply_args() and stashing the return value in a pointer. Fine, so far so hoopy. But when you come to call the other function, which you do by using void *__builtin_apply(void (*function)(), void *arguments, size_t size), this extra parameter size suddenly crops up. This is the size of the argument list arguments which you got from void *__builtin_apply_args(), and now you need to tell __builtin_apply() how big it is. Only you don't know that, because __builtin_apply_args() doesn't tell you, and there aren't any other functions provided that do.
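For reference, the basic forwarding pattern looks something like the sketch below. The names add and add_wrapper are made up for illustration, and the 64 passed as size is exactly the sort of guess this article is complaining about, not a computed value. It assumes GCC on x86_64; other compilers may not provide these builtins at all.

```c
#include <stddef.h>

/* Hypothetical target function, purely for illustration. */
int add(int a, int b)
{
    return a + b;
}

/* Hypothetical wrapper: forwards whatever it was called with to add(). */
int add_wrapper(int a, int b)
{
    /* Save the incoming argument registers and argument pointer. */
    void *args = __builtin_apply_args();

    /* Call add() with a copy of those arguments. The 64 is a GUESS at
       how many bytes of stack arguments to copy, not a measured value. */
    void *result = __builtin_apply((void (*)())add, args, 64);

    /* Return whatever add() returned, as if we had returned it ourselves. */
    __builtin_return(result);
}
```

Calling add_wrapper(2, 3) should then behave like calling add(2, 3) directly. The cast to void (*)() is there because that is the function pointer type __builtin_apply() is declared to take.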


The GCC manual says "It is not always simple to compute the proper value for size." Yeah, no shit, Sherlock. About all it adds is that the value is used to work out how much data to push on the stack and copy from the incoming argument area. It doesn't say a bloody word about how you might actually go about computing it. It doesn't even give any dubiously-useful suggestions or vague and tentative hints. Apparently all you can do is guess, and flicking round the internet it seems that's just what people do do - the few example code snippets that are around just show people pulling a value like 100 or 200 out of their arse, on the grounds that that surely ought to be enough. (And yeah, you'd think it would be, but actually it isn't, as will be seen.) There doesn't seem to be anyone who's come up with anything better than a guess, not even an informed guess based on looking at how the ABI works.

Double fuck.

It's bloody stupid really because __builtin_apply_args() of course does know how much memory it has allocated for the argument list. It's adjusted the stack pointer to, effectively, __builtin_alloca() it on the stack. It could perfectly well have been written as void *__builtin_apply_args(size_t *size) so it could use the pointer parameter to return the adjustment, but it bloody well hasn't.

Dog's cunt.

But there is a kludge. __builtin_alloca() adjusts the stack pointer to allocate memory on the stack, and returns a pointer to the allocated block. So if you ask it for a block of size 0, the return value just is the stack pointer. So you can do this once before you grab the argument list with __builtin_apply_args(), and do it again afterwards, and the difference between the two values is the size of the block __builtin_apply_args() gave itself.

Only this doesn't fucking work because you can't call __builtin_alloca() before you call __builtin_apply_args(). You can write the source code in that order, but GCC reorders it so it does __builtin_apply_args() before it does anything else, even without optimisation turned on, so as to be sure it can save the arguments without anything clobbering them first. So both instances of __builtin_alloca(0) end up happening at the same place and your computed size comes out as 0.

The kludge to fix the kludge is to get the "before" value by using __builtin_frame_address(0), which returns the same value as you'd expect from doing __builtin_alloca(0) right at the beginning but without the possibility of unexpected whatsits fucking it up. For the "after" value of course there isn't a problem.

If you tell it to print out the calculated value, it turns out to be huge. When it says it "saves the arg pointer register, structure value address, and all registers that might be used to pass arguments to a function into a block of memory allocated on the stack", it really means it. It includes the entire block of XMM registers and various other bits of miscellaneous kitchen plumbing. I found I was getting values of 0x160 with optimisation of any level turned on, and 0x180 with it off. Clearly the usual kind of guess of "one or two hundred" is nowhere near enough, although in most cases you can get away with it.
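If you want to see the number for yourself, a probe along these lines will hand it back so you can print it. This is only a sketch: probe_apply_span is a made-up name, it assumes GCC on a target where the stack grows downwards, and the value you get depends on compiler version and flags, so don't expect it to match the figures above exactly.

```c
#include <stddef.h>

/* Hypothetical probe: distance in bytes between this function's frame
   address and the stack pointer after __builtin_apply_args() has run. */
ptrdiff_t probe_apply_span(int a, int b)
{
    char *before = __builtin_frame_address(0);  /* "before" marker */
    void *args = __builtin_apply_args();        /* saves the argument registers */
    char *after = __builtin_alloca(0);          /* "after" marker */

    /* The arguments and the saved block are not otherwise used here. */
    (void)a;
    (void)b;
    (void)args;

    return before - after;
}
```

Printing the result with something like printf("%#tx\n", probe_apply_span(1, 2)) shows what the compiler in front of you actually produces.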

So what it ends up being looks, in stripped and skeletal form, like this:

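    #include <stddef.h>

    void *other_function() { /* whatever */ }

    void *function_that_calls_another_function()
    {
        void *s1, *s2, *aagz;
        size_t sz;

        s1 = __builtin_frame_address(0);   /* "before" value */
        aagz = __builtin_apply_args();
        s2 = __builtin_alloca(0);          /* "after" value */
        sz = (sz = s1 - s2) ? sz : 0x200;  /* fall back to a guess if it comes out as 0 */
        /* the cast is needed because __builtin_apply() takes a void (*)() */
        __builtin_return(__builtin_apply((void (*)())other_function, aagz, sz));
    }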

Note that the fallback value, in case it decides to be a bastard, is 0x200, i.e. 512, not 200...

This works on x86_64, and it ought to also work on anything else that has a sane stack architecture. And replacing s1 - s2 with imaxabs(s1 - s2) (which needs <inttypes.h>) ought to make it "universal" in the sense of being able to work equally well on things where the stack grows upwards. However, I don't have anything other than x86_64 to test it on (at least not without considerably more fucking about than I can even remotely be arsed with) so it's only on x86_64 that I actually know it works.
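The imaxabs() variant could be wrapped up in a little helper like this. apply_size is a hypothetical name, not anything GCC provides, and the same caveat applies: it is a sketch that has not been tried on anything where the stack grows upwards.

```c
#include <inttypes.h>  /* imaxabs(), intmax_t */
#include <stddef.h>

/* Hypothetical helper: distance in bytes between the "before" and "after"
   stack markers, whichever way the stack grows. Falls back to 0x200 (512)
   if the two markers turn out to be the same address. */
size_t apply_size(const void *before, const void *after)
{
    intmax_t d = imaxabs((intmax_t)((const char *)before - (const char *)after));

    return d ? (size_t)d : 0x200;
}
```

In the function above you would then write sz = apply_size(s1, s2); in place of the line that computes sz by hand.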
