Optimised stack direction?

Why does LLVM struggle to optimize the following stack direction check?

#include <stdint.h>

inline int stack_direction() {
int x = 0;
int y = 0;

return uintptr_t(&x) < uintptr_t(&y);
}

int main(int argc, const char * argv) {
return stack_direction();
}

It generates the following assembly:

main: # @main
lea rcx, [rsp - 8]
lea rdx, [rsp - 4]
xor eax, eax
cmp rdx, rcx
setb al
ret

It seems to me it should be possible, because clearly LLVM knows the layout of x and y at compile time.

Shared code: https://godbolt.org/z/ZQKESy

Thanks
Samuel

Hi Samuel,

Why does LLVM struggle to optimize the following stack direction check?

Probably because it's a niche operation. You can't rely on x and y
being laid out in any particular order even during multiple calls to
stack_direction within a single compilation, so there's really not
much you can do with such a comparison. That makes it a rare case.

On the LLVM side, the optimization isn't something that would happen
naturally through generic analyses so someone would have to sit down
specifically to write it into LLVM; and no-one has.

The only way I can think of to reliably detect the direction is using
an __attribute__((noinline)) function to compare locals from two
different, nested frames (even that's iffy though on a semantic
level). If there turned out to be a compelling enough use-case, an
intrinsic could be added to get the result more efficiently.

Cheers.

Tim.

Actually, (uintptr_t)__builtin_frame_address(0) <
(uintptr_t)__builtin_frame_address(1) would probably work too.

Cheers.

Tim.

Thanks for all the detailed replies.

You can’t rely on x and y being laid out in any particular order even during multiple calls to stack_direction within a single compilation

But according to the assembly being generated, there is a fixed layout on the stack. I agree, the code is pretty ambiguous. But it should be well defined within the compiler. If it changes from compilation unit to compilation unit, so be it, at least given some specific instance of the function, it should be a constant.

If there turned out to be a compelling enough use-case, an intrinsic could be added to get the result more efficiently.

We need this operation in Ruby, and generally in coroutines to know how to layout the stack. Some architectures decided it would be a great idea for the stack to grow upwards.

  • Where to put guard page.
  • What should be the initial stack pointer.
  • How to alloca for custom stack allocations.

Thanks
Samuel

Actually, (uintptr_t)__builtin_frame_address(0) < (uintptr_t)__builtin_frame_address(1) would probably work too.

I tried it but it doesn’t seem to generate a constant either. https://godbolt.org/z/Z5HyEu

But according to the assembly being generated, there is a fixed layout on the stack. I agree, the code is pretty ambiguous. But it should be well defined within the compiler.

Only in the sense that the compiler tries to be deterministic between
invocations. Detecting actual stack growth direction from that snippet
is fundamentally broken.

LLVM could perfectly legitimately fix a seed and call rand() to decide
which of x & y to put first. More likely is that inlining and stack
slot colouring would perturb them.

If it changes from compilation unit to compilation unit, so be it, at least given some specific instance of the function, it should be a constant.

Even within a translation unit it can vary. The claim is just about
true if you include static call-stack as part of an "instance", but
you still don't get anything you can reason about in any kind of wider
context.

We need this operation in Ruby, and generally in coroutines to know how to layout the stack. Some architectures decided it would be a great idea for the stack to grow upwards.

None of the uses listed look compelling enough to optimize to me. It
strikes me as an easily cacheable property at best, at worst one the
interpreter/JIT should know statically before it even thinks about
generating code.

I tried it but it doesn't seem to generate a constant either. Compiler Explorer

I wouldn't expect it to. It has a higher probability of representing a
meaningful fact about the platform than the original though.

Cheers.

Tim.

Tim, everything you state makes sense. Thanks for explaining things to me.

Clearly, adding a new intrinsic does’t solve the problem for existing platforms which need to be supported.

That being said:

We are definitely not the only ones in this situation. I’ve been looking at several “broken” implementations over the past few hours to try and understand what is the best approach.

We definitely want to know the stack direction at compile time if possible, because we want to avoid conditional branches and reduce the amount of code on the hot path.

The use case really has nothing to do with the interpreter/JIT but more about implementing green threads and manually created stacks. We currently have a macro in Ruby STACK_DIR_UPPER(x,y) which emits x if stack grows up, and y if stack grows down. For most platforms, it’s resolved at compile time, which is perfect.

If it’s trivial to add something like __builtin_stack_direction(), it could probably help avoid a lot of problems in the future for code that needs to resolve this issue. The most common “general” implementation I’ve seen passes the address of a local to a 2nd function and compares the addresses that way, so it’s very similar to the approach using __builtin_frame_address, but that itself doesn’t get optimised out.

Even if we can use __builtin_frame_address, which is great because it’s supported today, we need to use configure test to determine stack direction if we want it resolved at compile time. So, I wish we can use this approach (as you state it’s probably more meaningful), but that it can also be resolved at compile time by better optimisation.

Kind regards,
Samuel