Stack protector performance

I have a question about the performance of LLVM's stack protector
implementation.

Consider the following C program:

canary.tgz (4.22 KB)

What optimization level are you using?
-O0 is not interesting, and at -O1 the optimizer nukes all the code.

In your example, the stack variable and the stack accesses are optimized away:

% ./build/Release+Asserts/bin/clang -O1 -S -emit-llvm -o - stack.c

define void @canary() nounwind uwtable readnone {
entry:
  ret void
}

define i32 @main() nounwind uwtable readnone {
for.end:
  ret i32 0
}

You need to prepare a more optimizer-resistant benchmark.

--kcc
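
For illustration, here is a minimal sketch of what a more optimizer-resistant benchmark could look like. The attached program is not shown in this thread, so canary_work() and time_canary() are hypothetical names, and the buffer size and loop body are assumptions. The idea is to keep the function out of line and make the stack buffer observable, so that the optimizer can neither inline the call nor delete the array. The noinline attribute and the empty asm statement are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for canary(); the real body is in the attachment
   and is not reproduced here. */
__attribute__((noinline))
unsigned canary_work(unsigned seed)
{
    char buf[64];   /* local char array, the kind of object a canary guards */

    memset(buf, (int)(seed & 0xffu), sizeof buf);

    /* Empty asm that takes the buffer address and clobbers memory, so the
       compiler must assume the stores are read and may be overwritten. */
    __asm__ volatile("" : : "r"(buf) : "memory");

    return (unsigned char)buf[seed % sizeof buf];
}

/* Calls canary_work() iters times and returns the elapsed CPU time in
   seconds. The accumulated sum is stored through sink_out so that the
   loop has a visible result. */
double time_canary(unsigned long iters, unsigned *sink_out)
{
    assert(sink_out != NULL);

    unsigned sum = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iters; ++i)
        sum += canary_work((unsigned)i);
    clock_t end = clock();

    *sink_out = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

One way to use this would be to call time_canary() from main() with a large iteration count and build the file twice, for example once with -O1 -fno-stack-protector and once with -O1 -fstack-protector-all, then compare the two timings over several runs.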

If you compile this with optimizations, then the 'canary()' function should be totally inlined into the 'main()' function. In that case, the cost of the stack protectors will be very small compared to the loop.

-bw

> If you compile this with optimizations, then the 'canary()' function should
> be totally inlined into the 'main()' function. In that case, the cost of
> the stack protectors will be very small compared to the loop.

Yes, I know. I'm just really interested in an explanation of how it is
possible that using canaries results in faster code in the (unoptimized)
binaries I attached to my original message.

If you look at the binaries, you will see that the bodies of canary() are
exactly the same, except that the protected binary has some extra
instructions in the prologue and epilogue. So how can a function that does
exactly the same work plus something extra run faster?
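
To make the "extra stuff" concrete, here is a hand-written C model of what a stack protector conceptually adds around a function body. This is only a sketch and not the code the compiler emits: demo_guard and protected_body() are hypothetical names, and the guard is a fixed constant here, whereas a real implementation loads a random per-process value (exposed on some platforms as __stack_chk_guard) and calls __stack_chk_fail on a mismatch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard value, fixed only for this illustration. */
static uintptr_t demo_guard = 0x5bd1e995u;

unsigned protected_body(unsigned seed)
{
    /* Modeled prologue: copy the guard into a slot in this frame.
       volatile keeps the store and the later reload in the generated code. */
    volatile uintptr_t canary_slot = demo_guard;

    /* Function body, the same with or without protection. */
    char buf[64];
    memset(buf, (int)(seed & 0xffu), sizeof buf);
    unsigned result = (unsigned char)buf[seed % sizeof buf];

    /* Modeled epilogue: reload the slot and compare it with the guard.
       A mismatch means something overwrote the frame, so abort. */
    if (canary_slot != demo_guard)
        abort();

    return result;
}
```

The model shows only that the protected version performs one extra store on entry and one extra load and compare on exit. It does not by itself explain a difference in run time.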