RFC: Better alternative to llvm.frameallocate for use in Windows EH

I realized that WinEH preparation can probably be a lot less invasive than it is currently.

Initially, when I was thinking about recovering the address of an object in a parent stack frame, I thought about it in terms of “let’s allocate something at a fixed offset from ebp to keep things simple”. That line of thinking suggested that we needed this thing to be fundamentally different from a normal alloca. I was going to make sure it was allocated early in the prologue, for example.

However, this never happened, and having a fixed offset isn’t very simple. I ended up ditching the fixed offset and using assembly label assignments to communicate frame index offsets between parent functions and outlined subfunctions. This technique easily generalizes to support referencing an arbitrary number of allocations in the parent frame, and I think we should go ahead and do that.

The current approach has the downside that we take a bunch of vanilla allocas and SSA values in the parent function, mash them into a single allocation, and replace the accesses with GEPs of an intrinsic result. This is a lot of funky looking IR for something that should be really simple. We also already have good isel for accessing allocas, and we lose that when we switch to an intrinsic.

So instead, let’s go back to using normal allocas and “blessing” each of them as escaped allocations that can be referenced from EH helpers. Here’s what it would look like:

define i32 @parent() {
  %a = alloca i32
  %b = alloca i32
  call void (...)* @llvm.frameescape(i32* %a, i32* %b)
  %fp = call i8* @llvm.frameaddress(i32 0)
  call void @helper_func(i8* %fp)
  %a_val = load i32, i32* %a
  %b_val = load i32, i32* %b
  %r = add i32 %a_val, %b_val
  ret i32 %r
}

define void @helper_func(i8* %fp) {
  %a.i8 = call i8* @llvm.framerecover(i8* bitcast (i32 ()* @parent to i8*), i8* %fp, i32 0)
  %b.i8 = call i8* @llvm.framerecover(i8* bitcast (i32 ()* @parent to i8*), i8* %fp, i32 1)
  %a = bitcast i8* %a.i8 to i32*
  %b = bitcast i8* %b.i8 to i32*
  store i32 1, i32* %a
  store i32 2, i32* %b
  ret void
}

declare i8* @llvm.frameaddress(i32)
declare i8* @llvm.framerecover(i8*, i8*, i32)
declare void @llvm.frameescape(...)

In this example, ‘helper_func’ is able to access the frame of ‘parent’. ‘parent’ should return 3.
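
For readers who want to see the mechanism outside of IR, here is a rough C++ model of the escape/recover pairing. It is only a sketch under some loud assumptions: the byte array stands in for the parent's stack frame, the names frame_escape, frame_recover and escape_table are hypothetical, and the runtime table stands in for what would really be constant offsets communicated through assembly label assignments. Nothing here is the actual lowering.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical side table: parent function name -> offsets of its escaped
// locals from the frame pointer. In the real scheme these would be constants
// resolved by the assembler, not a runtime map.
static std::map<std::string, std::vector<std::ptrdiff_t>> escape_table;

// Model of llvm.frameescape: record where each escaped local lives
// relative to the parent's frame pointer. The locals are not moved.
void frame_escape(const std::string &parent, const unsigned char *fp,
                  const std::vector<const unsigned char *> &locals) {
  std::vector<std::ptrdiff_t> offsets;
  for (const unsigned char *p : locals)
    offsets.push_back(p - fp);
  escape_table[parent] = offsets;
}

// Model of llvm.framerecover: parent identity + frame pointer + index
// gives back the address of the idx-th escaped local.
unsigned char *frame_recover(const std::string &parent, unsigned char *fp,
                             std::size_t idx) {
  return fp + escape_table.at(parent).at(idx);
}

// Plays the role of the outlined helper: it only receives the frame pointer.
void helper_func(unsigned char *fp) {
  int one = 1, two = 2;
  std::memcpy(frame_recover("parent", fp, 0), &one, sizeof one);
  std::memcpy(frame_recover("parent", fp, 1), &two, sizeof two);
}

int parent() {
  unsigned char frame[64] = {};  // simulated stack frame of 'parent'
  unsigned char *fp = frame;
  unsigned char *a = frame + 8;  // locals sit wherever frame layout put them
  unsigned char *b = frame + 40;
  frame_escape("parent", fp, {a, b});
  helper_func(fp);
  int a_val, b_val;
  std::memcpy(&a_val, a, sizeof a_val);
  std::memcpy(&b_val, b, sizeof b_val);
  return a_val + b_val;
}
```

The point of the model is that the two locals keep whatever offsets frame layout gave them; only the index passed to the recover call ties the helper to a particular escaped allocation.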

With this, we can outline landingpads without disturbing the parent function nearly as much. It should help -O0 codegen time, since we no longer have to rewrite as many accesses in the parent. It just seems nicer.

We still have the same potential codegen problems in the outlined helpers that we did in the parent function, but recall that the outlined functions only get called when an exception is thrown, i.e. practically never. I’m OK if we have to hack up the IR for the helper functions by sinking lots of @llvm.framerecover calls into all the BBs that use the original alloca. I’m OK if the isel is slightly worse than it could be.

Thoughts?

This seems like a nice IR. This would only actually be formed very late during codegen preparation, right? It’ll kill data-flow optimizations, but if it’s only introduced late, that doesn’t matter.

John.

> This seems like a nice IR.

Completely agree. This is much better than our original idea. I really like
just packing the escaped bits into various arguments of the intrinsic call
without rearranging anything.

> This would only actually be formed very late during codegen preparation,
> right? It’ll kill data-flow optimizations, but if it’s only introduced
> late, that doesn’t matter.

That is my understanding.

Does this plan prevent the allocas used for the formal argument of multiple exception handlers from being coalesced into a single stack location when possible?

This definitely seems like an improvement, and the amount of code that drops away from WinEHPrepare because of this change is definitely a good sign.

-Andy

> Does this plan prevent the allocas used for the formal argument of
> multiple exception handlers from being coalesced into a single stack
> location when possible?

I think it will actually better enable stack space reuse if we ever get
lifetime markers working right for small objects.
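
As a rough illustration of the kind of reuse I mean (a toy sketch, not LLVM's actual stack coloring algorithm): if lifetime markers give each catch object a live range, objects whose ranges don't overlap can share a slot. The Lifetime struct and assign_slots function are made up for this example, and the input is assumed to be sorted by start point.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical live range of one stack object, as a half-open interval
// [start, end) of program points delimited by lifetime markers.
struct Lifetime {
  int start;
  int end;
};

// Greedy slot assignment: give each object the first slot whose previous
// occupant is already dead, otherwise open a new slot.
// Assumes 'objs' is sorted by 'start'.
std::vector<int> assign_slots(const std::vector<Lifetime> &objs) {
  std::vector<int> slot_end;  // end point of the last object placed in each slot
  std::vector<int> result;    // slot index chosen for each object
  for (const Lifetime &o : objs) {
    int chosen = -1;
    for (std::size_t s = 0; s < slot_end.size(); ++s) {
      if (slot_end[s] <= o.start) {
        chosen = static_cast<int>(s);
        break;
      }
    }
    if (chosen < 0) {
      chosen = static_cast<int>(slot_end.size());
      slot_end.push_back(o.end);
    } else {
      slot_end[chosen] = o.end;
    }
    result.push_back(chosen);
  }
  return result;
}
```

With three handler objects whose ranges are disjoint, all three land in slot 0; once ranges overlap, extra slots appear only where they are needed.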

> This definitely seems like an improvement, and the amount of code that
> drops away from WinEHPrepare because of this change is definitely a good
> sign.

Cool, thanks for looking. :)