[WinEH] Cloning blocks that reference non-cloned PHI nodes

Hi Reid and David,

I just wanted to give you a heads up that I’m currently working on a problem where the WinEHPrepare outlining code stumbles and asserts while cloning code that references extracted landing pad values via PHI nodes that are in intermediate blocks that aren’t being cloned. The test I’m looking at fails with an assertion claiming that llvm.eh.begincatch was called twice inside a handler.

I have an idea for how to address this (giving the cloning director a chance to replace operands before an instruction is remapped). I’ll put something up for review as soon as I have it working. In the mean time I wanted to make sure you weren’t working on the same problem.

I’m seeing it with some C++ EH tests I’m working on, but I think it may be possible for it to happen with SEH cleanup.

If you’re curious, here’s a reproducer.

int main(void) {

try {

try {

throw ‘a’;

} catch (char c) {

printf("%c\n", c);

}

throw 1;

} catch(int x) {

printf("%d\n", x);

} catch(…) {

printf("…\n");

}

try {

try {

throw ‘b’;

} catch (char c) {

printf("%c\n", c);

}

throw 2;

} catch(int x) {

printf("%d\n", x);

} catch (char c) {

printf("%c\n", c);

} catch(…) {

printf("…\n");

}

return 0;

}

-Andy

Sounds reasonable. Honestly, we’ve been having some serious whiteboard discussions about how to get out of doing all this selector pattern matching. I think that’s basically the most fragile part of winehprepare, and if we can add some new constructs that wrap that up in something opaque to mid-level passes, that would be great.

Unfortunately, I haven’t really been able to get enough consensus to write something down. If I can’t convince my coworkers that we need a new construct in person, there’s not much point in writing up a detailed RFC. :frowning:

Here’s the gist of the new idea:

  • Leave invoke alone, too many optimizations care about it
  • Add a new ‘landingswitch’ (name under debate) instruction that fills the landingpad role and is also a terminator
  • landingswitch clauses have some fixed operands (RTTI, handler label) and some variable number of operands as required by the EH personality (catch object, adjectives)
  • Allow the ‘resume’ instruction to have 0, 1, or 2 successors, one of which can be a landingpad.
  • Add a ‘resume’ clause to landingswitches to allow them to continue the unwind process at another landingpad in the same function

I’ve tried to sell this idea, but I’m getting push back suggesting that we fix it a different way. The most compelling suggestion is to have the existing landingpad instruction produce an i8* selector instead of an i32 selector, and then we indirectbr on it. This would save the selector pattern matching that we do today, but I don’t particularly like it. Optimization passes can do basic block commoning, tail merging, and hoist instructions between the landingpad and the indirectbr, and I don’t like dealing with that.

I also don’t like it because we’d need to teach the inliner about indirectbr, which today it refuses to inline.

I can definitely see why you’d want to work around this selector matching stuff, but I can also understand why it’s difficult to get traction with the alternatives. I’ll give it some thought to see if I can come up with something worth adding to the conversation, but in the mean time I’ll keep trying to fix what I’ve got.

BTW, I’ve got five other modes of failure from the suite of 22 C++ EH tests I was running. Do you want me to file Bugzilla reports or just keep them on my own plate for now?

The quick summary is as follows:

Assert in CloneAndPruneIntoFromInst because of an unrecognized PHINode (almost certainly a different manifestation of the problem I’m currently working on)

Assert in WinEHPrepare::addStubInvokeToHandlerIfNeed because no return instruction was found.

Assert in Win64Exception::emitCXXFrameHandler3Table because CatchHigh > TBME.TryHigh

Assert in MCWinCOFFStreamer::EmitLabel because the symbol being emitted was previously defined.

Assert in X86DAGToDAGISel:: getAddressOperands because non-zero placement is ignored with ES.

-Andy

Just as an off-the-cuff idea that I haven’t thought through at all, what would you think of the possibility of replacing this:

%12 = call i32 @llvm.eh.typeid.for(i8* bitcast (%eh.CatchHandlerType* @llvm.eh.handlertype.H.0 to i8*))

%matches7 = icmp eq i32 %sel6, %12

with this:

%matches7 = call i32 @llvm.eh.cmp.selector(i32 %sel6, i8* bitcast (%eh.CatchHandlerType* @llvm.eh.handlertype.H.0 to i8*))

If we did that, the outlining code could just assume %sel6 was the current selector (because we don’t clone into nested landing pads) and we wouldn’t need to track it at all.

-Andy

Yeah, wrapping up the whole selector comparison into one intrinsic would help. Thanks for the idea. :slight_smile:

The current strategy also seems pretty fragile around complex cleanups. If the cleanup isn’t some simple destructor call with easily traceable unconditional branch control flow, we get lost and can’t find our way to the next selector comparison. Having an explicit ‘resume’ or some other intrinsic at the end of the cleanup would be pretty helpful here.

Comments below…

Comments below…

*From:* Reid Kleckner [mailto:rnk@google.com]
*Sent:* Friday, April 10, 2015 5:01 PM
*To:* Kaylor, Andrew
*Cc:* David Majnemer <david.majnemer@gmail.com> (david.majnemer@gmail.com);
LLVM Developers Mailing List
*Subject:* Re: [WinEH] Cloning blocks that reference non-cloned PHI nodes

Yeah, wrapping up the whole selector comparison into one intrinsic would
help. Thanks for the idea. :slight_smile:

The current strategy also seems pretty fragile around complex cleanups. If
the cleanup isn't some simple destructor call with easily traceable
unconditional branch control flow, we get lost and can't find our way to
the next selector comparison. Having an explicit 'resume' or some other
intrinsic at the end of the cleanup would be pretty helpful here.

---

We also talked a lot about making WinEHPrepare try to less work all at
once, and split it into phases. This is separable from the landingswitch
proposal, which is more about making it easier to identify handler start
points. Here's a sketch of the phases we came up with:

Step 1: Identify selector dispatch conditional branches. Identify all
cleanup actions, and split basic blocks to make them start and end on a
basic block boundary. Build a list of all basic blocks that start a handler.

[Andy Kaylor] This sounds an awful lot like what we’re doing now except
for the block splitting part. I guess that you are talking about doing
this for all landing pads before we go any further, right? I agree that
the current code to identify cleanup blocks is too fragile. We can
probably fix it though. Intrinsics to mark the start and/or end of cleanup
code would help immensely.

Yep, this is basically the selector pattern matching that we do today.
Maybe the block splitting isn't necessary, but I feel like it would be nice
to say that every handler starts at a BB and no two handlers start at the
same BB. Today it is possible for a cleanup and catch handler to start at
the same BB in this example:

some_eh_bb:
  call void @dtor()
  ... typeid...
  %matches = icmp eq ...
  br i1 %matches, label %more_dispatch, label %catch_int

Step 2: Do a forwards-backwards dataflow analysis to mark basic blocks as

reachable and unreachable from each handler start block. The forwards part
is an easy DFS, the backwards part is basically about pruning unreachable
blocks like 'ret' from a handler. We can probably skip the backwards part
initially.

[Andy Kaylor] I have some concerns about this part. When blocks have
been merged so that handlers are sharing blocks with non-exception code
things are going to get fairly messy. The existing cloning code takes care
of this (in theory at least) by scrubbing things after resolving PHIs based
on which blocks didn’t get cloned. I suppose this is a potential fault
line if the code we didn’t want loops back into the shared block. I’m
guessing that this is the sort of problem you’re thinking of that has you
wanting to change things. Do you have a test case that exposes the
problem? More importantly, do you have a solution in mind?

Specifically, I'm dealing with __finally blocks today, which have this
control flow structure of branch to shared cleanup, branch on phi out to EH
or normal flow. This made WinEHPrepare pretty unhappy. I need some way to
fix this. I've been toying with doing the __finally outlining in Clang, but
if we're serious about making WinEHPrepare handle this, then maybe we don't
need that.

With catch handlers we have clear markers for the beginning and end of the
catch (possibly including multiple end calls). Having similar markers for
cleanup would help a lot. I still think there’s potential for getting lost.

We can add markers, but I'm worried that it might be an optimization
barrier and we could still get lost. What differentiates the end of one
cleanup from the end of another? Should the end cleanup call consume an SSA
value produced by the begin cleanup call maybe? I guess nesting catches
isn't a problem because the only way to enter another catch is to do an
invoke, which goes via a landingpad, which we don't outline. The same
should be true for cleanup end markers.

Step 3: Clone all blocks that are reachable from more than one handler

until all blocks belong to exactly one handler, and sink code from
landingpads back into handlers if necessary. We don't have many code
commoning optimizations, so this step is more of a defense against
theoretical future optimization passes that do more than tail merging.

[Andy Kaylor] I’m not sure I see how this is better than the current
cloning approach.

These last 3 steps are basically an optimization. Cloning and then deleting
all IR reachable from landingpads, which we currently do at -O0, is fairly
wasteful. When no inlining occurs at -O0 and -O1, the IR should be easily
analyzable, and we just need to splice the handler BBs into new functions
and rewrite uses of allocas.

Step 4: Demote all SSA values live from one handler into or out of
another. The only SSA values referenceable across handlers are static
allocas.

[Andy Kaylor] Again, if I underststand correctly we’re doing this now,
just with a fairly distributed mechanism.

Step 5: Extract the blocks belonging to each handler into their own
functions. Rewrite static alloca references in each new handler to use
llvm.framerecover.

Step 6: Profit?

Does this seem like a reasonable plan? IMO simplifying the overarching
design is a higher priority right now than fixing individual test cases.

[Andy Kaylor] I think there’s significant value in getting some core suite
of tests working so that we can get out of our current pattern of rewriting
all of our tests every time we make a significant change and into a
situation where we’ll have some stable tests that will verify that any
redesign isn’t breaking something. I also think that in chasing down
problems with the implementation we have now we’ll learn more about the
problems that need to be solved.

I don’t have a strong objection to your working on a redesign while I try
to work through some bugs in the current implementation, but I’d like to
ask you to hold off on committing significant restructuring for a little
while unless I run into something that can’t be solved within the current
design. I feel like I can have the small test suite that I’m working on
passing within a week or so.

Sure. By breaking things down into steps, I'm not trying to fundamentally
change anything like I was with landingswitch. This is just trying to
separate IR analysis from IR transformation so that we can reason about
things a bit more easily.