[RFC] [Windows SEH][-EHa] Support Hardware Exception Handling

Hi,

This is a spin-off of the previous Windows SEH RFC. This RFC focuses only on supporting hardware exception handling.

A detailed implementation can be seen here: https://github.com/tentzen/llvm-project/commit/8a2421c274b683051e456cbe12c177e3b934fb5e

It passes all tests in the MSVC SEH suite (excluding those that jump out of __finally, i.e., _Local_Unwind).

Thanks,

–Ten

**** The rules for C code: ****

For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to follow three rules. First, no exception can move in or out of a __try region, i.e., no potentially faulting instruction can be moved across a __try boundary. Second, the order of exceptions for instructions ‘directly’ under a __try must be preserved (this does not apply to instructions in callees). Finally, global states (local/global/heap variables) that can be read outside of the __try region must be updated in memory (not just in registers) before the subsequent exception occurs.
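SEH itself needs MSVC or clang-cl, so the following is only a portable POSIX analogue of the third rule, not how -EHa is implemented. A SIGSEGV handler that calls siglongjmp stands in for the __except handler, and raise(SIGSEGV) stands in for a faulting load. The names observed_progress, recover_point and on_segv are hypothetical. The point it shows: a value stored inside the protected region must already be in memory when control reaches the handler, which is what treating such stores as volatile guarantees.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical analogue, not SEH: where the SIGSEGV handler resumes,
   playing the role of the __except handler of an enclosing __try. */
static sigjmp_buf recover_point;

static void on_segv(int sig) {
    (void)sig;
    siglongjmp(recover_point, 1);  /* leave the "protected region" */
}

/* Returns the value of `progress` that the "handler" observes after a
   fault raised in the middle of the "protected region". */
int observed_progress(void) {
    /* volatile: the store below must reach memory before the fault, so
       the code running after the jump reads 1 (analogue of rule 3). */
    volatile int progress = 0;
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(recover_point, 1) == 0) {  /* "__try" body */
        progress = 1;
        raise(SIGSEGV);   /* stands in for a faulting load */
        progress = 2;     /* not reached: the handler jumps away */
    }
    /* "__except" body: reads state written inside the protected region. */
    sigaction(SIGSEGV, &old, NULL);
    return progress;
}
```

Without `volatile`, C leaves the value of a local modified after sigsetjmp indeterminate after the jump. That is the same hazard rule 3 addresses for code directly under a __try.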

**** The impact to C++ code: ****

Although SEH is a feature for C code, -EHa has a profound effect on the C++ side. When a C++ function (in the same compilation unit, compiled with -EHa) is called by an SEH C function, a hardware exception that occurs in the C++ code can also be handled by an upstream SEH __try handler or a C++ catch(…). As such, when that happens in the middle of an object’s lifetime, the dtor must be invoked during unwinding the same way as for a C++ synchronous exception.
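To make the C++ requirement concrete, here is a minimal model of an unwind map. Each EH state records its parent state and the dtor to run when leaving it. The layout, names and states are hypothetical and do not reflect LLVM’s or MSVC’s actual tables. Under -EHa, a hardware fault raised while the function is in state 1 (objects A and B live) must run the same dtors, in the same order, as a thrown C++ exception in that state.

```c
#include <stddef.h>

/* Hypothetical unwind-map entry: the state to unwind to and the cleanup
   (dtor) to run when leaving this state. State -1 = no live object. */
typedef struct {
    int parent;
    void (*cleanup)(void);
} UnwindEntry;

/* Record of cleanups run, so the unwinding order can be inspected. */
static char trace[16];
static size_t trace_len;

static void dtor_a(void) { trace[trace_len++] = 'A'; }
static void dtor_b(void) { trace[trace_len++] = 'B'; }

/* State 0: object A is live. State 1: A and a nested B are live. */
static const UnwindEntry unwind_map[] = {
    { -1, dtor_a },
    {  0, dtor_b },
};

/* Runs the dtors of every live object, innermost first, for an exception
   (synchronous or hardware) raised while the function is in `state`. */
void unwind_from(int state) {
    trace_len = 0;
    while (state != -1) {
        unwind_map[state].cleanup();
        state = unwind_map[state].parent;
    }
    trace[trace_len] = '\0';
}
```

Calling `unwind_from(1)` runs B’s dtor and then A’s. The kind of exception does not matter. What -EHa adds is knowing the current state at the faulting instruction, not only at call sites.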

**** Design and Implementation: ****

A natural way to achieve the rules above in LLVM today is to allow an EH edge on memory/computation instructions (the previous iload/istore idea) so that the exception path is modeled precisely in the flow graph. However, tracking every single memory instruction and potentially faulting instruction can create many invokes, complicate the flow graph, and possibly hurt performance in downstream optimization and code generation. Making all optimizations aware of the new semantics is also a substantial effort.

This design does not intend to model the exception path at instruction level. Instead, the proposed design tracks and reports EH state at BLOCK level to reduce the complexity of the flow graph and minimize the performance impact on C++ code under the -EHa option. The detailed implementation is described below.

– Two intrinsics are created to track C++ object scopes: eha_scope_begin() and eha_scope_end(). eha_scope_begin() is added immediately after the ctor() is called and the EHStack is pushed, so it must be an invoke, not a call. This also guarantees that an EH cleanup pad is created regardless of whether there is a call in the scope. eha_scope_end() is added before the dtor(). These two intrinsics make it possible to compute the block state in a downstream code-gen pass, even in the presence of ctor/dtor inlining.

– Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark the __try boundary and to prevent exceptions from being moved across it.

– All memory instructions inside a __try are treated as ‘volatile’ to satisfy the 2nd and 3rd rules for C code above. This is slightly suboptimal, but acceptable because the amount of code directly under a __try is usually very small.

– For both C++ and C code, the state of each block is computed in the same place in the backend (the WinEHPrepare pass) where all other EH tables/maps are calculated. In addition to eha_scope_begin and eha_scope_end, the block-state computation also relies on the existing state-tracking code (UnwindMap and InvokeStateMap).

– For both C++ and C code, the state of each block containing a potentially trapping instruction is marked and reported in the DAG instruction selection pass, the same place where the state for -EHsc (synchronous exceptions) is reported.

– If the first instruction in a reported block scope can trap, a nop is inserted before it. This nop is needed to accommodate the LLVM Windows EH implementation, in which addresses in the IPToState table are offset by +1 (the purpose of the offset is to ensure that the return address of a call is in the same scope as the call itself).

– The handler for catch(…) under -EHa must handle HW exceptions, so its ‘adjective’ flag is reset (it cannot have IsStdDotDot (0x40), which only catches C++ exceptions).
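The block-level state computation described in the bullets above can be sketched as a worklist walk over the CFG. In this sketch, each eha_scope_begin/eha_scope_end invoke terminates a block, and the state only changes at those block boundaries. This is a simplified, hypothetical model (the Block type, Event values and compute_block_states are invented for illustration). It is not LLVM’s WinEHPrepare code, and it assumes well-formed regions where every path into a block agrees on its state.

```c
#define MAX_BLOCKS 16
#define MAX_SUCCS 2
#define NO_STATE (-2)   /* block not reached yet */

/* Hypothetical: intrinsic invoked as the terminator of a block, if any. */
typedef enum { EV_NONE, EV_SCOPE_BEGIN, EV_SCOPE_END } Event;

typedef struct {
    Event term;          /* eha_scope_begin/end invoke ending the block */
    int begin_state;     /* state entered by EV_SCOPE_BEGIN */
    int succs[MAX_SUCCS];
    int num_succs;
} Block;

/* Computes the EH state at entry of every block reachable from block 0.
   parent[s] is the state restored when scope s ends (-1 = no enclosing
   scope). Assumes every path into a block agrees on its state, that
   EV_SCOPE_END never occurs in state -1, and num_blocks <= MAX_BLOCKS. */
void compute_block_states(const Block *blocks, int num_blocks,
                          const int *parent, int *entry_state) {
    int worklist[MAX_BLOCKS];  /* each block is pushed at most once */
    int top = 0;

    for (int b = 0; b < num_blocks; ++b)
        entry_state[b] = NO_STATE;
    entry_state[0] = -1;       /* function entry is outside every scope */
    worklist[top++] = 0;

    while (top > 0) {
        int b = worklist[--top];
        int state = entry_state[b];

        /* State on exit = state on entry, adjusted by the terminator. */
        if (blocks[b].term == EV_SCOPE_BEGIN)
            state = blocks[b].begin_state;
        else if (blocks[b].term == EV_SCOPE_END)
            state = parent[state];

        for (int i = 0; i < blocks[b].num_succs; ++i) {
            int s = blocks[b].succs[i];
            if (entry_state[s] == NO_STATE) {  /* first visit fixes it */
                entry_state[s] = state;
                worklist[top++] = s;
            }
        }
    }
}
```

Consider a function that constructs A (state 0), conditionally constructs a nested B (state 1), then destroys B and A. Its six blocks (0 → 1, 1 → {2, 4}, 2 → 3, 3 → 4, 4 → 5) get entry states -1, 0, 0, 1, 0, -1. Any potentially trapping instruction in a block is then reported with that block’s state.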
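The nop bullet is easier to follow with a concrete lookup. Below is a hypothetical IP-to-state table (not the real on-disk format): each entry’s state applies from its start address up to the next entry. Following the +1 convention, a block whose label is at 0x20 gets an entry starting at 0x21. A call whose return address is 0x20 therefore still maps to the old state. However, an instruction that faults at 0x20, the very first byte of the new block, also maps to the old state, which is why a one-byte nop is inserted before it.

```c
#include <stddef.h>

/* Hypothetical IP-to-state entry; the state applies from `start` up to
   the next entry's start. Entries are sorted by `start`. */
typedef struct {
    unsigned start;
    int state;
} IpToState;

/* Returns the state for `ip`: the last entry whose start <= ip,
   or -1 if ip precedes every entry. */
int lookup_state(const IpToState *map, size_t n, unsigned ip) {
    int state = -1;
    for (size_t i = 0; i < n && map[i].start <= ip; ++i)
        state = map[i].state;
    return state;
}
```

With entries {0x11 → 0, 0x21 → 1}, a fault at 0x20 resolves to state 0 (wrong for the new block). Once a nop shifts the faulting instruction to 0x21, it resolves to state 1.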
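The catch(…) bullet amounts to a one-bit decision. This minimal model uses only the 0x40 value named in the proposal. The ExceptionKind enum and catch_all_matches() are hypothetical and are not the MSVC runtime’s matching code.

```c
#include <stdbool.h>

/* Adjective bit named in the proposal: a catch(...) with this bit set
   only catches C++ exceptions. */
#define IS_STD_DOT_DOT 0x40u

typedef enum { CXX_EXCEPTION, HW_EXCEPTION } ExceptionKind;

/* Hypothetical matcher for a catch(...) handler, modelling only the
   behaviour described for IsStdDotDot. */
bool catch_all_matches(unsigned adjectives, ExceptionKind kind) {
    if (kind == CXX_EXCEPTION)
        return true;                            /* always caught */
    return (adjectives & IS_STD_DOT_DOT) == 0;  /* HW only if bit clear */
}
```

Under -EHa the bit is cleared, so `catch_all_matches(0, HW_EXCEPTION)` is true, while a handler that keeps 0x40 rejects hardware exceptions.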

I still have basically the same concerns. I’ll try to give more concrete examples for what I’m concerned about.

Suppose I have something like the following:

    typedef struct C { int x[2]; } C;
    void threw_exception();
    void z();

    C f() {
      __try {
        z();
        return *(C*)0;
      } __except(1) {
        threw_exception();
      }
      C c = {0};
      return c;
    }

Currently, under your proposal, this won’t call threw_exception() if optimization is enabled, as far as I can tell. I have no idea if this is intentional: your proposal and your patch don’t contain or point to any documentation, and I can’t find any documentation that describes this on Microsoft’s website. (I don’t really care what the answer is here; I care that there’s some documented answer to this question, and other questions like it.)

Constructing a testcase for the register allocation issues I mentioned before is hard because it’s sort of “random” based on the register allocation heuristics, but see https://reviews.llvm.org/D77767 for the sort of issues that come up. Note that we mark setjmp returns_twice, which turns off certain optimizations. I don’t really like extending the usage of this sort of construct further, but if we are going to, we should at least mark the new intrinsics returns_twice, so they get the same protection as setjmp.

-Eli

Hi, Eli,

Why are you under the impression that threw_exception() will not be called if optimizations are enabled? I don’t know whether the -EHa spec is clearly described on Microsoft’s websites, but at least this proposal describes the rules for both C and C++ code.

The very first rule clearly says that “no exception can move in or out of a __try region, i.e., no potentially faulting instruction can be moved across a __try boundary”. As such, the dereference in the statement return *(C*)0; must be kept in the __try scope, and the access-violation fault will be caught by the __except handler, where threw_exception() will be called.

I don’t see why register allocation plays a part in this topic. I do see a serious problem in LLVM SJLJ today (all tests in MSVC’s setjmp suite fail with -O2; I will look into it soon). But I fail to see how HW exceptions are correlated with setjmp/longjmp. These are two totally different features, and the approaches employed are also totally different.

It would be helpful if you could give one example of why this proposal needs to care about how registers are allocated.

Again, what we intend to do in this feature is to achieve the two points stated in the proposal: the rules for C code and the impact on C++ code. Please take a moment to read through them, and let me know if anything is unclear.

Thanks

–Ten

Hi Ten,

Thanks for the writeup and implementation, nice to meet you.

I wonder if it would be best to try to discuss the features separately. My view is that catching hardware exceptions (/EHa) is critical functionality, but it's not clear to me if local unwind is truly worth implementing. Having looked at the code briefly, it seemed like a large portion of the complexity comes from local unwind. Today, clang crashes on this small example that jumps out of a __finally block, but the intention was to reject the code and avoid implementing the functionality. Clang does, in fact, emit a warning:
$ clang -c t.cpp
t.cpp:7:7: warning: jump out of __finally block has undefined behavior [-Wjump-seh-finally]
      goto lu1;
      ^
Local unwind, in my view, is the user saying, "I wrote __finally, but actually I decided I wanted to catch the exception, so let's transfer to normal control flow now." It seems to me that the user already has a way to express this: __except. I know the mapping isn't trivial and it's not exactly the same, but it seems feasible to rewrite most uses of local unwind this way.

[Ten] Right, I agree that to some degree a local_unwind can be viewed as another type of __except handler in the middle of unwinding. It is also true that some usage patterns can be worked around by rewriting the SEH hierarchy, but I believe the work can be substantial and risky, especially in an OS kernel. Furthermore, in a broader interpretation, local_unwind can also serve as a filter (or even a rethrow-like handler in C++ EH), with the target block as the final handler. See the multi-local-unwind example in the doc.

Can you estimate the prevalence of local unwind? What percent of __finally blocks in your experience use non-local control flow? I see a lot of value in supporting catching hardware exceptions, but if we can avoid carrying over the complexity of this local unwind feature, it seems to me that future generations of compiler engineers will thank us.

[Ten] I don’t have this data at hand. But what I know is that local_unwind is an essential feature for building the Windows kernel. One of the most important SEH tests (the infamous xcpt4u.c) is composed of 88 tests; among them there are 25 jumping-out-of-finally occurrences. Of course, this does not translate to a percentage of local_unwind usage, but it does show the significance of this feature to Windows. FYI, passing xcpt4u.c is the very first fundamental requirement before building the Windows kernel.

As stated in the design paragraph, this design does not intend to model a precise CFG at instruction level, since that is complicated and unnecessary.

As long as we comply with the C and C++ rules of this proposal, we achieve -EHa semantics. There is NO need to precisely model HW exception control flow at instruction level.

Your example about memcpy() is just a bug in the current implementation. I will fix it so that it’s volatilized in some manner. We are not in the code review stage yet; let’s focus on the design.

There is one seh_try_begin() invoke at the beginning of a __try and one seh_try_end() invoke at its end. Their EH edges point to the __except handler, so the scenario in your example will not happen; the compiler should not generate code like that.

Thanks,

–Ten

Hi,

Are there other concerns or suggestions about this proposal?

I have added some details about Reid’s earlier feedback regarding the control-flow “region”. The new doc for -EHa has also been spun off here: https://github.com/tentzen/llvm-project/wiki/Windows-SEH:-HARDWARE-EXCEPTION-HANDLING-(EHa)

Since last time, I have also added several changes to my prototype to compute the state at block level more robustly in SEME (Single-Entry-Multiple-Exits) regions. I also fixed the bug brought up by Eli regarding the memcpy intrinsic. The patch can be read here: https://github.com/tentzen/llvm-project/compare/SEH-EHa-base…SEH-EHa?expand=1.

The implementation is actually very simple, isolated and easy to read. It adds very little complexity to existing code.

Major changes are in four files:

  • [Clang] CGException.cpp and CGCleanup.cpp: add scope_begin & scope_end intrinsics at the entry and exits of SEH __try / C++ cleanup SEME regions.
  • [LLVM] WinEHPrepare.cpp & SelectionDAGISel.cpp: compute and report EH state at block level.

Other changes are trivial and straightforward helper code.

I believe I have answered and addressed all concerns and questions from you. Please let me know if I missed anything.

Thanks,

–Ten


Any comment?

Also +Andrew and Pengfei.

–Ten