setjmp/longjmp and volatile stores, but non-volatile loads

Hi,

In our (non-C) compiler we use setjmp/longjmp to implement exception
handling. For the initial implementation of the LLVM backend, I'm keeping that
model. In order to ensure that changes performed in a try/setjmp==0
block survive the longjmp, the changes must be done via volatile operations.

Given that volatility is a property of individual load/store
instructions rather than of memory slots in LLVM, I thought I could
optimise this by only marking the stores in try-blocks as volatile.
After all, I don't mind LLVM removing extra loads of the same variable
in a try-block.

However, if I do that (instead of making all loads/stores to the
variable volatile), then some kind of (in my case invalid) value
propagation seems to happen from the try-block to the exception block.

From what I can see with opt -print-after-all, it's the GVN pass that
does this.

I have attached a C program that demonstrates the issue: compiled with
clang -O0 or -O1 it works fine, but with -O2 it prints an error because
the "loops" variable has a wrong value on exit.

Mind you: I do not claim that the attached program is valid according to
any C standard or setjmp/longjmp documentation (it might be, but I don't
know). It's just that if you compile this program with clang -O0
-emit-llvm, you get more or less the same LLVM IR as what we generate
for our code. (*) Next you can process the resulting .ll file with e.g.
opt -O2 to reproduce the issue (i.e., without any regard to a C
standard, to the extent that LLVM IR behaviour can be considered to be
unrelated to any C standard).

My question is: is this a bug in LLVM, or is this behaviour deemed to be
by design? Also: our setjmp is not called setjmp, but we do mark its
replacement also as "returns_twice". So even if the above behaviour
would be considered "expected" for setjmp, would the same go for any
function marked as "returns_twice"?

I tested with clang/LLVM 3.7.0 and the clang from 7.0.1. I don't have
newer versions available here.

Thanks,

Jonas

(*) The only difference is that in our code there is no alias of the
"loops" variable through a pointer: we simply directly store to the
loops variable with a volatile store inside the try/setjmp==0 block. At
the LLVM IR level, that should not make a difference though since the
aliasing from the pointer to the loops variable is trivial. Removing the
aliasing manually from the generated IR does not change anything either,
as expected.

tint643c.c (677 Bytes)

If you want to observe those volatile store updates, you're really going to
need to volatilize the load operations. In your example, LLVM does not
model the CFG edge from the longjmp to the setjmp. This leads LLVM to
conclude that the only reaching definition of 'loops' at the point of the
load in the else block is 'loops = 0'.
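As a rough illustration of what "volatilizing the loads" means at the C level, here is a minimal sketch. It is not the attached tint643c.c; the names `env`, `raise_exception` and `count_then_raise` are made up for this example. Declaring the variable itself volatile makes every load and store of it volatile, which is the only form that C guarantees to survive the longjmp.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env; /* hypothetical handler context */

/* Hypothetical stand-in for a routine that raises an exception. */
static void raise_exception(void) {
    longjmp(env, 1);
}

/* Counts up inside the "try" part, raises, and returns the value
   observed in the "except" part. */
int count_then_raise(int n) {
    /* Volatile object: all loads and stores of it are volatile.
       Dropping the qualifier and only storing through a
       `volatile int *` would mirror the pattern from the original
       post, where the plain load in the handler may be replaced by
       the initial value 0. */
    volatile int loops = 0;

    if (setjmp(env) == 0) {
        for (int i = 0; i < n; i++)
            loops = loops + 1;
        raise_exception();
        return -1; /* not reached */
    } else {
        /* Volatile load: must re-read memory after the longjmp. */
        return loops;
    }
}
```

Calling `count_then_raise(3)` should give back 3 at any optimization level, because the compiler is not allowed to forward the initial store to the volatile load in the handler.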

Volatilizing all operations on local variables is going to kill your
performance, obviously. You should really emit invoke instructions in your
frontend. You can either use your own EH personality, or the existing SjLj
EH personality, which will optimize on a correct CFG and then volatilize
all values live across exceptional edges. Then the LLVM CFG will be
correct, and you'll get pretty good code.

Reid Kleckner wrote:

>> model. In order to ensure that changes performed in a try/setjmp==0
>> block survive the longjmp, the changes must be done via volatile
>> operations.
>
> If you want to observe those volatile store updates, you're really going
> to need to volatilize the load operations. In your example, LLVM does
> not model the CFG edge from the longjmp to the setjmp. This leads LLVM
> to conclude that the only reaching definition of 'loops' at the point of
> the load in the else block is 'loops = 0'.

Ok, thanks for confirming this approach is not going to work.

> Volatilizing all operations on local variables is going to kill your
> performance, obviously. You should really emit invoke instructions in
> your frontend. You can either use your own EH personality, or the
> existing SjLj EH personality, which will optimize on a correct CFG and
> then volatilize all values live across exceptional edges. Then the LLVM
> CFG will be correct, and you'll get pretty good code.

The main reason for using setjmp/longjmp is to maintain compatibility
between code compiled with the LLVM backend and with our existing code
generators. Switching to the SjLj personality would defeat that, I think
(it seems to use LLVM-defined internal data structures for storing the
context information, such as the "five word buffer in which the calling
context is saved"). In that case it would be better to immediately
switch to ehframe-based exception handling so as to at least reap some
benefits in the process.

It is not clear to me from reading
the "Exception Handling in LLVM" documentation whether it is possible to
use our own setjmp/longjmp infrastructure without modifying LLVM. I'm
only interested in getting the LLVM CFG correct. I don't need any
runtime support, data structures (ehframe) or context information from
LLVM. All of our exception state is stored in TLS structures that can be
obtained by calling routines in our own runtime.

So, can I use invoke and landingpad without using any of the other
exception handling intrinsics? (in combination with a dummy personality
function) Or will LLVM in all cases insist on using ehframe information,
a (C++-)ABI-compliant personality function, and possibly linking in
parts of its own runtime that depend on this information being correct?

Thanks,

Jonas

> Reid Kleckner wrote:
>
>>> model. In order to ensure that changes performed in a try/setjmp==0
>>> block survive the longjmp, the changes must be done via volatile
>>> operations.
>>
>> If you want to observe those volatile store updates, you're really going
>> to need to volatilize the load operations. In your example, LLVM does
>> not model the CFG edge from the longjmp to the setjmp. This leads LLVM
>> to conclude that the only reaching definition of 'loops' at the point of
>> the load in the else block is 'loops = 0'.
>
> Ok, thanks for confirming this approach is not going to work.
>
>> Volatilizing all operations on local variables is going to kill your
>> performance, obviously. You should really emit invoke instructions in
>> your frontend. You can either use your own EH personality, or the
>> existing SjLj EH personality, which will optimize on a correct CFG and
>> then volatilize all values live across exceptional edges. Then the LLVM
>> CFG will be correct, and you'll get pretty good code.
>
> The main reason for using setjmp/longjmp is to maintain compatibility
> between code compiled with the LLVM backend and with our existing code
> generators. Switching to the SjLj personality would defeat that, I think
> (it seems to use LLVM-defined internal data structures for storing the
> context information, such as the "five word buffer in which the calling
> context is saved"). In that case it would be better to immediately
> switch to ehframe-based exception handling so as to at least reap some
> benefits in the process.

Sounds right.

> It is not clear to me from reading
> the "Exception Handling in LLVM" documentation whether it is possible to
> use our own setjmp/longjmp infrastructure without modifying LLVM. I'm
> only interested in getting the LLVM CFG correct. I don't need any
> runtime support, data structures (ehframe) or context information from
> LLVM. All of our exception state is stored in TLS structures that can be
> obtained by calling routines in our own runtime.
>
> So, can I use invoke and landingpad without using any of the other
> exception handling intrinsics? (in combination with a dummy personality
> function) Or will LLVM in all cases insist on using ehframe information,
> a (C++-)ABI-compliant personality function, and possibly linking in
> parts of its own runtime that depend on this information being correct?

I would say that the coupling between LLVM generated code and the HP unwind
interface is pretty low. The only call LLVM emits is to _Unwind_Resume, so
you would have to go into llvm/lib/CodeGen/DwarfEHPrepare.cpp and teach
LLVM what you want it to call in your runtime for your EH personality.

Other than that, LLVM mostly emits ehframe unwind info and the LSDA, which
describes the landingpad PCs and how to get there. You can either try to
interoperate with that, or you'll have to change LLVM to emit your own LSDA
format. At this point we support 3-ish distinct personalities, all with
their own LSDA format, so there's a fair amount of prior art to look at in
llvm/lib/CodeGen/AsmPrinter/(anything EH related).

I thought I had finally figured out how to dance around this, but while I got close, it's not perfect.

Recap: we use setjmp/longjmp for our exception handling on all platforms in our regular (non-LLVM) code generators. I'd like to use the same infrastructure with the LLVM code generator for code interoperability purposes (the LLVM SjLj personality is not binary-compatible with our existing setjmp/longjmp buffers).

What I did was:
1) create a dummy personality function in our run time library that basically always says "no, this frame does not wish to handle this exception nor does it have any clean-up code", and specify it as personality function for any function that deals with exceptions
2) create a BBL with a landingpad and no other code at the end of every try-block
3) in the try-blocks themselves, replace all calls with invokes that unwind to this landingpad BBL
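For illustration, a dummy personality function like the one in step 1 could look roughly as follows. This is a sketch only, assuming the Itanium-style unwinder interface declared in the `<unwind.h>` header that ships with GCC and Clang; the name `dummy_personality` is made up here and is not the routine from our run time library.

```c
#include <assert.h>
#include <stddef.h>
#include <unwind.h>

/* Hypothetical dummy personality: whatever phase the unwinder is in,
   report that this frame neither handles the exception nor has any
   clean-up code, so unwinding should continue with the caller. */
_Unwind_Reason_Code dummy_personality(int version,
                                      _Unwind_Action actions,
                                      _Unwind_Exception_Class exception_class,
                                      struct _Unwind_Exception *exception,
                                      struct _Unwind_Context *context) {
    /* All arguments are deliberately ignored. */
    (void)version;
    (void)actions;
    (void)exception_class;
    (void)exception;
    (void)context;
    return _URC_CONTINUE_UNWIND;
}
```

Since our exceptions are propagated with longjmp rather than through the unwinder, this function should never actually be called in practice; it only exists so that the functions containing invoke/landingpad have a personality to refer to.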

Then, I tried the following:
a) if the longjmp for the try-block is taken (i.e., the setjmp right before the try-block returns a non-zero value), jump to the landingpad BBL.

-> Problem: LLVM does not allow regular jump edges to landingpad BBLs

b) since the landingpad is empty anyway and falls through into the next BBL (which contains the start of our actual exception handling code), jump to that next BBL from setjmp.

-> Problem: even though I do not insert code in the landingpad BBLs, LLVM may still add code to them, e.g. for (LLVM-created) phi-nodes.

I can't think of any solution to deal with this.

Jonas

Actually, there's another, even more fundamental, problem: the longjmp
will always restore the non-volatile registers to the contents they had
at the start of the try-block, which is not what LLVM expects when
entering an SEH-based landing pad.

Jonas

I'm moderately sure that the SjLj personality does not use the system longjmp
and does not claim to be binary compatible either. It uses the
intrinsics.

Joerg

I know, that's why I was trying to combine our own setjmp/longjmp-based exception handling helpers (for binary compatibility) with a bare minimum of LLVM's generic exception handling infrastructure ("invoke" and "landingpad {i8 *, i32} catch i8* null", to ensure that LLVM's control flow analysis is correct).

You can see an example of the result on Pastebin.com (the paste titled "; [8] i1:=low(int64); store i64 -9223372036854775808, i64* @"\01_U_$P$P"), which is the try/except-statement of the following program:

{$q+}
var
   i1: int64;
   caught: boolean;
begin
   i1:=low(int64);
   caught:=false;
   try
     i1:=i1-1;
   except
     caught:=true;
   end;
end.
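For readers who don't know Pascal: {$q+} turns on overflow checking, so the subtraction raises an exception that the except-block catches. A rough C analogue of the same control flow is sketched below. It assumes the GCC/Clang builtin `__builtin_sub_overflow`, and the names `handler`, `checked_sub` and `overflow_is_caught` are invented for this example; it is not what our compiler generates.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

static jmp_buf handler; /* hypothetical exception frame */

/* Subtraction with overflow checking; "raises" via longjmp. */
static int64_t checked_sub(int64_t a, int64_t b) {
    int64_t result;
    if (__builtin_sub_overflow(a, b, &result))
        longjmp(handler, 1);
    return result;
}

/* Returns 1 if the overflow in the try part reached the except part. */
int overflow_is_caught(void) {
    /* Both variables are volatile so that their values are well
       defined after the longjmp. */
    volatile int64_t i1 = INT64_MIN; /* i1 := low(int64) */
    volatile int caught = 0;         /* caught := false  */

    if (setjmp(handler) == 0) {
        i1 = checked_sub(i1, 1);     /* try: i1 := i1 - 1 */
    } else {
        caught = 1;                  /* except: caught := true */
    }
    return caught;
}
```

Here `overflow_is_caught()` should return 1, since INT64_MIN - 1 does not fit in an int64_t.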

However, that approach does not appear to be possible in the end because of the reasons mentioned in my previous mail.

Jonas

The SjLjEHPrepare pass tries to deal with this by demoting all values live
across EH edges to the stack. This should also eliminate those phis from
landingpad blocks. Check out TargetPassConfig::addPassesToHandleExceptions()
and make sure you run that pass.

Thanks for the suggestion! Unfortunately,
1) it also inserts calls to _Unwind_SjLj_Register/_Unwind_SjLj_Unregister
2) it still inserts phis, and also the equivalent for memory-based synchronisation in landingpads:

Lj15:
   %tmp.6.1.reg2mem.0 = phi i32 [ %reg.1_66, %Lj47 ], [ %reg.1_66, %Lj46 ], [ %reg.1_66, %Lj45 ], [ %reg.1_66, %Lj44 ], [ %reg.1_66, %Lj43 ], [ %reg.1_66, %Lj42 ], [ %reg.1_66, %Lj41 ], [ %reg.1_66, %Lj40 ], [ %reg.1_66, %Lj32 ], [ %reg.1_66, %Lj38 ], [ %reg.1_66, %Lj37 ], [ %reg.1_66, %Lj36 ], [ %reg.1_66, %Lj35 ], [ %reg.1_66, %Lj34 ], [ %reg.1_66, %Lj31 ], [ %tmp.6.0, %Lj29 ], [ %tmp.6.0, %Lj28 ], [ %tmp.6.0, %Lj27 ], [ %tmp.6.0, %Lj26 ], [ %tmp.6.0, %Lj25 ], [ %tmp.6.0, %Lj22 ], [ %tmp.6.0, %Lj20 ]
   %reg.1_116 = landingpad %"typ.PROGRAM.$llvmstruct$d00000004i32"
           catch i8* null
   %exception_gep = getelementptr { i8*, i32, [4 x i32], i8*, i8*, [5 x i8*] }, { i8*, i32, [4 x i32], i8*, i8*, [5 x i8*] }* %fn_context, i64 0, i32 2, i64 0
   %exn_val = load volatile i32, i32* %exception_gep, align 4
   %exn_selector_gep = getelementptr { i8*, i32, [4 x i32], i8*, i8*, [5 x i8*] }, { i8*, i32, [4 x i32], i8*, i8*, [5 x i8*] }* %fn_context, i64 0, i32 2, i64 1
   %exn_selector_val = load volatile i32, i32* %exn_selector_gep, align 4
   br label %Lj13

; label where our setjmp branches to
Lj13: ; preds = %0, %Lj15
   %tmp.6.2 = phi i32 [ 0, %0 ], [ %tmp.6.1.reg2mem.0, %Lj15 ]

Additionally, if you just run that pass in opt in addition to -O1 (-sjljehprepare -O1), then calling llc on the result will fail with an assert:

Assertion failed: (MI.isEHLabel() && "expected EH_LABEL"), function EmitSjLjDispatchBlock, file /Data/imacdev/llvm/lib/Target/X86/X86ISelLowering.cpp, line 25273.

That's with trunk@290046; with 3.9 I get errors about invalid relocation calculations instead. So it seems like this pass cannot be used in isolation.

I've posted the full unoptimized code for the test module on Pastebin.com, in the paste titled "target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3" (although to run it after compiling, you'd need more).

Jonas