According to this page:
data coming from the L1 cache is only about three times as expensive as
data coming from a register. So putting a register check after *every*
call is probably not going to be profitable compared to a thread-local
global variable check after every invoke. If invokes happen often on a
thread, that variable will probably be in cache, and if they don't
happen often, the performance impact will be minimal.
Of course, if most methods have variables with destructors, I'll end up
with a check of some kind after almost every (non-nounwind) call
anyway, so a register check would be better. On the other hand,
implementing the register check would seem to require changes to native
codegen at call sites, as opposed to an IR-modifying pass with a
possible new intrinsic or two.
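For comparison, the thread-local check after every invoke might look like this in C. This is a minimal sketch, not a proposed implementation: `unwind_pending`, `may_throw`, and `caller` are hypothetical names, and the callee signals an unwind by setting a flag that the caller tests right after the call.

```c
/* Hypothetical thread-local flag: nonzero means an unwind is pending. */
static _Thread_local int unwind_pending = 0;

/* Hypothetical callee that "throws" by setting the flag and returning. */
static void may_throw(int should_throw) {
    if (should_throw)
        unwind_pending = 1;
}

/* An invoke becomes a plain call followed by a test of the thread-local flag. */
static int caller(int should_throw) {
    may_throw(should_throw);
    if (unwind_pending)          /* the per-invoke check */
        goto unwind_label;
    return 0;                    /* normal continuation */
unwind_label:
    unwind_pending = 0;          /* this frame handles the unwind */
    return 1;
}
```

The cost on the normal path is one load and one branch per invoke; if invokes are frequent, the flag stays in L1.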
Anyway, here's my new plan:
1. A thread-local global variable, type i8*, initialized to zero.
2. At invoke call sites, right before the invoke, call a native method
(mysetjmp) that:
a. Saves ESI, EDI, EBX, EBP, ESP to a buffer alloca'd within the
method containing the invoke site.
b. Sets EAX to 0.
c. Returns.
3. The return value of that native method (EAX) is checked; if it is
nonzero, branch to the unwind label. Otherwise, save the value of the
thread-local global into the buffer, write the address of that
alloca'd buffer into the thread-local global, and make the call.
4. After the call returns, copy the old thread-local global value out
of the alloca'd buffer back into the thread-local global.
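The invoke-site steps can be sketched in portable C, assuming the standard `setjmp` as a stand-in for the hand-written mysetjmp. The names `unwind_buf`, `unwind_top`, and `invoke` are hypothetical, and wrapping the invoke in a helper function is a simplification (in the plan the sequence is emitted inline in the calling method). This shows only the normal-return path: nested invokes chain their buffers through the saved old value.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical per-invoke buffer. jmp_buf stands in for the saved
   ESI, EDI, EBX, EBP, ESP; prev holds the old thread-local value. */
struct unwind_buf {
    jmp_buf regs;
    struct unwind_buf *prev;
};

/* Step 1: thread-local pointer to the innermost buffer, starts at zero. */
static _Thread_local struct unwind_buf *unwind_top = NULL;

/* Hypothetical helper performing steps 2-4 around one call.
   Returns 0 if the callee returned normally, 1 if it was unwound. */
static int invoke(void (*callee)(void)) {
    struct unwind_buf buf;          /* buffer lives in the invoking frame */
    if (setjmp(buf.regs) != 0)      /* step 2: save registers; 0 the first time */
        return 1;                   /* step 3: nonzero -> unwind label */
    buf.prev = unwind_top;          /* step 3: save old thread-local value */
    unwind_top = &buf;              /*         publish this buffer */
    callee();                       /*         make the call */
    unwind_top = buf.prev;          /* step 4: restore after the call returns */
    return 0;
}

/* Usage: count the buffers visible from inside two nested invokes. */
static int nested_depth = 0;

static void inner(void) {
    for (struct unwind_buf *b = unwind_top; b != NULL; b = b->prev)
        nested_depth++;
}

static void outer(void) { (void)invoke(inner); }
```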
The unwind instruction will then:
1. Load the thread-local global value. If it's zero, there's nowhere
to unwind to, so abort.
2. Restore ESI, EDI, EBX, EBP, ESP, and the thread-local global value
from the buffer.
3. Set EAX to 1.
4. Jump to step 2c (the return instruction of the native method mysetjmp).
The native method will return with all callee-saved registers restored
and a return value in EAX of 1, which will cause the following check
to branch to the unwind label.
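The invoke and unwind sequences together can be modeled in C as a rough sketch, assuming standard `setjmp`/`longjmp` in place of mysetjmp and the unwind code; `unwind_buf`, `unwind_top`, `invoke`, and `unwind` are hypothetical names. One difference from the plan: a `jmp_buf` also records the resume address, so this behaves like the EIP-saving variant rather than jumping to a fixed offset inside mysetjmp. Relying on a return address left on the stack by mysetjmp may be fragile, since later calls from the same frame can overwrite that slot.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

struct unwind_buf {
    jmp_buf regs;              /* stands in for ESI, EDI, EBX, EBP, ESP (plus EIP) */
    struct unwind_buf *prev;   /* thread-local value saved at the invoke site */
};

static _Thread_local struct unwind_buf *unwind_top = NULL;

/* Hypothetical unwind routine following the four unwind steps. */
static _Noreturn void unwind(void) {
    struct unwind_buf *buf = unwind_top;   /* 1: load the thread-local value */
    if (buf == NULL)
        abort();                           /*    nowhere to unwind to */
    unwind_top = buf->prev;                /* 2: restore the thread-local value */
    longjmp(buf->regs, 1);                 /* 2-4: restore registers, "return" 1 */
}

/* Invoke site: setjmp plays the role of mysetjmp. */
static int invoke(void (*callee)(void)) {
    struct unwind_buf buf;
    if (setjmp(buf.regs) != 0)
        return 1;                          /* second return: unwind label */
    buf.prev = unwind_top;
    unwind_top = &buf;
    callee();
    unwind_top = buf.prev;
    return 0;
}

/* Usage: the inner invoke catches the unwind; the outer one returns normally. */
static int after_unwind_reached = 0;
static int inner_result = -1;

static void thrower(void) {
    unwind();
    after_unwind_reached = 1;              /* never executed */
}

static void middle(void) { inner_result = invoke(thrower); }
```

Here the inner `invoke` sees its `setjmp` return a second time with 1 and takes the unwind branch, while the outer `invoke` returns normally and finds the thread-local pointer already restored by the unwinder.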
Invoke sites only write five callee-saved registers to the stack, read
and write one pointer in a single thread-local global variable, and
make one direct call. Unwind sites make one direct call, read five
callee-saved registers from the stack (some distance up, so those
memory values might not be warm), and read and write one pointer in a
single thread-local global variable.
The next step would be to replace the mysetjmp call with a new
intrinsic; then I'd have to save EIP in the buffer and do an indirect
jump to it at the unwind site, instead of jumping to a constant offset
within the native mysetjmp. Making mylongjmp (the unwind code) a call
to a new intrinsic would require no other modifications.