Why doesn't `unwind_phase2` skip the cleanup frame after an `_Unwind_Resume`?

While investigating llvm/llvm-project issue #56825 ("nounwind declaration of _Unwind_Resume generates bad gcc_except_table"), I found that when a cleanup runs during exception unwinding, the sequence appears to be the following (based on tracing libunwind calls):

  1. unwind_phase2 reaches the frame with the cleanup.
  2. unwind_phase2 calls the personality function, which installs a context for the cleanup.
  3. The cleanup does its thing and then calls _Unwind_Resume to resume unwinding.
  4. _Unwind_Resume enters unwind_phase2 again.
  5. unwind_phase2 processes the frame with the cleanup again (this time with the IP pointing to just past the _Unwind_Resume call).
  6. unwind_phase2 calls the personality function for this IP, which expects there to be a no-op call site table entry covering it (and the absence of that entry led to the bug I was investigating).
  7. The personality function tells the unwinding to proceed past this frame.
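
For anyone who wants to reproduce a frame like the one in the steps above, here is a minimal sketch. It assumes an Itanium-ABI target, where the compiler typically emits a landing pad for middle() that runs the destructor and then calls _Unwind_Resume. The names Guard, thrower, middle, and run_demo are made up for illustration. The code only shows the observable order of events (cleanup first, then the handler), not the unwinder's internal steps.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Records the order in which things happen so it can be inspected afterwards.
static std::vector<std::string> events;

// Hypothetical type whose destructor is the "cleanup" in the frame of middle().
struct Guard {
    ~Guard() { events.push_back("cleanup in middle"); }
};

void thrower() { throw std::runtime_error("boom"); }

// This frame has no handler, only a cleanup. On Itanium-ABI targets the
// generated landing pad destroys `g` and then resumes unwinding.
void middle() {
    Guard g;
    thrower();
}

// The handler lives here, so phase 2 has to pass through middle() first.
std::vector<std::string> run_demo() {
    events.clear();
    try {
        middle();
    } catch (const std::runtime_error&) {
        events.push_back("caught in run_demo");
    }
    return events;
}
```

Calling run_demo() from main and printing the returned vector shows the cleanup entry before the handler entry.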

We end up processing the frame with the cleanup twice, which seems inefficient and necessitates a dummy call site table entry. Both of those are probably pretty minor in the grand scheme of things, but it still feels unnecessary.

We could avoid this by having unwind_phase2 perform an extra unwind cursor step when it’s called from _Unwind_Resume, so that it skips over the cleanup frame the second time. As far as I can tell, none of libgcc, nongnu libunwind, or LLVM libunwind implement this though, so I’m sure there’s a good reason for that. What am I missing that would make this extra cursor step not work? (I haven’t thought about forced unwinds at all, for example.)

Another way to avoid this extra step would be for the compiler-generated cleanup code to perform a tail call to _Unwind_Resume. That feels conceptually cleaner than the unwinder performing the extra cursor step (and I guess the unwinder has to assume that _Unwind_Resume might have been tail-called, which would make the extra step invalid). It would require the return address register to be restored before the tail call though (or the stack pointer to be restored if the return address is stored on the stack), which might make it not worth it (and also more complicated to codegen).

I think I’ve answered part of my question (the extra unwind cursor step isn’t always valid because the unwinder doesn’t know whether _Unwind_Resume was tail-called), but I’m curious whether there are other reasons as well.
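
To make the tail-call problem concrete, here is a toy model of where a resumed phase 2 would look. It is not libunwind code: Frame, demo_stack, and resume_visits are hypothetical names, and the "stack" is just a vector listed innermost first. It assumes that f has already run its cleanup, that g has a pending cleanup, and that main has a handler.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only. `has_landing_pad` means "the personality function would
// install a context for the IP currently recorded in this frame".
struct Frame {
    std::string name;
    bool has_landing_pad;
};

// f's IP is just past its _Unwind_Resume call, so it has nothing left to run.
std::vector<Frame> demo_stack() {
    return {{"f", false}, {"g", true}, {"main", true}};
}

// Returns the frames whose personality function gets consulted after a
// resume. `cursor` is the frame the unwinder believes called _Unwind_Resume;
// `extra_step` models the proposed extra cursor step.
std::vector<std::string> resume_visits(const std::vector<Frame>& stack,
                                       std::size_t cursor, bool extra_step) {
    std::vector<std::string> consulted;
    if (extra_step) ++cursor;
    for (; cursor < stack.size(); ++cursor) {
        consulted.push_back(stack[cursor].name);
        if (stack[cursor].has_landing_pad) break;  // control transfers here
    }
    return consulted;
}
```

With a normal call from f (cursor 0), today's behaviour consults f and then g, and the extra step consults only g. If f instead tail-called _Unwind_Resume, the return address already belongs to g (cursor 1), so the extra step would jump straight to main and g's cleanup would be skipped.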

The _Unwind_Resume callsite is not actually required to have an unwind-table entry that says to unwind to the parent. It can actually have anything you like!

I’m not sure if there’s an existing use-case for that, but that’s something I’m planning to take advantage of in my call unwindabort proposal. (…which is still waiting for me to finish the work and send it out, sorry I’m so slow there…). So I’d prefer that we not break that. :)

As for tail-calling…that may work, although it’d make the backtrace less useful, as it’d be missing the top frame. Also, as a general rule, LLVM won’t tail call a noreturn function – exactly because in many cases, you want to see a useful backtrace there.

Yup, we’re looking forward to the unwindabort work :) I’m not sure that interacts with this though. From what I understand, that proposes to omit call site table entries to handle termination, but if you’re calling _Unwind_Resume you’re explicitly not terminating (at least for the current frame), right?

EDIT: Ah, I just read the resume unwindabort portion of that proposal again. I’m not sure what circumstances that’s intended to be used under though … is it for handling the case where you have a catch in a noexcept function handling only certain types, and you want the resume for the other cases to cause termination?