LLDB: Unwinding based on Assembly Instruction Profiling

Hi

As far as I know, if the unwinding based on Assembly Instruction
Profiling fails in LLDB then either EH frame unwinding or some other
mechanism comes into picture to unwind properly. Am I right?

In this case, should LLDB change the unwinder plan from Assembly
Instruction Profiling to EH Frame based unwinding so that in future
the unwinding is always done with the new unwind plan rather than
first checking the assembly based unwind plan and then falling back to
EH Frame based unwind plan?

Thanks

EH frame can't be used to unwind when we are in the first frame because it is only valid at call sites. It also can't be used in frames that are asynchronously interrupted, like signal handler frames. So at frame zero, we typically just fall back to the default unwind plan for the current architecture, which on most systems is to follow the frame pointer.

This is not necessarily true, GCC can build them like that. I don't
think we have a flag for clang/LLVM to create full async unwind tables.

Joerg

Most compilers don't generate stuff that is complete, and if it is complete, I am not aware of any marking on EH frame that states it is complete. So we really can't use it unless we know the info is complete. Was there ever an additional augmentation letter that was attached to the complete EH frame info?

If we are trying to unwind from a non-call site (frame 0 or a signal handler), the current implementation first tries the non-call-site unwind plan (usually assembly emulation). If that fails, it falls back to the call-site unwind plan (eh_frame, compact unwind info, etc.) rather than to the architecture default unwind plan. The call-site plan should be a better guess in general, and the assembly emulation based unwind plan usually fails for hand-written assembly functions, where eh_frame is usually valid at all addresses.
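The fallback order described above can be sketched roughly as follows. This is only an illustrative model, not LLDB's code: the `UnwindPlan` type, the function name, and the idea of a plan "returning None on failure" are all hypothetical, and placing the architecture default plan last is an assumption rather than something stated in this thread.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model: an "unwind plan" is a function that returns the
# caller's PC, or None when it cannot unwind from the given PC.
UnwindPlan = Callable[[int], Optional[int]]


def unwind_non_call_site(
    pc: int,
    non_call_site_plan: UnwindPlan,   # usually assembly emulation
    call_site_plan: UnwindPlan,       # eh_frame, compact unwind info, ...
    arch_default_plan: UnwindPlan,    # assumption: tried last
) -> Tuple[str, Optional[int]]:
    """Try the plans in order and report which one produced a result."""
    candidates = [
        ("non-call-site", non_call_site_plan),
        ("call-site", call_site_plan),
        ("arch-default", arch_default_plan),
    ]
    for name, plan in candidates:
        caller_pc = plan(pc)
        if caller_pc is not None:
            return name, caller_pc
    return "none", None


if __name__ == "__main__":
    # Hand-written assembly: emulation fails, eh_frame succeeds.
    result = unwind_non_call_site(
        0x10, lambda pc: None, lambda pc: 0x1000, lambda pc: 0x2000
    )
    print(result)
```

Note that the choice is made per unwind attempt; nothing is cached per function, which is why remembering the fallback for a single PC would save very little.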

Generating asynchronous eh_frame (valid at all addresses) is possible with gcc (I am not sure about clang), but there is no way to tell whether a given eh_frame inside an object file is valid at all addresses or only at call sites. The best approximation we can make is to say that each eh_frame entry is valid only at the address it specifies as its start address, but we don't make use of that in LLDB at the moment.

For the second part of the original question: I think switching to the eh_frame based unwind plan after a failed unwind using instruction emulation is only a valid option for the PC we tried to unwind from, because the assembly based unwind plan could still be valid in other parts of the function. Making the change for that one concrete PC address would make sense, but it would have practically no effect: the next time we want to unwind from that address we would go through the same fallback mechanism as the first time, so the change would bring only a very small performance gain.

Tamas

Hi all, sorry I missed this discussion last week, I was a little busy.

Greg's original statement isn't correct -- about a year ago Tong Shen changed lldb to use eh_frame for the currently-executing frame. While it is true that eh_frame is not guaranteed to describe the prologue/epilogue, in practice eh_frame always describes the epilogue (gdb couldn't work without this, with its much more simplistic unwinder). Newer gcc's also describe the epilogue. clang does not (currently) describe the epilogue. Tong's changes *augment* the eh_frame with an epilogue description if it doesn't already have one.

gcc does have an "asynchronous unwind tables" option -- "asynchronous" meaning the unwind rules are defined at every instruction location. But the last time I tried it, it did nothing. They've settled on an unfortunate middle ground where eh_frame (which should be compact and only describe enough for exception handling) has *some* async unwind instructions. And the same unwind rules are emitted into the debug_frame section, even if -fasynchronous-unwind-tables is used.

In the ideal world, eh_frame should be extremely compact and only sufficient for exception handling. debug_frame should be extremely verbose and describe the unwind rules at all unwind locations.

As Tamas says, there's no indication in eh_frame or debug_frame as to how much is described: call-sites only (for exception handling), call-sites + prologue, call-sites + prologue + epilogue, or fully asynchronous. It's a drag. If the DWARF committee ever has enough reason to break open the debug_frame format for some other changes, I'd like to get more information in there.

Anyway, point is, we're living off of eh_frame (possibly "augmented") for the currently-executing stack frame these days. lldb may avoid using the assembly unwinder altogether in an environment where it finds eh_frame unwind instructions for every stack frame.

(On Mac, we've switched to a format called "compact unwind" -- much like the ARM unwind info that Tamas recently added support for, this is an extremely small bit of information which describes one unwind rule for the entire function. It is only applicable for exception handling; it has no way to describe prologues/epilogues. compact unwind is two 4-byte words per function. lldb will use compact unwind / ARM unwind info for the non-zeroth stack frames. It will use its assembly instruction profiler for the currently-executing stack frame.)

Hope that helps.

J

Greg's original statement isn't correct -- about a year ago Tong Shen changed lldb to use eh_frame for the currently-executing frame. While it is true that eh_frame is not guaranteed to describe the prologue/epilogue, in practice eh_frame always describes the epilogue (gdb couldn't work without this, with its much more simplistic unwinder). Newer gcc's also describe the epilogue. clang does not (currently) describe the epilogue. Tong's changes *augment* the eh_frame with an epilogue description if it doesn't already have one.

Ahhh.... that paragraph was not clear. I wrote that "in practice eh_frame always describes the epilogue". I meant "always describes the prologue".

lldb needs the prologue description to step in to/step over functions correctly, at least at the first instruction of the function.

It's been five-six years since I worked on gdb's unwinder, but back when I worked on it, it didn't have multiple unwind schemes it could pick from, or the ability to use different unwind schemes in different contexts, or the ability to fall back to different unwind schemes. That may not be true any longer, I don't know. But back then it was an all-or-nothing approach, so if it was going to use eh_frame, it had to use it for everything.

Hi Jason

Thanks a lot for the detailed information. I am sorry to post my
queries a bit late. Here are a few things that I want to ask.

When eh_frame has epilogue description as well, the Assembly profiler
doesn't need to augment it. In this case, is eh_frame augmented unwind
plan used as Non Call Site Unwind Plan or Assembly based Unwind Plan
is used? I checked FuncUnwinders::GetUnwindPlanAtNonCallSite()
function. When there is nothing to augment in eh_frame Unwind plan,
then GetEHFrameAugmentedUnwindPlan() function returns nullptr and
AssemblyUnwindPlan is used as Non Call Site Unwind Plan. Is it the
expected behavior?

About your comments on gcc producing "asynchronous unwind tables",
do you mean that gcc is not producing asynchronous unwind tables as it
keeps *some* async unwind instructions and not all of them?

Abhishek

Hi Abhishek,

When eh_frame has epilogue description as well, the Assembly profiler
doesn't need to augment it. In this case, is eh_frame augmented unwind
plan used as Non Call Site Unwind Plan or Assembly based Unwind Plan
is used?

Yes, you're correct.

If an eh_frame unwind plan describes the epilogue and the prologue, we will use it at "non-call sites", that is, the currently executing function.

If we augment an eh_frame unwind plan by adding epilogue instructions, we will use it at non-call sites.

If an eh_frame unwind plan is missing epilogue, and we can't augment it for some reason, then it will not be used at non-call sites (the currently executing function).

The assembly unwind plan will be used for the currently executing function if we can't use the eh_frame unwind plan.

I checked FuncUnwinders::GetUnwindPlanAtNonCallSite()
function. When there is nothing to augment in eh_frame Unwind plan,
then GetEHFrameAugmentedUnwindPlan() function returns nullptr and
AssemblyUnwindPlan is used as Non Call Site Unwind Plan. Is it the
expected behavior?

Yes. FuncUnwinders::GetEHFrameAugmentedUnwindPlan gets the plain eh_frame unwind plan and passes it to UnwindAssembly_x86::AugmentUnwindPlanFromCallSite().

UnwindAssembly_x86::AugmentUnwindPlanFromCallSite will verify that the unwind plan describes the prologue. If the prologue isn't described, it says that this cannot be augmented.

It then looks to see if the epilogue is described. If the epilogue is described, it says the unwind plan is usable as-is.

If the epilogue is not described, it will use the assembly unwinder to add the epilogue unwind instructions.
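The decision sequence just described can be modelled as below. This is a simplified sketch that follows the description in this message, not the actual LLDB implementation: the `EhFramePlan` class, its boolean flags, and the `can_emulate_epilogue` parameter are hypothetical stand-ins for the row-by-row inspection the real code performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EhFramePlan:
    # Hypothetical flags; the real code inspects the unwind plan's rows.
    describes_prologue: bool
    describes_epilogue: bool
    augmented: bool = False


def augment_from_call_site(
    plan: EhFramePlan, can_emulate_epilogue: bool = True
) -> Optional[EhFramePlan]:
    """Return a plan usable at non-call sites, or None if there is none."""
    if not plan.describes_prologue:
        return None                      # cannot be augmented
    if plan.describes_epilogue:
        return plan                      # usable as-is
    if not can_emulate_epilogue:
        return None                      # augmentation failed for some reason
    # Epilogue rows would be added by the assembly unwinder here.
    return EhFramePlan(True, True, augmented=True)


def plan_for_non_call_site(plan: EhFramePlan) -> str:
    """Name the plan that would be used for the currently executing frame."""
    usable = augment_from_call_site(plan)
    if usable is None:
        return "assembly"
    return "eh_frame augmented" if usable.augmented else "eh_frame as-is"


if __name__ == "__main__":
    print(plan_for_non_call_site(EhFramePlan(True, False)))
```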

About your comments on gcc producing "asynchronous unwind tables",
do you mean that gcc is not producing asynchronous unwind tables as it
keeps *some* async unwind instructions and not all of them?

"asynchronous" means that the unwind instructions are valid at every instruction location.

"synchronous" means that the unwind instructions are only valid at places where an exception can be thrown, or a function is called that may throw an exception.

Inside lldb, I use the terminology "non-call site" to mean "asynchronous". You're at an arbitrary instruction location, for instance, you're in the currently-executing function. I use "call site" to mean synchronous - a function has called another function, so it's in the middle of the function body, past the prologue, before the epilogue. This is a function higher up on the stack.

The terms are confusing, I know.

The last time I checked, gcc cannot be made to emit truly asynchronous unwind instructions. This is easy to test on an i386 binary compiled with -fomit-frame-pointer. For instance (the details will be a little different on an ELF system, but I bet it will be similar if the program is built position independent, aka PIC):

% cat >test.c
#include <stdio.h>
int main () { puts ("HI"); }
^D
% clang -arch i386 -fomit-frame-pointer test.c
% lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (i386).
(lldb) disass -b -n main
a.out`main:
a.out[0x1f70] <+0>: 83 ec 0c subl $0xc, %esp
a.out[0x1f73] <+3>: e8 00 00 00 00 calll 0x1f78 ; <+8>
a.out[0x1f78] <+8>: 58 popl %eax
a.out[0x1f79] <+9>: 8d 80 3a 00 00 00 leal 0x3a(%eax), %eax
a.out[0x1f7f] <+15>: 89 04 24 movl %eax, (%esp)
a.out[0x1f82] <+18>: e8 0d 00 00 00 calll 0x1f94 ; symbol stub for: puts

Look at the call instruction at +3. What is this doing? It calls the next instruction, which does a pop %eax. This is loading the address main+8 into eax so it can get the address of the "HI" string which is at main+8+0x3a. It's called a "pic base", or position independent code base, because this program could be loaded at any address when it is run, the instructions can't directly reference the address of the "HI" string.
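As a quick check of the arithmetic, the address computed by the call/pop/leal sequence can be worked out from the disassembly. The constants are taken from the listing above (file addresses, not load addresses); the function name is just for illustration.

```python
MAIN = 0x1F70          # file address of main from the disassembly
CALL_OFFSET = 3        # calll is at main+3
CALL_LENGTH = 5        # e8 opcode + 4-byte displacement
LEA_DISPLACEMENT = 0x3A


def string_address(main: int = MAIN) -> int:
    """Address that ends up in %eax after the leal at main+9."""
    # calll pushes the address of the next instruction (main+8) ...
    return_address = main + CALL_OFFSET + CALL_LENGTH
    # ... and popl %eax loads it back: this is the pic base.
    pic_base = return_address
    return pic_base + LEA_DISPLACEMENT


if __name__ == "__main__":
    print(hex(string_address()))   # 0x1fb2
```

Because everything is computed relative to `main`, the same sequence yields the right address wherever the image is loaded.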

If I run this program and have lldb dump its assembly unwind rules for the function:

(lldb) image show-unwind -n main
row[0]: 0: CFA=esp +4 => esp=CFA+0 eip=[CFA-4]
row[1]: 3: CFA=esp+16 => esp=CFA+0 eip=[CFA-4]
row[2]: 8: CFA=esp+20 => esp=CFA+0 eip=[CFA-4]
row[3]: 9: CFA=esp+16 => esp=CFA+0 eip=[CFA-4]
row[4]: 34: CFA=esp +4 => esp=CFA+0 eip=[CFA-4]

It gets this right. After the call instruction at +3, the CFA is now esp+20 because we just pushed a word onto the stack. And after the pop instruction at +8, the CFA is back to esp+16 because we popped that word off the stack.
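The first four rows can be reproduced by tracking how each instruction moves the stack pointer. The sketch below is a toy model of that bookkeeping, not the real assembly profiler: the instruction table is hand-written from the disassembly shown earlier (only the instructions up to +18 are listed there), and the tuple layout is a made-up convenience.

```python
from typing import List, Tuple

# (offset, length in bytes, bytes by which the stack grows)
# Hand-transcribed from the disassembly of main above.
INSTRUCTIONS: List[Tuple[int, int, int]] = [
    (0, 3, 12),    # subl $0xc, %esp      -> stack grows by 12
    (3, 5, 4),     # calll <+8>           -> pushes a 4-byte return address
    (8, 1, -4),    # popl %eax            -> pops that word again
    (9, 6, 0),     # leal 0x3a(%eax),%eax -> no stack change
    (15, 3, 0),    # movl %eax, (%esp)    -> no stack change
]


def cfa_rows(instructions: List[Tuple[int, int, int]],
             word_size: int = 4) -> List[Tuple[int, int]]:
    """Return (function offset, N) pairs meaning CFA = esp + N from there on."""
    cfa_offset = word_size          # on entry only the return address is pushed
    rows = [(0, cfa_offset)]
    for offset, length, growth in instructions:
        if growth != 0:
            cfa_offset += growth
            # The new rule applies starting at the *next* instruction.
            rows.append((offset + length, cfa_offset))
    return rows


if __name__ == "__main__":
    for start, n in cfa_rows(INSTRUCTIONS):
        print(f"{start}: CFA=esp+{n}")
```

This yields rules starting at offsets 0, 3, 8 and 9 with CFA offsets 4, 16, 20 and 16, matching row[0] through row[3] above.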

An asynchronous unwind plan would describe these stack movements. A synchronous unwind plan will not -- they are before any point where we could throw an exception, or before we call another function.
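To make the difference concrete, here is a small comparison of row lookup in the two kinds of plan. `ASYNC_ROWS` is copied from the `image show-unwind` output above; `SYNC_ROWS` is a hypothetical call-site-only plan invented for illustration (one rule for the whole function body), not something emitted by a real compiler in this thread.

```python
import bisect
from typing import List, Tuple

Rows = List[Tuple[int, int]]   # (function offset, N) meaning CFA = esp + N

ASYNC_ROWS: Rows = [(0, 4), (3, 16), (8, 20), (9, 16), (34, 4)]
SYNC_ROWS: Rows = [(0, 4), (3, 16)]    # hypothetical: ignores the call/pop pair


def cfa_offset_at(rows: Rows, pc_offset: int) -> int:
    """Find the last row starting at or before pc_offset and return its N."""
    if pc_offset < rows[0][0]:
        raise ValueError("pc is before the first row")
    starts = [start for start, _ in rows]
    index = bisect.bisect_right(starts, pc_offset) - 1
    return rows[index][1]


if __name__ == "__main__":
    for pc in (8, 18):
        print(pc, cfa_offset_at(ASYNC_ROWS, pc), cfa_offset_at(SYNC_ROWS, pc))
```

At offset 18 (the call to puts, a real call site) both plans agree on esp+16. At offset 8, right after the pic-base call, the synchronous plan is off by one word, so an unwinder stopped there would read the wrong return address.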

(Notice that you need to use -fomit-frame-pointer to get this problem. If ebp is set up as the frame pointer, it doesn't matter how we change the stack pointer for the rest of the function.)

Hope that helps.