Inquiry regarding the AddOneMoreFrame function in UnwindLLDB

Hello,
I am currently working on Bug 27687 (PrintStackTraces). The failure is caused by erroneous unwinding of the frames starting from frame zero. The error is not detected in AddOneMoreFrame because it only checks 2 more frames; had it checked more frames, it would have detected the error. My questions are:

→ Is there any specific reason for checking only 2 frames instead of more?
→ Is it safe to assume that, in the absence of a prologue and epilogue, the assembly unwinder will always fail?

Best Regards,
A Ravi Theja

Hello,
I posted this query a while ago and still have no answers. I am currently working on Bug 27687 (PrintStackTraces). The failure is caused by erroneous unwinding of the frames starting from frame zero. The error is not detected in AddOneMoreFrame because it only checks 2 more frames; had it checked more frames, it would have detected the error. My questions are:

→ Is there any specific reason for checking only 2 frames instead of more?
→ Why not make the EH CFI based unwinder the default and make the assembly unwinder the fallback?

Best Regards,
A Ravi Theja

-> Is there any specific reason for checking only 2 frames instead of more?

The stepping machinery uses the unwinder on each stop to figure out whether it has stepped in or out, which is fairly performance sensitive, so we don't want AddOneMoreFrame to do more work than it has to.

Jim

The most common case of a bad unwind, where the unwinder is stuck in a loop, is a single stack frame repeating. I've seen loops of as many as six frames repeating (which were not actually a series of recursive calls), but that's less common.
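The looping failure described above can be illustrated with a small sketch. This is not LLDB's code: `find_repeating_cycle`, the `(pc, cfa)` tuples, and the `max_period` parameter are hypothetical names chosen for illustration, and the sketch assumes that a stuck unwinder reproduces identical `(pc, cfa)` pairs, whereas genuine recursion repeats the pc with a different CFA each time.

```python
def find_repeating_cycle(frames, max_period=6):
    """Return the period of a repeating tail in `frames`, or None.

    `frames` is a list of (pc, cfa) tuples, innermost frame first.
    A period p is reported when the last p frames are identical to the
    p frames immediately before them.
    """
    for period in range(1, max_period + 1):
        if len(frames) < 2 * period:
            break  # not enough frames to compare two full cycles
        if frames[-period:] == frames[-2 * period:-period]:
            return period
    return None


# A single frame repeating: the common case.
stuck_one = [(0x1000, 0x7FF0), (0x2000, 0x7FE0), (0x2000, 0x7FE0)]
# Genuine recursion: same pc, but the CFA moves on every frame.
recursion = [(0x3000, 0x7F00), (0x3000, 0x7F40), (0x3000, 0x7F80)]
# Three frames repeating as a group.
stuck_three = [(0x1, 0x10), (0x2, 0x20), (0x3, 0x30),
               (0x1, 0x10), (0x2, 0x20), (0x3, 0x30)]

print(find_repeating_cycle(stuck_one))
print(find_repeating_cycle(recursion))
print(find_repeating_cycle(stuck_three))
```

With this kind of check, a cycle of period p only becomes visible once 2p frames have been unwound, which is one way to see the cost trade-off: a short look-ahead catches the single-frame loop but not the longer ones.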

-> Why not make the EH CFI based unwinder the default and make the assembly unwinder the fallback?

Sources of unwind information fall into two categories. They can describe the unwind state at every instruction of a function (asynchronous) or they can describe the unwind state only at function call boundaries (synchronous).

Think of "asynchronous" here as the fact that the debugger can interrupt the program at any point in time.

Most unwind information is designed for exception handling -- it is synchronous: a function can only throw an exception in its body, or have an exception passed up through it while it is calling another function.

For exception handling, there is no need/requirement to describe the prologue or epilogue instructions, for instance.
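To make the synchronous/asynchronous distinction concrete, here is a minimal sketch for a hypothetical x86-64 function whose prologue is `push rbp` (1 byte, offset 0) followed by `mov rbp, rsp` (3 bytes, offset 1), with the body starting at offset 4. The row tables, `row_for_offset`, `compute_cfa`, and the register values are all made up for illustration; they are not LLDB data structures.

```python
import bisect

# Rows are (function_offset, cfa_register, cfa_offset).
ASYNC_ROWS = [
    (0, "rsp", 8),    # at entry: only the return address has been pushed
    (1, "rsp", 16),   # after `push rbp`
    (4, "rbp", 16),   # after `mov rbp, rsp`: the body of the function
]
SYNC_ROWS = [
    (0, "rbp", 16),   # a single row, only accurate once the prologue has run
]


def row_for_offset(rows, offset):
    """Return the last row whose start offset is <= offset."""
    starts = [start for start, _, _ in rows]
    return rows[bisect.bisect_right(starts, offset) - 1]


def compute_cfa(rows, offset, regs):
    """Compute the canonical frame address from the row covering `offset`."""
    _, reg, delta = row_for_offset(rows, offset)
    return regs[reg] + delta


# Register values at three stop locations, assuming the caller had
# rsp = 0x1000 just before the call (so the true CFA is 0x1000)
# and rbp = 0x2000.
STOPS = {
    0: {"rsp": 0x0FF8, "rbp": 0x2000},  # stopped on `push rbp`
    1: {"rsp": 0x0FF0, "rbp": 0x2000},  # stopped on `mov rbp, rsp`
    4: {"rsp": 0x0FF0, "rbp": 0x0FF0},  # stopped in the body
}

for offset, regs in STOPS.items():
    print(offset,
          hex(compute_cfa(ASYNC_ROWS, offset, regs)),
          hex(compute_cfa(SYNC_ROWS, offset, regs)))
```

The asynchronous rows give the true CFA at every stop. The single synchronous row agrees only in the body; in the prologue it computes the CFA from the caller's rbp and gets the wrong answer, which is harmless for exception handling but not for a debugger that can stop anywhere.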

eh_frame (and DWARF's debug_frame from which it derives) splits the difference and makes things quite unclear. It is guaranteed to be correct for exception handling -- it is synchronous, and is valid in the middle of the function and when it is calling other functions -- but it is a general format that CAN be asynchronous if the emitter includes information about the prologue or epilogue or mid-function stack changes. But eh_frame is not guaranteed to be that way, and in fact there's no way for it to indicate what it describes, beyond the required unwind info for exception handling.

On x86, gcc and clang have always described the prologue unwind info in their eh_frame. gcc has recently started describing the epilogue too (clang does not). There's code in lldb (e.g. UnwindAssembly_x86::AugmentUnwindPlanFromCallSite) written by Tong Shen when interning at Google which will try to detect if the eh_frame describes the prologue and epilogue. If it does, it will use eh_frame for frame 0. If it only describes the prologue, it will use the instruction emulation code to add epilogue instructions and use that at frame 0.
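The decision described above might be sketched roughly as follows. This is a simplified illustration, not the logic in UnwindAssembly_x86::AugmentUnwindPlanFromCallSite: `frame_zero_strategy` and its three inputs are hypothetical, the "does a row start inside the prologue/epilogue" test is an assumed heuristic, and the final branch covers a case the message does not spell out.

```python
def frame_zero_strategy(row_offsets, prologue_end, epilogue_start):
    """Pick a frame-0 strategy from the offsets at which eh_frame rows begin.

    row_offsets    -- function offsets where an eh_frame row starts
    prologue_end   -- offset of the first instruction after the prologue
    epilogue_start -- offset of the first epilogue instruction
    All three inputs are assumed to be known already.
    """
    # A row that starts after offset 0 but no later than the end of the
    # prologue suggests the producer described the prologue.
    has_prologue = any(0 < off <= prologue_end for off in row_offsets)
    # A row that starts after the first epilogue instruction suggests the
    # producer described the epilogue's stack changes.
    has_epilogue = any(off > epilogue_start for off in row_offsets)

    if has_prologue and has_epilogue:
        return "use eh_frame as-is"
    if has_prologue:
        return "augment eh_frame with emulated epilogue rows"
    # Assumption: not stated in the message above.
    return "fall back to assembly inspection"


# Prologue rows only (what the message says clang emits on x86).
print(frame_zero_strategy([0, 1, 4], prologue_end=4, epilogue_start=20))
# Prologue and epilogue rows (what the message says recent gcc emits).
print(frame_zero_strategy([0, 1, 4, 21], prologue_end=4, epilogue_start=20))
# A single row for the whole function.
print(frame_zero_strategy([0], prologue_end=4, epilogue_start=20))
```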

There are other sources of unwind information similar to eh_frame that are only for exception handling. Tamas added ArmUnwindInfo last year which reads the .ARM.exidx unwind tables. I added compact unwind importing - an Apple specific format that uses a single 4-byte word to describe the unwind state for each function, which can't describe anything in the prologue/epilogue. These formats definitely can't be used to unwind at frame 0 because we could be stopped anywhere in the prologue/epilogue where they are not accurate.
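As an illustration of why a single 4-byte word cannot describe the prologue or epilogue, the sketch below extracts only the "mode" field of an x86-64 compact unwind encoding. The mask and mode values are quoted from memory of Apple's compact_unwind_encoding.h and should be checked against that header; `compact_unwind_mode` is a hypothetical helper, not an LLDB function.

```python
# Values as recalled from compact_unwind_encoding.h (x86-64); verify
# against the header before relying on them.
UNWIND_X86_64_MODE_MASK = 0x0F000000
MODES = {
    0x01000000: "RBP_FRAME",   # frame uses rbp as frame pointer
    0x02000000: "STACK_IMMD",  # frameless, stack size is an immediate
    0x03000000: "STACK_IND",   # frameless, stack size read from the code
    0x04000000: "DWARF",       # no compact form; refer to eh_frame instead
}


def compact_unwind_mode(encoding):
    """Return the mode name stored in a 32-bit compact unwind encoding."""
    return MODES.get(encoding & UNWIND_X86_64_MODE_MASK, "NONE_OR_UNKNOWN")


print(compact_unwind_mode(0x01000000))
print(compact_unwind_mode(0x04000123))
```

One word selects one unwind rule for the whole function, so there is no place to record how the rule changes instruction by instruction during the prologue or epilogue.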

It's unfortunate that eh_frame doesn't include a way for the producer to declare how async the unwind info is; it makes the debugger's job a lot more difficult.

J

OK, the problem I am currently facing is that there are cases in which eh_frame should have been used for frame 0 but isn't, and the assembly unwind gives wrong information that could only be detected if the debugger tried to extract more frames. The purpose of AddOneMoreFrame in UnwindLLDB is to try to get more than one frame of the stack. I want to run both unwinders and select the one that yields more frames.

It gets so tricky! It's hard for the unwinder to tell the difference between a real valid stack unwind and random data giving lots of "frames".

It sounds like the problem that needs fixing is to figure out why the assembly unwind is wrong for frame 0. What do you get for

disass -a <address inside function>

image show-unwind -a <address inside function>

?

Hello,
This is happening in TestPrintStackTraces, where we can end up here:

This has no eh_frame unwind instructions. Even if we were using eh_frame at frame 0, you'd be out of luck.

I forget the exact order of fallbacks. I think for frame 0 we try to use the assembly profile unwind ("async unwind plan"), and if we can't do that we fall back to the eh_frame unwind ("sync unwind plan"), and as a last resort we'll use the architecture default unwind plan, which, for a stack frame like this that doesn't do the usual push rbp; mov rsp, rbp sequence, means we'll skip at least one stack frame.
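The fallback order recalled above (and explicitly hedged with "I think") could be sketched as a simple first-available chain. `pick_frame_zero_plan` and the plan placeholders are hypothetical; the real selection code in LLDB involves more conditions than this.

```python
def pick_frame_zero_plan(assembly_plan, eh_frame_plan, arch_default_plan):
    """Return (name, plan) for the first available plan, in the recalled order.

    A plan that could not be built is passed as None.
    """
    candidates = [
        ("assembly profile (async)", assembly_plan),
        ("eh_frame (sync)", eh_frame_plan),
        ("architecture default", arch_default_plan),
    ]
    for name, plan in candidates:
        if plan is not None:
            return name, plan
    raise ValueError("no unwind plan available")


# Assembly inspection failed and there is no eh_frame entry, so only
# the architecture default is left.
name, plan = pick_frame_zero_plan(None, None, "CFA = rbp + 16")
print(name, "->", plan)
```

The last entry is the risky one: the architecture default assumes a frame-pointer-based frame, so a function that never sets one up is unwound as if it were its caller.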

The assembly inspection unwind plan from AssemblyParse_x86 looks correct to me. This function saves some registers on the stack (all of them argument or volatile registers, so that's weird & the assembly profiler probably won't record them; whatever), calls a function, restores the register values, and then jumps to the function pointer returned by that first function call. Maybe this is some dynamic loader fixup routine for the first time an external function is called and the solib needs to be paged in.

You're stopped in the body of the function (offset 86) where the stack pointer is still as expected. I'd have to think about that unwind entry for offset +94 (if you were stopped on the jmp instruction) a bit more - that's a bit unusual. But unless you're on the jmp, I can't see this unwind going wrong.

J

This has no eh_frame unwind instructions. Even if we were using eh_frame at frame 0, you'd be out of luck.

I did not understand how you concluded that the eh_frame unwind instructions are not there. Pardon my asking, but can you tell me how you inferred that?