Problem unwinding from inside of a CRT function

Having some trouble unwinding when I’m broken inside of a CRT function. Another caveat is that I don’t have symbols for this CRT function. So the problem could be anything from something I’ve done wrong on my side, to an issue when symbols aren’t present, to something else. Here is the source code of this program:

#include <stdio.h>

int main (void)
{
    printf("This is line 1\n");
    printf("This is line 2\n");
    printf("This is line 3\n");
    return 1;
}

Here is the disassembly of main:

(lldb) disassemble -n main -F intel
0x1235040 <main>: push ebp
0x1235041 <main+1>: mov ebp, esp
0x1235043 <main+3>: sub esp, 0x14
0x1235046 <main+6>: lea eax, [0x1230040]
0x123504c <main+12>: mov dword ptr [ebp - 0x4], 0x0
0x1235053 <main+19>: mov dword ptr [esp], eax
0x1235056 <main+22>: call 0x12350a1
0x123505b <main+27>: lea ecx, [0x1230050]
(snipped for brevity)

(Using the argument to “call” as the breakpoint address)
(lldb) break set -a 0x12350a1
Breakpoint 3: address = 0x012350a1
(lldb) run
Process 17044 launching
(lldb) Process 17044 launched: 'd:\testexe\expr_test.exe' (i386)
(lldb) Process 17044 stopped

* thread #1: tid = 0x40ec, 0x012350a1 expr_test.exe, stop reason = breakpoint 3.1
    frame #0: 0x012350a1 expr_test.exe
-> 0x12350a1: pushl $0xc
   0x12350a3: pushl $0x1241000
   0x12350a8: calll 0x1235be0
   0x12350ad: xorl %edi, %edi

(lldb) disassemble -b -F intel
-> 0x12350a1: 6a 0c push 0xc
0x12350a3: 68 00 10 24 01 push 0x1241000
0x12350a8: e8 33 0b 00 00 call 0x1235be0
0x12350ad: 33 ff xor edi, edi
0x12350af: 89 7d e4 mov dword ptr [ebp - 0x1c], edi
0x12350b2: 33 c0 xor eax, eax
0x12350b4: 39 45 08 cmp dword ptr [ebp + 0x8], eax
0x12350b7: 0f 95 c0 setne al
0x12350ba: 85 c0 test eax, eax
0x12350bc: 75 15 jne 0x12350d3

Here’s my register values:

(lldb) register read
General Purpose Registers:
eax = 0x01230040
ebx = 0x00000000
ecx = 0x00000001
edx = 0x00000000
edi = 0x00000000
esi = 0x00000000
ebp = 0x00EAF920
esp = 0x00EAF908
eip = 0x012350A1
eflags = 0b00000000000000000000001000010110

And using the value of esp to dump the stack (sorry, I don’t know how to use the -f argument to format this more nicely),

(lldb) memory read 0x00EAF908
0x00eaf908: 5b 50 23 01 40 00 23 01 00 00 00 00 00 00 00 00 [P#.@.#.........
0x00eaf918: 28 f9 ea 00 00 00 00 00 68 f9 ea 00 4e 52 23 01 (.......h...NR#.

So the return address is 0x0123505b. Cross-referencing this with the original disassembly of main(), it looks like this is the correct value.

So it seems like the Unwinder has all the information it needs, but yet I’m still only getting 1 frame. Any suggestions how to dig into this?

BTW, probably a given, but I’m doing this on Windows with a Windows executable. So it could be something to do with that as well, but I’m not really familiar with how the unwinder works.

FWIW it never calls RegisterContext::CreateRegisterContextForFrameIndex with any index other than 0.

lldb is stopped on the first instruction of a function (at address 0x12350a1). But there's no symbol for this function -- lldb doesn't KNOW it's at the first instruction of a function. It can't profile the assembly instructions of the function to figure out how the unwind instructions should work. So lldb falls back to the "architecture default unwind plan", e.g. ABISysV_x86_64::CreateDefaultUnwindPlan(), which assumes that the way to find the caller's eip address is to dereference ebp.

Basically, you need some source of unwind info for functions. On Linux systems, this would be eh_frame instructions. On Mac OS X, there's "compact unwind" info that does the same thing. I'm sure Windows has something -- it's needed to do exception handling.

If you don't have any source of unwind information, you need to have accurate function start addresses so lldb can look at the instruction stream and make up an unwind plan from those (v. UnwindAssembly-x86.cpp). But this means you need to know the start address of all functions in the file. On Mac OS X we have a special little section (LC_FUNCTION_STARTS) that encodes the length of each function in the file--so even if the function names are stripped before shipping, we can find the start address of each function easily. lldb adds these to the symbol table and makes up function names for them.
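To make the role of function start addresses concrete, here is a minimal Python sketch (not lldb code) of how a sorted list of start addresses lets a debugger map any pc to the function containing it. The three addresses come from the disassembly earlier in this thread; `image_end` and the made-up naming scheme are assumptions for illustration only.

```python
import bisect

# Start addresses taken from the disassembly in this thread; in a real
# object file they would come from a symbol table or a function-starts table.
function_starts = sorted([0x1235040, 0x12350A1, 0x1235BE0])
image_end = 0x1236000  # hypothetical end of the code section


def function_bounds(pc):
    """Return (start, end) of the function containing pc, or None."""
    if pc < function_starts[0] or pc >= image_end:
        return None
    i = bisect.bisect_right(function_starts, pc) - 1
    start = function_starts[i]
    # Assumption: a function runs until the next known start address.
    end = function_starts[i + 1] if i + 1 < len(function_starts) else image_end
    return start, end


def synthesized_name(pc):
    """Make up a name for a stripped function (hypothetical naming scheme)."""
    bounds = function_bounds(pc)
    return None if bounds is None else "func_at_0x%x" % bounds[0]


print(synthesized_name(0x12350AD))
```

With bounds like these, a pc in the middle of the symbol-less CRT function can be traced back to its first instruction, which is what an instruction-scanning unwinder needs as a starting point.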

If you have no compiler-generated unwind information (that lldb can parse) and you don't have start addresses for all functions in the file, things are going to work poorly.

For what it's worth, when looking at unwind issues it's usually easiest to turn on the unwind logging. "log enable lldb unwind". That'll show what lldb was up to.

Also, another ABI method is useful - see ABISysV_x86_64::CreateFunctionEntryUnwindPlan(). This is the UnwindPlan that lldb will use when it knows it is at the start of a function before any instructions have been executed.
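As a rough illustration of why the choice of unwind rule matters at the first instruction of a function, the following Python sketch applies two rules to the stack bytes posted earlier in this thread. It is only arithmetic on that dump, not a trace of what lldb actually did; `read_u32` is a helper made up for this example, and the two rules are simplified versions of "function entry" and "frame pointer" unwinding on 32-bit x86.

```python
import struct

# Bytes copied from the "memory read 0x00EAF908" output in this thread.
BASE = 0x00EAF908
MEMORY = bytes.fromhex(
    "5b 50 23 01 40 00 23 01 00 00 00 00 00 00 00 00 "
    "28 f9 ea 00 00 00 00 00 68 f9 ea 00 4e 52 23 01"
)

# Register values copied from the "register read" output.
ESP = 0x00EAF908
EBP = 0x00EAF920


def read_u32(address):
    """Read a little-endian 32-bit value from the captured stack bytes."""
    return struct.unpack_from("<I", MEMORY, address - BASE)[0]


# Function-entry rule: the callee has pushed nothing yet, so the
# return address is the word at [esp].
entry_return = read_u32(ESP)

# Frame-pointer rule: assumes the current function has already run
# "push ebp; mov ebp, esp", so the return address is the word at [ebp + 4].
frame_pointer_return = read_u32(EBP + 4)

print(hex(entry_return))
print(hex(frame_pointer_return))
```

The first value is 0x123505b, the instruction after the call in main. The second is 0x123524e: because the callee has not set up its own frame yet, ebp still belongs to main, so [ebp + 4] holds main's own return address and a walk that starts with this rule would skip main.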

J

Yuck :( Sounds like it’s going to be a lot of work to get this working then. We’re working with the Microsoft ABI, not the Itanium ABI, so I’m assuming that’s why ABISysV_x86_64::CreateFunctionEntryUnwindPlan isn’t doing anything for me. Presumably I need to implement an ABIMicrosoft_x86_x64 plugin.

Which is unfortunate, because it seems to be needed even for basic stepping to work, like step over. Originally I was just trying to implement stepping, and that’s how I ran into this issue. So that brings me to a related question. Why is step over as complicated as it is? It seems to me like step over can be implemented by disassembling 1 opcode, adding the size of the opcode to the current pc, and having the ThreadPlan::ShouldStop always return false unless the pc is equal to old_pc + size_of_opcode.

It currently has a lot of logic for comparing frames against each other and things like that though.

I doubt the ABI is very different regarding CreateDefaultUnwindPlan() and CreateFunctionEntryUnwindPlan() -- they describe a very minimal set of rules for backtracing without any other knowledge.

You'll also need to implement the RegisterIsVolatile() method based on the Windows ABI. This tells lldb which registers are non-volatile aka preserved. For the SysV ABI, that says that a function can call another function and the contents of ebx will not be modified -- ebx is non-volatile/preserved.

"step over" relies heavily on the unwinder. When you say step over/next, lldb instruction steps through a line in your function. Each time it instruction-steps, it looks to see that it is still in the same function and still within that source line address range. If it is in a new function, it needs to determine: Did I call a function? Or did I return out of my original function during the step? To tell what happened, it backtraces one frame to see if the caller is the function it originally started in.
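The decision described above can be sketched in a few lines of Python. This is not lldb's implementation; `Frame`, `classify_step`, and the result strings are hypothetical names, and the frame addresses in the usage lines are illustrative values only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    function: str  # name of the function this frame belongs to
    cfa: int       # canonical frame address, used here as the frame's identity


def classify_step(original: Frame, stack_after: List[Frame]) -> str:
    """Classify one instruction step. stack_after[0] is the youngest frame."""
    current = stack_after[0]
    if current == original:
        return "same frame"
    # Backtrace one frame: if our caller is the frame we started in,
    # the instruction we just stepped was a call.
    if len(stack_after) > 1 and stack_after[1] == original:
        return "stepped in"
    # The original frame is not directly below us, so this sketch treats
    # the step as a return (a real stepper would inspect more of the stack).
    return "stepped out"


main = Frame("main", 0x00EAF928)
crt = Frame("<no symbol>", 0x00EAF90C)

# Unwinder finds main below the new frame: the step was a call.
print(classify_step(main, [crt, main]))
# Unwinder only produces one frame: the call is misread as a return.
print(classify_step(main, [crt]))
```

The second call shows the failure mode from this thread: when the backtrace stops after frame 0, the stepper cannot tell that it has just entered a callee.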

So you need to be able to unwind reliably from the first instruction of a function. Which, without any symbols or unwind info, is going to be pretty difficult to get right.

J

You are describing "thread step-inst". That should pretty much always work regardless of unwinder, etc.

Source step over, as Jason said, is much more complicated.

Jim

Thanks, I wasn’t aware of the distinction.

As an aside, Windows comes with an API that does its own unwinding. It works quite well and is aware of many minor details such as Microsoft specific compiler flags that affect the way code is generated, and also works (somewhat) in the presence of FPO, no symbols, and other edge cases. If I wanted to make unwinding on Windows use this API, what would be the best way to fit that in to LLDB? In theory it would replace UnwindLLDB for any situation where the binary was using a Microsoft ABI.

You'd want to look at the UnwindLLDB and RegisterContextLLDB classes.

UnwindLLDB is the top-level class which lldb asks "hey can you give me another stack frame above this one". UnwindLLDB creates a RegisterContext for that stack frame -- a RegisterContextLLDB -- and so when someone wants to ask "what's the saved pc value for this stack frame", it goes to the register context and knows how to retrieve that.

My biggest concern about the API you mention is whether it assumes that it is doing an in-process unwind. lldb is an external process so if the API is looking at lldb's symbols and memory layout, that isn't going to do you any good.

But it may be possible to delegate all of this to another API assuming it can work on another process. So when you ask for the value of rbx in frame 2, the RegisterContext for that frame would bounce over to the windows api, get the result and hand it back to lldb.

J

It’s not in process. It’s actually designed specifically for debuggers to use for walking stacks of target processes.

It should be possible to create Unwind and RegisterContext subclasses to work with this. There are several different subclasses that you can look at - UnwindLLDB/RegisterContextLLDB are fairly sophisticated/complicated. A more trivial example to start with would be UnwindMacOSXFrameBackchain and RegisterContextMacOSXFrameBackchain.

Btw, I’m still a little uncomfortable that not having unwind/symbol info at any point, no matter how deep in a function call chain, has the possibility to mess up a step over. In my original example, I had symbols for main but not printf. Is that not sufficient to step over a call to printf? It should be able to know from that a) the bounds of main(), b) the pc corresponding to the next line of source after printf, and c) the value of esp. Aren’t those 3 pieces of information enough to step over any line of source, regardless of whether you have unwind information for the code inside the function you’re stepping over?

The stepping machinery is not architecture specific, nor should it be. It's the unwinder's job to get the stack frames right on a per-architecture basis, and then the stepping machinery is the consumer of the unwind info. The stepping machinery also needs to know things like "how do I get back out of a function that I stepped into", which again relies on the unwind being accurate.

Jim

One important thing to get right before proceeding is getting the correct address bounds of all functions so that the disassembly unwinder can do its job. You said you are stopped at a function, but don't know the function bounds. You will want to modify your object file reader (COFF?) to create a viable symbol table that can be used. On MacOSX we use the actual symbol table from the object file and supplement it with all sorts of goodies:
1 - LC_FUNCTION_STARTS load command which tells us all function bounds even if their symbols have been stripped
2 - the PLT entries are made into symbols
3 - more data from the __LINKEDIT is used to create other symbols

Can you modify your COFF plug-in to get the symbol bounds for every function somehow? Then we can rely on the unwind plan that manually disassembles the functions and makes its own unwind info.

Greg

I’m still trying to wrap my head around the way LLDB does things.

If I understand correctly, when you run thread step-over, it enters single step mode (by setting the trap flag on the CPU). Every time the CPU traps, LLDB picks this up and asks the ThreadPlan “should i stop now?” “should i stop now?” until the ThreadPlan returns true. And part of this “should i stop now” involves generating an unwind.

If my understanding is correct, then I have some questions:

  1. Isn’t this extremely slow? What if I’m in main(), and the program I’m debugging is, say, clang, and I say “step over the entire compilation”? It seems like this would take a decade to return.

  2. What if one of the instructions is a pushf / popf? You step over a pushf, then later you try to continue, and it continues over the popf, which restores the trap flag. Now LLDB is confused because it doesn’t think it’s single stepping, but the CPU does. How does this work?

> I'm still trying to wrap my head around the way LLDB does things.
>
> If I understand correctly, when you run thread step-over, it enters single step mode (by setting the trap flag on the CPU). Every time the CPU traps, LLDB picks this up and asks the ThreadPlan "should i stop now?" "should i stop now?" until the ThreadPlan returns true. And part of this "should i stop now" involves generating an unwind.

That's close, but not quite correct.

1) In the original implementation, (and this is how gdb does it, BTW) lldb single-stepped till "something interesting happened." As an optimization, when you are doing any kind of step through source range, I changed lldb so it runs from "instruction that could branch" to "instruction that could branch" using breakpoints. Then when it hits an instruction that could branch it single steps that instruction, and then figures out from where that went what to do next.

BTW, if it were helpful to figure out what to do next, we could store some info (the old stack frame or whatever) when we hit a branch instruction, and then use it when the single-step completed. I haven't needed to do that yet, however; Jason's always been able to get the unwinder to work reliably enough not to require this.

2) If the single step pushes a frame, and we are "stepping over", lldb sets a breakpoint on the return address and continues. When the return address is hit (for the current frame of course since it could be hit recursively) then we continue stepping as above.

3) And of course, if the single step over the branch pops a frame, then we stop.
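For readers following along, here is a small Python sketch of the "run from branch to branch" idea in point 1. It is a simplification, not lldb's code: `Instruction`, `BRANCHING`, and `next_branch` are hypothetical names, and the mnemonic set is deliberately incomplete. The instruction list is the part of main's disassembly shown at the top of the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instruction:
    address: int
    mnemonic: str


# Hypothetical, deliberately incomplete set of x86 mnemonics that can
# transfer control; a real debugger would ask its disassembler instead.
BRANCHING = {"call", "jmp", "je", "jne", "ret"}


def next_branch(instructions: List[Instruction], pc: int) -> Optional[int]:
    """Address of the first instruction at or after pc that could branch."""
    for insn in instructions:
        if insn.address >= pc and insn.mnemonic in BRANCHING:
            return insn.address
    return None


# The part of main() that was posted in the thread.
main_code = [
    Instruction(0x1235040, "push"),
    Instruction(0x1235041, "mov"),
    Instruction(0x1235043, "sub"),
    Instruction(0x1235046, "lea"),
    Instruction(0x123504C, "mov"),
    Instruction(0x1235053, "mov"),
    Instruction(0x1235056, "call"),
    Instruction(0x123505B, "lea"),
]

# Run to this address with a breakpoint, then single-step just the call.
print(hex(next_branch(main_code, 0x1235040)))
```

Starting from the top of main, the only instruction that needs a hardware single step in this excerpt is the call at 0x1235056; everything before it can be covered by one breakpoint and a continue.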

> If my understanding is correct, then I have some questions:
>
> 1) Isn't this extremely slow? What if I'm in main(), and the program I'm debugging is, say, clang, and I say "step over the entire compilation"? It seems like this would take a decade to return.

Right, if we were to keep single stepping after we push a frame it would be very slow. You would definitely have noticed that. But as you see, we don't.

Note, sometimes not doing single-stepping through gobs of library code requires some tricky footwork. For instance, if you are stepping into and hit a cross-library call, you first step to the inter-library procedural stub. The particular stub you are stepping through might not have been bound up yet, so if you can't tell by some other means where that call would go you would have to single step through lots of loader code to get to the real function call. lldb has a whole section of code to predict the trampoline endpoint, some of this coming from the dynamic linker plugin, and in the case of ObjC some of it coming from the ObjCRuntime plugin.

> 2) What if one of the instructions is a pushf / popf? You step over a pushf, then later you try to continue, and it continues over the popf, which restores the trap flag. Now LLDB is confused because it doesn't think it's single stepping, but the CPU does. How does this work?

Interesting. gdb does the same thing that lldb does except it does single step through all the instructions in the current source range. In all the years of supporting gdb I never had this come up. Maybe these instructions don't get used all that often in code you're likely to be source stepping through?

Anyway, this isn't an issue for lldb, since pushf/popf don't count as branch instructions, so we would continue over them rather than single stepping.

Hope that helps,

Jim

Let’s say you’re stopped at the first line of foo() here.

1. void foo()
2. {
3. -> printf(
4. "Line 1\n");
5. printf("Line 2\n");
6. printf("Line 3\n");
7. }

When you step-over, why can’t it just say: Ok, current source line is 3. Debug info tells me that the next instruction begins on line 5. Line 5 corresponds to address 0x12345678. Put a breakpoint on 0x12345678. To account for the fact that foo() may be recursive, also save off a copy of the stack pointer. When the breakpoint is hit, stop if the stack pointer is the same or less than the saved value (depending on the definition of “less” for your architecture), otherwise don’t stop.

(Still trying to process the rest of the points in your email)

That would be easy if you knew the current source line had no internal branches, but debug information doesn't have semantic information, only address range information.

Suppose the current line is:

     if (foo == 7) goto someLabel;

How do you know where that is going to go from source range information? You could pre-scan the line for branches and if the target is static you could figure out where they are going to go and set breakpoints on all the targets. Then of course in some languages you even have computed goto's, so you would actually have to do a full emulation of the instruction range to know what is going to happen. At which point just watching the instructions as they go past is more accurate.

Jim

> 1) In the original implementation, (and this is how gdb does it, BTW) lldb
> single-stepped till "something interesting happened." As an optimization,
> when you are doing any kind of step through source range, I changed lldb so
> it runs from "instruction that could branch" to "instruction that could
> branch" using breakpoints. Then when it hits an instruction that could
> branch it single steps that instruction, and then figures out from where
> that went what to do next.

Nice.

> BTW, if it were helpful to figure out what to do next, we could store some
> info (the old stack frame or whatever) when we hit a branch instruction,
> and then use it when the single-step completed. I haven't needed to do
> that yet, however; Jason's always been able to get the unwinder to work
> reliably enough not to require this.

First, we should definitely teach the Windows unwinder to fall back to
frame pointers if no debug info is present. That's an obvious win.

However, there are lots of environments (not just Windows) where unwinding
is unreliable due to third party libraries, so it'd be nice if we can get
by without unwinding.

> 2) If the single step pushes a frame, and we are "stepping over", lldb sets
> a breakpoint on the return address and continues. When the return address
> is hit (for the current frame of course since it could be hit recursively)
> then we continue stepping as above.

Any objection to asking the target if the previous opcode is something
typically used for a call (x86 call, ARM bl), single step, and then load
the retaddr or link register? Is that hard to thread through? I suppose it
would fire on 32-bit x86 PIC sequences (call 0 ; pop %ebx), but that won't
hurt.

> Let's say you're stopped at the first line of foo() here.
>
> 1. void foo()
> 2. {
> 3. -> printf(
> 4. "Line 1\n");
> 5. printf("Line 2\n");
> 6. printf("Line 3\n");
> 7. }
>
> When you step-over, why can't it just say: Ok, current source line is 3. Debug info tells me that the next instruction begins on line 5. Line 5 corresponds to address 0x12345678. Put a breakpoint on 0x12345678. To account for the fact that foo() may be recursive, also save off a copy of the stack pointer. When the breakpoint is hit, stop if the stack pointer is the same or less than the saved value (depending on the definition of "less" for your architecture), otherwise don't stop.

How about:

2 for (int i=0; i<100; i++)
3 -> printf ("i = %i\n", i); //
4 printf ("this won't be executed after line 3 except for the last time\n");

If you set a breakpoint on line 4 after line 3, then you will fail to return to line 3 when single stepping.

How about:

2 -> goto carp;
3 puts("won't ever be executed");
4 carp:
5 puts("will get executed");

If you set a breakpoint at line 3 you won't stop.

Another:

2 -> throw foo();
3 puts("this will never get hit");

If you set a breakpoint at line 3 you will never hit it.
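The loop and goto cases can be replayed with a toy model in Python. This is only a sketch: execution is reduced to a list of source line numbers, frames and recursion are ignored, the loop count is cut to 3 to keep the trace short, and all function names are invented for the example.

```python
def loop_trace(iterations):
    """Source lines executed by the for-loop example, in order."""
    trace = []
    for _ in range(iterations):
        trace += [2, 3]  # loop condition, then the loop body
    trace += [2, 4]      # final failed condition, then the line after the loop
    return trace


def stop_at_next_line_breakpoint(trace, start, next_line):
    """Naive plan: run until the textually next line is reached."""
    for i in range(start + 1, len(trace)):
        if trace[i] == next_line:
            return i
    return None  # the breakpoint is never hit


def stop_at_line_change(trace, start):
    """Watch execution and stop as soon as the source line changes."""
    for i in range(start + 1, len(trace)):
        if trace[i] != trace[start]:
            return i
    return None


trace = loop_trace(3)
start = trace.index(3)  # stopped on line 3 for the first time
naive = stop_at_next_line_breakpoint(trace, start, next_line=4)
watched = stop_at_line_change(trace, start)

# Line reached by each strategy, and how many visits to line 3 the naive one skipped.
print(trace[watched], trace[naive], trace[start + 1:naive].count(3))

# The goto example: line 2 jumps straight to line 5, so line 3 is never reached.
print(stop_at_next_line_breakpoint([2, 5], 0, next_line=3))
```

In this model the naive breakpoint only fires once the loop is finished, silently running through the remaining iterations, and in the goto case it never fires at all.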

Please trust that we know what we are doing when it comes to single stepping. I am glad you are thinking about how things are done, but just be sure to think about the problem in a wider scope than "the code I am thinking about is linear" and think about all sorts of single stepping and what you would expect to happen.

Also, did you get my comment about improving function bounds in the COFF parser? If you can do this, you won't really need to do any of the unwinding stuff because the assembly unwinder will take care of it. You just need to get good function bounds for everything, using any means necessary in the ObjectFileCOFF parser, by making all the symbols you can. You also need to identify what parts are trampolines. For example a call to printf usually goes through a PLT entry. These are often in one place in your binary and often there are no symbols in the symbol table for these. Identifying the symbols with a name like "printf" and also making the symbol an eSymbolTypeTrampoline will allow us to not set a breakpoint on your "printf" trampoline when you say "b printf" on the command line, and it will also give function bounds to these small trampoline code sections so we can step and unwind through them.

Greg

> be sure to think about the problem in a wider scope than “the code I am thinking about is linear” and think about all sorts of single stepping and what you would expect to happen.

Thanks very much for helping us understand this stuff. I was also super curious how this single step stuff works and what the non-obvious (to us) cases are.

When we’re up to speed we should be able to really contribute some good stuff.

Vince