eh_frame or debug_frame

I’m actually struggling with this right now. I’m trying to implement an OS plugin so goroutines show up as threads.
The Go compiler puts instruction-accurate unwind info into .debug_frame; I’m not sure what (if anything) goes into .eh_frame.
However, lldb uses the disassembly instead of the DWARF info. The x86 unwinder assumes that all threads have the same LLDB register numbers, but other parts of the code require that the LLDB register number be less than the number of registers. Goroutines only store sp and ip, so it seems I’m going to have to create a custom RegisterContext subclass to get the existing unwinder to work for goroutines.

Can't your OS plugin for the goroutines use the same sp and ip register numbers as x86_64 (instead of 0 and 1 like you might be using right now) when it reports them to lldb, and return all the other registers as "unavailable" if they're requested?
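A minimal sketch of that suggestion, in Python for brevity. The class name `GoroutineRegisterContext`, its methods, and the register count of 17 are hypothetical, not lldb's API. It stores only sp and pc, files them under the x86_64 DWARF/eh_frame register numbers that also appear in the unwind logs later in this thread (6 = rbp, 7 = rsp, 16 = rip), and reports every other register as unavailable (`None`). Because it claims a full x86_64-sized register count, the numbers it hands out also satisfy the "register number < number of registers" requirement.

```python
from typing import Dict, Optional

# x86_64 DWARF / eh_frame register numbers (the same numbers seen in the
# unwind logs: reg 6 = rbp, 7 = rsp, 16 = rip / return address).
X86_64_RBP = 6
X86_64_RSP = 7
X86_64_RIP = 16
# Hypothetical count: DWARF regs 0-16 (16 GPRs plus the return-address column).
X86_64_NUM_REGS = 17


class GoroutineRegisterContext:
    """Hypothetical register context for a goroutine that only saves sp and pc."""

    def __init__(self, saved_sp: int, saved_pc: int) -> None:
        self._values: Dict[int, int] = {X86_64_RSP: saved_sp, X86_64_RIP: saved_pc}

    def register_count(self) -> int:
        # Report the full x86_64 count so every register number handed to the
        # unwinder satisfies "regnum < register_count()".
        return X86_64_NUM_REGS

    def read_register(self, regnum: int) -> Optional[int]:
        """Return the saved value, or None if the register is unavailable."""
        if not 0 <= regnum < X86_64_NUM_REGS:
            raise IndexError(f"register {regnum} is out of range")
        return self._values.get(regnum)
```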

The tricky bit about living on eh_frame / debug_frame is that lldb doesn't know what kind of unwind info it is being given. Is it just for exception handling locations? Does it contain prologue setup? epilogue? Is it fully asynchronous - giving unwind details at all locations? There aren't any flags in eh_frame/debug_frame that could give us a hint about what we're working with.

Yes, I’m writing a class to do that now. It’s just not supported by any of the existing register contexts.

Go doesn’t have exception handlers, so it doesn’t write .eh_frame. Wouldn’t it make sense to use .debug_frame if .eh_frame is missing?

With my custom RegisterContext I got backtraces to work for my memory threads. But something strange is going on. I have 10 threads that should have identical traces, but the first has 5 frames, then 4, 3, 2, and the rest only have 1 frame.

There’s a log here; thread 6 is the one with the complete backtrace: https://gist.github.com/ribrdb/386fb0e555e82483d21d

Comparing thread 7 with thread 6, things seem fine up to line 627:

As Jason said, there is nothing in .eh_frame or .debug_frame that says "I only have partial info that is only valid at call sites" or "I have complete unwind info". So we don't know when to trust the unwind info for frame zero. If the Go compiler always generates complete .debug_frame, you should mark it somehow so we know we can trust it at all locations. By default we would set "m_is_complete" to false, but you can set it to true when the language for the compile unit is Go.

Then we would need to make the unwinder always try to get the unwind plan for frame zero and ask whether it is complete. If so, use it; otherwise, fall back to the assembly unwind.
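A sketch of that proposal, assuming a hypothetical `UnwindPlan` record whose `is_complete` flag defaults to false and is set only for Go compile units; the function names are illustrative, not lldb's. It also assumes that frames above 0, which are stopped at call sites, keep using the debug-info plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnwindPlan:
    source: str        # "debug_frame", "eh_frame", "assembly", ...
    is_complete: bool  # valid at every instruction, not only at call sites


def debug_frame_plan_for_cu(cu_language: str) -> UnwindPlan:
    # Default to "not complete"; trust it at every instruction only for Go CUs.
    return UnwindPlan("debug_frame", is_complete=(cu_language == "go"))


def plan_for_frame(frame_index: int,
                   debug_info_plan: Optional[UnwindPlan],
                   assembly_plan: UnwindPlan) -> UnwindPlan:
    if debug_info_plan is None:
        return assembly_plan
    if frame_index == 0:
        # Frame 0 can be stopped at any instruction (mid-prologue, mid-epilogue),
        # so only use the debug info there if it is marked complete.
        return debug_info_plan if debug_info_plan.is_complete else assembly_plan
    # Assumption: frames above 0 are at call sites, where call-site info suffices.
    return debug_info_plan
```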

Greg

Go doesn't have exception handlers, so it doesn't write .eh_frame. Wouldn't it make sense to use .debug_frame if .eh_frame is missing?

Yes it does make sense.

With my custom RegisterContext I got backtraces to work for my memory threads. But something strange is going on. I have 10 threads that should have identical traces, but the first has 5 frames, then 4, 3, 2, and the rest only have 1 frame.

Yeah, you will need to trace through and see what is going wrong by debugging this.

.eh_frame is used for more than exception handlers; it is also used for
backtraces, e.g. in SIGSEGV handlers. The difference is that .debug_frame
is normally not mapped, so it is not easily available.

Joerg

I thought the rule was "if you can access .debug_frame and it is
available for the IP, use it, otherwise fallback to .eh_frame".
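Joerg's rule, sketched with hypothetical helpers: each section is modeled as a list of FDE address ranges, or `None` when the section is absent or not mapped. A Go binary that only has .debug_frame is then handled naturally, and the final `"assembly"` result stands in for lldb's instruction-inspection unwinder when neither section covers the pc.

```python
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]  # [start, end) covered by one FDE


def covers(fde_ranges: Optional[Sequence[Range]], pc: int) -> bool:
    # None models a section that is absent or not readable (e.g. .debug_frame not mapped).
    return fde_ranges is not None and any(lo <= pc < hi for lo, hi in fde_ranges)


def choose_unwind_source(pc: int,
                         debug_frame: Optional[Sequence[Range]],
                         eh_frame: Optional[Sequence[Range]]) -> str:
    if covers(debug_frame, pc):
        return "debug_frame"
    if covers(eh_frame, pc):
        return "eh_frame"
    return "assembly"  # no FDE for this pc: fall back to instruction inspection
```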

Joerg

Go doesn't have exception handlers, so it doesn't write .eh_frame. Wouldn't it make sense to use .debug_frame if .eh_frame is missing?

We could do that. I'm surprised that Go is emitting x86_64 code without eh_frame. As Joerg points out, debug_frame is great, but it may not be available when an analysis tool is examining a binary. eh_frame has the benefit of always being in the binary.

With my custom RegisterContext I got backtraces to work for my memory threads. But something strange is going on. I have 10 threads that should have identical traces, but the first has 5 frames, then 4, 3, 2, and the rest only have 1 frame.

It's easiest to isolate one thread's backtrace in a situation like this. For instance, look at thread 7 in your program (the unwind algorithms don't pass any information between threads):

th7/fr0 initialized frame current pc is 0xdaef cfa is 0x20809feb8 using assembly insn profiling UnwindPlan

lldb is using the assembly unwind inspection for frame 0. You said that all ten threads should have the same backtrace, but thread #2 is at 0x2fe8c, #3 is at 0x209a, and threads 4-15 are at 0xdaef. I assume you meant that threads 4-15 should all be the same.

     th7/fr5 pc = 0x0000000000002078
     th7/fr5 fp = 0xffffffffffffffff
    th7/fr4 supplying caller's stack pointer (7) value, computed from CFA
     th7/fr5 sp = 0x000000020809ffc8
     th7/fr5 active row: 0x0000000000002050: CFA=rbp+16 => rbp=[rbp] rsp=rbp+16 rip=[rbp+8]

That's the architectural default unwind plan for x86_64 ABIs. Over in thread 6, it looks like lldb failed to unwind past frame 5 with the assembly unwind, decided the assembly unwind was incorrect, and tried switching over to the architectural default unwind plan:

th6/fr0 supplying caller's saved reg 6's location, cached
     th6/fr5 full unwind plan 'assembly insn profiling' has been replaced by architecture default unwind plan 'x86_64 default unwind plan' for this function from now on.
     th6/fr5 supplying caller's saved reg 16's location using x86_64 default unwind plan UnwindPlan
     th6/fr5 supplying caller's register 16 from the stack, saved at CFA plus offset -8
      th6/fr6 could not get pc value
      Frame 6 invalid RegisterContext for this frame, stopping stack walk
th6 Unwind of this thread is complete.

From this point forward, main.okread() will use the arch default unwind plan, which isn't going to work.
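To see why that row can't help here, a small sketch applies it to a toy memory map (a dict from address to 64-bit word; the function name is hypothetical). With a real frame pointer it recovers the caller; with the garbage rbp seen in th7/fr5 (`fp = 0xffffffffffffffff`), the reads land in unmapped memory, which is consistent with th6's "could not get pc value".

```python
from typing import Dict, Optional

MASK64 = (1 << 64) - 1


def apply_x86_64_default_row(regs: Dict[str, int],
                             memory: Dict[int, int]) -> Optional[Dict[str, int]]:
    """Apply CFA=rbp+16 => rbp=[rbp] rsp=rbp+16 rip=[rbp+8] to get the caller's registers.

    Returns None when a read hits memory that is not in the toy map.
    """
    rbp = regs["rbp"]
    cfa = (rbp + 16) & MASK64                      # wraps like 64-bit arithmetic
    saved_rbp = memory.get(rbp)                    # caller's rbp, pushed in the prologue
    return_addr = memory.get((rbp + 8) & MASK64)   # pushed by the call instruction
    if saved_rbp is None or return_addr is None:
        return None
    return {"rbp": saved_rbp, "rsp": cfa, "rip": return_addr}
```

The row only works if every function on the stack maintains rbp as a frame pointer; otherwise rbp holds whatever the function last put in it.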

Can you try rolling back r219772 and seeing if that helps? I suspect lldb may be slowly stripping off the last frame of the unwind for each thread as it progresses.

J

PS- "bt all" works just as well as "thread backtrace all".

With gcc/clang, we've found that eh_frame and debug_frame were identical, so we never bothered to read debug_frame -- on the platforms where lldb is running today, eh_frame and debug_frame are either both present or both absent. We could certainly start reading debug_frame if it is available and eh_frame isn't.

Rolling back r219772 (Be more consistent about null checks for the Process and ABI in GetFullUnwindPlanForFrame) doesn’t seem to have any effect.

Urgh, sorry, I wasn't paying attention to the svn log output when I copied and pasted the revision. The change I meant to mention is r219247. The problem is going to be someone calling TryFallbackUnwindPlan(); I just added some new cases where it can be called. It may not be my most recent change (r219247) itself, but it's going to be that method that is causing the problem.

IMO the SysV AMD64 psABI blesses .eh_frame as something that programs can
rely on for unwind info, while .debug_frame is only covered by DWARF. The
way I see it, the ABI takes precedence over the previous way of storing
unwind info. On the other hand, if Go never wants to coexist with any other
exception-using or stack-crawling code, then maybe omitting it makes sense.

If someone has sync unwind data working for LLVM, we can experiment with
the size difference for different code sets. I would expect it to make a
difference, e.g. for pure C code on many architectures. Right now, there
is not much reason with GCC and Clang to choose one or the other, as you
wrote.

Joerg

Yeah, I doubt fully asynchronous unwind information would be very large. We're already putting the prologue unwind instructions in both eh_frame and debug_frame (which is completely unnecessary in eh_frame). For code using a frame pointer register (so we don't need to track every change to the stack pointer), the prologue and epilogue(s) are the only instructions that need to be described. IIRC clang today doesn't describe the epilogue in eh_frame -- apparently modern gccs do. So for frame-pointer code, gcc is basically emitting asynchronous unwind instructions in eh_frame today, but without any guarantee about it. If the compiler generates omit-frame-pointer code, where we need to see every stack pointer change, that's where we'd see problems. Especially on i386, where there is no pc-relative addressing, a common instruction sequence is "call next-instruction; pop %ebx", which puts the address of the pop instruction in register ebx so you can find pc-relative data from it. But in omit-frame-pointer code, that stack movement would not be described in eh_frame, and the debugger wouldn't know how to backtrace for that one instruction.
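The one-instruction gap can be made concrete with a hypothetical omit-frame-pointer i386 function. The sketch computes the real CFA-to-%esp offset before each instruction and compares it with a CFI table that, as an assumption, only has rows for the prologue and epilogue; only the `popl %ebx` right after `call 1f` disagrees.

```python
from typing import Dict, List, Tuple

# Hypothetical i386 function compiled without a frame pointer.
# Each entry: (instruction, change it makes to %esp).
FUNC: List[Tuple[str, int]] = [
    ("subl $12, %esp", -12),
    ("call 1f", -4),       # pushes the address of the next instruction
    ("1: popl %ebx", +4),  # %ebx = address of this pop (PIC base)
    ("addl $12, %esp", +12),
    ("ret", 0),
]


def actual_cfa_offsets(insns: List[Tuple[str, int]], entry_offset: int = 4) -> List[int]:
    """CFA - %esp before each instruction executes (4 = just the return address)."""
    offsets, off = [], entry_offset
    for _, sp_delta in insns:
        offsets.append(off)
        off -= sp_delta  # %esp moving down by n bytes puts it n bytes further from the CFA
    return offsets


def described_cfa_offsets(n: int, rows: Dict[int, int], entry_offset: int = 4) -> List[int]:
    """CFA offsets as a CFI table describes them: rows only where the compiler emitted them."""
    offsets, off = [], entry_offset
    for i in range(n):
        off = rows.get(i, off)
        offsets.append(off)
    return offsets


actual = actual_cfa_offsets(FUNC)
described = described_cfa_offsets(len(FUNC), rows={1: 16, 4: 4})
mismatches = [i for i in range(len(FUNC)) if actual[i] != described[i]]
```

Here `mismatches` is `[2]`: a debugger stopped on the pop would compute the CFA 4 bytes too low and read the wrong return address.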

It sounds like a minor thing -- but if the goal is "accurate backtraces at every instruction boundary" (which is a useful goal for a debugger), it needs to be handled. There are other cases where unwinding can be tricky but if a function uses a frame pointer register most of the complicated stuff goes away.

On Mac OS X we've stopped emitting eh_frame in almost all cases. For exception handling we have a home-grown scheme called compact unwind info that Nick came up with. It has a number of benefits over eh_frame: one is that you can index into it without scanning the entire section to find a function; another is that each function's unwind instructions are described by a single 32-bit value, which encodes how to restore registers off the stack and how to unwind out of the function. For typical compiler-generated code, compact unwind can fully represent the unwind details needed for exception handling. Obviously it doesn't describe things like the prologue or epilogue -- it is focused 100% on the "synchronous unwind" problem, i.e. actual exception handling.
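The indexing benefit can be illustrated with a flat table sorted by function start address, holding one 32-bit value per function and searched by binary search instead of a linear scan. This is a hypothetical simplification: the class name is made up, the encoding is treated as opaque, and the real __unwind_info section format and compact-encoding bit layout are not modeled.

```python
import bisect
from typing import List, Optional, Tuple


class FlatUnwindIndex:
    """Hypothetical sorted (function_start, 32-bit encoding) table."""

    def __init__(self, entries: List[Tuple[int, int]]) -> None:
        entries = sorted(entries)
        self._starts = [start for start, _ in entries]
        self._encodings = [enc & 0xFFFFFFFF for _, enc in entries]

    def lookup(self, pc: int) -> Optional[int]:
        # Binary search: the function containing pc is the last one starting at or before it.
        i = bisect.bisect_right(self._starts, pc) - 1
        return self._encodings[i] if i >= 0 else None
```

Lookup cost is O(log n) in the number of functions, whereas finding an FDE in an unindexed eh_frame means walking the section.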

lldb doesn't currently read compact unwind (we're relying on the assembly inspection parser full-time right now), but adding a parser is one of my free-weekend TODOs. The only reason I mention it is that no one seems to care about the size of eh_frame / debug_frame these days... you'd think the size of eh_frame would be worth worrying about, given that it's paged in as binaries execute, but that doesn't seem to be the case.

J

So adding “return false” to the top of TryFallbackUnwindPlan() fixes the problem.
The call at UnwindLLDB:177 (when !reg_ctx_sp->IsValid()) seems to be the only one I’m hitting.

Yeah, I was afraid of that.

What I'm trying to do with this code is say "unwind using your super-super smart techniques ... but if you hit a wall, try the simplistic unwind method and see if you can get further."

The problem here is that lldb is doing the full stack walk as far as it can be walked ... but it thinks maybe switching to the architecture default unwind plan might get it further (which it does not). The switch to the arch default unwind plan is destructive - it replaces the assembly profile unwind instructions for that function - and is remembered for future stack walks. That's why your threads get progressively shorter backtraces.
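A toy model (not lldb's actual code) of how a destructive, cached fallback can produce the 5, 4, 3, 2, 1, 1, ... pattern reported earlier. The per-function plan map is shared by all threads. The rule "if the fallback is already in use and still fails, drop that frame and try the fallback one frame lower" is an assumption chosen to match the suspicion that one frame is stripped per thread; `use_fallback=False` loosely corresponds to the "return false" experiment.

```python
from typing import Dict, List

ASSEMBLY, ARCH_DEFAULT = "assembly", "arch-default"


def walk(chain: List[str], plans: Dict[str, str], use_fallback: bool) -> int:
    """Toy stack walk for one thread; returns the number of frames found.

    chain[0] is frame 0. `plans` maps function -> unwind plan and is shared by
    every thread. Toy assumptions: the assembly plan always finds the caller,
    the arch default never does, and the outermost function has no caller.
    """
    n = 1  # frame 0 always exists
    while True:
        top = n - 1  # outermost frame found so far
        if plans[chain[top]] == ASSEMBLY and n < len(chain):
            n += 1
            continue
        if not use_fallback:
            return n
        if plans[chain[top]] == ARCH_DEFAULT and top > 0:
            # The fallback is already in use and still fails: treat frame `top`
            # as bogus, drop it, and try the fallback one frame lower.
            n -= 1
            top -= 1
        if plans[chain[top]] == ASSEMBLY:
            # Destructive: the replacement is kept for all future walks, even
            # though (in this model) it does not get any further.
            plans[chain[top]] = ARCH_DEFAULT
        return n


def backtrace_lengths(num_threads: int, chain: List[str], use_fallback: bool) -> List[int]:
    plans = {fn: ASSEMBLY for fn in chain}  # shared per-function plan cache
    return [walk(chain, plans, use_fallback) for _ in range(num_threads)]
```

With five hypothetical functions `f0`..`f4` and ten threads, the destructive version yields `[5, 4, 3, 2, 1, 1, 1, 1, 1, 1]`, while disabling the fallback gives every thread all five frames. A non-destructive fix would only commit the replacement when it actually recovers another frame.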

I'll need to look into this and come up with a fix. I don't suppose your go binary runs on mac os x, does it? It would be great if I had a failing test program in front of me while I try to come up with a fix.

Yes, I’m using OS X. You could try the binary I uploaded to http://llvm.org/bugs/show_bug.cgi?id=21118
That will only have one go thread though.

That one doesn't seem to repro the problem. The unwinder always comes back with

(lldb) bt
* thread #1: tid = 0x4ff734, 0x000000000000201a test`main.foo(x=1) + 26 at test.go:4, stop reason = breakpoint 1.1
  * #0: 0x000000000000201a test`main.foo(x=1) + 26 at test.go:4
    #1: 0x0000000000002111 test`main.main + 49 at test.go:15
    #2: 0x000000000000d463 test`runtime.main + 243 at proc.go:63
    #3: 0x00000000000259f0 test`runtime.gosched_m + 192 at proc.c:1641
(lldb)

I can si and backtrace again and get the same backtrace -- lldb sees that the saved pc for frame 4 would be in non-executable memory and stops the stack walk:

    th1/fr4 pc = 0x00000002080b7f98
   th1/fr3 supplying caller's saved reg 6's location using x86_64 default unwind plan UnwindPlan
   th1/fr3 supplying caller's register 6 from the stack, saved at CFA plus offset -16
    th1/fr4 fp = 0x0000000000000000
   th1/fr3 supplying caller's stack pointer (7) value, computed from CFA
    th1/fr4 sp = 0x00000002080c0010
    th1/fr4 using architectural default unwind method
    th1/fr4 pc is in a non-executable section of memory and this isn't the 2nd frame in the stack walk.
    Frame 4 invalid RegisterContext for this frame, stopping stack walk
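A sketch of the sanity check that log message describes, with a hypothetical function name and a made-up executable range: a recovered pc outside every executable section ends the walk, except for the first two frames, mirroring the "this isn't the 2nd frame in the stack walk" wording. Why lldb exempts the second frame is not modeled here.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int]  # [start, end) of an executable section


def caller_pc_is_plausible(pc: int, frame_index: int,
                           exec_ranges: Sequence[Range]) -> bool:
    """Sanity check applied to a recovered pc before accepting frame `frame_index`."""
    if frame_index <= 1:
        # Frame 0's pc is where the thread stopped; frame 1 is exempt, mirroring
        # the "this isn't the 2nd frame in the stack walk" wording in the log.
        return True
    return any(lo <= pc < hi for lo, hi in exec_ranges)
```

With a text range of `[(0x1000, 0x40000)]`, frame 4's pc of 0x2080b7f98 from the log fails the check, while a pc like 0x2111 from the backtrace above passes.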

I needed the patch you appended to http://llvm.org/bugs/show_bug.cgi?id=21118 / http://reviews.llvm.org/D5735 to run the program. Do you have llvm commit access? I'll commit the patch if you don't.

J