Stack trace from within syscall on x86 Linux

Hi Jason,

I’m running into a problem where LLDB is unable to print a stack trace while the inferior is invoking a syscall (__kernel_vsyscall() inside getline()). Siva told me that you are knowledgeable about this topic. Could you please give me some pointers on how I can implement this on x86 Linux?

Thanks,
Chaoren

After some more debugging, I no longer think it’s related to the syscall. The subroutine that invokes the syscall has a strange prologue (see below), which results in the return address being stored at ebp+12 instead of ebp+4, and ebp+4 seems to be what LLDB uses to determine whether a frame is valid. Is there some way we could avoid relying on the return address being at ebp+4? Maybe cache esp after every call instruction? Or keep looking up the stack for a valid return address? Or should we call this intended behavior and use something other than fgets while waiting to be attached in TestHelloWorld?

libc.so.6`__read:

→ 0xf7d9cbec <+28>: calll *%gs:0x10

→ 0xf7fdb420: pushl %ecx
0xf7fdb421: pushl %edx
0xf7fdb422: pushl %ebp
0xf7fdb423: movl %esp, %ebp
0xf7fdb425: sysenter
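
To make the ebp+12 observation concrete, here is a small toy model of the stack after the prologue above. It is only an illustrative sketch, not LLDB code: the stack is a Python dict, the starting esp value (0x1000) is made up, and the stored values are just labels.

```python
# Toy model of a 32-bit stack: maps an address to a label for the value stored there.
def push(stack, esp, value):
    """Decrement esp by one 32-bit word and store value at the new top of stack."""
    esp -= 4
    stack[esp] = value
    return esp

def vsyscall_prologue(esp_before_call=0x1000):  # 0x1000 is an arbitrary, made-up esp
    stack = {}
    esp = push(stack, esp_before_call, "return address")  # calll *%gs:0x10
    esp = push(stack, esp, "saved ecx")                   # pushl %ecx
    esp = push(stack, esp, "saved edx")                   # pushl %edx
    esp = push(stack, esp, "saved ebp")                   # pushl %ebp
    ebp = esp                                             # movl %esp, %ebp
    return stack, ebp

stack, ebp = vsyscall_prologue()
for offset in (0, 4, 8, 12):
    print(f"ebp+{offset}: {stack[ebp + offset]}")
```

In this model ebp+4 holds the saved edx and the return address only shows up at ebp+12, which is why a walker that expects the return address at ebp+4 goes wrong here.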

Hi guys, sorry for not seeing this thread yesterday.

lldb should be able to handle this function, if and only if it knows the start address for the function/symbol. The assembly instruction profiler can do the right thing with an instruction sequence like this.

The first thing to do is

(lldb) image show-unwind -a $pc

when you're stopped in the function, or

(lldb) image show-unwind -n __read

This will dump the various UnwindPlans that lldb can use for this function.

To see how the unwinder is actually walking the stack, before you step into __read, do

(lldb) log enable lldb unwind

then go into __read and try to backtrace a couple of frames.

The output may not be easy to read - but send it along and I can interpret.

J

The problem is not the __read function, but a subroutine that the __read function calls. GDB shows it as __kernel_vsyscall, but LLDB doesn’t have a name for it; I don’t think LLDB even considers it a function.

“image show-unwind -a $pc” in the subroutine shows no unwind data.

================================================================================
-> 0xf7fdb420: pushl %ecx
    0xf7fdb421: pushl %edx
    0xf7fdb422: pushl %ebp
    0xf7fdb423: movl %esp, %ebp
(lldb) image show-unwind -a $pc
error: no unwind data found that matches '$pc'.

That's going to be a problem. lldb doesn't have a start address for this __kernel_vsyscall function. It will fall back to using an "architecture default unwind plan" which is what it uses when it doesn't know anything about the function it's stopped in. This assumes that the current function sets up the ebp frame pointer register and that the caller's saved eip and ebp values can be found on the stack right off the current ebp value.

If gdb can get the symbol name for this, we need to figure out how it's doing that and do that. If we knew the start address of this function, lldb would have no problems profiling the assembly instructions.
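
As a rough illustration of what profiling the assembly instructions means for this prologue, the sketch below tracks how the CFA rule changes after each instruction, starting from a known function start. It is a simplified stand-in written for this thread, not lldb's actual profiler: profile_prologue is a hypothetical helper, and it only understands pushl and movl %esp, %ebp.

```python
def profile_prologue(instructions):
    """Return one (cfa_register, cfa_offset, saved_regs) row per instruction boundary.

    saved_regs maps a register name to its offset from the CFA.
    """
    depth = 4                        # bytes between esp and the CFA; the call pushed eip
    cfa_reg, cfa_off = "esp", depth  # at the first instruction: CFA=esp+4
    saved = {"eip": -4}              # eip=[CFA-4]
    rows = [(cfa_reg, cfa_off, dict(saved))]
    for insn in instructions:
        op, *operands = insn.replace(",", " ").split()
        if op == "pushl":
            depth += 4
            saved.setdefault(operands[0].lstrip("%"), -depth)
            if cfa_reg == "esp":     # esp moved, so the esp-relative rule must follow it
                cfa_off = depth
        elif op == "movl" and operands == ["%esp", "%ebp"]:
            cfa_reg, cfa_off = "ebp", depth  # from here on the CFA can be found via ebp
        rows.append((cfa_reg, cfa_off, dict(saved)))
    return rows

rows = profile_prologue(["pushl %ecx", "pushl %edx", "pushl %ebp", "movl %esp, %ebp"])
for reg, off, saved in rows:
    print(f"CFA={reg}+{off}", saved)
```

The last row comes out as CFA=ebp+16 with eip=[CFA-4], i.e. the return address at ebp+12. None of this is possible without the start address, because the profiler has to know which instruction is the first one.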

(lldb) si
th1/fr0 supplying caller's saved eip (8)'s location using assembly insn profiling UnwindPlan

We had an assembly language profile unwind plan for stack frame 0. (I think this log is unrelated to the __kernel_vsyscall method? I'm not sure what you're showing here.)

th1/fr0 supplying caller's register eip (8) from the stack, saved at CFA plus offset -4 [saved at 0xffffda8c]
th1/fr1 pc = 0xf7d31c73
th1/fr0 supplying caller's register ebp (6) from the live RegisterContext at frame 0
th1/fr1 fp = 0xf7cc1940

stack frame #1 has an eip of 0xf7d31c73 and an ebp of 0xf7cc1940.

th1/fr0 supplying caller's saved esp (7)'s location using assembly insn profiling UnwindPlan
th1/fr0 supplying caller's register esp (7), value is CFA plus offset 0 [value is 0xffffda90]
th1/fr1 sp = 0xffffda90

stack frame #1 has an esp of 0xffffda90.

th1/fr1 with pc value of 0xf7d31c73, symbol name is '_IO_file_underflow'

stack frame #1 is _IO_file_underflow.

th1/fr1 active row: 0x00000000f7d31b67: CFA=esp+48 => ebx=[CFA-20] ebp=[CFA-8] esi=[CFA-16] edi=[CFA-12] eip=[CFA-4]

This shows the unwind row for stack frame #1, _IO_file_underflow, i.e. how we will retrieve the registers for stack frame #2. "CFA" is the "Canonical Frame Address", a stack address which doesn't change for the entire lifetime of a function; on x86 it is the value of the stack pointer before the CALL instruction.

This says that the CFA is the value of esp + 48, and that we can find the saved ebx at CFA-20, or esp+28, and so on.
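
The arithmetic for that row can be spelled out as follows. This is just a worked example, not lldb code: resolve_row is a hypothetical helper, and the esp value is the one the log reports for frame 1 (0xffffda90).

```python
def resolve_row(cfa_base_value, cfa_offset, saved_offsets):
    """Compute the CFA and the stack address where each register is saved."""
    cfa = cfa_base_value + cfa_offset
    return cfa, {reg: cfa + off for reg, off in saved_offsets.items()}

esp = 0xffffda90  # frame 1's esp, taken from the log above
row = {"ebx": -20, "ebp": -8, "esi": -16, "edi": -12, "eip": -4}
cfa, locations = resolve_row(esp, 48, row)

print(f"CFA = {cfa:#x}")
for reg, addr in locations.items():
    print(f"{reg} saved at {addr:#x} (esp+{addr - esp})")
```

This gives a CFA of 0xffffdac0, with the saved ebx at 0xffffdaac (esp+28) and the saved eip at 0xffffdabc (esp+44).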

th1/fr0 supplying caller's saved esp (7)'s location, cached
th1/fr1 CFA is 0xffffdac0: Register esp (7) contents are 0xffffda90, offset is 48
th1/fr1 m_cfa = 0xffffdac0
th1/fr1 initialized frame current pc is 0xf7d31c73 cfa is 0xffffdac0

th1/fr0 supplying caller's saved eip (8)'s location, cached
th1/fr0 using architectural default unwind method
th1/fr0 with pc value of 0xf7fdb420, no symbol/function name is known.
th1/fr0 0x00000000f7fdb420: CFA=ebp +8 => esp=CFA+0 ebp=[CFA-8] eip=[CFA-4]

Now we're somewhere interesting - I think the process stopped anew here. Now we're in the middle of code where we have no symbol, no start address, so the only thing we can do is try to use the architectural default unwind plan and blindly walk the stack.

th1/fr0 CFA is 0xf7cc1948: Register ebp (6) contents are 0xf7cc1940, offset is 8
th1/fr0 initialized frame current pc is 0xf7fdb420 cfa is 0xf7cc1948 using i386 default unwind plan UnwindPlan
Process 22545 stopped
* thread #1: tid = 22545, 0xf7fdb420, name = 'getline32', stop reason = instruction step into
    frame #0: 0xf7fdb420
-> 0xf7fdb420: pushl %ecx
    0xf7fdb421: pushl %edx
    0xf7fdb422: pushl %ebp
    0xf7fdb423: movl %esp, %ebp

(lldb) bt
th1/fr0 supplying caller's saved eip (8)'s location using i386 default unwind plan UnwindPlan

Yeah, using the architectural default unwind plan.

th1/fr0 supplying caller's register eip (8) from the stack, saved at CFA plus offset -4 [saved at 0xf7cc1944]
th1/fr1 pc = 0xf7cc1e08

We get a saved pc value of 0xf7cc1e08 for stack frame #1.

th1/fr0 supplying caller's saved ebp (6)'s location using i386 default unwind plan UnwindPlan
th1/fr0 supplying caller's register ebp (6) from the stack, saved at CFA plus offset -8 [saved at 0xf7cc1940]
th1/fr1 fp = 0xf7cc1940
th1/fr0 supplying caller's saved esp (7)'s location using i386 default unwind plan UnwindPlan
th1/fr0 supplying caller's register esp (7), value is CFA plus offset 0 [value is 0xf7cc1948]
th1/fr1 sp = 0xf7cc1948
th1/fr1 using architectural default unwind method

th1/fr1 had a pc of 0xf7cc1e08 which is not in executable memory but on frame 1 -- allowing it once.

It looks like 0xf7cc1e08 is bogus. But sometimes we do get one bogus pc address on a stack, so we'll try to ignore that.

th1/fr0 supplying caller's saved ebp (6)'s location, cached
th1/fr1 CFA is 0xf7cc1948: Register ebp (6) contents are 0xf7cc1940, offset is 8
th1/fr1 same CFA address as next frame, assuming the unwind is looping - stopping
Frame 1 invalid RegisterContext for this frame, stopping stack walk

It noticed that the frame above stack frame #1 would have the same CFA, which isn't possible, and realized it was off track.
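
The failure can be replayed with a much-simplified model of the architectural default plan (CFA=ebp+8, eip=[CFA-4], ebp=[CFA-8]) plus the "same CFA" check. This is a sketch under assumptions, not lldb's unwinder: default_unwind and walk are hypothetical helpers, and the two memory words are only what the log implies (the word at 0xf7cc1944 reads as 0xf7cc1e08, and the word at 0xf7cc1940 reads back as 0xf7cc1940).

```python
def default_unwind(ebp, read_word):
    """Apply the default i386 rule: CFA=ebp+8, eip=[CFA-4], ebp=[CFA-8]."""
    cfa = ebp + 8
    return cfa, read_word(cfa - 4), read_word(cfa - 8)

def walk(ebp, read_word, max_frames=16):
    """Blindly follow the ebp chain, stopping when the CFA stops changing."""
    frames = []
    prev_cfa = None
    for _ in range(max_frames):
        cfa, caller_pc, caller_ebp = default_unwind(ebp, read_word)
        if cfa == prev_cfa:  # same CFA as the previous frame: the unwind is looping
            break
        frames.append((cfa, caller_pc))
        prev_cfa, ebp = cfa, caller_ebp
    return frames

# Memory contents inferred from the log; everything else is left out of the model.
memory = {0xf7cc1944: 0xf7cc1e08, 0xf7cc1940: 0xf7cc1940}
frames = walk(0xf7cc1940, memory.__getitem__)
for cfa, pc in frames:
    print(f"cfa={cfa:#x} caller pc={pc:#x}")
```

Because ebp has not been set up by the unknown function yet, the walk produces one frame with the bogus pc 0xf7cc1e08 and then stops as soon as the next CFA comes out as 0xf7cc1948 again.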

We need to modify ObjectFileELF::ParseSymtab() to create synthetic symbols for these things. You will need to dig around in the ELF spec and figure out how to reconstruct symbols for them. If you do this, then everything should work. ObjectFileMachO pulls all sorts of tricks to make sure that we create functions for everything the object file knows about that should be considered a block of code, and ObjectFileELF should do the same.

Greg

I found this bug from a year ago (https://llvm.org/bugs/show_bug.cgi?id=17384) that explains everything.

Greg, ObjectFileELF::CreateMemoryInstance is no longer just a stub. Do you know if linux-gate.so is read but not parsed, or still not read at all?