Adding support for FreeBSD kernel coredumps (and live memory lookup)

Hi,

I'm working on a FreeBSD-sponsored project aiming at improving LLDB's
support for debugging the FreeBSD kernel, to achieve feature parity with
KGDB. As a part of that, I'd like to improve LLDB's ability to work
with kernel coredumps ("vmcores"), plus add the ability to read kernel
memory via the special character device /dev/mem.

The FreeBSD kernel supports two coredump formats that are of interest to
us:

1. The (older) "full memory" coredumps that use an ELF container.

2. The (newer) minidumps that dump only the active memory and use
a custom format.

At this point, LLDB recognizes the ELF files but doesn't handle them
correctly, and outright rejects the FreeBSD minidump format. In both
cases some additional logic is required. This is because kernel
coredumps contain the physical contents of memory, and for the user's
convenience the debugger needs to be able to read the memory maps out of
that physical memory and use them to translate virtual addresses into
physical addresses.
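
To illustrate, the two formats can be told apart just by looking at the
first few bytes of the vmcore: full dumps start with the usual ELF
magic, while minidumps start with a textual magic string (MINIDUMP_MAGIC
from sys/<arch>/include/minidump.h, e.g. "minidump FreeBSD amd64" on
amd64). A rough sketch of such a probe, purely for illustration, not
code we're proposing to add as-is:

#include <cstring>
#include <fstream>
#include <string>

enum class VmcoreKind { FullElf, Minidump, Unknown };

// Rough format probe; assumes the vmcore extracted by savecore(8)
// starts either with the ELF magic or with the minidump header.
VmcoreKind ProbeVmcore(const std::string &path) {
  char buf[32] = {};
  std::ifstream f(path, std::ios::binary);
  if (!f.read(buf, sizeof(buf)))
    return VmcoreKind::Unknown;
  if (std::memcmp(buf, "\x7f" "ELF", 4) == 0)
    return VmcoreKind::FullElf;
  // Other architectures use analogous magic strings; a real
  // implementation would check all of them, not just amd64.
  if (std::memcmp(buf, "minidump FreeBSD amd64", 22) == 0)
    return VmcoreKind::Minidump;
  return VmcoreKind::Unknown;
}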

Unless I'm mistaken, the rationale for dumping raw physical memory is
that coredumps are -- after all -- usually created when something goes
wrong with the kernel. In that case, we want the process of dumping core
to be as simple as possible, and coredumps need to be small enough to
fit in the swap space (which is where they're usually written).
The complexity of memory translation then naturally falls to the
userspace processes used to debug them.

FreeBSD (following Solaris and the other BSDs) provides a helper
library, libkvm, that can be used by userspace programs to access both
coredumps and the running kernel's memory. Additionally, we have split
out the routines related to coredumps and made them portable to other
operating systems via libfbsdvmcore [1]. We have also included a program
that can convert a minidump into a debugger-compatible ELF core file.
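
To give a flavour of what that library interface looks like (purely as
an illustration -- the exact calls the plugin would end up using are
part of what we'd like to discuss), reading a kernel variable out of a
vmcore with libkvm boils down to something like the snippet below.
Passing NULL instead of a corefile path makes the same code read from
the running kernel instead:

#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <cstdio>

int main(int argc, char **argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <kernel> [<vmcore>]\n", argv[0]);
    return 1;
  }
  char errbuf[_POSIX2_LINE_MAX];
  // argv[1] = kernel image with symbols, argv[2] = vmcore path
  // (omit it to read the running kernel instead).
  kvm_t *kd = kvm_openfiles(argv[1], argc > 2 ? argv[2] : NULL, NULL,
                            O_RDONLY, errbuf);
  if (kd == NULL) {
    fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
    return 1;
  }

  // Resolve a kernel symbol ("hz" is just a convenient example) ...
  struct nlist nl[] = {{"hz"}, {NULL}};
  if (kvm_nlist(kd, nl) != 0 || nl[0].n_value == 0) {
    fprintf(stderr, "symbol lookup failed\n");
    kvm_close(kd);
    return 1;
  }

  // ... and read its value; libkvm takes care of translating the
  // kernel virtual address for both dump formats.
  int hz = 0;
  if (kvm_read(kd, nl[0].n_value, &hz, sizeof(hz)) != (ssize_t)sizeof(hz)) {
    fprintf(stderr, "kvm_read: %s\n", kvm_geterr(kd));
    kvm_close(kd);
    return 1;
  }
  printf("hz = %d\n", hz);
  kvm_close(kd);
  return 0;
}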

We'd like to discuss the possible approaches to integrating this
additional functionality into LLDB. At this point, our goal is to make
it possible for LLDB to correctly read memory from both coredumps and
the live system.

Plan A: new FreeBSDKernel plugin

Can you give an example workflow of how these core files are used by a
developer? For some background.

Most of my experience is in userspace, the corefile is "offline" debug
and then you have "live" debug of the running process. Is that the
same here or do we have a mix since you can access some of the live
memory after the core has been dumped?

I'm wondering if a FreeBSD Kernel plugin would support these corefiles
and/or live debug, or if they are just two halves of the same
solution. Basically, would you end up with a FreeBSDKernelCoreDump and
a FreeBSDKernelLive plugin?

> Can you give an example workflow of how these core files are used by a
> developer? For some background.

Right now, the idea is that when the kernel crashes, the developer can
take the vmcore file and use LLDB to look up the kernel state.
Initially, this means reading the "raw" memory, i.e. looking up basic
symbol values, but eventually (like kgdb) we'd like to add basic support
for looking up kernel thread states.

> Most of my experience is in userspace, the corefile is "offline" debug
> and then you have "live" debug of the running process. Is that the
> same here or do we have a mix since you can access some of the live
> memory after the core has been dumped?

It's roughly the same, i.e. you either use a crash dump (i.e. saved
kernel state) or you use /dev/mem to read memory from the running
kernel.

> I'm wondering if a FreeBSD Kernel plugin would support these corefiles
> and/or live debug, or if they are just two halves of the same
> solution. Basically, would you end up with a FreeBSDKernelCoreDump and
> a FreeBSDKernelLive plugin?

I think one plugin is the correct approach here. Firstly, because the
interface for reading memory is abstracted out into a single library,
and the API is the same for both cases. Secondly, because the actual
interpretation logic would also be shared.

> Right now, the idea is that when the kernel crashes, the developer can
> take the vmcore file and use LLDB to look up the kernel state.

Thanks for the explanation. (FWIW your first email is clear now that I
read it properly but this still helped me :))

> 2) How to integrate "live kernel" support into the current user
> interface? I don't think we should make major UI modifications to
> support this specific case but I'd also like to avoid gross hacks.

Do you think it will always be one or the other, corefile or live
memory? I assume you wouldn't want to fall back to live memory, because
that memory might not have been in use at the time of the core dump.
But I'm thinking of debuggers that use the ELF file as a quicker way to
read memory. Not sure if lldb does this already, but you could steal
some ideas from there if so.

Using /dev/mem as the path seems fine, unless you need some combination
of that and a corefile. Is the /dev/mem format identical to the corefile
format? (Probably not an issue anyway, because the plugin is what will
decide how to use it.)

Your plans B and C seem like they would enable the initial use case but
have limited scope for improvements. The gdb-remote wrapper, for
example, would work fine, but would you hit issues where the current
FreeBSD plugin makes userspace assumptions? For example, the AArch64
Linux plugin assumes that addresses will be in certain ranges, so if you
connected it to an in-kernel stub you'd probably get some surprises.

So I agree a new plugin would make the most sense. The only reason I'd
be against it is if it added significant maintenance or build issues,
but I'm not aware of any (beyond checking for some libraries, and plenty
of bits of llvm do that). And it'll be able to give the best experience.

Do you have a plan to test this if it is an in-tree plugin? Will the
corefiles take up a lot of space, or would you be able to craft minimal
files just for testing?

> 1. The (older) "full memory" coredumps that use an ELF container.
>
> 2. The (newer) minidumps that dump only the active memory and use
> a custom format.

Maybe a silly question: are the "minidumps" here the same sort of
minidump that lldb already supports, or does "mini" just mean small
and/or sparse relative to the ELF container core files?

I see that the minidump tests use yaml2obj to make their files, but if
you end up only needing one file and it would need changes to yaml2obj,
it's probably not worth pursuing.

Having a new plugin for opening these kinds of core files seems reasonable to me. The extra dependency is unfortunate, but I guess that's what we have to work with.

Since this is still an ELF file, you might still be able to use yaml2obj to create realistic-looking core files (its support for program headers is pretty good these days).

The live kernel debugging sounds... scary. Can you explain how this would actually work? Like, what would be the supported operations? I presume you won't be able to actually "stop" the kernel, but what will you actually be able to do?

Without more details it's hard for me to say whether this should be a separate plugin (or even a server, since that's what we tend to use for live debugging).

pl

Yes, it is scary. No, the system doesn't stop -- it's just a racy way
to read and write kernel memory. I don't think it's used often but I've
been told that sometimes it can be very helpful in debugging annoying
non-crash bugs, especially if they're hard to reproduce.
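
To make the "racy" part concrete (this is just an illustrative sketch,
not how the LLDB integration would be structured): with libkvm the live
case is simply an open of the running kernel instead of a corefile, and
reads and writes go straight to kernel memory without stopping anything,
so the value can change under you between any two calls.

#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <nlist.h>
#include <cstdio>

int main() {
  char errbuf[_POSIX2_LINE_MAX];
  // NULL kernel and corefile paths mean "the currently running system";
  // this needs root, and O_RDWR to be able to write.
  kvm_t *kd = kvm_openfiles(NULL, NULL, NULL, O_RDWR, errbuf);
  if (kd == NULL) {
    fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
    return 1;
  }

  // "bootverbose" is just a harmless example symbol.
  struct nlist nl[] = {{"bootverbose"}, {NULL}};
  if (kvm_nlist(kd, nl) == 0 && nl[0].n_value != 0) {
    int val = 0;
    // Nothing stops the kernel between these calls -- the value can
    // change (or be acted upon) at any point, hence "racy".
    if (kvm_read(kd, nl[0].n_value, &val, sizeof(val)) ==
        (ssize_t)sizeof(val)) {
      val = 1;
      kvm_write(kd, nl[0].n_value, &val, sizeof(val));
    }
  }
  kvm_close(kd);
  return 0;
}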

Interesting.

So how would this be represented in lldb? Would there be any threads, registers? Just a process with a bunch of modules?

pl

Using GDB (kgdb) as an example, it lists a thread for every
kernel/userspace thread:
...
  593 Thread 100691 (PID=20798: sleep)
sched_switch (td=0xfffffe0118579100, flags=<optimized out>)
    at /usr/home/emaste/src/freebsd-git/laptop/sys/kern/sched_ule.c:2147
...

and it can fetch per-thread register state:

(kgdb) thread 593
[Switching to thread 593 (Thread 100691)]
#0 sched_switch (td=0xfffffe0118579100, flags=<optimized out>) at
/usr/home/emaste/src/freebsd-git/laptop/sys/kern/sched_ule.c:2147
2147 cpuid = td->td_oncpu = PCPU_GET(cpuid);
(kgdb) info reg
rax <unavailable>
rbx 0x882c545e 2284606558
rcx <unavailable>
rdx <unavailable>
rsi <unavailable>
rdi <unavailable>
rbp 0xfffffe01172617d0 0xfffffe01172617d0
rsp 0xfffffe0117261708 0xfffffe0117261708
....

(kgdb) bt
#0 sched_switch (td=0xfffffe0118579100, flags=<optimized out>) at
/usr/home/emaste/src/freebsd-git/laptop/sys/kern/sched_ule.c:2147
#1 0xffffffff80ba4261 in mi_switch (flags=flags@entry=260) at
/usr/home/emaste/src/freebsd-git/laptop/sys/kern/kern_synch.c:542
#2 0xffffffff80bf428e in sleepq_switch
(wchan=wchan@entry=0xffffffff81c8db21 <nanowait+1>, pri=pri@entry=108)
    at /usr/home/emaste/src/freebsd-git/laptop/sys/kern/subr_sleepqueue.c:608
...
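
For what it's worth, the process and thread information kgdb shows above
comes out of the same kernel data structures that libkvm can already
walk, which is presumably what an LLDB plugin would build its process
and thread lists from. A rough sketch that just lists the processes
recorded in a vmcore (per-thread enumeration and per-thread register
state are further steps on top of this):

#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#include <cstdio>

int main(int argc, char **argv) {
  if (argc < 3) {
    fprintf(stderr, "usage: %s <kernel> <vmcore>\n", argv[0]);
    return 1;
  }
  char errbuf[_POSIX2_LINE_MAX];
  kvm_t *kd = kvm_openfiles(argv[1], argv[2], NULL, O_RDONLY, errbuf);
  if (kd == NULL) {
    fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
    return 1;
  }

  // One kinfo_proc entry per process found in the dump.
  int cnt = 0;
  struct kinfo_proc *kp = kvm_getprocs(kd, KERN_PROC_ALL, 0, &cnt);
  for (int i = 0; kp != NULL && i < cnt; i++)
    printf("pid %5d  %s\n", (int)kp[i].ki_pid, kp[i].ki_comm);

  kvm_close(kd);
  return 0;
}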

I am fine with a new plug-in to handle this, but I want to verify a few things first:

Can this core dump file format basically allow debugging of multiple targets? For example, could you want to examine the kernel itself as is, but also provide a view into any of the user space processes that exist? Mach-O kernel dumps can currently do this, but I am not sure how much of that code is public. The idea was that you connect to the kernel dump, but you can create new targets that represent each user space process as its own target within LLDB. The Apple tool would vend a new GDB remote protocol connection for each user space process; all memory reads asked of that per-process connection would be translated correctly using the TLB entries in the kernel, giving the user a user space view of the process.

So the idea is to connect to the kernel core file and have the target that represents the kernel display only the things that belong to the kernel, including all data structures and kernel threads. Then have a way to list all of the user space processes so that each user space process can be debugged by a separate target in LLDB.

The natural place to do this would be with a new lldb_private::Platform, or by extending the existing PlatformFreeBSD. If you did a "platform select remote-freebsd", followed by a "platform connect --kernel-core-file /path/to/kernel/core.file", then the platform could be asked to list all available processes: one for the kernel itself, and one for each user space process that can have a target created for it. Then you could "process attach --pid <pid>" to attach to the kernel (we would need to make up a process ID for the kernel, and use the native process IDs for the user space processes). The new core file plug-in would then create a ProcessFreeBSDKernelCore instance that knows how to correctly answer all of the process questions for the targeted process.
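
To make that workflow concrete, a session under this proposal might look roughly like the transcript below. To be clear, none of this exists today: the --kernel-core-file option, the made-up PID for the kernel, and ProcessFreeBSDKernelCore are all hypothetical names from the proposal above, and the process list is invented for illustration.

(lldb) platform select remote-freebsd
(lldb) platform connect --kernel-core-file /var/crash/vmcore.0
(lldb) platform process list
PID    NAME
1      kernel           <- synthetic PID made up for the kernel itself
20798  sleep
...
(lldb) process attach --pid 1
(lldb) thread list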