[RFC] OS Awareness in LLDB

Hi lldb-dev,

I’m a senior student at Saint Petersburg State University. One of my possible diploma topics is “OS Awareness in LLDB”. Generally, OS awareness extends a debugger to provide a representation of the OS threads - or tasks - and other relevant data structures, typically semaphores, mutexes, or queues.

I want to ask the community whether OS awareness is interesting for LLDB users and developers. The main goal is to create a base on top of LLDB that can be extended to support awareness for different operating systems.

Also, if you have a good article or other useful information about OS awareness, please share it with me.

Thanks in advance!

Hi Alexander, are you interested in user-mode, kernel-mode debugging or both?

For reference, the current state of the art in OS-aware debugging is Debugging Tools for Windows (WinDbg & co.). This is not surprising since the tools were developed alongside Windows. Obviously they are specific to Windows, but they are a good example of what OS awareness might look like.

Hi Leonard,

I think it will be kernel-mode debugging, since debugging an application in user mode is not OS awareness, imo. Of course, some of the kernel’s modules might run in user mode, but I think that will be OK.

Thanks for your reference, I’ll take a look at it.

Also, I found out that ARM supports OS awareness in their DS-5 debugger. They have a mechanism for adding new operating systems: all you need to do is describe the OS’s model (the thread or task structure, for example). I think that is how it might be done in LLDB.

I don’t totally agree with this. I think there are a lot of useful OS awareness tasks in user mode. For example, you’re debugging a deadlock and want to understand the state of other mutexes, who owns them, etc., or you want to examine open file descriptors. In the case of heap corruption you may wish to study the internal structures of your process’s heap, or even lower level, the OS virtual memory page table structures.

There’s quite a lot you can still do in user mode, but there is definitely more in kernel mode. As Leonard said, try out WinDbg: a lot of this stuff already exists there, so it’s a good reference.

It looks like I don’t completely understand the difference between user mode and kernel mode from the debugger’s point of view. Could you please explain it to me?

Conceptually these are different levels of abstraction: a user-mode debugger handles processes and threads as first-class concepts. In kernel mode (or kernel land), these are just data structures that the code (the kernel) is managing. From a more pragmatic perspective, the difference is in where the debugging hooks are implemented and what interfaces are exposed. For example, a kernel-mode debugger can normally “poke” around any piece of memory and has to be aware of things like VA mappings, while a user-mode debugger is only allowed to control a limited slice of the system, e.g. a sub-process through something like ptrace.

Unless you’re specifically looking at kernel debugging, I’d stay away from that. For one thing, LLDB is mostly used as a user-mode debugger, so the impact of any improvements would be bigger.

Regarding the value of OS-awareness for user-mode debugging, I agree with Zach - for example windbg provides both kernel mode and user mode !locks commands. The only suggestion I’d add is to consider an expanded view of the “OS” to include runtime components which may not be technically part of what most people think of as the “OS”: user-mode loaders and high level things like std::mutex, etc.

lldb has one feature - the "Operating System Plugin" - that is specifically designed for debugging threads in kernel contexts. The OS plugin allows a kernel to present its notion of threads as lldb_private::Threads. The xnu kernel macros have an implementation of this, as do some other embedded OSes. lldb actually gets used pretty extensively for debugging xnu - the Darwin kernel.

Kuba is adding the notion of "Frame recognizers" which can give significance to particular functions when they appear on the stack (for instance displaying the first argument of read, etc. as a file handle even if you have no debug information for it.) That's another way that you could express your understanding of the OS you are running on for debugger users. Greg wrote a data formatter for the Darwin implementation of pthread_mutex that shows the thread that has the lock and some other information like that. So data formatters are also a way lldb can express knowledge of the host OS.
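To make the data formatter idea concrete, here is a minimal sketch of a Python summary provider for a mutex. It is not the Darwin formatter mentioned above. It assumes glibc's pthread_mutex_t layout on Linux (a union with a __data member holding __lock and __owner), and the function name pthread_mutex_summary is made up for illustration.

```python
# Sketch of an LLDB summary provider for pthread_mutex_t.
# ASSUMPTION: glibc's layout on Linux, where the mutex is a union whose
# `__data` member carries `__lock` and `__owner` fields. Other platforms
# (including Darwin) lay the mutex out differently.


def pthread_mutex_summary(valobj, internal_dict):
    """Return a one-line description of the mutex state and its owner."""
    data = valobj.GetChildMemberWithName("__data")
    locked = data.GetChildMemberWithName("__lock").GetValueAsUnsigned(0)
    owner = data.GetChildMemberWithName("__owner").GetValueAsUnsigned(0)
    if locked == 0:
        return "unlocked"
    return "locked by tid %d" % owner


def __lldb_init_module(debugger, internal_dict):
    # Called by LLDB when the file is loaded with `command script import`.
    # Registers the function above as the summary for pthread_mutex_t.
    debugger.HandleCommand(
        "type summary add -F %s.pthread_mutex_summary pthread_mutex_t" % __name__
    )
```

After loading the file with command script import, printing a pthread_mutex_t variable would show the summary string instead of the raw fields.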

Jim

So, if I understand you right, I can look at the OS plugin and add support for mutexes or memory pages, for example?

On Thu, Nov 1, 2018 at 1:05, Jim Ingham <jingham@apple.com> wrote:

Right now, the OS plugin only supports the job of adding threads. And that makes more sense as a plugin, because for instance if you had a cooperative threading scheme that you were laying on top of the system threads in a User Space process, you can use the Operating System plugin to show you the cooperative threads. This is not an abstract example... I think it should stay with just that job.
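For readers who have not seen one, a Python OS plugin of this kind can be sketched roughly as follows. The class and method names follow LLDB's sample operating_system.py. The task table is hard-coded and entirely hypothetical; a real plugin would read it from the debuggee's memory. The register-related methods a complete plugin also needs are left out.

```python
# Sketch of a Python OS plugin that reports cooperative tasks as threads.
# ASSUMPTION: the task list is hard-coded for illustration only; a real
# plugin would walk the scheduler's data structures through `process`.
# Register description and register data methods are omitted here.


class OperatingSystemPlugIn(object):
    def __init__(self, process):
        # `process` is an lldb.SBProcess when the plugin runs inside LLDB.
        self.process = process
        self.threads = None

    def get_thread_info(self):
        # Return one dictionary per thread the OS (or scheduler) knows about.
        if self.threads is None:
            self.threads = [
                {"tid": 0x1001, "name": "idle_task", "queue": "",
                 "state": "stopped", "stop_reason": "none"},
                {"tid": 0x1002, "name": "net_task", "queue": "",
                 "state": "stopped", "stop_reason": "none"},
            ]
        return self.threads
```

Such a file is pointed to with the target.process.python-os-plugin-path setting, after which thread list shows the plugin's threads.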

The place where lldb holds this sort of knowledge is supposed to be in the Platform class. So for instance, to comprehend mutexes you really just need a data formatter. The trick is that it has to be a data formatter that is provided by the platform. Similarly you want to have frame recognizers for interesting lower-level calls in the system. The machinery will shortly be there to do that, but loading the particular recognizers will either need to be done by hand, or coordinated by the Platform. In general, I think most of the kinds of re-presentation you need to do to make OS objects and processes more comprehensible can be built as general mechanisms like the above. Then the Platform can coordinate providing the set of more general transformations that are appropriate to the Platform you are targeting.
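As an illustration of a recognizer loaded by hand, the sketch below presents the first argument of read, write, and close as a file descriptor. It assumes the Python recognizer interface as it later appeared in LLDB's documentation (a class with a get_recognized_arguments method). The class name FdRecognizer is hypothetical.

```python
# Sketch of a frame recognizer for a few libc calls that take a file
# descriptor as their first argument.
# ASSUMPTION: the recognizer interface is the one LLDB documents for
# Python (get_recognized_arguments); the class name is made up.


class FdRecognizer(object):
    def get_recognized_arguments(self, frame):
        if frame.name in ("read", "write", "close"):
            # $arg1 is LLDB's alias for the first argument register.
            fd = frame.EvaluateExpression("$arg1").unsigned
            target = frame.thread.process.target
            # Wrap the raw register value in a named, typed value.
            value = target.CreateValueFromExpression("fd", "(int)%d" % fd)
            return [value]
        return []
```

Assuming the class lives in a module named fd_recognizer (also hypothetical), it would be attached to a function with frame recognizer add -l fd_recognizer.FdRecognizer -n read. A Platform could issue the same registration automatically for the OS it targets.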

Jim

As Jason pointed out, we also have the SystemRuntime Plugin. That is intended to provide extra runtime available information based on the current system. For instance, on Darwin we use it to present the libdispatch queue view of threads on the system, and to decorate threads that are doing work on behalf of some queue with the state of the thread that enqueued the work at the time the work item was enqueued.

If for instance you had a way to gather all the locks in the process (something we've been asked to do but I don't know how to do it on Darwin...), that would be the place to put that functionality.

Jim

I’m new to the plugin ecosystem, so I may be misunderstanding something. You wrote that to comprehend mutexes we just need a data formatter, but how can we get the raw data of all mutexes in our OS? I thought I was supposed to write generic code that uses a user-defined (OS-specific) way to collect all mutexes and then uses a data formatter to show them.

I don’t think we are disagreeing; I was probably just being a little too terse. What I’m saying is that to PRESENT a mutex in some detail, we need an OS-specific data formatter, so in that case we just need some agent, either the Platform or the SystemRuntime (but the Platform seems better to me for this), to provide the OS-specific data formatters. We don’t need a new facility for that, just a way to dial up the right OS-specific instances. And if we can reuse more general features to present this sort of OS-specific info, we are better off; it reduces lldb’s surface area and provides facilities we might find other neat ways to use.

OTOH, for a runtime feature like “Gather all the mutexes currently in flight” we would need some new code. There’s nothing in lldb that does that sort of job. The SystemRuntime seems like the right place for that code to go.

Jim

The data formatter in lldb is a subsystem that allows users to define custom display options, and the Platform is the agent that provides a way to work with a specific platform. I agree that we need the Platform to provide the OS-specific data formatters, but the Platform should also have functionality to find all OS mutexes, am I right?

Yes, that seems right to me.

Jim