lldb_private::RegisterContext vs lldb_private::RegisterInfoInterface

Hi,

When deriving RegisterContext<OS>_<Arch>, why do some platforms (Arch+OS)
derive it from lldb_private::RegisterContext while others derive it
from lldb_private::RegisterInfoInterface? In other words, how do I
decide which of the two base classes to derive from, and what are
the implications?

Thanks,
Ramana

Hi Ramana,

Looks like just a naming issue: classes derived from RegisterInfoInterface should be named RegisterInfo<OS>_<Arch>, because they just implement a common interface for accessing the target's register info structures. RegisterContext, on the other hand, relates to a particular execution context and a concrete frame, and implements process-specific functions, for example restoring register state after expression evaluation.

Please correct me, anyone, if I'm wrong.

Tatyana

Seems like this class was added for testing. RegisterInfoInterface is a class that creates a common API for getting lldb_private::RegisterInfo structures.

A RegisterContext<OS>_<Arch> class uses one of these to create a buffer large enough to store all registers defined in the RegisterInfoInterface, and it actually reads/writes these registers from/to the debugged process. RegisterContext also caches register values so they don't get read multiple times while the process hasn't resumed. A RegisterContext subclass is needed for each architecture so we can dynamically tell LLDB what the registers look like for a given architecture. It also provides abstractions by letting each register define its register numbers for compilers, DWARF, and generic register numbers like PC, SP, FP, return address, and flags registers. This allows the generic part of LLDB to say "I need you to give me the PC register for this thread" without needing to know that the register is "eip" on x86, "rip" on x86_64, or "r15" on ARM. RegisterContext classes can also determine how registers are read/written: one at a time, or "get all general purpose regs" and "get all FPU regs". So if someone asks a RegisterContext to read the PC, it might go read all GPR regs and then mark them all as valid in the register context buffer cache, so if someone subsequently asks for SP, it will already be cached.
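The caching behaviour described above can be sketched roughly as follows. This is only an illustration, not LLDB code: FakeProcess, CachingRegisterContext, ReadAllGPRs and the register values are all hypothetical names and numbers made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the debugged process. It counts how often
// the debugger really has to fetch the general purpose register set.
struct FakeProcess {
  std::map<std::string, uint64_t> gprs{
      {"pc", 0x1000}, {"sp", 0x7ff0}, {"fp", 0x7ff8}};
  int bulk_reads = 0;

  std::map<std::string, uint64_t> ReadAllGPRs() {
    ++bulk_reads;
    return gprs;
  }
};

// Hypothetical register context: the first read of any GPR fetches the
// whole set and marks it valid, later reads are served from the cache.
class CachingRegisterContext {
public:
  explicit CachingRegisterContext(FakeProcess &process)
      : m_process(process) {}

  uint64_t ReadRegister(const std::string &name) {
    if (!m_valid) {
      m_cache = m_process.ReadAllGPRs();
      m_valid = true;
    }
    auto it = m_cache.find(name);
    assert(it != m_cache.end() && "unknown register name");
    return it->second;
  }

  // Must be called whenever the process resumes, because the cached
  // values may be stale afterwards.
  void InvalidateAllRegisters() { m_valid = false; }

private:
  FakeProcess &m_process;
  std::map<std::string, uint64_t> m_cache;
  bool m_valid = false;
};
```

With this sketch, reading "pc" and then "sp" costs a single bulk read; only after InvalidateAllRegisters() is the process asked again.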

So RegisterInfoInterface defines a common interface that many RegisterContext classes can share in order to hand out the lldb_private::RegisterInfo entries (which are required by all subclasses of RegisterContext) for a register context, while RegisterContext is the class that actually interfaces with the debugged process in order to read, write, and cache those registers as efficiently as possible for the program being debugged.
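To make the split concrete, here is a small sketch of a "register info only" class and of how generic code could look up the PC without knowing its name. It is a simplified model under stated assumptions: RegInfo, RegInfoProvider, GenericReg and the byte offsets are hypothetical and much smaller than the real lldb_private::RegisterInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Generic roles that architecture-independent code can ask for.
enum class GenericReg { None, PC, SP, FP };

// Hypothetical, trimmed-down description of one register.
struct RegInfo {
  std::string name;
  std::size_t byte_size;
  std::size_t byte_offset; // illustrative offsets, not a real layout
  GenericReg generic;
};

// Only describes registers; it never talks to a process.
class RegInfoProvider {
public:
  virtual ~RegInfoProvider() = default;
  virtual const std::vector<RegInfo> &GetRegisterInfo() const = 0;

  // Size of the buffer a register context would need for all registers.
  std::size_t GetBufferSize() const {
    std::size_t end = 0;
    for (const RegInfo &r : GetRegisterInfo())
      if (r.byte_offset + r.byte_size > end)
        end = r.byte_offset + r.byte_size;
    return end;
  }

  // Lets generic code ask for "the PC" regardless of architecture.
  const RegInfo *FindGeneric(GenericReg kind) const {
    for (const RegInfo &r : GetRegisterInfo())
      if (r.generic == kind)
        return &r;
    return nullptr;
  }
};

class RegInfo_x86_64 : public RegInfoProvider {
public:
  const std::vector<RegInfo> &GetRegisterInfo() const override {
    static const std::vector<RegInfo> regs = {
        {"rax", 8, 0, GenericReg::None},
        {"rbp", 8, 8, GenericReg::FP},
        {"rsp", 8, 16, GenericReg::SP},
        {"rip", 8, 24, GenericReg::PC}};
    return regs;
  }
};

class RegInfo_i386 : public RegInfoProvider {
public:
  const std::vector<RegInfo> &GetRegisterInfo() const override {
    static const std::vector<RegInfo> regs = {
        {"eax", 4, 0, GenericReg::None},
        {"ebp", 4, 4, GenericReg::FP},
        {"esp", 4, 8, GenericReg::SP},
        {"eip", 4, 12, GenericReg::PC}};
    return regs;
  }
};
```

A register context built on top of such a provider would allocate GetBufferSize() bytes and use byte_offset to find each register inside that buffer.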

Thank you.

Thank you Greg for the detailed response.

Can you please also shed some light on NativeRegisterContext? When
do we need to subclass NativeRegisterContext, and how is it
related to RegisterContext<OS>_<Arch>?
It appears that not all architectures that have a
RegisterContext<OS>_<Arch> have subclassed NativeRegisterContext.

Regards,
Ramana

When supporting a new architecture, our preferred route is to modify lldb-server (a GDB server binary that supports native debugging) to support your architecture. Why? Because this gets you remote debugging for free. If you go this route, then you will subclass a lldb_private::NativeRegisterContext and that will get used by lldb-server (along with lldb_private::NativeProcessProtocol and lldb_private::NativeThreadProtocol). If you are adding a new architecture to Linux, then you will likely just need to subclass NativeRegisterContext.

The other way to go is to subclass lldb_private::Process, lldb_private::Thread and lldb_private::RegisterContext.

The nice thing about the lldb_private::Native* subclasses is that you only need to worry about native support. You can use #ifdef and system header files, whereas on the non-native route the classes (lldb_private::Process, lldb_private::Thread and lldb_private::RegisterContext) need to be able to debug remotely and can't rely on system headers, since they can be compiled on any system, possibly for local debugging (if the target arch/vendor/os matches the current system) and for remote debugging (if you use lldb-server or another form of RPC).

I would highly suggest going the lldb-server route, as then you can use system header files that contain the definitions of the registers and you only need to worry about the native architecture. Linux uses ptrace and has much of the common code factored out into the correct classes (posix ptrace, linux specifics, and more).
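As a small illustration of the "use #ifdef and system headers" point, the sketch below takes the GPR layout straight from the Linux x86_64 <sys/user.h> header when building natively, and reports that no layout is available otherwise. GprLayout, GetNativeGprLayout and the SKETCH_HAVE_NATIVE_LAYOUT macro are hypothetical names; this is not how lldb-server itself is structured.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/user.h> // native-only header: struct user_regs_struct
#define SKETCH_HAVE_NATIVE_LAYOUT 1
#else
#define SKETCH_HAVE_NATIVE_LAYOUT 0
#endif

// Hypothetical summary of where PC and SP live in the native GPR block.
struct GprLayout {
  bool from_system_header;
  std::size_t gpr_size;
  std::size_t pc_offset;
  std::size_t sp_offset;
};

inline GprLayout GetNativeGprLayout() {
#if SKETCH_HAVE_NATIVE_LAYOUT
  // Native code can simply trust the definitions shipped with the OS.
  return {true, sizeof(user_regs_struct),
          offsetof(user_regs_struct, rip),
          offsetof(user_regs_struct, rsp)};
#else
  // Code that must build on any host cannot include that header, so it
  // would need its own portable description of the registers instead.
  return {false, 0, 0, 0};
#endif
}
```

Calling GetNativeGprLayout() on a Linux x86_64 host returns offsets taken from the system header; on any other host the from_system_header field is false.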

What architecture and os are you looking to support?

Greg Clayton

I recently added Hexagon Linux support to lldb-server; I did what Greg suggested below - subclassed NativeRegisterContextLinux, like the other architectures did. I also added the software breakpoint opcode to NativeProcessLinux. After that, it was just a matter of getting the register accessor functions in NativeRegisterContextLinux_hexagon.cpp correct.
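The software breakpoint step can be pictured with the sketch below: pick the trap instruction bytes for an architecture, write them over the original code, and keep the saved bytes so they can be restored. The function and struct names are hypothetical, and only the x86 (int3) and AArch64 (brk #0) encodings are listed; the Hexagon opcode is not given in this thread, so it is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the trap instruction bytes for an architecture name, or an
// empty vector if this sketch does not know the architecture.
inline std::vector<uint8_t>
GetSoftwareBreakpointOpcode(const std::string &arch) {
  if (arch == "i386" || arch == "x86_64")
    return {0xCC}; // int3
  if (arch == "aarch64")
    return {0x00, 0x00, 0x20, 0xD4}; // brk #0, little-endian bytes
  return {};
}

struct SoftwareBreakpoint {
  std::size_t addr;
  std::vector<uint8_t> saved_bytes; // original code under the trap
};

// Overwrites memory at addr with the trap opcode and remembers the
// bytes that were there. "memory" models the inferior's code.
inline SoftwareBreakpoint InsertBreakpoint(std::vector<uint8_t> &memory,
                                           std::size_t addr,
                                           const std::vector<uint8_t> &opcode) {
  assert(!opcode.empty() && addr + opcode.size() <= memory.size());
  SoftwareBreakpoint bp{addr, {}};
  for (std::size_t i = 0; i < opcode.size(); ++i) {
    bp.saved_bytes.push_back(memory[addr + i]);
    memory[addr + i] = opcode[i];
  }
  return bp;
}

// Restores the original bytes saved by InsertBreakpoint.
inline void RemoveBreakpoint(std::vector<uint8_t> &memory,
                             const SoftwareBreakpoint &bp) {
  for (std::size_t i = 0; i < bp.saved_bytes.size(); ++i)
    memory[bp.addr + i] = bp.saved_bytes[i];
}
```

Supporting a new architecture in this model means adding one more branch with that architecture's trap encoding.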

Thank you Ted for your comments.

Thank you so much Greg for your comments.

What architecture and os are you looking to support?

The OS is Linux and the primary use scenario is remote debugging.
Basically http://lists.llvm.org/pipermail/lldb-dev/2017-June/012445.html
is what I am trying to achieve, and unfortunately that query did not
get much attention from the members.

Thanks,
Ramana

Sorry about missing that. I will attempt to address this now:

I have to implement a debugger for our HW, which consists of a CPU and a GPU, where
the GPU is programmed in OpenCL and is accelerated through the OpenVX API in a C++
application which runs on the CPU. Our requirement is that we should be able to
debug the code running on both the CPU and the GPU simultaneously within the same
LLDB debug session.

Interesting. There are two ways to accomplish this:
1 - Treat the CPU as one target and the GPU as another.
2 - Treat the CPU and GPU as one target

There are tricky areas for both, but for sanity I would suggest option #1.

The tricky thing with solution #1 is how to manage switching the targets between the CPU and GPU when events happen (CPU stops, or GPU stops while the other is running or already stopped). We don’t have any formal “cooperative targets” yet, but we know they will exist in the future (client/server, vm code/vm debug of vm code, etc) so we will be happy to assist with questions if and when you get there.

Option #2 would be tricky as this would be the first target that has multiple architectures within one process. If the CPU and GPU can be controlled separately, then I would go with option #1, as LLDB currently always stops all threads in a process when any thread stops. You would also need to implement different register contexts for each thread within such a target. It hasn’t been done yet, other than through the OS plug-ins that can provide extra threads to show in case you are doing some sort of user space threading.
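The practical difference between the two options can be shown with a toy model. This is not LLDB code: ModelThread and ModelProcess are hypothetical types that only encode the one rule stated above, namely that all threads of a process stop when any thread stops.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ModelThread {
  std::string name;
  bool running;
};

// Toy process: a stop on any one thread stops every thread in it.
struct ModelProcess {
  std::vector<ModelThread> threads;

  void OnThreadStopped(std::size_t index) {
    assert(index < threads.size());
    for (ModelThread &t : threads)
      t.running = false;
  }

  bool AnyRunning() const {
    for (const ModelThread &t : threads)
      if (t.running)
        return true;
    return false;
  }
};
```

With option #2 the CPU and GPU threads live in one ModelProcess, so a GPU stop also halts the CPU threads. With option #1 each device gets its own ModelProcess, so a GPU stop leaves the CPU process running, at the cost of having to coordinate the two targets yourself.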

GPU debugging is tricky since they usually don’t have a kernel or anything running on the hardware. Many examples I have seen so far will set a breakpoint in the program at some point by compiling the code with a breakpoint inserted, run to that breakpoint, and then if the user wants to continue, you recompile with breakpoints set at a later place and re-run the entire program again. Is your GPU any different? Since they will be used in an OpenCL context maybe your solution is better? We also had discussions on how to represent the various “waves” or sets of cores running the same program on the GPU. The easiest solution is to make one thread per distinct core on the GPU. The harder way would be to treat a thread as a collection of multiple cores and each variable value now can have one value per core.

We also discussed how to single step in a GPU program. Since multiple cores on the GPU are concurrently running the same program, there was discussion on how single stepping would work. If you are stepping and run into an if/then statement, do you walk through the if and the else at all times? One GPU professional was saying this is how GPU folks would want to see single stepping happen. So I think there is a lot of stuff we need to think about when debugging GPUs in general.
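One possible reading of "walk through the if and the else at all times" is sketched below: a thread stands for a group of lanes, every variable has one value per lane, and stepping visits both arms of a branch while masking off the lanes that did not take that arm. Wave and StepThroughIfElse are hypothetical names, and the sketch assumes all lanes are active on entry; it is only meant to make the question concrete, not to propose how LLDB should do it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A group of lanes running the same program in lockstep.
struct Wave {
  std::vector<int> x;       // one value of the variable x per lane
  std::vector<bool> active; // lanes executing the current line
};

// Steps through:  if (x < 0) x = -x; else x = x * 2;
// Both arms are always visited. Returns the lines the user would see.
inline std::vector<std::string> StepThroughIfElse(Wave &w) {
  assert(w.x.size() == w.active.size());
  const std::size_t n = w.x.size();
  std::vector<std::string> visited;

  // Evaluate the condition once per lane.
  visited.push_back("if (x < 0)");
  std::vector<bool> takes_then(n);
  for (std::size_t i = 0; i < n; ++i)
    takes_then[i] = w.x[i] < 0;

  // "then" arm: only lanes with a true condition are active.
  std::size_t then_lanes = 0;
  for (std::size_t i = 0; i < n; ++i) {
    w.active[i] = takes_then[i];
    if (w.active[i]) {
      w.x[i] = -w.x[i];
      ++then_lanes;
    }
  }
  visited.push_back("then: x = -x (" + std::to_string(then_lanes) +
                    " lanes active)");

  // "else" arm: the remaining lanes are active.
  for (std::size_t i = 0; i < n; ++i) {
    w.active[i] = !takes_then[i];
    if (w.active[i])
      w.x[i] *= 2;
  }
  visited.push_back("else: x = x * 2 (" + std::to_string(n - then_lanes) +
                    " lanes active)");

  // Reconverge: all lanes are active again after the branch.
  for (std::size_t i = 0; i < n; ++i)
    w.active[i] = true;
  return visited;
}
```

The alternative mentioned above, one thread per core, would instead give each lane its own thread with ordinary scalar variables and an ordinary step that follows only the arm that lane takes.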

Looking at the mailing list archive I see that there were discussions about
this feature in LLDB here
http://lists.llvm.org/pipermail/lldb-dev/2014-August/005074.html.

What is the present status of simultaneous multiple target debugging
support in LLDB, i.e. what works today and what is to be improved? Were
the changes contributed to LLDB mainstream?

So we currently have no cooperative targets in LLDB. This will be the first. We will need to discuss how hand off between the targets will occur and many other aspects. We will be sure to comment when and if you get to this point.

How can I access the material for http://llvm.org/devmtg/2014-10/#bof5
(Future directions and features for LLDB)?

Over the years we have talked about this, but it never really got into any real amount of detail and I don’t think the BoF notes will help you much.

Appreciate any help/guidance provided on the same.
I do believe approach #1 will work the best. The easiest thing you can do is to insulate LLDB from the GPU by putting it behind a GDB server boundary. Then we need to really figure out how we want to do GPU debugging. 
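For a feel of what sits at a GDB server boundary: in the GDB remote serial protocol each packet is framed as $payload#xx, where xx is the sum of the payload bytes modulo 256 written as two hex digits. The sketch below frames and checks such packets. FramePacket and IsValidPacket are hypothetical helper names, and the escaping of special characters inside the payload is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Sum of all payload bytes modulo 256.
inline unsigned PacketChecksum(const std::string &payload) {
  unsigned sum = 0;
  for (unsigned char c : payload)
    sum = (sum + c) & 0xff;
  return sum;
}

// Wraps a payload as "$payload#xx". Payload escaping is not handled.
inline std::string FramePacket(const std::string &payload) {
  char tail[4]; // '#', two hex digits, terminating NUL
  std::snprintf(tail, sizeof(tail), "#%02x", PacketChecksum(payload));
  return "$" + payload + tail;
}

// Checks the framing and the checksum of a received packet.
inline bool IsValidPacket(const std::string &packet) {
  if (packet.size() < 4 || packet[0] != '$')
    return false;
  const std::size_t hash = packet.size() - 3;
  if (packet[hash] != '#')
    return false;
  const std::string payload = packet.substr(1, hash - 1);
  return FramePacket(payload) == packet;
}
```

For example, the single-character "g" (read registers) request becomes "$g#67", since the character 'g' has the value 0x67. A GPU stub that speaks this protocol could be driven by LLDB without LLDB knowing anything else about how the GPU is controlled.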

Hopefully this filled in your missing answers. Let me know what questions you have.

Greg

Sorry, I could not respond yesterday as I was out of office.

Interesting. There are two ways to accomplish this:
1 - Treat the CPU as one target and the GPU as another.
2 - Treat the CPU and GPU as one target

The tricky thing with solution #1 is how to manage switching the targets
between the CPU and GPU when events happen (CPU stops, or GPU stops while
the other is running or already stopped). We don't have any formal
"cooperative targets" yet, but we know they will exist in the future
(client/server, vm code/vm debug of vm code, etc) so we will be happy to
assist with questions if and when you get there.

I was going with option #1. I will definitely post here with more
questions as I progress, thank you. Fortunately, the way the OpenVX APIs
work is that, after off-loading tasks to the GPU, they wait for the
GPU to complete those tasks before continuing. And in our
case, the CPU and GPU can be controlled separately. Given that, do
you think I still need to worry much about "cooperative targets"?

GPU debugging is tricky since they usually don't have a kernel or anything
running on the hardware. Many examples I have seen so far will set a
breakpoint in the program at some point by compiling the code with a
breakpoint inserted, run to that breakpoint, and then if the user wants to
continue, you recompile with breakpoints set at a later place and re-run the
entire program again. Is your GPU any different?

We also discussed how to single step in a GPU program. Since multiple cores
on the GPU are concurrently running the same program, there was discussion
on how single stepping would work. If you are stepping and run into an
if/then statement, do you walk through the if and the else at all times? One
GPU professional was saying this is how GPU folks would want to see single
stepping happen. So I think there is a lot of stuff we need to think about
when debugging GPUs in general.

Thanks for sharing that. Yeah, ours is a little different. Basically,
from the top level, the affinity in our case is per core of the GPU. I
am not there yet to discuss more on this.

So we currently have no cooperative targets in LLDB. This will be the first.
We will need to discuss how hand off between the targets will occur and many
other aspects. We will be sure to comment when and if you get to this point.

Thank you. Will post more when I get there.

Regards,
Ramana

If you just want to make two targets that know nothing about each other, then that is very easy. Is that what you were asking?

ok, let me know when you are ready to ask more questions.

Sounds good.

We've done something similar in-house running on a Snapdragon with an ARM and a Hexagon DSP. We use Android Studio to debug an app on the ARM that sends work down to the Hexagon, running an app under Linux. On the Hexagon we ran lldb, and were able to debug both apps talking to each other.

If you just want to make two targets that know nothing about each other, then that is very easy. Is that what you were asking?

I am probably not getting the significance of "cooperative targets"
for our setup at this point. I will get back after I dig a little
deeper.

If I may ask, among the options that Greg mentioned in the earlier
replies, which approach did you choose?

Do you run lldb (client) on both ARM and Hexagon? Or do you run
lldb-server on the Hexagon and lldb (client) on ARM? Or something
else?

We treat them as 2 separate targets. We're actually running 2 separate lldb instances - one in Android Studio, and 1 in a shell on Hexagon Linux.

Android Studio is running on the host (x86 Linux or Windows PC). We have an adb shell open to Android on the ARM, and from there ssh to Linux on the Hexagon. The DSP doesn't have a direct connection to the outside world; everything goes through the ARM.

Yes, Greg, that's lldb and lldb-server running on the DSP!

Thanks for sharing those details Ted.

Regards,
Ramana