Remote Debug of AArch64 Targets Running QNX

ayushsahay · November 25, 2024, 5:20pm

This is to request comments on the proposition to support remote debug of AArch64 targets running BlackBerry’s QNX Neutrino Real-Time Operating System in LLVM’s LLDB sub-project. The intention isn’t to support a toolchain for QNX but to support just the remote debug of AArch64 targets running QNX via LLDB.

Note that the remote debug support in question will be provided for QNX 7.1 using the QNX Momentics Tool Suite.

Introduction to the QNX Neutrino Real-Time Operating System

BlackBerry’s QNX Neutrino Real-Time Operating System is a commercial Unix-like [1], POSIX-compliant [2], microkernel-based, real-time operating system primarily targeting the embedded systems market including automotive [3] [4], medical devices, robotics [5], transportation, and industrial embedded systems. [6] [7]

Why do we need support for QNX in LLDB?

Customers have requested a unified debug solution for automotive platforms using LLDB. QNX’s existing debug server (pdebug [8] [9]) doesn’t use the GDB Remote Serial Protocol, so LLDB can’t talk to it. Our solution is to port ‘lldb-server’ to QNX.
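For readers unfamiliar with the GDB Remote Serial Protocol that lldb-server speaks, here is a minimal Python sketch of its packet framing. A payload is sent as "$<payload>#<checksum>", where the checksum is the sum of the payload bytes modulo 256 written as two hex digits, and the receiver acknowledges with "+" (or "-" to ask for a retransmit). The function names are made up for this example, and escaping of special characters, run-length encoding and no-ack mode are deliberately left out.

```python
def checksum(payload: bytes) -> int:
    """Modulo-256 sum of the payload bytes, as used by the GDB RSP."""
    return sum(payload) % 256


def encode_packet(payload: bytes) -> bytes:
    """Frame a payload as $<payload>#<checksum as two lowercase hex digits>."""
    return b"$" + payload + b"#" + b"%02x" % checksum(payload)


def decode_packet(frame: bytes) -> bytes:
    """Check the framing and checksum, then return the payload."""
    if len(frame) < 4 or frame[:1] != b"$" or frame[-3:-2] != b"#":
        raise ValueError("malformed packet frame")
    payload = frame[1:-3]
    if int(frame[-2:], 16) != checksum(payload):
        raise ValueError("checksum mismatch")  # a receiver would reply '-'
    return payload


if __name__ == "__main__":
    frame = encode_packet(b"g")  # "read all registers" request
    print(frame)                 # b'$g#67'
    print(decode_packet(frame))  # b'g'
```

Since pdebug uses a different, undocumented wire format, none of this framing is understood by it, which is why LLDB cannot simply connect to it.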

Process/Thread control/manipulation on QNX Neutrino

‘ptrace’ isn’t available on QNX but operations that can be performed via ‘ptrace’ on other OSes can be performed via ‘devctl’ [10] on QNX.

Development status

ATM, we have five changes (specifically, #97417, #97439, #97487, #97536, and #97630) that aim to add support for:

- Launching a debuggee
- Attaching to a debuggee
- Having the debuggee come up stopped at the entry point
- Setting breakpoints
- Stopping at breakpoints
- Resuming the debuggee’s execution
- Single-stepping the debuggee’s execution
- Interrupting the debuggee’s execution
- Reading/writing contents of the debuggee’s memory
- Reading/writing contents of the debuggee’s registers
- Reading/writing contents of the debuggee’s variables
- Dumping the debuggee’s stack trace
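To connect the feature list above to the protocol lldb-server speaks, the following Python sketch records which GDB RSP packets a client commonly sends for each feature. It is an illustrative, non-exhaustive mapping based on the public GDB RSP documentation and LLDB's protocol extensions, not a description of what the QNX patches implement; the names SERVER_PACKETS, CLIENT_SIDE and packets_for are made up for this example.

```python
from typing import Dict, List

# Illustrative, non-exhaustive mapping from debugger features to the GDB RSP
# packets a client commonly sends to the server for them.
SERVER_PACKETS: Dict[str, List[str]] = {
    "launch": ["A", "vRun"],
    "attach": ["vAttach"],
    "set breakpoint": ["Z0"],      # software breakpoint; "z0" removes it
    "resume": ["c", "vCont;c"],
    "single-step": ["s", "vCont;s"],
    "interrupt": ["\x03"],         # a raw 0x03 byte, sent outside packet framing
    "read memory": ["m", "x"],
    "write memory": ["M", "X"],
    "read registers": ["p", "g"],
    "write registers": ["P", "G"],
}

# Features the client computes itself from memory/register reads plus debug
# info, so the server needs no dedicated packet for them.
CLIENT_SIDE = {"read variables", "stack trace"}


def packets_for(feature: str) -> List[str]:
    """Return the packets a feature relies on (KeyError if unknown)."""
    if feature in CLIENT_SIDE:
        return SERVER_PACKETS["read memory"] + SERVER_PACKETS["read registers"]
    return SERVER_PACKETS[feature]
```

The split matters for a port: variables and stack traces come "for free" once memory and register access work on the server side.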

#97417 adds the QNX Neutrino Real-Time Operating System to llvm::Triple.

#97439 detects QNX Neutrino Real-Time Operating System targets.

#97487 provisions the QNX platform plugin.

#97536 enables the POSIX dynamic loader plugin for QNX.

#97630 provisions the QNX host and process plugins.
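Triple recognition is conceptually simple: split a target triple into its arch-vendor-os-environment components and test the OS part. The Python sketch below shows only that idea. The QNX spelling it checks for ("nto" as the OS and "qnx<version>" as the environment, as in "aarch64-unknown-nto-qnx7.1.0") is assumed here from GCC-style QNX toolchains and is not necessarily what #97417 adds; real llvm::Triple parsing also normalizes aliases and omitted components, which this naive split does not.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    arch: str
    vendor: str
    os: str
    environment: str


def parse_triple(text: str) -> Triple:
    """Naively split arch-vendor-os[-environment]; missing parts become ''."""
    parts = text.split("-", 3)
    parts += [""] * (4 - len(parts))
    return Triple(*parts)


def looks_like_qnx(triple: Triple) -> bool:
    # Assumption: GCC-style QNX triples carry an "nto" OS component and a
    # "qnx<version>" environment; llvm::Triple's real spelling may differ.
    return triple.os == "nto" or triple.environment.startswith("qnx")
```

For example, parse_triple("aarch64-unknown-nto-qnx7.1.0") yields arch "aarch64", vendor "unknown", os "nto" and environment "qnx7.1.0", which looks_like_qnx accepts.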

Future work

Running LLDB’s API tests is in progress and will be the subject of future work. The plan is to provide a builder and/or workers for post check-in verification.

Summary

We’re looking to contribute the source changes for supporting the remote debug of AArch64 targets running QNX to LLDB and maintain them, and are requesting comments on this.

ayushsahay · November 25, 2024, 5:20pm

References (I)

[1] “QNX Neutrino compared with Unix,” [Online]. Available: https://www.qnx.com/developers/docs/7.1/index.html#com.qnx.doc.neutrino.user_guide/topic/os_intro_LikeUnix.html.
[2] “Certifications,” [Online]. Available: https://blackberry.qnx.com/en/developers/certifications.
[3] “Chongqing Yazaki Selects BlackBerry to Power Digital LCD Cluster for the Chinese Market,” [Online]. Available: https://www.blackberry.com/us/en/company/newsroom/press-releases/2023/chongqing-yazaki-selects-blackberry-power-digital-lcd-cluster-chinese-market.
[4] “BlackBerry Software Is Now Embedded In Over 235 Million Vehicles,” [Online]. Available: https://www.blackberry.com/us/en/company/newsroom/press-releases/2023/blackberry-software-is-now-embedded-in-over-235-million-vehicles.
[5] “BlackBerry Announces Collaboration with AMD to Advance Foundational Precision and Control for Robotics Industry,” [Online]. Available: https://www.blackberry.com/us/en/company/newsroom/press-releases/2024/blackberry-announces-collaboration-with-amd-to-advance-foundational-precision-and-control-for-robotics-industry.

ayushsahay · November 25, 2024, 5:25pm

References (II)

[6] “QNX Neutrino Real-Time Operating System (RTOS),” [Online]. Available: https://blackberry.qnx.com/en/products/foundation-software/qnx-rtos.
[7] “QNX,” [Online]. Available: https://en.wikipedia.org/wiki/QNX.
[8] “pdebug,” [Online]. Available: https://www.qnx.com/developers/docs/7.1/index.html#com.qnx.doc.neutrino.user_guide/topic/security_pdebug.html.
[9] “Process-level Debugger,” [Online]. Available: https://www.qnx.com/developers/docs/7.1/index.html#com.qnx.doc.neutrino.utilities/topic/p/pdebug.html.
[10] “devctl(), devctlv(),” [Online]. Available: https://www.qnx.com/developers/docs/7.1/index.html#com.qnx.doc.neutrino.lib_ref/topic/d/devctl.html.

DavidSpickett · November 26, 2024, 10:40am

Note: I’m not the sole decision maker here but I am trying to be more clear about our platform support “rules” (as in, we need to document some). So I will make some bold statements that others may disagree with me on.

The biggest question for me here is: why does this need to be upstream?

Obviously I don’t have any knowledge of your commercial pressures, but there are other options here. Tell me if you’ve assessed those:

- Adding GDB protocol support to pdebug.
- Teaching LLDB to speak pdebug’s protocol (take GDBRemoteCommunicationClient, make a PDebugRemoteCommunicationClient).

These require you to maintain pdebug of course, and we’d be very unlikely to accept major lldb client changes if the server side wasn’t upstream too - so I’m sure you’ve thought about it but I have to ask anyway.

My gut says that if you’re looking to move away from the legacy tools anyway, porting lldb-server is a good option.

This shouldn’t be a problem. We generally support platforms via ptrace, but then there’s Windows, which is completely different, and that fits in fine.

My main question here is: why does this need to be upstream?

(I keep meaning to write these points up as a formal document, but it’s too late for that right now.)

I respect the openness and you are of course free to ask for feedback even if you weren’t going to upstream it, but we have to consider the cost of carrying this code.

I know from my own employer that sometimes we upstream things basically as a simple way to provide it to partners to build on. So I understand that angle but as a community, I do not think we should give that any weight. A contribution has to have its own merits.

Also, if you’re doing this, then I would ask: why not implement the full toolchain?

If the code you’re upstreaming can’t be built without a downstream toolchain, this is not good. At least with IBM’s AIX they are porting clang too.

You might want partners to contribute fixes upstream as well, but I’m wary of that because 1. you could do that on a fork, 2. I don’t like agreeing to things based on future promises, and 3. that will be more burden for the few reviewers we have, even if you are also helping out.

I do welcome the interest from the automotive segment in lldb, but again I don’t think our goal here is to build a user base so I wouldn’t accept it based on that (and that’s a terrible argument given that Android + iOS is in the millions already).

In some sense, being upstream lowers the maintenance burden for you, because we are less likely to break at least non-native code related to QNX. But don’t think it removes the burden: if you’re not on the hook for fixing issues with it, we (or at least I) will gladly allow people to arbitrarily ignore QNX when making changes. And if that happens too many times, it will be removed.

(I doubt any of that is news to you, but if you need to quote this to your management to be crystal clear, there it is.)

I am not saying that because this is a commercial OS, we would not accept it. We have already said ok in principle to an AIX port, which is also commercial.

And we support Mac OS too, don’t we? But my point is more about platforms that are commercial, not available on demand, and run in strange scenarios. For AIX I need PowerPC hardware; here I need some supported device. For a Mac, I could at least borrow a colleague’s work laptop, or go to a store and get one the same day.

(also Apple invented lldb so the point is moot, but if they were proposing Mac support now, I would ask the same things)

So read the AIX thread (Port LLDB to IBM AIX), and here’s my summary of what I think are the minimum things we would need to accept this upstream:

- At least 2 maintainers for the platform (see llvm-project/lldb/Maintainers.rst). Otherwise, when one of you sends a PR, we will have no platform expert to review it.
- API tests must be running, at least partially, before anything is accepted upstream. I personally do not want to see a thousand skipIfQNX decorators either, so it will need to be reasonably mature (you can mitigate that by skipping whole categories of tests instead, if you know that, e.g., breakpoints don’t work yet).
- A silent buildbot for this configuration (one that does not notify contributors of failures), given that no contributor has immediate access to your platform, so they cannot be expected to solve issues arising on it.
  - This needs to be set up as early as possible, ideally from day one, building whatever works using the QNX toolchain right now,
    - because it seems that engineers freely agree to adding a buildbot, but management hates paying to host them.
- A commitment to reproduce and solve those issues yourselves, which will need to happen in the majority of cases (the community can help, but when it comes to QNX details, that’s on you).
- A process by which an external contributor can reproduce a QNX build/setup without having to sign license agreements, start free trials, or otherwise do anything that their employer is going to refuse to do. This doesn’t have to be quick, but it has to be possible. For example, I am upstreaming new AArch64 support that folks can reproduce using Linux and a free simulator from Arm, or with QEMU.
  - If this is not possible due to the QNX business model, you may be able to substitute even stronger commitments on the previous points, essentially saying that you will never ask anyone upstream to deal with a QNX problem (but again, why be upstream at that point? That is what I would ask myself in your shoes).

Look at that list and consider the costs that apply regardless of whether you are upstream or downstream. You could stay downstream and still report issues to us; the only difference is that we wouldn’t have your code in front of us, and a public fork could solve that. In the past, we have accepted fixes for problems that we could trigger with another target in a test case.

You can also run a buildbot locally (see “How To Add Your Build Configuration To LLVM Buildbot Infrastructure” in the LLVM 20.0.0git documentation). So that’s an option if you want to start downstream but still be able to report issues in a way that upstream will be receptive to.

Also think about whether you’re likely to require downstream patches regardless of how much is upstream. Again, from projects at Arm I know we’ve tried to be 100% upstream but it never actually happens.

And if you’re going to support the needs of these customers, are you going to fight the uphill battle to upstream specific new commands and API features and so on?

All this is a lot to take in and I do not mean to drown you in text here, or imply that I am dismissing this proposal. I will at some point try to make a more succinct, generalised set of guidelines that the community can agree on for future proposals.

bulbazord · December 2, 2024, 10:38pm

I’d also like to hear more about this. I think porting lldb-server is not a bad option, but it does come with at least 2 downsides (that I can think of).

- lldb-server is not exactly small. A typical build of lldb-server will be a few dozen MBs at least. This is because it links against some of LLDB’s libraries (which in turn depend on LLVM libraries). That leads to the next downside.
- You’ll need to build more than just LLDB for QNX; you may end up building portions of LLVM as well. I see that you are explicitly not looking to ship a toolchain, but you may end up needing to do some of that work to succeed.

Aside from David’s suggestions, another option is to write a QNX debug server from scratch. Apple platforms use a debug agent called debugserver, which is small and separate from the rest of the LLDB implementation.

DavidSpickett · December 3, 2024, 11:29am

I’m pretty sure it even includes some dead code, because when I first started I would breakpoint things in the wrong process and get very confused.

(which could be fixed, or you could try LTO-ing lldb-server)

You will need at least triple recognition (not a massive amount of work). The other thing I can think of is the JIT: it obviously supports AArch64 already, but the question is whether QNX uses the standard ABI (AAPCS64, in ARM-software/abi-aa on GitHub: aapcs64/aapcs64.rst), the standard relocations, and so on.

“Howto: GDB Remote Serial Protocol” is not written for LLDB specifically, but it’s as close as you’ll get to a guide on doing this.

You would be able to run the lldb test suite against that server, so there’d be plenty of coverage to figure out the missing parts.

As there is no proper specification or test suite for the GDB RSP, the protocol is effectively whatever lldb and gdb (and sometimes qemu) decide to do, and you pick one to model yourself on. We try to keep them in sync, but it doesn’t always happen.

Note that debugserver is also upstream, in llvm-project/lldb/tools/debugserver. It’ll have a lot of Darwin specifics in it, but maybe the main packet handling loop is enough to build a prototype from.
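To give a feel for what a main packet handling loop involves, here is a minimal Python sketch of payload dispatch in a debug stub, assuming framing and acknowledgements are handled elsewhere. Only two packets are implemented, their replies are hard-coded placeholders, and the handler names are hypothetical; none of this is taken from debugserver or lldb-server.

```python
from typing import Callable, Dict, Iterable, Iterator


def handle_halt_reason(args: str) -> str:
    # Hypothetical reply: the inferior is stopped with SIGTRAP (signal 5).
    return "S05"


def handle_qsupported(args: str) -> str:
    # Hypothetical reply: advertise only a maximum packet size (in hex).
    return "PacketSize=4000"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "?": handle_halt_reason,
    "qSupported": handle_qsupported,
}


def dispatch(payload: str) -> str:
    """Route one decoded payload to the handler with the longest matching prefix.

    An empty reply is the RSP convention for "packet not supported".
    """
    for prefix in sorted(HANDLERS, key=len, reverse=True):
        if payload.startswith(prefix):
            return HANDLERS[prefix](payload[len(prefix):])
    return ""


def serve(payloads: Iterable[str]) -> Iterator[str]:
    """Main loop: answer each payload in order until a 'k' (kill) packet."""
    for payload in payloads:
        if payload == "k":
            break
        yield dispatch(payload)
```

Answering unknown packets with an empty reply is what lets a new server start small: the client generally falls back or treats the feature as unavailable, and running the test suite then shows which packets still need handlers.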

tedwoodward · December 14, 2024, 7:45pm

@ayushsahay works at Qualcomm, not BlackBerry. We can’t do either of these, because:

- We don’t have access to the pdebug sources.
- BlackBerry hasn’t released documentation on the pdebug protocol, so unless we reverse engineer it, we can’t add it to lldb. Plus, gdb-remote has its tentacles in everything in lldb, so adding a new remote protocol is… problematic. Perhaps we can look into that when the platform rework is done.

DavidSpickett · December 16, 2024, 10:07am

It would have been good to include that up front, but it’s not a problem; it doesn’t change my concerns.

I agree that reverse engineering pdebug is not a solution for your purposes. So now I understand the motivation for Qualcomm to do this work and to do it in this manner.

My concerns about accepting this upstream are largely unchanged, though I do acknowledge that you have been involved with LLDB for some time already and have shipped products based on it: https://llvm.org/devmtg/2024-10/slides/lightning/Woodward-RISC-V-Debugger.pdf

• We took the upstream code and released a product based on it

So on the technical side, I expect that the code would be high quality.

However, I have yet to see any public validation of LLDB on RISC-V, so you can see why adding a niche commercial operating system into the mix would increase my skepticism that I would see any for QNX either.

(RISC-V Linux is different in that you are not the sole owner of it, and work had started prior to your involvement, but it is an example I have been thinking about recently)

That said, maybe you can make the argument that QNX support will be isolated / easy to reproduce / watched closely by Qualcomm so that the frequency and cost of problems with QNX is severely reduced.

So tell me what your answers (or counter points) to these points are, and we can continue from there.

I should note that we have a variety of validation strategies going on in LLDB, and I don’t think that’s a bad thing. There are platforms that are validated per commit, per release, within llvm, outside of llvm, and so on. I think LLDB can afford to have more flexibility than other projects, in the end it’s all about the cost of a problem to the community and the platform stakeholder.

ayushsahay · January 1, 2025, 3:30pm

Hi, David!

I apologize for the delayed response. I got caught up in one matter after another.

Thanks for reviewing the RFC! These are very pertinent questions indeed, and I’ll try to respond to them as soon as possible; I’ll need to consult management before making any official commitment, and everyone’s out of office ATM.

In the meantime, I wanted to let you know that we’ve provisioned support for executing the API tests downstream (on a branch based on LLVM 17.0.0). We’re employing an environment powered by an image of QNX Neutrino hosted on QQVP (Qualcomm’s QEMU-based virtual platform solution, which combines various models from several simulation frameworks by means of SystemC) [“QQVP Qualcomm’s SystemC and QEMU Modelling solution,” [Online]. Available: https://www.youtube.com/watch?v=fsP7bhXvzmQ.] and the QNX Software Development Platform.

The intention is to provision a builder and/or workers that leverage said test infrastructure to provide the post check-in verification service.

Here’s a summary of the results of the tests executed thus far:

Result	Count
Unsupported	568
Passed	485
Expectedly Failed	74
Failed	9

Kindly note that tests that couldn’t be resolved (22), timed out (10), unexpectedly failed (116), or are flaky (3) have either been skipped or marked as expected failures ATM.

Also note that the 9 unexpected failures included in the statistics are on account of intermittent failures in calls to devctl.
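As a quick sanity check on the results table above, the counts can be totalled and a pass rate computed. Treating "Unsupported" as "not run" and excluding it from the denominator is an assumption made here, not something stated in the report.

```python
# Counts copied from the results table above.
results = {
    "Unsupported": 568,
    "Passed": 485,
    "Expectedly Failed": 74,
    "Failed": 9,
}

total = sum(results.values())
ran = total - results["Unsupported"]  # assumption: unsupported tests did not run
pass_rate = results["Passed"] / ran

print(f"total={total} ran={ran} pass rate={pass_rate:.1%}")
# prints: total=1136 ran=568 pass rate=85.4%
```

Counting expected failures as acceptable outcomes instead gives (485 + 74) / 568, roughly 98.4%, with the 9 devctl-related failures making up the rest.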

DavidSpickett · January 3, 2025, 10:21am

Thank you for the results, this is very encouraging.

ayushsahay · April 5, 2025, 3:15pm

Hi, David,

I recently learned that QNX has launched free access to the QNX SDP for non-commercial use [1]. Additionally, they’ve released a QNX 8.0 Quick Start Target Image for Raspberry Pi 4 [2]. IIUC, QEMU can emulate the Raspberry Pi 4 [3]. I imagine that this could ease external contributions. What do you think?

Thanks, and best regards,
Ayush

[1] Free Access to QNX SDP 8.0 for Non-Commercial Use
[2] Raspberry Pi 4 QNX 8.0 Quick Start Target Image (qnx / Quick Start Target Images, GitLab)
[3] Raspberry Pi boards (raspi0, raspi1ap, raspi2b, raspi3ap, raspi3b, raspi4b) — QEMU documentation

DavidSpickett · April 7, 2025, 8:46am

Certainly a good step forward. Though as with any “non-commercial use”, everyone will have to read those terms very carefully.

Would be nice if QNX did not put those terms behind a login wall.

(to be clear, do not copy the terms here - but if you have a contact at QNX, please pass on the feedback that the situation could be improved)

Certainly if Qualcomm has licenses for QNX, this is an easier way to test things, and truly non-commercial users will enjoy it of course.

To recap, I’m not asking for a perfect test plan where everyone can reproduce instantly. As my recent survey found (RFC: Surveying LLDB's supported platforms and architectures) we support a lot of platforms, with varying levels of validation. It is tempting to set a single bar for everyone but history shows we can do well without that, and doing so is often impractical.

So here I’m looking for clarity and a plan, which may or may not be extensive. If it’s justified, then it’s fine, and we all know what to expect.

(for me at least, I’m not the only community member around here though, anyone else feel free to chime in!)
