Serial Debugging KGDB Kernel Bugs - SIGTRAP, And "serial://" Use With Plugin Shortcut

I was originally going to post this at https://github.com/vadimcn/codelldb/discussions/new?category=q-a
But once I realized it looked to be bugs and not directly related to the extension, I opted to post here…
EDIT: v16.0.6


I wasn’t certain whether serial support was fully incorporated in LLDB.

GDB succeeds:

      {
        "name": "gdb",
        "type": "cppdbg",
        "request": "launch",
        "cwd": "/<pathTo>/linux-stable",
        "MIMode": "gdb",
        "miDebuggerPath": "/usr/bin/aarch64-linux-gnu-gdb",
        "miDebuggerServerAddress": "/dev/ttyUSB0",
        "targetArchitecture": "arm64",
        "program": "/<pathTo>/vmlinux"
      }

LLDB succeeds (using QEMU):

      {
        "name": "lldb qemu",
        "type": "lldb",
        "request": "custom",
        "targetCreateCommands": [
          "target create /<pathTo>/vmlinux"
        ],
        "processCreateCommands": [
          "gdb-remote 127.0.0.1:1234"
        ],
        "preLaunchTask": "qemu"
      }

LLDB fails (attempt from console first):
Before exiting screen, the last message was KGDB: Waiting for connection from remote gdb....

$ lldb
(lldb) platform select remote-gdb-server
  Platform: remote-gdb-server
 Connected: no
(lldb) platform connect serial:///dev/ttyUSB0
  Platform: remote-gdb-server
  Hostname: (null)
 Connected: yes
(lldb) target create /<pathTo>/vmlinux
Current executable set to '/<pathTo>/vmlinux' (aarch64).
(lldb) gdb-remote serial:///dev/ttyUSB0
error: gdb-remote [<hostname>:]<portnum>
(lldb) process connect --plugin gdb-remote serial:///dev/ttyUSB0
Process 1 stopped
* thread #1, stop reason = signal SIGTRAP
    frame #0: 0xffffffc08005ed6c vmlinux`arch_local_irq_enable at irqflags.h:51:1
   48           } else {
   49                   __daif_local_irq_enable();
   50           }
-> 51   }
   52  
   53   static __always_inline void __daif_local_irq_disable(void)
   54   {

When I saw that gdb-remote was a shortcut for the plugin command, I gave the full statement a try, and I was surprised that it accepted serial:// while the shortcut did not.

But as you can see, even with it finally connecting, it craps out with a SIGTRAP.

OH! Going back to vscode (I had added the launcher once I saw the full plugin statement “worked”(?)), I see it’s still spitting out output from the device:

...
Executing script: processCreateCommands
2
Stop reason: signal SIGTRAP
device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
Freeing initrd memory: 12996K
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu: 	4-...0: (0 ticks this GP) idle=138c/1/0x4000000000000000 softirq=13/13 fqs=2582
rcu: 	(detected by 1, t=5252 jiffies, g=-1171, q=1140 ncpus=8)
Sending NMI from CPU 1 to CPUs 4:
NMI backtrace for cpu 4
CPU: 4 PID: 1 Comm: swapper/0 Not tainted 6.7.9 #14
Hardware name: radxa Radxa ROCK 5 Model B/Radxa ROCK 5 Model B, BIOS 2024.04-rc3-geac52e4b 04/01/2024
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
rcu: 	4-....: (7543 ticks this GP) idle=138c/1/0x4000000000000000 softirq=13/13 fqs=7776
pc : __do_softirq+0x8c/0x22c
rcu: 	(detected by 2, t=22645 jiffies, g=-1171, q=1140 ncpus=8)
lr : ____do_softirq+0x18/0x28
Sending NMI from CPU 2 to CPUs 4:
sp : ffffffc0809fbf80
rcu: rcu_sched kthread starved for 2498 jiffies! g-1171 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=5
x29: ffffffc0809fbf90
rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
 x28: 000000000000000a
rcu: RCU grace-period kthread stack dump:
 x27: 0000000000000282
task:rcu_sched       state:I

 stack:0     pid:15    tgid:15    ppid:2      flags:0x00000008
x26: ffffffc08090d320
Call trace:
 x25: ffffffc0808687b0
 __switch_to+0xc0/0x13c
 x24: ffffffc0808850c0
 __schedule+0x414/0x4d0

 schedule+0x4c/0x78
x23: ffffff8000828000
 schedule_timeout+0xa0/0x10c
 x22: ffffffc080070714
 rcu_gp_fqs_loop+0x190/0x704
 x21: 00000000ebdb9000
 rcu_gp_kthread+0xbc/0x130

x20: ffffffc080a3b900
 kthread+0xd8/0xf0
 x19: ffffffc0800100b8
 ret_from_fork+0x10/0x20
 x18: ffffffc0809e5010
rcu: Stack dump where RCU GP kthread last ran:

Sending NMI from CPU 2 to CPUs 5:
x17: ffffffc06f64d000 x16: ffffffc0809f8000 x15: 0000000000000174
x14: 0000000002dfc431 x13: 0000000000002a8e x12: 0000000029aaaaab
x11: 0038fb672092de00 x10: ffffffc08086cc80 x9 : 0000000000000100
x8 : ffffffc06f64d000 x7 : 69203a7265707061 x6 : 6d2d656369766564
x5 : ffffffc0809881fc x4 : 02d9f83121f96cea x3 : 000000000000000f
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
x2 : 0000000000000000 x1 : ffffffc080013a90
rcu: 	4-....: (7543 ticks this GP) idle=138c/1/0x4000000000000000 softirq=13/13 fqs=13029
 x0 : 0000000000000000

And just to put it out there, I’m running from a tinyconfig as a baseline and building out for exactly what I need.
It’s currently at the stage where it gets into the initramfs and systemd/dracut times out looking for my SD card for root, which I expected.
I thought I’d see what benefit may come from stepping through boot debug.

So while the original issue began with using serial with the gdb-remote shortcut (I gather I should post a bug with LLDB?), it has now become that SIGTRAP.

Yes, this alias hard-codes connect://:

(lldb) help gdb-remote
Connect to a process via remote GDB server.
If no host is specifed, localhost is assumed.
gdb-remote is an abbreviation for 'process connect --plugin gdb-remote connect://<hostname>:<port>'
Expects 'raw' input (see 'help raw-input'.)

Syntax: gdb-remote [<hostname>:]<portnum>

But we could certainly explain that better. Please open an issue for it on the llvm/llvm-project GitHub issue tracker.

As for the SIGTRAP, that’s what the debug server is telling lldb has happened. Or lldb has told the server to stop the program, and SIGTRAP is the generic stop reason we get back. It’s possible that GDB does not do this, or handles it implicitly in some fashion.

Are you able to continue from that point when using lldb? If so then it may just be an artifact of what lldb does on first connect.

I’m not sure how we could have explained this better. We showed the alias, which explicitly fills in the connection method, and we also gave a syntax which only allows specifying a hostname and port number, with nothing to do with serial connections.

Based on what I know now, I would agree that it doesn’t need a better explanation, but I tried to put myself back in my own shoes from when I first encountered the error.

Since I was going off the usage examples in the docs, I didn’t know it was an alias. I just adapted the example, thinking it’d “just work”.

It wasn’t until I ran help on it that I saw it was an alias and what it was an alias for, which was great.

I think the “better error” would be to explicitly state that the scheme usage is not allowed for the alias and to refer to help for the alias for more information (or perhaps expand the “alias” command).

Otherwise I might have been left scratching my head over why a command (not known to be an alias) would only allow remote targets via IP.

gdb-remote is a regular expression command. The arguments that you passed didn’t match any of the regular expressions defined by the regex command. When that happens, lldb prints the syntax string for the regex command as the error. That seemed clear enough to me: we showed you exactly the command’s usage, and your arguments didn’t satisfy that usage ([<hostname>:]<portnum>).

The regex command doesn’t have any way to know what you were intending or why you submitted arguments that didn’t match any of its regexes. The only way we could give the message you want here would be to add “common failing input regexes with associated error messages” to the regex command, and then try to guess common invalid inputs and add them one by one to the command.

Given that running help on a command is a pretty natural next step when you get a usage error for it, I don’t think this effort is warranted.
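
As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is not LLDB's code: the patterns, the HINTS table, and the expand() helper are all hypothetical, chosen only to mirror the syntax string and the alias expansion shown in the help output. It tries each pattern in order, expands the first match, and otherwise falls back to the syntax string. The optional hints argument stands in for the "common failing input" table discussed above.

```python
import re

# Syntax string, as printed in the help output quoted earlier in the thread.
SYNTAX = "gdb-remote [<hostname>:]<portnum>"

# Illustrative (pattern, template) pairs, tried in order.
# Assumption: these are NOT LLDB's actual regexes, only a stand-in for them.
RULES = [
    (re.compile(r"^([^:/]+):([0-9]+)$"),
     "process connect --plugin gdb-remote connect://{0}:{1}"),
    (re.compile(r"^([0-9]+)$"),
     "process connect --plugin gdb-remote connect://localhost:{0}"),
]

# Hypothetical table of known-bad inputs with tailored messages.
HINTS = [
    (re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://"),
     "URL schemes are not accepted here; "
     "use 'process connect --plugin gdb-remote <url>'"),
]


def expand(args, hints=()):
    """Return the expanded command, or a usage error if nothing matches."""
    args = args.strip()
    for pattern, template in RULES:
        match = pattern.match(args)
        if match:
            return template.format(*match.groups())
    # No rule matched: optionally look for a known-bad input to explain.
    for pattern, message in hints:
        if pattern.search(args):
            return f"error: {SYNTAX}\nhint: {message}"
    # Default behaviour: print only the syntax string.
    return f"error: {SYNTAX}"


if __name__ == "__main__":
    print(expand("127.0.0.1:1234"))
    print(expand("serial:///dev/ttyUSB0"))
    print(expand("serial:///dev/ttyUSB0", hints=HINTS))
```

Without the hints table, a serial:// argument simply fails both patterns and produces the bare usage error. Every extra hint would have to be guessed and maintained by hand, which is the cost being weighed here.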

I think I get what you’re saying now that you mention it’s a regex (rather than just an alias).
I’m going to presume if the regex fails a match, it’ll end right there with that simple error.
Before you mentioned it was a regex, I was presuming that it’d just forward the parameters to the full command, so I thought the error message could have shown the “expanded” form.

So what’s known is the required parameter of portnum, with an option for hostname: in that regex form.
So if I passed serial:///dev<...> it would’ve failed that portnum match right then and there, because the / is not a digit, and not proceeded.

Frankly, I wish I remembered how I even managed to try using the serial scheme for the alias. It likely would’ve been based off of searches. Surely anything posted would’ve shown the full-form usage. Dunno, maybe I’ll find that later…
Hmm, maybe it was Re: [lldb-dev] Serial port support in LLDB (from my history) and I just combined the elements I found in there.
Again, I knew nothing about process connect at that time and was searching for serial and gdb-remote, since gdb-remote is what I’d been seeing in the docs. That ML thread matched both (I see now that it also mentions process connect, but I wasn’t looking for that back then).

In any case, at least this ought to show up in searches for anyone else that may encounter this issue.
I’ll be attempting the other part of it later as I’m trying to work out a different issue (unrelated to lldb) first.