LLDB/NetBSD extended set of tasks

Hello,

The contract for the LLDB port on NetBSD has been extended by The
NetBSD Foundation. The additional time will cover the features that
were postponed in order to address blockers uncovered during the work
done so far.

I've summarized the newly finished task segment in this blog entry:

http://blog.netbsd.org/tnf/entry/ptrace_2_tasks_segment_finished

My current plan is to return to LLDB and finish the following tasks:
  I. Register context and breakpoints support on NetBSD/amd64.
II. NetBSD Threads support
III. NetBSD/i386 (32-bit x86) support.

To finalize the first goal, I use LLVM/Clang/LLDB SVN rev. 296360 as
the base for my work. I develop my local patches in
pkgsrc-wip/lldb-netbsd.

The current Test Suite status reports 267 of 1235 tests passing. This
number is expected to grow once these goals are achieved and LLDB
becomes a functional debugger on NetBSD.

Hello,


My current plan is to return to LLDB and finish the following tasks:
  I. Register context and breakpoints support on NetBSD/amd64.

Halfway point status.

(LLDB developers please check the second part.)

Proper support for breakpoints is all-or-nothing; it's difficult to
draw a firm line between "fully functional" and "implemented with
bugs". Functional software breakpoints are also the next milestone, and
the goal is to get tracing of a simple single-threaded program working
from spawning to termination; correctly and with all the features - as
per the GDB Remote Protocol messages - in line with Linux.

Among the implemented/improved features:
- Tracee's .text section disassembling works,
- Backtracing (unwinding stack) of the tracee works,
- Listing General Purpose and Special Registers works,
- Reading General Purpose Registers works,
- Setting software breakpoint (placing it into tracee's .text segment)
works,
- Triggering software breakpoint (previously set with a debugger) works,
- Rewinding Program Counter and hiding breakpoint from disassembled
.text section code works,
- Scanning tracee's virtual address map works,
- Reading ELF Auxiliary Vector (AUXV) works.

What clearly doesn't work is unsetting a software breakpoint: the
debuggee session crashes afterwards because the trap is not properly
removed (and the child is killed by the kernel with SIGSEGV). Software
breakpoints are all-or-nothing; enabling them keeps uncovering bugs in
code fragments which already appeared to work.

TODO:
- Fixing software breakpoints support,
- Reading/writing Special Registers (Floating Point, ...),
- Hardware watchpoints support in line with the Linux amd64 code -
unless it turns out to be too closely tied to thread support.

As of today the number of passing tests has regressed. This is because
LLDB attempts to set a breakpoint in every process (on the "_start"
symbol) - this blocks tracing, or simulating tracing, of any process.

Not planned for this segment:
- Fixing the single-step trap - it may get fixed automatically along
with the above features, but it's not the goal for now. This will be
part of the threads segment.
- Everything related to threads and x86 32-bit (i386) support.
- Fixing other non-blocker bugs not related to software breakpoints,
FPR, or debug registers.

My plan for April+ (+ means that it might consume some of May...) is as
follows:
- Alter the design of Resume actions to handle whole-process tasks
(alongside per-thread ones),
- Disable emitting the stop reason of all stopped threads in
multithreaded software on NetBSD, as it's not applicable in our thread
model (in contrast to Linux),
- Alter stop-reason retrieval to ask the process (NativeProcess), not
the thread (NativeThread),
- Alter the watchpoints API to call the process (NativeProcess), not
the thread (NativeThread),
- Alter the thread container in the process (NativeProcess) to a
std::set-like container of TIDs (integers/lwpid_t) storing the current
image of threads,
- Support, in the current-thread function, the value "0" (or "-1",
according to the GDB Remote Protocol) to mark that the whole process
was interrupted and there is no primary thread (from a tracer's point
of view).

The general goal is to eliminate NativeThreadNetBSD, because my current
feeling is that keeping this class on par with Linux merely duplicates
work already done by the NetBSD kernel. The current result is that I
need to call process-global events from NativeThreadNetBSD, and I'm
forced by the generic framework to keep this dummy struct around. My
local copy of the NativeRegisterContext class is partly affected as
well, since ptrace(2) calls are made for the process, not for singular
threads out of the process context.

This task needs proper design and collaboration. I think I will start
by adding whatever basic thread support is possible in the existing
framework and, once done, move on to refactoring the generic code.

Of course, any help - prior to April - with landing (Net)BSD threads
support is appreciated!

As of today the number of passing tests has regressed. This is because
LLDB attempts to set a breakpoint in every process (on the "_start"
symbol) - this blocks tracing, or simulating tracing, of any process.

This is necessary so that we can read the list of shared libraries
loaded by the process and set any breakpoints in them. Note that
currently (at least on Linux) we are actually doing it too late - at
this point the constructors in the shared libraries have already
executed, so we cannot set breakpoints or debug the initialization
code. I haven't yet investigated how to fix this.

Not planned for this segment:
- Fixing the single-step trap - it may get fixed automatically along
with the above features, but it's not the goal for now. This will be
part of the threads segment.
- Everything related to threads and x86 32-bit (i386) support.
- Fixing other non-blocker bugs not related to software breakpoints,
FPR, or debug registers.

My plan for April+ (+ means that it might consume some of May...) is as
follows:
- Alter the design of Resume actions to handle whole-process tasks
(alongside per-thread ones),
- Disable emitting the stop reason of all stopped threads in
multithreaded software on NetBSD, as it's not applicable in our thread
model (in contrast to Linux),

- Alter stop-reason retrieval to ask the process (NativeProcess), not
the thread (NativeThread),
- Alter the watchpoints API to call the process (NativeProcess), not
the thread (NativeThread),
- Alter the thread container in the process (NativeProcess) to a
std::set-like container of TIDs (integers/lwpid_t) storing the current
image of threads.

We will need to discuss this in detail. I am not sure removing the
NativeThreadNetBSD class completely is a worthwhile goal, but we can
certainly work towards making its parent class dumber, and remove
operations that don't make sense for all users. If e.g. your
watchpoints are per-process, then we can pipe the watchpoint-setting
code through NativeProcessProtocol, and NativeProcessNetBSD will
implement that directly, while the Linux version will delegate to the
thread. However, even in your process model each thread has a separate
set of registers, so I think it makes sense to keep the register
manipulation code there.

- Support, in the current-thread function, the value "0" (or "-1",
according to the GDB Remote Protocol) to mark that the whole process
was interrupted and there is no primary thread (from a tracer's point
of view).

Teaching all parts of the debugger about whole-process events might be
a big task (the server is not enough; I think you would have to make a
lot of client changes as well). I am wondering whether you wouldn't
make more progress if you just fudged this and always attributed these
events to the primary thread. I think we would be in a better position
to design this properly once most of the debugger functionality was
operational for you. What kind of per-process events are we talking
about here? Is there anything more here than a signal directed at the
whole process? AFAICT, most of the stop reasons (breakpoint,
watchpoint, single step, ...) are still linked to a specific thread
even in your process model. I think you could get to a point where lldb
is very useful even without getting these events "correct".

cheers,
pl

TODO:
- Fixing software breakpoints support,

Fixed!

267 -> 596 tests passing out of 1200+ - please scroll for details.

- Reading/writing Special Registers (Floating Point, ...),
- Hardware watchpoints support in line with the Linux amd64 code -
unless it turns out to be too closely tied to thread support.

As of today the number of passing tests has regressed. This is because
LLDB attempts to set a breakpoint in every process (on the "_start"
symbol) - this blocks tracing, or simulating tracing, of any process.

This is necessary so that we can read the list of shared libraries
loaded by the process and set any breakpoints in them. Note that
currently (at least on Linux) we are actually doing it too late - at
this point the constructors in the shared libraries have already
executed, so we cannot set breakpoints or debug the initialization
code. I haven't yet investigated how to fix this.

I see.

It's an interesting use case; right now I'm not sure how to properly
address it.

Thank you for your insight.

We will need to discuss this in detail. I am not sure removing the
NativeThreadNetBSD class completely is a worthwhile goal, but we can
certainly work towards making its parent class dumber, and remove
operations that don't make sense for all users. If e.g. your
watchpoints are per-process, then we can pipe the watchpoint-setting
code through NativeProcessProtocol, and NativeProcessNetBSD will
implement that directly, while the Linux version will delegate to the
thread. However, even in your process model each thread has a separate
set of registers, so I think it makes sense to keep the register
manipulation code there.

I've listed all the potential threading challenges; each one will need
to be discussed. Refactoring is by definition a cost and, I think,
should be kept to a minimum while getting proper support on the
platform.

Our watchpoints (debug registers) are per-thread (LWP) only.

- Support, in the current-thread function, the value "0" (or "-1",
according to the GDB Remote Protocol) to mark that the whole process
was interrupted and there is no primary thread (from a tracer's point
of view).

Teaching all parts of the debugger about whole-process events might be
a big task (the server is not enough; I think you would have to make a
lot of client changes as well).

I think this might be useful long term. I noted in the GDB Remote
Protocol specification that the protocol can be embedded into
simulators and low-level kernel APIs without regular threads. It's not
urgently needed for standard user-level debugging facilities, but it
will be a useful part of the general set of capabilities in the future.

I am wondering whether you wouldn't make more progress if you just
fudged this and always attributed these events to the primary thread. I
think we would be in a better position to design this properly once
most of the debugger functionality was operational for you.

Agreed.

This is why my initial goal is to get as far as possible without
touching the generic subsystems and to get basic threading support.

What kind of per-process events
are we talking about here?

I'm mostly thinking about ResumeActions - to resume the whole process
while being able to single-step the desired thread(s).

(We also offer PT_SYSCALL feature, but it's not needed right now in LLDB).

Is there anything more here than a signal
directed at the whole process?

single-stepping
resume thread
suspend thread

I'm evaluating a FreeBSD-like PT_SETSTEP/PT_CLEARSTEP API for NetBSD;
it marks a thread for single-stepping. This code is needed to allow us
to combine PT_SYSCALL with PT_STEP, and PT_STEP with emitting a signal.

I was thinking about ResumeActions marking which thread to
resume/suspend/single-step, whether to emit a signal (one per global
PT_CONTINUE[/PT_SYSCALL]), and whether to resume the whole process.

Up to a certain point it might be kludged with a single-thread model
for basic debugging.

I imagined a possible flow of ResumeAction calls like:
[Generic/Native framework knows upfront the image of threads within
debuggee]
- Resume Thread 2 (PT_RESUME)
- Suspend Thread 3 (PT_SUSPEND)
- Set single-step Thread 2 (PT_SETSTEP)
- Set single-step Thread 4 (PT_SETSTEP)
- Clear single-step Thread 5 (PT_CLEARSTEP)
- Resume & emit signal SIGIO (PT_CONTINUE)

In other words: setting properties on threads and pushing the
PT_CONTINUE button at the end.

AFAICT, most of the stop reasons
(breakpoint, watchpoint, single step, ...) are still linked to a
specific thread even in your process model. I think you could get to a
point where lldb is very useful even without getting these events
"correct".

I was thinking, for example, about this change (not following the real
function name or prototype):

  GetStoppedReason(Thread) -> GetStoppedReason(Process,Thread)

The Linux code would easily route it to the desired thread, and
(Net)BSD would return the requested data immediately. The need to have
these functions in NativeThread (enforced by the framework) is the only
reason I keep them there, while on NetBSD the stop reason is global
(per-process).

cheers,
pl

Thank you for your response.

Last but not least, after getting software breakpoints to work, here is
the obligatory Test Summary diff between:

http://netbsd.org/~kamil/lldb/check-lldb-r296360-2017-02-28.txt

and

http://netbsd.org/~kamil/lldb/check-lldb-r296360-2017-03-16.txt
(pkgsrc-wip/lldb-netbsd git rev. 2c9c8e7b56d)


What kind of per-process events
are we talking about here?

I'm mostly thinking about ResumeActions - to resume the whole process
while being able to single-step the desired thread(s).

(We also offer PT_SYSCALL feature, but it's not needed right now in LLDB).

Is there anything more here than a signal
directed at the whole process?

single-stepping
resume thread
suspend thread

I'm evaluating a FreeBSD-like PT_SETSTEP/PT_CLEARSTEP API for NetBSD;
it marks a thread for single-stepping. This code is needed to allow us
to combine PT_SYSCALL with PT_STEP, and PT_STEP with emitting a signal.

I was thinking about ResumeActions marking which thread to
resume/suspend/single-step, whether to emit a signal (one per global
PT_CONTINUE[/PT_SYSCALL]), and whether to resume the whole process.

Up to a certain point it might be kludged with a single-thread model
for basic debugging.

I imagined a possible flow of ResumeAction calls like:
[Generic/Native framework knows upfront the image of threads within
debuggee]
- Resume Thread 2 (PT_RESUME)
- Suspend Thread 3 (PT_SUSPEND)
- Set single-step Thread 2 (PT_SETSTEP)
- Set single-step Thread 4 (PT_SETSTEP)
- Clear single-step Thread 5 (PT_CLEARSTEP)
- Resume & emit signal SIGIO (PT_CONTINUE)

In other words: setting properties on threads and pushing the
PT_CONTINUE button at the end.

I thought about something like this model; that's why all the step
commands take a thread-id, and why there's a "thread continue" separate
from "process continue"... The idea was you would make "thread ..."
commands to set up the state you wanted, and then you would use
"process continue" to signal the resumption of the task as a whole.
These commands would of course have different meanings for no-stop
debugging.

I toyed around with it for a while, but ended up deciding this would
generally be too complicated for most folks to use effectively, so I
didn't wire it up. It would be trivial to implement, however, since all
the stepping operations are "push thread plan" and then wait around for
somebody to continue.

But it looks like all the "whole process" events you are talking about
are not stop reasons but rather start actions. That makes sense, but
what whole-process stop events do you mean?

Jim

A process can be stopped with a signal. A signal can be emitted to:
(1) a particular thread,
(2) the whole process.

A particular thread can be stopped due to:
- [PL_EVENT_SIGNAL] being signaled (a signal emitted to the whole
process or to this particular thread),
- [PL_EVENT_SUSPENDED] being suspended (PT_SUSPEND, _lwp_suspend(2) or
similar),
- [PL_EVENT_NONE] no action; the whole process stopped because a
sibling thread was signaled.

If no particular thread was targeted by a signal, we cannot retrieve
the thread that caused the interruption of the process. This differs
from FreeBSD and Linux, as those systems always report a thread that is
the culprit for the interruption. In this scenario we would use
"current thread = whole process".

The GDB Remote Protocol handles it with the special thread numbers 0
and -1. (I'm not certain what the exact difference is between "all
threads" and "any thread" in the protocol.)

In my local code, I'm populating all threads within the tracee
(NativeThread) with exactly the same stop reason for the "whole
process" case. I can see, on the client side, that it prints out the
same message for each thread within the process, as all of them
captured a stop action.

On Linux it is possible to trigger multiple stop reasons on each thread
separately; on NetBSD the first one wins. LLDB offers an extension to
the GDB Remote Protocol to transfer stop reasons from all threads that
were stopped due to some event. This is not applicable on NetBSD.
Faking it one way or another can be good enough for an initial,
functional port, but in my opinion it leaves technical debt in the
port.

This can be kludged: I can set the current thread (the one that caused
the interruption) to the previously used one, or to the first one in
the list.

I'm evaluating it from the point of view of a tracee with 10,000
threads and an efficient debugging experience. This is why I would
ideally reduce NativeThread to a sorted, hashable container of integers
(lwpid_t), and shut down the stop-reason extension called for each
stopped thread in the debuggee.

But first things first: I need to make it functional with dummy
solutions.

And yes, I actually want to be able to debug 10,000 LWPs within a
debugger.

The main consumer of thread stop reasons is the execution control
(ThreadPlans - which handle stepping and function calling - and
StopInfo::PerformAction, which handles breakpoint/watchpoint hits). The
only bad effect of populating all the threads with whole-process
signals is if any of the plans did anything special with that signal.
Some of the thread plans do care about a few signals, but those are
mostly SIGSEGV and the like (the function-calling plans care about
this). I can't see what it would mean to send a whole-process SIGSEGV,
however; that seems like it is always going to be a thread-specific
thing. Ditto for however you see a breakpoint hit (SIGTRAP?). Those
really have to be thread-specific...

I can't think of anything else this would really affect, so going forward with your "process => all threads" fiction is probably fine for a first pass.

Jim


I imagined a possible flow of ResumeAction calls like:
[Generic/Native framework knows upfront the image of threads within
debuggee]

- Resume Thread 2 (PT_RESUME)
- Suspend Thread 3 (PT_SUSPEND)
- Set single-step Thread 2 (PT_SETSTEP)
- Set single-step Thread 4 (PT_SETSTEP)
- Clear single-step Thread 5 (PT_CLEARSTEP)
- Resume & emit signal SIGIO (PT_CONTINUE)

In other words: setting properties on threads and pushing the
PT_CONTINUE button at the end.

None of this is really NetBSD-specific, except the whole-process signal
at the end (which I am going to ignore for now). I mean, the
implementation of it is different, but there is no reason why someone
would not want to perform the same set of actions on Linux, for
instance. I think most of the work here should be done on the client.
Then, when the user issues the final "continue", the client sends
something like $vCont;s:2;s:4;c:5. Then it's up to the server to figure
out how to execute these actions. On NetBSD it would execute the
operations you mention above, while on Linux it would do something like
ptrace(PTRACE_SINGLESTEP, 2); ptrace(PTRACE_SINGLESTEP, 4);
ptrace(PTRACE_CONT, 5); (the Linux lldb-server already supports this,
actually, although you may have a hard time convincing the client to
send a packet like that).

So I don't believe there will be any sweeping changes necessary to
support this in the future. If I understand it correctly, you are
working on the server now. All you need to do there is to make sure you
translate the set of actions in the packet to the proper sequence of
ptrace calls. You can even write lldb-server-style tests for that.
Then, we can discuss what would be the best user-level interface to
specify complex actions like this, and teach the client to send these
packets.

AFAICT, most of the stop reasons
(breakpoint, watchpoint, single step, …) are still linked to a
specific thread even in your process model. I think you could get to a
point where lldb is very useful even without getting these events
“correct”.

I was thinking, for example, about this change (not following the real
function name or prototype):

  GetStoppedReason(Thread) -> GetStoppedReason(Process,Thread)

The Linux code would easily route it to the desired thread, and
(Net)BSD would return the requested data immediately. The need to have
these functions in NativeThread (enforced by the framework) is the only
reason I keep them there, while on NetBSD the stop reason is global
(per-process).

Ok, I think we can talk about tweaks like that once you have something upstream. Right now it does not seem to me like that should pose a big development obstacle.

In my local code, I'm populating all threads within the tracee
(NativeThread) with exactly the same stop reason for the "whole
process" case. I can see, on the client side, that it prints out the
same message for each thread within the process, as all of them
captured a stop action.

Indeed, that can be a nuisance. Whole-process events are probably the
first thing we should look at after the port is operational. I think
this can be handled independently of the fancy resume actions we talked
about above, which, as Jim pointed out, would be very hard for users to
comprehend anyway.

I'm evaluating it from the point of view of a tracee with 10,000
threads and an efficient debugging experience. This is why I would
ideally reduce NativeThread to a sorted, hashable container of integers
(lwpid_t), and shut down the stop-reason extension called for each
stopped thread in the debuggee.

I wouldn't worry too much about the performance of this part of the
code. If you get to the point where you are debugging a process with
ten thousand threads, I think you'll find that there are other things
causing performance problems.

Thank you for your analysis.

I can check the siginfo(2) of each signal and verify it. ptrace(2)
(PT_GET_SIGINFO) gives me the destination of a signal:
specific-thread/all-threads. The siginfo(2) structure gives the
signal's source/reason. For example, si_code can contain SI_USER for
kill(2), SI_QUEUE for sigqueue(2), etc.

Hypothetically someone could send SIGSEGV manually to the whole process
- e.g. using the kill(1) command - but that's rather an anomaly, and I
don't expect a debugger to have defined behavior for such events. I
would just pass that sort of signal to the child and ignore it in the
MonitorCallback code.

The NetBSD version of raise(3) uses _lwp_kill(2), an LWP-specific
kill(2), to emit a signal to a specified thread within the same
process. Just for the sake of curiosity: the FreeBSD LLDB code breaks
(an assert(3) fires) after calling "raise(SIGTRAP)" from the child.

On 16 March 2017 at 21:43, Kamil Rytarowski <n54@gmx.com

I imagined a possible flow of ResumeAction calls like:
[Generic/Native framework knows upfront the image of threads within
debuggee]
- Resume Thread 2 (PT_RESUME)
- Suspend Thread 3 (PT_SUSPEND)
- Set single-step Thread 2 (PT_SETSTEP)
- Set single-step Thread 4 (PT_SETSTEP)
- Clear single-step Thread 5 (PT_CLEARSTEP)
- Resume & emit signal SIGIO (PT_CONTINUE)

In other words: setting properties on threads and pushing the
PT_CONTINUE button at the end.

None of this is really NetBSD-specific, except the whole-process signal
at the end (which I am going to ignore for now). I mean, the
implementation of it is different, but there is no reason why someone
would not want to perform the same set of actions on Linux, for
instance. I think most of the work here should be done on the client.
Then, when the user issues the final "continue", the client sends
something like $vCont;s:2;s:4;c:5. Then it's up to the server to figure
out how to execute these actions. On NetBSD it would execute the
operations you mention above, while on Linux it would do something like
ptrace(PTRACE_SINGLESTEP, 2); ptrace(PTRACE_SINGLESTEP, 4);
ptrace(PTRACE_CONTINUE, 5); (the Linux lldb-server already supports
this, actually, although you may have a hard time convincing the client
to send a packet like that).
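For reference, a packet like $vCont;s:2;s:4;c:5 decodes into a list of
per-thread actions. A minimal parser sketch (deliberately simplified: it
takes the payload without the $/checksum framing, and ignores signal
forms like C<sig> and the default action without a thread id):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One action from a vCont packet: 's' = step, 'c' = continue,
// applied to the thread with the given id.
struct VContAction {
  char action;
  int tid;
};

// Parse the payload of a simplified vCont packet, e.g. "vCont;s:2;s:4;c:5".
// Real gdb-remote packets also allow signals (C<sig>/S<sig>), hex thread
// ids, and a trailing default action with no ":tid" part.
std::vector<VContAction> ParseVCont(const std::string &packet) {
  std::vector<VContAction> actions;
  std::istringstream in(packet);
  std::string part;
  std::getline(in, part, ';'); // skip the leading "vCont"
  while (std::getline(in, part, ';')) {
    auto colon = part.find(':');
    if (colon == std::string::npos)
      continue; // default action without a thread id; not handled here
    actions.push_back({part[0], std::stoi(part.substr(colon + 1))});
  }
  return actions;
}
```

A server back end would then translate each decoded action into the
platform's ptrace sequence, as discussed above.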

Right. I also don't expect the LLDB client to expose such fine-grained
commands to the user. I don't think it would be appropriate in the
C API either. Something similar to "set scheduler-locking on" from GDB,
with a single-step option, sounds fine for my purposes.

I was thinking about a division between setting the thread plan and
resuming execution. We will come back to it once I am working on threads.

So I don't believe there will be any sweeping changes necessary to
support this in the future. If I understand it correctly, you are
working on the server now. All you need to do there is to make sure you
translate the set of actions in the packet to the proper sequence of
ptrace calls. You can even write lldb-server-style tests for that. Then,
we can discuss what would be the best user-level interface to specify
complex actions like this, and teach the client to send these packets.

I see, makes sense.

AFAICT, most of the stop reasons
(breakpoint, watchpoint, single step, ...) are still linked to a
specific thread even in your process model. I think you could get to a
point where lldb is very useful even without getting these events
"correct".

I was thinking, for example, about this change (it doesn't follow the
real function name or prototype):

  GetStoppedReason(Thread) -> GetStoppedReason(Process,Thread)

The Linux code would easily route it to the desired thread, and
(Net)BSD would return the requested data immediately. The framework's
requirement to have these functions in NativeThread is the only reason
I keep them there, while on NetBSD the stop reason is global
(per-process).
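To illustrate the proposed routing (hypothetical names, not LLDB's real
classes): a process-level GetStoppedReason lets a per-process platform
such as NetBSD answer immediately, while a per-thread platform routes
the query to the thread's own entry:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch of the proposed signature change:
//   GetStoppedReason(Thread) -> GetStoppedReason(Process, Thread)
// The names below are illustrative, not LLDB's real API.
struct StopReason {
  std::string kind; // e.g. "breakpoint", "signal"
};

struct NativeProcessModel {
  bool per_process_stop;               // NetBSD-style: one reason per process
  StopReason process_reason;           // valid when per_process_stop is true
  std::map<int, StopReason> by_thread; // Linux-style: one reason per thread
};

// Process-level entry point: a per-process platform (NetBSD) answers
// immediately; a per-thread platform (Linux) routes to the thread's entry.
StopReason GetStoppedReason(const NativeProcessModel &proc, int tid) {
  if (proc.per_process_stop)
    return proc.process_reason;
  return proc.by_thread.at(tid);
}
```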

Ok, I think we can talk about tweaks like that once you have something
upstream. Right now it does not seem to me like that should pose a big
development obstacle.

It might be similar with hardware assisted watchpoints, but let's
discuss it later.

    In my local code, I'm populating all threads within the tracee
    (NativeThread) with exactly the same stop reason - for the "whole
    process" case. I can see - on the client side - that it prints out the
    same message for each thread within the process as all of them captured
    a stop action.

Indeed, that can be a nuisance. Whole-process events are probably the
first thing we should look at after the port is operational. I think
this can be handled independently of the fancy resume actions we talked
about above, which, as Jim pointed out, would be very hard for users to
comprehend anyway.

Acknowledged.

    I'm evaluating it from the point of view of a tracee with 10,000
    threads and getting an efficient debugging experience. This is why I
    would ideally reduce NativeThread to a sorted, hashable container of
    integers (lwpid_t) and shut down the stop-reason query called for
    each stopped thread in the debuggee.
     
I wouldn't worry too much about the performance of this part of the
code. If you get to the point where you debug a process with ten
thousand threads, I think you'll find that there are other things which
are causing performance problems.

I have focused on massively threaded applications since the beginning
of my work on ptrace(2). While I have explained my motivations, I'm not
setting this as my current goal.

The FreeBSD platform is substantially the same as NetBSD, except for
this nuisance with the whole-process signal. The API divergences are
mostly historical accidents.

I would like to upstream all of my local code by the end of this month,
once I finish the FPR accessors and watchpoints. I need to rebase my
local branch onto the current trunk and polish it to upstream quality.
In this iteration of the Native Process Plugin/Thread/Register code for
NetBSD I will skip the concept of more than one thread and x86 32-bit
support.

As of today, LLDB/NetBSD is close to being usable as a debugger for
real work. Breakpoints, single-stepping, backtracing, etc. work fine.

I keep locally (in pkgsrc-wip[1]) a patchset altering or adding 27
files. Their total length is 3284 lines. I would like to empty it
before moving on to threads.

[1] https://github.com/NetBSD/pkgsrc-wip/tree/master/lldb-netbsd/patches

I imagined a possible flow of ResumeAction calls like:
[Generic/Native framework knows upfront the image of threads within
debuggee]
- Resume Thread 2 (PT_RESUME)
- Suspend Thread 3 (PT_SUSPEND)
- Set single-step Thread 2 (PT_SETSTEP)
- Set single-step Thread 4 (PT_SETSTEP)
- Clear single-step Thread 5 (PT_CLEARSTEP)
- Resume & emit signal SIGIO (PT_CONTINUE)

In other words: setting properties on threads and pushing the
PT_CONTINUE button at the end.

None of this is really NetBSD-specific, except the whole-process signal at the end (which I am going to ignore for now). I mean, the implementation of it is different, but there is no reason why someone would not want to perform the same set of actions on Linux, for instance. I think most of the work here should be done on the client. Then, when the user issues the final "continue", the client sends something like $vCont;s:2;s:4;c:5. Then it's up to the server to figure out how to execute these actions. On NetBSD it would execute the operations you mention above, while on Linux it would do something like ptrace(PTRACE_SINGLESTEP, 2); ptrace(PTRACE_SINGLESTEP, 4); ptrace(PTRACE_CONTINUE, 5); (the Linux lldb-server already supports this, actually, although you may have a hard time convincing the client to send a packet like that).

The big problem with this sequence is non-stop mode. Continuing thread 5 while threads 2 and 4 are stepping and thread 3 is stopped is not legal in all-stop mode. lldb only supports non-stop mode in the gdb-remote communications layer; the guts of the debugger do not support it, and could get very confused when threads 2 and 4 stop but thread 5 is still running.

I think these are actually two different concepts you are conflating here.
The non-stop mode is about "what happens when an event happens on one
thread" -- on all-stop we stop all threads (that happen to be running), in
non-stop we only stop the affected thread. Lldb handles the first, but not
the second. However, this is different from deciding which threads will get
a chance to run in the first place. The sequence of commands we talked
about above is probably not possible, but you can certainly choose to run
only a subset of threads (e.g. with SBThread::Suspend), and this should
work fine in current lldb.
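The all-stop/non-stop distinction can be modeled in a few lines (a
conceptual toy, not LLDB code): an event on one thread halts every
running thread in all-stop mode, but only the affected thread in
non-stop mode. Threads already suspended by the user stay suspended
either way:

```cpp
#include <cassert>
#include <map>

// Toy model of the distinction drawn above: when an event stops one
// thread, all-stop mode halts every running thread, while non-stop mode
// halts only the affected one. This is a conceptual sketch, not LLDB code.
enum class State { Running, Stopped, Suspended };

void OnThreadEvent(std::map<int, State> &threads, int tid, bool all_stop) {
  if (all_stop) {
    // Halt everything that is currently running; user-suspended
    // threads are left alone.
    for (auto &t : threads)
      if (t.second == State::Running)
        t.second = State::Stopped;
  } else {
    // Non-stop: only the thread that hit the event stops.
    threads[tid] = State::Stopped;
  }
}
```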