> B.
> I'm still verifying single stepping of LWPs in processes with multiple
> threads. I have an impression that something is fragile there.

Ok. Let me know when you have a reproducible problem...

The problem looks similar to the one with PT_RESUME and PT_SUSPEND
(per-LWP operations). With multiple LWPs, after creating a thread and
then raising a signal for the tracer, the process cannot be
single-stepped: one thread apparently never starts, or dies (?), and
_lwp_wait() (for a reasonable lwpid_t value: 2) returns EDEADLK.
_lwp_makecontext()
_lwp_create()
raise(signal)
_lwp_wait()
This is not restricted to PT_SETSTEP; the same happens with PT_STEP.
I will go down this rabbit hole and debug it until the bug is squashed.
It will take a while, but understanding what is going on is
beneficial in itself (besides the benefit of simply fixing it).
Another report has been filed for PT_RESUME... there is pressure from
the community:
"Several ptrace_wait test cases fail under DEBUG+LOCKDEBUG"
http://gnats.netbsd.org/52213
> C.
> LLDB tests trigger dmesg errors (default GENERIC kernel), there are
> entries like:
> fill_vmentry: vp 0xfffffe87288967e8 error 2
> fill_vmentry: vp 0xfffffe86e1a15930 error 2
> fill_vmentry: vp 0xfffffe87047f8bd8 error 2
> fill_vmentry: vp 0xfffffe87051af7e0 error 2
> fill_vmentry: vp 0xfffffe86ef0b63f0 error 2

This is DIAGNOSTIC and it is tangentially related to your favorite
friend (F_GETPATH).

Let me explain what's wrong here. Getting from a file descriptor
to a vnode is always a success (if the file descriptor refers to one)
(vp is the pointer to a vnode here).
Getting from a vnode to a path is not (here you get error 2, ENOENT, from
vnode_to_path()):
1. The file is removed so there is no path (what I suspect is happening here).
2. There is more than one path and it is not deterministic which one you get
(usually this does not matter, but it does when you don't have permission to
get to the one returned but you do have permission to get to the other).
3. vnode_to_path() uses the reverse-namei cache to do its deed. This can
lose in 2 different ways:
- cache eviction: not really an issue unless there is memory pressure
(still need to handle it, but infrequent).
- path component length... The dreaded NCHNAMLEN (31) constant, which
is the component name length limit for the current namei cache
implementation (we should really fix that one day).

This is why I keep saying forget adding F_GETPATH unless you can make it
work reliably first.
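
The first two failure cases can be observed from userland with plain
POSIX calls. The following is only a minimal sketch, not the kernel code
path: it does not call vnode_to_path() or F_GETPATH, and the function
names, the struct and the /tmp file names are made up for illustration.
It shows that a descriptor stays usable after its only name is removed,
and that a hard link gives one file two equally valid names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Result of probing a descriptor whose only name has been removed. */
struct unlinked_probe {
	int fstat_ok;	/* 1 if fstat() on the descriptor still succeeds */
	int stat_errno;	/* errno from stat() on the old path, 0 on success */
	long nlink;	/* link count reported for the open descriptor */
};

/* Case 1: the file is removed while the descriptor stays open. */
int
probe_unlinked(struct unlinked_probe *out)
{
	char path[] = "/tmp/getpath-demo.XXXXXX";	/* hypothetical name */
	struct stat st;
	int fd;

	assert(out != NULL);
	if ((fd = mkstemp(path)) == -1)
		return -1;
	if (unlink(path) == -1) {
		close(fd);
		return -1;
	}

	/* Descriptor to file object: still works. */
	out->fstat_ok = (fstat(fd, &st) == 0);
	out->nlink = out->fstat_ok ? (long)st.st_nlink : -1;

	/* File object back to a name: there is no name left. */
	out->stat_errno = (stat(path, &st) == -1) ? errno : 0;

	close(fd);
	return 0;
}

/* Case 2: one file, two names; neither is "the" path. */
int
probe_hard_link(long *nlink)
{
	char path[] = "/tmp/getpath-demo.XXXXXX";	/* hypothetical name */
	char alias[sizeof(path) + 4];			/* room for ".alt" */
	struct stat st;
	int fd, rv = -1;

	assert(nlink != NULL);
	if ((fd = mkstemp(path)) == -1)
		return -1;
	snprintf(alias, sizeof(alias), "%s.alt", path);

	if (link(path, alias) == 0) {
		if (fstat(fd, &st) == 0) {
			*nlink = (long)st.st_nlink;
			rv = 0;
		}
		unlink(alias);
	}
	unlink(path);
	close(fd);
	return rv;
}
```

In the first probe fstat() succeeds while stat() on the old name fails
with ENOENT, the same error number as in the fill_vmentry messages. In
the second probe the link count is 2, so any interface that returns a
single path has to pick one of the names.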

Thank you for the analysis. These reports aren't fatal to the stability
of the system. Once I have sorted out the noise from the tests, I will
take a closer look at this.