Thread resumes with stale signal after executing InferiorCallMmap

Hi,

I am using the LLDB 3.7.0 C++ API. My program stops at a certain breakpoint, and if I call SBFrame::EvaluateExpression() there, then when I let it go it terminates with SIGILL on an innocent thread. I dug into this, and there seem to be two independent problems; this mail is about the second one.

  1. EvaluateExpression() calls Process::CanJIT(), which in turn executes mmap() on the inferior. This mmap gets SIGILL because execution starts at an address that is 2 bytes before the very first mmap instruction. I am still investigating why the LLDB server decided to do that - I am pretty sure that the client asked to set the program counter to the correct value.
  2. So, the thread execution terminates and the signal is recorded in Thread::m_resume_signal. This field is not cleared during Thread::RestoreThreadStateFromCheckpoint() and fires when I resume the program after the breakpoint.

So, what would be the best way to deal with the situation? Should I add a “resume signal” field to ThreadStateCheckpoint? Or would StopInfo be a better place for that? Or something else?
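
To make the first option concrete, here is a minimal, self-contained sketch of a checkpoint that also captures the pending resume signal. ToyThread, ToyCheckpoint, and SignalAfterInjectedCall are hypothetical stand-ins invented for this illustration; they are not LLDB's Thread or ThreadStateCheckpoint, and the real classes hold far more state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not LLDB's real classes.
constexpr int kNoSignal = 0;

struct ToyCheckpoint {
    std::uint64_t pc = 0;           // saved program counter
    int resume_signal = kNoSignal;  // the proposed extra field
    bool valid = false;
};

struct ToyThread {
    std::uint64_t pc = 0;
    int resume_signal = kNoSignal;  // signal to deliver on the next resume

    ToyCheckpoint Checkpoint() const {
        ToyCheckpoint cp;
        cp.pc = pc;
        cp.resume_signal = resume_signal;
        cp.valid = true;
        return cp;
    }

    void Restore(const ToyCheckpoint &cp) {
        assert(cp.valid && "restoring from an empty checkpoint");
        pc = cp.pc;
        // Without this line, a signal raised by the injected call would
        // survive the restore and be delivered on the next continue.
        resume_signal = cp.resume_signal;
    }
};

// Simulates: stop at a breakpoint, run an injected call that dies with
// `crash_signal`, then restore. Returns the signal that would be delivered
// when the user continues.
inline int SignalAfterInjectedCall(int crash_signal) {
    ToyThread thread;
    thread.pc = 0x1000;
    const ToyCheckpoint cp = thread.Checkpoint();

    thread.pc = 0x2000;                   // jump into the injected call
    thread.resume_signal = crash_signal;  // the call crashed

    thread.Restore(cp);
    return thread.resume_signal;
}
```

In this toy model, calling SignalAfterInjectedCall with any crash signal returns kNoSignal, because the checkpoint was taken before the injected call ran. Whether the field belongs in the checkpoint or in StopInfo is exactly the open question above.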

Thanks,
Eugene

Does it only happen for InferiorCallMmap, or does an expression evaluation that crashes in general set a bad signal on resume? I don't see this behavior in either case on OS X, so it may be something in the Linux support. It would be interesting to figure out why it behaves this way on Linux, so that whatever we do, we implement it consistently.

Jim

Even on Linux, the call to InferiorCallMmap does not fail consistently. In many cases it survives. I just happen to have a 100% repro on this specific breakpoint in my specific problem. I.e. the burden of investigation is on me, since I cannot share my program.

But I am not looking at this SIGILL yet. Whatever the problem is with mmap - the client must not carry this signal past expression evaluation. I.e. I believe that we could construct an arbitrary function that raises a signal, call it from expression evaluation, and then the subsequent continue would fail. I suspect that this problem might be applicable to any POSIX platform.

As it turned out, my initial analysis was incorrect. m_resume_signal is calculated from StopInfo::m_value (now I wonder why we need two fields for that). And after the mmap call, m_stop_info on the thread is null. So, my current theory is that there is an event with SIGILL that is stuck in the broadcaster and is picked up and processed much later.

But I am not looking at this SIGILL yet. Whatever the problem is with mmap - the client must not carry this signal past expression evaluation. I.e. I believe that we could construct an arbitrary function that raises a signal, call it from expression evaluation, and then the subsequent continue would fail. I suspect that this problem might be applicable to any POSIX platform.

It doesn't happen on OS X, though when it comes to signal handling in the debugger OS X is an odd fish...

As it turned out, my initial analysis was incorrect. m_resume_signal is calculated from StopInfo::m_value (now I wonder why we need two fields for that).

The signal that you stop with is not necessarily the one you are going to resume with. For instance, if you use "process handle SIG_SOMESIG -p 0" to tell lldb not to propagate the signal, then the resume signal will be nothing, even though the stop signal is SIG_SOMESIG.
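
A minimal sketch of that distinction, assuming a per-signal "pass" flag like the one "process handle -p" controls. SignalPolicy and ComputeResumeSignal are hypothetical names made up for this example, and the default of passing unconfigured signals is an assumption of the sketch, not a statement about lldb's defaults.

```cpp
#include <cassert>
#include <map>

// Hypothetical model; LLDB's real bookkeeping lives elsewhere and differs.
struct SignalPolicy {
    std::map<int, bool> pass;  // signal number -> propagate to the inferior?

    // Assumption for this sketch: signals with no explicit setting are passed.
    bool ShouldPass(int signo) const {
        auto it = pass.find(signo);
        return it == pass.end() ? true : it->second;
    }
};

// Returns the signal to deliver on resume; 0 means "resume with no signal".
inline int ComputeResumeSignal(int stop_signal, const SignalPolicy &policy) {
    assert(stop_signal >= 0 && "signal numbers are non-negative");
    if (stop_signal == 0) {
        return 0;  // the stop was not caused by a signal
    }
    return policy.ShouldPass(stop_signal) ? stop_signal : 0;
}
```

The stop signal is an input that never changes, while the resume signal is derived from it and the current policy, which is one reason to keep the two apart.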

And after the mmap call, m_stop_info on the thread is null. So, my current theory is that there is an event with SIGILL that is stuck in the broadcaster and is picked up and processed much later.

When the expression evaluation completes, the StopInfo from the last "natural" stop should be put back in place in the thread. After all, if you hit a breakpoint, run an expression, then ask why that thread stopped, you want to see "hit a breakpoint" not "ran a function call". Sounds like that is failing somehow.
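
The expected behavior can be sketched with a small scope guard that saves the natural stop reason before the call and puts it back afterwards, however the call ends. ToyStopInfo, ToyStoppedThread, ScopedStopInfoRestorer, and StopReasonAfterCall are hypothetical names for this illustration only; this is not how lldb's Thread and StopInfo are actually implemented.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical types for illustration; they are not LLDB's StopInfo or Thread.
struct ToyStopInfo {
    std::string description;
};

struct ToyStoppedThread {
    std::shared_ptr<ToyStopInfo> stop_info;  // null means "no stop reason"
};

// Saves the thread's stop info on construction and restores it on destruction.
class ScopedStopInfoRestorer {
public:
    explicit ScopedStopInfoRestorer(ToyStoppedThread &thread)
        : thread_(thread), saved_(thread.stop_info) {}
    ~ScopedStopInfoRestorer() { thread_.stop_info = saved_; }
    ScopedStopInfoRestorer(const ScopedStopInfoRestorer &) = delete;
    ScopedStopInfoRestorer &operator=(const ScopedStopInfoRestorer &) = delete;

private:
    ToyStoppedThread &thread_;
    std::shared_ptr<ToyStopInfo> saved_;
};

// Simulates an injected function call that ends with `call_outcome`
// (for example "signal SIGILL") and returns the stop reason the user
// would see afterwards.
inline std::string StopReasonAfterCall(ToyStoppedThread &thread,
                                       const std::string &call_outcome) {
    assert(thread.stop_info && "expected a natural stop before the call");
    {
        ScopedStopInfoRestorer restore(thread);
        // While the call runs, the stop info reflects the call itself.
        thread.stop_info =
            std::make_shared<ToyStopInfo>(ToyStopInfo{call_outcome});
    }  // guard goes out of scope: the natural stop info is back in place
    return thread.stop_info->description;
}
```

A thread left with a null stop info after the call, as reported above, would correspond to the restore step in this sketch being skipped or overwritten.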

Jim

Hi,

I believe the SIGILL problem you are referring to is the problem
described in bug <https://llvm.org/bugs/show_bug.cgi?id=23659>. This
was fixed in r244875, but unfortunately this was after the 3.7 branch,
so this patch did not make it there. I recommend trying the master
branch; I think this should work for you now (and do let me know if
the problem persists).

pl

Yes, that’s exactly what I see, thanks a lot!

Does it explain why I see SIGILL reappear when I let the process continue after the mmap execution? I.e. do I need to look into this more?

Thanks,
Eugene

Yes, that's exactly what I see, thanks a lot!

You're welcome.

Does it explain why I see SIGILL reappear when I let the process continue
after the mmap execution? I.e. do I need to look into this more?

No. That could still be a bug somewhere. Feel free to look into it.

pl

There was a bug in how the thread plans were made for the InferiorCall* functions in InferiorCallPOSIX.cpp that would cause the thread plan for running the function to get discarded too early. That was fixed in r250084; you might also try that out and see if it fixes your issue.

Jim

I tried to repro it using the standard LLDB client on a simple program that I could share. Well, the problem does not repro exactly as in my case, i.e. the signal is not re-delivered to the thread. But LLDB does get into a confused state: the program continues, but at the same time lldb shows its prompt as if the program were stopped. I.e. the recovery from a signal during expression evaluation is buggy. Sorry, I am not sure that I will have time to look into it deeper; it is not a blocker for me at this time.

Sure, fair enough.

If you have made a small repro case already though, it would be great
if you could file a bug about it, so we don't forget it.

pl

Will do!