Failing LIT-based lldb-mi tests

[+lldb-dev, since this affects all bots equally]

Alexander, it looks like there is a race condition in some of the LIT-based lldb-mi tests.

For example, exec-next.test:

  # Test lldb-mi -exec-next command.

  # Check that we have a valid target created via '%lldbmi %t'.
  # CHECK: ^done

  -break-insert main
  # CHECK: ^done,bkpt={number="1"

  -exec-run
  # CHECK: ^running
  # CHECK: *stopped,reason="breakpoint-hit"

  -exec-next --thread 0
  # Check that exec-next can process the case of invalid thread ID.
  # CHECK: ^error,msg="Command 'exec-next'. Thread ID invalid"

  ...

Here we are not actually waiting for the breakpoint to be hit. Synchronous mode ensures that the lldb-mi driver waits for each command to be completed, but that only means that -exec-run only returns after the process has been launched and does not include waiting for the program to actually hit a breakpoint. So if the program runs very slowly (such as often happens on a very busy bot), the -exec-next and subsequent commands are scheduled before the breakpoint is hit. In order to fix this we either need to move any tests that schedule commands after hitting breakpoints back to Python, or we need to add a new -wait-for-breakpoint MI command for testing only to force a synchronization point.

Could you please take look?

-- adrian

In order to test my hypothesis, you could set the breakpoint in a different function than main and insert a sleep() into the program to simulate the condition on the bots, i.e.:

  void foo() {
    // actual test code below
    ...
  }

  int main(int argc, char ** argv) {
    sleep(10);
    foo();
  }

and in the test:

  # instead of main:
  -break-insert foo
  # and then :slight_smile:
  -some-command-that-would-fail-if-executed-before-breakpoint-is-hit

-- adrian

Do you agree that we need to have a timeout in the -wait-for-breakpoint command? It could have default value and can be set with a command argument.

Before we continue to discuss -wait-for-breakpoint; where you actually able to verify my suspicion that that is what is happening on the bots? Fred suggested to me offline today that in synchronous mode, perhaps -exec-* should be waiting for the process to be stopped, which would also sound like a reasonable and less invasive solution to the problem.

-- adrian

[adding lldb-dev back to the conversation]

AFAIK, there is no mechanism in lldb-mi to distinguish a command that expects a frame, so we need to modify each command manually. Am I right?
If so, I found the Process::WaitForProcessToStop method which we can add to SB API and use in lldb-mi.

сб, 11 авг. 2018 г. в 0:50, Adrian Prantl <aprantl@apple.com>:

I was wondering how this worked in the regular SBAPI that we use for all the "normal" python-based LLDB tests. The implementation of SBProcess::Continue() for example calls Process::Resume() or Process::ResumeSynchronous() depending on whether synchronous mode is set or not.
It's not immediately obvious to me whether -exec-run should wait until the process stopped before returning or whether -exec-step should wait until the process stopped before executing.

Based on a cursory reading of the sources it seems like SBTarget::Launch should block until the process stopped when it is in synchronous mode. Can you confirm this? If that is the case, can you figure out why -exec-run does not inherit this behavior?

-- adrian

One important question: what do you mean talking about “block”? Does it mean that SBTarget::Launch blocks the process and the user can’t continue working with this process until it stops?

сб, 11 авг. 2018 г. в 2:41, Adrian Prantl <aprantl@apple.com>:

Pretty much. The same way as the interactive (lldb) command line interface behaves: You enter “run” and you won’t get a prompt again until the process stops. I’m imagining (but haven’t verified) that synchronous mode behaves like that.

– adrian

It seems that lldb-mi in such a situation just hangs:

build/bin/lldb-mi --synchronous bash
(gdb)
-file-exec-and-symbols “bash”
^done
(gdb)
=library-loaded,id="/bin/bash",target-name="/bin/bash",host-name="/bin/bash",symbols-loaded=“0”,loaded_addr="-",size=“0”
-exec-run
^C^C
exit

It doesn’t react to Ctrl+C and exit command.

сб, 11 авг. 2018 г. в 2:54, Adrian Prantl <aprantl@apple.com>:

That’t interesting. Where you able to simulate the error seen on the bots by inserting a sleep() unto the debugged program?

– adrian

I reproduced the test you suggested and got following output:

build/bin/lldb-mi --synchronous a.out < llvm/tools/lldb/lit/tools/lldb-mi/exec/lldb-mi-fail.test
(gdb)
-file-exec-and-symbols “a.out”
^done
(gdb)
^done
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x000000000000000f”,func="??",file="??",fullname="??/??",line=“0”,times=“0”,original-location=“f”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x000000000000000f”,func="??",file="??",fullname="??/??",line=“0”,times=“0”,original-location=“f”}
(gdb)
^done
(gdb)
^running
=thread-group-started,id=“i1”,pid=“5075”
(gdb)
=thread-created,id=“1”,group-id=“i1”
=thread-selected,id=“1”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/ld-2.23.so",target-name="/lib/x86_64-linux-gnu/ld-2.23.so",host-name="/lib/x86_64-linux-gnu/ld-2.23.so",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="[vdso]",target-name="[vdso]",host-name="[vdso]",symbols-loaded=“1”,symbols-path="",loaded_addr=“0x00007ffff7ffa000”,size=“0”
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
(gdb)
=thread-exited,id=“1”,group-id=“i1”
=thread-group-exited,id=“i1”,exit-code=“0”
*stopped,reason=“exited-normally”
(gdb)
^done
(gdb)
^done
(gdb)
^error,msg=“Resume timed out.”
(gdb)
^done
(gdb)

As a command that needs a breakpoint to be hit I chose the -exec-next --thread 1. It’s the same error which is seen on the bots.

сб, 11 авг. 2018 г. в 6:27, Adrian Prantl <aprantl@apple.com>:

Can you help me understand the order in which events happened in that log? I don't see the -exec-next command in the log. If you type in the commands manually, does it become obvious which ones are blocking and which ones aren't (but should be)?

-- adrian

Sure, this is the log with the typed commands:

build/bin/lldb-mi a.out
(gdb)
-file-exec-and-symbols “a.out”
^done
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
-exec-run
^running
=thread-group-started,id=“i1”,pid=“4410”
(gdb)
=thread-created,id=“1”,group-id=“i1”
=thread-selected,id=“1”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/ld-2.23.so",target-name="/lib/x86_64-linux-gnu/ld-2.23.so",host-name="/lib/x86_64-linux-gnu/ld-2.23.so",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="[vdso]",target-name="[vdso]",host-name="[vdso]",symbols-loaded=“1”,symbols-path="",loaded_addr=“0x00007ffff7ffa000”,size=“0”
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
(gdb)
*running,thread-id=“all”
(gdb)
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
-break-insert main
^done,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x000000000040052f”,func=“main”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“7”,times=“0”,original-location=“main”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x000000000040052f”,func=“main”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“7”,times=“0”,original-location=“main”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x000000000040052f”,func=“main”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“7”,times=“0”,original-location=“main”}
(gdb)
-exec-next
^error,msg=“Resume timed out.”
(gdb)
(gdb)
=thread-exited,id=“1”,group-id=“i1”
=thread-group-exited,id=“i1”,exit-code=“0”
*stopped,reason=“exited-normally”
(gdb)

No one of the commands isn’t a blocking one, but I think that -exec-run should be blocking.

Oops, I made a mistake, I typed break-insert main. Here is the right output:

build/bin/lldb-mi a.out
(gdb)
-file-exec-and-symbols “a.out”
^done
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
-exec-run
^running
=thread-group-started,id=“i1”,pid=“6406”
(gdb)
=thread-created,id=“1”,group-id=“i1”
=thread-selected,id=“1”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/ld-2.23.so",target-name="/lib/x86_64-linux-gnu/ld-2.23.so",host-name="/lib/x86_64-linux-gnu/ld-2.23.so",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="[vdso]",target-name="[vdso]",host-name="[vdso]",symbols-loaded=“1”,symbols-path="",loaded_addr=“0x00007ffff7ffa000”,size=“0”
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
(gdb)
*running,thread-id=“all”
(gdb)
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
-break-insert func
^done,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x0000000000400514”,func=“func”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”,times=“0”,original-location=“func”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x0000000000400514”,func=“func”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”,times=“0”,original-location=“func”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x0000000000400514”,func=“func”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”,times=“0”,original-location=“func”}
(gdb)
-exec-next
^error,msg=“Resume timed out.”
(gdb)
(gdb)
*stopped,reason=“breakpoint-hit”,disp=“del”,bkptno=“1”,frame={level=“0”,addr=“0x0000000000400514”,func=“func”,args=[],file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”},thread-id=“1”,stopped-threads=“all”
(gdb)
(gdb)
*running,thread-id=“all”
(gdb)
(gdb)
*stopped,reason=“breakpoint-hit”,disp=“del”,bkptno=“1”,frame={level=“0”,addr=“0x0000000000400514”,func=“func”,args=[],file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”},thread-id=“1”,stopped-threads=“all”
(gdb)

Oops, I made a mistake, I typed break-insert main. Here is the right output:

build/bin/lldb-mi a.out
(gdb)
-file-exec-and-symbols “a.out”
^done
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
-exec-run
^running
=thread-group-started,id=“i1”,pid=“6406”
(gdb)
=thread-created,id=“1”,group-id=“i1”
=thread-selected,id=“1”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/ld-2.23.so",target-name="/lib/x86_64-linux-gnu/ld-2.23.so",host-name="/lib/x86_64-linux-gnu/ld-2.23.so",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="[vdso]",target-name="[vdso]",host-name="[vdso]",symbols-loaded=“1”,symbols-path="",loaded_addr=“0x00007ffff7ffa000”,size=“0”
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
(gdb)
*running,thread-id=“all”
(gdb)
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
(gdb)
=library-loaded,id="/lib/x86_64-linux-gnu/libc.so.6",target-name="/lib/x86_64-linux-gnu/libc.so.6",host-name="/lib/x86_64-linux-gnu/libc.so.6",symbols-loaded=“1”,symbols-path="/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so",loaded_addr="-",size=“0”
-break-insert func
^done,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x0000000000400514”,func=“func”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”,times=“0”,original-location=“func”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x0000000000400514”,func=“func”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”,times=“0”,original-location=“func”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x0000000000400514”,func=“func”,file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”,times=“0”,original-location=“func”}
(gdb)
-exec-next
^error,msg=“Resume timed out.”
(gdb)
(gdb)
*stopped,reason=“breakpoint-hit”,disp=“del”,bkptno=“1”,frame={level=“0”,addr=“0x0000000000400514”,func=“func”,args=[],file=“test.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/test.c",line=“2”},thread-id=“1”,stopped-threads=“all”

Do you get control back before the breakpoint is hit? If yes, should -exec-run block until the process stops (in synchronous mode)?

Yes, I do, I’m able to pass any command to lldb-mi before the breakpoint is hit. I guess that making exec-run to block the control until the process stops might be the solution we are looking for.

Right. I believe that it should (in synchronous mode) behave like “run” in the normal lldb command interpreter and block until the process has stopped. You could look at differences in the implementation of -exec-run and “run” in the command interpreter first.

– adrian

It seems that the real problem is that sometimes lldb-mi can’t launch a process:

build/bin/lldb-mi --synchronous a.out < llvm/tools/lldb/lit/tools/lldb-mi/exec/exec-step-instruction.test
(gdb)
-file-exec-and-symbols “a.out”
^done
(gdb)
^done
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x00000000004004df”,func=“main”,file=“main.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/main.c",line=“6”,times=“0”,original-location=“main”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x00000000004004df”,func=“main”,file=“main.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/main.c",line=“6”,times=“0”,original-location=“main”}
(gdb)
^done
(gdb)
^error,msg=“process launch failed: ‘A’ packet returned an error: -1”
(gdb)
^done
(gdb)
^done
(gdb)
^error,msg=“Command ‘exec-step-instruction’. Thread ID invalid”
(gdb)
^done
(gdb)
^done
(gdb)
^error,msg=“Command ‘exec-next-instruction’. Thread ID invalid”
(gdb)
^done
(gdb)
^done
(gdb)
^error,msg=“this SBThread object is invalid”
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)

exec-run executes fine in synchronous mode(the only bad thing here is that lldb-mi doesn’t react to any external events like SIGSTOP - ctrl+C, while lldb does). All previous logs were made in asynchronous mode and are wrong.

It seems that the real problem is that sometimes lldb-mi can’t launch a process:

build/bin/lldb-mi --synchronous a.out < llvm/tools/lldb/lit/tools/lldb-mi/exec/exec-step-instruction.test
(gdb)
-file-exec-and-symbols “a.out”
^done
(gdb)
^done
(gdb)
=library-loaded,id="/home/alexander/workspace/gsoc/a.out",target-name="/home/alexander/workspace/gsoc/a.out",host-name="/home/alexander/workspace/gsoc/a.out",symbols-loaded=“0”,loaded_addr="-",size=“0”
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done
(gdb)
^done,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x00000000004004df”,func=“main”,file=“main.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/main.c",line=“6”,times=“0”,original-location=“main”}
(gdb)
=breakpoint-modified,bkpt={number=“1”,type=“breakpoint”,disp=“keep”,enabled=“y”,addr=“0x00000000004004df”,func=“main”,file=“main.c”,fullname="/home/alexander/workspace/gsoc/llvm/tools/lldb/lit/tools/lldb-mi/exec/inputs/main.c",line=“6”,times=“0”,original-location=“main”}
(gdb)
^done
(gdb)
^error,msg=“process launch failed: ‘A’ packet returned an error: -1”

Do you think you might be able to improve the error message here and get at the underlying failure?

– adrian

After long-long debugging I found out that lldb-mi can’t successfully launch a process since sometimes it isn’t connected with something(I don’t know what is it).
I found out that it fails after if (IsConnected()) from GDBRemoteCommunication::SendPacketNoLock(llvm::StringRef payload).

Do you have any thoughts about this?