LLDB tests getting stuck on GDBRemoteCommunicationClientTest.GetMemoryRegionInfo?

I just did a clean build (debug) on Linux, and I noticed that the LLDB tests seem to consistently get stuck:

-- Testing: 1002 tests, 12 threads --
99% [==========================================================================================================================================-] ETA: 00:00:01
lldb-Suite :: types/TestIntegerTypes.py

At this point there are a bunch of llvm-lit processes waiting and two suspicious LLDB unit tests:

ProcessGdbRemoteTests --gtest_filter=GDBRemoteCommunicationClientTest.GetMemoryRegionInfo

ProcessGdbRemoteTests --gtest_filter=GDBRemoteCommunicationClientTest.GetMemoryRegionInfoInvalidResponse

I took a quick look and they both seem to be blocked on communicating with the remote:

thread #2, name = 'ProcessGdbRemot', stop reason = signal SIGSTOP

frame #0: 0x00007f2d216e4383 libc.so.6`__GI_select + 51

frame #1: 0x000056464a7afd6c ProcessGdbRemoteTests`SelectHelper::Select(this=0x00007f2d1eb07910) at SelectHelper.cpp:224

frame #2: 0x0000564647c24745 ProcessGdbRemoteTests`lldb_private::ConnectionFileDescriptor::BytesAvailable(this=0x000056464d563800, timeout=0x00007f2d1eb09f40, error_ptr=0x00007f2d1eb07dd0) at ConnectionFileDescriptorPosix.cpp:586

frame #3: 0x0000564647c23e58 ProcessGdbRemoteTests`lldb_private::ConnectionFileDescriptor::Read(this=0x000056464d563800, dst=0x00007f2d1eb07e00, dst_len=8192, timeout=0x00007f2d1eb09f40, status=0x00007f2d1eb07dcc, error_ptr=0x00007f2d1eb07dd0) at ConnectionFileDescriptorPosix.cpp:390

frame #4: 0x0000564647afc2ca ProcessGdbRemoteTests`lldb_private::Communication::ReadFromConnection(this=0x000056464d53e580, dst=0x00007f2d1eb07e00, dst_len=8192, timeout=0x00007f2d1eb09f40, status=0x00007f2d1eb07dcc, error_ptr=0x00007f2d1eb07dd0) at Communication.cpp:286

frame #5: 0x0000564647afbad6 ProcessGdbRemoteTests`lldb_private::Communication::Read(this=0x000056464d53e580, dst=0x00007f2d1eb07e00, dst_len=8192, timeout=0x00007f2d1eb09f40, status=0x00007f2d1eb07dcc, error_ptr=0x00007f2d1eb07dd0) at Communication.cpp:169

frame #6: 0x0000564647c3bf6a ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(this=0x000056464d53e580, packet=0x00007f2d1eb0a0e0, timeout=Timeout<std::ratio<1, 1000000> > @ 0x00007f2d1eb09f40, sync_on_timeout=true) at GDBRemoteCommunication.cpp:351

frame #7: 0x0000564647c3bca5 ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunication::ReadPacket(this=0x000056464d53e580, response=0x00007f2d1eb0a0e0, timeout=Timeout<std::ratio<1, 1000000> > @ 0x00007f2d1eb09f90, sync_on_timeout=true) at GDBRemoteCommunication.cpp:301

frame #8: 0x0000564647c39c72 ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteClientBase::SendPacketAndWaitForResponseNoLock(this=0x000056464d53e580, payload=(Data = "qSupported:xmlRegisters=i386,arm,mips", Length = 37), response=0x00007f2d1eb0a0e0) at GDBRemoteClientBase.cpp:212

frame #9: 0x0000564647c39a23 ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteClientBase::SendPacketAndWaitForResponse(this=0x000056464d53e580, payload=(Data = "qSupported:xmlRegisters=i386,arm,mips", Length = 37), response=0x00007f2d1eb0a0e0, send_async=false) at GDBRemoteClientBase.cpp:176

frame #10: 0x0000564647c44e0a ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetRemoteQSupported(this=0x000056464d53e580) at GDBRemoteCommunicationClient.cpp:370

frame #11: 0x0000564647c4427b ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetQXferMemoryMapReadSupported(this=0x000056464d53e580) at GDBRemoteCommunicationClient.cpp:200

frame #12: 0x0000564647c4c661 ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::LoadQXferMemoryMap(this=0x000056464d53e580) at GDBRemoteCommunicationClient.cpp:1609

frame #13: 0x0000564647c4bb4e ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetQXferMemoryMapRegionInfo(this=0x000056464d53e580, addr=16384, region=0x00007f2d1eb0a6c0) at GDBRemoteCommunicationClient.cpp:1583

frame #14: 0x0000564647c4b95d ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetMemoryRegionInfo(this=0x000056464d53e580, addr=16384, region_info=0x00007ffd8b1a8870) at GDBRemoteCommunicationClient.cpp:1558

frame #15: 0x000056464797ee25 ProcessGdbRemoteTests`operator(__closure=0x000056464d5636a8) at GDBRemoteCommunicationClientTest.cpp:339

frame #16: 0x000056464798a9d6 ProcessGdbRemoteTests`std::__invoke_impl<lldb_private::Status, GDBRemoteCommunicationClientTest_GetMemoryRegionInfoInvalidResponse_Test::TestBody()::<lambda()> >((null)=__invoke_other @ 0x00007f2d1eb0a910, __f=0x000056464d5636a8)> &&) at invoke.h:60

frame #17: 0x000056464798613c ProcessGdbRemoteTests`std::__invoke<GDBRemoteCommunicationClientTest_GetMemoryRegionInfoInvalidResponse_Test::TestBody()::<lambda()> >(__fn=0x000056464d5636a8)> &&) at invoke.h:96

frame #18: 0x00005646479c1750 ProcessGdbRemoteTests`std::thread::_Invoker<std::tuple<GDBRemoteCommunicationClientTest_GetMemoryRegionInfoInvalidResponse_Test::TestBody()::<lambda()> > >::_M_invoke<0>(this=0x000056464d5636a8, (null)=_Index_tuple<0> @ 0x00007f2d1eb0a980) const at thread:234

Any of this ring a bell?

Thanks!

These tests should have two threads communicating with each other. Can you
check what the other thread is doing?

My bet would be that the fact that we are now running dotest tests concurrently
with the unittests is putting more load on the system (particularly in
debug builds), and the communication times out. You can try increasing the
timeout in GDBRemoteTestUtils.cpp:GetPacket to see if that helps.

Thanks Pavel. It doesn’t look like a timeout to me:

  1. First, the other (main) thread is just waiting on the std::future::get() on the final EXPECT_TRUE(result.get().Success())

#0 0x00007fe4bdfbb6cd in pthread_join (threadid=140620333614848, thread_return=0x0) at pthread_join.c:90

...

#14 0x000055b855bdf370 in std::future<lldb_private::Status>::get (this=0x7ffe4498aad0) at /usr/include/c++/7/future:796

#15 0x000055b855b8c502 in GDBRemoteCommunicationClientTest_GetMemoryRegionInfo_Test::TestBody (this=0x55b85bc195d0) at /usr/local/google/home/mosescu/extra/llvm/src/tools/lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp:330

  2. The part that seems interesting to me is this part of the callstack I mentioned:

frame #9: 0x0000564647c39a23 ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteClientBase::SendPacketAndWaitForResponse(this=0x000056464d53e580, payload=(Data = "qSupported:xmlRegisters=i386,arm,mips", Length = 37), response=0x00007f2d1eb0a0e0, send_async=false) at GDBRemoteClientBase.cpp:176

frame #10: 0x0000564647c44e0a ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetRemoteQSupported(this=0x000056464d53e580) at GDBRemoteCommunicationClient.cpp:370

frame #11: 0x0000564647c4427b ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetQXferMemoryMapReadSupported(this=0x000056464d53e580) at GDBRemoteCommunicationClient.cpp:200

frame #12: 0x0000564647c4c661 ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::LoadQXferMemoryMap(this=0x000056464d53e580) at GDBRemoteCommunicationClient.cpp:1609

frame #13: 0x0000564647c4bb4e ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetQXferMemoryMapRegionInfo(this=0x000056464d53e580, addr=16384, region=0x00007f2d1eb0a6c0) at GDBRemoteCommunicationClient.cpp:1583

frame #14: 0x0000564647c4b95d ProcessGdbRemoteTests`lldb_private::process_gdb_remote::GDBRemoteCommunicationClient::GetMemoryRegionInfo(this=0x000056464d53e580, addr=16384, region_info=0x00007ffd8b1a8870) at GDBRemoteCommunicationClient.cpp:1558

frame #15: 0x000056464797ee25 ProcessGdbRemoteTests`operator(__closure=0x000056464d5636a8) at GDBRemoteCommunicationClientTest.cpp:339

It seems that the client is attempting extra communication which is not modeled in the mock HandlePacket(), so it simply hangs in there. If that’s the case I’d expect this issue to be more widespread (unless my source tree is in a broken state).
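To make the suspected failure mode concrete, here is a minimal, self-contained sketch of a scripted mock remote. It is not the LLDB test fixture: `FramePacket`, `MockRemote`, `Script` and `Respond` are hypothetical names, and the "OK" reply used in the usage note is only a placeholder. The sketch relies on two properties of the gdb-remote protocol: a packet is framed as `$payload#checksum`, and an empty reply means "unsupported". A mock that answers unscripted requests with an empty packet would not leave the client waiting:

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Frames a payload as "$<payload>#<checksum>", where the checksum is the sum
// of the payload bytes modulo 256, written as two hex digits.
std::string FramePacket(const std::string &payload) {
  unsigned sum = 0;
  for (unsigned char c : payload)
    sum += c;
  char checksum[3];
  std::snprintf(checksum, sizeof(checksum), "%02x", sum % 256);
  return "$" + payload + "#" + checksum;
}

// Hypothetical mock remote (an assumption for illustration, not LLDB code):
// replies are scripted per request payload.
class MockRemote {
public:
  // Registers the reply payload to send for an exact request payload.
  void Script(std::string request, std::string reply) {
    assert(!request.empty() && "a scripted request needs a payload");
    replies_[std::move(request)] = std::move(reply);
  }

  // Returns the framed reply. A request that was never scripted gets an
  // empty packet, which the protocol defines as "unsupported", so the
  // client gets an answer instead of blocking in a read.
  std::string Respond(const std::string &request) const {
    auto it = replies_.find(request);
    return FramePacket(it == replies_.end() ? std::string() : it->second);
  }

private:
  std::map<std::string, std::string> replies_;
};
```

With this sketch, scripting only `qMemoryRegionInfo:4000` and then receiving the extra `qSupported:xmlRegisters=i386,arm,mips` query yields `$#00` for the latter. A mock that stays silent on unscripted requests leaves the client blocked in its read until the packet timeout expires.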

This is the first time I've looked at this part of the code, so it's possible I missed something obvious.

PS. Just a wild guess: could it be related to rL327970: Re-land: [lldb] Use vFlash commands when writing to target's flash memory… ?

Right, I see what's going on now. Yes, you're right, the commit you mention
has added extra packets which are not handled in the mock. The reason this
is hanging for you is because you are using a debug build, which has a much
larger packet timeout (1000s, I think). In the release build this passes,
because the second packet is optional and the function treats the lack of
response to the second packet as an error/not implemented. If you waited
for 15 minutes, I think you'd see the tests pass as well.
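The effect of the timeout can be sketched without any real I/O. In the self-contained example below, `SilentTransport` and `ProbeOptionalFeature` are hypothetical names, not LLDB APIs. The transport never answers and only records how long the caller was prepared to wait. The 1000 s figure is the debug-build timeout mentioned above; any shorter value stands in for a release build and is an arbitrary assumption:

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>

using std::chrono::seconds;

// Hypothetical stand-in for a connection whose peer never answers. Instead
// of sleeping, it records how long the caller was willing to wait.
struct SilentTransport {
  seconds total_waited{0};

  std::optional<std::string> Read(seconds timeout) {
    total_waited += timeout; // a real read would block for this long...
    return std::nullopt;     // ...and then report a timeout
  }
};

// Hypothetical probe for an optional feature: a timeout or an empty reply
// is treated as "not supported" rather than as a hard failure.
template <typename Transport>
bool ProbeOptionalFeature(Transport &transport, seconds packet_timeout) {
  assert(packet_timeout.count() >= 0 && "timeouts are non-negative");
  std::optional<std::string> reply = transport.Read(packet_timeout);
  return reply.has_value() && !reply->empty();
}
```

Both configurations end with the same answer, "not supported". The difference is only how long the probe blocks first: `seconds(1000)` is a little under 17 minutes, which is why the debug run looks like a hang rather than a slow pass.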

I'll have this fixed soon.

OK, r331374 ought to fix that. The situation was a bit more complicated
than I thought, because the function behaves differently if one builds
lldb with XML support, so I've had to update the test to work correctly in
both situations.

Great, thanks Pavel!