Responsibilities of a buildbot owner

I had a chat with Jonas earlier today and one of the things that came out was that we actually have three separate suites of tests in lldb:
  - shell
  - unit
  - api

The category that causes the most pain in general, including on the Windows lldb bot, is the API tests. The shell tests are very stable and so are all (but one) of the unit tests.

Since, as Pavel pointed out, there's not a very active community for lldb on Windows, one thing we could do is run only the shell and unit test suites on the Windows buildbot and drop the API tests. This would allow us to prevent complete bit rot by providing relatively good coverage while at the same time removing the most unstable tests from the buildbot. Then we could dispense with having to disable individual API tests when they show instability on Windows.

I drafted a patch that would do that (with the assumption that everyone would be on board):
https://reviews.llvm.org/D117267

Let me know if you disagree with this course of action or have any other concerns.

Thanks,
-Stella

Hi Stella,

This is in reference to my email on lldb-dev about setting up an LLDB Windows on Arm64 buildbot. We are currently working on setting up an Arm64 bot that will run only the unit tests and shell tests. However, in the future we are going to take on LLDB on Windows Arm64 maintenance and hope to run a full-featured test suite on our buildbots. Meanwhile, as Python API support is a very important LLDB feature, not running the API tests will result in a growing pile of Windows-specific failures, which will increase the engineering effort required to stabilise LLDB on Windows. I have suggested reducing the number of parallel API tests on Windows to see if it reduces the amount of noise generated by flaky tests.

https://reviews.llvm.org/D117363

In case it doesn't work, I'll take up ownership of the Windows x64 buildbot as well and try to keep the noise down, similar to what I do for the Linux Arm/Arm64 LLDB bots.

Thanks!

Omair Javaid
www.linaro.org

Thanks Omair!

I’ll wait for your change to go in and we can evaluate what else might need to happen afterwards.

I’ve been running some local tests with LLDB_USE_LLDB_SERVER set to 1 and that appears to have made them more stable locally. I think we should consider defaulting to using lldb-server on Windows instead of the other way around. @Greg Clayton do you happen to know why it defaults to not using lldb-server?
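For anyone who wants to repeat the experiment, here is a minimal Python sketch of launching a test command with the variable set. Only `LLDB_USE_LLDB_SERVER` comes from this thread; the helper names are hypothetical, and the command to run is whatever you normally use to start the tests.

```python
import os
import subprocess


def env_with_lldb_server(base_env=None):
    """Return a copy of the environment with LLDB_USE_LLDB_SERVER set to "1".

    base_env is a hypothetical hook for passing a custom environment;
    by default the current process environment is copied.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LLDB_USE_LLDB_SERVER"] = "1"
    return env


def run_with_lldb_server(command):
    """Run command (a list of arguments) with the variable set.

    Returns the exit code of the child process.
    """
    return subprocess.run(command, env=env_with_lldb_server()).returncode
```

Copying the environment rather than replacing it keeps everything else the test runner relies on, such as `PATH`, intact.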

Thanks,

-Stella

I do not, but the golden path that we really want people to follow is to use lldb-server to debug things. This allows remote debugging to work well in all cases instead of being just some avenue that no one tests.

Benefits of using lldb-server:
- Mac and Linux have been using it since the beginning, and ProcessGDBRemote is the best-supported process plug-in, as it has seen many different GDB remote clients and served multiple architectures really well
- We can get a packet log for tests to see what actually went wrong. When using ProcessWindows, unless we have logging on every API call and event that is generated, we have no hope of figuring any issues out. Anyone can enable a log with “log enable -f /tmp/packets.txt gdb-remote packets” and send that to someone to help figure out issues
- Dynamic register information is transferred and allows the logs to be even more useful since we know all of the registers from the register context detection packets
- Makes remote debugging possible and it works really well.
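As a small illustration of the packet-log point above, this Python sketch builds the `log enable` command for a chosen output file. The function name is hypothetical, and the sketch assumes the path contains no spaces or characters that would need quoting.

```python
def packet_log_command(log_path):
    """Build the LLDB command that writes gdb-remote packets to log_path.

    Assumption: log_path needs no quoting (no spaces or special characters).
    """
    return f"log enable -f {log_path} gdb-remote packets"


if __name__ == "__main__":
    # Prints the same command quoted in the list above.
    print(packet_log_command("/tmp/packets.txt"))
```

From a script that uses the `lldb` Python module, the resulting string could be passed to `SBDebugger.HandleCommand`; otherwise it can be typed at the LLDB prompt as-is.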

So I would strongly suggest switching over to lldb-server permanently if possible, and I would like to see the ProcessWindows class go away in the future. The main reason is that we will be able to see what is going on by checking the lldb-server logs when we have a flaky test. I would be happy to help figure out issues on Windows if I can see the packet logs for a flaky test, with one log from a run that passes and one from a run that fails. I am quite good at looking at these logs and figuring out what is going wrong. With ProcessWindows and absolutely no logging, we have no hope of figuring out any buildbot issue unless we can reliably reproduce it. Also, we have a TON of testing on lldb-server debugging, since 99% of all LLDB users use it (either lldb-server, or debugserver for Darwin (macOS, iOS, tvOS, watchOS)).
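A first pass at comparing a passing log with a failing one could be automated along these lines. This is only a sketch: the function names and file labels are hypothetical, and real packet logs may contain run-specific details (timestamps, process IDs) that would need normalizing before a line-by-line diff is useful. That step is left out here.

```python
import difflib


def read_lines(path):
    """Read a log file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read().splitlines()


def diff_packet_logs(passing_lines, failing_lines):
    """Return unified-diff lines between a passing and a failing log.

    An empty list means the two logs are identical line for line.
    """
    return list(
        difflib.unified_diff(
            passing_lines,
            failing_lines,
            fromfile="passing.log",  # hypothetical labels for the diff header
            tofile="failing.log",
            lineterm="",
        )
    )
```

Calling `diff_packet_logs(read_lines(a), read_lines(b))` on two saved logs gives a starting point for spotting where the runs diverge; it does not replace reading the logs.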

So a big vote to enable this and, if all goes well, remove the ProcessWindows class and always use lldb-server from here on out.