Enabling tests on the Windows LLVM buildbot

Hi all,

I’m going to be submitting a change shortly to enable “ninja check-lldb” on the upstream Windows lldb buildbot. For now this is an experiment to see how well this will go, but I would eventually like this to become permanent. As with build breakages, the bot needs to stay green, so here’s what I’m thinking:

  1. If your change breaks the Windows buildbot, please check whether it’s something obvious. Did you use a Python str where a bytes object was expected? Did you hardcode /dev/null instead of using a portable approach? Did you call a function like getpid() in a test, even though it doesn’t exist on Windows? Did you hardcode _Z at the beginning of a symbol name instead of using a mangling-aware approach? Clear errors in patches should be fixed quickly, or reverted and resubmitted once fixed.

  2. If you can’t identify why it’s broken and/or need help debugging and testing on Windows, please revert the patch in a timely manner and ask me or Adrian for help.

  3. If the test cannot be written in a way that will work on Windows (e.g. it requires pexpect or uses a debugger feature that is unsupported on Windows, such as watchpoints), then xfail or skip the test.

  4. In some cases the test might be flaky. If your patch appears to have nothing to do with the failure message you’re seeing in the log file, it might be flaky. Let it run again and see if it clears up.
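Two of the pitfalls from point 1 can be shown in a few lines of Python, the language the lldb tests are written in. The first is the str/bytes mismatch. The second is the hardcoded /dev/null. This is a minimal sketch: the helper names are hypothetical and not part of lldb's test infrastructure. It decodes subprocess output to str once, before any comparison, and it uses os.devnull instead of a literal path.

```python
import os
import subprocess
import sys


def run_and_capture(args):
    """Hypothetical helper: run a command and return stdout as str, not bytes.

    Under Python 3, subprocess returns bytes by default, and
    b"hello" == "hello" is False, so comparing raw output against a str
    literal silently fails. Decoding once here keeps comparisons str-to-str.
    """
    raw = subprocess.check_output(args)
    return raw.decode("utf-8")


def null_device_path():
    """Return the platform's null device instead of hardcoding /dev/null.

    os.devnull is "/dev/null" on POSIX systems and "nul" on Windows.
    """
    return os.devnull


def discard_output(args):
    """Hypothetical helper: run a command with stdout sent to the null device."""
    with open(null_device_path(), "w") as sink:
        return subprocess.call(args, stdout=sink)


if __name__ == "__main__":
    # Compare as text; .strip() also removes the "\r\n" that Windows emits.
    output = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(output.strip() == "hello")
```

The same pattern applies anywhere a test inspects process output. Pick one representation (usually str) at the boundary and convert exactly once, so the test behaves identically on every platform.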

Again, this is just an experiment, so I may turn this off if it doesn’t end up being a net positive.

Hi,

I am glad to see more automated testing of lldb. I think it's very
valuable, as a lot of people don't have access to that platform.

Sounds reasonable. I'd like to add a clarifying point (2.5): If you
have added a new test, and this test fails on some other platform AND
there is no reason to believe that this is due to a problem in the
test (like the python3 bytes thingy, etc.), then simply xfailing the
test for the relevant architecture is fine. The typical situation
I'm thinking of here is person A fixing a bug in code specific to
platform X and adding a platform-agnostic test, which exposes a
similar bug in platform Y. If all the existing tests pass, then the new
patch is definitely not making the situation worse, while taking the
patch out would leave platform X broken (and we do want to encourage
people to write tests for bugs they fix). In this case, I think a more
appropriate course of action than reverting would be notifying the
platform maintainer (email, filing a bug, ...) and providing background
on what the test is attempting to do and any other insight you might
have into why it could be broken.

What do you think?
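Points 2.5 and 3 both come down to marking a test per platform. lldb's test suite has its own platform-aware decorators for this. The sketch below uses only the standard library's unittest instead, so it runs on its own. `expected_failure_on` is a hypothetical helper, not an lldb API, and the bug URL is a placeholder.

```python
import sys
import unittest


def expected_failure_on(platform_prefix, bug_url):
    """Hypothetical helper: xfail a test only where sys.platform starts
    with platform_prefix (e.g. "freebsd", "win32").

    bug_url records the tracking bug next to the test; unittest itself
    does not use it.
    """
    def decorator(test_func):
        test_func.bug_url = bug_url
        if sys.platform.startswith(platform_prefix):
            return unittest.expectedFailure(test_func)
        return test_func
    return decorator


class PlatformMarkerExamples(unittest.TestCase):
    # Point 3: the test cannot work on Windows at all (pexpect is the
    # example given), so skip it there rather than xfailing it.
    @unittest.skipIf(sys.platform == "win32",
                     "requires pexpect, not supported on Windows")
    def test_needs_pexpect(self):
        self.assertTrue(True)  # placeholder for the real test body

    # Point 2.5: a new platform-agnostic test exposes a bug elsewhere;
    # xfail it on that platform only and point at the bug report.
    @expected_failure_on("freebsd", "<bug URL goes here>")
    def test_new_regression(self):
        self.assertEqual(2 + 2, 4)  # placeholder for the real test body
```

On Linux, a unittest runner reports both tests as passing. On Windows, the first test is skipped. On FreeBSD, the second test would be reported as an unexpected success because its placeholder body passes. That is the signal that an xfail marker has gone stale and can be removed.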

I'm curious whether you have done any measurements of the ratio of
flaky builds on your platform. I am currently inching towards
doing the same thing for the Linux buildbot as well (*). I've gotten
it down to about 2-3 flaky builds per week, which I consider an
acceptable state, given the circumstances, but I'm going to continue
tracking down the remaining issues as well. I'm asking because I
think we should have some common standard of what is considered
acceptable buildbot behaviour. In any case, I'm interested to see how
the experiment turns out.

(*) My current plan is to do this at the end of June, when I get back
from holiday, so I can keep a close eye on it.
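The ratio of flaky builds asked about above could be estimated with a count along these lines. This is a minimal sketch under an assumed record format: a list of `(revision, passed)` pairs in the order the bot ran them. The buildbot does not provide this list as-is. The heuristic only catches flakes confirmed by a rerun of the same revision, which is the "let it run again and see if it clears up" case from point 4.

```python
def count_flaky(builds):
    """Count failed builds that look flaky.

    builds is a hypothetical list of (revision, passed) pairs in build
    order. A failed build counts as flaky when a later build of the same
    revision passed, i.e. it cleared up with no code change in between.
    """
    flaky = 0
    for i, (revision, passed) in enumerate(builds):
        if passed:
            continue
        if any(rev == revision and ok for rev, ok in builds[i + 1:]):
            flaky += 1
    return flaky


def flaky_ratio(builds):
    """Fraction of all builds that were flaky failures (0.0 if none ran)."""
    if not builds:
        return 0.0
    return count_flaky(builds) / len(builds)
```

Take the history `[(100, True), (101, False), (101, True), (102, False)]`. Only the failure of revision 101 counts as flaky, giving a ratio of 0.25. The failure of 102 is not counted, because it was never rerun.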

cheers,
pl

That sounds reasonable to me.

I hope to re-enable tests on the FreeBSD buildbot shortly as well. I
have a "temporary" build-only buildbot I put into service when the
previous ones needed to be decommissioned.

Since FreeBSD is currently the only platform still using the old-style
POSIX in-process debug support, it's quite likely we will run into a
failure when a test is added. I'd prefer to have the test marked XFAIL
on FreeBSD with a bug report (or at least a post to the mailing list)
rather than have it backed out pending investigation.

A bit of a tangent, but for reference, on FreeBSD 10 I currently see
the following undesired test results:

ERROR: test_with_run_command_dwarf
(functionalities/data-formatter/data-formatter-stl/libstdcpp/string/TestDataFormatterStdString.py)
ERROR: test_with_run_command_dwarf
(functionalities/data-formatter/data-formatter-stl/libstdcpp/list/TestDataFormatterStdList.py)
ERROR: test_with_run_command_dwarf
(functionalities/data-formatter/data-formatter-stl/libstdcpp/iterator/TestDataFormatterStdIterator.py)
ERROR: [EXCEPTIONAL EXIT 10 (SIGBUS)] test_python_os_plugin_dwarf
(functionalities/plugins/python_os_plugin/TestPythonOSPlugin.py)
UNEXPECTED SUCCESS: test_and_run_command_dwarf
(lang/c/register_variables/TestRegisterVariables.py)
UNEXPECTED SUCCESS: test_and_run_command_dwarf
(lang/c/const_variables/TestConstVariables.py)
TIMEOUT: test_asm_int_3
(functionalities/breakpoint/debugbreak/TestDebugBreak.py)
TIMEOUT: test_with_dsym_and_python_api_dwarf
(lang/go/expressions/TestExpressions.py)