[RFC] Testsuite in lldb & possible future directions

Hi,
in the last couple of months a lot of people have put a lot of attention
and energy into lldb, and we're starting to see the first results. I
decided to sit down and write this e-mail to state where we are and
what some possible future directions for the project are, in terms of
better quality/higher testability.

Current state:

1) We got the testsuite on macOS to run with zero unexpected
successes and zero failures (modulo one change I'm going to push
tomorrow). This has been a collaborative effort, and it's very important
because it allows us to treat unexpected successes as failures on
the bots, letting us discover issues more quickly. Other platforms are,
I think, improving their state as well, mainly thanks to the work of
Pavel and Jan.

2) Adrian is pushing some changes that will allow us to build tests
out of tree. This is particularly good because test artifacts no longer
end up in the source tree, and it indirectly makes the testsuite more
reliable, as you can just wipe the build directory when you're done and
retry. It also opens the path for new improvements, e.g. sharing a
module cache across all tests instead of creating a module cache per test.

3) Vedant spent a lot of time fixing module issues & ubsan/asan
failures, which, again, improved reliability.

Future:
1) I'm currently exploring ways of testing more akin to what llvm
does. We already have a bunch of unit tests, and I'm considering
leveraging `lldb-test` to experiment with testing the parts that
don't really need interactivity (e.g. expression
parsing/autocomplete). I'm not sure whether this will lead to
anything, but if we can make the process easier for developers coming
from other bits of LLVM, I think we should (or at least we should give
it a try).

2) Now that we've raised our quality standards (the bots have been
green for a while), I'd love to take some time to focus on the
standards for future commits.
I think we should adhere to the LLVM policy here:
https://llvm.org/docs/DeveloperPolicy.html#quality
So, commits that break tests, break the build, or don't include a test
might be reverted. In addition, commits with design decisions that
weren't previously discussed might be reverted if there are concerns
about the general direction.

3) In the short term I plan to remove support for unmaintained
languages (Java/Go/Swift). This allows us to bring them back again (or
bring in new languages) later, with a better plan for testability &
maintainability.

I'm pretty sure other people have ideas on what they're working
on/want to push/do next, so I'll let them speak.

Conclusions:
The reliability of the suite (and the health of the codebase) is very
important to us. If you have issues, let us know.
In general, I'm looking for any kind of feedback, feel free to speak!

Thanks,

1) We got the testsuite on macOS to run with zero unexpected
successes and zero failures (modulo one change I'm going to push
tomorrow). This has been a collaborative effort, and it's very important
because it allows us to treat unexpected successes as failures on
the bots, letting us discover issues more quickly. Other platforms are,
I think, improving their state as well, mainly thanks to the work of
Pavel and Jan.

I don't mean to belittle this statement, as I think the situation has
definitely improved a lot lately, but I feel I have to point out
that I've never been able to get a clean test suite run on a mac
(not even the "0 failures" kind of clean). I'm not sure what these
failures are caused by, but I guess it's because the tests are still
very much dependent on the environment. So, I have to ask: what kind
of environment are you running those tests in?

My machine is not a completely off-the-shelf mac, as it has some
google-specific customizations. I don't really know what this
encompasses, but I would be surprised if these impacted the results of
the test suite. If I had to bet, my money would be on your machines
having some apple-specific stuff which is not available on regular
macs.

I tried an experiment today. I configured with: cmake ../../llvm
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -GNinja. The first
problem I ran into was that I couldn't run check-lldb, as the clang I
had just built was unable to compile any of the test binaries due to
missing headers (this could be a manifestation of the SDKROOT issue we
ran into a couple of weeks ago). So, I tried running with the system
compiler and I got this output:

FAIL: test_c_global_variables_dwarf
(lang/c/global_variables/TestGlobalVariables.py)
FAIL: test_c_global_variables_gmodules
(lang/c/global_variables/TestGlobalVariables.py)
FAIL: test_dsym (functionalities/ubsan/basic/TestUbsanBasic.py)
FAIL: test_dwarf (functionalities/ubsan/basic/TestUbsanBasic.py)
FAIL: test_gmodules (functionalities/ubsan/basic/TestUbsanBasic.py)
FAIL: test_with_python_api_dsym (lang/cpp/class_static/TestStaticVariables.py)
FAIL: test_with_python_api_dwarf (lang/cpp/class_static/TestStaticVariables.py)
FAIL: test_with_python_api_gmodules
(lang/cpp/class_static/TestStaticVariables.py)
ERROR: test_debug_info_for_apple_types_dsym
(macosx/debug-info/apple_types/TestAppleTypesIsProduced.py)
ERROR: test_debug_info_for_apple_types_dwarf
(macosx/debug-info/apple_types/TestAppleTypesIsProduced.py)
ERROR: test_debug_info_for_apple_types_gmodules
(macosx/debug-info/apple_types/TestAppleTypesIsProduced.py)
UNEXPECTED SUCCESS: test_lldbmi_output_grammar
(tools/lldb-mi/syntax/TestMiSyntax.py)
UNEXPECTED SUCCESS: test_process_interrupt_dsym
(functionalities/thread/state/TestThreadStates.py)
UNEXPECTED SUCCESS: test_process_interrupt_gmodules
(functionalities/thread/state/TestThreadStates.py)

So, I guess my question is: are you guys looking into making sure that
others are also able to reproduce the 0-fail+0-xpass state? I would
love to run the mac test suite locally, as I tend to touch a lot of
stuff that impacts all targets, but as it stands now, I have very
little confidence that the tests I am running reflect in any way the
results you will get when you run them on your end.

I am ready to supply any test logs or information you need if you want
to try to tackle this.

3) In the short term I plan to remove support for unmaintained
languages (Java/Go/Swift). This allows us to bring them back again (or

I hope you meant OCaml instead of Swift. :stuck_out_tongue:

So, I guess my question is: are you guys looking into making sure that
others are also able to reproduce the 0-fail+0-xpass state? I would
love to run the mac test suite locally, as I tend to touch a lot of
stuff that impacts all targets, but as it stands now, I have very
little confidence that the tests I am running reflect in any way the
results you will get when you run them on your end.

I am ready to supply any test logs or information you need if you want
to try to tackle this.

Yes, I'm definitely interested in making the testsuite
work reliably on any configuration.
I was afraid there were a lot of latent issues; that's why I sent this
mail in the first place.
It's also the reason why I started thinking about `lldb-test` as a
driver for testing, because I found the testsuite to be a little
inconsistent/brittle depending on the environment it's run on (which,
FWIW, doesn't happen when you run lit/FileCheck or even the unit tests
in lldb). I'm not currently claiming switching to a different method
would improve the situation, but it's worth a shot.

3) In the short term I plan to remove support for unmaintained
languages (Java/Go/Swift). This allows us to bring them back again (or

I hope you meant OCaml instead of Swift. :stuck_out_tongue:

Oh, yes, sigh.

FWIW, I strongly believe we should all agree on a configuration for
running the tests and standardize on that.
It's unfortunate that we have two build systems, but there are plans
to move away from the manually maintained Xcode build, as many agree
it's a terrible maintenance burden.
So I'll share my configuration (I'm on High Sierra):

git clone https://github.com/monorepo
symlink clang -> tools
symlink lldb -> tools
symlink libcxx -> projects (this particular one has caused lots of
trouble for me in the past, and I realized it's undocumented :()

cmake -GNinja -DCMAKE_BUILD_TYPE=Release ../llvm
ninja check-lldb
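
Spelled out as concrete commands, that is roughly the following (a
sketch only: the clone URL above is a placeholder, and the exact
directory names below are assumptions based on the usual llvm layout):

$ git clone <monorepo URL> monorepo     # placeholder URL, as above
$ ln -s $PWD/monorepo/clang monorepo/llvm/tools/clang
$ ln -s $PWD/monorepo/lldb monorepo/llvm/tools/lldb
$ ln -s $PWD/monorepo/libcxx monorepo/llvm/projects/libcxx
$ mkdir build && cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ../monorepo/llvm
$ ninja check-lldb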

This *should* work just fine for every developer (and we should error
out if any of the dependencies are not in place). If it doesn't, well,
it's a bug.
Can you please try with this and report all the bugs that you find?
I'll work with you to fix them, as I'm particularly interested in
getting the lldb experience flawless out of the box for users (at least
on the platforms I work on) :slight_smile:

Despite Zachary's claims, I do not believe this is caused by the test
driver (dotest). It's definitely not beautiful, but I haven't seen an
issue that would be caused by this in a long time. The issue is that
the tests are doing too much -- even the simplest involves compiling a
fully working executable, which pulls in a lot of stuff from the
environment (runtime libraries, dynamic linker, ...) that we have no
control of. And of course it makes it impossible to test the debugging
functionality of any other platform than what you currently have in
front of you.

In this sense, the current setup makes an excellent integration test
suite -- if you run the tests and they pass, you can be fairly
confident that the debugging on your system is setup correctly.
However, it makes a very bad regression test suite, as the tests will
be checking something different on each machine.

So I believe we need more lightweight tests, and lldb-test can provide
us with that. The main question for me (and that's something I don't
really have an answer to) is how to make writing tests like that easy.
E.g. for these "foreign" language plugins, the only way to make a
self-contained regression test would be to check-in some dwarf which
mimics what the compiler in question would produce. But doing that is
extremely tedious as we don't have any tooling for that. Since debug
info is very central to what we do, having something like that would
go a long way towards improving the testing situation, and it would be
useful for C/C++ as well, as we generally need to make sure that we
work with a wide range of compiler versions, not just accept what ToT
clang happens to produce.
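
As a rough illustration of the current gap (a sketch, not a
recommendation; obj2yaml/yaml2obj do exist in llvm, but their DWARF
support is, as of now, minimal):

$ clang -g -c main.c -o main.o   # debug info depends on whichever compiler is installed
$ obj2yaml main.o > main.yaml    # works, but debug sections largely come out as raw byte blobs
$ yaml2obj main.yaml > main.o    # reproducible on any machine, yet the YAML is hardly human-editable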

PS: I saw your second email as well. I'm going to try out what you
propose, most likely tomorrow.

Despite Zachary's claims, I do not believe this is caused by the test
driver (dotest). It's definitely not beautiful, but I haven't seen an
issue that would be caused by this in a long time. The issue is that
the tests are doing too much -- even the simplest involves compiling a
fully working executable, which pulls in a lot of stuff from the
environment (runtime libraries, dynamic linker, ...) that we have no
control of. And of course it makes it impossible to test the debugging
functionality of any other platform than what you currently have in
front of you.

In this sense, the current setup makes an excellent integration test
suite -- if you run the tests and they pass, you can be fairly
confident that the debugging on your system is setup correctly.
However, it makes a very bad regression test suite, as the tests will
be checking something different on each machine.

Yes, I didn't complain about "dotest" in general but, as you say, about
the fact that it pulls in lots of stuff we don't really have control over.
Also, most of the time what I actually found is that we've been sloppy
about watching the bots for a while, or have XFAILed tests instead of
fixing them, and that resulted in issues piling up. This is a more general
problem not necessarily tied to `dotest` as a driver.

So I believe we need more lightweight tests, and lldb-test can provide
us with that. The main question for me (and that's something I don't

+1.

really have an answer to) is how to make writing tests like that easy.
E.g. for these "foreign" language plugins, the only way to make a
self-contained regression test would be to check-in some dwarf which
mimics what the compiler in question would produce. But doing that is
extremely tedious as we don't have any tooling for that. Since debug
info is very central to what we do, having something like that would
go a long way towards improving the testing situation, and it would be
useful for C/C++ as well, as we generally need to make sure that we
work with a wide range of compiler versions, not just accept what ToT
clang happens to produce.

I think the plan here (and I'd love to spend some time on this once we
have stability, which it seems we're slowly getting) is that of enhancing
`yaml2*` to do the work for us.
I do agree it is a major undertaking, but even spending a month on it will
go a long way IMHO. I will try to come up with a plan after discussing
with folks in my team (I'd also really love to get input from the DWARF
people in llvm, e.g. Eric or David Blaikie).

PS: I saw your second email as well. I'm going to try out what you
propose, most likely tomorrow.

Thanks!

Despite Zachary’s claims, I do not believe this is caused by the test
driver (dotest). It’s definitely not beautiful, but I haven’t seen an
issue that would be caused by this in a long time. The issue is that
the tests are doing too much – even the simplest involves compiling a
fully working executable, which pulls in a lot of stuff from the
environment (runtime libraries, dynamic linker, …) that we have no
control of. And of course it makes it impossible to test the debugging
functionality of any other platform than what you currently have in
front of you.

I’m not claiming that it’s definitely caused by dotest and that moving away from dotest is going to fix all the problems. Rather, I’m claiming that dotest has an unknown amount of flakiness (which may be 0, but may be large), and the alternative has a known amount of flakiness (which is very close to, if not equal to 0). So we should do it because, among other benefits, it replaces an unknown with a known that is at least as good, if not better.

In this sense, the current setup makes an excellent integration test
suite – if you run the tests and they pass, you can be fairly
confident that the debugging on your system is setup correctly.
However, it makes a very bad regression test suite, as the tests will
be checking something different on each machine.

So I believe we need more lightweight tests, and lldb-test can provide
us with that. The main question for me (and that’s something I don’t
really have an answer to) is how to make writing tests like that easy.
E.g. for these “foreign” language plugins, the only way to make a
self-contained regression test would be to check-in some dwarf which
mimics what the compiler in question would produce. But doing that is
extremely tedious as we don’t have any tooling for that.

Most of these other language plugins are being removed anyway. Which language plugins are going to still remain that aren’t some flavor of c/c++?

I think that the path forward is to massively expand test coverage in all areas. We need roughly 20x-30x the number of tests that we currently have. 25,000-30,000 tests that run equally well on all platforms is a good target to shoot for.

The goal of tests is, obviously, to increase test coverage by increasing the amount of code that is tested. Another way to increase test coverage is to reduce the amount of code that isn’t tested. If you can delete an untested branch then even if you don’t add a test, you’ve increased test coverage. To that end, we should be looking to assert more liberally and end the dubious practice of defensive programming.

On the subject of lldb-test: I think the existing test suite serves its purpose as an integration test suite well, and I would even say that it has a reasonable amount of test coverage for what you could expect of an integration test suite. But what we need is a regression test suite. I don’t think we should spend a significant amount of time adding new tests to the integration test suite; its coverage is already decent. I think almost all new tests going forward should be very lightweight, targeted regression tests that do not depend on the host platform at all. lldb-test is the perfect vehicle for this kind of test. It’s still early days and it doesn’t do much, so we will need to continually add more functionality to lldb-test as well to realize this goal, but I think we can rapidly expand test coverage going this route.

Finally, I think we should get buildbots running sanitized builds of LLDB under the test suite. For LLDB specifically I think TSan and UBSan would add the most value, but long term I think we should get all sanitizers enabled.
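
For reference, llvm’s build system already supports this; a minimal sketch of what such a bot could run (flags as in any llvm cmake configuration):

$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_USE_SANITIZER=Undefined ../llvm
$ ninja check-lldb
(-DLLVM_USE_SANITIZER=Thread or Address for the TSan/ASan variants)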

I'm not claiming that it's definitely caused by dotest and that moving away
from dotest is going to fix all the problems. Rather, I'm claiming that
dotest has an unknown amount of flakiness (which may be 0, but may be
large), and the alternative has a known amount of flakiness (which is very
close to, if not equal to, 0).

Well, it may be unknown to you, but as someone who has managed a bot
running tests for a long time, I can tell you that it's pretty
close to 0. Some tests still fail sometimes, but the failure rate is
approximately at the same level as failures caused by the bot not
being able to reach the svn server to fetch the sources.

That said, I'm still in favor of replacing the test runner with lit. I
just think it needs to be done with a steady hand.

So I believe we need more lightweight tests, and lldb-test can provide
us with that. The main question for me (and that's something I don't
really have an answer to) is how to make writing tests like that easy.
E.g. for these "foreign" language plugins, the only way to make a
self-contained regression test would be to check-in some dwarf which
mimics what the compiler in question would produce. But doing that is
extremely tedious as we don't have any tooling for that.

Most of these other language plugins are being removed anyway. Which
language plugins are going to still remain that aren't some flavor of c/c++?

Well, right now we have another thread proposing the addition of a
Rust plugin, and we will want to resurrect Java support sooner or
later. Go/Ocaml folks may want to do the same, if doing that will not
involve them inventing a whole test framework.

So, I'm not sure where you were heading with that question..

I’m not claiming that it’s definitely caused by dotest and that moving away
from dotest is going to fix all the problems. Rather, I’m claiming that
dotest has an unknown amount of flakiness (which may be 0, but may be
large), and the alternative has a known amount of flakiness (which is very
close to, if not equal to, 0).

Well, it may be unknown to you, but as someone who has managed a bot
running tests for a long time, I can tell you that it's pretty
close to 0. Some tests still fail sometimes, but the failure rate is
approximately at the same level as failures caused by the bot not
being able to reach the svn server to fetch the sources.

As someone who gave up on trying to set up a bot due to flakiness, I have a different experience.

That said, I’m still in favor of replacing the test runner with lit. I
just think it needs to be done with a steady hand.

So I believe we need more lightweight tests, and lldb-test can provide
us with that. The main question for me (and that’s something I don’t
really have an answer to) is how to make writing tests like that easy.
E.g. for these “foreign” language plugins, the only way to make a
self-contained regression test would be to check-in some dwarf which
mimics what the compiler in question would produce. But doing that is
extremely tedious as we don’t have any tooling for that.

Most of these other language plugins are being removed anyway. Which
language plugins are going to still remain that aren’t some flavor of c/c++?

Well, right now we have another thread proposing the addition of a
Rust plugin, and we will want to resurrect Java support sooner or
later. Go/Ocaml folks may want to do the same, if doing that will not
involve them inventing a whole test framework.

So, I’m not sure where you were heading with that question…

Rust is based on llvm, so we have the tools necessary for that. The rest are still maybes and somedays, so we can cross that bridge when (if) we come to it.

As someone who gave up on trying to set up a bot due to flakiness, I have a
different experience.

I did not say it was easy to get to the present point, and I am
certain that the situation is much harder on windows. But I believe
this is due to reasons not related to the test runner (such as various
posixisms spread out over the codebase, and the fact that windows uses a
completely different (i.e. less tested) code path for debugging).

FWIW, we also have a windows bot running remote tests targeting
android. It's not as stable as the one hosted on linux, but most of
the issues I've seen there also do not point towards dotest.

Rust is based on llvm, so we have the tools necessary for that. The rest are
still maybes and somedays, so we can cross that bridge when (if) we come to it.

I don't know enough about Rust to say whether that is true. If it uses
llvm as a backend, then I guess we could check in some rust-generated
IR to serve as a test case (but we would still need to figure out what
exactly to do with it).

However, I would assert that even for C family languages a more
low-level approach than "$CC -g" for generating debug info would be
useful. People generally will not have their compiler and debugger
versions in sync, so we need tests that check we handle debug info
produced by older versions of clang (or gcc for that matter). And
then, there are the tests to make sure we handle "almost valid" debug
info gracefully...

This last category is really interesting (and, unfortunately, given our current testing strategy, almost entirely untested).
I think the proper thing here is to have tooling that generates broken debug info, the same way yaml2obj can generate broken object files, and to test with that.
lldb does a great deal of work, with a lot of heuristics, trying to “recover” when the debug info is wrong but not too far off. In order to have better control of this codepath, we need better testing for this case, otherwise it will break (and we’ll be forced to remove the codepath entirely).
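
A minimal sketch of what such a workflow could look like (the file names here are hypothetical; the idea is to start from a known-good object and corrupt one field at a time):

$ obj2yaml good.o > broken.yaml
(hand-edit a form, a size, or an offset in broken.yaml)
$ yaml2obj broken.yaml > broken.o
$ lldb broken.o -b -o "image dump symtab"   # lldb should degrade gracefully rather than crash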

Right, so I tried following these instructions as precisely as I could.

- The first thing that failed was the libc++ link step (missing -lcxxabi_shared).

So, I added libcxxabi to the build, and tried again.
Aaand, I have to say the situation is much better now: I got two
unexpected successes and one timeout:
UNEXPECTED SUCCESS: test_lldbmi_output_grammar
(tools/lldb-mi/syntax/TestMiSyntax.py)
UNEXPECTED SUCCESS: test_process_interrupt_dsym
(functionalities/thread/state/TestThreadStates.py)
TIMEOUT: test_breakpoint_doesnt_match_file_with_different_case_dwarf
(functionalities/breakpoint/breakpoint_case_sensitivity/TestBreakpointCaseSensitivity.py)

On the second run I got these results:
FAIL: test_launch_in_terminal (functionalities/tty/TestTerminal.py)
UNEXPECTED SUCCESS: test_lldbmi_output_grammar
(tools/lldb-mi/syntax/TestMiSyntax.py)
UNEXPECTED SUCCESS: test_process_interrupt_dwarf
(functionalities/thread/state/TestThreadStates.py)

So, checking out libc++ certainly helped a lot (this definitely needs
to be documented somewhere). Of these, the MI test seems to be failing
consistently. The rest appear to be flakes. I am attaching the logs
from the second run, but there doesn't appear to be anything
interesting there...

Failure-LaunchInTerminalTestCase-test_launch_in_terminal.log (3.12 KB)

UnexpectedSuccess-MiSyntaxTestCase-test_lldbmi_output_grammar.log (2.55 KB)

UnexpectedSuccess-ThreadStateTestCase-test_process_interrupt_dwarf.log (4.47 KB)

Terrific that we're making progress! I plan to take a look at the
`lldb-mi` failure soon, as I can reproduce it here fairly
consistently.

About the others: we've seen
functionalities/breakpoint/breakpoint_case_sensitivity/TestBreakpointCaseSensitivity.py
failing on the bots, and I think it might be due to a spotlight issue
Adrian found (and fixed).
You might still have `.dSYM` bundles left over from stale build directories.

To fix this, you need to wipe out all old build artifacts:

- Inside of the LLDB source tree:
$ git clean -f -d

- Globally:
$ find / -name a.out.dSYM -exec rm -rf \{} \;

This is a long shot, but it might help you.

I think the plan here (and I'd love to spend some time on this once we
have stability, which it seems we're slowly getting) is that of enhancing
`yaml2*` to do the work for us.
I do agree it is a major undertaking, but even spending a month on it will
go a long way IMHO. I will try to come up with a plan after discussing
with folks in my team (I'd also really love to get input from the DWARF
people in llvm, e.g. Eric or David Blaikie).

The last time I looked into yaml2obj was to use it for testing llvm-dwarfdump, and back then I concluded that it needed a lot of work to be useful even for that. In its current state it is both too low-level (e.g., you need to manually specify all Mach-O load commands, and you have to manually compute and specify the size of each debug info section) and too high-level (it can only auto-generate one exact version of .debug_info headers) to be useful.

If we could make a tool whose input roughly looks like the output of dwarfdump, then this might be a viable option. Note that I'm not talking about syntax but about the abstraction level of the contents.

In summary, I think this is an interesting direction to explore, but we shouldn't underestimate the amount of work necessary to make this useful.

-- adrian