Test suite rebuilding test executables many times

While looking into a Windows-specific issue involving TestTargetAPI.py, I noticed that we are building the exact same executable many times. Every single test has a line such as self.buildDwarf() or self.buildDsym(). Those functions will first run make clean and then run make, essentially rebuilding the exact same program.

Is this necessary for some reason? Each test suite already supports suite-specific setup and teardown by implementing setUp and tearDown functions. Is there any particular reason we can’t build the executables a single time in setUp and clean them a single time in tearDown?
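As a minimal sketch of the idea (the class name, the test_dir attribute, and the raw subprocess calls are placeholders; the real change would go through the existing TestBase build plumbing rather than calling make directly):

```python
import subprocess
import unittest


class TargetAPITestCase(unittest.TestCase):
    """Hypothetical sketch: build the inferior once per suite, not once per test."""

    test_dir = "."  # placeholder for the directory containing the test Makefile

    @classmethod
    def setUpClass(cls):
        # Build the executable a single time before any test method runs.
        subprocess.check_call(["make"], cwd=cls.test_dir)

    @classmethod
    def tearDownClass(cls):
        # Clean the build products a single time after the whole suite finishes.
        subprocess.check_call(["make", "clean"], cwd=cls.test_dir)

    def test_example(self):
        # Individual test methods would use the already-built executable here.
        pass
```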

I don’t think we need to retroactively do this for every single test suite, as it would be churn, but in a couple of places it would actually fix test failures on Windows, and it would improve performance of the test suite as a side benefit (by reducing the number of compilations that need to happen).

Thoughts?

Another possibility is changing the arguments to buildDwarf and buildDsym. Currently they take a clean argument with a default value of True. Does this really need to be True? If it were False by default it could drastically speed up the test suite, and I can’t think of a reason why make clean would need to run by default, because tearDown is going to have to clean up the files manually anyway.
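For concreteness, the shape of the change would be something like the following; the signature and helper names here are illustrative, not the exact ones in lldbtest.py:

```python
# Illustrative only: not the real lldbtest.py signature or helpers.
def buildDwarf(self, architecture=None, compiler=None, dictionary=None, clean=False):
    # Keep the clean knob, but only run 'make clean' when explicitly requested.
    if clean:
        self.runMakeClean()  # hypothetical wrapper around 'make clean'
    self.runMakeDwarf(architecture, compiler, dictionary)  # hypothetical DWARF build step
```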

It is fairly common practice (at least it is for me) when figuring out why a test failed, or adding to a test case, or when looking for a good example file to poke at, etc., to go to some relevant test directory, do a "make", then poke around a bunch. I don't generally remember to clean when I'm done. If the test suite didn't do make clean before running the tests then I'd get whatever state I left the binaries in after that investigation. So I prefer doing make clean the first time you run a test in a given directory, but I have no objection to trying not to do the clean on subsequent tests in the same directory.

Also, we do "dsym" and then "non-dsym" builds in the same directory on OS X, so we'd have to make sure that we clean when switching back & forth between the two kinds of tests, or we will leave a dSYM around at some point and stop testing .o file debugging.

Now that support is coming in for "dwo" debugging on Linux, we probably should also add the ability to test normal & dwo debugging there as well. So this soon won't be just an OS X oddity...

Finally, there are some tests that rebuild the binaries on purpose - sadly I don't remember which ones. If we're lucky they would fail if you switched the default and you could go fix them, but if you are unlucky they would succeed without actually testing what they were supposed to test. So a little care would be needed to find these.

Jim

The first and second issues (cleaning once at startup, switching between dsym and dwarf tests) can probably both be solved at the same time by having the test runner sort the runs and do all dsym tests first, and then all dwarf tests, and having TestBase do make clean once before each of those steps. What do you think?
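As a rough sketch of that (the way tests expose their debug-info flavour and the clean hook are invented here purely for illustration):

```python
def run_sorted(tests, run_make_clean):
    """Hypothetical sketch: run all dsym tests, then all dwarf tests, and do
    'make clean' only once at the start of each group."""
    for flavour in ("dsym", "dwarf"):
        group = [t for t in tests if t.debug_info == flavour]
        if not group:
            continue
        run_make_clean()  # clean once per flavour instead of once per test
        for test in group:
            test.run()
```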

I’m going to do some timings tomorrow to see how much faster the test suite is when clean=False is the default. I already confirmed that it fixes all the failures I’m seeing though, so as long as it’s agreeable I’d like to make this change.

I’ll wait and see if anyone can remember which tests rebuild binaries on purpose. Otherwise I will try to look through them and see if I can figure it out.

TestInferiorChanged is one that I remember.

I think this is a good thing to do, but it will need to be done with a steady hand.

I was also thinking about the dsym/dwo tests. Instead of basically having a copy of each test for dwarf and dsym (and soon also dwo), how about having just one test, and have some higher-level logic (the test runner) know that it needs to execute each test multiple times? The tests would then just do a buildDefault() (or something), and on the first run it would build normal dwarf, on the second one dsym, etc. If we need to run a test only for some combination of debug infos, we could have @skipIfDsym annotations, like we do for the rest of the stuff. I think this will make what Zachary is proposing easier to do, and it will make the test writing less awkward.
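Very roughly, something like this is what I have in mind; all of the names here are invented just to show the shape of the idea, not existing test suite API:

```python
import unittest


def for_each_debug_info(*flavours):
    """Hypothetical sketch: write one test body and have it replicated once per
    debug-info flavour, so the runner effectively executes it multiple times."""
    def decorate(cls):
        for name in [n for n in dir(cls) if n.startswith("test")]:
            original = getattr(cls, name)
            for flavour in flavours:
                def variant(self, _body=original, _flavour=flavour):
                    self.debug_info = _flavour  # buildDefault() would consult this
                    _body(self)
                setattr(cls, "%s_%s" % (name, flavour), variant)
            delattr(cls, name)  # only the per-flavour variants remain discoverable
        return cls
    return decorate


@for_each_debug_info("dwarf", "dsym", "dwo")
class ExampleTestCase(unittest.TestCase):
    def test_something(self):
        # self.buildDefault() would build with whatever self.debug_info says
        pass
```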

What do you think? I'm ready to chip in on this if we agree to go down this way...

pl

I thought of this too and I started prototyping it.

The issue that I ran into is that dsym and dwarf tests can each be xfailed, skipped, etc. for different reasons, so if there is only one method body, you still need a way to define the set of conditions under which the dsym and dwarf variants should run, skip, xfail, time out, etc.

Do you want to start writing @skipIfDwarfAndOsIsLinuxButCompilerIsNotClang? Because I know I don’t want to deal with the combinatorial explosion of decorators that would result :)

I have some ideas here as well. For example, I think we only actually need one decorator, configurable via keyword arguments, that can handle arbitrarily complex scenarios of xfailing, debug infos, etc., e.g. @lldb_test(debug_types="dwo,dwarf", xfail={…}, skip={…}). But it’s all unrelated to the original problem I’m trying to solve, so I think it would be good to design a solution for that, but to do it separately.
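To make that slightly more concrete, here’s a hedged sketch of the kind of thing I mean; none of these names exist today, and the runner would still need to be taught to consume the attached metadata:

```python
def lldb_test(debug_types=("dwarf",), xfail=None, skip=None):
    """Hypothetical sketch: one decorator that records, per debug-info flavour,
    the conditions under which a test variant should xfail or be skipped."""
    xfail = dict(xfail or {})
    skip = dict(skip or {})

    def decorate(func):
        # Attach the configuration as metadata; a runner would read these
        # attributes when it generates the per-flavour test variants.
        func.debug_types = tuple(debug_types)
        func.xfail_conditions = xfail
        func.skip_conditions = skip
        return func
    return decorate


class ExampleTests(object):
    @lldb_test(debug_types=("dwarf", "dwo"),
               xfail={"dwarf": lambda cfg: cfg.os == "linux" and cfg.compiler != "clang"},
               skip={"dwo": lambda cfg: cfg.os != "linux"})
    def test_step_over(self):
        # One test body; the per-flavour run/skip/xfail decisions live in the
        # decorator's data rather than in a pile of bespoke decorators.
        pass
```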

The nice thing about just changing the default from clean to not clean is that it’s about a 3 line change, has potentially large speed improvements across the board, and also fixes bugs.

Not sure if this is relevant, but I seem to recall the remote test execution would spin off each test method run (test case level, not test suite level) into a new directory. I don’t think that would be inherently broken by a no-clean scenario but we’d want to make sure it doesn’t break.

-Todd

I think I’m going to abandon this idea for now, mostly because I’ve found a workaround which is a) one line of code and b) only affects Windows. The impact of this workaround is narrower, there’s no risk of messing up any tests which depend on cleaning, and it gives more time to come up with a more comprehensive solution, such as something like what Pavel and I proposed earlier.

That seems fair. It would be great to have some higher-level mechanism to generate runs of the tests that only differ by how the target programs are built. For instance, if we ever get serious about optimized code debugging someday (suppresses giggle), it would be good to run the test suite on optimized binaries as well... So if you are inclined to work on that, that would be excellent, and then we can revisit how to do clean etc. at that time.

Jim