Support for deferred execution in lit

This is a topic that came up in our recent discussions about remote test execution in libc++ and other runtimes.

The libc++ lit test suite has support for running tests remotely using a custom executor, and compiler-rt has similar support. The problem is that this is done in a very ad-hoc way, on a per-command basis.

The most basic example looks as follows:

RUN: %{exec} %t.exe

When executing tests locally, %{exec} would be empty (or it could be a binary like env). When executing tests remotely, for example over SSH, which is the most common case, %{exec} expands into a script that uses SCP to copy the binary to the remote target and SSH to execute it. While we wait for the command to finish, test execution is blocked.
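To make this concrete, here is a minimal sketch of what such an %{exec} script might look like (the script name and details below are illustrative, not the actual libc++ executor):

    # run_remote.py -- illustrative sketch of an SSH-based %{exec} script;
    # not the actual libc++ executor.
    # Usage: python run_remote.py <host> <binary> [args...]
    import os
    import subprocess
    import sys

    def main():
        host, binary = sys.argv[1], sys.argv[2]
        args = sys.argv[3:]
        remote_path = "/tmp/" + os.path.basename(binary)
        # Copy the cross-compiled binary to the target over SCP...
        subprocess.check_call(["scp", binary, "%s:%s" % (host, remote_path)])
        # ...run it there over SSH, blocking until it finishes...
        rc = subprocess.call(["ssh", host, remote_path] + args)
        # ...clean up, and report the remote exit status back to lit.
        subprocess.call(["ssh", host, "rm", "-f", remote_path])
        sys.exit(rc)

    if __name__ == "__main__":
        main()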

When you only have a handful of tests, this is a reasonable approach, but it becomes a problem with a large number of tests (as in the case of libc++), because the overhead of copying and executing tests one by one can be significant. It gets worse if setting up the target test environment is expensive, which can be the case for some embedded environments.

It would be more efficient to bundle up all the binaries (with their dependencies), copy them over to the target, and run them all there, but that pattern is difficult to express in lit right now.

https://reviews.llvm.org/D77657 is one possible implementation, but it has some unresolved issues, described in the details of that change.

While we could try to work around some of these issues, we think a better solution would be to introduce a notion of “deferred execution” into lit: any RUN lines marked as deferred would not be run immediately, and the test would be reported with a new status, DEFERRED. Ideally, we would then have some way of collecting all deferred commands and providing a custom handler (for example via TestingConfig) that could do things like packaging up all the binaries and executing them on the target device.
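To sketch what we have in mind (the deferred_executor hook and the deferred_artifacts attribute below are hypothetical; no such API exists in lit today):

    # lit.cfg.py (sketch) -- hypothetical API to illustrate the proposal;
    # 'config.deferred_executor' and 'test.deferred_artifacts' do not exist
    # in lit today.
    import os
    import tarfile

    def run_deferred(tests):
        # 'tests' would be the tests whose deferred RUN lines were collected
        # instead of executed. Bundle up their binaries...
        with tarfile.open("deferred-tests.tar.gz", "w:gz") as bundle:
            for test in tests:
                for binary in test.deferred_artifacts:  # hypothetical
                    bundle.add(binary, arcname=os.path.basename(binary))
        # ...then copy the bundle to the device, run everything there, and
        # report results back so DEFERRED can be resolved to PASS/FAIL.

    config.deferred_executor = run_deferred  # hypothetical hook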

We think that such a feature would be generally useful, but I’d like to collect more feedback before we go ahead with the implementation. Do you think such a feature would be useful? Is there another way of supporting batched/deferred execution of test binaries with lit?

We’ve been looking at some related questions and have some other options that you might be interested in. First of all, a comment: the idea of ‘deferred execution’ seems problematic, since it fundamentally changes the semantics of lit’s report. I’m curious how you would use this in practice to, for instance, check that the tests were eventually run somewhere?

We have an embedded environment where building LLVM itself is barely feasible due to resource constraints. As a result, we cross-compile LLVM and generate an installed tarball that is copied to the embedded system. However, this means that we can’t test the cross-compiled LLVM executables until we get to the embedded system.
Our approach has been to factor the test directory into a hierarchical CMake project. This project then uses the standard LLVM CMake export mechanisms (i.e. find_package) to find LLVM. This refactoring has no effect on a regular in-tree top-level build, but it lets us check out the LLVM tree on the embedded system and build just the test area against the installed tarball of LLVM. I think this refactoring of the CMake is something that would be relatively easy to carry out on the LLVM tree. Relative to your current approach, this moves the problem of tarballing and remote code execution out of lit’s responsibility and into a more devops/release responsibility, which makes more sense to me.

Perhaps you also have other goals, such as partitioning tests to run on multiple target nodes? I haven’t thought too much about how this would interact.

Separately, we also have the problem of tests that need to behave differently in different contexts, e.g.:
RUN: clang --target=my_cross_target … -o test.elf
RUN: %run% test.elf

In this case, we’d like to be able to test the compilation part outside of the target, but when we run the same test on the target machine, we can both compile and run. We currently do something similar (as you see above) using a lit substitution that varies depending on the CMake environment. Doing this is somewhat clumsy, and I’ve thought it would be nicer to move this into lit, allowing the test to be:

RUN: clang --target=my_cross_target … -o test.elf
RUN_ON_TARGET: %run% test.elf

In this case the behavior of RUN*: lines would be configurable in lit.cfg.py. This could implement part of your current use case (although maybe there would be impacts on how the reporting is done?).
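To illustrate, the configuration might look roughly like this (the run_keywords hook and the RUNNING_ON_TARGET variable are hypothetical; lit currently hard-codes its handling of RUN: lines):

    # lit.cfg.py (sketch) -- hypothetical; lit has no 'run_keywords' hook.
    # The idea: lit.cfg.py decides which RUN*: keywords actually execute in
    # a given configuration.
    import os

    on_target = bool(os.environ.get("RUNNING_ON_TARGET"))  # assumed variable

    # Always execute plain RUN: lines; execute RUN_ON_TARGET: lines only
    # when the suite is running on the target machine itself.
    config.run_keywords = {
        "RUN:": True,
        "RUN_ON_TARGET:": on_target,
    }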

Steve

We are executing these library tests on a bare-metal target, so for us it is not possible to copy over batches of files (we can’t even use SCP and SSH). At the moment we use a simple Python script as our %{exec} command that loads the cross-compiled binary into the target’s memory, where it is then executed.
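For illustration only, such an %{exec} wrapper might look roughly like the following; the flash-tool command and serial port below are made-up placeholders, since the actual loading mechanism is target-specific:

    # exec_baremetal.py (sketch) -- 'flash-tool' and the port below are
    # hypothetical placeholders for a target-specific loader.
    import subprocess
    import sys

    def main():
        binary = sys.argv[1]
        # Load the cross-compiled image into the target's memory and run it.
        subprocess.check_call(["flash-tool", "--port", "/dev/ttyUSB0",
                               "--load-and-run", binary])
        # Read the exit status back (however the target reports it) and
        # propagate it so lit records a pass or fail.
        rc = int(subprocess.check_output(
            ["flash-tool", "--port", "/dev/ttyUSB0", "--wait-exit-code"]))
        sys.exit(rc)

    if __name__ == "__main__":
        main()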

Having the tests compiled in a batch and then executed one by one would work for us, however. In general, as long as our use case continues to work without any major changes to our test infrastructure, I’m all for trying to improve the testing performance of the libraries.

Cheers,

Dominik

> We’ve been looking at some related questions and have some other options that you might be interested in. First of all, a comment: the idea of ‘deferred execution’ seems problematic, since it fundamentally changes the semantics of lit’s report. I’m curious how you would use this in practice to, for instance, check that the tests were eventually run somewhere?

I think that’s one of the open questions, and I can see several potential solutions. We could leave it entirely up to the user to ensure that the tests are executed eventually, or we could extend lit so it is still responsible for executing the tests but does so via a user-provided (remote) executor.

> We have an embedded environment where building LLVM itself is barely feasible due to resource constraints. As a result, we cross-compile LLVM and generate an installed tarball that is copied to the embedded system. However, this means that we can’t test the cross-compiled LLVM executables until we get to the embedded system.
> Our approach has been to factor the test directory into a hierarchical CMake project. This project then uses the standard LLVM CMake export mechanisms (i.e. find_package) to find LLVM. This refactoring has no effect on a regular in-tree top-level build, but it lets us check out the LLVM tree on the embedded system and build just the test area against the installed tarball of LLVM. I think this refactoring of the CMake is something that would be relatively easy to carry out on the LLVM tree. Relative to your current approach, this moves the problem of tarballing and remote code execution out of lit’s responsibility and into a more devops/release responsibility, which makes more sense to me.

I agree with that distinction, but it’s not an approach that is feasible for us, since we cannot even run CMake or Python on the target at the moment. In our case, we want to package and execute individual test binaries (with their data dependencies).

> Perhaps you also have other goals, such as partitioning tests to run on multiple target nodes? I haven’t thought too much about how this would interact.

It’s not something we’re actively looking into right now but if that option existed we would take advantage of it.

> Separately, we also have the problem of tests that need to behave differently in different contexts, e.g.:
> RUN: clang --target=my_cross_target … -o test.elf
> RUN: %run% test.elf
>
> In this case, we’d like to be able to test the compilation part outside of the target, but when we run the same test on the target machine, we can both compile and run. We currently do something similar (as you see above) using a lit substitution that varies depending on the CMake environment. Doing this is somewhat clumsy, and I’ve thought it would be nicer to move this into lit, allowing the test to be:
>
> RUN: clang --target=my_cross_target … -o test.elf
> RUN_ON_TARGET: %run% test.elf
>
> In this case the behavior of RUN*: lines would be configurable in lit.cfg.py. This could implement part of your current use case (although maybe there would be impacts on how the reporting is done?).

That idea was also suggested in https://reviews.llvm.org/D77657, and I think of it as a superset of deferred execution, in that the execution of a command is deferred whenever the host doesn’t match the target. It would require lit to become cross-compilation aware, that is, to have a notion of host and target. That would require more changes, but it is likely to be more useful than deferred execution alone.
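As a sketch of how that decision might look (host_triple/target_triple as paired lit config attributes are hypothetical; lit has no such notion today):

    # Sketch of cross-compilation-aware dispatch (hypothetical attributes).
    def should_defer(keyword, config):
        # Plain RUN: lines always execute on the host.
        if keyword == "RUN:":
            return False
        # RUN_ON_TARGET: lines execute immediately only when the host is
        # the target; otherwise they are collected for batched execution.
        return config.host_triple != config.target_triple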

I also want to point out that there are different ways of doing remote execution, and those might require slightly different models. It would be good to have all the cases considered when we are discussing the correct approach to take. Two examples I have are:

  • Runtime tests for libcxx/libcxxabi/compiler-rt/libunwind. These are different from normal llvm/clang lit tests because they can have a cross-compile target. When building for the host, the tests work because a RUN line can describe both compile-time and runtime commands, but for a cross-compile target you can’t really run the tests, because you need a way to run the compile on the host and the runtime command on the device. You can’t just bundle everything up and send it to the device, since there might not be a toolchain on the device to compile with. One other option is using gtest. When I was looking at libunwind, I almost wanted to rewrite the test suite in gtest (it is very small, while the libcxx tests are too large to be rewritten) so that I could simply build the tests, install them on the device, and drive them on the device side with a lit test.

  • The other kind of remote execution is for distributed build/test. If you have a distributed build system, the bottleneck of the build/test cycle is definitely running the test suite. In this case, we might want to execute the compile RUN lines remotely. Bundling everything up works for this, but it would be hard to distribute to a pool of nodes without huge overhead. I know this is different from the problem we are trying to solve here, but it is interesting to think about if we want to remodel how lit works.

Steven

> I also want to point out that there are different ways of doing remote execution, and those might require slightly different models. It would be good to have all the cases considered when we are discussing the correct approach to take. Two examples I have are:
>
>   • Runtime tests for libcxx/libcxxabi/compiler-rt/libunwind. These are different from normal llvm/clang lit tests because they can have a cross-compile target. When building for the host, the tests work because a RUN line can describe both compile-time and runtime commands, but for a cross-compile target you can’t really run the tests, because you need a way to run the compile on the host and the runtime command on the device. You can’t just bundle everything up and send it to the device, since there might not be a toolchain on the device to compile with. One other option is using gtest. When I was looking at libunwind, I almost wanted to rewrite the test suite in gtest (it is very small, while the libcxx tests are too large to be rewritten) so that I could simply build the tests, install them on the device, and drive them on the device side with a lit test.

We have two or three downstream lit tests that run a linked executable on our remote targets. In our case, we just have a separate Python script, invoked by the lit RUN line, that does all the work of connecting to the target and sending the data to it. However, this isn’t a deferred execution model, since the test still waits for the result to be reported before proceeding.

>   • The other kind of remote execution is for distributed build/test. If you have a distributed build system, the bottleneck of the build/test cycle is definitely running the test suite. In this case, we might want to execute the compile RUN lines remotely. Bundling everything up works for this, but it would be hard to distribute to a pool of nodes without huge overhead. I know this is different from the problem we are trying to solve here, but it is interesting to think about if we want to remodel how lit works.

Somewhat of an aside, but I’m mentoring a GSoC student this year to look at distributing lit tests. On the basis that spawning a distributed execution for a single RUN line is disproportionately expensive, he’s planning on sending off whole lit tests (or batches of lit tests) to individual executor agents. He’s currently using an HTCondor system for initial bring-up, but the aim is to produce something that can easily be adapted to your distribution system of choice.
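As a rough sketch of the batching idea (the submit-job command is a stand-in for the distribution system of choice; this is not the student’s actual code):

    # Sketch of batched lit test distribution (illustrative only).
    import subprocess

    def chunks(items, size):
        # Partition the discovered tests into fixed-size batches.
        for i in range(0, len(items), size):
            yield items[i:i + size]

    def distribute(test_paths, batch_size=50):
        for batch in chunks(test_paths, batch_size):
            # Each job runs lit over its whole batch on an executor agent;
            # 'submit-job' stands in for HTCondor or similar.
            subprocess.check_call(["submit-job", "--", "lit"] + batch)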

> I also want to point out that there are different ways of doing remote execution, and those might require slightly different models. It would be good to have all the cases considered when we are discussing the correct approach to take. Two examples I have are:
>
>   • Runtime tests for libcxx/libcxxabi/compiler-rt/libunwind. These are different from normal llvm/clang lit tests because they can have a cross-compile target. When building for the host, the tests work because a RUN line can describe both compile-time and runtime commands, but for a cross-compile target you can’t really run the tests, because you need a way to run the compile on the host and the runtime command on the device. You can’t just bundle everything up and send it to the device, since there might not be a toolchain on the device to compile with. One other option is using gtest. When I was looking at libunwind, I almost wanted to rewrite the test suite in gtest (it is very small, while the libcxx tests are too large to be rewritten) so that I could simply build the tests, install them on the device, and drive them on the device side with a lit test.

To clarify, when I said that we would like to bundle tests for remote execution, what I had in mind was bundling the cross-compiled binaries (which were compiled on the host). We have no intention of running the compiler on the target in the foreseeable future.

>   • The other kind of remote execution is for distributed build/test. If you have a distributed build system, the bottleneck of the build/test cycle is definitely running the test suite. In this case, we might want to execute the compile RUN lines remotely. Bundling everything up works for this, but it would be hard to distribute to a pool of nodes without huge overhead. I know this is different from the problem we are trying to solve here, but it is interesting to think about if we want to remodel how lit works.

I considered using CMake to build the tests to simplify the process, since we could rely on the existing (distributed) build and packaging support. This might be feasible for libunwind or libc++abi, but libc++ contains a large number of compile-fail and pass tests, and I don’t know how to represent those in CMake.