Currently, the flang/test/Driver/OpenMP/ directory hosts a few tests which test the end-to-end lowering from Fortran code to llvm-ir.
These are mainly to test functionalities which are implemented across several stages, such as OpenMP map-type flags which are partially set when lowering from from Fortran to FIR and then later again when lowering to llvm IR.
As these aren’t really Driver tests, we should move them to a new directory for testing lowering to llvm-ir.
Please comment on whether you think we should have such a test directory, and if so, then where should it reside.
I believe in general, end to end testing is not encouraged in llvm-project repository. There are some lengthy threads on this subject in RFC: End-to-end testing. LIT tests are generally supposed to test the specific functionality or pass that a patch adds. I believe this is specifically to keep the tests simple, reduce dependencies (eg: not to cause a failure in a flang test when an LLVM pass changes).
Given that some projects already test and the fact that OpenMP testing does depend a lot on other factors (MLIR, FIR, OpenMPIRBuilder etc). A reasonable testing has to ensure that it works with all these components. It would be good to have some end-to-end testing (Fortran to llvm-ir) and exeuction tests somewhere in the Flang test directory.
And we have tests for the individual lowering steps as well, these tests are more to ensure that overall the functionality hasn’t been broken from changes to its components.
“functional testing” which is what we do in MLIR integration tests, where we’re trying to actually execute some piece of code and check for correctness. This is simple pass/fail signal in general (sometimes it can be tricky because floating point numerics, but overall it is quite robust).
“IR checks”: which is to be more “unit test” as you describe.
It’s not clear to me that “Fortran to llvm-ir” fits any of these categories.
FIR transformation pass to remove host functions which do not contain target regions.
OpenMPToLLVMIR translation removes host functions that contain target regions.
We have separate tests for the individual steps above. However, I feel like we should have some tests to make sure they work end-to-end as well.
The above example is one such scenario, there are a couple of others as well where we might want to have some similar test which tests fortran to llvm-ir lowering.
It’s not clear to me how would such tests be written so it’s hard to say.
I’d encourage to think about the coverage these tests would provide compared to the existing tests (what scenario are they protecting from?) and how much coupling of the entire stack they rely on.
The motif behind these tests comes from the fluid nature of how these functionalities are implemented.
Taking the above function-filtering scenario as an example, until recently we had a separate pass which dealt with removal of host functions containing target regions called EarlyOutliningPass as part of its other main functionalities. However, this pass got removed (as it’s core functionality was made redundant). Thus it went unnoticed that such host functions containing target regions were now present in the device IR.
Having an end-to-end test which just makes sure that host functions in the Fortran code are not present in the final device llvm IR, agnostic of the implementation details of where such logic is implemented would help us catch them.
We could argue that we should have been more cautious when removing the EarlyOutliningPass in this case and replaced this logic along with some new tests, but having an end-to-end test would make it much easier to catch such changes.
From my understanding when offloading with OpenMP currently in Flang+OpenMP (borrowed from Clang’s flow) there’s a device dependency on the host where the host generated IR file is passed to the device so that it can load it and do some processing of it (primarily to synchronize kernel environment and argument metadata, perhaps more in the case of Clang with a more comprehensive feature set) and this can affect the device IR’s output (one of the main cases I can think of off the top of my head is with declare target’s link clause).
This is perhaps one thing this type of test might be able to facilitate/help with, but perhaps it’s easier than I think it is to have and test this kind of dependency in the current MLIR test infrastructure.