Opinions on conditional dependencies on proprietary tools

FWIW, our testing for GPUs is problematic for many reasons. We should have buildbots that execute on GPUs. They will have to rely on vendor tools; I don’t see that changing any time soon. Even if we implemented ptxas ourselves, we would still need vendor tools (and drivers) for the next step, so it is unclear to me whether we really want to wait for a fully open toolchain before we stop testing GPU code (generation) only outside the project.

Agreed, and we do have the bots (e.g. clang-cuda-t4) for running CUDA tests on three major GPU architectures. However, they cover only a subset of the external dependencies.

For syntax checking we don’t need a GPU, just some tools from the SDK.

I agree with @thakis that development of open-source replacement tools should be prioritized over using proprietary tools, but I don’t think we should disallow proprietary tools either. Developing a compiler (ptxas is an optimizing assembler for multiple GPU architectures) for a GPU without available documentation is not feasible, IMO, especially considering that we’d have to repeat the effort for every new GPU generation, which tends to come every couple of years.

I think for regression tests specifically, we want to ensure that the experience developers see locally is the same as what the buildbots are testing, to the largest extent possible. Having the semantics of the test change based on what tools are installed locally is confusing. (The issue isn’t really that the tool is proprietary; the issue is that it’s not in the LLVM tree, and it’s not in the list of tools we expect to be installed in the “Getting Started” guide.)

That isn’t to say we shouldn’t have buildbots that run GPU tests, but I think it belongs elsewhere.

I agree about not having magically changing behavior, and I did argue against such behavior in an early revision of the patch.

This is not an issue here, AFAICT. Behavior changes depending on what the user specified explicitly during the build configuration. It’s not any different from enabling/disabling certain back-ends and having some tests running or not depending on that.

For users who do not configure their build with ptxas enabled (and installed), there are no visible changes to the build and tests. Users who do configure their builds to enable ptxas in tests get extra testing done. This will eventually be used on the CUDA buildbot, where I will have the CUDA SDK installed and will rerun the LLVM tests with the additional coverage these tests provide. However, having these tests conditionally available in LLVM itself has the benefit that we will be able to catch errors before landing the changes.

Hmmmm disagree. The whole point of buildbots is to test things you don’t have locally. IMO, ptxas is just one more of these, just like most people don’t have Windows (with its entire set of proprietary tools).

Took another look. So this extra testing only runs if the user explicitly specifies -DPTXAS_EXECUTABLE to CMake, or sets the environment variable LLVM_PTXAS_EXECUTABLE? And there isn’t any automatic detection?
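If I read the patch right, the opt-in looks roughly like the following at configure time; the CUDA install path here is purely illustrative:

```shell
# Explicit opt-in: point the build at ptxas when configuring.
# Without either of these, the ptxas-dependent tests are simply skipped.
cmake -G Ninja ../llvm -DPTXAS_EXECUTABLE=/usr/local/cuda/bin/ptxas

# Alternative: tell lit where ptxas lives via the environment.
export LLVM_PTXAS_EXECUTABLE=/usr/local/cuda/bin/ptxas
```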

So the point of this would be to detect changes to the output that accidentally break ptxas. Given that the point of the NVPTX backend is specifically to interface with ptxas, I guess that makes sense. Not sure how much benefit you get out of this vs. just running some generic test-suite on the buildbot, though.

Maybe leave a short note in each test so it’s obvious to developers why their test is failing on the buildbot where it doesn’t fail locally. (And the ptxas run should not be treated as a substitute for writing reasonable FileCheck checks.)

The point of the regression tests, specifically, is to ensure that the compiler generates a known output for a particular set of inputs. We try hard to make that mapping independent of the host for almost all tests.

What I really don’t want to see is a situation where some aspect of LLVM IR changes, and we have a bunch of regression tests that are impossible to update because they only run on specific hosts.

The way the current ptxas-enabled tests work, the assembler text goes through the usual FileCheck verification, and then, if ptxas is enabled, it is also run through ptxas. I agree ptxas should never be the only verification, and my understanding is that the folks working in that area agree as well.
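To make the mechanics concrete, a ptxas-enabled test might look something like the following sketch; the `%ptxas-verify` substitution and the `ptxas` lit feature name are my shorthand for whatever spelling D121727 settles on:

```llvm
; Normal FileCheck verification always runs.
; RUN: llc < %s -march=nvptx64 -mcpu=sm_70 | FileCheck %s

; The ptxas syntax check runs only when ptxas was enabled at configure time.
; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_70 | %ptxas-verify %}
```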

I am less concerned about depending on a proprietary tool for some bots (as someone mentioned, we have CUDA bots that require the proprietary CUDA drivers to be installed) than about what we do when we hit issues with such a tool.

For example, what if a perfectly valid change to InstCombine triggers a crash in ptxas in the test suite, reported by a bot? Who is responsible for investigating and root-causing this: the test-suite owner / GPU developers, or the author of the InstCombine change?
If we’re just exposing a bug in ptxas that crashes a benchmark in the test-suite, reverting the change can be a temporary solution of course, but what is the path forward for the LLVM contributor?
This is in general a problem with any third-party dependency, although for the non-proprietary ones we can at least try to patch them ourselves (which may not be practical anyway).

Bots are supposed to have owners who should be willing to help out with issues that occur only on their bots. Is that not how it works?

There is a similar situation with regard to MLIR building for an FPGA target like XILINX. The final step from MLIR to FPGA netlist is proprietary.
So, I would probably tend to allow an option to use proprietary tools, but favor open source tools if they are available.

Yes, and yes. We already do a fairly decent job of checking that LLVM generates what we, the developers, expect to be valid PTX, and that is not going to go away. If anything, once a ptxas test finds a problem, that specific problem would be a primary candidate for conversion into a FileCheck test, making the standard test set more robust.

Sure: but while a bot owner for PowerPC can help debug a miscompile because they have access to the hardware, there is more to it here when you’re actually dependent on a buggy proprietary tool; the bot owner won’t be able to tell you much more than “ptxas is crashing, it does not like this (otherwise valid) PTX”.

This is no different from the current state of affairs: we’re already depending on ptxas and other proprietary tools on the CUDA bots. E.g., we can get exactly the same problem now if a CUDA test-suite test tickles something wrong in LLVM that makes ptxas fail.

Specifically in the case of the NVPTX and ptxas tests, such a failure would make me, the owner of the CUDA bots, extremely curious about what triggered it. If there’s a bug, it is very likely to bite my users. For what it’s worth, historically we (Google) found most NVPTX and ptxas bugs in our own code base.

If it indeed turns out to be a fault in NVIDIA’s tool, I’ll file a bug with NVIDIA and work on figuring out how to avoid triggering it, or implement a workaround in LLVM. The sad reality is that we do need to be able to work with the ptxas versions that are already out there and will not get fixed.

This scenario actually makes it even more important to catch such issues early, before they hit the end users, who would likely encounter them on much larger code than the LLVM tests, which would make them much harder to reduce and debug.

I agree that we should strive to make tests independent of the host (of installed proprietary tools). Every introduction of a new proprietary tool must be prudent. The issue is that such tools are prone to lure in new tests that check the wrong layer (action at a distance) and to create a contribution barrier for others.

In this particular usage, for the NVPTX target, I agree that there are cases (a test-coverage gap) with no good replacement for additionally running the proprietary tool ptxas. The use seems fine if reviewers pay extra attention to new usage (e.g. whether the ptxas invocation is necessary? does it check too much? how likely an updated ptxas will cause the test to adjust? could it cause routine trouble to people contributing general codegen components?). The bot maintainers need to be willing to help deal with issues that occur only on their bots.

In my humble opinion, tests and code that depend on a proprietary tool should live outside of the repo. If the absence of such a tool prevents users from running all the test cases available in the repo, it will cause issues for many of them. For example, a Linux distribution might rely on the test cases to verify its pre-built LLVM packages, and such tools can easily break that workflow.

Given that I’m wearing most of the hats @MaskRay mentioned, I can address some of these points.

  • whether the ptxas invocation is necessary?
    For ptxas, most of the tests are essentially pass/fail syntax checks.
    I would say they are essential for the tests where we specifically check the generation of particularly hairy instructions (e.g. CodeGen/NVPTX/wmma.py). That said, any PTX produced by a successful LLVM compilation (modulo, perhaps, inline assembly) must be syntactically valid, so it would be beneficial to run ptxas on all llc tests in NVPTX codegen, so we can cover a wider variety of instructions.

  • does it check too much?
    ptxas is primarily used as a pass/fail syntax checker. We do not need the code to make sense or to work, just to pass syntax checks.

  • how likely an updated ptxas will cause the test to adjust?
    Rarely, if ever. ptxas is backward-compatible and accepts older PTX syntax.
    It’s possible that at some point NVIDIA will deprecate old PTX versions, but they have not done so yet, and they currently support really old versions that predate the NVPTX back-end in LLVM.
    Most of the relevant changes will be new tests for the new PTX versions introduced by new CUDA SDK versions, once we implement support for them in clang and LLVM.

  • could it cause routine trouble to people contributing general codegen components?
    Unlikely. We’ve dealt with the old issues uncovered by these tests. The most likely scenario for new issues to pop up is the addition of new NVPTX instructions, or of variants of already existing instructions, and the authors would be expected to run these tests before landing the changes.

  • The bot maintainers need to be willing to help deal with issues only on their bots.
    I already do that for CUDA bots, so yes, I am willing to help.

Please see above. These tests are conditional on the user explicitly enabling the tool during build configuration. Users who did not opt-in are not affected.

Thanks for the reply. I just randomly picked some questions, but I appreciate that you took the time to answer them individually. :)

I just realized the patch author asavonic is also the one who spent significant time improving lit’s conditional statement support. I know very little about NVPTX, but taking a look at D121727 ([NVPTX] Integrate ptxas to LIT tests), the additional RUN lines seem useful. I support the current form of D121727.

So on the one hand, I favor a “live and let live” approach on stuff that is isolated to individual targets.

That said, having looked at the review in question (https://reviews.llvm.org/D121727), it seems to me a not particularly useful change in the first place. As was pointed out, the point of test cases is to verify that the compiler produces the expected output. Piping the output to an assembler and then throwing away that assembler’s output (except for a single bit of information) is almost entirely pointless. Building in support for a proprietary and external tool just for that sets a bad precedent, IMHO.

If the goal is to improve test coverage of the NVPTX backend, it would be better to change the test cases so that they actually test the compiler’s output. You don’t need ptxas for that. A better approach would be to run the test cases through update_llc_test_checks.py.
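For what it’s worth, tightening a test’s CHECK lines that way is a one-command step; a sketch, assuming an in-tree checkout with a build in build/ (the test path is illustrative):

```shell
# Regenerate the FileCheck lines of an NVPTX codegen test from current llc output.
llvm/utils/update_llc_test_checks.py \
    --llc-binary build/bin/llc \
    llvm/test/CodeGen/NVPTX/some-test.ll
```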

In this case we also want to verify that the produced output is valid, which is not always the same as ‘expected’.

The problem is that the developers’ idea of the expected output does not always match reality, and the point of this seemingly pointless test is to find such cases.

We do test compiler output. The problem is that we’re testing vs our expectations of the correct output, not the reality as defined by ptxas.

If/when ptxas catches a problem, it indicates that there’s a problem both in LLVM and in the tests, which either didn’t check for something or incorrectly considered the produced output valid. Increased coverage is a nice bonus, but the primary purpose is to detect cases where our idea of ‘correct’ is wrong. It’s not a hypothetical problem; this mismatch between expectations and reality has happened on multiple occasions.

Off the top of my head, we ran into this problem fairly regularly:

  • on multiple occasions, produced symbols with characters PTX does not accept;
  • produced debug info in a way ptxas considered invalid (wrong label/section location);
  • ran into ptxas bugs when it could not parse DWARF generated by LLVM;
  • generated instructions with types not accepted by ptxas;
  • generated address labels in the wrong location;
  • produced assembly directives not accepted by ptxas;
  • generated invalid instruction variants.

Some of these problems were not checked for by regular tests.
Some were checked, but incompletely (nobody expected issues to pop up where they did).
Others were checked and were correct as far as the developer was concerned, yet the produced code was not useful in practice because ptxas disagreed, due to its bugs or to disagreement with the documentation used to implement new instructions.