Best practices for using update_*_test_checks.py

Hi,

I've been playing around with the update_mir_test_checks.py utility lately,
and it seems like a useful tool. However, I think the convenience of
the tool makes it easy to use it to generate checks without actually
verifying the checks are correct.

This actually happened to me when I first started using it, and I almost
sent out a patch with a bug. I also think this may have been the culprit
with r332452 which was fixed by r332531.

I think it would be good to develop best practices for using these scripts to
help developers and reviewers to avoid these kinds of mistakes.

I personally don't think these scripts should be used at all unless the
code has been extensively tested in some other way, like with an external
test suite, but I am not all that familiar with these scripts and their
use cases so maybe that is too extreme. I'm curious what some of the
more frequent users of these scripts think about this.

-Tom

Hi,

I've been playing around with the update_mir_test_checks.py utility lately,
and it seems like a useful tool. However, I think the convenience of
the tool makes it easy to use it to generate checks without actually
verifying the checks are correct.

These tools simply parse the functions out of the the llc/opt/etc output,
deduplicate it between prefixes somewhat, and update the check lines.
Nothing more.

When developing, you basically have to do N steps:
1. Actually come up with the tests
2. Fill them with the current (pre-patch) output
3. Do the patch itself
4. Update the check lines in tests
5. Verify the results
6. maybe go back to step 3, or even 1.

Steps 2 and 4 are monkey's work. One could just as easily mindlessly
manually update those check lines. Having to do that by hand does not
ensure that the person actually reads and verifies the the text [s]he copies.

In fact, i'd argue the opposite is true. If you have to do that by hand,
eventually you will get bored of it, and just dumbly copy it, without even
trying to read it. But if all you have to do is run the script, it, you may
read the test changes output. Additionally, this really simplifies coming
up with the new tests, which in turn may expose more issues.

In other words, it always boils down to the human factor.
Banning these scripts will not help with anything,
other than creating more monkey work.

This actually happened to me when I first started using it, and I almost
sent out a patch with a bug. I also think this may have been the culprit
with r332452 which was fixed by r332531.

I think it would be good to develop best practices for using these scripts to
help developers and reviewers to avoid these kinds of mistakes.

I personally don't think these scripts should be used at all unless the
code has been extensively tested in some other way, like with an external
test suite, but I am not all that familiar with these scripts and their
use cases so maybe that is too extreme. I'm curious what some of the
more frequent users of these scripts think about this.

-Tom

Roman.

I’m new to working with the LLVM tests and until recently wasn’t aware of the expectation to go through the steps above. I would have found knowing about it and especially the rationale for it helpful. So I think Tom’s suggestion to put together some guidance for the use of these scripts is a good idea.

That said, I also fell into the trap Tom describes and missed a bug I had introduced because I relied on the update_test_checks.py script and didn’t notice the problem in the expected IR it emitted (the bug was pointed out in D123816). If I had instead written the expected IR myself there’s some chance I might have noticed it. In another change, though, I managed to miss the trailing colon after the CHECK directives in a few of my tests I wrote by hand (and noted a few others in PR54679). These latter problems would have been avoided by using the script.

So I would suggest to include besides the best practices a mention of both of these pitfalls (as well as any others I’m yet to fall into) but leave it up to each test author to decide whatever approach works best for them. What matters most is that the tests are correct, comprehensive, and maintainable. How each of us goes about developing such tests will likely depend on our experience and individual preferences.