DebugInfo optimization tests and automatic CHECK generation

@LebedevRI @jmorse @OCHyams @pogo59

Hey folks - looks like a few tests have been contributed recently that might not be quite right.

Coming out of this commit (and the subsequent discussion/revert): [NFC][DebugInfo] Autogenerate check lines in assignment-tracking/sroa/* (llvm/llvm-project@a33b40d on GitHub)

It looks like a bunch of SROA debug info tests are in llvm/test/DebugInfo. I think the tests in llvm/test/DebugInfo are mostly for code generation; we usually put debuginfo tests for a specific optimization alongside the rest of the tests for that optimization/pass. These tests also may not be following the usual practices for tests of those optimization passes.

I think it might be suitable for these tests to move over to the SROA test directory, and if the other tests there tend to use auto-generated CHECKs, these ones probably should too. (If the infrastructure for that doesn’t work well with debug info IR metadata, it might be worth looking into that, or perhaps not auto-generating checks for these tests for now.)

First of all – sorry. That discussion could have gone better, but not just on my side.
Somehow having to deal with such manual tests just sends me
into a state of disbelief, which didn’t help there at all… Sorry.

Even ignoring the question of autogeneration, those tests are really fragile:
improvements to the passes they cover cause the IR they check
to change in unobvious ways, and it is not at all obvious how to preserve
the coverage they were intended to provide.

Roman

Yeah, sorry here too. I can get touchy too easily. I do try to watch it but there’s a reason I’m sometimes known as Mr Grumpy.

I have not been watching the assignment-tracking work at all, unfortunately. The state of test/DebugInfo has always been a little confused; if you believe in the naming convention, that should be where the tests for lib/DebugInfo go, which is the dumper code. But the tests for production of debug info have kind of awkwardly been split between test/DebugInfo and test/CodeGen. We pretty much just live with it.

However, that’s the codegen side, and here we’re talking about IR transforms. I think it’s totally awesome that we are working on making that better for debug info, but I agree with @dblaikie that the IR optimization + debug-info tests really belong with the optimizer tests.

I will observe, however, that these tests are not intended to test what the optimizer does with instructions, but what it does with debug info. Making that happen, without also making them too fragile for middle-end people to adapt them as needed, may take a little effort.

I can see some advantages to the autogeneration, but those advantages mostly come up after the initial run of the generator scripts. Once that initial run is in place, then future updates are deltas of the generated checks, and those are relatively straightforward to review/validate. Getting a test to a point where autogeneration can be used, however, may take some doing.

Yeah, advice/info from optimization experts that would be helpful could include tips on which instances of these optimizations are the most reliable and least likely to change in the future - ways to constrain the example so much that it can basically only exercise the core functionality of the optimization.
That’s likely enough for the debug info work, and less likely to be perturbed by improvements/changes to the optimization.
That said, if there are ways to fit optimization debug info testing into the autogenerated-check world, that’s cool too.

I’ll repeat a few points I’ve made:

  • These debuginfo things seem to be missing from LangRef
  • I’m worried that more of such tests will appear, and make the problem worse.
  • Given the current UTC output for these tests, what, specifically, isn’t being checked?

Which things in particular? Ah, the llvm.dbg.* intrinsics? Yeah, those do seem to be missing. The DI* metadata look like they’re documented, and a lot of this is covered in more detail over in Source Level Debugging with LLVM (LLVM 16.0.0git documentation).

At the very least, DIAssignID metadata. Perhaps I’m not looking in the right place?
@pogo59’s comments seem to imply that there are
many more things that aren’t being handled. What are they?

Looks like this is documented here: Debug Info Assignment Tracking (LLVM 16.0.0git documentation), with some mention in SourceLevelDebugging and HowToUpdateDebugInfo.

But yeah, maybe at least some breadcrumbs from LangRef for intrinsics and the new DI* nodes, etc.
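
For anyone who hasn’t read those docs, the construct at issue looks roughly like this in IR (a hand-written sketch based on the Assignment Tracking doc, with made-up metadata numbers, not a complete verified example):

    ; The store is tagged with a distinct !DIAssignID node (!20)...
    store i32 %val, ptr %addr, align 4, !DIAssignID !20

    ; ...and the dbg.assign record for the variable (!14) refers back to the
    ; same !20, tying the debug info to that specific assignment.
    call void @llvm.dbg.assign(metadata i32 %val, metadata !14,
                               metadata !DIExpression(), metadata !20,
                               metadata ptr %addr, metadata !DIExpression()), !dbg !16

    !20 = distinct !DIAssignID()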

For what it’s worth, I agree that these tests should be auto-generated. However, maybe we can improve UTC to make it at least a bit more amenable to the debuginfo case. For debuginfo it is important to check the metadata, but the only way to do that currently is --check-globals, which also adds check lines for globals and attributes that are likely not relevant. A dedicated --check-metadata flag could help reduce the noise in the generated check lines. This would also be helpful for some other tests, e.g. those checking TBAA metadata.
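
To make that concrete, here is roughly what opting in looks like today and the shape of what gets appended (a hand-written illustration; the names and exact formatting of the generated lines are approximate, not real script output):

    ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
    ; RUN: opt -passes=sroa -S %s | FileCheck %s
    ; ... per-function body checks ...
    ;.
    ; CHECK: attributes #[[ATTR0:[0-9]+]] = { {{.*}} }
    ; CHECK: [[META0:![0-9]+]] = distinct !DICompileUnit({{.*}})
    ; CHECK: [[META1:![0-9]+]] = !DIFile({{.*}})
    ; CHECK: [[ASSIGNID:![0-9]+]] = distinct !DIAssignID()
    ;.

Only one or two of those trailing lines are usually the point of the test; the rest is the noise a --check-metadata mode could avoid.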

Unfortunately, debuginfo metadata is very noisy, but at the same time I don’t have ideas on how we can extract only the “relevant” parts in an automated way. Generally though, we have a very strong preference towards being able to regenerate tests over having less noisy tests. In most cases, it is clear from the test diff whether a change is harmless or not, especially if the test contains a comment on what it is testing.

Somewhat orthogonal to the issue at hand, but I strongly agree that we should be adding comments on what the test is trying to test. When merging downstream, I often see minor test output changes and I always have to dig through various layers of git blame+review comments to actually figure out if this is fine or something that needs fixing on my end.
Adding test descriptions would also be helpful to review whether a test could be further reduced to avoid unnecessary noise (which is more likely with autogenerated test lines).

In the patch that got reverted, in at least one case there were comments to that effect. However, by applying the script, those comments were no longer associated with the checks they were describing, and any hope of understanding what was going on was therefore lost.

This leads me to say: Converting an existing manual test to autogenerated must not be done without review. It was only by chance that we caught the missing DIAssignID problem, as well as the detached-comments problem, because that commit was not reviewed.

Given that there is active work on improving optimized debug info, more such tests will definitely appear. The current discussion is directly about how to keep it from becoming worse.

I know there are lots of kinds of debug-info metadata, and more are being proposed, for example in [RFC] Heterogeneous Debug Info. I don’t work with IR much so I don’t have specifics. I only know enough to be worried about it. :) And that the idea of applying the scripts to debug-info tests is completely new, therefore the behavior of the scripts in that context is uncertain and needs to be validated before we start relying on it too much.


Hm, I think you have the cart before the horse here.

I’ll rephrase this in the order that makes sense to me. See if you agree.

As a project, we rely on the use of automatic test updates for bulk updates. Not all tests use this mechanism, but we are generally moving toward it as needed to enable bulk updates for projects like opaque pointers. Given this, contributors of new IR features should be responsible for ensuring their work is compatible with the update scripts. This isn’t anything unique to the update scripts; we generally expect new contributions to work well within the ecosystem without requiring specialized knowledge from other contributors. Other examples would be requiring extensions be documented in LangRef, and requiring lit testing in the first place.

I disagree. This change was reviewed - via post commit review.

I don’t think there is anything in this discussion which justifies a special case in our general review standards. It is up to the author of the change to decide if something is trivial. Most autogen changes are.

You can argue that the author of this change should have submitted it for review without arguing that all auto-gen changes should be submitted to review.

Does this risk lost coverage? Yes. In the exact same way we risk an NFC change causing a bug. We choose to allow the risk to enable development.

TOTALLY BY CHANCE. Have I not emphasized this enough? I don’t do enough reviews, I know that, and I looked at this one because I had an idle moment. That’s the only reason we caught it.

Just to be clear: If a test already uses auto-gen, fine.

I am saying that converting an existing test to auto-gen must be done with especial care, because blithely believing that the scripts are a magical “source of truth” is failing to understand the whole testing process. Verifying that a modified test continues to validate what it was intended to validate is a key part of that process.

Clearly where the patch that started all this is concerned, the author failed to take that care. That we happened to notice this one does not fill me with confidence that we always will; rather the opposite.


Hmmm – we definitely could have communicated the changes adding DIAssignID better, and missed adding it to the LangRef. I think because it’s not something that’s on by default we figured no-one would “see” them, which obviously isn’t true for tests, sorry about that. We’ll try to get proper documentation in there shortly, or can back out for now.

Enabling auto-update of this kind of metadata is straightforward, although unlike other metadata attachments each DIAssignID needs to have its distinctness checked, and so needs to appear as a global check like with --check-globals. This isn’t hard; converting the assignment-tracking/sroa/* tests will require some restructuring and rewording, though, to preserve the test explanations and commentary.
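
Roughly, the property the global check needs to express is the identity of the node, something along these lines (a hand-written sketch, not script output):

    ; CHECK: store i32 %{{.*}}, ptr %{{.*}}, !DIAssignID [[ID:![0-9]+]]
    ; CHECK: call void @llvm.dbg.assign({{.*}}, metadata [[ID]],
    ; CHECK: [[ID]] = distinct !DIAssignID()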

I’ve found auto-updating both useful and awkward in the past; as other commenters have pointed out, it’s highly valuable to have an English-language explanation of what property is being checked by the test. Auto-updated tests that don’t have explanations make it much more difficult to identify the property that needs to be verified, and such properties can be reasonably subtle in uncommonly seen metadata.

I’ve started looking at some of the past conversions to autogenerated tests, and some of them have lost important CHECK lines. For example, in c6d7e80ec4c17, llvm/test/Transforms/SROA/dbg-inline.ll checked for two different DILocation nodes, one with an inlinedAt attribute and one without. That was lost in the conversion to automatically generated CHECK lines. I’ve added --check-globals in 2b1d45b227bd, which is pretty ugly and doesn’t communicate what feature is being checked for, but at least preserves coverage.
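
The property in question is the kind of thing expressed with checks roughly like these (paraphrasing the intent, not the literal lines from the old test):

    ; One DILocation must carry an inlinedAt (the inlined copy) and one must not
    ; (the out-of-line definition); the closing paren is what excludes inlinedAt.
    ; CHECK-DAG: !DILocation(line: {{[0-9]+}}, column: {{[0-9]+}}, scope: !{{[0-9]+}}, inlinedAt: !{{[0-9]+}})
    ; CHECK-DAG: !DILocation(line: {{[0-9]+}}, column: {{[0-9]+}}, scope: !{{[0-9]+}})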

6a77477d5354 deletes a check for a DILocation in llvm/test/Transforms/SROA/alignment.ll; the intention behind that check isn’t clear and I’m still trying to work it out. I’ve also seen some non-debug-info metadata disappear in these upgrades (73cb542930b and llvm/test/Transforms/SimplifyCFG/no-md-sink.ll), which I’m less able to address.

I think this shows it’s vital that these changes are reviewed pre-commit, seeing how we’ve lost test coverage that would have been easily spotted and could have been easily addressed in advance.

Thanks very much @dblaikie for bringing this up. Sorry it’s taken me a while to respond, I went on PTO just before this happened.


I appreciate that those tests in particular can be fiddly to update. FWIW I am happy to help update these tests if this situation reoccurs. And just in general, if I saw someone cc the debug-info group on phabricator asking a question about a test I would endeavour to help.


I’ve added a patch to update LangRef here:
https://reviews.llvm.org/D141131


+1
I’m thankful this was caught and discussed but I can imagine something like this slipping by unnoticed in the future. I feel the same way about this whole situation as @pogo59 and @jmorse. I don’t think I can add anything to the discussion about whether this should be an autogen-check test, or whether post-commit review is/was enough in this instance.

+1

I repeat: Converting a non-auto-generated test to auto-generated must be reviewed by a subject-matter expert. And I hate to say it but most people who spend their days thinking about optimizations don’t actually pay much attention to debug info. “Dropping metadata is always correct” doesn’t mean that specific checks for debug-info metadata can be thrown away as useless.

Furthermore, the scripts historically haven’t cared about debug info and blithely assuming they will always do-the-right-thing is demonstrably incorrect.

I think such review is a chancy thing. I have no hard data but I do skim cfe-commits and llvm-commits more or less on a daily basis, and post-commit review replies are incredibly rare. It may be that post-commit review essentially never happens; it may be that post-commit review happens but the reviewer essentially never finds anything to say. We have no data.

My intuition is that post-commit review is actually pretty unusual, something that we give lip-service to but don’t have any means of tracking. Given the number of comments that phabricator reviews draw, even for people who are comfortable committing without review, it seems highly strange to have most non-phab-review commits go in with no comment at all.

(I’m not counting reverts, which are almost entirely due to breakage rather than review.)

Since it’s now been roughly a month, let me ask two questions.

  1. Have the debug intrinsics/metadata at hand been documented in LangRef?

  2. Have the update scripts been fixed to correctly handle debug intrinsics/metadata?