Enabling opaque pointers by default

Since the last status update the opaque pointer migration has reached the point where both LLVM and Clang fully support opaque pointers (modulo some minor pending patches). I think it is time to talk about enabling opaque pointers by default.

The core problem with enabling opaque pointers is that they have major impact on tests: At least any tests with pointer types in their CHECK lines will have different output when opaque pointers are enabled (even if the difference is only cosmetic). As a rough estimate, there are 2.3k affected Clang tests and 4.5k affected LLVM tests. Updating so many tests in an atomic enablement commit is unrealistic, and would cause massive churn on reverts (and I do expect this to go through some revert cycles).

Something worth noting is that code that works with typed pointers will usually work with opaque pointers, but the converse is not true (this is because typed pointers require correct bitcasts, while opaque pointers don’t). As such, I think our general approach should be to enable opaque pointers on the Clang side first, so that the primary consumer no longer cares about typed pointers, and then mass-migrate tests to use opaque pointers.

There’s a few ways I could see this playing out. One of them would be:

  1. Annotate all affected Clang tests with -opaque-pointers=0.
  2. Enable opaque pointers in Clang by default.
  3. Gradually remove -opaque-pointers=0 from Clang tests while updating test expectations.
  4. Gradually switch LLVM tests to use opaque pointers.
  5. Globally enable opaque pointers by default in LLVM.

Another one would be based on ⚙ D122488 [OpaquePtrs][Clang] Add -normalize-opaque-pointers cc1 option, which adds an option for performing codegen with typed pointers and then rendering the final result with opaque pointers, with the intention that we still test typed pointer bitcast consistency, but make the final result the same as if opaque pointers were used.

  1. Gradually migrate all affected Clang tests to -normalize-opaque-pointers.
  2. Enable opaque pointers in Clang by default.
  3. Gradually switch LLVM tests to use opaque pointers.
  4. Globally enable opaque pointers by default in LLVM.

This one has the advantage that ability to test Clang with typed pointers would be retained, at least to some degree. However, I’m not really certain how well the “typed codegen rendered as opaque” to “opaque codegen” equivalence would work in practice, it’s possible that there would still be differences in results.

I’d appreciate some input on the preferred way to handle this. As discussed previously, the general plan here is that once opaque pointers are the default, typed pointer support will be maintained on a best-effort basis until the LLVM 15 release branch – that is, we’ll not go out of the way to break it, but also aren’t responsible for keeping it working and (most importantly) tested.

7 Likes

Why is bulk updating the tests unrealistic? I assume 95+% of tests can be updated with a simple script. Is “gradually switch” something you want to just stretch out over a matter of weeks, running it over a subset of tests every few days? The test update to require types on sret/bval was a lot smaller than this, and wasn’t nearly as bad as I expected.

I think the first option is a bit less ugly as it avoids inventing new machinery.

1 Like

A significant fraction of tests can indeed be updated automatically, and I’ve done some initial experiments on that. Unfortunately, this is a lot less reliable than one would like – it’s not just a matter of replacing pointer types and intrinsic mangling, you also need to remove bitcast and zero GEP constant expressions and bitcast instructions (and sometimes but not always zero GEP instructions). This gets further complicated by many ad-hoc CHECK lines (especially on the Clang side) that only match some part of an instruction. Automation helps, but a non-trivial amount of manual intervention will be involved here.

The thing I’m primarily concerned about is that test updates don’t need to be done in the same commit that switches the default. In terms of timeframe, the clang side tests can probably be updated within a week – but I think it would be good to only do that once we have some confidence the default switch will not be reverted, and have the ability to commit test updates in smaller batches than 2.5k at once.

I tend to agree.

I hadn’t considered that typed pointer support in clang could potentially bitrot within days of making all clang tests only use opaque pointers. I think letting people who track Clang head with downstream patch who haven’t been keeping up with opaque pointers have around a month to temporarily pin to typed pointers and update their internal usages seems reasonable. So given that, sprinkling the -normalize-opaque-pointers option on existing clang tests makes sense until we decide that clang support for opaque pointers is no longer necessary. But I think -normalize-opaque-pointers should also force typed pointers so that, again, typed pointer support in clang doesn’t potentially bitrot immediately.

The aforementioned typed pointer support bitrot.

I think if -normalize-opaque-pointers forces typed pointer codegen, then just doing -opaque-pointers=0 would make more sense to me. I’d only see value in the normalization option if it allows testing with both opaque pointers and typed pointers by flipping the internal default, so interested parties could easily keep running tests with typed pointers.

I think that it would also be fine to use -opaque-pointers=0 tests, switch the default and leave that state around for a month, before we update tests. This does run the risk of breaking the default configuration, because there are a few rare instances where “works with typed pointers implies works with opaque pointers” doesn’t hold up even on the clang side (this is almost always due to code with a hardcoded assumption that a bitcast will be present).

Sounds good to me.

But I think -normalize-opaque-pointers should also force typed pointers so that, again, typed pointer support in clang doesn’t potentially bitrot immediately.

As long as we have a buildbot testing with opaque pointers off, I think that addresses this objection.

By using the normalization step, we can have the ability to run all the tests in EITHER mode. And thus, via buildbots, we can run tests in BOTH modes, which avoids the bitrot and ensures valid test-coverage for opaque-pointers-enabled mode.

I wonder how many tests actually “make sense” when converted to the opaque form. I’d imagine there to be at least a couple of tests that guard against some kind of regression that can only be realistically exposed via typed pointers (or requires extra steps to trigger in opaque’ed IR).

I did some additional testing with the -normalize-opaque-pointers approach, and unfortunately I don’t think this is going to be viable after all. My assumption here was that -opaque-pointers and -normalize-opaque-pointers would produce the same result in nearly all cases, but this turns out to not be the case.

For example, for a set of 311 OpenMP codegen tests passing -normalize-opaque-pointers, 85 of them fail with -opaque-pointers. There are various reasons for this. One is differences in constant expression folding. Another is differences in value names – and in basic block names in particular, which normally do not use FileCheck placeholders.

I don’t think the normalization approach makes sense if it produces different output in such a significant fraction of tests. So I think we should go with the first option. Based on the feedback, this would go as follows:

  1. Add -no-opaque-pointers to Clang tests that are affected by opaque pointers.
  2. Enable opaque pointers in Clang by default.
  3. Wait a while so people following tip-of-trunk can incorporate the change and report any issues.
  4. Gradually remove -no-opaque-pointers from tests while updating check lines. At this point typed pointer support will likely start to bitrot (but we’ll still accept patches for fixing it).

As a heads up, all the preliminary work here has finished and I plan to land the default switch (⚙ D123300 [Clang] Enable opaque pointers by default) on Monday. Organizations that need to temporarily opt-out of the change can do so using the -DCLANG_ENABLE_OPAQUE_POINTERS=OFF cmake option, or by passing -Xclang -no-opaque-pointers to Clang.

2 Likes

Congrats on landing this!

Regarding testing with -DCLANG_ENABLE_OPAQUE_POINTERS=OFF, I think this will be fairly similar to the new pass manager transition where it will quickly be infeasible to keep check-clang clean under both opaque pointers off by default and on by default as people add tests that don’t specify -Xclang [-no]-opaque-pointers.

That’s pretty easy, just have a buildbot with that config?

To close the loop here, the change to enable opaque pointers by default in Clang landed on Monday, Apr 11th. Thanks to everyone who has been reporting bugs with the new configuration!

I’d like to get a rough idea on when we should start migrating Clang and LLVM tests to use opaque pointers, in preparation for switching the default on the LLVM side as well. It would be great to know which LLVM consumers that follow tip-of-tree development currently use -DCLANG_ENABLE_OPAQUE_POINTERS=OFF while finalizing their downstream migration efforts, as we’d probably want to at least wait for those to finish first.

We (Unity) don’t - but then again we’re not using Clang (on the Burst team) so that is unsurprising!

We still haven’t turned on opaque pointers due to some performance regressions which I’m still in the process of investigating.

We saw clang crashes when opaque pointers were enabled for our production clang build using instrumented PGO profiles. We are still investigating the root cause.

1 Like

Verified the issue we saw is the same as Assertion in file SelectionDAG.cpp:1454 when trying to lower @llvm.masked.scatter with opaque pointers on X86 · Issue #55021 · llvm/llvm-project · GitHub.

Apologies but I still need more time to sort out the fallout from turning on opaque pointers in clang.

Just for anybody curious

  1. Opaque pointers have uncovered what I believe to be a miscompile in GVN, currently investigating
  2. Some real races newly reported by tsan
  3. Some benchmarks were getting optimized away :slight_smile:
  4. Mysterious deadlocks in large integration tests
  5. After wrangling with internal performance testing infrastructure, aside from benchmarks affected by loop unrolling thresholds (balanced positives and negatives), found one ThinLTO ARM benchmark acting weird where the instruction count went down but the cycle count went up (setting function alignment didn’t change it)

The GVN issue was fixed.

Mysterious deadlocks turned out to be a LoopVectorizer issue, that was fixed.

@MatzeB did either of those help?