Problem with wrong DomTree in BasicAA / ValueTracking

Hi llvm-dev!

By adding some checks that the dominator tree is correct in BasicAliasAnalysis and ValueTracking, such as below, a number of lit tests are failing.

I stumbled upon this problem when analyzing some miscompiles for our OOT target, but I think this also indicates that there could be similar problems in-tree.

Here are some examples of checks that I've played around with:

diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index a0da8b7347ea..adb4b6908ada 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -1099,6 +1099,8 @@ AliasResult BasicAAResult::aliasGEP(
     const GEPOperator *GEP1, LocationSize V1Size, const AAMDNodes &V1AAInfo,
     const Value *V2, LocationSize V2Size, const AAMDNodes &V2AAInfo,
     const Value *UnderlyingV1, const Value *UnderlyingV2, AAQueryInfo &AAQI) {
+ if (DT)
+ DT->verify();
   DecomposedGEP DecompGEP1 = DecomposeGEPExpression(GEP1, DL, &AC, DT);
   DecomposedGEP DecompGEP2 = DecomposeGEPExpression(V2, DL, &AC, DT);

@@ -1699,6 +1701,8 @@ AliasResult BasicAAResult::aliasCheck(const Value *V1, LocationSize V1Size,
/// noalias(V, phi(VA, VB)) if noalias(V, VA) and noalias(V, VB).
bool BasicAAResult::isValueEqualInPotentialCycles(const Value *V,
                                                   const Value *V2) {
+ if (DT)
+ DT->verify();
   if (V != V2)
     return false;

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 303240d03c72..dc2918a0f895 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -257,6 +257,8 @@ KnownBits llvm::computeKnownBits(const Value *V, const APInt &DemandedElts,
                                  const DominatorTree *DT,
                                  OptimizationRemarkEmitter *ORE,
                                  bool UseInstrInfo) {
+ if (DT)
+ DT->verify();
   return ::computeKnownBits(
       V, DemandedElts, Depth,
       Query(DL, AC, safeCxtI(V, CxtI), DT, UseInstrInfo, ORE));
@@ -266,6 +268,8 @@ bool llvm::haveNoCommonBitsSet(const Value *LHS, const Value *RHS,
                                const DataLayout &DL, AssumptionCache *AC,
                                const Instruction *CxtI, const DominatorTree *DT,
                                bool UseInstrInfo) {
+ if (DT)
+ DT->verify();
   assert(LHS->getType() == RHS->getType() &&
          "LHS and RHS should have the same type");
   assert(LHS->getType()->isIntOrIntVectorTy() &&
@@ -393,6 +397,8 @@ unsigned llvm::ComputeNumSignBits(const Value *V, const DataLayout &DL,
                                   unsigned Depth, AssumptionCache *AC,
                                   const Instruction *CxtI,
                                   const DominatorTree *DT, bool UseInstrInfo) {
+ if (DT)
+ DT->verify();
   return ::ComputeNumSignBits(
       V, Depth, Query(DL, AC, safeCxtI(V, CxtI), DT, UseInstrInfo));
}
@@ -548,6 +554,8 @@ bool llvm::isAssumeLikeIntrinsic(const Instruction *I) {
bool llvm::isValidAssumeForContext(const Instruction *Inv,
                                    const Instruction *CxtI,
                                    const DominatorTree *DT) {
+ if (DT)
+ DT->verify();
   // There are two restrictions on the use of an assume:
   // 1. The assume must dominate the context (or the control flow must
   // reach the assume whenever it reaches the context).
@@ -2071,6 +2079,8 @@ static bool isGEPKnownNonNull(const GEPOperator *GEP, unsigned Depth,
static bool isKnownNonNullFromDominatingCondition(const Value *V,
                                                   const Instruction *CtxI,
                                                   const DominatorTree *DT) {
+ if (DT)
+ DT->verify();
   if (isa<Constant>(V))
     return false;

My current suspicion has been that the LegacyPassManager is calculating the "last user" incorrectly sometimes, ending up freeing the DominatorTree/BasicAliasAnalysis passes before the last use. If that is true, then it still is a bit scary that there are stale pointers (?) to an invalid BasicAA/DominatorTree being used later.

One thing that strengthens the suspicion is that if we tell the pass manager that the chained analyses used by AliasAnalysis and BasicAliasAnalysis are "transitive", then the above problem with faulty DominatorTrees aren't seen any longer.

Here is the patch I've played around with to make sure DT is correct:
diff --git a/llvm/lib/Analysis/AliasAnalysis.cpp b/llvm/lib/Analysis/AliasAnalysis.cpp
index f5b62ef06a23..fae7a84332fd 100644
--- a/llvm/lib/Analysis/AliasAnalysis.cpp
+++ b/llvm/lib/Analysis/AliasAnalysis.cpp
@@ -883,8 +883,8 @@ bool AAResultsWrapperPass::runOnFunction(Function &F) {

void AAResultsWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.setPreservesAll();
- AU.addRequired<BasicAAWrapperPass>();
- AU.addRequired<TargetLibraryInfoWrapperPass>();
+ AU.addRequiredTransitive<BasicAAWrapperPass>();
+ AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();

   // We also need to mark all the alias analysis passes we will potentially
   // probe in runOnFunction as used here to ensure the legacy pass manager
diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index f9275daf1224..a0da8b7347ea 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -1875,9 +1875,9 @@ bool BasicAAWrapperPass::runOnFunction(Function &F) {

void BasicAAWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.setPreservesAll();
- AU.addRequired<AssumptionCacheTracker>();
- AU.addRequired<DominatorTreeWrapperPass>();
- AU.addRequired<TargetLibraryInfoWrapperPass>();
+ AU.addRequiredTransitive<AssumptionCacheTracker>();
+ AU.addRequiredTransitive<DominatorTreeWrapperPass>();
+ AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();
   AU.addUsedIfAvailable<PhiValuesWrapperPass>();
}

There is however some problems with such a fix. Because then the following tests hits assertions when setting up the pipeline:
  LLVM :: CodeGen/Hexagon/bug15515-shuffle.ll
  LLVM :: CodeGen/Hexagon/const-pool-tf.ll
  LLVM :: CodeGen/Hexagon/glob-align-volatile.ll
  LLVM :: CodeGen/Hexagon/loop-idiom/lcssa.ll
  LLVM :: CodeGen/Hexagon/loop-idiom/pmpy-mod.ll
  LLVM :: Transforms/GVNHoist/hoist-mssa.ll
  LLVM :: Transforms/GVNHoist/hoist-newgvn.ll
  LLVM :: Transforms/GVNHoist/hoist-pr20242.ll
  LLVM :: Transforms/GVNHoist/hoist-pr28933.ll
  LLVM :: Transforms/GVNHoist/hoist-recursive-geps.ll
  LLVM :: Transforms/SimplifyCFG/Hexagon/disable-lookup-table.ll
  LLVM :: Transforms/SimplifyCFG/Hexagon/switch-to-lookup-table.ll

So maybe addRequiredTransitive isn't supposed to be used like that?

Or there is something bad with GVNHoist and Hexagon?

I haven't looked any closer at Hexagon, but looked a bit at GVNHoist without really finding anything weird (comparing it with other passes that require AAResults). The assert can be seen if applying the patch above and then running

  opt -gvn-hoist -gvn-hoist -S /dev/null

and it says

  opt: ../lib/IR/LegacyPassManager.cpp:607: void llvm::PMTopLevelManager::setLastUser(ArrayRef<llvm::Pass *>, llvm::Pass *): Assertion `AnalysisPass && "Expected analysis pass to exist."' failed.

Afaict the pass that it fails to find is 'Basic Alias Analysis (stateless AA impl)'. But I do not understand why.

Anyone who knows more about addRequiredTransitive, or how one is supposed to tell the pass manager about nested analysis pass/group dependencies?

PS. This problem has been mentioned a bit in rGb21840751278 , as we started to see miscompiles after that commit and after rGa3614a31c46a . Those commits seem to have exposed the problem with faulty dominator trees a bit more after adding more usage of the sometimes faulty DT.

Regards,
Björn

I’m not familiar enough with the pass manager to say something meaningful here, but the addRequiredTransitive() issue sounds similar to what is discussed in https://lists.llvm.org/pipermail/llvm-dev/2020-November/146923.html.

Regards,

Nikita

Here is an attempt to fix the problem: https://reviews.llvm.org/D94138

Would be nice to add some verifiers (or expensive checks?) to detect
when the DT is wrong, but not really sure how. Calling verify()
every time DominatorTree is used is perhaps a bit costly.
But it might also be that the pass manager should make sure that
this can't happen by invalidating things better when a pass isn't
preserved or when it is removed.

/Björn

(Post)DominatorWrapperPass implement domtree verification in (Post)DominatorTreeWrapperPass::verifyAnalysis when expensive checks are enabled or --verify-dom-info is passed.
I don’t know under what exact conditions verifyAnalysis is actually checked though, so I’m not sure if that’d work in those cases.

-Jakub

From: Jakub (Kuba) Kuderski <kubakuderski@gmail.com>
Sent: den 6 januari 2021 16:24
To: Björn Pettersson A <bjorn.a.pettersson@ericsson.com>
Cc: Nikita Popov <nikita.ppv@gmail.com>; jay.foad@amd.com; llvm-dev
<llvm-dev@lists.llvm.org>
Subject: Re: [llvm-dev] Problem with wrong DomTree in BasicAA /
ValueTracking

(Post)DominatorWrapperPass implement domtree verification in
(Post)DominatorTreeWrapperPass::verifyAnalysis when expensive checks are
enabled or `--verify-dom-info` is passed.
I don't know under what exact conditions verifyAnalysis is actually
checked though, so I'm not sure if that'd work in those cases.

-Jakub

Well. The problem here was that the pass manager incorrectly freed some analysis passes too early. And after that point it no longer verify them, even though later passes both preserved and used the transitively required analyses.

So my thoughts was about how one could spot that an analysis result (such as DomTree) was used after having been free'd by the pass manager. If we could trust that the pass manager is correct it wouldn't be a problem. So how could one assure that discarded/invalide passes aren't used (I don't know exactly in what way ValueTracking/BasicAliasAnalysis could use an invalid (deleted?) DominatorTree without being caught by for example running lit tests with an asan build, but we haven't seen any problems in such runs afaik).

/Björn