[RFC] A New Pass as an Alternative to DependenceAnalysis

Summary

There are several issues with the DependenceAnalysis (commonly referred to as DA) pass. This RFC proposes introducing a new pass as an alternative to DA, rather than attempting to fix the existing implementation. While the new pass may offer reduced analytical capabilities, it is expected to be simpler and easier to reason about in terms of correctness.

Background

A recent investigation into the DependenceAnalysis (DA) pass has revealed several bugs. While some of these are minor, others are fundamental in nature. These issues prevent DA from being enabled by default. I’ve been working on addressing them, but as a result, I’ve come to believe that re-implementing a new pass might be easier than fixing the existing one.

What DA does

To clarify the details of the issues, let me briefly explain what DA does. The algorithm underlying DA is described in the paper Practical Dependence Testing.

DA analyzes loops and detects loop-carried dependencies between two memory operations.

Consider the following loop:

for (I = 0; I < 100; I++) {
  A[2 * I + 0] = 0;
  A[2 * I + 1] = 1;
}

To explicitly indicate that the two instructions may belong to different iterations, we denote the first store as A[2 * I0 + 0] and the second as A[2 * I1 + 1]. DA analyzes under what conditions – specifically, the relationship between I0 and I1 – the two stores may access the same memory location. In this case, 2 * I0 + 0 always yields an even number, while 2 * I1 + 1 always yields an odd number. Therefore, DA concludes that there is no dependency between these stores.
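DA reaches this conclusion symbolically, but it can be cross-checked by brute force. The sketch below is my own illustration (not DA's actual algorithm): it exhaustively tests whether any iteration pair makes the two subscripts equal.

```c
#include <stdbool.h>

// Brute-force cross-check of DA's conclusion: do the stores A[2*I0 + 0]
// and A[2*I1 + 1] ever touch the same element for any iteration pair
// (I0, I1) in [0, 100)?
bool stores_may_overlap(void) {
  for (int i0 = 0; i0 < 100; i0++)
    for (int i1 = 0; i1 < 100; i1++)
      if (2 * i0 + 0 == 2 * i1 + 1)
        return true; // would require an even number to equal an odd one
  return false;
}
```

Since 2 * I0 is always even and 2 * I1 + 1 is always odd, the function returns false, matching DA's "no dependency" verdict.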

Here is another example:

for (I = 0; I < 100; I++)
  for (J = 0; J < 100; J++) {
    A[I][J + 1] = 0;
    A[I + 1][J] = 1;
  }

Similarly, we denote the first access as A[I0][J0 + 1] and the second as A[I1 + 1][J1]. In this case, there exists a loop-carried dependency between these stores; e.g., both of them write to A[1][1] at some point. However, based on the relations I0 = I1 + 1 and J0 + 1 = J1, DA can deduce the dependency distances, which in this case are denoted as [1 -1]. These relations also imply the dependency directions, represented as [< >]. That is, DA analyzes not only the existence of dependencies but also their distance and direction (IIUC, LoopAccessAnalysis should perform a similar analysis as well).
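The distance vector can again be confirmed by brute force. In this illustrative sketch (mine, not DA's algorithm), the second store S2 executes at iteration (i, j) and writes A[i + 1][j]; the first store S1 executes at the later iteration (i + di, j + dj) and writes A[i + di][j + dj + 1]. The two writes land on the same element exactly when (di, dj) matches the dependence distance.

```c
#include <stdbool.h>

// Does the iteration-distance vector (di, dj) ever realize a dependence
// between S2: A[I + 1][J] and S1: A[I][J + 1] within the 100x100 loop nest?
bool distance_occurs(int di, int dj) {
  for (int i = 0; i < 100; i++)
    for (int j = 0; j < 100; j++) {
      int i2 = i + di, j2 = j + dj;
      if (i2 < 0 || i2 >= 100 || j2 < 0 || j2 >= 100)
        continue; // the later iteration must stay inside the nest
      // S2 at (i, j) writes A[i + 1][j]; S1 at (i2, j2) writes A[i2][j2 + 1].
      if (i + 1 == i2 && j == j2 + 1)
        return true;
    }
  return false;
}
```

Only (di, dj) = (1, -1) yields a conflict, i.e., the distance vector [1 -1] and direction vector [< >] from the text.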

As demonstrated above, DA analyzes the relationships between subscripts (or indices) for each dimension when accessing multi-dimensional arrays (including 1D arrays). What’s important here is that DA handles inequalities and performs algebraic transformations.

Issues in DA

In short, the core issue lies in the gap between the mathematical foundations of the underlying algorithm and the semantics of LLVM IR. The main bugs that have actually been observed are as follows. Note that the analysis is based on SCEV, where subscripts are generally represented using SCEVAddRecExpr.

Subscript wrapping not accounted for

This is arguably the most critical issue. The underlying algorithm of DA, as described in the referenced paper, assumes that each subscript is either loop-invariant or changes by a constant step during loop execution. In other words, it treats the domain of subscripts as the set of all integers. However, in LLVM, the index type has a finite bit width, so subscripts may wrap around, violating the algorithm’s assumptions and potentially leading to incorrect results. At a minimum, we must ensure that subscripts do not wrap during loop execution.
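To make the hazard concrete, here is a small brute-force sketch (my own illustration, using a hypothetical 8-bit index type for readability; the same effect exists at LLVM's real index widths). Over the mathematical integers, 3*I0 == 3*I1 + 1 has no solution, so stores to A[3*I] and A[3*I + 1] would never conflict. Once the subscripts are computed mod 2^8, a solution appears as soon as 3*I1 + 1 wraps past 255.

```c
#include <stdbool.h>
#include <stdint.h>

// With 8-bit wrapping arithmetic, do A[3*I0] and A[3*I1 + 1] ever
// collide within the given trip count? (3 * 171 = 513 = 2*256 + 1,
// so 3*d is congruent to 1 mod 256 for d = 171.)
bool wrapped_subscripts_collide(int trip_count) {
  for (int i0 = 0; i0 < trip_count; i0++)
    for (int i1 = 0; i1 < trip_count; i1++)
      if ((uint8_t)(3 * i0) == (uint8_t)(3 * i1 + 1))
        return true;
  return false;
}
```

For trip counts where neither subscript wraps (3*i + 1 <= 255, i.e., up to 85 iterations), the even/odd argument holds and there is no collision; one more iteration introduces a wrap, and the "impossible" dependence becomes real.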

Several factors further complicate this issue:

  • DA also analyzes expressions of the form %A * %i + %B, which would look like {%B,+,%A}<%loop> in SCEV, where %A and %B are loop-invariant and SCEVUnknown. Handling non-constant values introduces additional complexity.
  • In expressions like {%B,+,(%X * %Y)}<nsw><nuw><%loop>, even if the AddRec itself doesn’t wrap, %X * %Y may still wrap. It remains unclear how such cases affect the correctness of DA.

Mixed interpretation of signed and unsigned

DA generally treats values as signed. However, certain values, such as the backedge-taken count (BTC), should typically be interpreted as unsigned. Due to this inconsistency, inequalities like x < BTC (in the mathematical sense) may be evaluated as x <s BTC. When BTC holds a large value (i.e., out of range for signed integers), interpreting it as signed makes it appear negative, potentially leading to incorrect results.
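The flip is easy to demonstrate. The sketch below is my own illustration with a hypothetical 8-bit index type: a BTC above 127 reinterpreted as a signed 8-bit value looks negative, so the signed comparison gives the opposite of the mathematically intended answer.

```c
#include <stdbool.h>
#include <stdint.h>

// Reinterpret an 8-bit value as signed (portably, without relying on
// implementation-defined narrowing casts).
int as_signed8(uint8_t v) { return v < 128 ? (int)v : (int)v - 256; }

// x < BTC under the two interpretations.
bool lt_unsigned8(uint8_t x, uint8_t btc) { return x < btc; }
bool lt_signed8(uint8_t x, uint8_t btc) {
  return as_signed8(x) < as_signed8(btc); // 200 is read as -56 here
}
```

With x = 10 and BTC = 200, the unsigned (mathematically correct) comparison is true, while the signed one is false, which is exactly the kind of inconsistency described above.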

Intermediate computation overflow

Similar to subscript wrapping, intermediate computations during the analysis can overflow. Presumably, explicit overflow checks should be inserted for almost all arithmetic operations using APInt in DA. This issue becomes even more complex when SCEVUnknown is involved as well. (As an experiment, I tried inserting ScalarEvolution::willNotOverflow checks somewhat arbitrarily, and many tests failed…)

Parametric delinearization

In LLVM, memory accesses are sometimes linearized, even those targeting multi-dimensional arrays (in particular, if I understand correctly, once the migration from GEP to ptradd is complete, all accesses should be linearized).
Hence DA performs a process called delinearization. For example, given an access to i8 in the form of %base + (%i * 100 + %j * 10 + %k), where %i, %j, and %k are IVs, DA delinearizes it and interprets it as an access like %base[%i][%j][%k] to a 3D array of size i8[?][10][10]. By doing so, instead of analyzing the subscript expression %i * 100 + %j * 10 + %k directly, DA can treat %i, %j, and %k separately. This simplifies the analysis and improves precision. Range checks like 0 <= %j < 10 and 0 <= %k < 10 are required, and these are already performed in the existing DA. Note that the outermost dimension is written as [?] because the current implementation does not perform checks on that dimension.
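The role of the range checks can be seen with a small round-trip sketch (my own illustration): when 0 <= j, k < 10 holds, the linearized offset uniquely determines the subscripts, so they can be recovered with division and remainder and analyzed independently; when a subscript escapes its range, the recovery is wrong.

```c
#include <stdbool.h>

// Linearize (i, j, k) as i*100 + j*10 + k, then recover the subscripts.
// The recovery is only valid under the range checks 0 <= j, k < 10.
bool delinearize_roundtrips(int i, int j, int k) {
  int offset = i * 100 + j * 10 + k; // linearized access
  int k2 = offset % 10;
  int j2 = (offset / 10) % 10;
  int i2 = offset / 100;
  return i == i2 && j == j2 && k == k2;
}
```

For in-range subscripts the round trip succeeds; for an out-of-range subscript like k = 12 it fails, which is why DA must verify the range checks before treating the subscripts separately.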

Similarly, the presence of SCEVUnknown complicates this issue as well. For example, an access like %base + (%i * %M * %N + %j * %N + %k) may be interpreted as an access like %base[%i][%j][%k] to a 3D array of size i8[?][%M][%N]. While this may appear correct at first glance, issues can arise if %M * %N wraps.
For example, suppose the index type is 8-bit and (%M, %N) = (2^4 - 1, 2^4). Then the original access would be:

%i * %M * %N + %j * %N + %k = -2^4 * %i + 2^4 * %j + %k = 2^4 * (%j - %i) + %k.

This means that when %i = %j, the same address is accessed, which implies that %i and %j cannot be treated independently.
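This collapse is directly checkable. The sketch below (my own illustration of the example above) computes the wrapped offset for the hypothetical 8-bit index type with (M, N) = (15, 16), where M*N = 240 is congruent to -16 mod 256, so the offset reduces to 16*(j - i) + k:

```c
#include <stdint.h>

// Wrapped offset i*M*N + j*N + k for a hypothetical 8-bit index type
// with (M, N) = (15, 16). Since 240 = -16 (mod 256), this equals
// 16*(j - i) + k modulo 256.
uint8_t wrapped_offset(int i, int j, int k) {
  const int M = 15, N = 16;
  return (uint8_t)(i * M * N + j * N + k);
}
```

Whenever i == j, the offset is just k, regardless of the value of i, so accesses with distinct "outer" subscripts alias and the delinearized dimensions cannot be analyzed independently.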


Proposal

Now I think that it’s impractical to address all of the issues described above. Therefore, I propose introducing a new pass from scratch. For convenience, this RFC refers to it as NewDA.
The initial goal for NewDA is to achieve default enablement.
To be clear, there is currently nothing like a prototype – this is still in the conceptual stage.

NewDA is essentially intended to be a simplified subset of the existing DA, except for the addition of a monotonicity check and the unification to a signed interpretation.

Add subscript wrap check (monotonicity check)

As described above, we need to guarantee that subscripts do not wrap. For convenience, this RFC refers to this property as monotonicity. There has been an attempt to introduce this into the existing DA (https://github.com/llvm/llvm-project/pull/154527), but progress has stalled due to the discovery of new corner cases and test regressions. NewDA will incorporate this (or a similar) mechanism from the outset.

Unify all integer interpretation as signed

To simplify the implementation, all interpretations should be unified as either signed or unsigned. From a correctness standpoint, both are acceptable, but the signed interpretation currently seems easier to implement. As a corollary, the analysis should bail out if DA cannot prove that BTC is non-negative (in the signed interpretation).

Features to be removed in NewDA

Subscripts containing SCEVUnknown

Subscripts that contain SCEVUnknown as a sub-expression make the monotonicity and wrap checks difficult. In NewDA, we avoid handling subscripts like {0,+,%X}<%loop> and instead restrict the analysis to simpler forms such as {0,+,42}<%loop>, making these checks easier.
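For a constant-step AddRec, the monotonicity check reduces to verifying that the last subscript value fits the index type. The sketch below is a hypothetical illustration (not NewDA code, which does not exist yet), using an unsigned 8-bit index type for readability and assuming trip_count >= 1:

```c
#include <stdbool.h>
#include <stdint.h>

// Would the subscript {start,+,step} stay monotonic (never wrap) in an
// 8-bit unsigned index type over trip_count iterations? Evaluated in
// wide arithmetic so the check itself cannot overflow.
bool monotonic_u8(uint8_t start, uint8_t step, unsigned trip_count) {
  uint64_t last = (uint64_t)start + (uint64_t)step * (trip_count - 1);
  return last <= UINT8_MAX;
}
```

With a SCEVUnknown step such as %X, no such closed-form bound is available without extra facts about %X, which is exactly why restricting to constant steps makes the check easy.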

Parametric delinearization

As described above, there are issues with parametric delinearization. While this feature doesn’t necessarily need to be removed if we can prove no-wrapping, for simplicity, I’d prefer to remove this feature in the initial implementation.

Constraint propagation

DA generally treats each dimension of an array independently. However, there is a mechanism called constraint propagation, which reflects information obtained from one dimension onto others. This involves rewriting subscript expressions. Although this behavior hasn’t been thoroughly investigated, such rewrites may break the monotonicity of the subscripts.

Notably, there exists a regression test named Propagating.ll, which appears to test this feature. However, as far as I tried, constraint propagation was not triggered in this test (I confirmed this from the debug output). This process may not be particularly useful in the first place, and it’s possible that omitting it would cause little to no issues.

Several test routines

DA performs dependence analysis by invoking a series of dependence-testing routines in order, such as xxxSIV, xxxRDIV, and others. I’d like to remove some of them, as they may affect correctness or offer limited utility.

At this point, I’m considering removing the following in NewDA:

  • symbolicRDIVtest: I think symbolic analysis would be less useful in NewDA.
  • banerjeeMIVtest: It effectively rewrites subscripts, which may violate monotonicity.

Comparison with existing DA

  • Pros
    • Simpler implementation, making correctness easier to verify (at least easier than for the existing DA).
    • Easier to maintain.
    • May reduce compile time.
  • Cons
    • Lower analytical capability.
      • It’s unclear how much this will impact real-world use cases.

Notably, even if all current bugs in DA were fixed, the following issues would likely remain:

  • I think it’s difficult to reliably prevent similar bugs from occurring in the future.
  • Adding the missing validations may increase compile time.

Migration

Currently the following transformations use DA:

  • LoopInterchange
  • LoopUnrollAndJam
  • LoopFuse

I don’t intend to replace all of these uses with NewDA immediately. At the moment, I plan to replace only LoopInterchange with NewDA. Also, to make it easier to switch between DA and NewDA, I plan to derive the API of NewDA from that of the existing DA.

Alternatives

Remove certain features from the current DA

I’d prefer to avoid this option: it means making changes while carefully avoiding regressions across all passes that use DA.

Attempt to fix all issues in the current DA

I think it is impractical, but I’m open to suggestions if anyone has ideas on how to address the issues with DA mentioned above.


Thanks for the proposal, and detailed investigation and work on DA. Without referring to the specifics of DA, I don’t think there’s a single pass that’s been turned back on after being turned off for years, and would argue that things like NewGVN should be removed from the tree.

I would actually question the relevance of “Practical Dependence Testing”, and wonder if going down the same route of implementing ideas from it is the way to go. We actually have an interesting success story in the form of LoopAccessAnalysis, which is remarkably good at generating runtime checks. LAA currently suffers from not being able to analyze outer loops, or multi-indexed memory, but I think these problems may be solvable within LAA itself, mainly due to its small size. Therefore, might it make sense to start from LAA, instead of starting from scratch and adding yet another dependence analysis to the tree?

In my understanding, LAA is specialized for vectorization, whereas DA aims for something closer to polyhedral analysis. Since the motivation of loop transformations is not limited to vectorization (e.g., loop distribution is useful for reducing register pressure), having separate passes for different purposes is reasonable, even if there is some overlap in functionality. At least, IIUC, LAA assumes that the program order doesn’t change, which I believe makes it unsuitable for general loop transformations. Anyway, I don’t know much about LAA, so there might be a smart approach.

I agree that we don’t need to follow it strictly. To begin with, even if NewDA is implemented, I don’t intend to port all of DA’s functionality to it. That said, apparently, some of the relatively simple algorithms implemented in DA can cover a fair number of basic cases. While I don’t think we should be constrained by the existing implementation, I believe it’s worth reusing them.

Rewriting DA from scratch is a big task; I'm looking forward to seeing it, and to it not going the way of NewGVN.

Unify all integer interpretation as signed

This would mean bailing out on any for (size_t i = 0; i < n; ++i). The pattern is too common to just ignore.

Have you considered adding a bit (i.e. i32 → i33) for unsigned values? Since the dependence-test is a yes/no/maybe answer, that wouldn’t be visible from the outside.
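The "one extra bit" idea can be sketched as follows (my own illustration; i33 in the proposal, int64_t here since C has no 33-bit type): every unsigned value zero-extends to a non-negative value in the wider signed type, so a single signed interpretation can be used internally without losing any values.

```c
#include <stdint.h>

// Zero-extend an unsigned 32-bit value into a wider signed type. The
// result is always non-negative, so signed reasoning about it is sound.
int64_t widen_unsigned(uint32_t x) { return (int64_t)x; }
```

Since the dependence test only reports yes/no/maybe, the wider internal type would indeed be invisible from the outside.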

Parametric delinearization

I think the multi-dimensional array issue can only be satisfyingly solved with additional information from the frontend. I would suggest to plan for an ability to use such information, such as structural GEP.

Subscripts containing SCEVUnknown

I think handling of some SCEVUnknown that are invariant to the loops is necessary for a lot of use-cases. This is often a function parameter. E.g., we want to be able to determine that A[x] and A[x+1] do not alias.

Several test routines

:+1: Additional tests can always be added into a working implementation

Constraint propagation

Does this include removing assumptions? Emitting runtime checks from SCEVPredicate would allow applying an optimization only in cases where the constraints are true, e.g., to allow an optimization for unsigned types as long as the value at runtime remains in the “non-negative” range.

It needs more careful thought, but for example, if we know that the subscript consists only of “positive” values (like 2 * i + 1), then proceeding with the analysis in the unsigned sense might be fine. But I’m not entirely sure what happens in a case like for (size_t i = 0; i < n; i++) { A[2 * i - 1] = ... }, where the subscript can be “negative”.

As an alternative, simply doing loop-versioning with a runtime check like n < 2^63 could be a viable solution.
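Such loop versioning could look like the following sketch (my own illustration of the idea, not existing code): the runtime check guards a version of the loop in which n is provably non-negative when viewed as a signed value, and the original loop remains as the fallback.

```c
#include <stdint.h>

// Versioned loop: if n < 2^63, the signed interpretation of n (and of
// the induction variable) is non-negative, so analysis results derived
// under signed reasoning may be applied to the guarded version.
void store_pattern(int8_t *A, uint64_t n) {
  if (n < (UINT64_C(1) << 63)) {
    // "Optimized" version: signed reasoning about subscripts is sound here.
    for (uint64_t i = 0; i < n; i++)
      A[2 * i] = 0;
  } else {
    // Fallback: the original, unoptimized loop.
    for (uint64_t i = 0; i < n; i++)
      A[2 * i] = 0;
  }
}
```

Both versions compute the same result; the point is only that the guarded branch satisfies the precondition the analysis needs.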

If all the related values (e.g., the coefficients and constants of the subscripts) are constants, I feel that simply increasing the bit width might be a reasonable option.

What I mean by ā€œparametric delinearizationā€ here is something like A[i*M*N + j*M + k], where M and N are not constants. More concretely, I’d like to exclude the cases like those in DADelin.ll.

I agree with you that additional information would be needed to handle multi-dimensional fixed-size arrays. Anyway, if we actually end up implementing NewDA, I plan to separate this part of the logic from the other parts, same as in the existing DA.

EDIT: Even though I think such information is necessary to solve the delinearization problem completely, I’m still not sure whether such a ā€œcomplete solutionā€ is really needed. I believe we can delinearize basic cases with heuristics, and for now that might be sufficient.

That makes sense. When an AddRec has SCEVUnknown as its sub-expression, things get complicated, so I’d like to avoid that. However, if the entire subscript is loop-invariant, it seems we should take that into account.

No. I was referring to DependenceInfo::propagate. It may call addToCoefficient and/or zeroCoefficient, which I suspect breaks monotonicity assumptions.
FWIW, the current DA has the capability to record SCEVPredicate, but it does not support emitting them. I think these kinds of features need to be properly supported…

Actually, LAA is also used by some non-vectorization-related passes like LoopVersioning, LoopFlatten (not turned on by default), and LoopVersioningLICM (not turned on by default). Yes, it was initially developed for the purposes of vectorization, and that has given it a focus. To illustrate the fact that it is really a dependence analysis, see this code comment:

  // Rationale:
  // We basically want to check if the absolute distance (|Dist/Step|)
  // is >= the loop iteration count (or > MaxBTC).
  // This is equivalent to the Strong SIV Test (Practical Dependence Testing,
  // Section 4.2.1); Note, that for vectorization it is sufficient to prove
  // that the dependence distance is >= VF; This is checked elsewhere.
  // But in some cases we can prune dependence distances early, and
  // even before selecting the VF, and without a runtime test, by comparing
  // the distance against the loop iteration count. Since the vectorized code
  // will be executed only if LoopCount >= VF, proving distance >= LoopCount
  // also guarantees that distance >= VF.

I think it’s possible to rework LAA to lift any fundamental limitations: what’s currently missing is a motivating use-case, as vectorization mostly works. I’m advocating this route solely for practical reasons:

  1. LAA has excellent test coverage via its users. It’s always better to consolidate related functionality, as we’d have more users exercising the same functionality, surfacing any miscompiles sooner.
  2. Making LAA do outer-loop analysis would unblock our long-standing goal of having outer-loop vectorization.
  3. Development risk is vastly lowered, as patches would be live from the time they land.

Again, I’m not sure what the best route would be, as I’m not familiar with DA, but I would urge you to at least investigate the possibility of extending LAA.


Rewriting DA from scratch is a big task; I'm looking forward to seeing it, and to it not going the way of NewGVN.

+1. It is tempting as a developer, but as a strategist, we should avoid it. Let’s not follow the path NewGVN took, which is under discussion for removal. You don’t want to invest a lot of time in rewriting it, getting it enabled, and going through the whole process.

It should bail out here because of the missing nuw, not just because it is unsigned. Even though Clang will not emit nuw, an analysis may determine that expressions do not wrap (as for monotonicity) and add nuw, which in turn would allow expanding the integer range internally. A[2 * i - 1] obviously does wrap for i=0; this is not a case that we would need to handle.

For statically-sized arrays, delinearization is not strictly needed. In that case the equations are linear (if the subscripts are linear) and can be solved symbolically (which is what Polly does).

It would be good to have something robust. There are simple cases, for statically-sized as well as dynamically-sized arrays, that from the outside do not look like they add any complexity, like I mentioned in the de-typification thread. For statically-sized arrays, it is adding +1 to an outer subscript (which already happens when the initial value is not 0). For dynamically-sized arrays, it is using n+1 as the array size, or not using at least one induction variable per subscript. Being sensitive to such simple changes gives an analysis a bad reputation.

This of course requires that the source does not already write the subscript linearized (like your example A[i*M*N + j*M + k]), which is unfortunately common because C did not even support dynamically-sized arrays before C99. So a heuristic would be nice, but it is also more obvious to the user why it may not work.

I’ve looked through LAA. My current personal conclusion is: yes, it may be possible to consolidate LAA and DA, but I’d prefer to keep them separate. The reasons are as follows:

  • Indeed, both are forms of “dependence analysis”, but it feels like they’re addressing slightly different problems. LAA appears to be primarily focused on vectorization.
    • DepType is a prime example, as it considers store-to-load forwarding and vectorization factors. Additionally, I believe the forward/backward dependence in LAA is not exactly the same as the positive/negative dependence in DA. The former refers to the “execution order” of instructions, which the latter does not take into account. Forcing integration might actually make things more confusing.
    • From another perspective: if I understand correctly, the vectorizer focuses on whether a particular loop is vectorizable, whereas other loop transformations may be more concerned with the relationships between loops.
  • This is entirely subjective, but I feel that both LAA and DA are already quite complex. Merging two complex analyses would only increase the complexity and make maintenance more difficult.

On the other hand, there does appear to be some shared logic between LAA and DA. Some parts are buggy in DA but seem to be correctly implemented in LAA. I found it appealing that LAA’s correct implementation could be reused. Perhaps the ideal approach would be to refactor the common utilities?

(I feel the pass names are poor. Maybe DA should be renamed to something like “LoopNestDependenceAnalysis”…)

The primary motivation for rewriting a new one is that I don’t want to spend any more time reviewing and fixing the problems in DA. I seriously believe it would require less effort to implement a new pass from scratch than to address all the issues in DA. The situation doesn’t seem to be the same as with GVN vs. NewGVN.

Yeah, you are right, I guess my reasoning was a bit off.

I really want to avoid solving mathematical problems symbolically under conditions where overflow is allowed. I found it very hard to prove soundness and preserve it throughout development.

Yes, the robustness is a tricky part. What makes it difficult is that the feasible delinearization may not be uniquely determined. I think we should explicitly define something like an “order” for the delinearization results, for example, “the result with the larger inner size takes precedence” (completely arbitrary, though).

I think handling a case like A[i*M*N + j*N + k] is more difficult than it looks. I tried a quick experiment the other day and found that some pass simplifies away the mul instructions before reaching DA, so DA cannot prove that M*N doesn’t overflow. When I replaced it with A[(i*M + j)*N + k], the IR changed slightly, and in this case it looked like we could prove that M*N doesn’t overflow. Indeed, they don’t seem to be equivalent in terms of overflow behavior, but I believe it’s not robust if the difference between these two forms affects the result of dependence analysis.

There are a few things unclear to me. After listing the current DA problems, you conclude:

Now I think that it’s impractical to address all of the issues described above. Therefore, I propose introducing a new pass from scratch.

But you haven’t really explained why it is impractical. You do not really explain why this cannot be fixed in DA and why we need a reimplementation. Thus, the main thing that is unclear to me is why DA cannot be fixed. Subscript wrapping and overflow are problems that also need to be addressed in a NewDA, and I guess it will similarly be relying on all sorts of SCEV helpers for discovering different loop and subscript properties, etc., so it is unclear to me what is going to be really different.

In the last sentence you repeat that it is impractical:

I think it is impractical, but I’m open to suggestions if anyone has ideas on how to address the issues with DA mentioned above.

I don’t have a solid plan, but a few suggestions and/or things to keep in mind are:

  • People have been using LLVM’s loop optimisations for years. Thus, this has been heavily tested, and there is some value in keeping that.
  • If you, for example, don’t like the Banerjee test, can you put it under an internal DA option? It is fairly easy to compartmentalise these kinds of things. That’s maybe the only good thing about this being off by default: it is not going to cause disruptions or regressions. And if we are kind to downstream users, providing options allows them to turn this on so they can keep using it. Banerjee is one example, but I guess this could easily be done for some other areas that you mentioned.

As I previously said, and previously done, I will continue to fix all bugs reported against DA.

Thank you for raising these issues to my attention, I will be working on fixing these bugs.

I said impractical because:

  • As far as I know, nobody has a full picture of DA. The issues I raised are just the ones I happened to notice; I haven’t fully checked the whole of DA. I’m pretty sure there are other issues that have not been found yet.
  • In conjunction with previous discussions, I’ve found that no one is working on reviewing the DA codebase from the perspective of correctness. In addition, I can’t trust myself anymore, since there have been several times when parts I thought were fine turned out not to be.

As for adding a new pass, I proposed it because I believe it would take less effort than addressing all issues in DA. The reasons are:

  • Fixing all issues would require rewriting large parts of DA. At least, we need to insert overflow checks in many places where arithmetic operations are performed. In terms of the amount of code to rewrite, I guess the effort would probably be about the same as building a new one.
  • There have been several cases where making things more conservative led to unintended test failures. Each time, it takes time to check whether it’s reasonable.
  • In NewDA, I plan to cut down on several features. That should reduce the number of fixes needed.

It doesn’t make much sense to me to keep maintaining features in upstream that are only used downstream. Considering the maintenance cost, I think it would be better to remove them.

I’d really appreciate if you could also make an effort to look for them yourself, if you truly intend to fix all the bugs.

Yes, I will also do code reviews of DA code as you mentioned here and in the past.

If you reimplement this, we will have 1 person (i.e. you) who will fully understand that new pass. That isn’t a lot more than the number of people who are currently reviewing DA patches, who have understanding of certain areas but not necessarily the whole thing. There are a couple of folks who are actively reviewing DA patches; I would say that we are getting things done and are getting patches through code review.

We could ask the question if we have a maintainer for DA, and/or if someone would like to step up, if that helps addressing your concern.

I don’t fully agree with that assessment. I would say that we are looking at this from different angles, which is a good thing. E.g., I am heavily testing it, running a lot of testing and fuzzing, and found two more issues #159979 and #159990 after a weekend of fuzzing. Does it help if we present a test plan? For example, we are planning to run an extensive Fortran test suite regularly.

If these were your two main arguments for a reimplementation, then I am not convinced. My preference would be to avoid fragmentation and work together on this. So, for me, the preferred way forward is to discuss:

  • ownership,
  • testing,
  • and how to get DA enabled and running by default with an MVP.

I think this is the shorter route to success than a reimplementation.

You can easily cut down on several features in the current DA implementation. That was the main reason to suggest to put things under options, not to support downstream users forever, but to be kind to our users which we do have and give them some support short term.


As you said, if NewDA becomes a reality, it’s true that I may be the only one who fully understands it (though ideally, someone would step up to collaborate on the implementation). Still, I believe that’s better than having no one at all. Moreover, the goal of NewDA is not to implement something entirely different from DA. Rather, it aims to be a subset of DA, with a simpler and more comprehensible implementation, at the cost of reduced analytical capabilities. I believe this helps people understand what the pass is doing and improve it if they want to.

One of the current issues with DA is that, while we can somewhat understand what a fragment of a function is trying to do, the overall behavior (including its preconditions and/or assumptions) remains unclear. It seems that recent patches to DA have been applied to isolated fragments rather than addressing the function as a whole, which I think is why code reviews tend to proceed smoothly.

For instance, the following is the comment block on DependenceInfo::strongSIVtest:

// strongSIVtest -
// From the paper, Practical Dependence Testing, Section 4.2.1
//
// When we have a pair of subscripts of the form [c1 + a*i] and [c2 + a*i],
// where i is an induction variable, c1 and c2 are loop invariant,
//  and a is a constant, we can solve it exactly using the Strong SIV test.
// 
// --- omitted ---
//
bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
                                   const SCEV *DstConst, const Loop *CurSrcLoop,
                                   const Loop *CurDstLoop, unsigned Level,
                                   FullDependence &Result,
                                   Constraint &NewConstraint) const {

The comments state that a is a constant, but as you can see from the function’s arguments, the corresponding parameter Coeff is not necessarily constant. Over years of development, it seems that consistency between comments and implementation has been lost. While this level of inconsistency might be tolerable, there’s a risk that assumptions critical to correctness have also been lost. In NewDA, I plan to address these kinds of issues as well. For this specific case, for example, by introducing a wrapper class like struct MonotonicAffine { const SCEV *Coeff, *Const; };. I believe changes like this will make the code easier to reason about.

Yeah, I know you are running a fuzzer and found issues, not just those two. I think it’s great that you are doing that. However, I see fuzzing as an additional effort to gain extra confidence in correctness, whereas in principle, correctness should be ensured by reviewing the code itself. That’s why I referred to the “codebase” in my previous reply.
In fact, the DA bugs I’m referring to produce incorrect results without triggering any assertion errors, which I think a fuzzer should not be able to catch. Running more tests is certainly beneficial, but examining the code directly and ensuring the quality of its implementation is an essential process – and I haven’t seen anyone doing that for DA.

If you are talking about putting things under options, I don’t quite agree with that, because:

  • Such short-term solutions can increase complexity and make maintenance harder. I don’t think this is desirable for a pass that’s enabled by default.
  • Even if it’s easy to implement, it’s non-trivial to ensure that all buggy parts are truly disabled. Moreover, I think it’s quite possible for such a flag to be accidentally cleared or turned on.

If we’re really going to run DA in the default pipeline, I’d definitely prefer to delete all suspicious parts rather than just disable them. However, as you mentioned, that would affect downstream users. And more broadly, upstream passes that rely on DA as well. This is also one of the reasons I proposed introducing a new pass. It should allow for a gradual migration.
In addition, some of the issues I listed can’t be resolved by such options. For example, I want to change the argument types of certain functions (e.g., strongSIVtest) from SCEV * to APInt.

Lastly, if the main reason for opposing the introduction of a new pass is simply to avoid fragmentation, then honestly, I’m okay with a split. My top priority is to have the pass enabled in a “good” state, and I genuinely believe that re-implementing a new pass is the best way to achieve that.

I am not opposing it, because I can’t: you’re free to reimplement anything you like.
The only thing I’m saying is that I don’t understand it, because I haven’t seen convincing arguments, and all the things you mentioned about DA will be issues for NewDA as well. So fragmentation isn’t my main reason to ask questions, it’s the lack of understanding.

I think fragmentation is a big thing though if “resources” are limited and not to be underestimated. I.e., you would like to get reviews for NewDA if you’re going to upstream that, so that is going to eat up bandwidth from reviewers and the community, and as we noticed while doing this work so far, there aren’t many. And then we end up with 3 implementations doing slightly different things, DA, LAA, NewDA, and then we need to have a migration and that all has a cost too. You’re saying that the cost of this fragmentation is all worth it; I am saying that this is unclear to me.

You haven’t commented on ownership, testing, a minimal viable product, etc., but it looks like you’ve decided on a reimplementation. Anyway, I will repeat that my preference is to work together on this, that is still the case, and that offer still stands.

Could you clarify what you mean by “ownership”? Do you mean something like a maintainer for DA?

Regarding other points, I believe I’ve mostly addressed them already, but since that may not have come across clearly, let me explain once again.

  • Testing: (Assuming you’re referring to what kinds of tests to run) My point is that running tests alone is not sufficient. We need to examine and understand the quality of the implementation (i.e., the code) itself. (I believe similar points have been raised multiple times in previous discussions…)

  • Minimal viable product: I think it is the current DA with (1) necessary validations added (monotonicity checks, overflow checks) and (2) features whose correctness is unclear removed (e.g., constraint propagation). Applying these changes to the current DA affects its regression tests and other passes, and I don’t want to spend much time investigating those regressions every time I make changes to DA. To be clear, this MVP is also what NewDA aims for as its initial goal.

Yes, we need to tackle the same issues whether we choose DA or NewDA. However, I think there is a difference between “incrementally implementing while preserving soundness (i.e., developing NewDA)” and “fixing the correctness issues while paying attention to regressions (i.e., fixing DA)”. I’ve been pursuing the latter, but I’m starting to think the former might be more efficient, especially given that the underlying issues in DA are still not fully understood. FWIW, I actually tried adding monotonicity checks in https://github.com/llvm/llvm-project/pull/154527, but I’ve paused my work because the regression tests react too sensitively to even minor changes, making it hard to proceed.

I think LAA should be separated from DA/NewDA, and that one of the goals of NewDA is to replace the existing DA and eventually remove it. If we can remove the current DA, I believe this cost would be worth it.

Regarding resources, it’s getting to the point where I no longer see much value in allocating them to the current DA, since I’m not confident things will go well if we continue as is. If we proceed with DA development for default enablement, I believe we should bite the bullet and remove some features. Personally, I’m fine with this approach instead of introducing a new pass, provided the resulting degradation is acceptable.

How about we disable those features under a flag, and over time re-enable them as we get more comfortable with each one?

Here’s the code: [DA] Move constraint propagation under flag disabled by default by sebpop · Pull Request #160924 · llvm/llvm-project · GitHub

Some thoughts on this:

  • I think it’s important that we work together towards a common goal of enabling DA-based optimizations by default. I don’t think we’ll have a good outcome if one person works towards this by implementing a new, minimal DA implementation, while other people ignore that effort and work on existing DA instead. We should try to align on a common direction, whatever that direction may be.
  • NewGVN has been mentioned a few times as a bad precedent for this kind of rewrite. I believe that NewGVN is a bad example, because it is not a simplified rewrite of a pass, but an implementation of a completely different algorithm. The more relevant precedent would be LoopUnswitch vs SimpleLoopUnswitch. Notably, that one was a successful replacement, and more in line with what is being proposed here. (The proposed pass is really SimpleDA, not NewDA.)
  • While it is good to avoid unnecessary regressions for downstream users of DA, I believe we need to prioritize making the changes necessary to enable DA in upstream LLVM. If the most expedient way toward that is to delete significant chunks of the DA implementation, we should do that, even if it may regress some downstream use cases.

I think that as long as there is a willingness to remove and/or disable parts of the DA implementation while working towards default enablement, it’s probably better to keep working on improving the existing pass, rather than starting a from-scratch implementation.
