Summary
There are several issues with the DependenceAnalysis (commonly referred to as DA) pass. This RFC proposes introducing a new pass as an alternative to DA, rather than attempting to fix the existing implementation. While the new pass may offer reduced analytical capabilities, it is expected to be simpler and easier to reason about in terms of correctness.
Background
A recent investigation into the DependenceAnalysis (DA) pass has revealed several bugs. While some of these are minor, others are fundamental in nature. These issues prevent DA from being enabled by default. I've been working on addressing them, but in the process I've come to believe that implementing a new pass from scratch might be easier than fixing the existing one.
What DA does
To clarify the details of the issues, let me briefly explain what DA does. The algorithm underlying DA is described in the paper Practical Dependence Testing.
DA analyzes loops and detects loop-carried dependencies between two memory operations.
Consider the following loop:
```c
for (I = 0; I < 100; I++) {
  A[2 * I + 0] = 0;
  A[2 * I + 1] = 1;
}
```
To explicitly indicate that the two instructions may belong to different iterations, we denote the first store as `A[2 * I0 + 0]` and the second as `A[2 * I1 + 1]`. DA analyzes under what conditions (specifically, the relationship between `I0` and `I1`) the two stores may access the same memory location. In this case, `2 * I0 + 0` always yields an even number, while `2 * I1 + 1` always yields an odd number. Therefore, DA concludes that there is no dependency between these stores.
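DA reaches this conclusion symbolically, but for a fixed trip count we can confirm it by brute force. The following sketch (purely illustrative, not how DA works internally) checks every pair of iterations of the loop above:

```c
// Check whether any pair of iterations (i0, i1) makes the two stores'
// subscripts collide. Since 2*i0 + 0 is always even and 2*i1 + 1 is
// always odd, no collision is possible.
static int has_dependence(void) {
    for (int i0 = 0; i0 < 100; i0++)
        for (int i1 = 0; i1 < 100; i1++)
            if (2 * i0 + 0 == 2 * i1 + 1)
                return 1; // some pair accesses the same element
    return 0; // the subscript sets are disjoint
}
```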
Here is another example:
```c
for (I = 0; I < 100; I++)
  for (J = 0; J < 100; J++) {
    A[I][J + 1] = 0;
    A[I + 1][J] = 1;
  }
```
Similarly, we denote the first access as `A[I0][J0 + 1]` and the second as `A[I1 + 1][J1]`. In this case, there exists a loop-carried dependency between these stores; e.g., both of them write to `A[1][1]` at some point. However, based on the relations `I0 = I1 + 1` and `J0 + 1 = J1`, DA can deduce the dependency distances, which in this case are denoted as `[1 -1]`. These relations also imply the dependency directions, represented as `[< >]`. That is, DA analyzes not only the existence of dependencies but also their distance and direction (IIUC, LoopAccessAnalysis should perform similar analysis as well).
As demonstrated above, DA analyzes the relationships between subscripts (or indices) for each dimension when accessing multi-dimensional arrays (including 1D arrays). What's important here is that DA handles inequalities and performs algebraic transformations.
Issues in DA
In short, the core issue lies in the gap between the mathematical foundations of the underlying algorithm and the semantics of LLVM IR. The main bugs that have actually been observed are as follows. Note that the analysis is based on SCEV, where subscripts are generally represented using `SCEVAddRecExpr`.
Subscript wrapping not accounted for
This is arguably the most critical issue. The underlying algorithm of DA, as described in the referenced paper, assumes that each subscript is either loop-invariant or changes by a constant step during loop execution. In other words, it treats the domain of subscripts as the set of all integers. However, in LLVM, the index type has a finite bit width, so subscripts may wrap around, violating the algorithmās assumptions and potentially leading to incorrect results. At a minimum, we must ensure that subscripts do not wrap during loop execution.
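As a concrete illustration (a toy example of mine, not taken from an actual miscompile), consider the subscript `{0,+,64}<%loop>` over an 8-bit index type. Over the integers, `64 * i` is injective, but modulo 2^8 it repeats every 4 iterations, so iterations the paper's algorithm would consider independent can actually touch the same address:

```c
#include <stdint.h>

// The subscript {0,+,64}<%loop> evaluated in an 8-bit index type:
// the value wraps modulo 2^8, so distinct iterations can produce the
// same subscript value.
static uint8_t subscript(unsigned iter) {
    return (uint8_t)(64u * iter); // wraps modulo 2^8
}
```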
Several factors further complicate this issue:
- DA also analyzes expressions of the form `%A * %i + %B`, which would look like `{%B,+,%A}<%loop>` in SCEV, where `%A` and `%B` are loop-invariant and `SCEVUnknown`. Handling non-constant values introduces additional complexity.
- In expressions like `{%B,+,(%X * %Y)}<nsw><nuw><%loop>`, even if the AddRec itself doesn't wrap, `%X * %Y` may still wrap. It remains unclear how such cases affect the correctness of DA.
Mixed interpretation of signed and unsigned
DA generally treats values as signed. However, certain values, such as the backedge-taken count (BTC), should typically be interpreted as unsigned. Due to this inconsistency, inequalities like `x < BTC` (in the mathematical sense) may be evaluated as `x <s BTC`. When BTC holds a large value (i.e., out of range for signed integers), interpreting it as signed makes it appear negative, potentially leading to incorrect results.
Intermediate computation overflow
Similar to subscript wrapping, intermediate computations during the analysis can themselves overflow. Presumably, explicit overflow checks should be inserted for almost all arithmetic operations using `APInt` in DA. This issue becomes even more complex when dealing with `SCEVUnknown` as well (as an experiment, I tried inserting `ScalarEvolution::willNotOverflow` checks somewhat arbitrarily, and many tests failed...).
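For a sense of what "checked arithmetic" means here, the following is a minimal sketch using compiler builtins; it is similar in spirit to, but not the same as, the checks DA would need around its `APInt` arithmetic:

```c
#include <stdbool.h>
#include <stdint.h>

// Multiply two 64-bit signed values, reporting overflow instead of
// silently wrapping. Returns true on success and stores the product in
// *out; returns false if the mathematical result does not fit.
static bool checked_mul(int64_t a, int64_t b, int64_t *out) {
    return !__builtin_mul_overflow(a, b, out);
}
```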
Parametric delinearization
In LLVM, memory accesses are sometimes linearized, even those targeting multi-dimensional arrays (in particular, if I understand correctly, once the migration from GEP to `ptradd` is complete, all accesses will be linearized).

Hence DA performs a process called delinearization. For example, given an access to `i8` of the form `%base + (%i * 100 + %j * 10 + %k)`, where `%i`, `%j`, and `%k` are IVs, delinearization lets DA interpret it as an access `%base[%i][%j][%k]` to a 3D array of size `i8[?][10][10]`. By doing so, instead of analyzing the subscript expression `%i * 100 + %j * 10 + %k` directly, DA can treat `%i`, `%j`, and `%k` separately. This simplifies the analysis and improves precision. Range checks like `0 <= %j < 10` and `0 <= %k < 10` are required, and these are already performed in the existing DA. Note that the outermost dimension is written as `[?]` because the current implementation does not perform checks on that dimension.
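The range checks matter because without them the "separate" subscripts are not actually independent: if a subscript can leave its dimension, two different `(i, j, k)` tuples map to the same linear offset. A toy illustration (my own example, not DA code):

```c
#include <stdint.h>

// Linear offset into a notional i8[?][10][10] array. If j is allowed to
// escape its dimension (e.g. j = 12), the tuple (0, 12, 0) aliases the
// tuple (1, 2, 0), so treating i and j independently would be unsound.
static int64_t linearize(int64_t i, int64_t j, int64_t k) {
    return i * 100 + j * 10 + k;
}
```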
Similarly, the presence of `SCEVUnknown` complicates this issue as well. For example, an access like `%base + (%i * %M * %N + %j * %N + %k)` may be interpreted as an access `%base[%i][%j][%k]` to a 3D array of size `i8[?][%M][%N]`. While this may appear correct at first glance, issues can arise if `%M * %N` wraps.
For example, suppose the index type is 8-bit and `(%M, %N) = (2^4 - 1, 2^4)`. Then `%M * %N = 2^8 - 2^4`, which wraps to `-2^4` in 8 bits, so the original address computation becomes `%i * %M * %N + %j * %N + %k = -2^4 * %i + 2^4 * %j + %k = 2^4 * (%j - %i) + %k`. This means that when `%i = %j`, the same address is accessed, which implies that `%i` and `%j` cannot be treated independently.
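The 8-bit example above can be reproduced directly in C, computing the address modulo 2^8:

```c
#include <stdint.h>

// With (M, N) = (15, 16) in an 8-bit index type, M * N = 240 wraps to
// -16 modulo 2^8, so the offset collapses to 16 * (j - i) + k: any two
// tuples with i = j (and equal k) hit the same address.
static uint8_t offset(uint8_t i, uint8_t j, uint8_t k) {
    uint8_t m = 15, n = 16;
    return (uint8_t)(i * m * n + j * n + k); // all arithmetic modulo 2^8
}
```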
Proposal
Now I think it's impractical to address all of the issues described above. Therefore, I propose introducing a new pass written from scratch. For convenience, this RFC refers to it as NewDA.

The initial goal for NewDA is to achieve default enablement.

To be clear, there is currently nothing like a prototype; this is still in the conceptual stage.

NewDA is essentially intended to be a simplified subset of the existing DA, with two additions: a monotonicity check and a unified signed interpretation.
Add subscript wrap check (monotonicity check)
As described above, we need to guarantee that subscripts don't wrap. For convenience, this RFC refers to this property as monotonicity. There has been an attempt to introduce this into the existing DA (https://github.com/llvm/llvm-project/pull/154527), but progress has stalled due to the discovery of new corner cases and test regressions. NewDA will incorporate this (or a similar) mechanism from the outset.
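To make the idea concrete, here is a minimal sketch of such a check for the simple case NewDA targets: a subscript `{Start,+,Step}<%loop>` with a constant `Step` and a known, non-negative backedge-taken count, over a signed 64-bit index type. This is my own sketch of the concept, not the logic from the PR above; the real check must work on SCEV expressions, not plain integers:

```c
#include <stdbool.h>
#include <stdint.h>

// A {start,+,step} subscript with constant step is wrap-free over the
// loop iff its final value start + step * btc is computable without
// signed overflow: every intermediate value then lies between the first
// and last values, so none of them can wrap either.
static bool is_monotonic(int64_t start, int64_t step, int64_t btc) {
    if (btc < 0)
        return false; // BTC must be provably non-negative (signed view)
    int64_t total, last;
    if (__builtin_mul_overflow(step, btc, &total))
        return false; // step * btc already overflows
    if (__builtin_add_overflow(start, total, &last))
        return false; // the final subscript value overflows
    (void)last;
    return true;
}
```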
Unify all integer interpretations as signed
To simplify the implementation, all interpretations should be unified as either signed or unsigned. From a correctness standpoint, both are acceptable, but signed interpretation currently seems easier to implement. As a corollary, the analysis should bail out if DA cannot prove that BTC is non-negative (in signed interpretation).
Features to be removed in NewDA
Subscripts containing `SCEVUnknown`
Subscripts that contain `SCEVUnknown` as a sub-expression make the monotonicity and wrap checks difficult. In NewDA, we avoid handling subscripts like `{0,+,%X}<%loop>` and instead restrict the analysis to simpler forms such as `{0,+,42}<%loop>`, making these checks easier.
Parametric delinearization
As described above, there are issues with parametric delinearization. While this feature doesn't necessarily need to be removed if we can prove the absence of wrapping, for simplicity, I'd prefer to drop it in the initial implementation.
Constraint propagation
DA generally treats each dimension of an array independently. However, there is a mechanism called constraint propagation, which reflects information obtained from one dimension onto others. This involves rewriting subscript expressions. Although this behavior hasnāt been thoroughly investigated, such rewrites may break the monotonicity of the subscripts.
Notably, there exists a regression test named Propagating.ll, which appears to test this feature. However, as far as I could tell, constraint propagation was not triggered in this test (I confirmed this from the debug output). This process may not be particularly useful in the first place, and it's possible that omitting it would cause little to no issues.
Several test routines
DA performs dependence analysis by invoking a series of dependence-testing routines in order, such as `xxxSIV`, `xxxRDIV`, and others. I'd like to remove some of them, as they may affect correctness or offer limited utility.
At this point, I'm considering removing the following in NewDA:
- `symbolicRDIVtest`: I think symbolic analysis would be less useful in NewDA.
- `banerjeeMIVtest`: It effectively rewrites subscripts, which may violate monotonicity.
Comparison with existing DA
- Pros
  - Simpler implementation, making correctness easier to verify (at least compared to the existing DA).
  - Easier to maintain.
  - May reduce compile time.
- Cons
  - Lower analytical capability.
    - It's unclear how much this will impact real-world use cases.
Notably, even if all current bugs in DA were fixed, the following issues would likely remain:
- I think it's difficult to reliably prevent similar bugs from occurring in the future.
- Adding the missing validations may increase compile time.
Migration
Currently the following transformations use DA:
- LoopInterchange
- LoopUnrollAndJam
- LoopFuse
I don't intend to replace all of these uses with NewDA immediately. At the moment, I plan to replace only LoopInterchange with NewDA. Also, to make it easier to switch between DA and NewDA, I plan to derive the API of NewDA from that of the existing DA.
Alternatives
Remove certain features from the current DA
I'd prefer to avoid this option, since any change to the current DA would have to carefully avoid regressions across all passes that use it.
Attempt to fix all issues in the current DA
I think this is impractical, but I'm open to suggestions if anyone has ideas on how to address the issues with DA mentioned above.