[RFC] A DataFlow Analysis Framework

Mogball · June 21, 2022, 9:11pm

Hey everyone,

I would like to propose the addition of a more generalized data-flow analysis framework to MLIR. Currently in MLIR, we have the ForwardDataFlowAnalysis class which makes it really easy to define a sparse dataflow analysis in MLIR that works with block and region based control-flow and the call graph. But beyond that, it doesn’t provide a lot of help in writing analyses. Beyond being general, the goals of the framework are to be extensible, debuggable, and composable. Composability isn’t at 100% – I’m still trying to hash it out (more on that later) without butchering the API or performance, so any advice would be appreciated

High-Level Structure

The framework consists of mainly three pieces: DataFlowSolver, DataFlowAnalysis, and AnalysisState. An analysis state represents information computed by an analysis about a program point; for example, the constant value of an op result. The class ProgramPoint represents “things in the IR” such as operations, SSA values, and blocks, but also less concrete things such as “the state of memory locations after the execution of an operation”.

Dataflow analyses subclass DataFlowAnalysis and have a broad mandate in what they can do, but typically they are responsible for setting up dependencies between analysis states and implementing transfer functions between them. Finally, DataFlowSolver does the memory and dependency management of the analyses and runs the fixed-point iteration. The point is that multiple dataflow analyses can run at the same time (and interact with each other) in the same solver instance.

You can read more details about the design in this doc update, but the overall surface looks something like:

struct MyAnalysisState : public AnalysisState {
  /// Some information about the IR.
  int state;
};

struct MyAnalysis : public DataFlowAnalysis {
  LogicalResult initialize(Operation *topLevelOp) override {
    // Set up dependency graph.
  }
  LogicalResult visit(ProgramPoint point) override {
    auto *op = point.get<Operation *>();
    // Transfer function from operands to results.
  }
};

void MyPass::runOnOperation() {
  DataFlowSolver solver;
  solver.load<MyAnalysis>();
  solver.load<MyOtherAnalysis>();
  if (failed(solver.initializeAndRun(getOperation())))
    return signalPassFailure();
  // Query and use results
}

Extending and Composing Analyses

Building analyses from scratch is a lot of code. That’s why the framework ships with utilities for implementing common analysis kinds. For example, SparseDataFlowAnalysis is, on the surface, almost the same as ForwardDataFlowAnalysis, so existing analyses should be easy to port over. DenseDataFlowAnalysis implements the structure of a dense analysis across the IR. It can be used to write, for example, analysis that need to understand memory or other side effects.

The goal of extensibility and composability in this regard is that users don’t have to reimplement or modify out-of-tree these utilities if their dialects/IRs don’t fit the mold perfectly. E.g. their region control-flow constructs differ from RegionBranchOpInterface. Users can extend the framework (including defining new program point kinds) to write their own analyses that compose organically with existing ones.

A common example is using SCCP as a baseline for writing other kinds of conditional analyses:

LogicalResult MyConditionalAnalysis::visit(ProgramPoint point) {
  auto *block = point.get<Block *>();
  if (!getOrCreate<Executable>(block)->isLive()) {
    // Unreachable code. Ignore this block.
    return success();
  }
  // Do some analysis.
}

void MyPass::runOnOperation() {
  DataFlowSolver solver;
  // These two together make SCCP.
  solver.load<DeadCodeAnalysis>();
  solver.load<SparseConstantPropagation>();
  // Load our conditional analysis.
  solver.load<MyConditionalAnalysis>();
}

Another example is running the recently developed integer value range analysis (by @krzysz00) in tandem with SCCP. There is ongoing work on implementing taint analysis in MLIR using a pair of sparse and dense analyses, for example.

Performance

I compared the implementation of SCCP with this framework with the existing SCCP implementation, since this is an important baseline to meet.

On large programs with very little constant folding (<1% of operations fold; infrequent dependency updates and fewer overall iterations), the new framework performs significantly better (~16s vs. forever in one example and ~5s vs. ~24s in another).
On large programs with a lot of constant folding (~92% of operations fold; very frequent dependency updates), the new framework is almost as good as the old one, coming within 5% (~4.2s vs ~4s). This is despite the old SCCP implementation being monolithic and the new one being two separate analyses working in tandem. More optimizations can be made, but I grabbed the low-hanging fruit to reach performance parity first.

The fixed-point iteration right now is as basic as possible, and so there is plenty of room for exploring more novel techniques in literature as well.

Almost Composability

One key feature of composability of analyses is that multiple analyses can provide information to the same analysis state. This works fine except for optimistic analyses. Suppose analyses A and B are both trying to determine potential values of state S. What happens if A gives up and marks the state as overdefined? The information from B will be lost. One solution is to have S := S_a U S_b, where S only goes to overdefined when both S_a and S_b do. But this approach, API-wise and performance-wise, is more invasive than I would like.

Why not Templates?

MLton and SPARTA both implement static analyzers with heavy/extreme use of template programming. Why not do the same for this framework?

This kind of template-heavy programming is not consistent with how the rest of MLIR is built, which is a mix of templates and runtime type info through TypeID.
The current framework supports a dynamic dependency graph, one of the reasons why its SCCP performs better on the aforementioned “large program with infrequent constant folding”.
It seems like a lot of work.

Debuggability

Basically, all the important data structures implement operator<<. With -debug-only="dataflow", the solver prints a trace of invoking analyses, changes to the dependency graph, and propagating state updates. It has thus far been very useful in helping me debug my own code

In-Flight Patches

I threw up a series of WIP patches a while ago adding the framework, dead code analysis, sparse analysis (also swaps out the SCCP implementation), and dense analysis. I have a patch in progress that swaps out the integer value range implementation as well. The first one actually landed already (whoops) but I hope to iterate on the framework continuously.

regehr · June 22, 2022, 2:35am

This is very cool! Regarding composing analyses, there’s a lot of work out there on this (and for example of course SCCP is an optimistic composition of two other analyses).

Mogball · June 22, 2022, 6:39am

Please share any relevant work/papers you know! I didn’t reference any for this implementation but it’d good to have ideas going forward.

rengolin · June 22, 2022, 8:42am

Dumb question: does this mean the passes will run in parallel, and if so, how?

From your example:

Mogball:

void MyPass::runOnOperation() {
  DataFlowSolver solver;
  // These two together make SCCP.
  solver.load<DeadCodeAnalysis>();
  solver.load<SparseConstantPropagation>();
  // Load our conditional analysis.
  solver.load<MyConditionalAnalysis>();
}

I read that code as if the two other passes will run before MyConditionalAnalysis. But it’s not clear if they’ll run before on all ProgramPoints and then go to the next pass, or interleaved, all passes on one ProgramPoint at a time, repeat.

The former is more traditional and easy to compose, the latter is bolder and can lead to more interesting cases, but is also much harder to handle, especially if the other passes independently change semantics with time.

mehdi_amini · June 22, 2022, 10:32am

The intent is that they will run together, compose, and allow to iterate until fixed-point while two different analysis can refine each others’ results. That is the framework tracks dependencies between state update across analyses.

regehr · June 22, 2022, 3:45pm

Ok, I’ll dredge stuff up sort of slowly, here’s a first batch. This has long been sort of an attractive idea and so there’s quite a bit out there and it would probably be good for anyone working on this to at least skim these.

PAG is totally worth reading about:
https://sci-hub.se/https://doi.org/10.1007/s100090050017

of course the old SCCP paper is a must-read:
https://dl.acm.org/doi/pdf/10.1145/201059.201061

the theory of how abstract domains learn from each other is super good stuff, but a lot of it is pretty tough reading, this might or might not be a good place to start:

and then a few more papers I’m less familiar with:

http://projectsweb.cs.washington.edu/research/projects/cecil/www/Papers/engines.html

regehr · June 22, 2022, 3:49pm

one more thing: my group worked on a dataflow framework a while ago and it was sometimes pretty painful to debug, but we came up with a debugging technique that ended up being super useful. basically you run all of the analyses to completion and then instead of feeding the analysis results to the optimizer, we instead turned every dataflow fact into an assertion. so if an integer range analysis concluded the x was in [1…32] at some program point, we simple dropped an assertion into the code at that location assert(x >=1 && x <= 32); then you run the assert-heavy program and watch for assertion violations. we found more bugs this way than with any other technique and I’m not convinced our stuff would have worked on large programs if we’d not done this.

jpienaar · June 22, 2022, 3:58pm

Ooh that’s a good idea!

_sean_silva · June 22, 2022, 5:08pm

I would love to have this available!

+1 on compositionality and non-templatedness. For these things, the visitation order and spatial locality of cooperating analyses will dominate the macroscopic performance. Having a solid baseline performance + the right abstractions to innovate in those domains feels most important to me on the performance front vs more templating.

krzysz00 · June 23, 2022, 8:22pm

@Mogball Re composability: What if, instead of having the set of states of (Do analysis A and analysis B) be S_A U S_B, we went with S_A x S_B. Then, each analysis has its own overdefined state, and overdefined for the whole thing is (overdefined(A), overdefined(B))

Mogball · June 24, 2022, 4:36pm

Ah, yes that’s an interesting idea. To get a single value, you would intersect the individual states. It makes sense to me that overdefined x Sb => Sb.

But I wonder what should happen if analyses disagree? E.g. if A says S is 3 : i32 but B says 4 : i32, should S be overdefined? But if A and B were run independently, then the analysis would conclude different results. This sounds like a correctness problem in the analyses but it’s weird to think about.

For integer range analysis, if A says [1,4] and B says [2,6], does it make sense for S to be [2,4] instead of [1,6]? What happens if the ranges don’t overlap? E.g. [1,2] and [3,4] (maybe also a correctness problem).

The problem I had came with book keeping. Now every analysis state needs to track substates. Ideally I would like a way for states to opt out of the mechanism if they don’t need it (they aren’t a 3 way lattice e.g. liveness). And analyses need to “declare” their outputs so that the framework can allocate slots.

I can try this again and hopefully make the API look nice too (e.g. not passing TypeID everywhere).

regehr · June 24, 2022, 5:00pm

As long as we’re talking about overapproximating static analyses (and so far we are, I believe), then if one analysis concludes that S = { 3 } and another concludes that S = { 4 }, then we know that S = {}, using set intersection as you say. This is not necessarily a correctness bug, it could instead indicate an unreachable program point. Using the debug strategy that I mentioned above, S = {} turns into assert(false).

Mogball · June 24, 2022, 5:50pm

Hah. That really is a neat trick!

What bothers me is that if only one analysis was running, it would conclude S = {3} and so we’d get assert(a == 3). Does this necessarily mean the code was always unreachable anyways?

regehr · June 24, 2022, 6:18pm

Yeah, there’s always imprecision in these analyses, so if we just ran the one analysis we’d simply end up placing an assertion into code that happens to be dead. No problem!

regehr · June 24, 2022, 6:51pm

An issue I’d like to bring up is that dataflow analyses are highly varied, they come in a lot of flavors. For example there’s forward vs backwards, sparse vs dense, non-relational vs. relational, overapproximating vs underapproximating, path sensitive or not, context sensitive or not, flow sensitive or not, interprocedural or not, and probably a few other things I’m forgetting.

Individual dataflow analyses make these decisions on sort of an ad hoc basis and that’s fine, but a framework should be more systematic. So I think it’s important to consider, in advance, which flavors are going to be supported by the new framework. For example, it’s fine to rule out path and context sensitivity, these are usually very expensive. On the other hand, it’s most likely desirable to support both forward and backward analyses (for example, regarding the analyses that are in LLVM, known bits is forwards and demanded bits is backwards).

Separately, when I teach this stuff, this is the reading material that I always use; I consider it the best available introduction to all of these issues:

krzysz00 · June 24, 2022, 7:30pm

To back up some, because I’m not sure I get it - why do we need composition of analyses to be a thing the solver knows about?

I can see two cases:

If we have analysis A(program) and B(program, A(program)), then we can just run A and feed its results to B.
If we have analyses A and B that are independent of each other, we can run them both and track their states separately?

krzysz00 · June 24, 2022, 7:31pm

And wrt path-sensitive … it’d be nice if we could rig up the range analysis to understand how select and if works, but that’s a bunch of effort

Mogball · June 24, 2022, 7:42pm

Interesting that you bring that up, because that’s the whole point of composability. Integer range analysis and SCCP can work together, which would make integer range analysis path sensitive. If an IntegerValueRange of an SSA value can be narrowed to a constant, it sets the ConstantValue of the SSA value which can be used by DeadCodeAnalysis to determine which branches are live. At the same time, sparse constant propagation can be running that just calls fold, and also set the integer ranges of SSA values known to be an integer constant.

I.e., both sparse constant propagation and integer range analysis are potentially producing values for ConstantValue and IntegerValueRange.

krzysz00 · June 24, 2022, 7:44pm

I’ll note that integer range analysis has the same branch-pruning behavior as SCCP, so that interplay doesn’t need to be modeled as two passes working back and forth interactively

krzysz00 · June 24, 2022, 7:48pm

Perhaps the solution here is to optionally allow join()-ing in values from another analysis on an opt-in basis. That is, there’ll be an SCCP::tryJoin(ConstantValue, ConstantIntRanges) and vice-versa

Topic		Replies	Views
[RFC] A dataflow analysis framework for Clang AST Clang Frontend	16	2285	November 19, 2021
[open-analysis] Alias Analysis Design & Implementation and LLVM LLVM Dev List Archives	4	240	November 10, 2003
A survey of dataflow analyses in Clang Clang Frontend	1	284	October 6, 2020
[RFC] Abstracting over SSA form IRs to implement generic analyses LLVM Dev List Archives	12	184	June 9, 2021
Interprocedural data flow analysis Clang Frontend	2	138	August 26, 2008