Hey everyone,
I would like to propose the addition of a more generalized data-flow analysis framework to MLIR. Currently, MLIR has the `ForwardDataFlowAnalysis` class, which makes it really easy to define a sparse dataflow analysis that works with block- and region-based control flow and the call graph. Beyond that, however, it doesn't provide much help in writing analyses. In addition to being general, the goals of the framework are to be extensible, debuggable, and composable. Composability isn't at 100% – I'm still trying to hash it out (more on that later) without butchering the API or performance, so any advice would be appreciated.
High-Level Structure
The framework consists of three main pieces: `DataFlowSolver`, `DataFlowAnalysis`, and `AnalysisState`. An analysis state represents information computed by an analysis about a program point; for example, the constant value of an op result. The class `ProgramPoint` represents "things in the IR", such as operations, SSA values, and blocks, but also less concrete things such as "the state of memory locations after the execution of an operation".
Dataflow analyses subclass `DataFlowAnalysis` and have a broad mandate in what they can do, but typically they are responsible for setting up dependencies between analysis states and for implementing the transfer functions between them. Finally, `DataFlowSolver` handles the memory and dependency management of the analyses and runs the fixed-point iteration. The point is that multiple dataflow analyses can run at the same time (and interact with each other) in the same solver instance.
You can read more details about the design in this doc update, but the overall surface looks something like:
```c++
struct MyAnalysisState : public AnalysisState {
  /// Some information about the IR.
  int state;
};

struct MyAnalysis : public DataFlowAnalysis {
  LogicalResult initialize(Operation *topLevelOp) override {
    // Set up the dependency graph.
    return success();
  }

  LogicalResult visit(ProgramPoint point) override {
    auto *op = point.get<Operation *>();
    // Transfer function from operands to results.
    return success();
  }
};

void MyPass::runOnOperation() {
  DataFlowSolver solver;
  solver.load<MyAnalysis>();
  solver.load<MyOtherAnalysis>();
  if (failed(solver.initializeAndRun(getOperation())))
    return signalPassFailure();
  // Query and use the results.
}
```
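As a follow-on, querying the computed states might look like the sketch below. It relies on `DataFlowSolver::lookupState` to retrieve the state attached to a program point (returning null if none was ever created); `MyAnalysisState` is the toy state from above.

```c++
// Minimal sketch of querying results after initializeAndRun() succeeds,
// assuming lookupState returns null for points that were never visited.
getOperation()->walk([&](Operation *op) {
  for (Value result : op->getResults()) {
    if (const auto *state = solver.lookupState<MyAnalysisState>(result))
      llvm::dbgs() << "computed state: " << state->state << "\n";
  }
});
```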
Extending and Composing Analyses
Building analyses from scratch is a lot of code. That's why the framework ships with utilities for implementing common kinds of analyses. For example, `SparseDataFlowAnalysis` is, on the surface, almost the same as `ForwardDataFlowAnalysis`, so existing analyses should be easy to port over. `DenseDataFlowAnalysis` implements the structure of a dense analysis across the IR; it can be used to write, for example, analyses that need to understand memory or other side effects.
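As a rough sketch of what a ported sparse analysis could look like, assuming the utility is parameterized on a lattice class and exposes a per-operation transfer-function hook (the exact names and signatures may differ):

```c++
// Hedged sketch: a sparse analysis over a per-value lattice. The framework
// utility is assumed to track dependencies and re-invoke visitOperation
// whenever an operand lattice changes.
struct MyLatticeValue { /* domain-specific lattice value */ };

struct MySparseAnalysis
    : public SparseDataFlowAnalysis<Lattice<MyLatticeValue>> {
  using SparseDataFlowAnalysis::SparseDataFlowAnalysis;

  // Transfer function: compute result lattices from operand lattices.
  void visitOperation(
      Operation *op, ArrayRef<const Lattice<MyLatticeValue> *> operands,
      ArrayRef<Lattice<MyLatticeValue> *> results) override {
    // ... join information from `operands` into `results` ...
  }
};
```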
The goal of extensibility and composability here is that users don't have to reimplement these utilities, or modify them out-of-tree, when their dialects/IRs don't fit the mold perfectly, e.g. when their region control-flow constructs differ from `RegionBranchOpInterface`. Users can extend the framework (including defining new kinds of program points) to write their own analyses that compose organically with existing ones.
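For instance, defining a new kind of program point might look roughly like this; the sketch assumes new points derive from a CRTP base (here called `GenericProgramPointBase`) that uniques each point by its contents:

```c++
// Hypothetical sketch: a program point for a control-flow edge between two
// blocks, uniqued by the (from, to) pair. Analyses could then attach states
// to edges just like they do to ops, values, or blocks.
class MyCFGEdge
    : public GenericProgramPointBase<MyCFGEdge, std::pair<Block *, Block *>> {
public:
  /// The block the edge leaves from.
  Block *getFrom() const { return getValue().first; }
  /// The block the edge enters.
  Block *getTo() const { return getValue().second; }
};
```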
A common example is using SCCP as a baseline for writing other kinds of conditional analyses:
```c++
LogicalResult MyConditionalAnalysis::visit(ProgramPoint point) {
  auto *block = point.get<Block *>();
  if (!getOrCreate<Executable>(block)->isLive()) {
    // Unreachable code. Ignore this block.
    return success();
  }
  // Do some analysis.
  return success();
}
```
```c++
void MyPass::runOnOperation() {
  DataFlowSolver solver;
  // These two together make SCCP.
  solver.load<DeadCodeAnalysis>();
  solver.load<SparseConstantPropagation>();
  // Load our conditional analysis.
  solver.load<MyConditionalAnalysis>();
  if (failed(solver.initializeAndRun(getOperation())))
    return signalPassFailure();
}
```
Another example is running the recently developed integer value range analysis (by @krzysz00) in tandem with SCCP. As another, there is ongoing work on implementing taint analysis in MLIR using a pair of sparse and dense analyses.
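Since analyses communicate only through the solver, combining them could be as simple as loading them together. This sketch assumes the range analysis is exposed as a loadable `IntegerRangeAnalysis` class:

```c++
DataFlowSolver solver;
// The two halves of SCCP...
solver.load<DeadCodeAnalysis>();
solver.load<SparseConstantPropagation>();
// ...plus the integer value range analysis, sharing liveness and constant
// information through the same solver instance.
solver.load<IntegerRangeAnalysis>();
if (failed(solver.initializeAndRun(getOperation())))
  return signalPassFailure();
```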
Performance
I compared an SCCP implementation built on this framework against the existing SCCP implementation, since this is an important baseline to meet.
- On large programs with very little constant folding (<1% of operations fold; infrequent dependency updates and fewer overall iterations), the new framework performs significantly better (~16s vs. forever in one example and ~5s vs. ~24s in another).
- On large programs with a lot of constant folding (~92% of operations fold; very frequent dependency updates), the new framework is almost as good as the old one, coming within 5% (~4.2s vs ~4s). This is despite the old SCCP implementation being monolithic and the new one being two separate analyses working in tandem. More optimizations can be made, but I grabbed the low-hanging fruit to reach performance parity first.
The fixed-point iteration right now is as basic as possible, and so there is plenty of room for exploring more novel techniques from the literature as well.
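Conceptually, the iteration is just a worklist loop over (program point, analysis) pairs. The sketch below uses made-up names (`WorkItem`, `worklist`) and is not the actual solver internals:

```c++
// Hedged sketch of the basic fixed-point loop. A state change during
// visit() is assumed to re-enqueue every (point, analysis) work item that
// was registered as depending on that state.
std::queue<WorkItem> worklist; // WorkItem = {ProgramPoint, DataFlowAnalysis *}
while (!worklist.empty()) {
  auto [point, analysis] = worklist.front();
  worklist.pop();
  if (failed(analysis->visit(point)))
    return failure();
}
```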
Almost Composability
One key feature of composability of analyses is that multiple analyses can provide information to the same analysis state. This works fine except for optimistic analyses. Suppose analyses A and B are both trying to determine potential values of state `S`. What happens if A gives up and marks the state as overdefined? The information from B will be lost. One solution is to have `S := S_a ∪ S_b`, where `S` only goes to overdefined when both `S_a` and `S_b` do. But this approach, API-wise and performance-wise, is more invasive than I would like.
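To make the union idea concrete, a joined state could look like the hypothetical sketch below; `JoinedState` and `SubState` are made up for illustration and not part of the framework:

```c++
// Hypothetical sketch: each contributing analysis owns one sub-state, and
// the joined state only becomes overdefined once *every* sub-state has
// given up, so one pessimistic analysis can't clobber the others.
struct JoinedState : public AnalysisState {
  SmallVector<SubState, 2> subStates; // e.g. S_a and S_b

  bool isOverdefined() const {
    return llvm::all_of(
        subStates, [](const SubState &sub) { return sub.isOverdefined(); });
  }
};
```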
Why not Templates?
MLton and SPARTA both implement static analyzers with heavy/extreme use of template programming. Why not do the same for this framework?
- This kind of template-heavy programming is not consistent with how the rest of MLIR is built, which is a mix of templates and runtime type info through `TypeID`.
- The current framework supports a dynamic dependency graph, one of the reasons why its SCCP performs better on the aforementioned "large program with infrequent constant folding".
- It seems like a lot of work.
Debuggability
Basically, all the important data structures implement `operator<<`. With `-debug-only="dataflow"`, the solver prints a trace of analysis invocations, changes to the dependency graph, and propagated state updates. It has thus far been very useful in helping me debug my own code.
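For example, hooking a custom state into that trace might look like this; the sketch assumes `AnalysisState` exposes a virtual print hook that the streaming operator forwards to:

```c++
// Hedged sketch: give MyAnalysisState a readable form in the
// -debug-only="dataflow" trace. The print(raw_ostream &) hook is an
// assumption about the exact API surface.
void MyAnalysisState::print(raw_ostream &os) const {
  os << "MyAnalysisState(" << state << ")";
}
```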
In-Flight Patches
I threw up a series of WIP patches a while ago adding the framework, dead code analysis, sparse analysis (which also swaps out the SCCP implementation), and dense analysis. I have a patch in progress that swaps out the integer value range implementation as well. The first one actually landed already (whoops), but I hope to iterate on the framework continuously.