Improve Operation::fold implementation

We have many dialects: those in TensorFlow, those in MLIR core, and others defined by MLIR users. One thing I find very troublesome is that we need to set the hasFolder flag for almost every operation in each dialect and manually implement the operation's computation logic in C++. Often our dialects already have CPU backends, so I think that if we had a JIT-like mechanism:

OpFoldResult AddOp::fold(ArrayRef<Attribute> operands) {
  // Hypothetical: dispatch to a JIT-compiled kernel instead of
  // hand-written C++ folding logic.
  return jit.run("add", operands[0], operands[1]);
}

We can reduce a lot of repetitive work.

The end goal is similar to what IREE does: we do an analysis of trees of constant expressions, apply a (currently primitive) cost model, and decide which leaves to evaluate at compile time (by recursively invoking the compiler with the reference runtime).
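The idea above can be sketched in a self-contained toy (all names here are mine, not IREE's): a tiny expression tree where fully constant subtrees are identified, a primitive op-counting cost model decides whether folding is worthwhile, and eligible subtrees are evaluated and replaced by constants.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy expression tree standing in for a constant-expression analysis.
struct Expr {
  enum Kind { Const, Add, Mul, Arg };
  Kind kind = Const;
  long value = 0;  // payload for Const nodes
  std::vector<std::unique_ptr<Expr>> operands;

  // A subtree is foldable only if no leaf is a runtime argument.
  bool isConstTree() const {
    if (kind == Arg) return false;
    for (auto &op : operands)
      if (!op->isConstTree()) return false;
    return true;
  }

  // Reference evaluation of a constant subtree.
  long eval() const {
    switch (kind) {
      case Const: return value;
      case Add: return operands[0]->eval() + operands[1]->eval();
      case Mul: return operands[0]->eval() * operands[1]->eval();
      default: assert(false && "not a constant tree"); return 0;
    }
  }

  // Primitive cost model: number of ops saved at runtime if folded.
  int cost() const {
    int c = (kind == Add || kind == Mul) ? 1 : 0;
    for (auto &op : operands) c += op->cost();
    return c;
  }
};

// Fold a subtree at compile time only when it is fully constant and
// the saved runtime cost meets a (made-up) threshold.
void foldIfWorthwhile(std::unique_ptr<Expr> &e, int threshold) {
  for (auto &op : e->operands) foldIfWorthwhile(op, threshold);
  if (e->kind != Expr::Const && e->isConstTree() && e->cost() >= threshold) {
    long v = e->eval();
    e = std::make_unique<Expr>();
    e->kind = Expr::Const;
    e->value = v;
  }
}
```

In a real compiler the "cost" would weigh code size, init time, and use-time savings rather than a bare op count, but the shape of the decision is the same.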

The folding hook is really not designed for large tensor constant folding: it needs to be cost neutral, and that is not something that can be evaluated locally or independently of the use case.

There are three parts to our solution, in increasing specificity to our project:

We actually do a two step hoist: we hoist constant trees into globals and initialize them in module initializers (this manner of globals and initializers uses iree ops that need to be upstreamed), rewriting their references into global loads. Then we jit module initializers rooted in immutable globals. In theory, this is another place a cost model could come into play (since init time vs at rest vs use time is a trade-off space), but currently this part is greedy.
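The two-step hoist described above can be illustrated in MLIR-flavored pseudo-IR (op names like util.global and util.initializer are approximations of the IREE ops mentioned, not exact upstream syntax):

```
// Step 1: the constant tree is hoisted into a global, initialized in a
// module initializer, and its uses are rewritten into global loads.
util.global private @hoisted : tensor<4xf32>
util.initializer {
  %0 = arith.constant dense<1.0> : tensor<4xf32>
  %1 = math.sqrt %0 : tensor<4xf32>
  util.global.store %1, @hoisted : tensor<4xf32>
  util.return
}
func.func @f(%arg0: tensor<4xf32>) -> tensor<4xf32> {
  %cst = util.global.load @hoisted : tensor<4xf32>
  %r = arith.addf %arg0, %cst : tensor<4xf32>
  return %r : tensor<4xf32>
}

// Step 2: the initializer is rooted only in immutable values, so it can
// be jitted at compile time and replaced by the evaluated constant.
```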

We have a fixed point pass pipeline in our frontend where we do constant evaluation and numeric optimization (the two often go together) until convergence.

I’d be open to at least some of this going upstream. I haven’t made any moves on it because before some of the current renames and cleanup work, it would have involved adding to the std dialect mess or hacking more on to builtin.func. But I think we’re not too far off from being able to represent what is needed. Some of what we have is also better done with interfaces, so that is work that needs to happen.

All of our stuff is LLVM licensed, and if anyone would like to help with upstreaming, I’d be happy to partner on that if there is utility in doing so.


Thank you very much for your answer. If there are concrete pieces of this work, I would be very interested in helping with them.

I’ll be talking through IREE’s input dialect on this Thursday’s ODM, which will include getting feedback on some of the ops we depend on. Let’s start there: this becomes easier if we have the dependent ops upstream.

Sorry, I can't participate in the ODM due to timing problems. I can think of some cases where it is impossible to hoist certain constants into a global op, for example:

%rhs = <some fold result>
%res = div(%lhs, %rhs)

Sometimes the folded value of %rhs is 1 and sometimes it is not. When it is 1, algebraic simplification can replace the div with %lhs. So, inside the fold function, we would invoke the compiler on itself to compile and execute the code generated for this operation, which is somewhat similar to a compiler bootstrapping itself. I think that's a good idea; we just need to provide something like this.
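The interplay being described can be sketched as a toy (the names and the stubbed-out tryFold are mine, purely illustrative): first try to fold %rhs, here stubbed as a plain function rather than a recursive compiler invocation, and let the div simplification fire only when the folded value turns out to be 1.

```cpp
#include <optional>

// A value that may or may not be known to be constant.
struct Value {
  std::optional<double> constant;
};

// Stand-in for recursively invoking the compiler/JIT on the
// producer of v to obtain its constant value, if any.
std::optional<double> tryFold(const Value &v) { return v.constant; }

// div(lhs, rhs) -> lhs when rhs folds to 1; otherwise keep the div.
bool simplifyDiv(const Value &lhs, const Value &rhs, Value &result) {
  if (auto c = tryFold(rhs); c && *c == 1.0) {
    result = lhs;  // replace the div with its left operand
    return true;
  }
  return false;  // no rewrite applies
}
```

Whether the simplification fires thus depends on a value only the (recursively invoked) compiler can produce, which is exactly why this cannot be decided syntactically.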