[RFC] Adding memref.calloc

pablo · December 6, 2023, 12:57pm

Motivation
When I need to allocate a buffer and initialize it to zero, I usually find myself with a memref.alloc followed by a linalg.fill. IMO this is:

Not efficient: The linalg.fill will probably get lowered into a suboptimal implementation of memset-like code.
Unnecessarily complex: We have to carry an additional linalg op that can potentially be lowered to loops, when a single op could express both semantics.

Proposal
Add a memref.calloc op or, more modularly, an attribute on memref.alloc called kind that selects between uninitialized memory (the default when the attribute is not specified) and zero-initialized memory.

Either way, this can be easily lowered into a calloc call. At the higher level, we can generate this from a tensor.empty that has an attribute similar to the one I described.
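To make the comparison concrete, here is a minimal C sketch of the two shapes being discussed at the libc level. The function names are hypothetical and only for illustration: alloc_then_fill_zero mirrors memref.alloc followed by a zero linalg.fill, and alloc_zeroed mirrors what a memref.calloc could lower to. It assumes an IEEE 754 target, where all-bits-zero is 0.0f; a fill with any other value has no calloc equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Analogue of memref.alloc + linalg.fill(0.0): allocate uninitialized
   memory, then zero it in a separate step. Hypothetical helper name. */
float *alloc_then_fill_zero(size_t count) {
    assert(count > 0);
    float *buf = malloc(count * sizeof *buf);
    if (buf != NULL) {
        /* Assumes all-bits-zero represents 0.0f (true for IEEE 754). */
        memset(buf, 0, count * sizeof *buf);
    }
    return buf;
}

/* Analogue of the proposed memref.calloc: one call returns zeroed memory.
   Hypothetical helper name. */
float *alloc_zeroed(size_t count) {
    assert(count > 0);
    return calloc(count, sizeof(float));
}

/* Returns 1 if every element compares equal to 0.0f, 0 otherwise. */
int all_zero(const float *buf, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        if (buf[i] != 0.0f) {
            return 0;
        }
    }
    return 1;
}
```

Both helpers hand back a buffer with the same contents; the difference is only whether zeroing is a separate step that a compiler can see and transform on its own.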

ftynse · December 6, 2023, 1:07pm

Can’t one just call calloc and then use a combination of unrealized_conversion_cast and memref.reinterpret_cast to construct the desired memref? This isn’t super elegant, but we may not want to reimplement all of libc.

rengolin · December 6, 2023, 1:57pm

I think you’re mixing IR design with implementation details.

For IRs to be efficient and minimize pattern matching possibilities, they need to describe the problem in as few ways as possible. Your proposal creates additional ways of doing the same thing, forcing compilers to now match against multiple patterns to basically do the same thing.

For example, in our compiler, we match one alloc + fill(0.0) and create even more fills, because we get to fuse them with the following GEMMs, and doing so with a pair fill + matmul is easier than a single fill and multiple matmuls.

Your proposal works for your case, but not this one, so we’d be adding a new pattern, not replacing the existing one.

For lowering to be efficient, you need those patterns to accurately describe the problem at hand. The pair alloc + fill does so elegantly. Matching that to a calloc on a particular platform (target + OS + ABI + environment) is a (simple) problem for the implementation.

I often hear people complaining about “this IR doesn’t look nice/clean/efficient” and that’s the wrong reaction. IRs are not supposed to look like anything. They’re supposed to uniquely represent the code concepts we need to run compilers more efficiently, and the fewer ways to represent the same thing we have, the easier it is for the compiler.

The cost of manually writing two ops instead of one pales in comparison to the costs of maintaining multiple pattern matchers for multiple representations of the same thing, or worse, stiff representations that do not allow the compiler to split/join/replace ops in the most optimal way.

pablo · December 6, 2023, 2:27pm

I get your point and I agree with it. However, following that idea, I think there are existing ops that should not exist. To give an example from the memref dialect, I can think of realloc. I understand that calloc can be expressed as alloc + fill, but couldn’t realloc be expressed as alloc + copy?

pablo · December 6, 2023, 2:47pm

After thinking about this, I think the best way to address the memref.calloc use case would simply be to lower the linalg.generic to a proper memset. Sure, we still have different ops, but it’s not as bad as I initially thought. After your comment I now see it in a different way.

rengolin · December 6, 2023, 2:54pm

It can and probably should. The one problem I see is that if liveness analysis isn’t good enough, this could lead to increased memory usage. This could have been the rationale behind this one, I honestly don’t know.

This is the eternal balance between basic vs. complex operations, native support vs. intrinsics or plugins, etc. There isn’t one right answer for everything, and the answers change with time, as the compiler improves.

The rule of thumb I follow for IR ops is: if it will make a big difference for the compiler to have a new op, it would be worth adding it. If the difference is only how humans read it, meh. If it could harm the compiler’s ability to perform analyses and transforms, usually not worth it.

pablo · December 6, 2023, 3:10pm

All this reasoning makes sense if we consider adding a new op. However, adding a new attribute kind wouldn’t change anything in the pattern matching. The memref.alloc would have the same semantics and could be matched in the same way, but it would simply be more powerful since it would be able to decide whether memory is uninitialized or not (another question is whether the pattern matching should care about the kind attribute or not, but it wouldn’t have to). Or do you think adding a new attribute would trigger other concerns?

rengolin · December 6, 2023, 4:29pm

Similar concerns, really. When I match the alloc op and I find there’s an attribute I don’t know about, I have to stop. And if I need to know about the attribute, it’s the same as knowing about a different op.

Instead of matching two different ops, I need to match the same op in two different “modes”. Not a huge difference.

mehdi_amini · December 7, 2023, 5:05am

Actually: alloc + copy + free (freeing the original pointer after its contents have been copied to the new one)
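As a rough illustration of that decomposition, here is a C sketch of a realloc written by hand as alloc + copy + free. The helper name realloc_by_hand is hypothetical, and unlike the real realloc it needs the old element count passed in explicitly. Note that the old and new buffers are both live between the alloc and the free, which is the extra memory usage mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resizes `old` (holding old_count ints) to new_count
   ints using alloc + copy + free. On allocation failure it returns NULL
   and leaves `old` untouched, as the standard realloc does. */
int *realloc_by_hand(int *old, size_t old_count, size_t new_count) {
    assert(new_count > 0);

    /* alloc: the new buffer is uninitialized beyond what gets copied. */
    int *fresh = malloc(new_count * sizeof *fresh);
    if (fresh == NULL) {
        return NULL;
    }

    /* copy: keep as many elements as fit in the new buffer. */
    size_t keep = old_count < new_count ? old_count : new_count;
    if (old != NULL && keep > 0) {
        memcpy(fresh, old, keep * sizeof *fresh);
    }

    /* free: release the original buffer only after the copy is done. */
    free(old);
    return fresh;
}
```

A dedicated realloc can sometimes grow the allocation in place and skip the copy entirely, which this three-step version never can.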
