[RFC] New dialect to expose handy utilities

Menooker · May 17, 2024, 3:24am

My initial intention was to add a printf op that lowers to the LLVM dialect for CPU, and it has now ended up as a proposal for a new dialect. There was some discussion on the PR for this op.

Overview

This RFC proposes a new dialect that gives MLIR an easy-to-use way to call some “handy and small functions” implemented by the C standard runtime and other runtime libraries. This implies that the ops in this dialect are lowered to function calls on CPU.
The scope of the dialect should be limited to operations that are useful to typical MLIR developers and that fall into one of these groups:

already implemented in the existing libc runtime, e.g. printf, abort
easy to implement in C/C++/other languages, e.g. timer, expect_close, print
simple control flow wrappers around the functions above, like assert, scoped_timer, bench_loop

The ops are useful for developers who want to debug, benchmark and test their MLIR programs. The current FileCheck tests already contain some direct calls to runtime functions in CRunnerUtils. Those calls could be replaced by printf ops, for example.

The naming of this new dialect is not finalized. Maybe util, c_runner_util or dev?

Note: The ideas for the checking and timer operations originated from the TPP project’s check and perf dialects.

Proposed ops

This is not a finalized list of the operations. Again, I am currently most interested in the printf op. This section is meant to facilitate discussion on what can be added to this dialect.

printf

C-style printf. The op definition is similar to gpu.printf, except that it targets CPU.

runtime.printf "Hello world %f %d %lld\n" %t, %t2, %t3 : f32, i32, i64

abort

Calls abort() from libc.

assert

Accepts a predicate of i1, and an optional StringAttr for the failure message.

runtime.assert %0, "should be true!"

Can be lowered to SCF like:

scf.if %0 {
} else {
    runtime.printf "Error at XXX.mlir, Line 123. should be true!"
    runtime.abort
}

The source location in the error message can be extracted from the Location of the assert op.

The lowering pass can have an option to remove the assertions for performance.
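
To make the lowering above a bit more concrete, here is a rough C++ sketch of what the failing branch amounts to at runtime. The names formatAssertFailure and assertOrAbort are made up for illustration. They are not part of CRunnerUtils or the PR, and the message layout simply follows the example above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: builds the text the else-branch would print.
// `file` and `line` stand in for what the lowering reads from the op's Location.
std::string formatAssertFailure(const std::string &file, unsigned line,
                                const std::string &message) {
  assert(line > 0 && "source lines are 1-based");
  return "Error at " + file + ", Line " + std::to_string(line) + ". " + message;
}

// Hypothetical runtime view of the lowered op: do nothing when the predicate
// holds, otherwise print the message and call abort() from libc.
void assertOrAbort(bool predicate, const std::string &file, unsigned line,
                   const std::string &message) {
  if (predicate)
    return;
  std::fprintf(stderr, "%s\n", formatAssertFailure(file, line, message).c_str());
  std::abort();
}
```

With the option to strip assertions enabled, the lowering would simply emit nothing for the op, so neither function would be called.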

Would assert be useful for improving the safety of the generated code? The memref dialect could use assert at function entry to check that memrefs passed in as func arguments have the claimed rank and dimensions. Another useful feature built on assert would be bounds checking on memref accesses. I am not sure whether these features already exist in MLIR.

Generic print

It is possible to provide a generic print op for generic types. This print op can be overloaded for various types like memref and tensor, to print their contents. The op can be lowered to a call to the existing printing primitives.

Timer and benchmark ops

These operations expose the timing APIs implemented in C. Based on that, we can introduce a %t = bench_loop: i64 {...} op which wraps around a Block, runs it several times and returns the execution time of the block. A working implementation exists downstream.
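
As a sketch of the semantics only, the behavior of bench_loop could look like the C++ below. The function name benchLoop, the explicit iteration count and the choice of nanoseconds for the i64 result are my assumptions here, not what the downstream implementation does.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for bench_loop: runs `body` `iterations` times and
// returns the total wall-clock time in nanoseconds as a 64-bit integer.
int64_t benchLoop(int64_t iterations, const std::function<void()> &body) {
  assert(iterations >= 0 && "iteration count must be non-negative");
  // steady_clock is monotonic, so the difference cannot go negative.
  auto start = std::chrono::steady_clock::now();
  for (int64_t i = 0; i < iterations; ++i)
    body();
  auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)
      .count();
}
```

In the dialect, the loop and the two timer calls would be emitted by the lowering (for example to scf plus runtime calls), so only the clock reads need to live in the C++ runtime.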

Arithmetic checking ops

expect_true, expect_almost_same, expect_sane for memref and tensor. A working implementation here
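
For illustration, the element-wise comparison behind something like expect_almost_same could be sketched in C++ as follows. The function name almostSame, the tolerance formula and the default tolerances are assumptions of mine and are not taken from the linked implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: true when both buffers have the same length and every
// pair of elements differs by at most absTol + relTol * |expected|.
bool almostSame(const std::vector<float> &actual,
                const std::vector<float> &expected, float relTol = 1e-5f,
                float absTol = 1e-6f) {
  assert(relTol >= 0.0f && absTol >= 0.0f && "tolerances must be non-negative");
  if (actual.size() != expected.size())
    return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    float diff = std::fabs(actual[i] - expected[i]);
    // Written as !(diff <= tol) so that a NaN difference also fails the check.
    if (!(diff <= absTol + relTol * std::fabs(expected[i])))
      return false;
  }
  return true;
}
```

A memref or tensor version would run the same comparison over the flattened buffer and report or abort on mismatch.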

Implementation of the runtime in C++

There are already some runtime functions in ExecutionEngine/CRunnerUtils. We can put the new runtime C++ functions there.

stellaraccident · May 17, 2024, 3:55am

Thanks for the RFC.

I know you said naming is not finalized, but that’s what I’m going to comment on

I’d really like to not open up a “std dialect v2” as a core dialect, which is where I think this would lead, at least with a name like “runtime” contributed directly to MLIR proper.

Every MLIR project I’ve worked on has a similar grab bag of utility ops but I don’t think any of them are “at risk” of being complete or general purpose enough to go upstream as some kind of central thing that stands alone, survives the test of time, etc.

For those reasons I’d be -1 on going forward with this as proposed. However, I’d feel differently if the naming or organization of the dialects was such that we could have these little “utility library” dialects available for use without giving them a central/weighty name like “runtime”. (And if we’re headed to that kind of catalog of this, we should really separate that from the core MLIR infra and dialects)

I know the current organization of things is limiting. And I know I’m a predictable squeaky wheel on that. But I think we need to tend to this before heading off in a lot of new directions. I could be convinced to not carp on this point for this case, if the naming was such that it connoted more of “a library in the universe” vs a central noun, implying greater universality than what it is.

Menooker · May 17, 2024, 4:08am

What about names like cpu_utils, dev or debug dialects, to limit the scope to either frequently used utilities, or developer helper ops?

As you said, downstream MLIR projects are re-inventing wheels for those util ops. I think some of them are universal - like printf and assert, and maybe expect and timer too.

stellaraccident · May 17, 2024, 4:14am

Something along those lines sits better. But I’ve probably taken my five minutes and will leave this for some others to comment.

mehdi_amini · May 17, 2024, 4:52am

You seem to be mixing up two things in the proposal: libc (which is standard) and some custom / testing utilities. Aren’t these quite fundamentally different?

The RFC would benefit from some more motivation: can you explain a bit better how a dialect here pulls its weight compared with relying on func.call, as is done today for CRunnerUtils?

Menooker · May 17, 2024, 5:25am

My original proposal was to add a CPU printf op to MLIR, and I am now finding or creating a dialect to be its home. I more or less gave up on the idea of a libc dialect and turned to a cpu_util, dev or whatever dialect. That’s why I did not mention libc in this RFC.

Regarding printf itself, I don’t think MLIR can currently call printf with core dialects, because FunctionType does not support varargs right now. (?)
Another reason for introducing the ops as wrappers of C functions is that users otherwise need to declare the external functions in the module before using them, which introduces boilerplate code. They also need to get the function signatures right.
Regarding the current CRunnerUtils, it introduces functions like printF32, printF64, etc., and printer functions for formatting. We can consolidate them into a single printf op. BTW, we can improve printf to handle fixed-size types like si32 and f32.
This dialect also introduces composite ops, like assert and bench_loop. They can be lowered to scf or llvm plus calls to runtime functions. These ops are better implemented in MLIR than as a whole in the runtime.
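
As a rough illustration of the fixed-size point, the C++ below formats the same three values as the printf example in the RFC, but picks the conversions from the operand widths via the <cinttypes> macros instead of hard-coding %d and %lld. The function name formatHello is made up for this sketch. The actual lowering would emit the equivalent call in the LLVM dialect.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper mirroring
//   printf "Hello world %f %d %lld\n" %t, %t2, %t3 : f32, i32, i64
// PRId32 / PRId64 expand to the right conversion for the host ABI, so the
// format stays correct where int or long long have a different width.
std::string formatHello(float f, int32_t i, int64_t l) {
  char buffer[128];
  int written =
      std::snprintf(buffer, sizeof(buffer),
                    "Hello world %f %" PRId32 " %" PRId64 "\n",
                    static_cast<double>(f), // f32 is promoted to double in varargs
                    i, l);
  assert(written >= 0 && "snprintf reported an encoding error");
  return buffer;
}
```

This is the kind of detail (promotion of f32, width-specific conversions) that users have to get right by hand when they call printf directly.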

mehdi_amini · May 17, 2024, 5:53am

Your #1 item in the RFC is “already implemented in existing libc runtime. e.g. printf, abort”, so I took it that you intended to expose libc functions. Isn’t that the case then?

Don’t we have an example of calling printf from the Toy tutorial already?

Anyway, having a dialect that would map our own CRunnerUtils seems OK to me. The easiest name may be just that: crunner_utils.

Menooker · May 17, 2024, 6:56am

Sorry for the confusion. I meant that I only wanted to include the “handy utilities” that are defined in the libc runtime. Not all functions in standard libc are useful in MLIR, and I am not trying to introduce them all. I have updated the RFC post to remove the dialect name “runtime” and to make the scope of the dialect clearer.

As is suggested in the comments:

I am trying to limit the scope of this dialect for “utilities”.

Yes, indeed. But it also directly lowers to LLVM instead of extern func.func. My point was that, for printf, we cannot just declare an extern function plus a func.call.

rengolin · May 17, 2024, 9:30am

utils?

Now, seriously, last year we planned to upstream perf and check, but not as is, and this is why we didn’t.

In my mind, the check dialect ops we have should really be part of the type dialects (like tensor, memref, vector) and the perf ops should really be part of the scf dialect.

But this leaves printf orphaned… I’d really like to have printf functionality in a var_args kind of way in MLIR, even if all it did was to lower to LLVM dialect.

cf.assert already exists, exactly like that:

rengolin · May 17, 2024, 11:03am

Agree. I think as a first approach, just having a helper to create the sequence (call + declaration + marshalling), using some extended functionality in the crunner_utils, would let us iterate on the design.

After this, deciding on an op that replaces that would be a matter of nomenclature and dialect hierarchy, not implementation or design.

mehdi_amini · May 17, 2024, 1:53pm

Right, but in the context of printf, which takes a well defined string format in terms of C-like types, you can always directly emit a llvm.call.
A “print” operating on higher-level / custom types is possible, but lowering to printf from there does not seem straightforward to me: you would need some sort of type interface to handle how to construct the format, for example.
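
To show the shape of the problem, here is a plain C++ sketch of such an interface. It is not MLIR’s TypeInterface mechanism, and the names PrintableType, F32Type, I64Type and buildFormat are all hypothetical. Each type reports its own conversion, and the lowering concatenates them into the format string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: a type that knows its own printf conversion.
struct PrintableType {
  virtual ~PrintableType() = default;
  virtual std::string formatSpecifier() const = 0;
};

struct F32Type : PrintableType {
  std::string formatSpecifier() const override { return "%f"; }
};

struct I64Type : PrintableType {
  std::string formatSpecifier() const override { return "%lld"; }
};

// Builds a space-separated format string, terminated by a newline, from the
// operand types in order.
std::string buildFormat(const std::vector<const PrintableType *> &types) {
  std::string format;
  for (std::size_t i = 0; i < types.size(); ++i) {
    assert(types[i] != nullptr && "every operand needs a printable type");
    if (i != 0)
      format += ' ';
    format += types[i]->formatSpecifier();
  }
  format += '\n';
  return format;
}
```

Scalars are the easy case. Aggregates like memref or tensor would also need loops and shape information, which a single format string cannot express.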

clattner · May 17, 2024, 3:57pm

fwiw, I agree with Stella’s general concern upthread - having a set of ops “in a dialect in case they’re useful” is what led to dialect creep in std back in the day. If you want to have a dialect focused on the libc, then you could call it “crt” (c runtime) and that would provide a focus for it, but it shouldn’t be a “utils” dialect that would grow beyond that scope to “other useful stuff”.

Assert in particular is not a C runtime symbol entrypoint, it is a macro.

Menooker · May 20, 2024, 2:18am

The users need to introduce the LLVM dialect at a very early stage, instead of using a target-independent op. Note that MLIR can be lowered to non-LLVM targets. Users also need to insertOrGet the printf function from the module op, and manually handle the format string as an i8 global array in the LLVM dialect. These efforts are non-trivial.

Menooker · May 22, 2024, 7:44am

Hi, everyone! I would like to discuss the future direction of this RFC. My initial purpose with this RFC and PR was to add a cpu.printf op. The gpu dialect has a printf op, so why don’t we have a CPU one?

After re-thinking the RFC, I think maybe we should introduce a dialect named something like cpuruntime to hold the runtime functions for CPU only. We could provide ops similar to some of those in gpu, like memcpy and thread_id?
