The MLIR way to express hardware-specific constraints


I wonder whether there is a best or “MLIR way” to set hardware constraints.

Let’s say you have a multi-core architecture with n cores. Each core has a memory of size k. MLIR needs to know this constraint (e.g., to limit the number of variables stored in a tile).

What is the best way to handle this? Should we store it in the .cpp source files? Or can we implement it in TableGen?

The details matter quite a bit here, so there isn’t a general answer, imo. As a general rule, though, the case you describe sounds like it falls into optimization territory (as opposed to correctness, where you must never generate code that violates a constraint). An optimization constraint is typically exposed as some kind of option, whereas a correctness constraint could, depending on the situation, be “built into” a code generation flow directly.
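To make the distinction concrete, here is a minimal sketch in plain C++ (no MLIR dependency). All names (`CoreLimits`, `TilingOptions`, `verifyTileFits`, `chooseTileBytes`) are hypothetical and only illustrate the split between a hard limit that must be verified and a tunable that merely steers a heuristic.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical hard limit: a tile must never need more memory than one core has.
struct CoreLimits {
  std::size_t memoryBytes; // "k" from the question
};

// Hypothetical tunable: a preferred tile size that only affects performance.
struct TilingOptions {
  std::size_t preferredTileBytes;
};

// Correctness: returns an error message if the tile cannot fit, nothing otherwise.
std::optional<std::string> verifyTileFits(std::size_t tileBytes,
                                          const CoreLimits &limits) {
  if (tileBytes > limits.memoryBytes)
    return "tile of " + std::to_string(tileBytes) +
           " bytes exceeds core memory of " +
           std::to_string(limits.memoryBytes) + " bytes";
  return std::nullopt;
}

// Optimization: picks a tile size from the option, clamped to the hard limit.
std::size_t chooseTileBytes(const TilingOptions &options,
                            const CoreLimits &limits) {
  const std::size_t chosen = options.preferredTileBytes < limits.memoryBytes
                                 ? options.preferredTileBytes
                                 : limits.memoryBytes;
  assert(chosen <= limits.memoryBytes && "heuristic must respect the hard limit");
  return chosen;
}
```

In this sketch the option can be changed freely without risking invalid output, while the verifier is the only place that rejects a program.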

In IREE, we try to boil such constraints down to fundamental options and then have an attribute on the module that controls them. The user-level tool drivers typically have CLI flags that populate the attribute for user-controllable things. The advantage of this (vs. using flags directly) is that source programs are self-contained, reproducers are exact, etc.
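A rough sketch of that flow, in plain C++ rather than the real IREE or MLIR APIs: the driver parses flags once and records them on the module, and everything downstream reads only the module. The `ToyModule` type, the `target.` prefix, and the flag names used in the usage note are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in for a module carrying named integer attributes (not the MLIR API).
struct ToyModule {
  std::map<std::string, std::int64_t> attrs;
};

// Driver side: turns "--<name>=<integer>" flags into "target.<name>" attributes.
// Returns false on the first malformed flag.
bool populateTargetAttrs(const std::vector<std::string> &flags,
                         ToyModule &module) {
  for (const std::string &flag : flags) {
    const std::size_t eq = flag.find('=');
    if (flag.rfind("--", 0) != 0 || eq == std::string::npos || eq < 3 ||
        eq + 1 == flag.size())
      return false;
    const std::string name = flag.substr(2, eq - 2);
    const std::string text = flag.substr(eq + 1);
    assert(!name.empty());
    try {
      std::size_t used = 0;
      const long long value = std::stoll(text, &used);
      if (used != text.size())
        return false; // trailing characters after the number
      module.attrs["target." + name] = value;
    } catch (const std::exception &) {
      return false; // not a number, or out of range
    }
  }
  return true;
}

// Compiler side: passes read the attribute and never look at the flags.
std::optional<std::int64_t> getTargetAttr(const ToyModule &module,
                                          const std::string &name) {
  const auto it = module.attrs.find("target." + name);
  if (it == module.attrs.end())
    return std::nullopt;
  return it->second;
}
```

With hypothetical flags such as `--num-cores=4` and `--core-memory-bytes=65536`, the module ends up carrying both values, so a reproducer needs only the module and not the original command line.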

IREE also goes to some length to insulate code generation from variation in such volatile settings. While it should always be possible to generate a binary perfectly suited for a specific machine, for general purpose targets, we prefer to generate code which detects such constraints at runtime and does the right thing. The use case and target here matter a lot.
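As a loose illustration of "detect at runtime and do the right thing" (not how IREE implements it), the sketch below avoids baking a core count into the generated code and instead splits work based on a value queried when the program runs. `std::thread::hardware_concurrency()` is only a stand-in for whatever query a real accelerator runtime would offer, and `detectCoreCount` and `chunkSizes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for a runtime query; a real accelerator runtime would expose its own.
std::size_t detectCoreCount() {
  const unsigned reported = std::thread::hardware_concurrency(); // 0 if unknown
  const std::size_t cores = reported == 0 ? 1 : reported;
  assert(cores >= 1);
  return cores;
}

// Splits `totalItems` into one chunk per core; chunk sizes differ by at most one.
// A core count of 0 is treated as 1 so the function always returns a valid split.
std::vector<std::size_t> chunkSizes(std::size_t totalItems, std::size_t cores) {
  if (cores == 0)
    cores = 1;
  std::vector<std::size_t> sizes(cores, totalItems / cores);
  for (std::size_t i = 0; i < totalItems % cores; ++i)
    ++sizes[i]; // hand out the remainder one item at a time
  return sizes;
}
```

Calling `chunkSizes(totalItems, detectCoreCount())` gives the same binary a sensible split on machines with different core counts, at the cost of not being specialized for any one of them.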

In general, we’re missing the concept of a “target” (even though we have Data Layout Modeling in MLIR, which kind of supports what you’re asking for here).

LLVM’s optimizers use TTI (TargetTransformInfo) to keep optimization heuristics independent of any particular target.
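The general shape of that idea can be sketched as an abstract interface that heuristics query instead of hard-coding target facts. This is not LLVM's actual TTI API; `TargetInfo`, `SmallAccelerator`, `LargeAccelerator`, and `maxTileElements` are hypothetical names, and the numbers are made up.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical target-description interface, loosely in the spirit of TTI.
class TargetInfo {
public:
  virtual ~TargetInfo() = default;
  virtual std::size_t numCores() const = 0;
  virtual std::size_t coreMemoryBytes() const = 0;
};

// Two made-up targets with different resources.
class SmallAccelerator final : public TargetInfo {
public:
  std::size_t numCores() const override { return 4; }
  std::size_t coreMemoryBytes() const override { return 1024; }
};

class LargeAccelerator final : public TargetInfo {
public:
  std::size_t numCores() const override { return 16; }
  std::size_t coreMemoryBytes() const override { return 65536; }
};

// Target-independent heuristic: the most elements of a given size that fit in
// one core's memory. It knows nothing about any concrete target.
std::size_t maxTileElements(const TargetInfo &target, std::size_t elementBytes) {
  assert(elementBytes > 0 && "element size must be non-zero");
  return target.coreMemoryBytes() / elementBytes;
}
```

The heuristic stays the same for every target, and supporting new hardware means adding another implementation of the interface.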

It is a bit harder to figure out how to design this in MLIR considering it isn’t as uniform as LLVM.

First of all, thank you for your answers!

After looking at IREE, I think @stellaraccident has a point about correctness vs. optimization. As we understand it, our case is a matter of correctness.

We want to use MLIR for domain-specific accelerators. We assume that each of our cores can host only a limited number of operations. As we understand it, modeling this is a correctness issue.

Where do we place this constraint? We currently have it as a struct in our .cpp sources, which seems odd. We thought it would be preferable to put it in the dialect’s TableGen file. What would be the standard method?

I am still new to MLIR, so maybe I am missing something. I appreciate your help!

You can store this information in a structured form in an attribute on the IR unit you are mapping (e.g., ModuleOp, FuncOp, etc.). By “limited number of operations”, I couldn’t tell whether you meant a “limited count” or a “limited set of operation types”. Assuming it’s the latter, the names of the supported ops can go into a string array attribute. But again, all of this depends on a lot of other details surrounding the design of your compiler/code generator.
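A minimal sketch of checking such a constraint, written in plain C++ instead of the MLIR attribute classes. `CoreConstraints` stands in for the structured attribute and covers both readings (a maximum op count and an allow-list of op names); the type, the function `checkOps`, and op names like `toy.add` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a structured attribute attached to a module or function.
struct CoreConstraints {
  std::size_t maxOpsPerCore;             // "limited count" reading
  std::vector<std::string> supportedOps; // "limited set of op types" reading
};

// Returns one message per violated constraint; an empty result means the ops
// mapped to a single core are acceptable.
std::vector<std::string> checkOps(const std::vector<std::string> &opNames,
                                  const CoreConstraints &constraints) {
  assert(!constraints.supportedOps.empty() &&
         "constraint attribute must list at least one op");
  std::vector<std::string> violations;
  if (opNames.size() > constraints.maxOpsPerCore)
    violations.push_back("too many ops for one core: " +
                         std::to_string(opNames.size()) + " > " +
                         std::to_string(constraints.maxOpsPerCore));
  for (const std::string &name : opNames) {
    const bool supported =
        std::find(constraints.supportedOps.begin(),
                  constraints.supportedOps.end(),
                  name) != constraints.supportedOps.end();
    if (!supported)
      violations.push_back("unsupported op: " + name);
  }
  return violations;
}
```

Because this is a correctness constraint, a check like this would typically run as a verification step that rejects the program, not as a hint to a heuristic.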