I have read that Linalg stands at the HHO level, which is higher than the MHA level at which Affine stands, but I don’t think I understood what that actually means when optimizing a DL op like conv2d. Could you please show me an example of that? And I’m not quite clear on what “Linalg is intended to complement Affine, rather than replace it” means.
I found that we have linalg.conv_2d_nchw_fchw in Linalg IR, but after convert-linalg-to-loops the conv op seems to be gone. Is there any way to keep this op until lowering to LLVM and then call it in cudnn.so, or something like that?
All of these “rationale” docs are outdated to a different extent. We keep them as justification for past design decisions. We should probably add dates to them so people know what to expect.
These levels are purely a mental exercise in dialect layering. What it actually means is that Linalg can define and use a “conv2d” operation with the attached semantics, so the compiler can understand that the operation is a 2d convolution. Affine operates on loops, so one would have to spell out the four loops that make up the 2d convolution. At this level, the compiler needs to do some non-trivial analysis to understand that these four loops, along with the operations in their body, are in fact a 2d convolution.
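To make this concrete, here is a rough sketch of the two levels. The shapes are made up for illustration, and the Affine form assumes a single-channel, unit-stride convolution so that only four loops are needed:

```mlir
// Linalg level: one op, and the compiler knows it is a 2-d convolution.
linalg.conv_2d_nchw_fchw
    {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
    ins(%input, %filter : memref<1x3x32x32xf32>, memref<8x3x3x3xf32>)
    outs(%output : memref<1x8x30x30xf32>)

// Affine level: the same computation (single channel, for brevity) spelled
// out as loops; the compiler must re-discover that this is a convolution.
affine.for %oh = 0 to 30 {
  affine.for %ow = 0 to 30 {
    affine.for %kh = 0 to 3 {
      affine.for %kw = 0 to 3 {
        %in  = affine.load %in2d[%oh + %kh, %ow + %kw] : memref<32x32xf32>
        %flt = affine.load %flt2d[%kh, %kw] : memref<3x3xf32>
        %out = affine.load %out2d[%oh, %ow] : memref<30x30xf32>
        %mul = arith.mulf %in, %flt : f32
        %acc = arith.addf %out, %mul : f32
        affine.store %acc, %out2d[%oh, %ow] : memref<30x30xf32>
      }
    }
  }
}
```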
This means that Linalg can be converted to Affine and further transformations can be performed on that representation if desired.
You literally asked the compiler to convert Linalg operations to loops. You got what you asked for – loops.
Not yet. It has been designed for such cases, but nobody actually needed that particular feature so nobody implemented it. It shouldn’t be hard to write as a pattern.
This is a 3+ year rationale doc for what has become the Affine dialect. It is not used in Linalg.
Besides, in my view, the features/benefits or design principles of Linalg are: being codegen-friendly; more expressive, higher-level primitives; composable and declarative transformations; smaller, more reusable components; and suitability for search/ML methods.
wow, so many benefits! Did I miss anything?
Yes, it would just be a RewritePattern that converts the op into either std.call or llvm.call; it is not really “in Linalg”. I suppose this must happen after bufferization, though.
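For illustration, such a pattern would essentially rewrite the op into a call whose callee name encodes the op and its operand types. The callee name below is hypothetical, in the style the conversion to library calls uses for named ops:

```mlir
// After bufferization, before the rewrite:
linalg.matmul ins(%A, %B : memref<?x?xf32>, memref<?x?xf32>)
              outs(%C : memref<?x?xf32>)

// After the rewrite: a plain call to an external library function, which
// the linker can then resolve against e.g. a cuDNN/BLAS wrapper.
call @linalg_matmul_viewsxsxf32_viewsxsxf32_viewsxsxf32(%A, %B, %C)
    : (memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>) -> ()
```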
This one is a little more elaborate to explain, but basically you can think of it as answering the question: given a set of primitive operations and types, when applying a transformation, should the resulting IR be expressible with the same primitives and types or not?
This is an important question because it can pull you in different directions in the space of IR design vs analysis and expressiveness vs specialization. One concrete example is a simple linalg.matmul and how to represent a tiled version of it. This is a 3d operation (its iteration space is 3d); let’s assume we tile each of the 3 loops, which gives us 6d semantics. Here are 3 possible ways of implementing the tiling:
6 loops over scalar operations: this is what scf.for and affine would naturally do, you can use that if it makes sense for your use case.
one single 6d contraction op: this is appealing in principle but has a number of issues, esp. related to the type system (statically known vs dynamic `?` dimensions) when mixed with non-divisible boundary conditions. This is a valid design space but it does pull you into designing more and more stuff in the IR and potentially new types (i.e. n-D lists of n-D arrays).
3 loops over a 3d contraction op: this has some nice properties that I won’t go into details here and is what we settled on.
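A rough sketch of that third option, for a 64x64x64 linalg.matmul tiled by 8 in each of the 3 loops (sizes chosen to divide evenly, so there are no boundary conditions; the strided layout in the subview type is what slicing a 64x64 buffer produces):

```mlir
!tile = memref<8x8xf32, strided<[64, 1], offset: ?>>

scf.for %i = %c0 to %c64 step %c8 {
  scf.for %j = %c0 to %c64 step %c8 {
    scf.for %k = %c0 to %c64 step %c8 {
      %sA = memref.subview %A[%i, %k] [8, 8] [1, 1] : memref<64x64xf32> to !tile
      %sB = memref.subview %B[%k, %j] [8, 8] [1, 1] : memref<64x64xf32> to !tile
      %sC = memref.subview %C[%i, %j] [8, 8] [1, 1] : memref<64x64xf32> to !tile
      // Still a 3-d contraction, now on 8x8x8 tiles: "3 loops over a 3d
      // contraction op", rather than one 6-d op or 6 scalar loops.
      linalg.matmul ins(%sA, %sB : !tile, !tile) outs(%sC : !tile)
    }
  }
}
```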
As a consequence, applying a transformation on Linalg does not try to produce a result that is a set of pure Linalg ops: it produces loops and Linalg. This leads to better compositionality with the rest of the world and avoids the bad reflex of adding more IR design on top of more IR design, which is a dangerous slippery slope.
I think I can say historically there has been a strong bias towards systems that try very hard to be closed under transformations and we make the conscious, opposite, choice (at least for now).
So the unclosed primitives (can we call them that?) avoid adding more and more primitives to Linalg, as long as they appear in other dialects, and thus interact more easily with other dialects. Then all the primitives in all of these dialects can be converted to LLVM (or bytecode related to the PDL dialect?) separately. Did I get that right?
To be honest, I think the current Linalg Dialect is complicated enough.
To a good first approximation, there is really only one op in Linalg: linalg.generic.
Many other ops are just configurations of that generic op to which we give a name: i.e. syntactic sugar.
For instance, linalg.matmul is a linalg.generic configured to implement the computation C(i, j) += A(i, k) * B(k, j); many other ops follow the same pattern, are specified with opdsl, and are auto-generated from this spec file: core_named_ops.py in llvm/llvm-project.
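As an illustration (memref form, shapes made up), linalg.matmul desugars to roughly this linalg.generic:

```mlir
#mapA = affine_map<(i, j, k) -> (i, k)>
#mapB = affine_map<(i, j, k) -> (k, j)>
#mapC = affine_map<(i, j, k) -> (i, j)>

// C(i, j) += A(i, k) * B(k, j) expressed as a generic op: the indexing
// maps and iterator types are the "configuration"; the region holds the
// scalar computation.
linalg.generic
    {indexing_maps = [#mapA, #mapB, #mapC],
     iterator_types = ["parallel", "parallel", "reduction"]}
    ins(%A, %B : memref<4x8xf32>, memref<8x4xf32>)
    outs(%C : memref<4x4xf32>) {
  ^bb0(%a: f32, %b: f32, %c: f32):
    %0 = arith.mulf %a, %b : f32
    %1 = arith.addf %c, %0 : f32
    linalg.yield %1 : f32
}
```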
For your particular case of linalg → library, that syntactic sugar name is load-bearing: that named op gets converted to a function name.
When performing transformations (e.g. tiling), we produce scf.for, tensor.extract_slice, tensor.insert_slice and other ops. One possibility would also be to produce a 6-d linalg.generic, which we don’t. In that sense, the Linalg dialect does not try to be closed under transformations and prefers to compose with the rest. In particular, we try hard to
I have converted the Linalg IR with a library call to an object file, and so far I have written a main() function plus _mlir_ciface_print_memref_f32, _mlir_ciface_linalg_fill_f32_viewsxsxf32, and _mlir_ciface_linalg_matmul_viewsxsxf32_viewsxsxf32_viewsxsxf32 functions to run it.
thank you again for your help!
A lot of things are missing to get this polished and to a point where it can scale and we can properly mix codegen + library calls. They are all quite unambiguous but potentially a lot of work depending on the degree of usability and pluggability one is interested in.
This is actually what I want to implement! But I found that the linalg, std and llvm toolchains have almost done everything!
Indeed, I really want to do something about that and post it as an RFC one day. But I don’t yet have a clear picture of it. Could you please give me some advice, or share what you think about it?