Over the past few weeks, @nicolasvasilache and I have been writing a few MLIR case-study docs on CPU code generation for the Vector dialect, following two principles: (1) build the technology bottom-up, i.e. first make sure one level works really well before building the next level, and (2) keep low-level code generation as architecture-neutral as possible, for example by using generic intrinsics (rather than CPU-specific intrinsics, or even an intermediate, CPU-specific dialect). This enables the LLVM backend to generate good code for e.g. x86-64 and AArch64 alike, with only a few simple parameters changed in the lowering strategies.
So far we have:
AVX512 Codegen for the Vector Dialect Ops
Sparse Matrix Times Vector in the Vector Dialect
Transfer Operations in the Vector Dialect
A Simple Retargetable Matmul Strategy
The docs focus on AVX512, although the principles are more widely applicable. Furthermore, the docs are simple case studies, not fully worked-out academic papers. Nevertheless, if there is general interest, we can post the docs here on this forum (after some internal cleanup). Please let us know if that is something we should invest time in.
I haven’t been following the details of the vector dialect, so I’d love to see this. How extensible is it to architectures with variable-length vectors, and to calls into high-performance numeric libraries?
Thanks! At the moment, the “vectors” in the vector dialect are statically shaped, but we are thinking about how to extend this to variable-length vectors, with an eye on upcoming vector ISAs. Some of the docs indeed compare pure codegen with library calls, either done through alternative codegen paths or just done for comparison purposes.
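To make the static-shape point concrete, here is a minimal sketch (value names are illustrative, using current Vector dialect syntax): every vector type in the dialect carries a compile-time shape.

```mlir
// Every vector type carries a static, compile-time shape;
// %s is an assumed scalar value defined elsewhere.
%v = vector.broadcast %s : f32 to vector<8xf32>      // fixed 8 lanes
%m = vector.broadcast %s : f32 to vector<4x8xf32>    // static 2-D vector
```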
It would be helpful to implement dialect support for AArch64 scalable vectors.
Our hope is to extend the vector dialect for this, rather than introducing a new dialect (the second principle, i.e. keeping even “low-level” code in MLIR as architecture-neutral as possible). Do you foresee any major difficulties with that?
Other than that, I am extremely happy to see the interest! We will start posting PDFs in this thread as the docs get cleaned up.
That’s a very interesting perspective. We’d like to have more information on that intent of the Vector dialect. Would the intended documentation also describe such a philosophical aim for the Vector dialect and some concrete detail too, or perhaps an RFC describing this more? This would enable us to respond better, but we would certainly also prefer to work within the ethos of the Vector dialect.
Would the intended documentation also describe such a philosophical aim for the Vector dialect and some concrete detail too, or perhaps an RFC describing this more?
No, the docs scheduled for posting here are merely a qualitative and quantitative analysis of all the vector ops in simple case-study form. The variable-length part will probably come in the form of an RFC in the future.
But please keep the vector dialect in mind. A lot of the work we are doing is meant so that MLIR can obtain high performance for all backends without committing “too early” to any one backend. For instance, by selecting the right generic intrinsic during lowering to LLVM IR, our hope is that LLVM knows what to do for every possible backend, and for every possible SIMD flavor. So far we have been very happy with how well LLVM fares in that regard.
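As an illustration of that approach (a sketch, not taken from the docs): an architecture-neutral vector.fma lowers to LLVM’s generic fmuladd intrinsic, and each backend’s instruction selector then picks the best SIMD sequence for its target.

```mlir
// Architecture-neutral op in the Vector dialect:
%0 = vector.fma %a, %b, %c : vector<8xf32>

// After lowering to the LLVM dialect, the generic fmuladd intrinsic
// is emitted; every LLVM backend knows how to select on it:
%1 = llvm.intr.fmuladd(%a, %b, %c)
       : (vector<8xf32>, vector<8xf32>, vector<8xf32>) -> vector<8xf32>
```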
The recent SIG presentation on the Vector dialect was quite revealing indeed - a lot of work has gone into it since the last time it was presented in around May and it would be ideal if we can leverage that.
I can’t think of an immediate reason why we shouldn’t do what you advise; however, we need to look at the Vector dialect more closely. We’ve been busy with other parts of our MLIR stack and need to catch up with your progress!
Here is the first document in the series, providing an exploratory qualitative and quantitative analysis of the AVX512 code that is generated for Vector dialect operations.
And here is the second document in the series, focusing on 1-D vector transfers. Note that this much shorter case study really just supplements the first one, conducting a few experiments that did not fit well in the first document.
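For readers unfamiliar with these ops, a minimal sketch of a 1-D transfer (value names are illustrative; the padding value fills out-of-bounds lanes):

```mlir
%f0 = arith.constant 0.0 : f32
// Read 16 contiguous f32 elements starting at %A[%i], padding
// out-of-bounds lanes with %f0:
%v = vector.transfer_read %A[%i], %f0 : memref<?xf32>, vector<16xf32>
// Write the vector back, starting at %B[%i]:
vector.transfer_write %v, %B[%i] : vector<16xf32>, memref<?xf32>
```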
And the third document in the series. This smaller document really started as a supplement to the first case study, focusing on the newly introduced gather and scatter operations, but rather than just looking at microbenchmarks, my passion for sparse computations led me to look at something slightly more interesting.
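The core idiom of that case study can be sketched as follows (illustrative names, current gather syntax): for one row of the sparse matrix, gather the b[j] values at the row’s nonzero column indices.

```mlir
// Pass-through value and an all-true mask for 16 lanes:
%c0   = arith.constant 0 : index
%pass = arith.constant dense<0.0> : vector<16xf32>
%mask = vector.constant_mask [16] : vector<16xi1>
// Load 16 column indices of the sparse row, then gather b[idx]:
%idx  = vector.load %indices[%j] : memref<?xi32>, vector<16xi32>
%vals = vector.gather %b[%c0][%idx], %mask, %pass
    : memref<?xf32>, vector<16xi32>, vector<16xi1>, vector<16xf32>
      into vector<16xf32>
```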
Here is a document benchmarking and analyzing the assembly code generated for three different flavors of matmul micro-kernels targeting AArch64, which achieve 90% of theoretical peak performance. AArch64_Codegen_For_Vector_Dialect.pdf (240.5 KB)
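The heart of such a micro-kernel is an outer-product update of a vector accumulator; a minimal sketch (names illustrative, tile size assumed 8x8):

```mlir
// One rank-1 update of an 8x8 f32 accumulator tile:
// %acc1 = %a_col (outer) %b_row + %acc, where %acc is a
// vector<8x8xf32>; this lowers to a sequence of vector FMAs.
%acc1 = vector.outerproduct %a_col, %b_row, %acc
          : vector<8xf32>, vector<8xf32>
```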