[Abandoned][RFC] AVX512-specific Dialect for implementing and benchmarking XNNPack in MLIR

Hello everyone,

The Vector Dialect document discusses the vector abstractions that MLIR supports and their tradeoffs. One of the layers that is missing in OSS at the moment is the Hardware Vector Ops (HWV) level.

I am interested in experimenting in core with an AVX512-specific dialect for the specific purpose of implementing portions of XNNPack in MLIR and benchmarking them.

I am proposing to add a new Dialect/Targets/AVX512 dialect that would directly target the intrinsics useful for implementing XNNPack. The first function I am interested in implementing is exp-avx-512.

I think it is time for such dialects because, at the moment, we rely too much on LLVM’s peephole optimizer to do a good job with sequences of small insertelement/extractelement/shufflevector operations. We have some intrinsics defined and used in the LLVM dialect, but these are all “portable” intrinsics; I am looking to define the layering needed to target the right AVX512 instructions directly.

I think iterating at this level of abstraction in core will be useful scouting work toward getting the abstractions and layering right, and will pave the way for a future ARM SVE dialect and other non-generic CPU dialects. Of course, generic abstractions should be preferred when possible. We also expect to learn more about when HW-specific vs. generic abstractions should be used and how they compose in MLIR.

Edit: It was pointed out that I should use the template for new dialects, so here goes.

  • What is the overall goal of the dialect?

Start filling the void in OSS between target-agnostic and target-specific vector operations.

  • What is the first implementation milestone?

MLIR vector<16xf32> to AVX512 LLVM intrinsics for the exp-avx-512 function.

  • How does it fit into the MLIR dialect ecosystem?

It is the first HWV dialect in OSS (see the Vector Dialect doc).

  • Connection: how does it connect to the existing dialects in a compilation pipeline(s)?

VectorOps -> AVX512-MLIR -> AVX512-LLVM -> LLVM
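As a sketch of that pipeline (the avx512 op names and operand lists below are hypothetical placeholders, not ops from the revision):

```mlir
// VectorOps level: target-agnostic ops on MLIR vector types.
%0 = vector.fma %x, %y, %z : vector<16xf32>

// AVX512-MLIR level: still on MLIR vector types, but the op now commits
// to an AVX512-specific instruction (hypothetical syntax).
%1 = "avx512.mask.scalef"(%0, %s) : (vector<16xf32>, vector<16xf32>) -> vector<16xf32>

// AVX512-LLVM level: the same op, 1-1, on MLIR LLVM dialect types,
// ready to convert to the LLVM intrinsic.
%2 = "avx512.intr.mask.scalef"(%a, %b)
    : (!llvm<"<16 x float>">, !llvm<"<16 x float>">) -> !llvm<"<16 x float>">
```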

  • Consolidation: is there already a dialect with a similar goal or matching abstractions; if so, can it be improved instead of adding a new one?


  • Reuse: how does it generalize to similar but slightly different use-cases?

There will be different HW-specific dialects we want to target. The union of all the ops in HW-independent and HW-specific dialects will represent the set of valid ops for a particular Target.

  • What is the community of users that it is serving?

CPU users who want performance with AVX512 intrinsics.

  • Who are the future contributors/maintainers beyond those who propose the dialect?

Anyone interested in AVX512 and making it a successful target for MLIR.

Please let me know if you have questions or concerns.

Thanks all!


Thanks Nicolas! That sounds interesting! I think it could give us a good idea of the challenges in mixing high-level dialects with low-level target-specific dialects.

On a side note, have you thought about having a low-level, VL-independent/target-independent dialect that provides a common ground for writing generic vector code and that can be “instantiated” and reasonably lowered to a VL-specific/target-specific vector dialect later on? I know there would be quite a few challenges in some scenarios, but it could help reduce target-specific kernel versioning for some simple cases.

The intention is to continue extending the VectorOps dialect which plays that role.
Are you suggesting an extra dialect between the VectorOps and say AVX512?
Would you have some example of ops that you would see as interesting and that wouldn’t fit in either VectorOps or AVX512?

Side note: we are attacking this both top-down (i.e. Linalg → VectorOps → AVX512 → LLVM) and bottom-up (lifting intrinsics to the Vector level). One concrete use case, as I am building this end to end, is the need for a portable vector.fma. See D74075 ([mlir][VectorOps] Introduce a `vector.fma` op that works on n-D vectors and lowers to `llvm.intrin.fmuladd`) if you have cycles for reviewing :wink:
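For illustration (syntax as in the revision under review; treat this as a sketch, and the LLVM dialect type spelling as approximate), the portable op and its lowering would look along these lines:

```mlir
// A portable fused multiply-add on an n-D vector.
%0 = vector.fma %a, %b, %c : vector<8x16xf32>

// After lowering, each 1-D slice maps to the LLVM fmuladd intrinsic,
// on MLIR LLVM dialect types:
%1 = "llvm.intr.fmuladd"(%x, %y, %z)
    : (!llvm<"<16 x float>">, !llvm<"<16 x float>">, !llvm<"<16 x float>">)
    -> !llvm<"<16 x float>">
```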

VL independence is quite a challenge! Since MLIR’s vector shapes are fixed size (static shape), this would require an elaborate design change to the type system for vector types; it would also bring up the question of merging tensor and vector. An alternative design point could be to allow tensors as an element type of memrefs, i.e., allow memrefs of tensors (and, as a result, memrefs of dynamically shaped tensors); the tensor element types could later be materialized to fixed-size vector types once the vector widths were known.

After looking at D74056 ([mlir][AVX512] Start an AVX512 dialect), I think I misunderstood where you were heading with the AVX512 dialect :). I thought you were planning to add the basic AVX512 intrinsics (i.e., _mm512_set1_ps, _mm512_loadu_ps, _mm512_fmadd_ps, etc.) to this dialect so that you could write a function that implements the XNNPack exp function for AVX512. However, you plan to add XNNPack exp as a “kernel” operation of the AVX512 dialect, right?

That’s why I thought that having a VL/target independent dialect to write these kernels could be useful. For simple cases, we could write a single generic vector kernel and then lower it to SSE, AVX, AVX512, etc. I know this would be challenging, but it sounds like something that might be worth investigating :slight_smile:

For the simple cases this is already happening by virtue of just targeting LLVM and letting LLVM do the work. In the particular case of the first motivating example, I am indeed working through a set of revisions that will expose avx512.mask.rndscale.ps.512 and avx512.mask.scalef.ps.512 on MLIR LLVM Dialect types. I will probably have to expose more portable vector ops that may or may not have a vanilla LLVM lowering.

Ultimately, I think the question is “where do we want to put the switch?”.
VectorOps are VL/target independent and lower to LLVM + “portable” intrinsics.
AVX512 adds an intermediate layer that is target dependent and through which we can go to target specific intrinsics (i.e. when LLVM “portable” intrinsics + ISel do not/are not expected to work in a desired way).
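Concretely, the switch could look like this (the avx512 op’s operand list is sketched here, not the exact signature of the intrinsic):

```mlir
// Portable path: VectorOps lowers to LLVM + "portable" intrinsics and
// relies on ISel to pick good instructions.
%r = vector.fma %x, %p, %acc : vector<16xf32>

// Target-specific path: when ISel is not expected to produce the desired
// instruction mix, go through AVX512 ops that pin down the instruction,
// e.g. rounding via rndscale (operands sketched, not the real signature).
%n = "avx512.mask.rndscale"(%x, %imm, %src, %k)
    : (vector<16xf32>, i32, vector<16xf32>, i16) -> vector<16xf32>
```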

It feels like the name “AVX512” is misleading given the goal of the dialect. I naturally assumed it was intended to model AVX512 instruction set (and so did the others from what I can see), but it is not the case. Instead, it seems to model some slightly higher-level operation that can be rewritten to a sequence of AVX512 instructions. I would argue that what you propose should not be called “AVX512”. It may be “xnnpack” if you wish.

IMO, we can have an AVX512 that is derived from the corresponding LLVM IR intrinsics, potentially in an automated way that we now support. I would prefer it to only have operations derived from intrinsics and, if strongly necessary, support operations (similarly to llvm.constant that we need to fit one IR into the other). This can live in the same place as NVVM and ROCDL dialects that are similarly derived from LLVM IR intrinsics. Moving them under lib/Dialect/Target is a separate discussion.

I think it might be interesting to have a non instruction-level vector dialect between abstract nD vectors and platform-specific operations, but I’d like to better understand how it connects to the existing things on both sides, and how it can be generalized.

Atm, it is intended to be the place for AVX512-specific abstractions (i.e. that would only trigger if the AVX512 backend is specified). As usual when connecting things, there is a top-down and a bottom-up view of the world.

So far, I have been going at it top down and the op + type at this level are MLIR vector<16xf32>.
There is also a case to be made about op + type on mlir LLVM vector: llvm<16 x float>.

The ExpP5xxxOp in this dialect is an MLIR op on vector<16xf32> inspired by a concrete XNNPack use-case. I wouldn’t go as far as doing an XNNPack dialect because we will want many different types of MLIR vector<16xf32> ops that are AVX512-specific and need to know about the instruction mix.

In other words, I am interested in both AVX512-mlir-vector<16xf32> ops and AVX512-mlir-llvm<16 x float> ops and connecting them all the way to emitting this type of code:

avx512_exp_p5_scale:                    # @avx512_exp_p5_scale
# %bb.0:
        vfmadd231ps     .LCPI0_3(%rip){1to16}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
        vbroadcastss    .LCPI0_4(%rip), %zmm3 # zmm3 = [8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3,8.28929059E-3]
        vfmadd213ps     %zmm2, %zmm0, %zmm3 # zmm3 = (zmm0 * zmm3) + zmm2
        vmulps  %zmm3, %zmm1, %zmm0
        .size   avx512_exp_p5_scale, .Lfunc_end0-avx512_exp_p5_scale
                                        # -- End function

As usual, we are hitting the hardest problem in computer science: how should we name the concepts (dialects?) for AVX512-mlir-vector<16xf32> ops and AVX512-mlir-llvm<16 x float> ops?

I am fine with having a pure LLVM autogenerated layer named AVX512, in which case how should we call its MLIR counterpart (that I am now calling AVX512)?

I was hoping that we could have both AVX512-mlir-vector<16xf32> ops and AVX512-mlir-llvm<16 x float> ops in a single dialect (after all, a dialect is just a namespace) so that things that talk 1-1 are closer together. I see the potential issues coming with conversions and blurry dialect boundaries, but I don’t think they are too problematic.

Thoughts / suggestions on name for AVX512-mlir-vector<16xf32> if people insist it must be a separate dialect?

I updated the original post using the common recommended template for a new Dialect.

I find this proposal confusing: like others, I assumed that you wanted to model exactly what Clang exposes as intrinsics for this instruction set. At this point it isn’t clear to me what this is really about: if this is about exposing the XNNPack operations, then I don’t see why it should be a core dialect, or why an implementation specific to one instruction set is particularly appealing and/or reusable.

There is also the part about “learn more about when HW-specific vs generic abstractions should be used and how they compose in MLIR”, which seems a bit scary to not have a plan for before adding this dialect in-tree.

I would argue that what you propose should not be called “AVX512”. It may be “xnnpack” if you wish.
I would prefer it to only have operations derived from intrinsics and, if strongly necessary, support operations (similarly to llvm.constant that we need to fit one IR into the other)

This sounds reasonable to me.

I wouldn’t go as far as doing an XNNPack dialect because we will want many different types of MLIR vector<16xf32> ops that are AVX512-specific and need to know about the instruction mix.

Not sure I follow why a vector<16xf32> XNNPack-related op needs to be AVX512-specific when we are not modeling the actual low level implementation of it at that particular level of abstraction. Wouldn’t it make sense to have a high level vector<16xf32> XNNPack op that we can later lower to AVX512 intrinsics or any other vector ISA?

For the simple cases this is already happening by virtue of just targeting LLVM and letting LLVM do the work.

Not really. AFAIK, if you pass LLVM vector types with a fixed vector length, LLVM won’t “re-vectorize” the code to better fit, for instance, a wider VL of the target ISA.

VectorOps are VL/target independent and lower to LLVM + “portable” intrinsics.

I think VectorOps would be VL-dependent unless we could use some kind of VL-independent vector types, as Uday mentioned, and then we would need to fill the semantic gap of how those ops would evolve given a specific VL instantiation. That’s why I thought exploring another dialect with different vector semantics could be interesting if the goal is to use it to write kernels in MLIR. Again, this was a random idea; maybe a discussion for another time.

Thanks for your comments all.
I can see where the conflation of concepts is too high for a dialect proposal and how XNNPack should have been kept out of the discussion.
The consensus is to not go forward with this in its current form; I will retitle the RFC to make it less confusing as to what was rejected.

I wouldn’t necessarily say it’s rejected (unless you want to rescind the proposal). We don’t fully understand the scope and the goals.

Orthogonally to the RFC, I’d like to finish the discussion on the following topics. Please feel free to move them to a separate thread if you think it is better.

This seems like you’re proposing a raising transformation here that would start from smaller vectors and rewrite them as larger vectors in a target-specific way? If so, this transformation is definitely out of the scope of what I am considering at the moment. The claim is that VectorOps answers a good part of “write a single generic vector kernel and then lower it to SSE, AVX, AVX512, etc.” if one uses vectors that are larger than the HW vector size (see the section “Hardware as vector Machines of Minimum Granularity” in the Vector Dialect doc). So I would say that a “wider VL of the target ISA” is a symptom that an undesirable decision has been made upstream.

In general it is my impression that LLVM does a good job at taking ops on llvm<123xf32> and slicing and dicing them to all the supported HW and I am not looking at duplicating that in MLIR. If an MLIR transformation (or a user) uses vector<3xf32> on a machine with wide vectors, then the only thing I plan to provide is functional correctness.

Does this make sense?

having a VL/target independent dialect to write these kernels could be useful.

Then I think I am missing what VL independent means. Let me state a few claims on VectorOps and please let me know whether you disagree:

  1. VectorOps are target independent by design.
  2. We lower VectorOps to the MLIR LLVM dialect and other HW-specific vector dialects (none in OSS as of today though…)
  3. If one targets a multiple of the HW vector size, then we can unroll to operations that fit the HW vector size in MLIR (or just let LLVM do it for the 1-D vector case)
  4. If one targets non-multiple sizes (either vector<123xf32> or vector<3xf32>), the mapping to proper HW vectors is fully offloaded to LLVM
  5. VectorOps is VL-independent: ops can be used with vectors of any length (and some with any rank).
  6. VectorOps is VL-independent but operates on static vector length.

Is my understanding correct that by VL-independent you mean something like vector<?xf32> with symbolic length?
If so then I have thought about this but did not come to a satisfactory conclusion.
In practice I think the tradeoffs would be quite complex to unpack, and I have not seen a concrete use case where this would be necessary yet (SVE vectors are more restricted than just an arbitrary ?, and I think they can and should be modeled differently, but I have not thought about it deeply enough yet).
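For concreteness, here is what such a symbolic-length vector type might look like next to today’s static shapes (the `vector<?xf32>` form is hypothetical, not something MLIR supports at the time of writing):

```mlir
// Today: vector lengths are static.
%0 = addf %a, %b : vector<16xf32>

// Hypothetical VL-independent form with a symbolic length, to be
// instantiated per target (e.g. 4 for SSE, 8 for AVX, 16 for AVX512):
%1 = addf %a, %b : vector<?xf32>
```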

But at this point I think I am over-speculating here. Could you please give a bit more insight in what you mean by VL-independent?

Thanks for raising these great points @dcaballe !

I am rescinding it in this form indeed.

I have been following the path of solving a top-down problem end-to-end and providing new features that are concerned with higher-level abstractions.

It seems there is agreement that it can be split into multiple things.
Doing the same exploration bottom-up will likely get me to the point where I can propose something closer to what people understood by AVX512 dialect.

I agree there is a concrete need for exposing target-specific intrinsics in MLIR so we can build on top of them. This RFC was trying to also build on top of them, which I can see is premature, at least as a core dialect.

Thanks for the feedback !

LLVM does provide a correct lowering, but it is not great at this on most targets. The issue is that this lowering happens in SelectionDAG, which works a single basic block at a time. This prevents a lot of algebraic simplifications and other things from happening across blocks after vectors get split.

There is some work to improve this (some folks from Apple are working on matrix support for LLVM with better lowering), but I’m not aware of the status. You could ask on llvm-dev if you’re curious.


Are you talking about @fhahn’s https://reviews.llvm.org/D70456?

Yes, that’s it! Thank you for finding that

There is a queue of other [Matrix] changes already under review.

Just as a reference: the vector predication RFC, also mentioned in the [Matrix] RFC, and the new doc under review: