Merry Christmas, Happy New Year, Happy Hanukkah, and all the other holidays to all the fine MLIR people
TL;DR
I’d like to announce the first release of `eudsl`[1].
Currently, the project is the sum total of three components:
- `eudsl-tblgen`: Python bindings to `libLLVMTableGen`;
- `eudsl-nbgen`: a source-to-source translator that translates MLIR headers[2] into direct `nanobind` bindings;
- `eudsl-py`: direct Python bindings to MLIR, generated using `eudsl-nbgen`.
Before I say a little about why the heavy emphasis on Python, despite previously aspiring to support all/more languages, here’s a colab that demos `eudsl-py`, which I suppose is the most intriguing part (or the most confusing):
Why Python
The stated goal of `eudsl` was (and continues to be) enabling all language frontends to target MLIR.
My initial idea for achieving that goal was to extend upstream’s `libMLIRTableGen`. But who wants to write C++ just to munge strings when there are so many better string-munging languages? Hence `eudsl-tblgen`, a fairly complete, direct-to-C++ binding of `libLLVMTableGen`. Note that these first bindings were for `libLLVMTableGen` because one needs to be able to build and manage the actual `llvm::RecordKeeper` to pass to the various functions in `mlir-tblgen`.
In order to bootstrap those bindings (i.e., because I’m lazy), I wrote a little source-to-source translator that just blindly emitted stuff like
```cpp
nb::class_<Record>(m, "Record")
    .def_prop_ro("id", &Record::getID)
    .def_prop_ro("name", &Record::getName)
```
which is of course made possible by `nanobind`’s very nice templates. I then massaged those bindings by hand; at that point the plan wasn’t to build out a full source-to-source translator[3].
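The “blindly emit stuff” step can be caricatured in a few lines of Python. This is not `eudsl-nbgen`’s actual implementation, just a toy sketch of the idea; `emit_binding` and its inputs are invented for illustration:

```python
# Toy sketch of "blindly emitting" nanobind stubs: given a class name and a
# mapping of Python property names to C++ getter names, print a nb::class_
# declaration. (Illustrative only; not eudsl-nbgen's real code.)

def emit_binding(cls: str, getters: dict) -> str:
    """Emit a nanobind class_ stub for `cls`."""
    lines = [f'nb::class_<{cls}>(m, "{cls}")']
    for prop, getter in getters.items():
        lines.append(f'    .def_prop_ro("{prop}", &{cls}::{getter})')
    lines[-1] += ";"  # close the builder-chain statement
    return "\n".join(lines)

print(emit_binding("Record", {"id": "getID", "name": "getName"}))
# nb::class_<Record>(m, "Record")
#     .def_prop_ro("id", &Record::getID)
#     .def_prop_ro("name", &Record::getName);
```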
But having done all of this binding, I/you/one immediately runs into a problem with the approach (enabling writing ODS backends in Python): ODS isn’t actually a spec, and much of the semantics of ODS is buried in the implementations of `mlir-tblgen` and `libMLIRTableGen`; e.g., when exactly are `InferTypeOpInterface` traits emitted? So you end up having not only to bind lots of stuff from `libMLIRTableGen` but also to rewrite lots of stuff, now in Python, against those new bindings. Not fun, and probably all the way at the right end of the spectrum between “high impact” and “vanishing/diminishing returns”.
So what to do? The answer in meme form:
So that’s what I did: I wrote a `clang::ASTFrontendAction` to crawl the ODS-generated headers and emit `nanobind` bindings. And shockingly enough (primarily because `nanobind` is so nice), it works pretty well. This post is already pretty long, so I won’t go into the weedy details, but roughly 90% of the methods in 90% of the classes have working, generated bindings. The obvious/known absences are templated things like the adaptors:
```cpp
template <typename RangeT>
class AddFOpGenericAdaptor : public detail::AddFOpGenericAdaptorBase
```
But dialect ops, attributes, types, and enums all work[4]; e.g., if you go to the colab above you will see
```python
shape = SmallVector[np.int64]([10, 10])
f32_ty = Float32Type.get(ctx)
memref_ty = MemRefType.Builder(ArrayRef(shape), f32_ty).memref_type()
td = nvgpu.TensorMapDescriptorType.get(
    ctx,
    memref_ty,
    nvgpu.TensorMapSwizzleKind.SWIZZLE_64B,
    nvgpu.TensorMapL2PromoKind.L2PROMO_64B,
    nvgpu.TensorMapOOBKind.OOB_NAN,
    nvgpu.TensorMapInterleaveKind.INTERLEAVE_16B,
)
# !nvgpu.tensormap.descriptor<
#   tensor = memref<10x10xf32>,
#   swizzle = swizzle_64b,
#   l2promo = l2promo_64b,
#   oob = nan,
#   interleave = interleave_16b
# >
print(td)
```
which I’m happy about because a perennial sore spot about our upstream bindings is that one has to bind all of these by hand.
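For flavor, the header-crawling step can be caricatured with a toy stand-in. The real generator walks a proper clang AST via `clang::ASTFrontendAction`; this sketch instead uses a regex over a header snippet, and the `AddFOp` getters and `crawl_and_emit` helper are invented for illustration, just to show the shape of the header-decl-to-binding translation:

```python
import re

# Toy stand-in for the header crawler: find a class and its zero-arg getters
# in a header snippet and emit nanobind .def() lines for them.
# (Illustrative only; the real tool uses a clang AST, not regexes.)

HEADER = """
class AddFOp {
public:
  ::mlir::Value getLhs();
  ::mlir::Value getRhs();
};
"""

def crawl_and_emit(header: str) -> str:
    cls = re.search(r"class (\w+)", header).group(1)
    # Match "ReturnType getFoo();" declarations.
    methods = re.findall(r"[\w:<>]+ (get\w+)\(\);", header)
    lines = [f'nb::class_<{cls}>(m, "{cls}")']
    lines += [f'    .def("{m}", &{cls}::{m})' for m in methods]
    return "\n".join(lines) + ";"

print(crawl_and_emit(HEADER))
# nb::class_<AddFOp>(m, "AddFOp")
#     .def("getLhs", &AddFOp::getLhs)
#     .def("getRhs", &AddFOp::getRhs);
```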
A few technical notes
- Yes, this does require doing the unthinkable: it requires building LLVM with `-frtti`. Definitely not upstreamable;
- This uses `libLLVM.so` and `libMLIR.so` and has the nice side effect that multiple downstream users (of these bindings) could conceivably use the same base set of bindings (assuming the same compile flags, etc. etc. etc.);
- The bindings are generated at build time of the host project, so the compile flags are correct/matching the flags required by the LLVM distro (I forward `LLVM_DEFINITIONS` both at parse and build time);
- `nanobind` says it compiles 4x faster, and that might be true (I didn’t compare against `pybind`), but it’s still an egregiously long compile by default: a single TU for all the bindings will time out the 6-hour limit on GHA and takes hours even on my M1 MacBook Pro. To compensate, I had to implement a similar sort of sharding as upstream;
- Windows isn’t currently supported but could/might be in the future.
Fringe benefits
In “binding all the things” I discovered a lot of dangling header decls from `extraClassDeclaration` and elsewhere:

- [mlir][arith] DCE getPredicateByName
- [mlir][scf] DCE unimplemented decls in TDs
- [mlir][llvmir] implement missing attrs getChecked
- [mlir][emitc] DCE unimplemented decls
- [mlir][linalg] DCE unimplemented extra decl
- [mlir][shape] DCE unimplemented extra decl
So at least that’s good.