Google’s TensorFlow team would like to contribute MLIR to the LLVM Foundation

Hi all,

The TensorFlow team at Google has been leading the charge to build a new set of compiler infrastructure, known as the MLIR project. The initial focus has been on machine learning infrastructure, high performance accelerators, heterogeneous compute, and HPC-style computations. That said, the implementation and design of this infrastructure is state of the art, is not specific to these applications, and is already being adopted elsewhere, for example by the Flang compiler. If you are interested in learning more about MLIR and the technical design, I’d encourage you to look at the MLIR Keynote and Tutorial at the last LLVM Developer Meeting.

MLIR is already open source on GitHub, and includes a significant amount of code in two repositories. “MLIR Core” is located in github/tensorflow/mlir, including an application independent IR, the code generation infrastructure, common graph transformation infrastructure, declarative operation definition and rewrite infrastructure, polyhedral transformations etc. The primary TensorFlow repository at github/tensorflow/tensorflow contains TensorFlow-specific functionality built using MLIR Core infrastructure.

In discussions with a large number of industry partners, we’ve achieved consensus that it would be best to build a shared ML compiler infrastructure under a common umbrella with well known neutral governance. As such, we’d like to propose that MLIR Core join the non-profit LLVM Foundation as a new subproject! We plan to follow the LLVM Developer Policy, and have been following an LLVM-style development process from the beginning - including all relevant coding and testing styles, and we build on core LLVM infrastructure pervasively.

We think that MLIR is a nice complement to existing LLVM functionality, providing common infrastructure for higher level optimization and transformation problems, and dovetails naturally with LLVM IR optimizations and code generation. Please let us know if you have any thoughts, questions, or concerns!

-Chris

Hi, Chris, et al.,

I support adding MLIR as an LLVM subproject. Here are my thoughts:

  1. MLIR uses LLVM. LLVM is one of the MLIR dialects, MLIR is compiler infrastructure, and it fits as a natural part of our ecosystem.

  2. As a community, we have a lot of different LLVM frontends, many of which have their own IRs on which higher-level transformations are performed. We don’t currently offer much, in terms of infrastructure, to support the development of these pre-LLVM transformations. MLIR provides a base on which many of these kinds of implementations can be constructed, and I believe that will add value to the overall ecosystem.

  3. As a specific example of the above, the current development of the new Flang compiler depends on MLIR. Flang is becoming a subproject of LLVM and MLIR should be part of LLVM.

  4. The MLIR project has developed capabilities, such as for the analysis of multidimensional loops, that can be moved into LLVM and used by both LLVM- and MLIR-level transformations. As we work to improve LLVM’s capabilities in loop optimizations, leveraging continuing work to improve MLIR’s loop capabilities in LLVM as well will benefit many of us.

  5. As a community, we have been moving toward increasing support for heterogeneous computing and accelerators (and given industry trends, I expect this to continue), and MLIR can facilitate that support in many cases (although I expect we’ll see further enhancements in the core LLVM libraries as well).

That all having been said, I think that it’s going to be very important to develop some documentation on how a frontend author looking to use LLVM backend technology, and a developer looking to implement different kinds of functionality, might reasonably choose whether to target or enhance MLIR components, LLVM components, or both. I expect that this kind of advice will evolve over time, but I’m sure we’ll need it sooner rather than later.

Thanks again,

Hal

Overall, I think it will be a good move.

Maintenance wise, I'm expecting the existing community to move into
LLVM (if not all in already), so I don't foresee any additional costs.

Though, Hal's points are spot on...

3. As a specific example of the above, the current development of the new Flang compiler depends on MLIR.

Who knows, one day, Clang can, too! :slight_smile:

5. As a community, we have been moving toward increasing support for heterogeneous computing and accelerators (and given industry trends, I expect this to continue), and MLIR can facilitate that support in many cases (although I expect we'll see further enhancements in the core LLVM libraries as well).

Yes, and yes! MLIR can become a simpler entry point into LLVM, from
other languages, frameworks and optimisation plugins. A more abstract
representation and a more stable IR generation from it, could make
maintenance of external projects much easier than direct connections
of today. This could benefit research as much as enterprise, and by
consequence, the LLVM project.

That all having been said, I think that it's going to be very important to develop some documentation on how a frontend author looking to use LLVM backend technology, and a developer looking to implement different kinds of functionality, might reasonably choose whether to target or enhance MLIR components, LLVM components, or both. I expect that this kind of advice will evolve over time, but I'm sure we'll need it sooner rather than later.

Right, I'm also worried that it's too broad in respect to what it can
do on paper, versus what LLVM can handle on code.

With MLIR as a separate project, that point is interesting, at most.
When it becomes part of the LLVM umbrella, then we need to make sure
that MLIR and LLVM IR interact within known boundaries and expected
behaviour.

I'm not saying MLIR can't be used for anything else after the move,
just saying that, by being inside the repo, and maintained by our
community, LLVM IR would end up as the *primary* target, and there
will be minimum stability/functionality requirements.

But perhaps more importantly, as Hal states clearly, is the need for
an official specification, similar to the one for LLVM IR, as well as
a formal document with the expected semantics into LLVM IR. Sooner,
indeed.

cheers,
--renato

Hi Renato,

Thank you for your kind words. If you are interested, the documentation for MLIR is located here:
https://github.com/tensorflow/mlir/blob/master/g3doc/

Including a bunch of content, eg a full langref doc:
https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md

-Chris

FWIW: +1 from me. Personally, I am very excited about this.
I cannot speak on behalf of Arm, but I haven’t heard about any concerns either.

Thanks Chris, that looks awesome!

This one could perhaps be improved with time:
https://github.com/tensorflow/mlir/blob/master/g3doc/ConversionToLLVMDialect.md

Which I think was Hal's point. If we had a front-end already using it
in tree, we could be a bit more relaxed with the conversion
specification.

I remember when I did the EDG bridge to LLVM, I mostly repeated
whatever Clang was doing, "bug-for-bug". :slight_smile:

A cheeky request, perhaps, for the Flang people: they could help with
that document on what they have learned using MLIR as a front-end into
LLVM IR.

We get some common patterns written down, but also we get to review
their assumptions earlier, and make sure that both Flang and MLIR
co-evolve into something simpler.

cheers,
--renato

Overall, I think it will be a good move.

Maintenance wise, I’m expecting the existing community to move into
LLVM (if not all in already), so I don’t foresee any additional costs.

Though, Hal’s points are spot on…

  3. As a specific example of the above, the current development of the new Flang compiler depends on MLIR.

Who knows, one day, Clang can, too! :slight_smile:

  5. As a community, we have been moving toward increasing support for heterogeneous computing and accelerators (and given industry trends, I expect this to continue), and MLIR can facilitate that support in many cases (although I expect we’ll see further enhancements in the core LLVM libraries as well).

Yes, and yes! MLIR can become a simpler entry point into LLVM, from
other languages, frameworks and optimisation plugins. A more abstract
representation and a more stable IR generation from it, could make
maintenance of external projects much easier than direct connections
of today. This could benefit research as much as enterprise, and by
consequence, the LLVM project.

Thanks for the great summary, this is exactly my view as well!

That all having been said, I think that it’s going to be very important to develop some documentation on how a frontend author looking to use LLVM backend technology, and a developer looking to implement different kinds of functionality, might reasonably choose whether to target or enhance MLIR components, LLVM components, or both. I expect that this kind of advice will evolve over time, but I’m sure we’ll need it sooner rather than later.

Right, I’m also worried that it’s too broad in respect to what it can
do on paper, versus what LLVM can handle on code.

With MLIR as a separate project, that point is interesting, at most.
When it becomes part of the LLVM umbrella, then we need to make sure
that MLIR and LLVM IR interact within known boundaries and expected
behaviour.

I’m not saying MLIR can’t be used for anything else after the move,
just saying that, by being inside the repo, and maintained by our
community, LLVM IR would end up as the primary target, and there
will be minimum stability/functionality requirements.

I fully agree with everything you wrote! :slight_smile:
I really hope that MLIR can succeed as an enabler for users to plug into the LLVM ecosystem.

One example of something MLIR is trying to solve elegantly on top of LLVM is heterogeneous computing.
Today, a compiler framework that wants to support a device accelerator (like a GPU) needs to manage, outside of or above LLVM, how to split the host and device computation. MLIR allows both to live in the same module, and provides some convenient facilities for the “codegen” and integration with LLVM.

This is still a work in progress, but if you look at this IR: https://github.com/tensorflow/mlir/blob/master/test/mlir-cuda-runner/gpu-to-cubin.mlir#L6-L11

The lines I highlighted define a GPU kernel, wrapped in a “gpu.launch” operation. The mlir-cuda-runner is a command line tool used by tests: it runs passes to separate the GPU kernel code from the host code, and emits the LLVM IR in two separate LLVM modules, one for the GPU kernel (using the NVPTX backend) and another one for the host. Then everything is run through a JIT (assuming you have CUDA and a compatible GPU installed).
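
The host/device split can be pictured with a small toy model. The sketch below is plain Python, not MLIR: the `Op` class, the `split_host_device` function, and the `host.call_kernel` placeholder are hypothetical stand-ins invented for illustration, and only the `gpu.launch` name comes from the example above. It shows the idea of outlining each launch body into a separate device module while the host module keeps a reference to it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Op:
    """Toy operation: a name plus an optional nested body (not MLIR's API)."""
    name: str
    body: List["Op"] = field(default_factory=list)


def split_host_device(module: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Outline the body of every top-level 'gpu.launch' op into a device module.

    Returns (host_ops, device_kernels). The host keeps a placeholder op that
    names the outlined kernel, so the two modules can be compiled separately.
    """
    host: List[Op] = []
    device: List[Op] = []
    for op in module:
        if op.name == "gpu.launch":
            kernel_name = f"kernel_{len(device)}"
            # The kernel body moves to the device module under a fresh name.
            device.append(Op(kernel_name, op.body))
            # 'host.call_kernel' is an invented placeholder for the launch stub.
            host.append(Op(f"host.call_kernel @{kernel_name}"))
        else:
            host.append(op)
    return host, device


module = [
    Op("alloc"),
    Op("gpu.launch", body=[Op("load"), Op("addf"), Op("store")]),
    Op("return"),
]
host_ops, device_kernels = split_host_device(module)
print([op.name for op in host_ops])
print([(k.name, [o.name for o in k.body]) for k in device_kernels])
```

In this toy version the two resulting lists play the role of the two LLVM modules; a real pipeline would additionally have to handle values captured by the kernel, which the sketch ignores.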

In the example above, LLVM is directly used for both the host and the kernel, but there is also a Vulkan/SPIR-V emitter (instead of NVPTX) in the works. In this case LLVM would be used for providing the JIT environment and for the host module, but not the kernel (at least not unless there is a SPIR-V backend in LLVM).

Fundamentally, MLIR is very extensible and lets users define their own abstractions and compose them on top of whatever the community will want to propose in the core.

We proposed a tutorial for the US Dev Meeting in which we planned to show how this layers and composes with LLVM in detail, but there are already so many great tutorial sessions in the schedule that we couldn’t get a slot.
In the meantime, we will be revamping our online tutorial over the coming weeks (https://github.com/tensorflow/mlir/blob/master/g3doc/Tutorials/Toy/Ch-1.md) to make it more representative.

Hope this helps.

Including a bunch of content, eg a full langref doc:
https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md

Thanks Chris, that looks awesome!

This one could perhaps be improved with time:
https://github.com/tensorflow/mlir/blob/master/g3doc/ConversionToLLVMDialect.md

Which I think was Hal's point. If we had a front-end already using it
in tree, we could be a bit more relaxed with the conversion
specification.

Don’t worry, Flang is coming soon :-).

In all seriousness, if you didn’t notice, the Flang team is planning to give a talk at LLVMDev in a month or so about Flang + MLIR. I’d also love to see a round table or other discussion about MLIR integration at the event.

The topic of Clang generating MLIR is more sensitive and I think it is best broached as a separate conversation, one motivated with data. I think that Clang generating MLIR can be a hugely positive thing (witness the explosion of recent proposals for LLVM IR extensions that are easily handled with MLIR) but it seems more conservative and logical to upgrade the existing Clang “CFG” representation to use MLIR first. This brings simple and measurable improvements to the reliability, accuracy, and generality of the data flow analyses and the Clang Static Analyzer, without introducing a new step that could cause compile-time regressions. Iff that goes well, we could consider the use of MLIR in the main compilation flow.

In any case, I hope that "Clang adoption" is not considered to be a blocker for MLIR to be adopted as part of the LLVM project. This hasn’t been a formal or historical requirement for new LLVM subprojects, and I’d like to make sure we don’t put undue adoption pressure on Clang - it is important that we are deliberate about each step and do the right (data driven) thing for the (huge) Clang community.

-Chris

In all seriousness, if you didn’t notice, the Flang team is planning to give a talk at LLVMDev in a month or so about Flang + MLIR. I’d also love to see a round table or other discussion about MLIR integration at the event.

Ah, the title was just "Flang update", I didn't check the abstract.
Looking forward to it.

The topic of Clang generating MLIR is more sensitive and I think it is best broached as a separate conversation, one motivated with data. I think that Clang generating MLIR can be a hugely positive thing (witness the explosion of recent proposals for LLVM IR extensions that are easily handled with MLIR) but it seems more conservative and logical to upgrade the existing Clang “CFG” representation to use MLIR first. This brings simple and measurable improvements to the reliability, accuracy, and generality of the data flow analyses and the Clang Static Analyzer, without introducing a new step that could cause compile-time regressions. Iff that goes well, we could consider the use of MLIR in the main compilation flow.

Totally agreed!

In any case, I hope that "Clang adoption" is not considered to be a blocker for MLIR to be adopted as part of the LLVM project. This hasn’t been a formal or historical requirement for new LLVM subprojects, and I’d like to make sure we don’t put undue adoption pressure on Clang - it is important that we are deliberate about each step and do the right (data driven) thing for the (huge) Clang community.

Absolutely.

It doesn't make sense to put artificial orthogonal constraints, when
we know the implementation would raise more questions than answer and
could take years to get right. I'm hoping by adding MLIR first, we'd
have a pretty solid use case and the eventual move by Clang, if any,
would be smoother and more robust.

I agree with this proposal being the first step. I'm also personally
happy with the current level of docs and progress of Flang.

LGTM, thanks! :smiley:

--renato

Renato Golin via llvm-dev <llvm-dev@lists.llvm.org> writes:

But perhaps more importantly, as Hal states clearly, is the need for
an official specification, similar to the one for LLVM IR, as well as
a formal document with the expected semantics into LLVM IR. Sooner,
indeed.

+1. There are all kinds of scattered documents on the TensorFlow site
talking about MLIR, the affine dialect, etc. but nothing of the quality
and approachability of LLVM's language reference. I find it difficult
to pull all the pieces together.

Of course by its nature, MLIR doesn't lend itself to concrete semantic
descriptions, though I would expect the affine dialect (and others) to
have documentation on par with the LLVM IR. For MLIR itself, I would
want documentation somewhat less dense than the current BNF-style
specification.

Does the current proposal only cover adding the base MLIR to the LLVM
project, or also the affine dialect and possibly others? The affine
dialect could certainly be quite useful for many projects.

                         -David

There are two talks about Flang, the one about MLIR is: http://llvm.org/devmtg/2019-10/talk-abstracts.html#tech19

Renato Golin via llvm-dev <llvm-dev@lists.llvm.org> writes:

But perhaps more importantly, as Hal states clearly, is the need for
an official specification, similar to the one for LLVM IR, as well as
a formal document with the expected semantics into LLVM IR. Sooner,
indeed.

+1. There are all kinds of scattered documents on the TensorFlow site
talking about MLIR, the affine dialect, etc. but nothing of the quality
and approachability of LLVM’s language reference. I find it difficult
to pull all the pieces together.

One of the main reasons we haven’t invested in a proper website and documentation was the anticipation of a possible integration in LLVM, so we didn’t prioritize what I saw as throw-away work.
We’re looking forward to having a space on llvm.org for MLIR and building great online docs there!

Of course by its nature, MLIR doesn’t lend itself to concrete semantic
descriptions, though I would expect the affine dialect (and others) to
have documentation on par with the LLVM IR.

Just last week I had to scout through the affine dialect “LangRef” for something, and I also felt that it is due for a refresh! It seemed a bit more than just BNF though, do you have an example of what you would like to see expanded there?

And to be clear: the ambition should be that the dialects included in-tree (in MLIR/LLVM) get some level of documentation on-par with LLVM LangRef.

For MLIR itself, I would
want documentation somewhat less dense than the current BNF-style
specification.

Does the current proposal only cover adding the base MLIR to the LLVM
project, or also the affine dialect and possibly others? The affine
dialect could certainly be quite useful for many projects.

The current proposal includes all the content of https://github.com/tensorflow/mlir/ as-is.
It does not include the TensorFlow specific dialects and other pieces here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/

Best,

Mehdi AMINI <joker.eph@gmail.com> writes:

Of course by its nature, MLIR doesn't lend itself to concrete semantic
descriptions, though I would expect the affine dialect (and others) to
have documentation on par with the LLVM IR.

Just last week I had to scout through the affine dialect "LangRef
<https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/Affine.md>"
for something, and I also felt that it is due for a refresh! It seemed a
bit more than just BNF though, do you have an example of what you would like
to see expanded there?

I was referring to the base MLIR documentation with the BNF comment:

https://github.com/tensorflow/mlir/blob/master/g3doc/LangRef.md

Obviously there's more to it than that but I found this document pretty
dense.

The current proposal includes all the content of
https://github.com/tensorflow/mlir/ as-is.
It does *not* include the TensorFlow specific dialects and other pieces
here:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/

Looks great, thanks for making it more clear!

                    -David

Oh I see, indeed this is a bit difficult to grasp: since MLIR is designed for extensibility, a lot of the core “LangRef” things are very structural by nature at this level. If I draw a parallel with LLVM IR, it is as if you split LangRef into:

The MLIR LangRef corresponds to the former part only, because this is what is common to all dialects. On the other hand, each dialect will need to provide its own LangRef equivalent (for example I linked to the Affine dialect doc before).

Does it make sense?

For LLVM, I think the document is:
https://github.com/tensorflow/mlir/blob/master/g3doc/ConversionToLLVMDialect.md

It has some examples and some tips, but it needs more love. In the
end, we'd need three things: reasonable documents, at least one
implementation in-tree and good testing coverage.

As with any new technology that we introduce to LLVM, these things can
build up with time. Unlike them, however, MLIR is an existing project
with its own responsibilities. There will be a period of instability
for both projects as they are merged.

So, as long as we understand the costs and are willing to pay them,
each of the three things can come at "reasonable" time, after the
merge.

I'm assuming that the general approach is to, in case of conflict,
value LLVM's stability more than MLIR's during the transition. I'm
also assuming the transition will not take longer than one release
period (1/2 year).

I'm uncertain how the other projects that use MLIR will interact with
the LLVM community, but I'm also assuming that's a given, by wanting
to merge the two communities.

IE, I hope we don't get to the point where other users of MLIR want to
take a radically different direction as LLVM does, which would start a
conflict.

I don't think anyone here wants that, but it's good to be aware that
it could happen, and prepare for it.

cheers,
--renato

The MLIR LangRef corresponds to the former part only, because this is what is common to all dialects. On the other hand, each dialect will need to provide its own LangRef equivalent (for example I linked to the Affine dialect doc before).

For LLVM, I think the document is:
https://github.com/tensorflow/mlir/blob/master/g3doc/ConversionToLLVMDialect.md

For the LLVM dialect, the document is
https://github.com/tensorflow/mlir/blob/master/g3doc/Dialects/LLVM.md

It is a good question for such “interface” dialects whether we should describe the semantics of the operations (essentially copy it from the source), or just refer to the authoritative document (LLVM’s LangRef in this case). So far, we decided to say that operations in the LLVM dialect have the same semantics as LLVM IR instructions, but we had to describe their syntax since it differs. On the other hand, the operations that model IR concepts absent from MLIR IR (first-class constant values, globals) are defined with more detail. Suggestions on how to structure that document without much duplication are very welcome. Also note that the dialect currently covers ~60% of LLVM instructions and ~1% of intrinsics.

The document you referenced above is about the conversion between the Standard and the LLVM dialects. Similarly to dialect documents, the conversion document only describes the details of a specific A to B conversion, in particular type conversion and CFG requirements. Admittedly, it does not describe how individual arithmetic operations are converted when it is a direct one-to-one mapping after type conversion. The conversion infrastructure itself is described in https://github.com/tensorflow/mlir/blob/master/g3doc/DialectConversion.md.
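
As a rough illustration of what "a direct one-to-one mapping after type conversion" means, consider the toy Python sketch below. It is not the MLIR conversion infrastructure: the op names, the type names, and both lookup tables are hypothetical placeholders chosen for the example, and they are not taken from either dialect.

```python
from typing import Dict, List, Tuple

# Hypothetical tables: source-dialect names on the left, target-dialect names
# on the right. Real dialects define their own ops and types.
TYPE_MAP: Dict[str, str] = {"a.int": "b.i64", "a.float": "b.f32"}
OP_MAP: Dict[str, str] = {"a.add_int": "b.add", "a.add_float": "b.fadd"}

# A toy operation is (op name, result type, operand names).
ToyOp = Tuple[str, str, List[str]]


def convert(ops: List[ToyOp]) -> List[ToyOp]:
    """Convert each op by first converting its type, then renaming the op.

    Raises KeyError for anything without a table entry, loosely mimicking a
    conversion that fails when an op or type has no target equivalent.
    """
    converted: List[ToyOp] = []
    for name, result_type, operands in ops:
        new_type = TYPE_MAP[result_type]   # step 1: type conversion
        new_name = OP_MAP[name]            # step 2: one-to-one op mapping
        converted.append((new_name, new_type, operands))
    return converted


print(convert([("a.add_int", "a.int", ["x", "y"])]))
```

Once the types are settled, such ops need nothing beyond a table lookup, which is presumably why a conversion document can leave them out and concentrate on the type and CFG rules.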

Yep, you’re right that MLIR is still early and we can build these things up over time.

One point of clarification though: MLIR was and has always been built with the idea that it would go to LLVM. This is why it has always followed the coding style, development practices, etc. The ‘instability’ that I expect is more about the GitHub infra changing (monorepo etc) than the code itself.

To put it another way, MLIR was built the way Clang was (both Clang and MLIR were started as private projects that were eventually contributed to LLVM, with full revision control history). In contrast, MLIR isn’t being built the way LLDB was, which was a project that built up over time and was only later moved to LLVM.

-Chris

Mehdi Amini <aminim@google.com> writes:

The MLIR LangRef corresponds to the former part only, because this is what
is common to all dialects. On the other hand, each dialect will need to
provide its own LangRef equivalent (for example I linked to the Affine
dialect doc before).

Does it make sense?

Yeah, it makes perfect sense. I think maybe reading the document I got
caught up in the BNF grammar -- it's a bit distracting. I've not seen a
similar BNF specification for LLVM IR. It may exist somewhere, but BNF
isn't part of any LLVM document I've read. :slight_smile: Maybe the grammar bits
could be factored out into a formal specification and LangRef could be a
little more informal.

Just suggestions, obviously.

                         -David

That’s a good one, we should look into outlining the grammar to make this more friendly to read.

Another thing that I just remembered about documentation is that we don’t expect dialects to write a “LangRef” that describes each operation. Instead we use a table-driven approach for defining operations and we generate both the C++ classes and the documentation from there (this helps keep documentation up-to-date as well!).

From a local build directory of MLIR, you can try:

{BUILD}/bin/mlir-tblgen -gen-op-doc {SRC}/Dialect/StandardOps/Ops.td -I {SRC}/include/ > std.md

(try --gen-op-defs and --gen-op-decls for the C++ code)
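
To illustrate the table-driven idea in miniature, here is a toy Python sketch. It is not how mlir-tblgen is implemented: the `OpRecord` schema, the `toy_add` op, and the two generator functions are invented for illustration, and the emitted C++ is only a stub. The point is that a single record feeds both the documentation and the class declaration, so the two cannot drift apart.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OpRecord:
    """One table entry describing an operation (hypothetical schema)."""
    name: str
    summary: str
    operands: List[str]


def gen_op_doc(rec: OpRecord) -> str:
    """Render a Markdown section for the operation."""
    operand_lines = "\n".join(f"- `{o}`" for o in rec.operands) or "- (none)"
    return f"## {rec.name}\n\n{rec.summary}\n\nOperands:\n{operand_lines}\n"


def gen_op_decl(rec: OpRecord) -> str:
    """Render a stub C++ class declaration for the same operation."""
    class_name = "".join(p.capitalize() for p in rec.name.split("_")) + "Op"
    accessors = "\n".join(f"  Value {o}();" for o in rec.operands)
    return f"class {class_name} {{\npublic:\n{accessors}\n}};\n"


# A made-up op used only for this example.
toy_add = OpRecord("toy_add", "Adds two values.", ["lhs", "rhs"])
print(gen_op_doc(toy_add))
print(gen_op_decl(toy_add))
```

Adding an operand to the record changes both outputs at once, which is the property that keeps generated documentation in step with the code.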

I pushed these here for your convenience: https://github.com/joker-eph/mlir-docs/

See for example the definition for alloc: https://github.com/tensorflow/mlir/blob/master/include/mlir/Dialect/StandardOps/Ops.td#L124
From there here is:

Of course we need to improve the content in general, but I expect the incentive to do so to grow assuming we can get a space like http://llvm.org/mlir ; at which point we could organize the MLIR overall online doc structure to include these generated files continuously.

Best,

Mehdi AMINI <joker.eph@gmail.com> writes:

Another thing that I just remembered about documentation is that we
don't expect dialects to write a "LangRef" that describes each
operation. Instead we use a table-driven approach for defining
operations
<https://github.com/tensorflow/mlir/blob/master/g3doc/OpDefinitions.md#table-driven-operation-definition-specification-ods>
and we generate both the C++ classes and the documentation from there
(this helps keep documentation up-to-date as well!).

From a local build directory of MLIR, you can try:

  {BUILD}/bin/mlir-tblgen -gen-op-doc {SRC}/Dialect/StandardOps/Ops.td -I
{SRC}/include/ > std.md

(try --gen-op-defs and --gen-op-decls for the C++ code)

I pushed these here for your convenience:
https://github.com/joker-eph/mlir-docs/

Very nice!

Of course we need to improve the content in general, but I expect the
incentive to do so to grow assuming we can get a space like
http://llvm.org/mlir ; at which point we could organize the MLIR overall
online doc structure to include these generated files continuously.

That would be wonderful. Thanks for engaging on this!

                   -David