[RFC] Landing MDL in LLVM CodeGen

In November 2022, we published an RFC for a new Machine Description Language (MDL) for LLVM. The purpose of the MDL language is to support a much broader class of accelerator architectures in the CodeGen and MC libraries. We’ve done an initial pull request that includes just the baseline documentation for the project. The work in its entirety can be found at GitHub - MPACT-ORG/llvm-project at all, together with extensive documentation (in llvm/docs/Mdl).

Since it’s a large contribution, before we continue with PRs I’d like to encourage a bit more conversation with the community about the work, and try to actively address peoples’ questions and concerns.

A few notes about the status of the work:

  • This work grew out of a need to model much more complex architectures in LLVM, going back at least 15 years. More recently, we wanted to support Google’s TPU ML accelerators, and found it really challenging to do so in the existing infrastructure. The MDL directly addresses the issues we’ve had, and makes it “easy” to support that class of architecture, as well as all the existing upstream architectures.
  • Support for MDL is integrated into the MC and CodeGen libraries alongside the support for Schedules and Itineraries, in the same general style in which both Schedules and Itineraries are supported. It’s not meant to replace either, although it carries more detailed information about the microarchitecture than either of those, and could enable more sophisticated scheduling algorithms.
  • LLVM MDL support is selectable on an opt-in basis via an explicit CMake configuration flag. When enabled, it can be enabled/disabled by a command line option.
  • MDL directly supports all upstream targets that have Itineraries and/or Schedules. We have a tool that scrapes information from TableGen and produces an “equivalent” MDL description, which we compile and include in the MC libraries (much like the TableGen-generated files). We don’t expect this to be a typical use case, but it was done to prove out the integration into all the CodeGen and MC components.
  • The footprint of MDL in LLVM is really quite modest: around 1300 lines of code added to CodeGen and MC, and around 600 lines of code added to support all the pertinent targets in the Target libraries. The MDL support code (separate from existing code) is around 2500 LOC. The majority of the code is in the external tooling for the language.
  • It’s very well tested: when enabled, we pass all but 190 of the 93007 tests. Of those, most “failures” are either very minor, incidental (and valid) scheduling differences, or tests that specifically test the format of debug information.
  • For the runtime tests we’ve done to date (on various X86 platforms), there are no discernible performance deltas (typically +/-0.2%, varying run-to-run), in fact we almost always generate exactly the same code. This is what we expected: the intent of this effort was not to improve performance for existing targets, but to be able to better support more targets. That said, it is able to do some things (like bundle-packing) slightly better than the existing infrastructure.

Please take a look and let’s discuss any questions/concerns you may have.

-Reid

3 Likes

I remember you bootstrapped this with ANTLR (or something like that?), were you able to not depend on an external tool? I assume we wouldn’t want such a dependency in order to build LLVM?

Yeah, we use Antlr4 to build the MDL compiler, but not in LLVM itself. I believe it’s a standard package in both Ubuntu and Debian at this point (and available everywhere), so I’m not sure it’s really any different than Python, Java, CMake, Sphinx, etc?

At any rate, the MDL stuff is all explicitly opt-in, so there’s no requirement to build MDL if you’re not using it.

I know that perhaps this is a bit of “rule stretching”, but - to be honest - Antlr4 is a seriously awesome toolset for generating solid, fast, recursive descent parsers - i.e., it’s not just yacc/bison on steroids, but a very different “animal” altogether. FWIW…

But you need to build the MDL compiler in order to build LLVM when using MDL, right?

That’s a fair point, but what about the other distributions, and Windows, FreeBSD, etc.? Does it run on AIX as well? (we have AIX bots…). I don’t know anything about ANTLR support really, maybe it’s all fine :slight_smile:
(Sphinx isn’t needed to build LLVM I believe?)

But that’s intended to become a component of the LLVM backend, right? Or is there no plan to ever use it for any in-tree backend?

The current proposal parses the .td files, generates .mdl files, and then reparses them. It requires ifdefs in all backends. Ideally I’d like no backend to need changes (especially those that don’t benefit); this would be merely a flag flip on the generator (or flipping the generator binary in CMake), under the same interfaces. That enables decoupling the whole new-language angle (with the new Antlr and Java dependency), and the benefit here can be evaluated independently. It also results in fewer changes, opt-in per backend as a build-time flip.

E.g., why isn’t the starting point here the existing TD files, without an intermediate jump to a new language? If the in-memory flow shows benefits (in the tables generated, new opportunities it opens, or simpler debugging), then it is independently useful. And you can have different out-of-tree bindings/languages that use the same APIs, and the language can be socialized over time.

This is part of core code, so presenting the benefits and evaluating in isolation is good, as this could affect all backend devs. And I’m honestly still unsure what the impact here is for which upstream targets. The only numbers mentioned above are about writing less code for a backend - but all the backends are written already, and the current approach parses the .td file, so there’s no need to write any MDL. It sounds like generated-code performance is the same, and it’s unclear what the impact is on binary size or compilation speed.

I think this is a core question: if this is not expected to be used or have benefit for in tree targets, then why in tree? And why now? Especially as it may end up on buildbots limiting changes folks can make on the default path.

1 Like

Yeah, good question. I’ll ask. It’s distributed as a .jar file, so I believe it runs everywhere that Java runs - certainly any Linux distro, Mac, and Windows.

Yes, sorry I was being pedantic: the code generated by the MDL compiler becomes part of an LLVM MC library, if MDL is enabled.

Someone will have to decide whether using Antlr for a tool is tolerable. I’d argue that a 526-line grammar is much better than 10k+ lines of C++ (that’s what Antlr generates for that grammar). :slight_smile:

Can it be handled the same way Lex/Yacc files are typically handled? ANTLR would then become a developer-side dependency, required only if someone wants to change the grammar. Otherwise, the generated C++ files would be committed to the repo and built by default.

I’ve considered that. It could, although I don’t know that that would be an ideal solution.

Where did you land on the following from one of our past conversations:

If a subtarget wants to use MDL, all subtargets that belong to the same target need to use MDL

I can see this being an issue between maintainers of different subtargets within the same target.

I’d rather say there’s no plan to force any target to use it. Just like Schedules and Itineraries, MDL can be used on a subtarget-by-subtarget basis, or not at all.

Again, the purpose of this work is to make it easy to model things that can’t be fully or easily modeled today, and enable more targets/subtargets to easily exist in-tree. Schedules and itineraries have non-overlapping capabilities, so we unfortunately need them both. MDL’s capabilities are a superset of both, and also simpler to write and maintain than either (well, at least I think so…).

So while I don’t honestly expect everyone to immediately bail on schedules and itineraries, new targets and subtargets could be written with this, if they find it useful. If nobody does, we can of course delete it.

So it’s worthwhile to ask: where would it likely be used?

  • certainly any VLIW target (Hexagon, which currently uses 18000 machine-generated lines of TableGen because there’s no easy way to write that by hand), and AMD/R600 could greatly benefit from using MDL (see the itinerary sketch just after this list for a sense of why those descriptions end up machine-generated).
  • RISCV+accelerators. There are literally dozens of these under development, and the accelerators are notoriously hard to model.
  • ML accelerators (things like Google TPU), anything with sufficiently complex tensor units.
  • Anyone who’s tired of trying to express things in schedules or itineraries. :slight_smile:
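
To make the VLIW point concrete, here’s a minimal, entirely hypothetical TableGen itinerary sketch (the unit and class names are made up, not taken from Hexagon or any other in-tree target, and it assumes the standard llvm/Target/Target.td definitions are included). The thing to notice is that every distinct combination of slot restriction, latency, and resource usage needs its own itinerary class and stage list:

```tablegen
// Hypothetical 2-slot VLIW, sketched with the stock itinerary classes.
// All names (SLOT0, SLOT1, MEM, IIC_*) are illustrative only.
def SLOT0 : FuncUnit;
def SLOT1 : FuncUnit;
def MEM   : FuncUnit;

def IIC_ALU  : InstrItinClass;   // can issue in either slot
def IIC_MPY  : InstrItinClass;   // restricted to SLOT1
def IIC_LOAD : InstrItinClass;   // SLOT0, then occupies MEM the next cycle

def MyVLIWItineraries : ProcessorItineraries<[SLOT0, SLOT1, MEM], [], [
  InstrItinData<IIC_ALU,  [InstrStage<1, [SLOT0, SLOT1]>]>,
  InstrItinData<IIC_MPY,  [InstrStage<1, [SLOT1]>]>,
  InstrItinData<IIC_LOAD, [InstrStage<1, [SLOT0]>, InstrStage<1, [MEM]>]>
]>;
```

With realistic slot constraints, forwarding paths, and per-operand latencies, that list multiplies combinatorially and there’s no good way to factor it - which is why targets end up machine-generating thousands of lines of it.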
1 Like

Hi Michael, let me clarify that comment. When we scrape tablegen files to extract architectural information, we scrape all the subtargets. In that case, it’s kind of “all or nothing”, since the scraped models are largely for proving out the concept. However, the infrastructure works on a subtarget-by-subtarget basis. So you can write an MDL (or edit a generated one) to support just the subtargets you want to support.

1 Like

Thank you for escalating this discussion Reid, I’m excited to see innovation in the LLVM codegen space. In addition to forcing a dependency on Java, I’m curious how you think about a few other “bigger picture” impacts on the LLVM project as a whole. I have been out of working on LLVM codegen for a very long time, so I’d love to know what folks like @topperc and @rotateright and other gurus think:

  1. Beyond being a major change, it is also an architectural change. Do we see this as:

    a) the new “general solution for all targets” that we want to move targets to over time? If so, what do the owners of those targets think?
    b) Do we see this as an experimental project? If so, why should this be in tree?
    c) Do we see this as an alternate approach for specific important targets? If so, which ones, and how do we manage the fragmentation in the community and codebase, and confusion about what to do for newcomers?

  2. What is the plan with documentation? We have extensive documentation for the existing system, how do we handle “two things”?

  3. This work has been developed out-of-tree for a long time by a team of experts. Is it being used in production by any of those out-of-tree clients? What are the learnings from that so far?

  4. You want to move this in-tree to make it more available; what in-tree targets are planning to rely on this for production?

-Chris

Hey Chris, all good questions, and as you know I could write a dissertation about each of them. :slight_smile: They’re all important, so perhaps it would make sense to address a few at a time, and perhaps spawn separate threads on each. Like you, I’d really like to hear what the long-time back-end contributors have to say about the work, so I hope they’ll engage!

… and Antlr. Of course I was aware of this from the very beginning of the effort, and know it’s a hot button for some people. Certainly we want to be careful about introducing external dependencies, and thoughtfully weigh the costs and benefits of each new dependency. So, depending on Turbo Pascal would probably be a really bad idea. I’m certainly not the final arbiter on this, but here are my thoughts regarding that:

  • I’ve written a lot of parsers over the past 40+ years, and they’ve ceased being fun for me. I just want a parser that is extremely easy to write and modify as the language grows and evolves.
  • Antlr is fast, free, available everywhere (per the author), extremely well documented (published books), and very powerful. It’s nicely integrated with all the popular IDEs, and has exceptional tools for debugging your grammar. It’s widely used in industry and academia. There are lots of projects using Antlr in the context of LLVM (and gcc).
  • The MDL grammar is 526 lines of text (commented), and Antlr generates an LL(*) parser in around 10000 lines of C++, with solid error reporting and recovery.

Other than it being “new” to LLVM, I guess I’d like to understand why the community wouldn’t want to leverage a compiler-like tool like this. It’s awesome; even if you like writing recursive descent parsers by hand, a generator is better if only because it’s better at keeping track of details every time the language changes!

People have asked about why I didn’t just modify TableGen, and it’s a perfectly reasonable question. I (and others) have considered that for quite some time. TableGen currently supports two microarchitecture descriptions: Itineraries and Schedules. They both have strengths and weaknesses, but neither is, alone, sufficient to describe all current targets - much less more complex targets. And FWIW, I couldn’t figure out a reasonable way to extend either of them to effectively support a broader set of accelerator architectures in anything resembling an elegant manner. Not for lack of trying.
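
For anyone who hasn’t stared at both, here’s a rough, hand-waved sketch of the other half - the per-operand schedule model - with made-up names that aren’t from any in-tree target (and again assuming the standard Target.td definitions are included). It’s good at expressing per-operand latencies and processor-resource consumption, but it has no first-class notion of issue slots or bundles - which is roughly the territory the itineraries cover, and vice versa:

```tablegen
// Hypothetical per-operand SchedMachineModel fragment; names are illustrative.
def MyALU : ProcResource<2>;      // two identical ALU pipes
def MyLSU : ProcResource<1>;      // one load/store unit

def MyWriteALU  : SchedWrite;
def MyWriteLoad : SchedWrite;

def MySchedModel : SchedMachineModel {
  let IssueWidth        = 2;
  let MicroOpBufferSize = 0;      // in-order
  let CompleteModel     = 0;
}

let SchedModel = MySchedModel in {
  def : WriteRes<MyWriteALU,  [MyALU]> { let Latency = 1; }
  def : WriteRes<MyWriteLoad, [MyLSU]> { let Latency = 4; }
}
```

Neither sketch is hard to write on its own; the trouble starts when a real machine needs both kinds of information at once, plus things neither model expresses.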

So we’re stuck with both approaches, which still can’t support all the architectures I’d like to support. As you know, I developed an MDL language around 30 years ago, which is still in production use for processors and accelerators at TI. This new MDL inherits a lot from that language, but extends it significantly - it’s both easier to use and much more powerful, can support a very broad class of architectures, and quite easily supports all in-tree targets.

Great question. LLVM does have massive amounts of documentation, much of it very high quality, including details about TableGen. However: I’ve pored over every word of the documentation about Schedules and Itineraries (we already have two things!), and it’s actually quite sparse in this area. Some of the best bits are actually in the code, or in the various target descriptions. So honestly I think the MDL is in pretty good shape on this point - the language is much more intuitive than the equivalent TableGen descriptions, and we have lots of user-level documentation already written (and complete examples for every in-tree target!).

So bottom line: if its indeed a goal for LLVM to support a broader class of architectures including the zoo of insane ML accelerators that are proliferating, we need to do something different.

Clearly, including a new language and tools is an important decision. I talked to a lot of people over the past 4 years at LLVM conferences, etc, and in general I’ve felt like an alternative to TableGen for this problem would be a welcome thing. But I’d love to hear from more people!

Chris, I’ll start threads on the other topics over the next few days.

1 Like

I don’t have any particular stake one way or the other wrt MDL, but this caught my eye:

Coming to Clang after decades of working with parser-generator based front ends, it took me a while to understand why Clang didn’t leverage one. I came to realize that basically, a recursive descent parser is “just code” and thus more accessible to developers without a background in parsing technology (e.g., a compiler course). For an open-source project, it made sense.

Re Antlr in particular, it’s one thing for the author to claim it’s available “everywhere” and it’s another to define whether “everywhere” includes all the hosts where people want to build LLVM. I don’t know that we even have a well-defined list, but I’m moderately sure it includes more than Linux/Mac/Windows. You’d need to document the requirements and dependencies, and then demonstrate the benefits to the point where people will agree that yes, the additional X dependencies are worth it.

2 Likes

I really think that the solution is easy: turn ANTLR into a developer-side dependency. It would then be needed to change/extend the MDL syntax, but not to build LLVM itself.

1 Like

I am not qualified to have more than a high-level opinion on the merits of this kind of change. But as someone who has responsibility for production backends that we’d like to upstream someday, and which stumble in this area and struggle to get the needed performance within the bounds of LLVM’s code generation tooling, I am +1 on making sure that we have a progressive enough stance so that LLVM remains relevant. I would hope that the people who are the subject matter experts in this area are evaluating the merits of the approach with this lens.

With that said, this kind of thing is probably destined to be a long term alternative for motivated implementations vs a big switch. In any software project of this size, getting to a peaceful evolution state takes effort and change from both sides: the established project needs to make space for new ideas and the contribution should attempt to expose as few needless deviations from the norm as it can so it can focus all of its energy on its potentially valuable new ideas.

Because I care about this area not being locked in amber and getting some oxygen to evolve, let me be a bit more direct and say some things that others are dancing around: it will likely be a cold day in hell before the core LLVM project takes a build dep on Java for something this potentially central, and the bias towards parsers-are-just-code-we-write runs deep. Even if you manage to “win” the debate on this thread in principle, you’ll fight the headwinds three or four more times and eventually all of your energy will be spent on that vs doing what you came to do. It won’t be any one of us that throws up the hard roadblocks straightaway, but it will be the system itself resisting that kind of change.

I want to see this area move forward. I don’t want to see it stuck in the ditch of debating inconsequential things. Any one of us could write MDL in a form the project would easily accept, and I’d recommend just conforming to the norm on that vs getting stuck on a side debate with entropy on all fronts.

4 Likes

Our team (Texas Instruments) maintains a downstream ARM compiler, and additionally has a high level of experience with high-performance embedded VLIW architectures (C6000/C7000 DSPs).

My opinion towards the external dependency is one of indifference. Our team is capable and willing to install most development dependencies as needed, so I can’t speak to the ANTLR discussion.

We’ve been in contact with Reid since the first RFC. I feel like he can’t overstate how much of a game changer this support would be for accelerators/VLIW architectures, both up and downstream. I think anyone who’s struggled with attempting to model slot-specific pipeline behaviors in Itineraries and settled for “I suppose I’ll just duplicate them all again” will see the benefit.

Just speaking from my “caring about LLVM over the long term” I’m pretty opposed to this line of argument, for the following reasons:

  1. The question isn’t what is fun for a contributor, it is what is the right long term thing for the project. Once contributed, many more people across the project end up having to maintain a thing than the people who originally contributed it.
  2. I agree that Antlr is mature, but bringing it in is a “one way door” dependency on java and the ecosystem, not just antlr. We don’t currently have a java dependency, and many people have objected to that in the past.
  3. This is nonsense. The C++ generated by Antlr isn’t what a human would write; it isn’t the correct comparison.

To make a google analogy, Google doesn’t allow random SWEs to decide to build production infra in weirdtech even if it is delightful (e.g. Google has had prior run-ins with ocaml and haskell) just because they think it is fun or cool. Such teams have to use approved technology for exactly the same “team impact and maintenance” reasons that LLVM faces.

That said, I don’t think that Antlr is the big question. The big question is: who is going to use this and for what? I’m specifically curious about what production users have seen as benefits of this work - are production workloads at google being sped up by this? If so, that would be great evidence to motivate target maintainers to invest in this.

If it is “just code” that gets added to the repo and has no adopters, then I think we should wait to merge it in until there is a reason to.

-Chris

1 Like

I agree that’s the important bit. There are three questions here: who, “for what”, and benefits. I think it’s generally understood that it won’t be used in production anywhere until it’s landed.

Regarding “for what”: it’s also understood that itineraries and schedules are not sufficient to fully describe many accelerators, leading to excessive C++ hacking. MDL provides a solution to that in the existing code base with quite minimal backend code impact. Part of the problem with the big question is that someone who hasn’t tried to get LLVM to support a “challenging” architecture can’t really appreciate how truly awful the problem is, and so may fail to see the value of solving it. FWIW. Maybe some other people can speak to that.

Regarding benefits, an example: in the case of Google TPUs, a ~2000-line description (that encompassed all 8+ members of the family) eliminated around 14K lines of TPU-specific C++ code and ~3000 lines of TableGen - for only 2 of the targets - and produced better schedules.

Regarding who: I have talked to people at Texas Instruments, AMD, and Qualcomm (and Google) about their interest in the project (see DragonDisciple’s previous post on this thread, I’ll let others speak up on their own). FWIW, TI has used a version of this language for over 30 years in all of their (non-LLVM) production compilers, so I know the language works well, and that they will likely adopt this - once landed - for their compilers that currently can’t use LLVM. And of course I require it in the compilers I’m writing - targets that we won’t upstream, but need to use upstream LLVM - to one of your points.

I can encourage other potential adopters to explicitly ask for it - but I’d like to get to closure on some of the issues raised so far. If we’re really hung up on some of these issues, and we want to see innovation in this area, then perhaps we can use this thread to discuss different specific approaches.

I’ve seen enough underperforming VLIW backends in my time – including one now that I would like to be production quality but can’t get good enough within the confines of LLVM in comparison with its non-LLVM incumbent. I’m a big fan of the community moving something forward here. I don’t know if MDL is it, but I know what we have isn’t (in the way a non-expert can have seen enough examples with real metrics to have deduced a pattern).

I’d like evolution for these kinds of targets, and I would like upstream to embrace a progressive stance. Minimally, this would include factoring things so that something like this could exist out of tree without weird/divergent patchiness. That would give existing implementers the confidence to put some weight on it and find out, with some confidence that success will likely yield a path forward.

I’d actually prefer an incremental try-before-you-buy approach for all of the reasons Chris points out. But getting some public out-of-tree proofs for a production compiler requires the downstream to have some confidence that what they are trying as an early adopter is aligned enough that, if things work out, there is a path to in-tree.

I’d encourage the contributors to treat this as potentially production, contributable code that is aligned with an upstream trajectory. And I’d love to see any practical tweaks to LLVM that may be needed to let it exist in such a state be accepted.

And I do care about the eng standards conformance. We need things we can ship and would like to not be at odds on the deps front from day one.

Finally, to nitpick, LOC comparisons of generated parser code are a specious argument… Human-authored parsers never have the heft of their machine-generated analogs. I feel like we’re in a perfect storm of badness in this discussion:

  • There is demand but almost by definition, it is hard to judge in this forum because LLVM underperforms for these cases and people avoid it.
  • There are probably multiple ways to move this forward but we’re arguing about parsers and deps. Contributions should be done to the standards of the project. Why are we arguing? It isn’t much code.
  • Various of us have a lot of personal experience with some of the people involved (myself included – once upon a time, Reid worked for me and he did for Chris as well) and I don’t feel like we’re having this discussion in a way that we would if this was coming in cold off the street, so to speak.

Sorry for almost ranting. I agree almost completely with Chris’s technical judgment on this. And I feel like we need some progress in this area. I want to like the proposal on those grounds, but I think it needs to be approached in the right way, and currently it is not.