[Instruction Selection] GlobalISel vs SelectionDAG in 2025

Given the current state of affairs, I am wondering which instruction selection framework would be best for an AI accelerator.

All the latest discussions on Discourse seem biased towards GlobalISel, but I would appreciate an unbiased, end-to-end analysis of the pros and cons of both so that I can weigh my requirements and make a well-informed decision. It would also be helpful for future developers to refer to this discussion. Thanks.

1 Like

I am not sure you will find an end-to-end analysis as such, but the original proposal here:

and numerous follow-up discussions/presentations at subsequent LLVM Developer Meetings cover many of the motivations and pros/cons.

What is “best” for your AI accelerator is difficult to ascertain without some knowledge of the machine/ISA (assuming it has an instruction set per se). But generally speaking, either approach is workable for even complicated and unusual ISAs.

FWIW, my personal viewpoint– in a nutshell– is to go with GlobalISel. I’ve written a dozen or so LLVM back-ends over the last 22 years– and until 2019, all of those were based on SelectionDAG.

However, in 2019 I switched to GISel for a very complicated back-end that is an “AI accelerator”, among other things (100% GISel, no SD fallbacks). The primary reason for the original choice was the need to meet stringent compile-time constraints– i.e., JIT compilation of graphics “shaders” and now specialized AI kernels. As this new LLVM back-end replaces a legacy/proprietary one that was already tuned for extremely fast compilation, the speed and adaptability of GISel (i.e., adding/removing modular sub-phases) were absolutely necessary to be competitive. In retrospect, this was the right decision to make.

Having now gained experience with GISel, I personally would not want to go back to SDISel for all of the other reasons brought up in the original proposal and as summarized in the current documentation (Global Instruction Selection — LLVM 22.0.0git documentation). On the whole, I find it cleaner and simpler (in the relative sense) than SD and it is convenient to work on a single machine IR through all phases of the back-end.

Either way, much of the TableGen description used to describe the ISA and instruction selection is shared by both frameworks– in the sense that GISel is able to import many existing SD definitions and patterns. So if one already has familiarity with one or more SD back-ends, there is “only” an incremental learning curve to move to GI.

There is still much work being put into SDISel, and most upstream back-ends are 100% SD-based, so I anticipate both frameworks will be in use for years to come.

That’s one random developer’s take, for what it’s worth.

5 Likes

Thanks for the note, Jason. Setting compilation time aside, in terms of codegen quality and the expressiveness of GISel’s Combiner passes (vs DAGCombine in SDAG), would you still rank GISel higher than SDAG?

Thanks for the note, Jason. Setting compilation time aside, in terms of codegen quality and the expressiveness of GISel’s Combiner passes (vs DAGCombine in SDAG), would you still rank GISel higher than SDAG?

It is hard to make an apples-to-apples comparison in this particular back-end since it is GI-only. That said, the quality of code is very good [*].

GISel’s pre- and post-legalizer Combiner passes are easy to write and increasingly expressive. It is true that even five years ago there were very few “stock” combiners. However, the collection has been growing (often guided by useful ones mined from SD), and there are now many useful out-of-the-box combiners available. There are target-specific gaps we need to fill, obviously, but it is very easy to cook up new combiners and add them to the pipeline. The placement of combiners is a bit more flexible as well– and each is optional (modularity, true pass pipeline, etc.). I personally find writing transformations on MI simpler than on SD (and we only have to write them on one IR).
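As a rough illustration of what such a rule looks like: the names below are hypothetical, the matching/applying helper functions would live in the target’s combiner implementation, and the exact TableGen combiner syntax has evolved across LLVM releases– treat this as a sketch, not a copy-paste recipe.

```tablegen
// Hypothetical combine: fold a redundant G_FNEG. The C++ helpers
// matchRedundantFNeg/applyRedundantFNeg are assumed target code.
def redundant_fneg : GICombineRule<
  (defs root:$root),
  (match (wip_match_opcode G_FNEG):$root,
         [{ return matchRedundantFNeg(*${root}, MRI); }]),
  (apply [{ applyRedundantFNeg(*${root}); }])>;

// Rules are grouped into a combiner that runs as an ordinary pass, so
// the target chooses where it sits (pre-legalize, post-legalize, both).
def MyPreLegalizerCombiner : GICombiner<
  "MyPreLegalizerCombinerImpl", [redundant_fneg, all_combines]>;
```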

The following point is not a comparison with SD, but we did find that RegBankSelect was less useful than it could be. We use it as a preliminary bank selector, but later we do some rewriting based on uniformity, scalar vs vector instructions, etc. Our problems are quite similar to what AMDGPU faces.

[*] For example, we have an existing legacy/proprietary back-end that has been tuned to the max for many years– our GISel generated code is competitive with that already high bar.

1 Like

Since the above argument is mostly in favor of GlobalISel, I would like to understand the pain points that people have faced in the adoption of GISel, and why it has not become the default for CPU/GPU targets in the upstream tree.

From the points that Jason mentioned, I think it would be a good idea to understand the kinds of problems that people working on AMDGPU or any other backend have faced or are currently facing. I want to know if the scope of the issues with GlobalISel is large enough that it mandates significant investment of time and effort in building out a backend (in terms of changes to the generic LLVM infrastructure).

@arsenm @jayfoad @Shrep16 I would really appreciate if you could share any insights on the same.

Since the above argument is mostly in favor of GlobalISel, I would like to understand the pain points that people have faced in the adoption of GISel, and why it has not become the default for CPU/GPU targets in the upstream tree.

Note that most upstream targets predate GlobalISel– so that is an obvious reason why they are SD-based. One can also assume that for some targets the maxim “if it ain’t broke, don’t fix it” applies. That said, a few upstream targets are being retrofitted incrementally (another nice feature of GlobalISel), with AArch64 and AMDGPU being farthest along. I believe one of the newer backends, SPIRV, is GlobalISel from the start.

1 Like

I’m a GISel maintainer, and while I can’t speak to the experience of writing a new backend targeting it, I can offer a few counterpoints to the arguments in favor.

  • GlobalISel is still not a full-fledged replacement for SelectionDAG in some cases. In terms of the raw body of combines, it still hasn’t had the same amount of investment as DAGCombine.
  • Floating-point types don’t exist; we still determine FP semantics from the instructions generated, not from the type of the values. This means that where you have two FP types with the same bit-width, you can’t disambiguate them. As a result we don’t yet support BF16 operations.
  • The generic legalizer is fairly mature, but you may still run into cases where something is not yet implemented. Nothing here is fundamentally difficult; such gaps are by nature rare, since the main GISel targets (AArch64 & AMDGPU) don’t hit them often.
  • For AArch64, our approach of placing constants and globals into functions using the “localizer” isn’t the right long term solution, and on AArch64 we see code size regressions of ~0.5% geomean due to this issue. For an AI accelerator this probably isn’t a problem, or you may even see benefits vs SDAG with this approach.
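On the legalizer point above: for reference, a target describes legality declaratively in its LegalizerInfo constructor. A minimal sketch against LLVM’s actual API (the XXX target name is hypothetical, and this fragment won’t compile outside an LLVM tree):

```cpp
#include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"

XXXLegalizerInfo::XXXLegalizerInfo(const XXXSubtarget &ST) {
  using namespace TargetOpcode;
  const LLT S32 = LLT::scalar(32);

  // Integer arithmetic is legal at 32 bits; narrower scalars get
  // widened and wider ones narrowed to fit.
  getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})
      .legalFor({S32})
      .clampScalar(0, S32, S32);

  // Hypothetical machine with no hardware divide: emit a libcall.
  getActionDefinitionsBuilder({G_SDIV, G_UDIV}).libcallFor({S32});

  getLegacyLegalizerInfo().computeTables();
}
```

Anything you mark neither legal nor coverable by a widening/narrowing/libcall rule is where you may hit the not-yet-implemented cases mentioned above.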
3 Likes

I should also add that there are multiple known targets outside the LLVM tree that exclusively use GlobalISel. Especially in the realm of GPUs/accelerators it has seen success, with the major caveat of the bf16 issue.

1 Like

Regarding this point specifically:

Other than some minor early growing pains (~2019 timeframe), using GlobalISel for a new target has been roughly the same effort as I’ve experienced for new SDISel targets– and this for a very complicated ISA. Very little generic LLVM infrastructure was touched in either case. I doubt this will be your largest investment.

In fact, depending on your architecture, it is likely that most of your effort will be spent on everything other than instruction selection (of either sort). For example, my/our current target is a recent NVIDIA GPU. It requires at least two significant functionalities that are absent from the target-independent code. The first is occupancy management– dealing with the difficult tension between register/resource usage versus thread occupancy (number of simultaneous threads/warps/waves). Another is synchronization management– or determining program points at which to insert primitives dealing with points of divergence and reconvergence. These are quite a bit gnarlier than instruction selection and there are no existing, parameterizable, target-independent components as such (though there are related building blocks, such as UniformityAnalysis).

1 Like

Working on a very CISC GlobalISel target for fun.

It would be “helpful” to have a GlobalISel-only target in-tree with the most minimal other content, e.g., nothing in TargetLowering that isn’t needed. For example, I probably have to register the register classes and implement some support routines, but expect not to need lowerFormalArguments, lowerReturn, etc. Knowing which I have to implement, and which I don’t, would be helpful.

Given that I don’t really have a functioning SelectionDAG implementation, I’d love to dispense entirely with the DAG, converted DAG patterns, etc. All of this may be possible but isn’t clearly documented.

regards,
Adam

Working on a very CISC GlobalISel target for fun.

It would be “helpful” to have a GlobalISel-only target in-tree with the most minimal other content, e.g., nothing in TargetLowering that isn’t needed.

The SPIRV target is about as close as you’ll get to that today as far as upstream goes, though it is a bit different than typical backends that target an actual machine ISA.

For example, I probably have to register the register classes and implement some support routines, but expect not to need lowerFormalArguments, lowerReturn, etc. Knowing which I have to implement, and which I don’t, would be helpful.

Given that I don’t really have a functioning SelectionDAG implementation, I’d love to dispense entirely with the DAG, converted DAG patterns, etc. All of this may be possible but isn’t clearly documented.

Yes, it is possible, and this is what we do in the GISel backend I discussed earlier in this thread [**]. Other than some very minimal TargetLoweringBase support, you can completely remove all of the other DAG infrastructure [*]. Something along these lines:

XXXTargetLowering::XXXTargetLowering(const TargetMachine &TM,
                                    const XXXSubtarget &STI)
    : TargetLowering(TM), Subtarget(STI) {

  // Set up some register classes.
  addRegisterClass(MVT::i32, &XXX::GPRRegClass);
  ...

  // Compute derived properties from the register classes.
  computeRegisterProperties(Subtarget.getRegisterInfo());

  // Note: GlobalISel back-ends will not have any op actions here!

  setJumpIsExpensive(...); // E.g., any configuration settings needed.
}

Depending on your input IR and any finalization, you might need, e.g., getTgtMemIntrinsic and/or finalizeLowering.

Beyond that, you “just” implement the primary GlobalISel APIs. You will at least need to stub out CallLowering, even if you don’t intend to support calls initially.
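For reference, a minimal CallLowering stub might look like the following (XXX names are hypothetical, the override signatures match recent LLVM but do change between releases, and this won’t compile outside an LLVM tree):

```cpp
struct XXXCallLowering : public CallLowering {
  XXXCallLowering(const XXXTargetLowering &TLI) : CallLowering(&TLI) {}

  // Enough to translate "void f()"-style functions; everything else
  // reports failure, which aborts in a pure-GISel target (no SD fallback).
  bool lowerReturn(MachineIRBuilder &MIRBuilder, const Value *Val,
                   ArrayRef<Register> VRegs, FunctionLoweringInfo &FLI,
                   Register SwiftErrorVReg) const override {
    if (Val)
      return false; // Returning values not supported yet.
    MIRBuilder.buildInstr(XXX::RET);
    return true;
  }

  bool lowerFormalArguments(MachineIRBuilder &MIRBuilder, const Function &F,
                            ArrayRef<ArrayRef<Register>> VRegs,
                            FunctionLoweringInfo &FLI) const override {
    return F.arg_empty(); // Arguments not supported yet.
  }
};
```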

[*] Although you can write your XXXInstructionSelector entirely in C++, it is still very convenient to use “selection DAG” patterns in TableGen. GlobalISel has a compatibility layer/importer so that one can either reuse existing TD files (with some minor editing) or write new ones for GI using the familiar syntax. Instead of operating on SDNodes, it operates on MI under the hood.
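To make that compatibility layer concrete, the same TableGen file can drive both selectors; a ComplexPattern is bridged to GISel via GIComplexPatternEquiv. A sketch with hypothetical instruction and function names:

```tablegen
// Plain SD-style pattern: picked up automatically by the GISel pattern
// importer (G_ADD is matched in place of ISD::ADD).
def : Pat<(add GPR:$a, GPR:$b), (ADDrr GPR:$a, GPR:$b)>;

// Complex operand: the SD ComplexPattern calls SelectAddrMode() in the
// DAG ISel; GIComplexOperandMatcher names the C++ member function
// (selectAddrMode) in the target's InstructionSelector, and
// GIComplexPatternEquiv ties the two together so imported patterns
// using "addrmode" also work under GlobalISel.
def addrmode : ComplexPattern<iPTR, 2, "SelectAddrMode">;
def gi_addrmode : GIComplexOperandMatcher<s64, "selectAddrMode">,
                  GIComplexPatternEquiv<addrmode>;
```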

[**] FWIW, I do hope to start upstreaming that target early next year. There is still some planning, prep work, and corporate hoops to jump through.

More on GlobalISel without “extras”. These questions may be naive :slight_smile:

Is it possible to use GIComplexOperandMatcher without the GIComplexPatternEquiv? Further, can I associate an MIOperandInfo with my GISel complex operand matcher, or must I go through the GIComplexPatternEquiv route?

Is there a decision tree for when, in this GISel-only scenario, to use instruction patterns (dag), GIComplexOperandMatcher (C++), GICombineRule (MIR and C++), or outright lowering to a target instruction in instruction selection?

My CISC target has 15+ addressing modes for each operand (including the destination), which makes finding the right way to pattern-match important :slight_smile:

thanks,
Adam