TableGen based backend development for custom architecture

Apart from the backend development tutorial for the CPU0 architecture, is there a proper tutorial on how to develop/codify an architecture in the TableGen language?

The official documentation for backend development is not really useful for someone who is developing from scratch; it is vague at best.

Is there a tutorial that is better than:

copy existing backend and modify as per need

There couldn’t be a worse suggestion, IMO.

You can start with an empty MyTargetMachine.cpp and a simple CMakeLists.txt. Add your target name to the LLVM_EXPERIMENTAL_TARGETS_TO_BUILD CMake option.

The build system will tell you what you need to implement by showing you linker errors. It will ask you to define a couple of “entry points”. In these entry points you will need to “register” your target. For that, you will need to implement a few classes. You will probably also need to add your architecture to llvm::Triple (don’t forget to add tests).

You can start with empty classes as well. Pure virtual functions can be stubbed with llvm_unreachable.
Once you successfully link llc, you can try it on an empty *.ll file. It will either crash or give you an error message. Either result hints at what needs to be implemented next. Tracking down most of these hints will require gdb/lldb sessions.

It will be a long learning curve, but in the end you will understand it better.

If that scares you, an alternative way is to find git commits that added a new backend. RISC-V would be a good example.

What I was hoping for was a tutorial/book in the following format:

  1. Design an oversimplified compute architecture (say, with only load, store, add, and shift operators)
  2. Have a simple simulator for the same
  3. Now develop the TableGen codification for the architecture from scratch
  4. Lower LLVM IR to this architecture as a demonstration

Such a tutorial by an expert would immensely flatten the learning curve for us newbies!

The problem with trying to learn backend development from RISCV, M68K, etc. is that it involves two learning curves: first learn the architecture (which is not exactly simple), then learn how to codify it in TableGen (for which the documentation isn’t exactly good).

Ah, sorry, I missed that you’re only interested in TableGen definitions. I’ve never seen a detailed tutorial on that topic.

Here is a minimalistic example:

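// Pull in the base classes (Register, RegisterClass, Instruction, Target, ...).
include "llvm/Target/Target.td"

// Define the only register. Its assembly name is r0, and in TableGen files it is referred to as R0.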
def R0 : Register<"r0"> {
  let Namespace = "MyTarget";
}

// Define the only register class. It contains the only register.
// The first argument is the namespace of the generated C++ class.
def R32 : RegisterClass<"MyTarget", [i32], 32, (add R0)>;

// Define the only instruction. It has one input and one output operand.
def LD : Instruction {
  let Namespace = "MyTarget";
  let OutOperandList = (outs R32:$Rd); // Output operands.
  let InOperandList = (ins R32:$Rs); // Input operands.
  let AsmString = "$Rd = [$Rs]"; // The assembly mnemonic for the instruction.
  let mayLoad = true; // This instruction may load from memory.
  let hasSideEffects = false; // ... and has no other side effects.
}

// Define a record that encapsulates information about instruction set.
def MyTargetInstrInfo : InstrInfo;

// Define the target.
def MyTarget : Target {
  let InstructionSet = MyTargetInstrInfo; // Let it know about your ISA.
}

Check llvm/include/llvm/Target/Target.td for documentation on each of the classes.
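As a rough sketch of how the example above might grow towards the load/store/add/shift toy ISA discussed earlier in the thread, the fragment below adds a few more instructions with selection patterns. The `MyTargetInst` helper class, the instruction names, and the assembly syntax are all made up for illustration. It reuses `R32` from the example and assumes `llvm/Target/Target.td` has been included, which also brings in the generic DAG nodes `add`, `shl`, and `store`.

```tablegen
// Hypothetical helper class (not part of LLVM) that factors out the
// boilerplate repeated in every instruction definition.
class MyTargetInst<dag outs, dag ins, string asm, list<dag> pattern>
    : Instruction {
  let Namespace = "MyTarget";
  let OutOperandList = outs;
  let InOperandList = ins;
  let AsmString = asm;
  // Tells instruction selection which DAG shape this instruction matches.
  let Pattern = pattern;
}

// Rd = Rs1 + Rs2, selected for a 32-bit integer add.
def ADD : MyTargetInst<(outs R32:$Rd), (ins R32:$Rs1, R32:$Rs2),
                       "$Rd = $Rs1 + $Rs2",
                       [(set R32:$Rd, (add R32:$Rs1, R32:$Rs2))]>;

// Rd = Rs1 << Rs2, selected for a 32-bit shift left.
def SHL : MyTargetInst<(outs R32:$Rd), (ins R32:$Rs1, R32:$Rs2),
                       "$Rd = $Rs1 << $Rs2",
                       [(set R32:$Rd, (shl R32:$Rs1, R32:$Rs2))]>;

// [Ra] = Rs, selected for a plain store of a register to the address in Ra.
def ST : MyTargetInst<(outs), (ins R32:$Rs, R32:$Ra),
                      "[$Ra] = $Rs",
                      [(store R32:$Rs, R32:$Ra)]>;
```

With a single register in `R32` this is of course not a usable ISA; the point is only the shape of the definitions. Properties such as mayStore can usually be inferred from the pattern, so they are not spelled out here.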

I’m afraid the topic is too large for a Discourse thread.


I’d recommend the following, especially the patch series by Alex.
As usual, LLVM is an evolving project, so it is worth checking the existing backends in the upstream repository for the current best practices.


I have, as a beginner, two concerns:

  1. Trying to learn backend development using an architecture like RISC-V is a 2-learning-curve process!
  2. Tutorials from 2014 or before may use TableGen syntax that has since changed significantly, which could make them difficult to follow.

That is why I was hoping for a recent tutorial with an oversimplified architecture.

This would be a new-target example, kind of like the Hello World pass is an example for IR passes. @mshockwave as one of the more recent new-target developers, and a book author, what do you think?

I feel like the problem is picking the right architecture. Despite the fact that LLVM has come a long way since its inception as a project with a limited number of supported targets, there are still plenty of implicit or inherent assumptions about the kind of architecture LLVM was originally designed to target: register-oriented, RISC-like architectures. In other words, it’ll be easier to create a backend for those architectures. In that sense, X86 and M68k are probably not good places to start.

Therefore, I don’t recommend that beginners pick an ISA solely based on its simplicity. For instance, stack-based ISAs (e.g. Zylin CPU) are simple and easy to understand, and maybe it’s easy to write a code generator for them from scratch, but it’s trickier to write (or learn how to write) an LLVM backend for them than for, say, OpenRISC.

And even if you get your hands on designing a completely new architecture with the aforementioned properties in mind, you’ll probably find it looks like one of the existing RISC architectures at the end, except you need to write the emulator and other tools yourself.
Granted, there are usually hundreds of instructions in existing architectures, but you don’t have to implement all of them: start simple! Make assumptions about the input program so that the project stays manageable while leveraging the existing ecosystem.

Since we’re on the topic of learning backend development, I feel like one of the biggest challenges is that too many backend components are tangled together. Compared to the middle-end, which is (mostly) organized as a pipeline consisting of passes, most of the important building blocks in the backend, like Subtarget, FrameLowering, and RegisterInfo to name a few, cross-reference each other here and there, even though there is still a pass pipeline underneath. And IMAO, many of these things are connected using hooks and callbacks, which are more difficult to find.

What’s even more difficult for beginners is creating a backend in stages. It takes some steps before you can even lower the first IR instruction (which is usually return), and those steps are not documented anywhere, let alone how to add more advanced features incrementally.
I’m not an expert in GlobalISel, but maybe its modular design can alleviate some of these issues. At the very least, GlobalISel makes codegen testing much easier, which gives developers instant feedback.
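To make the "first IR instruction" point a bit more concrete, here is a minimal sketch of what the TableGen side of lowering return could look like with SelectionDAG. This is only an assumption-laden illustration: `MyTarget`, `MyTargetRet`, and `MyTargetISD::RET` are hypothetical names, and the C++ side (declaring that opcode and emitting the node from the target's LowerReturn hook) is not shown. It assumes `llvm/Target/Target.td` has been included.

```tablegen
// Target-specific DAG node for "return". MyTargetISD::RET is a hypothetical
// opcode that would have to be declared and emitted on the C++ side.
def MyTargetRet : SDNode<"MyTargetISD::RET", SDTNone,
                         [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;

// The return instruction itself: no explicit operands.
def RET : Instruction {
  let Namespace = "MyTarget";
  let OutOperandList = (outs);
  let InOperandList = (ins);
  let AsmString = "ret";
  let isReturn = true;      // Leaves the current function.
  let isTerminator = true;  // Ends a basic block.
  let isBarrier = true;     // Control flow never falls through.
  // Select this instruction for the target-specific return node.
  let Pattern = [(MyTargetRet)];
}
```

Several upstream backends define their return instruction along these lines, so comparing against one of them is probably the quickest way to check the details.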

@mshockwave, I am eagerly awaiting your book Engineering LLVM Backend. I hope it will be the tutorial I need. Would it be possible for you to release a draft version?