Bypassing tablegen for asm/disassm

I have a naive question: I’m adding a new LLVM backend for an ISA that is rather experimental and still changing frequently.

With TableGen, setting up and debugging the assembly and disassembly is somewhat laborious. I understand why TableGen is used, but I think I can do this much more easily without it, because I already generate the actual assembly syntax and the translation between assembly and binary with a short Python script. That works because instructions come in classes like so:

  CMD GPR GPR UInt(29)

Here GPR stands for General Purpose Register, UInt(n) for an n-bit unsigned integer, etc. It is trivial to emit parsers/pretty-printers for CMD, GPR, UInt(n), etc. in any target programming language, just from this abstract specification of instruction classes.
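To make the idea concrete, here is a rough sketch of what generated C++ for that one class might look like. The bit layout (a 64-bit word with an 8-bit command, two 5-bit register fields and the 29-bit immediate in the low bits) and all names are made up for illustration; the real ISA will differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout for "CMD GPR GPR UInt(29)" in a 64-bit word:
//   [46:39] command, [38:34] first GPR, [33:29] second GPR, [28:0] immediate.
constexpr unsigned ImmShift = 0, ImmBits = 29;
constexpr unsigned RbShift = 29, RaShift = 34, RegBits = 5;
constexpr unsigned CmdShift = 39, CmdBits = 8;

struct CmdGprGprU29 {
  uint64_t Cmd, Ra, Rb, Imm;
};

// True if V fits in an unsigned field of the given width.
bool fits(uint64_t V, unsigned Bits) {
  assert(Bits > 0 && Bits < 64 && "field width out of range");
  return (V >> Bits) == 0;
}

// Extracts the Bits-wide field of W that starts at bit Shift.
uint64_t field(uint64_t W, unsigned Shift, unsigned Bits) {
  assert(Bits > 0 && Bits < 64 && Shift + Bits <= 64 && "bad field spec");
  return (W >> Shift) & ((uint64_t{1} << Bits) - 1);
}

// Packs the operands into Word; returns false if any operand is too wide.
bool encode(const CmdGprGprU29 &I, uint64_t &Word) {
  if (!fits(I.Cmd, CmdBits) || !fits(I.Ra, RegBits) ||
      !fits(I.Rb, RegBits) || !fits(I.Imm, ImmBits))
    return false;
  Word = (I.Cmd << CmdShift) | (I.Ra << RaShift) |
         (I.Rb << RbShift) | (I.Imm << ImmShift);
  return true;
}

// Unpacks Word; returns false if bits above the command field are set.
bool decode(uint64_t Word, CmdGprGprU29 &I) {
  if ((Word >> (CmdShift + CmdBits)) != 0)
    return false;
  I.Cmd = field(Word, CmdShift, CmdBits);
  I.Ra = field(Word, RaShift, RegBits);
  I.Rb = field(Word, RbShift, RegBits);
  I.Imm = field(Word, ImmShift, ImmBits);
  return true;
}
```

Since the shifts and widths are plain constants, a generator only has to rewrite the constant table when the encoding changes.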

This is what I would like to do: bypass tablegen for handling assembly and disassembly, and instead auto-generate C/C++ functions for this purpose and link them into my LLVM backend. That would make it much easier and automatic to adjust my LLVM backend, if and when the assembly syntax changes.

Is this a good idea? Am I overlooking some difficulty? Are there other backends that do this that I can look at?

Completely replicating all the functionality of TableGen is way too much work; I wouldn’t suggest trying that. Using an alternate implementation specifically for assembly/disassembly isn’t too hard, though.

Every in-tree target uses TableGen to generate its tables. Maybe take a look at what the x86 backend does, though; it uses a separate codepath from every other target.

Alternatively, you might want to consider using a script to generate TableGen .td files.

I am not suggesting replicating all of TableGen. I want to bypass TableGen for two specific functionalities.

I already have a script that generates a small assembler/disassembler for another purpose, and it would be easy to retarget it to generate C/C++ that I can link into some other program. In principle I could also auto-generate the TableGen .td files, but that seems more difficult than just linking in a working assembler/disassembler.

The backend APIs for going between an MCInst and binary instructions are MCCodeEmitter and MCDisassembler. In particular, each target implements MCCodeEmitter::encodeInstruction and MCDisassembler::getInstruction. Your script would need to generate code to implement those interfaces.
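As a rough illustration of the shape such generated code could take, here is a self-contained sketch that uses stand-in types instead of the real LLVM classes. SketchInst, generatedEncode, generatedDecode, the 8-byte little-endian encoding and the bit layout are all assumptions made up for this example. The real encodeInstruction and getInstruction take additional parameters (fixups, subtarget info, address and so on) and their exact signatures vary between LLVM versions, so check the headers of the release you build against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for llvm::MCInst: an opcode plus a flat operand list.
struct SketchInst {
  unsigned Opcode = 0;
  std::vector<uint64_t> Operands;
};

// Hypothetical script-generated routines. Assumed layout:
// [46:39] opcode, [38:34] GPR, [33:29] GPR, [28:0] UInt(29).
bool generatedEncode(const SketchInst &MI, uint64_t &Word) {
  if (MI.Operands.size() != 3 || MI.Opcode > 0xFF || MI.Operands[0] > 31 ||
      MI.Operands[1] > 31 || MI.Operands[2] > 0x1FFFFFFF)
    return false;
  Word = (static_cast<uint64_t>(MI.Opcode) << 39) | (MI.Operands[0] << 34) |
         (MI.Operands[1] << 29) | MI.Operands[2];
  return true;
}

bool generatedDecode(uint64_t Word, SketchInst &MI) {
  if ((Word >> 47) != 0)
    return false;
  MI.Opcode = static_cast<unsigned>((Word >> 39) & 0xFF);
  MI.Operands = {(Word >> 34) & 31, (Word >> 29) & 31, Word & 0x1FFFFFFF};
  return true;
}

constexpr size_t InsnBytes = 8; // assumed fixed-width, little-endian encoding

// Plays the role of MCCodeEmitter::encodeInstruction: append the bytes of MI.
bool encodeInstruction(const SketchInst &MI, std::vector<uint8_t> &Out) {
  uint64_t Word = 0;
  if (!generatedEncode(MI, Word))
    return false;
  assert((Word >> 47) == 0 && "generated encoder set bits outside the layout");
  for (size_t I = 0; I < InsnBytes; ++I)
    Out.push_back(static_cast<uint8_t>(Word >> (8 * I)));
  return true;
}

// Plays the role of MCDisassembler::getInstruction: decode one instruction
// from the front of Bytes and report how many bytes were consumed.
bool getInstruction(SketchInst &MI, uint64_t &Size,
                    const std::vector<uint8_t> &Bytes) {
  Size = 0;
  if (Bytes.size() < InsnBytes)
    return false;
  uint64_t Word = 0;
  for (size_t I = 0; I < InsnBytes; ++I)
    Word |= static_cast<uint64_t>(Bytes[I]) << (8 * I);
  if (!generatedDecode(Word, MI))
    return false;
  Size = InsnBytes;
  return true;
}
```

In a real backend the two wrappers would be member functions of the target's MCCodeEmitter and MCDisassembler subclasses, and only the generatedEncode/generatedDecode bodies would need to be regenerated when the ISA changes.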

Thanks. This has been very useful.