Comments needed for refactoring `DecoderEmitter` TableGen backend

Rot127 · October 6, 2022, 12:23am

Over the last month I tried to make the DecoderEmitter/Disassembler TableGen backend emit C instead of C++ code (to use it later in Capstone for disassembly).
I tried to overload the methods which emit C++ code, but it ended up as a really ugly construct.
Because of that I would like to refactor and extend the DecoderEmitter/Disassembler backend.

Since this is my first time working with the LLVM code, I would like to ask for any hints and advice you might have. Also, if you have requests, I would like to hear them as well!

Please note that I don’t plan to touch the x86 emitter. It is very distinct from the backend used for all other archs and I haven’t looked at it yet.
But if you think it would be really helpful or not that difficult to implement, please let me know.

Here is a little more detail about what I identified as problems with the current design:

- Code is written to the output stream from two or three classes rather than from a single module.
- The three classes are entangled quite heavily (although this is not strictly necessary).
- Some methods are very long (up to 500 lines).
- The documentation is a little patchy (the classes are documented fine IMO, but I would have wished for a high-level overview).
- `const` qualifiers are sometimes used inconsistently (e.g. see the usage of `FilterChooser::Filters`).

The idea is to:

- Move the printing logic into its own module (an interface with a C++ implementation and, for Capstone, a C implementation). Other output languages can be added more easily this way.
- Assign each class a single responsibility:
  - DecoderEmitter: Controls code generation.
  - InstructionGroup: Separates target instructions into different namespace subsets.
  - FilterChooser: Selects the best Filter.
  - Filter: Represents a filter between instruction subsets.
  - Printer<Language>: Outputs the decoding state machine in a given language.
- Split up long methods into smaller chunks for better readability.
- Add an ARCHITECTURE.md.
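
To illustrate the idea of a separate printing module, here is a minimal sketch of how the generation logic could be decoupled from the output syntax. All names in it (`Printer`, `PrinterCpp`, `PrinterC`, `emitFailStub`) are hypothetical and chosen for illustration only; they are not taken from the existing backend or from the working branch, and the emitted strings are simplified stand-ins for real decoder code.

```cpp
#include <sstream>
#include <string>

// Hypothetical interface: one implementation per output language.
class Printer {
public:
  virtual ~Printer() = default;
  // Spell a decode status constant in the target language.
  virtual std::string status(const std::string &Name) const = 0;
  // Spell the parameter through which the decoded instruction is passed.
  virtual std::string instParam() const = 0;
};

// C++ flavour: scoped enumerators and references are available.
class PrinterCpp : public Printer {
public:
  std::string status(const std::string &Name) const override {
    return "MCDisassembler::" + Name;
  }
  std::string instParam() const override { return "MCInst &MI"; }
};

// C flavour (assumed naming): prefixed constants and pointers instead.
class PrinterC : public Printer {
public:
  std::string status(const std::string &Name) const override {
    return "MCDisassembler_" + Name;
  }
  std::string instParam() const override { return "MCInst *MI"; }
};

// Language-independent logic: decides what is emitted and asks the
// printer how to spell the language-specific parts.
std::string emitFailStub(const Printer &P, const std::string &FnName) {
  std::ostringstream OS;
  OS << "static DecodeStatus " << FnName << "(" << P.instParam() << ") {\n"
     << "  return " << P.status("Fail") << ";\n"
     << "}\n";
  return OS.str();
}
```

Calling `emitFailStub(PrinterC(), "decodeStub")` would produce the C spelling of the stub, while passing a `PrinterCpp` produces the C++ one. The generation code itself does not change between the two.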

Rot127 · October 8, 2022, 5:01am

Current working branch (based on LLVM release/15.x): GitHub - Rot127/llvm-capstone at tblgen_decoder_emitter_refactor

XVilka · January 12, 2023, 3:42am

@Rot127 I think it’s better to change the category to “LLVM Project” instead. Then you will get more relevant replies.

jayfoad · January 12, 2023, 9:58am

Personally I would vote against this if it’s going to add complexity or maintenance burden to llvm-tblgen. Emitting languages other than C++ doesn’t seem useful for the LLVM project itself.

The cleanups you mention that don’t add complexity are welcome of course!

Rot127 · January 12, 2023, 3:23pm

Thanks for your feedback! In the meantime the idea has become more concrete; I opened another thread about it and have an implementation.
I don’t think that it adds more complexity. In fact I found the code clearer after the syntax printing was moved into its own methods.
This could of course be subjective. So if you wish, you can take a look at the implementation linked in this new thread:

[TableGen] Add abstraction layer between code generation and syntax printing (LLVM Project)