Comments needed on refactoring the `DecoderEmitter` TableGen backend

Over the last month I tried to make the DecoderEmitter/Disassembler TableGen backend emit C instead of C++ code (to use it later in Capstone for disassembly).
I tried overloading the methods that emit C++ code, but the result was a really ugly construct.
Because of that, I would like to refactor and extend the DecoderEmitter/Disassembler backend.

Since this is my first time working with the LLVM code, I would like to ask for any hints and advice you might have. If you have requests, I would like to hear them as well!

Please note that I don’t plan to touch the x86 emitter. It is very different from the backend used for all other architectures, and I haven’t looked at it yet.
But if you think it would be really helpful or not that difficult to implement, please let me know.


Here is a little more detail about what I identified as problems with the current design:

  • Code is written to the output stream from two or three classes, not from a single module.
  • The three classes are entangled quite heavily (although this is not strictly necessary).
  • Very long methods (up to 500 lines).
  • Somewhat patchy documentation (the classes are documented fine IMO, but I would have wished for a high-level overview).
  • const qualifiers are used inconsistently in places (e.g. see the usage of FilterChooser::Filters).

The idea is to:

  • Move the printing logic into its own module (an interface with C++ and C (for Capstone) implementations). This way, other output languages can be added more easily.
  • Affix each class to a single functionality:
    • DecoderEmitter: Controls code generation.
    • InstructionGroup: Separates target instructions into subsets by namespace.
    • FilterChooser: Selects the best Filter.
    • Filter: Represents a filter between instruction subsets.
    • Printer<Language>: Outputs the decoding state machine in a given language.
  • Split up long methods into smaller chunks for better readability.
  • Add an ARCHITECTURE.md with a high-level overview.
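To illustrate the proposed split between the emitter logic and the `Printer<Language>` classes, here is a rough C++ sketch. All names in it (`DecoderPrinter`, `CppPrinter`, `CPrinter`, `emitDecoderSketch`) are hypothetical and are not taken from LLVM or from the branch linked below. It uses `std::ostream` instead of `llvm::raw_ostream` only to stay self-contained, the emitted strings are simplified stand-ins for real generated code, and the C helper name `fieldFromInstruction_4` is an assumed Capstone-style name.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical printer interface: the language-independent emitter logic
// calls these hooks and never writes target-language syntax itself.
class DecoderPrinter {
public:
  virtual ~DecoderPrinter() = default;
  // Text emitted before any generated tables or functions.
  virtual void printPrologue(std::ostream &OS) const = 0;
  // Text emitted after everything else.
  virtual void printEpilogue(std::ostream &OS) const = 0;
  // A statement extracting NumBits bits, starting at StartBit, from the
  // instruction word `Insn` into the variable `tmp`.
  virtual void printFieldExtraction(std::ostream &OS, unsigned StartBit,
                                    unsigned NumBits) const = 0;
};

// C++ output: wraps everything in a namespace and calls a generic helper.
class CppPrinter : public DecoderPrinter {
public:
  void printPrologue(std::ostream &OS) const override {
    OS << "namespace llvm {\n";
  }
  void printEpilogue(std::ostream &OS) const override {
    OS << "} // end namespace llvm\n";
  }
  void printFieldExtraction(std::ostream &OS, unsigned StartBit,
                            unsigned NumBits) const override {
    OS << "tmp = fieldFromInstruction(Insn, " << StartBit << ", " << NumBits
       << ");\n";
  }
};

// C output: no namespaces, and (assumption) a fixed-width helper instead of
// a template, since C has no templates.
class CPrinter : public DecoderPrinter {
public:
  void printPrologue(std::ostream &) const override {}
  void printEpilogue(std::ostream &) const override {}
  void printFieldExtraction(std::ostream &OS, unsigned StartBit,
                            unsigned NumBits) const override {
    OS << "tmp = fieldFromInstruction_4(Insn, " << StartBit << ", "
       << NumBits << ");\n";
  }
};

// Stand-in for the DecoderEmitter: it decides *what* to emit (here a single
// field extraction of bits [4, 7)) and delegates *how* to the printer.
std::string emitDecoderSketch(const DecoderPrinter &P) {
  std::ostringstream OS;
  P.printPrologue(OS);
  P.printFieldExtraction(OS, 4, 3);
  P.printEpilogue(OS);
  return OS.str();
}
```

With this layout, `emitDecoderSketch(CppPrinter{})` and `emitDecoderSketch(CPrinter{})` run the same emitter logic and differ only in syntax, so supporting another output language would mean adding one more `DecoderPrinter` subclass rather than overloading methods throughout the emitter.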

Current working branch (based on LLVM release/15.x): Rot127/llvm-capstone on GitHub, branch tblgen_decoder_emitter_refactor

@Rot127 I think it’s better to change the category to “LLVM Project” instead. Then you will get more relevant replies.

Personally, I would vote against this if it adds complexity or maintenance burden to llvm-tblgen. Emitting code in languages other than C++ doesn’t seem useful for the LLVM project itself.

The cleanups you mention that don’t add complexity are welcome of course!

Thanks for your feedback! In the meantime, the idea has become more concrete: I opened another thread about it and have an implementation.
I don’t think it adds more complexity. In fact, I found the code clearer after the syntax printing was moved into its own methods.
This could of course be subjective, so if you wish, you can take a look at the implementation linked in this new thread: