Over the last month I tried to make the `DecoderEmitter`/Disassembler TableGen backend emit C instead of C++ code (to use it later in Capstone for disassembly).
I tried to overload the methods which emit C++ code, but it ended up as a really ugly construct.
Because of that I would like to refactor and extend the `DecoderEmitter`/Disassembler backend.
Since this is my first time working with the LLVM code, I would like to ask for any hints and advice you might have. Also, if you have requests, I would like to hear them as well!
Please note that I don’t plan to touch the x86 emitter. It is very distinct from the backend used for all other archs and I haven’t looked at it yet.
But if you think it would be really helpful or not that difficult to implement, please let me know.
Here is a little more detail about what I identified as problems with the current design:
- Code is written into the output stream from two or three classes rather than from a single module.
- The three classes are entangled quite heavily (although it is not strictly necessary).
- Very long methods (up to 500 lines).
- A little patchy documentation (the classes are documented fine IMO, but I would have wished for a high level overview).
- `const` qualifications are sometimes used inconsistently (e.g. see the usage of `FilterChooser::Filters`).
The idea is to:
- Move the printing logic into its own module (an interface with a C++ implementation and, for Capstone, a C implementation). Other output languages can be added more easily this way.
- Affix each class to a single functionality:
  - `DecoderEmitter`: Controls code generation.
  - `InstructionGroup`: Separates target instructions into different namespace subsets.
  - `FilterChooser`: Selects the best `Filter`.
  - `Filter`: Represents a filter between instruction subsets.
  - `Printer<Language>`: Outputs the decoding state machine in a given language.
- Split up long methods into smaller chunks for better readability.
- Add an `ARCHITECTURE.md`
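
To make the printer split from the list above a bit more concrete, here is a minimal sketch of how such an interface could look. It is not existing LLVM code: `PrinterInterface`, `CppPrinter`, `CPrinter`, `emitDecoder` and the emitted text are all made-up names and output for illustration, and `std::ostream` stands in for LLVM's `raw_ostream` so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical interface: every piece of target-language syntax goes through
// these hooks, so the emitter core never writes C or C++ text itself.
class PrinterInterface {
public:
  virtual ~PrinterInterface() = default;

  // Language-specific: the signature of the generated decoder function.
  virtual void printDecoderHeader(std::ostream &OS,
                                  const std::string &Name) const = 0;

  // Shared defaults: this syntax is valid in both C and C++.
  virtual void printDecoderEnd(std::ostream &OS) const { OS << "}\n"; }
  virtual void printTable(std::ostream &OS, const std::string &Name,
                          const std::vector<uint8_t> &Bytes) const {
    assert(!Name.empty() && "a decoder table needs a name");
    OS << "static const uint8_t " << Name << "[] = {";
    for (std::size_t I = 0; I < Bytes.size(); ++I)
      OS << (I ? ", " : " ") << static_cast<unsigned>(Bytes[I]);
    OS << " };\n";
  }
};

// Illustrative C++ output: the decoder is a function template.
class CppPrinter : public PrinterInterface {
public:
  void printDecoderHeader(std::ostream &OS,
                          const std::string &Name) const override {
    OS << "template <typename InsnType>\n"
       << "static DecodeStatus " << Name << "(InsnType Insn) {\n";
  }
};

// Illustrative C output: no templates, so a fixed-width type is used instead.
class CPrinter : public PrinterInterface {
public:
  void printDecoderHeader(std::ostream &OS,
                          const std::string &Name) const override {
    OS << "static DecodeStatus " << Name << "(uint32_t Insn) {\n";
  }
};

// Stand-in for the emitter core: it decides what is emitted and in which
// order, while the printer decides how it looks in the output language.
std::string emitDecoder(const PrinterInterface &P, const std::string &Name,
                        const std::vector<uint8_t> &Table) {
  std::ostringstream OS;
  P.printTable(OS, Name + "Table", Table);
  P.printDecoderHeader(OS, Name);
  P.printDecoderEnd(OS);
  return OS.str();
}
```

With this shape, `emitDecoder(CPrinter(), "decode", {1, 2, 3})` and `emitDecoder(CppPrinter(), "decode", {1, 2, 3})` run the same control logic and differ only in the text the printer produces, so supporting another output language would mean adding one more subclass.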