[TableGen] Add abstraction layer between code generation and syntax printing

I would like to ask for feedback on a refactoring of the TableGen backends.

The Capstone project (a lightweight disassembler) needed customized output from the TableGen backends. The emitted code had to be in C, and some of it had to be altered.
To achieve this, I separated the syntax output from the code generation logic in some backends.

Now I would like to ask for your feedback and, if possible, get it upstreamed. Of course, I am happy to make more changes if requested.

You can find the patch in this review:

Some more details about the problem this solves

The code emitted by TableGen backends is a useful resource for non-LLVM tools.
(Capstone makes heavy use of this code to disassemble opcodes without having to implement a complete disassembler on its own.)

The problem is that the backends can only emit C++ code, and the user has no control over which parts are emitted and which are not. Altering the output is not possible either.

For example, there is no option for the user to emit only the arrays and not the functions and enums.
Emitting the code in a language other than C++ is not possible either, although it could be useful for projects that need the code in Java, C, or another syntax.

Writing a new backend for these use cases is not necessarily an option if the code needed is exactly what a specific backend already emits. Such a “new” backend would simply duplicate the original one, and updating the duplicate with each LLVM release is maintenance-heavy.

Adding an abstraction layer between the code generation and the syntax output would be helpful here.
For example, if the backend emits a specific enum X, it generates the information and calls a Printer::emitEnumX(<enum_data>) method. This method prints the syntax to the output stream.

If the syntax output must be altered, it is only necessary to override this emitEnumX method of the Printer class (for an implementation see the review above).
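
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from the patch: EnumData, PrinterC, runBackend and the two render helpers are hypothetical names made up for this example, and only the Printer::emitEnumX naming follows the description above.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical data that the generation logic hands to the printer.
struct EnumData {
  std::string Name;
  std::vector<std::pair<std::string, unsigned>> Values;
};

// Hypothetical printer interface. The default implementation emits C++.
class Printer {
public:
  explicit Printer(std::ostream &OS) : OS(OS) {}
  virtual ~Printer() = default;

  virtual void emitEnumX(const EnumData &E) {
    OS << "enum class " << E.Name << " : unsigned {\n";
    for (const auto &V : E.Values)
      OS << "  " << V.first << " = " << V.second << ",\n";
    OS << "};\n";
  }

protected:
  std::ostream &OS;
};

// Out-of-tree override that prints the same data as plain C.
class PrinterC : public Printer {
public:
  using Printer::Printer;

  void emitEnumX(const EnumData &E) override {
    OS << "typedef enum {\n";
    for (const auto &V : E.Values)
      OS << "  " << E.Name << "_" << V.first << " = " << V.second << ",\n";
    OS << "} " << E.Name << ";\n";
  }
};

// Stand-in for a backend: it computes the data and delegates the syntax.
void runBackend(Printer &P) {
  EnumData E{"Reg", {{"R0", 0}, {"R1", 1}}};
  P.emitEnumX(E);
}

// Helpers that run the stand-in backend with each printer.
std::string renderCpp() {
  std::ostringstream OS;
  Printer P(OS);
  runBackend(P);
  return OS.str();
}

std::string renderC() {
  std::ostringstream OS;
  PrinterC P(OS);
  runBackend(P);
  return OS.str();
}
```

The generation logic in runBackend stays the same for both printers, only the class that receives the data decides what the output looks like. A printer that should skip enums entirely could override emitEnumX with an empty body.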


I appreciate what you’re trying to do, but I think the current approach should be rejected.

Perhaps the most fundamental issue is that as long as the only other printers are out-of-tree, a very fine-grained interface to hook into isn’t going to be maintainable.

Please take a look at CodeGenTarget: it is already a “data model” abstraction of the information that TableGen knows about. If you need to write an out-of-tree printer, then please just write one that relies on that model. It is a far cleaner and more sensible abstraction. (And if that model needs to be extended for your purposes, I think that’s a more reasonable path to take.)


I addressed @nhaehnle’s points in the review comments, because the comments there contain more details.
So, for everyone who is interested in this, please refer to the review.