Automatically generate tablegen

TableGen is good and powerful.

But I had to write a lot of TableGen code when implementing the back end for a particular piece of hardware.
I found that the TableGen code in the Hexagon back end is very regular and formulaic, and the files under “lib/Target/Hexagon/” add up to a staggering 30,000 lines.

I wonder whether there is an automated generation tool for TableGen that I don’t know about. Otherwise, the thought of writing that much TableGen by hand is terrifying.

I often feel that the TableGen class structure I designed for a particular piece of hardware is very cumbersome. So in addition to the question above, I also want to know how to design a good TableGen class structure, so that I can reduce the number of TableGen lines even without an automated generation tool.

This file was added in [Hexagon] Replace instruction definitions with auto-generated ones · llvm/llvm-project@a72fad9 · GitHub, and it is in fact auto-generated. Or at least it was for that first segment.

So you are quite right no one would want to hand write the bulk of that.

When it comes to autogenerated code, it has to at least be hand editable and understandable. I don’t know Hexagon at all, but I assume with a little effort I could do a bug fix there if I had to. If you generate impenetrable code, it will be rejected just as if you hand wrote it all.

There is also potentially a licensing issue, which I am absolutely not qualified to judge. If I take, for instance, Arm’s instruction XML and generate TableGen from it, is that sufficiently different to license to LLVM? And if someone wants to change it, are they able to, or do they need access to some private source of information?

…which is all to say it still helps to have a well thought out structure for your tablegen. Whether you generate it or not. Some human (certainly you at some point) will have to change a small part of it.

And perhaps even a well thought out structure will have some 100s of records in it. If that’s what needs to happen and it’s clearly laid out and searchable, great.

If you look at the AArch64 backend for instance, that has grown in somewhat randomly sized chunks over time as we add new extensions. So the structure is hardly elegant at this point but humans are still able to work with it, that’s the key.

Thank you very much for your reply. It helped me understand that automatic generation also has disadvantages.
Thanks for the pointer, I’ll take a look at the AArch64 back end later.

The Hexagon instruction definitions don’t come from the PRM or any other document; they come from data provided by the architecture team. Initially we did write the .td files by hand, but it was a lot of work, plus it was hard to accommodate architecture changes (like changes in instruction latencies between different arch versions). Eventually we wrote a translator that automatically generates all the .td data from the original files.
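To give a rough idea of what such a translator looks like, here is a minimal Python sketch. It is not the Hexagon generator: the `InstSpec` fields, the `MyInst` TableGen class and its parameter order are all made up for illustration, and real source data would be much richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstSpec:
    """One row of the (hypothetical) architecture description."""
    name: str      # TableGen record name, e.g. "ADD"
    mnemonic: str  # assembly mnemonic, e.g. "add"
    opcode: int    # 4-bit opcode in this toy encoding
    latency: int   # cycles, illustrative only


def emit_td(specs):
    """Return TableGen text with one `def` per instruction.

    `MyInst<opcode, asm, latency>` is an assumed hand-written base class;
    the generator only fills in the per-instruction data.
    """
    lines = [
        "// Auto-generated. Do not edit by hand; change the generator instead.",
        "",
    ]
    # Sort by name so regenerating gives a stable, diff-friendly file.
    for s in sorted(specs, key=lambda s: s.name):
        lines.append(
            f'def {s.name} : MyInst<0b{s.opcode:04b}, "{s.mnemonic}", {s.latency}>;'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_td([
        InstSpec("MUL", "mul", 2, 3),
        InstSpec("ADD", "add", 1, 1),
    ]))
```

The point of keeping the base class hand-written and the `def` lines generated is that the output stays readable: a human can still search for one instruction and see exactly what it expands to.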

If there is a bug somewhere in the autogenerated .td, or if someone adds a flag that needs to be used, we modify the generator and recreate the files. This adds extra steps to the process, but it’s relatively rare. On the other hand it saves us a ton of work—we now support over 10 different architecture variants, and handling it by hand would be nearly impossible.
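As an illustration of why regenerating scales better than hand editing across variants, here is a small Python sketch that emits one file per architecture version from a single table. The data, the file names and the `InstLatency` TableGen class are hypothetical, not what the Hexagon generator actually uses.

```python
# Hypothetical source data: latency per instruction per architecture version.
LATENCIES = {
    "ADD": {"v1": 1, "v2": 1},
    "MUL": {"v1": 3, "v2": 2},
}


def emit_latencies(table, variant):
    """Return TableGen text for one variant. InstLatency is a made-up class."""
    lines = [f"// Auto-generated latencies for {variant}. Regenerate, do not edit."]
    for name in sorted(table):
        per_variant = table[name]
        if variant not in per_variant:
            continue  # instruction has no entry for this variant
        lines.append(f"def : InstLatency<{name}, {per_variant[variant]}>;")
    return "\n".join(lines) + "\n"


def emit_all(table):
    """Map an assumed output file name to its generated contents."""
    variants = sorted({v for per in table.values() for v in per})
    return {f"Latency_{v}.td": emit_latencies(table, v) for v in variants}


if __name__ == "__main__":
    for filename, text in emit_all(LATENCIES).items():
        print(f"--- {filename}")
        print(text)
```

With this shape, a latency change for one version is a one-cell edit in the source table followed by a regeneration, rather than a hunt through several .td files.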

Well, thank you very much for your kind help.

May I ask, is the translator you use open source? If it is not, please ignore this question. If it is, what are the requirements for the original files? Where should I look for that information? Are there any keywords I can search for?

This seems to be a very good practice for a complex architecture, as instructions grow more and more complex and differ between processor versions.