Now that I've learned my way around TableGen just a bit, I'd like to solicit
suggestions for improving and enhancing it.
Perhaps there are some lexical changes that could improve readability of .td
files (e.g., I'm planning to enhance the lexer to allow an apostrophe as a
digit group separator in integers, a la C++).
Perhaps there are some syntactic enhancements that would make .td files
easier to read and write.
Perhaps there are common portions of .td files that can be factored out to
reduce future duplications, as with Automaton.td and SearchableTable.td.
Perhaps there are common portions of TableGen backends that can be factored
out to reduce future efforts, resulting in some general-purpose library.
Perhaps there are new features in TableGen that, coupled with enhanced or
new C++ files, would open up possibilities for using TableGen in new areas
of the target-independent code generator.
I don't know how much people have thought about this, but I'm interested in
any ideas you may have.
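To make the digit-separator idea above concrete, here is a minimal sketch of how a literal such as 1'000'000 could be parsed. It is only an illustration: parseGroupedDecimal is a hypothetical name, it handles decimal literals only, and it is not taken from TableGen's actual lexer, which would need to fit this into its existing number handling.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of TableGen): parses a non-negative decimal
// literal that may contain single apostrophes between digits, e.g. 1'000'000.
// Returns false on malformed input or on overflow of int64_t.
bool parseGroupedDecimal(const std::string &Text, int64_t &Result) {
  // The literal must start with a digit, so a leading apostrophe is rejected.
  if (Text.empty() || !std::isdigit(static_cast<unsigned char>(Text[0])))
    return false;

  int64_t Value = 0;
  bool PrevWasSeparator = false;
  for (char C : Text) {
    if (C == '\'') {
      if (PrevWasSeparator)
        return false; // Reject doubled separators such as 1''000.
      PrevWasSeparator = true;
      continue;
    }
    if (!std::isdigit(static_cast<unsigned char>(C)))
      return false;
    int Digit = C - '0';
    // Check that Value * 10 + Digit still fits before computing it.
    if (Value > (INT64_MAX - Digit) / 10)
      return false;
    Value = Value * 10 + Digit;
    PrevWasSeparator = false;
  }
  if (PrevWasSeparator)
    return false; // Reject a trailing separator such as 10'.

  Result = Value;
  return true;
}
```

Calling parseGroupedDecimal("1'000'000", V) stores one million in V and returns true; inputs such as "10'" or "1''0" return false and leave V untouched. Whether the real lexer should be this strict about separator placement is an open design choice.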
Instead of syntactic enhancements, I think it would be great to invest in the internal infrastructure in the implementation of TableGen.
People frequently complain about the quality of error messages in TableGen. One big reason for this is that we don’t track source locations very well in the “tablegen AST”. I think that fixing that would be a really nice step towards upgrading the individual diagnostics.
It's interesting that you bring this up. I've seen a lot of TableGen error messages over the past couple of weeks and I don't recall being confused about any error or its location. I will give this a closer look.
I agree that syntactic enhancements aren't particularly exciting by themselves. I was wondering whether some new features coupled with new backends would pave the way for additional uses for TableGen in the area of code generation (or any other areas, for that matter).
My sense (which is mostly historic; I haven’t worked on the code generators for a long time, sadly) is that tblgen is reasonable with syntactic and other errors. However, it doesn’t maintain the location info in the AST, so if a tblgen backend wants to report something that is wrong, it points back up to the top-level records more often than not.
For example, consider if someone writes an invalid pattern like:
(ADDrr32 EAX, AL)
The AL def is for an 8 bit register, but the instruction requires a 32-bit register. The error message should point to the “AL” token on that line when it complains about it.
This is a very dated memory; it is possible someone has already fixed this.
Ah, yes, it appears that the locations are saved only with records, not with any components of them. For better error messages, locations probably need to be saved with Inits. This should be interesting.
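As a rough sketch of what per-component locations would buy us, consider the ADDrr32 example. Everything below is a hypothetical stand-in (SourceLoc, Operand, Pattern, and checkOperandWidths are made-up names, not TableGen's real Record/Init classes); the point is only that once each operand carries its own location, a backend can anchor the diagnostic at the offending token rather than at the enclosing record.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not TableGen classes.
struct SourceLoc {
  unsigned Line;
  unsigned Column;
};

struct Operand {
  std::string Name;
  unsigned BitWidth;
  SourceLoc Loc; // Where this operand's token appeared in the .td file.
};

struct Pattern {
  std::string Opcode;
  unsigned RequiredWidth;
  SourceLoc Loc; // Start of the whole pattern (the record-level location).
  std::vector<Operand> Operands;
};

// Returns an empty string if every operand has the required width; otherwise
// returns a diagnostic anchored at the first mismatching operand's own
// location instead of the pattern's location.
std::string checkOperandWidths(const Pattern &P) {
  for (const Operand &Op : P.Operands) {
    if (Op.BitWidth != P.RequiredWidth)
      return std::to_string(Op.Loc.Line) + ":" +
             std::to_string(Op.Loc.Column) + ": error: operand '" + Op.Name +
             "' is " + std::to_string(Op.BitWidth) + "-bit, but " + P.Opcode +
             " requires " + std::to_string(P.RequiredWidth) + "-bit operands";
  }
  return "";
}
```

For (ADDrr32 EAX, AL) on an assumed line 42, with AL starting at column 15, this yields "42:15: error: operand 'AL' is 8-bit, but ADDrr32 requires 32-bit operands". With only the record-level location available, the best it could report is 42:1.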
I'm not sure if you're still looking for additional TableGen tasks, but something that has been an irritant for years is the performance of llvm-tblgen on larger targets (X86 and AMDGPU in particular).
https://bugs.llvm.org/show_bug.cgi?id=28222 and https://bugs.llvm.org/show_bug.cgi?id=44628 are perfect examples: if you have to debug a later stage of a -gen-dag-isel run, you can find yourself waiting 10+ minutes.
There are some pretty nasty O(N^2) loops in MatcherTableEmitter::EmitMatcherList, for instance.
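To illustrate the general shape of that kind of problem (this is a generic example, not the actual code in MatcherTableEmitter, and both function names are made up), here is a common way a quadratic loop sneaks in: assigning a dense index to each distinct item by linearly scanning everything seen so far. Swapping the scan for a hash map keeps the results identical while making each lookup expected constant time.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Quadratic version: every item triggers a linear scan of the items seen so
// far, so N items cost O(N^2) comparisons in the worst case.
std::vector<unsigned>
assignIndicesQuadratic(const std::vector<std::string> &Items) {
  std::vector<std::string> Seen;
  std::vector<unsigned> Result;
  for (const std::string &Item : Items) {
    unsigned Index = static_cast<unsigned>(
        std::find(Seen.begin(), Seen.end(), Item) - Seen.begin());
    if (Index == Seen.size())
      Seen.push_back(Item); // First occurrence gets the next free index.
    Result.push_back(Index);
  }
  return Result;
}

// Hashed version: same indices, but each lookup is expected O(1), so the
// whole pass is expected O(N).
std::vector<unsigned>
assignIndicesHashed(const std::vector<std::string> &Items) {
  std::unordered_map<std::string, unsigned> IndexOf;
  std::vector<unsigned> Result;
  for (const std::string &Item : Items) {
    // emplace inserts only if the key is new; otherwise it returns the
    // existing entry, whose index we reuse.
    auto Inserted =
        IndexOf.emplace(Item, static_cast<unsigned>(IndexOf.size()));
    Result.push_back(Inserted.first->second);
  }
  return Result;
}
```

Both functions map {"a", "b", "a", "c", "b"} to the indices 0, 1, 0, 2, 1. Whether the loops in EmitMatcherList have this particular shape would need profiling to confirm.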