Extending TableGen's 'foreach' to work with 'multiclass' and 'defm'

I have been reading the “RFC/bikeshedding: Separation of instruction and pattern definitions in LLVM backends” topic with considerable interest. This is an approach I have been considering for taming our own large instruction set, and it looks like it structures our descriptions better than the conventional approach we have used so far.

However, I have another form of TableGen taming that I would like to do.

In addition to the separation of instruction from the patterns that use it, I have also got a large number of “instruction groups” that differ in their schedules and operands, but are in all other respects structurally similar.

For example, I have a large number of load instructions that are almost identical, but which fall into five specific groups:

· Loads where the memory operand is in a register [LDA]

· Loads where the memory operand is in a register and is auto-incremented by an implicit value [LDB]

· Loads where the memory operand is in a register and is auto-incremented by a value in another register [LDC]

· Loads where the memory operand has a base pointer in a register and an immediate offset [LDD]

· Loads where the memory operand has a base pointer in a register and an offset in a register [LDE]

When I don’t have multiple processor versions, I can use ‘class/def’ by specifying the appropriate ‘class’ for each type of instruction in the group, and then using a common set of ‘def’ declarations for them with ‘foreach’, for example:

// Describe the meta-classes for the LDA group

class T_LDA_Type1 : …

class T_LDA_Type2 : …

// Describe the meta-classes for the LDB group

class T_LDB_Type1 : …

class T_LDB_Type2 : …

// Describe the meta-classes for the LDC group

class T_LDC_Type1 : …

class T_LDC_Type2 : …

// Describe the meta-classes for the LDD group

class T_LDD_Type1 : …

class T_LDD_Type2 : …

// Describe the meta-classes for the LDE group

class T_LDE_Type1 : …

class T_LDE_Type2 : …

// Share a single set of definitions, but parameterised by meta-class

foreach loadOp = ["LDA", "LDB", "LDC", "LDD", "LDE"] in {

def Prefix_#loadOp#_suffix1 : T_#loadOp#_Type1<…>;

def Prefix_#loadOp#_suffix2 : T_#loadOp#_Type2<…>;

}

All of the ‘def’s pass the same values to the ‘class’es, though each ‘class’ may ignore some as appropriate. For example, I pass the auto-increment size to each, though only the auto-increment patterns care.

This neatly allows me to symmetrically manage all the instructions in each of the groups using a single statement of the patterns, and maintain only one fifth of the number of definitions. In my actual source, there are around 50 different types of instruction within each group, so reducing the repetition is quite significant.

But there is a downside.

For each of the above I also have variations that result from different processor and ISA versions, and because of this I have to use ‘multiclass/defm’ to define the descriptions along with ‘Requires’ predicates. The same approach does not work there, though, because TableGen does not support ‘foreach’ with ‘multiclass/defm’.
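For illustration, this is the shape of what I would like to write (the names and the ‘multiclass’ are invented placeholders; current TableGen rejects ‘defm’ inside ‘foreach’):

```tablegen
// What I would like to write, but TableGen does not currently accept:
// a 'defm' instantiating a per-group multiclass inside a 'foreach'.
foreach loadOp = ["LDA", "LDB", "LDC", "LDD", "LDE"] in {
  defm Prefix_#loadOp : T_#loadOp#_Multi<…>;
}
```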

I have experimented with adapting TableGen to do this, but I am just not knowledgeable enough about how TableGen works and my attempts have not been successful. Perhaps some of the people debating the separation of instruction and patterns topic might have some insight into how TableGen might be adapted to support ‘foreach’ with ‘multiclass/defm’ definitions and could advise me how I should do this; or maybe the maintainers of TableGen might consider this something that they would be willing to add to TableGen in the future?

Thanks,

MartinO

Hi Martin, I think this is an interesting topic. I've also run up
against the limitations of foreach, though for my particular case the
variable-sized register class work provides a better solution.

I will note that at least one backend (Hexagon) has moved towards
using TableGen as a fairly 'dumb' data definition language, relying on
a separate tool for generating instruction definitions. I'd be curious
to know if others are using this approach. I'd also imagine that
using m4/jinja or similar as a .td preprocessor would be a potential
option for an out-of-tree backend, in cases where TableGen macro
support and programmability is too weak.

I suppose one question is: would allowing foreach to be used with
multiclass/defm be sufficient to allow TableGen to be a productive and
maintainable way of defining complex architectures, or would there be
a number of other deficiencies that might push you towards larger
TableGen extensions or using a separate tool or preprocessor?

Best,

Alex

The fact that you can't use foreach with multiclasses is a bug, and we should fix it if possible, regardless of whether it is the last remaining roadblock to handling complex architectures. For situations well beyond TableGen's current language capabilities, we have a decision to make. We can continue extending TableGen until it can meet those needs. Alternatively, we can enable the use of some more-powerful input language. For example, we could allow TableGen to embed Python, and then use Python to generate record definitions.

  -Hal

For a project that’s not LLVM, I recently had the opportunity to replace both TableGen and *.td files with Python scripts. I found that TableGen’s features were easily matched by Python’s for loops and the ability to define functions. I am pretty happy with the approach so far. AMA

This is a lot easier to do in a green field project than in an old project like LLVM, of course.

Example “.td” file: https://github.com/stoklund/cretonne/blob/master/lib/cretonne/meta/isa/riscv/encodings.py
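A minimal sketch of the approach (all names here are invented for illustration; this is not the actual Cretonne code): plain Python loops and string formatting stand in for ‘foreach’ and ‘multiclass’, and the cross product over ISA versions that TableGen cannot currently express falls out naturally:

```python
# Sketch: emitting repetitive TableGen-style records with plain Python.
# The load groups, ISA versions, and record shapes are all invented for
# illustration; a real generator would emit whatever its backend needs.

LOAD_GROUPS = ["LDA", "LDB", "LDC", "LDD", "LDE"]
ISA_VERSIONS = ["V1", "V2"]  # hypothetical processor versions

def emit_defs():
    """Yield one record line per (group, type, ISA version) combination --
    the cross product that 'foreach' + 'multiclass' cannot express today."""
    for load_op in LOAD_GROUPS:
        for isa in ISA_VERSIONS:
            for ty in (1, 2):
                yield (f"def {load_op}_Type{ty}_{isa} : "
                       f"T_{load_op}_Type{ty}<...>, Requires<[Has{isa}]>;")

td_text = "\n".join(emit_defs())
```

The emitted text can then be written to a generated .td (or .inc) file that the build feeds to TableGen unchanged.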

Thanks,
/jakob

Interesting, thanks. If we want to go down that route, I can certainly imagine a feasible incremental-transitioning strategy. We could allow TableGen to use an embedded Python interpreter to generate records based on Python data structures, and then combine records from the existing .td files with those generated by the Python code. We’d use the existing TableGen plugins (which we may need to continue to use regardless, rather than rewriting them in Python, for performance reasons), and so we could incrementally transition existing definitions from .td files to Python as appropriate. -Hal

Would we then eliminate TableGen completely in the long term?

-Krzysztof

For what it’s worth: I also had very good experiences with python “specifications” to generate code [1].

Of course it’s hard to justify switching all the infrastructure just because of one missing tablegen feature…

  • Matthias

[1] Example spec + template: https://github.com/libfirm/libfirm/blob/master/scripts/ir_spec.py
https://github.com/libfirm/libfirm/blob/master/scripts/templates/gen_irio.c

Would we then eliminate TableGen completely in the long term?

That could also be two separate questions: Would we replace the .td input language with Python completely in the long term? Would we rewrite the backends (i.e., TableGen plugins) in Python? I don't yet have an opinion on either. I can see advantages to providing Python as an input language. What do you think?

  -Hal

If we want to go down that route, I can certainly imagine a feasible incremental-transitioning strategy. We could allow TableGen to use an embedded Python interpreter to generate records based on Python data structures, and then combine records from the existing .td files with those generated by the Python code. We'd use the existing TableGen plugins (which we may need to continue to use regardless, rather than rewriting them in Python, for performance reasons), and so we could incrementally transition existing definitions from .td files to Python as appropriate.

I think we should seriously consider doing this. Python would give targets
a lot more flexibility in how they define their instruction sets, and I think
it would be much easier to add new features (e.g. patterns for instructions with
condition codes). This is especially true for non-standard targets like AMDGPU, which has
had to resort to some pretty creative uses of TableGen to keep the .td files
reasonable (see lib/Target/AMDGPU/VOP3Instructions.td).

Another advantage is that I can imagine hardware vendors using Python to
generate their own hardware documentation or test suites, so they might be
able to re-use some of those tools to generate ISA definitions for LLVM.

-Tom

I agree. As someone who has improved TableGen on occasion over the years, however, I suspect that we’ll continue to run into these things. Plus, given the number of users, out-of-tree at least, who auto-generate their .td inputs, I think we’re actually well past one missing feature (although obviously there could be unrelated motivations as well). At some point, improving TableGen, at least as an input language, may have become a sub-optimal use of our time (even if it does look nicer than Python in some cases). -Hal

If we want to undertake incorporating Python into the TableGen pipeline, then completely replacing TableGen with Python sounds like a logical long-term goal. Everything that TableGen can do can be done in Python as well, plus Python offers nearly unlimited flexibility.

-Krzysztof

Replacing TableGen with general purpose language X runs into the issue of bikeshedding what X should be. I’d be very much opposed to Python because:

- It’s a large external dependency for the build (there’s no chance of FreeBSD shipping Python in the base system, for example, so we’d have to import the Python-generated files on each import, which would be annoying)

- The language has had one backwards-incompatible break that it’s taken over a decade to recover from, I have little confidence that it will remain compatible going forward

- It seems to encourage terrible code (I have yet to be presented with a piece of allegedly working Python software that I have not had to fix at least one bug in - git-imerge was almost an exception, but sadly not quite).

- It intentionally doesn’t support tail recursion optimisation and imposes arbitrary stack depth limits, which forces some convoluted coding styles.

More generally, I’m not sure about the underlying goal. We already have one solid general-purpose language in LLVM: C++. A lot of the things that we currently use more complex TableGen programming practices for now would be good uses for the proposed metaclass / reflection APIs in C++21, which seems a more palatable end goal than a scripting language.

There are also external tools that both produce and consume the TableGen sources. Having a language that is *not* a general-purpose programming language is a feature for these tools, not an obstacle.

David

This isn't really an argument specific to replacing TableGen with Python so much as a concern about using Python altogether. The original idea here was to add it as an extra tool to aid with the processing of .td files.

I'm guessing that the annoyance would come from the fact that bypassing TableGen and using the pre-existing .inc files is not well supported by the build process. What if this were made easier?

-Krzysztof

I see three options worth differentiating between, due to their different
trade-offs in terms of benefit, disruption, build-time dependencies, and so on:
1) Keep tablegen as-is, but encourage/allow backends to use a more
powerful language to generate very straight-forward .td. This might
take a template-based approach (e.g. Jinja2), or perhaps just
pretty-printing the preferred data structure in to .td.
2) Embed Python in tablegen (i.e. python will be linked in to the
tablegen binary).
3) Drop tablegen altogether, develop an alternative tool for
generating the necessary .inc files
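As a minimal sketch of option 1) using only the Python standard library (the .td fragment and all identifiers here are invented for illustration), `string.Template` is enough for simple expansion; anything non-trivial would want a real template engine like Jinja2:

```python
# Sketch of option 1): a trivial template-based .td pre-processor built on
# the standard library's string.Template. The .td fragment and identifiers
# are invented for illustration; a real backend would use its own record
# shapes (or Jinja2 for anything more complex than straight substitution).
from string import Template

RECORD = Template("defm ${op} : T_${op}_Multi<${incr}>, Requires<[${pred}]>;")

def expand(ops, incr, pred):
    # One emitted line per op -- the 'foreach' that the .td file cannot
    # currently perform over 'defm' itself.
    return [RECORD.substitute(op=op, incr=incr, pred=pred) for op in ops]

lines = expand(["LDB", "LDC"], incr="4", pred="HasAutoIncr")
```

The resulting lines would be written out as a generated .td file, which TableGen then consumes exactly as it does hand-written input.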

A number of LLVM users already do something like option 1). e.g. the
Hexagon backend or a variety of companies who offer tools for
generating custom processor cores + compiler support based on
specifying new instructions in a custom DSL.

Best,

Alex

Replacing TableGen with general purpose language X runs into the issue of bikeshedding what X should be. I’d be very much opposed to Python because:

  - It’s a large external dependency for the build (there’s no chance of FreeBSD shipping Python in the base system, for example, so we’d have to import the Python-generated files on each import, which would be annoying)

Given that our regression-testing infrastructure is built on Python, I already consider it to be a build dependency (and, as I recall, there are parts of the compiler-rt/libc++ builds that depend on it as well). This, and a wide user base, motivate my suggestion.

  - The language has had one backwards-incompatible break that it’s taken over a decade to recover from, I have little confidence that it will remain compatible going forward

  - It seems to encourage terrible code (I have yet to be presented with a piece of allegedly working Python software that I have not had to fix at least one bug in - git-imerge was almost an exception, but sadly not quite).

  - It intentionally doesn’t support tail recursion optimisation and imposes arbitrary stack depth limits, which forces some convoluted coding styles.

More generally, I’m not sure about the underlying goal. We already have one solid general-purpose language in LLVM: C++. A lot of the things that we currently use more complex TableGen programming practices for now would be good uses for the proposed metaclass / reflection APIs in C++21, which seems a more palatable end goal than a scripting language.

Maybe, but it might be a long time before we can use C++20.

There are also external tools that both produce and consume the TableGen sources. Having a language that is *not* a general-purpose programming language is a feature for these tools, not an obstacle.

I think that this goes both ways. For in-tree targets, at least, we want the source that is intended to be maintained by a human to be in tree. So, the fact that an external tool can produce TableGen doesn't really solve the problem (unless we can also have that tool in tree), and moreover, I don't want to encourage each backend to develop its own such tools independently.

  -Hal

The compilers I maintain are for our own custom hardware processor designs, so it is normal for the hardware and tools to be developed in conjunction with each other; and a question I am asked about twice a year is:

  "can we not just generate the tool-chain from the machine description?"

Seems simple enough, no? The object being to have a single definitive statement of the machine, and from that derive the RTL, silicon, documentation, assembler, compiler, debugger, simulator, etc. It’s a neat objective, but not well realised.

There have been attempts in the past, but they always seem to fizzle out after a while, often because they begin as University PhD research topics, and after the original dissertation is completed, they just seem to die.

'ArchC' was an example, and so was 'AACGen', but I haven't seen any progress on these since 2013, and even then they were using LLVM v2.7 (or possibly older). I just had a quick look at the website for ArchC, and it seems that it is still stuck in 2013 :-(

Hal's comment "unless we can also have that tool in tree" I think is important, because if an alternative to TableGen is devised, then having its source within the LLVM tree ensures that it will be maintained and will develop over time.

  MartinO

The Synopsys toolchain, with their LISA language, can generate a TableGen back end for LLVM from the instruction descriptions.

David

Yes, this is very true. I am only recently in a company that can afford the cost of these tools :-) When I was in "start-up land" they were not accessible to me.

Do you know if the Synopsys tools regularly track LLVM versions?

Thanks,

  MartinO

Hi Martin et al,

  "can we not just generate the tool-chain from the machine description?"

At Synopsys, we do have and provide such tools (and they also use LLVM).

For more information, you can check: <https://www.synopsys.com/dw/ipdir.php?ds=asip-designer>
or you can contact me directly.

Greetings,

Jeroen Dobbelaere