[RFC] TableGen-erating SDNode descriptions

This RFC proposes autogenerating SDNode descriptions from *.td files. This includes:

  1. SDNode enumeration (MyTargetISD::NodeType in MyTargetISelLowering.h)
  2. SDNode names (MyTargetLowering::getTargetNodeName() in MyTargetISelLowering.cpp)
  3. SDNode verification functionality (MyTargetISelLowering::verifyTargetSDNode(), before this patch it is only implemented for AArch64 for few nodes)
  4. Generic and target-specific SDNode properties.

TL;DR The (stacked) patch is here: [RFC] TableGen-erate SDNode descriptions by s-barannikov · Pull Request #119709 · llvm/llvm-project · GitHub

Motivation

The benefits of autogenerating the descriptions are the usual:

  • Avoiding code duplication (points 1 and 2 above).
  • Generating code that is hard to write by hand (3) or difficult to maintain (4).

Also, the added node verification functionality proved to be useful in catching bugs (that may or may not be critical). I have already fixed a number of discovered issues (see my recent patches), but most of the them are left for target maintainers to resolve.

C++ changes

The patch introduces SDNodeInfo class, which is supposed to encapsulate all information about target-specific SDNodes. It is initialized with autogenerated arrays and provides a few accessors for querying basic node properties: the node name, the number of results / operands, whether the node has a memory operand or has strict floating-point semantics. It also features verifyTargetNode method that validates the given node against its tblgen description.

The class is modeled after MCInstrInfo / TargetGenInstrInfo, so the idea should be familiar.

Instances of the class are currently managed by target implementations of SelectionDAGGenTargetInfo. I considered adding a new factory method to TargetSubtargetInfo instead, but that was more changes. Another alternative could be putting everything in TargetLowering, but this class is already overloaded too much. SelectionDAGGenTargetInfo also serves as a proxy and extension point for targets that haven’t fully migrated to autogenerated descriptions.

TableGen changes

SDNode has got Flags and TSFlags fields similar to those found in Instruction. This allows targets to give properties to nodes (e.g. whether the node has strict floating-point semantics or has a “passthru” operand). Before this patch one had to write switches over node enum or carefully organize the enum members so that e.g., memory opcodes have values not less than FIRST_TARGET_MEMORY_OPCODE.

The new TableGen backend is simplistic. It collects all derived definitions of SDNode, filters them by target ISD namespace, and emits enum/arrays for the selected nodes.

Limitations and future work

  1. Obviously, TableGen cannot generate information for nodes not found in *.td files. Half of the targets have nodes (usually, a few) that are selected to target instructions by custom C++ code and thus don’t require tblgen descriptions as there are no tblgen Patterns. Targets are now encouraged to add tblgen descriptions for such nodes. This would avoid the need for overriding SelectionDAGGenTargetInfo methods and would allow the nodes to be verified.
  2. Some targets have more than one SDNode description for the same enum name. These SDNodes are sometimes incompatible (e.g. have different operand type constraints). Such nodes still get a generated enum member, but no information is generated for verifyTargetNode.
  3. Even though operand type constraints information is generated, it is currently not used by verifyTargetNode. Experiments have shown that type constraints are far more often violated than other invariants (the number of operands/results, chain/glue operands), so I excluded this functionality from the final draft. It wasn’t very clean anyway.
  4. The generated node names have the same “style” for all backends, which is “MyISD::OP_NAME”. Some targets may wish to use a different style, e.g. “my::op_name”. If there is a demand for this functionality, it can easily be implemented by adding a command line option to the tblgen backend.
  5. It would be great to generate descriptions for the generic SDNodes as well, but they are even more affected by the above issues, so this is left as a future work.

Summary of the draft PR

  • The first commit implements SelectionDAGTargetInfo for target that sill use the base implementation.
  • The second commit virtualizes isTargetStrictFPOpcode / isTargetMemoryOpcode so that target no longer need to care about the order of ISD enum members. (This change should be good by itself.)
  • The first commit adds a new TableGen backend and SDNodeInfo class. It also makes SelectionDAG use methods of this class before falling back to the old methods.
  • The rest of the commits are one per backend. They are intended to show the general idea and I have no plans for getting them ready for commit. Specifically, I didn’t care to move comments from the node enumeration to *.td files. It is a tedious work and I will leave it for target maintainers (who could as well reject the whole idea of autogenerating node descriptions).

The patch is huge, but you can pick your favorite target and look if the changes satisfy your expectations. I’d suggest to look at RISCV backend as well as I’ve put a little more effort into it.

2 Likes

+1 from me.

Do out of tree targets need to make any changes if this land or is it all opt-in?

This is completely opt-in.
There is some maintenance cost that could be avoided if all targets switched to the new approach, but it is rather small. For instance, we will have to have both TargetLowering::getTargetNodeName (shouldn’t be implemented by opted-in targets) and SelectionDAGTargetInfo::getTargetNodeName.

To be clear, the second commit that virtualizes isTargetMemoryOpcode/isTargetStrictFPOpcode is not opt-in, but it won’t go unnoticed by downstream targets because there will be a compilation error (which is easy to fix by implementing these methods, examples can be found in the commit).

I’ll write a migration guide later, hopefully after receiving some more feedback on the general approach.

A big +1, we really shouldn’t maintain things like MyTargetLowering::getTargetNodeName() manually.

The infrastructure part has been merged, here is the promised

Migration Guide

Please see the Draft PR for concrete examples.
Replace XX with your target’s name below.

  1. Add the following line to CMakeLists.txt:
    tablegen(LLVM XXGenSDNodeInfo.inc -gen-sd-node-info)
    
  2. Implement XXSelectionDAGInfo by deriving it from SelectionDAGGenTargetInfo (note Gen).
  3. Implement XXSubtarget::getSelectionDAGInfo().
  4. Add the following lines to XXISelLowering.h, before XXISD::NodeType enum, outside llvm namespace:
    #define GET_SDNODE_ENUM
    #include "XXGenSDNodeInfo.inc"
    
  5. Remove duplicate enum members suggested by an LSP server / compiler. In the ideal case, all enum members will be removed (along with the enum).
  6. Navigate to XXTargetLowering::getTargetNodeName() and remove switch cases referring to the removed enum members. Using switch (static_cast<XXISD::NodeType>(Opcode)) will help identify the cases that should be removed. If all enum members were removed, just delete the metod.
  7. Move the include added in step 4 and the XXISD::NodeType enum (if it still exists) to XXSelectionDAGInfo.h.
  8. Include XXSelectionDAGInfo.h from XXISelLowering.cpp, XXISelDAGToDAG.cpp, and/or from other files that need the node enumeration.
  9. Add the following lines to XXSelectionDAGInfo.cpp, outside llvm namespace:
    #define GET_SDNODE_DESC
    #include "XXGenSDNodeInfo.inc"
    
  10. Implement XXSelectionDAGInfo constructor like this:
    XXSelectionDAGInfo::XXSelectionDAGInfo()
        : SelectionDAGGenTargetInfo(XXGenSDNodeInfo) {}
    
  11. If XXTargetLowering::getTargetNodeName() wasn’t deleted, move its contents to overridden XXSelectionDAGInfo::getTargetNodeName(). In the new method, delegate to SelectionDAGGenTargetInfo::getTargetNodeName() on the default code path.

For simple targets, this is all that should be necessary.

Troubleshooting & Tips

  • If you relied on relative order of enum members (e.g. for doing range checks), you can rewrite the code to use switch / series of equality comparisons, or (better) use SDNode::TSFlags field to add properties to nodes that can be queried from C++ code. See RISCVSelectionDAGInfo::hasPassthruOp() in the draft for an example.

  • It is possible that the backend will start throwing errors from SelectionDAGGenTargetInfo::verifyTargetNode(), which verifies a newly created SDNode against its definition in a *.td file. In most cases the bugfix would be simple, but as a temporary measure you can override the method to suppress checks for specific nodes.

  • If your target had implemented isTargetStrictFPOpcode(), it can now be replaced with the generic version. Just make sure the nodes have SDNode::IsStrictFP bit set. Similarly, the generic version of isTargetMemoryOpcode() picks up SDNPMemOperand property.

  • Consider describing all SDNodes in *.td files. This will enable the verification functionality for them and will allow removing overrides of TargetSelectionDAGInfo methods.

2 Likes

What is the expectations for targets given your draft PR here: [RFC] TableGen-erate SDNode descriptions by s-barannikov · Pull Request #119709 · llvm/llvm-project · GitHub

Do you want targets doing per-target PRs to opt in, or do you intend to have your draft PR land?

I started that PR mostly as a proof of concept, and to gather as much context as possible. It helped me detect a few edge cases that needed to be handled at tablegen level.

I don’t plan to land it because it is too much job for one person, considering the amount of refactoring (moving enum member comments, possibly restructuring td files). Targets that wish to opt-in may cherry-pick the corresponding commit and make the necessary modifications, no attribution required.

Cool, I figured as much, but wanted to check. I’m working on the RISC-V PR, and I will mark my changes as co-authored-by you because you’ve already done most of the legwork.

1 Like