[RFC][TableGen] New llvm-tblgen backend to print records/classes hierarchy as a dot graph

Hi,
I’m working locally on an experimental TableGen backend that prints the record/class inheritance chain as a dot graph. I decided to create this to compensate for the lack of documentation around TableGen files. Personally, it will help me understand clearly how a record gets initialized and which member values it derives from which classes in the inheritance chain.

I have attached a sample dot graph generated by this backend.
Invoking commands:

llvm-tblgen -print-dot-records AMDGPU.td -filter-def=SI_IF -I …/…/…/include/ -o temp.dot
llvm-tblgen -print-dot-records AMDGPU.td -filter-class=CFPseudoInstSI -I …/…/…/include/ -o temp.dot

Each of the two commands above generates a dot graph with the named record def or class as the root node.

As of now, each node contains the node-name (class/record name), template args, and an edge to the node it inherits from. I would also like the node to contain member variables defined in that class.

The backend still has a lot left to figure out, such as how to handle multiclasses and anonymous records. What are your thoughts on this type of utility in llvm-tblgen? Is it useful enough to be added upstream?

The code can be found here.

Absolutely, yes please :)

2 reasons:

  1. I was helping someone with ARM’s tablegen this past week and this would have been a great help. This happens fairly frequently.
  2. I want something like this for the Jupyter kernel for the purposes of teaching the things you mentioned, classes, multiclasses etc.

I’ve tried using the JSON for that and the results can be ok but directly getting a graph would be much better.

Just a suggestion: instead of adding more C++ code to TableGen itself, could you implement this as a simple Python script that consumes the output of llvm-tblgen -dump-json?
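
A rough sketch of what such a script could look like. The keys `!instanceof`, `!superclasses`, `!name`, `!anonymous` and `!fields` are used here on the assumption that they match the documented -dump-json layout; the sample data, the record and class names, and the `to_dot` helper are made up for illustration and are not taken from any real .td file.

```python
import json

# Hypothetical, hand-written stand-in for `llvm-tblgen -dump-json` output.
# Only the keys used below are included; real output has many more entries.
SAMPLE = json.loads("""
{
  "!tablegen_json_version": 1,
  "!instanceof": {
    "Base": ["MyDef"],
    "Derived": ["MyDef"]
  },
  "MyDef": {
    "!name": "MyDef",
    "!anonymous": false,
    "!superclasses": ["Base", "Derived"],
    "!fields": []
  }
}
""")


def to_dot(root, def_name):
    """Return DOT text linking one def to every class it derives from."""
    record = root[def_name]
    lines = ["digraph records {"]
    for cls in record["!superclasses"]:
        # One edge per listed superclass; the list is flat, so this is
        # not necessarily the immediate-parent chain.
        lines.append(f'  "{def_name}" -> "{cls}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(SAMPLE, "MyDef")
print(dot_text)
```

Note that this draws an edge from the def to every class in its superclass list, which is not the same as a proper inheritance chain.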

That sounds less maintainable than something properly glued into the TableGen APIs.

@singh-yashwant Could you make the diff into a full Phabricator review, or post a patch file/github branch of it? I can’t seem to find a way to download a patch from the current link.

Can you try arc patch --diff 543273? I’ll open a review if this doesn’t work.

That works, thanks!

For whatever reason I didn’t think that I could arc patch it if I couldn’t manually download it.

I tried using the -dump-json output, but it lacks the details I’m looking for. For example, for each record it doesn’t list the immediate superclasses; it lists all of them, which doesn’t help my case because I want to plot a graph. The same goes for classes and their derived records (it doesn’t list only the immediate ones). I also cannot track which variables come from which class in the inheritance chain.

I also intend to make more use of the TableGen APIs than -dump-json allows, for example by adding the name of the file in which a record/class is defined to the node info.

Edit: Template args are also not part of the JSON output.
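
A small sketch of why a flat superclass list is not enough to draw the graph. It does not call llvm-tblgen at all: the two hierarchies, the `flatten` helper, and the ancestors-first ordering are assumptions made for illustration. Two different inheritance structures collapse to the same flat list, so the edges cannot be recovered from that list alone.

```python
def flatten(direct, name):
    """List every ancestor of `name`, ancestors before descendants, no repeats."""
    seen = []
    for parent in direct.get(name, []):
        for ancestor in flatten(direct, parent) + [parent]:
            if ancestor not in seen:
                seen.append(ancestor)
    return seen


# Hierarchy 1: a straight chain. MyDef derives from C, C from B, B from A.
chain = {"MyDef": ["C"], "C": ["B"], "B": ["A"]}

# Hierarchy 2: C derives from two unrelated classes, A and B.
fork = {"MyDef": ["C"], "C": ["A", "B"]}

flat_chain = flatten(chain, "MyDef")
flat_fork = flatten(fork, "MyDef")

print(flat_chain)
print(flat_fork)
print(flat_chain == flat_fork)
```

Both calls produce the same list even though B is a child of A in one hierarchy and a sibling of A in the other.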

Thanks for trying. I still think this is the kind of job that -dump-json was designed for. If it’s not good enough maybe we should improve it or remove it. Any thoughts @statham-arm?

My use of the JSON backend has been writing Tablegen backends in something other than C++ (along the lines of https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/jupyter/sql_query_backend.ipynb).

So I wouldn’t say that info being missing makes it not fit for purpose. Most backends are not trying to dump the entire graph, no pun intended.

Certainly would be a bonus to have that information in the JSON though.

@jayfoad, thanks for drawing my attention to this!

I don’t have a strong opinion on whether it’s more maintainable to put this mode into Tablegen proper, or to keep it at arm’s length as a consumer of the JSON. I can see pros and cons either way, assuming this is intended to end up in-tree one way or the other.

(If it’s not, then the JSON approach has all the advantages, because it saves some poor person from having to maintain a downstream patch on the Tablegen C++ code forever.)

But I’d have no objection to putting a few more pieces of information into the JSON output, if they’re useful to somebody. Source locations in particular seem as if they’d be useful to other users too, because one of the uses of -dump-json is that it lets you write your own completely new Tablegen backends without having to compile them into the Tablegen binary itself (e.g. if you’re trying to do something that can never be upstreamed, or a “build one to throw away” level of rapid prototype). And Tablegen backends certainly have a legit need to report semantic errors in the input, and reporting them with a source location is more useful to the end user.

(The other use of -dump-json is for people doing auxiliary analysis on data that is also being consumed by one of the existing built-in backends, such as extracting a target’s list of instructions, or the list of clang options, and one or two specific facts about each one. For that use, error reporting isn’t so critical, because the existing backend that consumes the same data has surely checked its semantic consistency already. That’s why source locations aren’t already in the JSON output.)

One of the early drafts of -dump-json actually generated a lot more information than the final version does: it had basically anything you could find in -print-records, including all the partially specified parametric expressions in the class definitions. My reasoning was that that way I was sure that anything you were previously doing by fragile text-matching on the output of -print-records would be possible to do more reliably by consuming the JSON.

But code review suggested cutting down the data to something much less ambitious, partly because the full version would have been huge. So if we’re going to add more things to the JSON, we should keep it to only the things someone actually has a use for.

(Also, I’m not sure even the original draft of -dump-json would have included the information about immediate class ancestry, because I don’t think -print-records shows it either.)

Thanks for looking into it @statham-arm and apologies for the late reply!

I would prefer this to end up in the tree as it might also be usable for others.

From the feedback received on this thread, I think -dump-json is the preferred way, although that would mean adding a few more details to its output. For now, I believe I would need to add:

  • Template args
  • Direct superclasses
  • File-name of the class/record definition

Do you think that’s something that can go inside -dump-json? I can open a proper review then?

Yes, please do!

I suppose that this would constitute a new version of the JSON format, and therefore you’d also want to increment the value of the !tablegen_json_version entry in the output dictionary.
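
A minimal sketch of how a consumer script might use the version entry to decide which keys it can rely on. The version number 2 and the `!direct_superclasses` key name are purely hypothetical placeholders; the thread does not say what the extended format would call them.

```python
# Assumed for illustration only: the extended format reports version 2 and
# adds a per-record "!direct_superclasses" key. Neither is confirmed.
EXTENDED_VERSION = 2
DIRECT_KEY = "!direct_superclasses"


def parents_for_graph(root, def_name):
    """Return (parent list, is_direct) for drawing edges from a def."""
    record = root[def_name]
    if root.get("!tablegen_json_version", 0) >= EXTENDED_VERSION:
        return record[DIRECT_KEY], True
    # Older output: only the flat list of all superclasses is available.
    return record["!superclasses"], False


# Hand-written sample dictionaries, not real llvm-tblgen output.
old = {"!tablegen_json_version": 1,
       "MyDef": {"!superclasses": ["A", "B"]}}
new = {"!tablegen_json_version": 2,
       "MyDef": {"!superclasses": ["A", "B"], DIRECT_KEY: ["B"]}}

old_result = parents_for_graph(old, "MyDef")
new_result = parents_for_graph(new, "MyDef")
print(old_result)
print(new_result)
```

Checking the version first lets a script keep working against older output while still using the extra detail when it is present.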