TableGen trace facility

A question for those of you who have developed complex TableGen files: Do you think a trace facility would be useful during debugging? The idea is to add a new statement to TableGen along these lines:

  trace tag : value1, value2, ... ;

When encountered, the TableGen parser would display the tag along with the values of the specified value1, value2, etc. The tag is an identifier that makes it easier to distinguish multiple traces.

I think such a feature would be very helpful if the trace can be used to print
arbitrary parts of patterns. I often find that when a pattern is rejected I
need to strip down the *.td file, fire up gdb, set up breakpoints or step to
where the pattern is rejected, and inspect the current tree pattern. Having
the ability to insert trace points directly into the pattern would save me
this effort.

Cheers,
Gabriel

I'm not so sure that that's particularly useful or how it would even
work. The issue is that at the point in time where the frontend parses
those trace tags, most substitutions haven't taken place yet, so
you'll usually get fairly trivial answers that by themselves aren't
particularly helpful.

Some form of inspection of how values are substituted would indeed be
helpful, I just don't think the "trace" is quite it. In a perfect
world, we'd have some sort of "record database explorer" that doesn't
just let you look at all the final records (TableGen already allows
you to do that), but also allows you to interactively explore their
"history", i.e. how did the records come to be.

Cheers,
Nicolai

Hi Gabriel,

I think such a feature would be very helpful if the trace can be used to print
arbitrary parts of patterns. I often find that when a pattern is rejected I
need to strip down the *.td file, fire up gdb, set up breakpoints or step to
where the pattern is rejected, and inspect the current tree pattern. Having
the ability to insert trace points directly into the pattern would save me
this effort.

It's important to distinguish between the TableGen frontend, which is
the thing that parses .td files, expands defm's, evaluates
substitutions, etc., and the backends, which take the resulting record
database as input and generate some output.

What you're describing sounds more like problems with a backend (the
SelectionDAG ones, I assume?), and I don't see how what Paul sketched
very briefly would help you there. You're probably aware that running
TableGen with the default backend will dump the entire record database
as text, so that you can inspect the result of substitutions and check
whether the resulting pattern records really are what you expect? Once
you've done that, if you run into further problems with the backend
rejecting a pattern, you'd really need some sort of backend-specific
help.
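
To illustrate the frontend/backend split described above, here is a minimal sketch of a .td file whose expansion can be inspected with the default backend. The file name example.td and every class, multiclass, and record name in it are made up for illustration. Only the llvm-tblgen invocation in the trailing comment refers to the real tool.

```tablegen
// example.td: hypothetical file; all names below are illustrative only.
class Inst<string mnemonic, int opcode> {
  string Mnemonic = mnemonic;
  int Opcode = opcode;
}

// Each defm of this multiclass yields two records, <name>_rr and <name>_ri.
multiclass RegImm<string base, int opc> {
  def _rr : Inst<!strconcat(base, "_rr"), opc>;
  def _ri : Inst<!strconcat(base, "_ri"), !add(opc, 1)>;
}

defm ADD : RegImm<"add", 16>;

// Running the default backend prints the whole record database as text:
//   llvm-tblgen example.td
// The dump lists the classes, then the defs ADD_ri and ADD_rr with their
// substituted field values, e.g. Mnemonic = "add_ri" and Opcode = 17.
```

Checking the dump this way confirms what the frontend produced. It says nothing about why a backend later rejects a pattern.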

Cheers,
Nicolai

Yes, I understand the problem. To be more useful, TableGen would have to carry the traces along with the classes and records and (re)display the values while the substitutions are being made.

I'm writing a new Programmer's Guide for TableGen and have been digging into the parse-time versus substitution-time issue. I haven't found a document that makes it clear. Can you give a quick summary of the phases?

There aren't really any phases per se, in the sense that the TableGen
frontend doesn't have passes. Maybe that's a mistake, but that's how
it evolved.

So instead, every time a record is "instantiated" -- whether that's by
defining a new class that derives from one or more pre-existing
classes, or by a def in a multiclass, or by a free-standing def, or a
defm (inside or outside a multiclass) -- any free variables that are
"resolved" as template parameters get their substitution applied. The
relevant let-statements take effect just after that.

However, and this is key, references _between_ fields are only resolved
and substituted once the final defined record at the global scope is
produced. This is what makes:

class A<int x> {
  int a = x;
  int b = !add(a, 1);  // TableGen has no infix +; use the !add operator
}

def B : A<5> { let a = 10; }

... result in b == 11 instead of b == 6.
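
A slightly extended sketch of the same example may make the distinction easier to see. The field c and the record C are additions for illustration, not part of the original message. The arithmetic uses the !add operator, on the assumption that the file is fed to a current llvm-tblgen.

```tablegen
class A<int x> {
  int a = x;
  int b = !add(a, 1);  // refers to the field a: resolved when the final def is built
  int c = !add(x, 1);  // refers to the template argument x: substituted at instantiation
}

// The let overrides a after x = 5 has been substituted, but before the
// reference from b to a is resolved.
def B : A<5> { let a = 10; }  // a = 10, b = 11, c = 6

// Without an override, both expressions see the value 5.
def C : A<5>;                 // a = 5, b = 6, c = 6
```

In B, the field c stays at 6 because it depends only on the template argument, while b follows the overridden value of a.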

There's probably subtlety here that I'm forgetting, but that's the
very short version of it ;)

Cheers,
Nicolai

Are all the records collected as they are parsed, with template parameter substitution and lets, and *then*, after all records are collected, a "pass" is made to calculate the inter-field expressions?

Once I understand this, I will add a section to the new guide to explain it. I presume it is the case that this behavior should be publicized.

It also appears to be the case that a record is created and bound to its name before fields are inherited from its superclasses, which is why you can write:

class A <dag d> {
  dag the_dag = d;
}

def rec1 : A<(ops rec1)>;

Do I understand that correctly?

I wasn't sure how to respond to John Byrd's post, since it wasn't addressed to me. So I've responded to Nicolai's.

I'm reasonably far along in the process of writing a new Programmer's Guide for TableGen. I will continue working on it and submit it for review. I expect to do some rewriting as a result.

John: Would you like me to respect a copyright on your documents linked below? I wouldn't take any text verbatim, but some of the ways of describing TableGen give me ideas.

Are all the records collected as they are parsed, with template parameter substitution and lets, and *then*, after all records are collected, a "pass" is made to calculate the inter-field expressions?

No, the inter-field expressions are resolved immediately as the final
"def" is processed. See TGParser::addDefOne.

Once I understand this, I will add a section to the new guide to explain it. I presume it is the case that this behavior should be publicized.

It also appears to be the case that a record is created and bound to its name before fields are inherited from its superclasses, which is why you can write:

class A <dag d> {
  dag the_dag = d;
}

def rec1 : A<(ops rec1)>;

Do I understand that correctly?

Almost :)

In reality, this is allowed as a special case by
TGParser::ParseIDValue, whose purpose is looking up identifiers. See
the comment about self-references towards the bottom of the method.

Cheers,
Nicolai

Hi Paul,
If all you care about is debugging, then for now we can just emit a few more debug messages, which would help to “trace” the flow. To distinguish traces you can prefix them with some known string. I don’t think you really need a ‘trace’ tag in the language spec for this.

Debugging Tablegen has always been a nightmare and I don’t think we can ever reach a stage where we can start a debugger and debug statements in .td files step-by-step. This is far from reality unless we fundamentally change the language.

Your reply suggests that there is a way to see debug messages from TableGen. Is that what you meant? If so, can you explain how that works? (Sorry, I should know the answer to this question, but I'm quite the newbie.)

Well, I was hinting at LLVM_DEBUG messages. You can pretty much dump all “actions” Tablegen would take to process a .td file, which should suffice, IMO.

I'm sorry, I still don't understand. I presume you are talking about the LLVM_DEBUG() macro that is enabled with the -debug option. But there are no uses of LLVM_DEBUG() in the TableGen base files. Wouldn't the base "compiler" have to display the information we're talking about? Or is it sufficient to use LLVM_DEBUG() in the target-specific files?

Gabriel Hjort Åkerlund via llvm-dev <llvm-dev@lists.llvm.org> writes:

I think such a feature would be very helpful if the trace can be used to print
arbitrary parts of patterns. I often find that when a pattern is rejected I
need to strip down the *.td file, fire up gdb, set up breakpoints or step to
where the pattern is rejected, and inspect the current tree pattern. Having
the ability to insert trace points directly into the pattern would save me
this effort.

The way I usually debug these is to open up the generated selection .inc
for the target (beware, it's HUGE!) and do a compiler run with
-debug-only=isel,x86-isel (for example). The debug dump shows the steps
the isel interpreter takes along with the "addresses" of branch points
and targets where the interpreter makes decisions. These "addresses"
are comments in the .inc file and one can step through it by following
the printouts in the debug dump. That way one can see where things go
wrong. I would say 95% of the time I can find the problem this way.

One thing that would help tremendously is reducing the size of the .inc,
either by splitting it into multiple files or by some other method.

               -David

Hi Paul,

I'm sorry, I still don't understand. I presume you are talking about the LLVM_DEBUG() macro that is enabled with the -debug option. But there are no uses of LLVM_DEBUG() in the TableGen base files. Wouldn't the base "compiler" have to display the information we're talking about? Or is it sufficient to use LLVM_DEBUG() in the target-specific files?

Trying to interpret what Madhur is saying, maybe adding some uses of
LLVM_DEBUG() to TableGen would be a good idea.

Cheers,
Nicolai

Hi Paul,
There are a bunch of LLVM_DEBUG messages in CodeGenDAGPatterns.cpp. There are many in the TableGen backends too. In other places you may have to include headers to use LLVM_DEBUG.

Note, I am talking about the TableGen core, which in some backends (DAGISel, GlobalISel, etc.) generates the .inc files that get included by the compiler. Hence, these messages are dumped during .inc generation. As far as I can imagine, such messages during .inc generation should be enough to reveal the flow. BTW, if you do this exercise, please commit! TableGen dumps very few debug messages as of now, so any enhancement along these lines would be a useful addition.

I hope that helps.

I wasn’t sure how to respond to John Byrd’s post, since it wasn’t addressed to me. So I’ve responded to Nicolai’s.

It was intended for you and the others on the thread.

I’m reasonably far along in the process of writing a new Programmer’s Guide for TableGen. I will continue working on it and submit it for review. I expect to do some rewriting as a result.

I’d like to collaborate, if you point me at your tree. Like you, I also have both a strong need for the documentation, and some strong opinions on how it might be improved.

John: Would you like me to respect a copyright on your documents linked below? I wouldn’t take any text verbatim, but some of the ways of describing TableGen give me ideas.

Not a bit – I had intended to submit it as a patch, but got sidetracked. Use it however you can –

jwb

Thanks for the open invitation to use your text.

I am writing the Programmer's Guide as a .rst file, as you'd expect. It's just sitting in a working directory alongside my copy of the repository. I'm not sure what you mean by pointing to my tree, but I presume you're talking about reviewing it prior to submitting it for review with Phabricator? (Forgive my newbieness.)

John,

I'm pretty much down to details and smoothing the text. Here is a pdf. Can you mark it with comments? Or you can send an email with a list. I will incorporate your comments and send you a second proof, so you can be sure I understood you.

Is a week long enough?

~~ Paul

TableGen-Programmers-Guide.pdf (451 KB)