Isel DAG documentation?

I'm having a great deal of trouble figuring out how to write instruction
patterns which actually match the DAG produced by the compiler. I can't
seem to find any documentation on either what the various nodes represent
or what syntax TableGen accepts. The backends I have access to all seem
to do this in different (and obscure) ways. And when things go wrong, the
compiler dumps the DAG tree in an entirely different syntax to the one
used by TableGen.

Can anyone point me at any decent resources as to what all this stuff
actually does? And how to figure out what's going wrong when it
inevitably does go wrong?

So far all I've found is include/llvm/Target/TargetSelectionDAG.td,
which at least lists them but doesn't say what they do, and the C++
implementations in SelectionDAGNodes.h, which is entirely undocumented...

Generally speaking, the Graphviz options (such as -view-isel-dags) to visualize DAGs are your friends.
ISDOpcodes.h contains what documentation there is on the semantics of each opcode.

—Owen

And TargetOpcodes.h for a few of the post-ISel ones (mostly they're in
MachineInstr form, but you'll see them with -view-sched-dags, and
occasionally before).

Tim.

You can use -mllvm -debug-only=isel to see the DAG during the lowering process, and it will look mostly like what you see in the error messages.
You can also add statements that print the SD nodes to the code that builds the DAG, or the DAG combiner.

I'd take a working backend and watch what happens when you modify the .td files. Pick one that seems the easiest to understand, or simply ask about anything that looks obscure.

Your experiences with this are probably not uncommon...

-Krzysztof

[...]

And TargetOpcodes.h for a few of the post-ISel ones (mostly they're in
MachineInstr form, but you'll see them with -view-sched-dags, and
occasionally before).

Thanks very much; but I'm afraid it's not helping much... if only I
could find some loose ends to unravel, I'm sure I can make this work (I
have compiler experience elsewhere, just not with LLVM), but no matter
how much I pick I can't make anything come loose!

Some specific questions:

(1) Given the DAG: i32 = GlobalAddress<i32* @i>

...I would expect to be able to select for it with an instruction
pattern of this form:

def MOVar : S32<
  (outs GR32:$rD), (ins i32imm:$addr),
  "mov $rD, $addr",
  [(set GR32:$rD, globaladdr:$addr)]
>;

(I'm aware the i32imm in the ins field is wrong, but I don't think I've
got far enough for that to be a problem yet.) What actually happens when
I try it is that the compiler crashes with a stack dump, producing debug
tracing that looks like this:

0x7f914882e210: i32 = <<Unknown Machine Node #65509>> 0x7f914882e210
[ORD=2] [ID=1]
  0x7f914882e210: i32 = <<Unknown Machine Node #65509>> 0x7f914882e210
[ORD=2] [ID=1]
    0x7f914882e210: i32 = <<Unknown Machine Node #65509>> 0x7f914882e210
[ORD=2] [ID=1]
    ...etc...

It seems to have somehow managed to create a cycle in the DAG, which is
of course wrong. But how? Has it replaced the GlobalAddress node with
the MOVar node and then updated $addr to refer to the MOVar node? I'm
not sure what I should be doing here --- I can't select for the
enclosing CopyToReg node, and GlobalAddress is a leaf...

(2) Even when I do manage to generate machine instructions, they're all
discarded because they're dead. This is leading to very fast but
slightly flawed code (all I get is a return instruction).

My best guess is that this is because I'm incorrectly wiring up the
function outputs, resulting in the compiler discarding the instructions
because the result isn't used. Except I'm using precisely the same
function lowering as one of the other targets. How do I find out what's
happening here?

Hi David,

  [(set GR32:$rD, globaladdr:$addr)]

It seems to have somehow managed to create a cycle in the DAG, which is
of course wrong. But how?

When I write a similar pattern into the ARM .td files and look at
(from the build directory) lib/Target/ARM/ARMGenDAGISel.inc, I see:

/*56478*/ /*SwitchOpcode*/ 13, TARGET_VAL(ISD::GlobalAddress),// ->56494
/*56481*/ OPC_RecordNode, // #0 = $src
/*56482*/ OPC_CheckType, MVT::i32,
/*56484*/ OPC_CheckPatternPredicate, 4, // (!Subtarget->isThumb())
/*56486*/ OPC_MorphNodeTo, TARGET_VAL(ARM::MOVi32imm), 0,
                  1/*#VTs*/, MVT::i32, 1/*#Ops*/, 0,
              // Src: (globaladdr:i32):$src - Complexity = 3
              // Dst: (MOVi32imm:i32 (globaladdr:i32):$src)

The first lines are fairly unimportant, but the last two show what the
DAG thinks it's doing and make it clear why the recursion happens.
It's because the globaladdr node doesn't actually get changed during
matching, so LLVM is just going to encounter it again at the next step
and perform the same substitution.

I'd want that Dst globaladdr to be a tglobaladdr (target globaladdr)
since LLVM knows it shouldn't try to select target nodes. Existing
targets mostly do that during ISelLowering for various reasons, but
for your case a separate pattern can probably do the job:

def : Pat<(globaladdr:$addr), (MOVar tglobaladdr:$addr)>;

(2) Even when I do manage to generate machine instructions, they're all
discarded because they're dead. This is leading to very fast but
slightly flawed code (all I get is a return instruction).

My best guess is that this is because I'm incorrectly wiring up the
function outputs, resulting in the compiler discarding the instructions
because the result isn't used.

That sounds most likely. If you run "llc -print-after-all" you should
be able to see the MachineInstr dataflow before it all gets removed.
For example a simple "ret i32* @glob" function (on ARM again) gives
me:

BB#0: derived from LLVM BB %0
        %vreg0<def> = MOVi32imm <ga:@addr>; GPR:%vreg0
        %R0<def> = COPY %vreg0; GPR:%vreg0
        BX_RET pred:14, pred:%noreg, %R0<imp-use>

The important point here is that the BX_RET has some kind of <imp-use>
of the registers that will be used to return (%R0 in this case). It
gets added first in XYZTargetLowering::LowerReturn, and then selected
in the .td file using a variadic node.
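
For reference, on ARM the variadic node in question is ARMISD::RET_FLAG.
A rough sketch of the equivalent .td definitions for your target (the
XYZISD::RET_FLAG enum name, the "ret" mnemonic and the S16 class are just
placeholders here) would be:

// SDNPVariadic is what lets the node carry the extra register operands
// that LowerReturn tacks on.
def retflag : SDNode<"XYZISD::RET_FLAG", SDTNone,
                     [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;

// The return instruction then just matches that node, and the implicit
// register uses ride along with it.
def RET : S16<(outs), (ins), "ret", [(retflag)]>;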

Which other target's LowerReturn are you using? Checking that it's doing
its job is probably the first step. The pictures from
"llc -view-isel-dags" and "llc -view-sched-dags" on a simple function
returning something would be very useful there. If you can upload them
somewhere (or attach them), I'll take a look and see if anything looks
wrong.

Cheers.

Tim.

[...]

I'd want that Dst globaladdr to be a tglobaladdr (target globaladdr)
since LLVM knows it shouldn't try to select target nodes. Existing
targets mostly do that during ISelLowering for various reasons, but
for your case a separate pattern can probably do the job:

def : Pat<(globaladdr:$addr), (MOVar tglobaladdr:$addr)>;

So if I've understood what this does correctly: instead of just
describing a machine instruction to LLVM and letting it figure out how
to transform the DAG tree into machine instructions, we're giving it an
explicit rule about how to transform globaladdr nodes? That seems
sensible... but I'm afraid it doesn't work, dying in precisely the same
way as above.

My rules look like this:

def MOVar : S32<
  (outs GR32:$rD), (ins i32imm:$addr), // i32imm is wrong here
  "mov $rD, $addr",
  [(set GR32:$rD, tglobaladdr:$addr)]
>;

def : Pat<(globaladdr:$addr), (MOVar tglobaladdr:$addr)>;

The crash dump includes the DAG; the beginning looks like this:

Offending node:
0x7fd14880c010: i32 = <<Unknown Machine Node #65508>> 0x7fd14880c010
[ORD=2] [ID=1]
  0x7fd14880c010: i32 = <<Unknown Machine Node #65508>> 0x7fd14880c010
[ORD=2] [ID=1]
  ...etc...

(I'm pretty sure that machine node #65508 is the MOVar, but how do I
find out for sure?)

My gut feeling is that we have a node n, and the pattern fragment above
is transforming this to (MOVar n) *at the same address*, so n points at
the new transformed node, thus forming a cycle. The other targets don't
suffer from this because they represent addresses as complex patterns and
use custom-written selectors to pull the information out of the
globaladdr node and represent it in a different way. I'd really rather
not do this, as I'd like to suss out pattern-based codegen before trying
to customise it; is it actually possible to do this using pure patterns?
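
For reference, the complex-pattern approach I'm describing (and trying to
avoid) looks roughly like this in the other backends; the names here are
made up, and SelectAddr would be a custom C++ routine in the target's
ISelDAGToDAG:

// A memory operand built from a base register and an immediate offset.
def MEMri : Operand<i32> {
  let MIOperandInfo = (ops GR32, i32imm);
}

// SelectAddr decomposes the address computation into those two operands.
def addr : ComplexPattern<i32, 2, "SelectAddr">;

def LDri : S32<
  (outs GR32:$rD), (ins MEMri:$src),
  "ld $rD, $src",
  [(set GR32:$rD, (load addr:$src))]
>;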

[...]

The important point here is that the BX_RET has some kind of <imp-use>
of the registers that will be used to return (%R0 in this case). It
gets added first in XYZTargetLowering::LowerReturn, and then selected
in the .td file using a variadic node.

Yep --- it wasn't marked as variadic, and so my RET instruction had no
inputs. Fixing this makes it generate code quite happily. Ta.

That seems
sensible... but I'm afraid it doesn't work, dying in precisely the same
way as above.

So I think I've reached the bottom of this.

It looks like the generated table loses type information. At one point I
had a rule that looked like this:

def MOVr : S16<
    (outs GR32:$rD), (ins GR32:$rS),
    "mov $rD, $rS",
    [(set GR32:$rD, GR32:$rS)]
>;

This ended up last on the opcode list, and was always invoked if no
other instruction matched --- because the selection pattern is basically
'x = y' from the point of view of the instruction selector, and
therefore fits everything.
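
From poking at the other backends, they seem to sidestep this by giving
register-to-register moves an empty selection pattern (plain copies show
up as COPY pseudo-instructions and get expanded later via copyPhysReg),
so the rule would look like:

def MOVr : S16<
    (outs GR32:$rD), (ins GR32:$rS),
    "mov $rD, $rS",
    []
>;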

I believe the same thing applies to the globaladdr issue. I had rules
that looked like this:

def MOVar : S32<
  (outs GR32:$rD), (ins i32imm:$addr), // i32imm is wrong here
  "mov $rD, $addr",
  [(set GR32:$rD, tglobaladdr:$addr)]
>;

def : Pat<(globaladdr:$addr), (MOVar tglobaladdr:$addr)>;

...but that (set) ends up as a rule which compiles to 'x = y' and
everything goes horribly wrong.

What I eventually did was to copy what the ARM target does: add code to
TargetLowering::LowerOperation() to explicitly convert the globaladdr
nodes into wrapped machine-specific nodes. My MOVar now looks like:

def MOVar : S32<
   (outs GR32:$rD), (ins i32imm:$addr), // i32imm is wrong here
   "mov $rD, $addr",
   [(set GR32:$rD, (WrappedGlobal tglobaladdr:$addr))]
>;

The WrappedGlobal has no semantic meaning, simply being a
machine-specific ISD node with a single operand. I'm not quite sure
*why* this works; the failing rule above should, after all, do precisely
the same thing, only less cryptically.
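
For completeness, here's roughly what that SDNode definition looks like,
much like ARM's ARMISD::Wrapper; the XYZISD::WRAPPER string below is a
placeholder for whatever enum value the lowering code actually produces:

// One pointer-typed result, one operand of the same type.
def SDT_Wrapper : SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>, SDTCisPtrTy<0>]>;
def WrappedGlobal : SDNode<"XYZISD::WRAPPER", SDT_Wrapper>;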

I suspect there's an LLVM bug here, but I'm not quite sure what it is.
It would be nice if TableGen were a bit more rigorous about detecting
invalid patterns (is there an option I'm missing?).