Making Sense of ISel DAG Output

I'm debugging some X86 patterns and I want to understand the debug dumps from
isel better.

Here's some example output:

  0x391bc40: i64,ch = load 0x3922c50, 0x391b8d0, 0x38dc530 <0x39053e0:0> <sext
i32> alignment=4 srcLineNum= 10
    0x3922c50: <multiple use>
          0x391bc40: <multiple use>
          0x3856ab0: <multiple use>
        0x3914520: i64 = shl 0x391bc40, 0x3856ab0 srcLineNum= 10
        0x38569b0: <multiple use>
      0x391bdf0: i64 = add 0x3914520, 0x38569b0 srcLineNum= 10
      0x39127c0: <multiple use>
    0x3913dd0: i64 = add 0x391bdf0, 0x39127c0 srcLineNum= 10
    0x38dc530: <multiple use>
  0x391bf40: f64,ch = load 0x3922c50, 0x3913dd0, 0x38dc530 <0x3850d30:0>
alignment=8 srcLineNum= 10

I think I've figured out that lines with greater indent feed lines with lesser
indent. So for example, the final load is fed by three operands: 0x3922c50,
0x3913dd0 and 0x38dc530. And <multiple use> seems to mean a node that feeds
many nodes.

Is this understanding correct?

Is there any way to tell how these dags map onto the final machine instruction
sequence? I'm having a hard time making sense of where some machine
instructions are coming from -- they appear to be superfluous. And I have a
pattern that specifies two distinct memory operands but the final code
sequence produced uses only one of them.

I'll try ot write a small example and send it in a bit.

                                                     -Dave

Ok, here's what I'm trying to do:

let AddedComplexity = 40 in {
  def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr:
$src1))),
                                   (v2f64 (scalar_to_vector (loadf64 addr:
$src2))),
                                   SHUFP_shuffle_mask:$sm)),
      (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)),
                       (v2f64 (MOVSD2PDrm addr:$src2)),
                       SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>;
} // AddedComplexity

It turns out you can't actually write a pattern like this with tblgen as-is.
There's a bug where it outputs multiple definitions of some local variables.
I've patched that here and hope to send it upstream once I get approval.

But let's say you _could_ write such a pattern (because I can). The input DAG
looks like this:

            0x391a220: <multiple use>
          0x391c970: v2f64 = scalar_to_vector 0x391a220 srcLineNum= 10
            0x391ac10: <multiple use>
          0x391c8b0: v2f64 = scalar_to_vector 0x391ac10 srcLineNum= 10
          0x3927b10: <multiple use>
        0x3923100: v2f64 = vector_shuffle 0x391c970, 0x391c8b0, 0x3927b10<0,2>
srcLineNum= 10

The code that gets produced looks like this:

  %reg1071<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8)
[r66428 + 0]LD(8,8) [r78427 + 0] ; srcLine 10
  %reg1072<def> = MOVSD2PDrm %reg1026, 8, %reg1065, 4294967288, Mem:LD(8,8)
[r66428 + 0]LD(8,8) [r78427 + 0] ; srcLine 10
  %reg1073<def> = SHUFPDrri %reg1071, %reg1072, 0 ; srcLine 10

Note that %reg1026 and %reg1065 are used in both address expressions even
though I specified different names ($src1 and $src2) in the pattern. Huh?
How do I find out who is screwing up? It could be an incorrect pattern, it
could be an incorrectly patched tblgen or it could be somewhere in
SelectionDAG itself.

                                                    -Dave

I think so, though I don't actually look at the SelectionDAG
dump() output very often.

I highly recommend the viewGraph() output. -view-isel-dags and
-view-sched-dags show the graph before and after selection,
respectively. See the CodeGen docs where I recently added some
text describing all these options.

Also, you can call viewGraph() from within a debugger, to view
the graph at arbitrary point in the middle of the selection
process. You can can even put a breakpoint on the Select
function and view the graph as each individual instruction is
selected.

It can get hairy with really large graphs, but if you're trying
to understand instruction selection, it's often possible to reduce
the testcases to a readable scale while still including the
interesting parts. SelectionDAG's setGraphColor method can also
help when graphs get large.

And FWIW, there are some significant improvements in the
viewGraph() output in TOT :-).

Dan

I highly recommend the viewGraph() output. -view-isel-dags and
-view-sched-dags show the graph before and after selection,
respectively. See the CodeGen docs where I recently added some
text describing all these options.

Yeah, I've been using those but they're real hard to understand with big
graphs.

Also, you can call viewGraph() from within a debugger, to view
the graph at arbitrary point in the middle of the selection
process. You can can even put a breakpoint on the Select
function and view the graph as each individual instruction is
selected.

That might be more helpful. Thanks for the tip!

Which Select function are you referring to?

It can get hairy with really large graphs, but if you're trying
to understand instruction selection, it's often possible to reduce
the testcases to a readable scale while still including the
interesting parts. SelectionDAG's setGraphColor method can also
help when graphs get large.

Unfortunately, the testcase is about as simple as it can get: a loop with a
gather, a multiply and a store. Maybe I can hand-whittle some IR.

And FWIW, there are some significant improvements in the
viewGraph() output in TOT :-).

Yeah, I saw that. Unfortunately we won't get it until early next year,
probably. :frowning:

                                               -Dave

Actrually, it's worse than this. I wanted to check to make sure something
else wasn't causing the problem but it appears to come from isel. The full
output for the DAG looks like this:

  %reg1059<def> = MOVSX64rm32 %reg1033, 1, %reg0, 4, Mem:LD(4,4) [tmp163 +
0] ; srcLine 10
  %reg1060<def> = MOVSDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8)
[r45154 + 0] ; srcLine 10
  %reg1061<def> = MOVSX64rm32 %reg1033, 1, %reg0, 0, Mem:LD(4,4) [iv.161162 +
0] ; srcLine 10
  %reg1062<def> = MOVSDrm %reg1026, 8, %reg1061, 4294967288, Mem:LD(8,8)
[r30158 + 0] ; srcLine 10
  %reg1063<def> = MOVSD2PDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8)
[r30158 + 0]LD(8,8) [r45154 + 0] ; srcLine 10
  %reg1064<def> = MOVSD2PDrm %reg1026, 8, %reg1059, 4294967288, Mem:LD(8,8)
[r30158 + 0]LD(8,8) [r45154 + 0] ; srcLine 10
  %reg1065<def> = SHUFPDrri %reg1063, %reg1064, 0 ; srcLine 10

Where the <bleep> are these extra dead MOVSDrms coming from? Note that the
extra MOVSDrms at least seem to use the correct addresses.

                                                   -Dave

Hi all,

The current repository (revision 56968.) does not compile on my Linux box (with GCC 3.4.6):

X86TargetAsmInfo.cpp:41: error: duplicate explicit instantiation of `bool llvm::X86TargetAsmInfo<BaseTAI>::ExpandInlineAsm(llvm::CallInst*) const [with BaseTAI = llvm::TargetAsmInfo]'
X86TargetAsmInfo.cpp:43: error: duplicate explicit instantiation of `bool llvm::X86TargetAsmInfo<BaseTAI>::LowerToBSwap(llvm::CallInst*) const [with BaseTAI = llvm::TargetAsmInfo]'

I'm not seeing this issue on my Macbook with GCC 4.0.1.

And it seems that there is a file missing at llvm/tools/llvmc2/plugins. The Makefile in llvm/tools/llvmc2/plugins/Base says:

LLVMC_PLUGIN = Base

include ../Makefile.common

In which ../Makefile.common is missing.

I'd be very appreciated if somebody could fix these issues.

Thank you very much!

Best,

Haohui

Haohui Mai <haohui.mai@gmail.com> writes:

Hi all,

The current repository (revision 56968.) does not compile on my Linux
box (with GCC 3.4.6):

X86TargetAsmInfo.cpp:41: error: duplicate explicit instantiation of
`bool
llvm::X86TargetAsmInfo<BaseTAI>::ExpandInlineAsm(llvm::CallInst*)
const [with BaseTAI = llvm::TargetAsmInfo]'
X86TargetAsmInfo.cpp:43: error: duplicate explicit instantiation of
`bool llvm::X86TargetAsmInfo<BaseTAI>::LowerToBSwap(llvm::CallInst*)
const [with BaseTAI = llvm::TargetAsmInfo]'

In

http://llvm.org/bugs/show_bug.cgi?id=2846

a user reports the same problem while using the CMake build system, and
then says that it compiles okay with configure && make. Now you say that
it fails with configure && make.

OTOH, as a comment on that bug report says, gcc 3.4.x is known to be
broken on AMD64:

http://llvm.org/docs/GettingStarted.html#brokengcc

I'm not seeing this issue on my Macbook with GCC 4.0.1.

And it seems that there is a file missing at llvm/tools/llvmc2/
plugins. The Makefile in llvm/tools/llvmc2/plugins/Base says:

LLVMC_PLUGIN = Base

include ../Makefile.common

In which ../Makefile.common is missing.

There was a commit related to this very recently. Update and try again.

Haohui Mai <haohui.mai@gmail.com> writes:

Hi all,

The current repository (revision 56968.) does not compile on my Linux
box (with GCC 3.4.6):

X86TargetAsmInfo.cpp:41: error: duplicate explicit instantiation of
`bool
llvm::X86TargetAsmInfo<BaseTAI>::ExpandInlineAsm(llvm::CallInst*)
const [with BaseTAI = llvm::TargetAsmInfo]'
X86TargetAsmInfo.cpp:43: error: duplicate explicit instantiation of
`bool llvm::X86TargetAsmInfo<BaseTAI>::LowerToBSwap(llvm::CallInst*)
const [with BaseTAI = llvm::TargetAsmInfo]'

In

http://llvm.org/bugs/show_bug.cgi?id=2846

a user reports the same problem while using the CMake build system, and
then says that it compiles okay with configure && make. Now you say that
it fails with configure && make.

OTOH, as a comment on that bug report says, gcc 3.4.x is known to be
broken on AMD64:

http://llvm.org/docs/GettingStarted.html#brokengcc

I'm not seeing this issue on my Macbook with GCC 4.0.1.

And it seems that there is a file missing at llvm/tools/llvmc2/
plugins. The Makefile in llvm/tools/llvmc2/plugins/Base says:

LLVMC_PLUGIN = Base

include ../Makefile.common

In which ../Makefile.common is missing.

There was a commit related to this very recently. Update and try again.

Also, you can call viewGraph() from within a debugger, to view
the graph at arbitrary point in the middle of the selection
process. You can can even put a breakpoint on the Select
function and view the graph as each individual instruction is
selected.

That might be more helpful. Thanks for the tip!

Which Select function are you referring to?

For example, X86DAGToDAGISel::Select, for the x86 target.

It can get hairy with really large graphs, but if you're trying
to understand instruction selection, it's often possible to reduce
the testcases to a readable scale while still including the
interesting parts. SelectionDAG's setGraphColor method can also
help when graphs get large.

Unfortunately, the testcase is about as simple as it can get: a loop with a
gather, a multiply and a store. Maybe I can hand-whittle some IR.

Another trick is to place an abort() call somewhere in codegen
such that it will be called whenever the construct of interest
is processed, and then run bugpoint. If it works, the result
is a reduced testcase that's still interesting :-).

A feature that would be really useful for large graphs that LLVM
doesn't yet have is the ability to display subsets of the graph.
For example, take a node of interest and show just it and its
operands and its users. Or maybe two or three or N levels of
operands.

Dan

Looking at your dump() output above, it looks like the pre-selection
loads have multiple uses, so even though you've managed to match a
larger pattern that incorporates them, they still need to exist to
satisfy some other users.

Dan

Yes, I looked at that too. It looks like these other uses end up being
chains to TokenFactor nodes that don't go anywhere. They're really, truly,
dead. Who is supposed to clean those up?

                                           -Dave

Another trick is to place an abort() call somewhere in codegen
such that it will be called whenever the construct of interest
is processed, and then run bugpoint. If it works, the result
is a reduced testcase that's still interesting :-).

That's a neat trick. I'll see if I can do that.

A feature that would be really useful for large graphs that LLVM
doesn't yet have is the ability to display subsets of the graph.
For example, take a node of interest and show just it and its
operands and its users. Or maybe two or three or N levels of
operands.

Yes, this would be helpful. I also looked at creating a setSubgraphColor
method that would color the node and its children. But there doesn't seem to
be a convenient way to traverse SelectionDAGs. It seems to be hard-coded by
tblgen.

                                                         -Dave

Take a look at SDNodeIterator and the GraphTraits for SDNode*
in SelectionDAGNodes.h.

Dan

DAGCombine will zap dead nodes, but nodes can also become dead
during selection. Some parts of selection know how to clean up
nodes that become dead during selection, but my guess is that
it's missing some cases.

Dan

Dear all,

I’m trying to run the example of recursive type construction examples in LLVM Programmer’s Manual, here is the code:

// *Create the initial outer struct*
[PATypeHolder](http://llvm.org/docs/ProgrammersManual.html#PATypeHolder) StructTy = OpaqueType::get();
std::vector<const Type*> Elts;
Elts.push_back(PointerType::get(StructTy));
Elts.push_back(Type::Int32Ty);
StructType *NewSTy = StructType::get(Elts);

// _At this point, NewSTy = "{ opaque*, i32 }". Tell VMCore that_
// *the struct and the opaque type are actually the same.*
cast<OpaqueType>(StructTy.get())->[refineAbstractTypeTo](http://llvm.org/docs/ProgrammersManual.html#refineAbstractTypeTo)(NewSTy);

// *NewSTy is potentially invalidated, but StructTy (a [PATypeHolder](http://llvm.org/docs/ProgrammersManual.html#PATypeHolder)) is*
// *kept up-to-date*
NewSTy = cast<StructType>(StructTy.get());

// *Add a name for the type to the module symbol table (optional)*
MyModule->addTypeName("mylist", NewSTy);

However, it does not work. The compiler says that the API of PointerType::get(PATypeHolder &) does not exist.

It seems that the API has been removed from LLVM trunk.

I’d be very appreciated if you could tell me how to construct Recursive Type using the LLVM trunk API.

Thank you very much.

Best,

Haohui

// NewSTy is potentially invalidated, but StructTy (a PATypeHolder) is
// kept up-to-date
NewSTy = cast<StructType>(StructTy.get());

However, it does not work. The compiler says that the API of PointerType::get

Use PointerType::getUnqual.

(PATypeHolder &) does not exist.

Make sure to resolve the type handle by invoking its 'get' method.

— Gordon

Ok, I figured out why I'm not getting the correct addresses for the
MOVSD2PDrms. It's another tblgen bug I have to patch. So I'll fix that and
see what happens.

                                                        -Dave

Ok, as far as I can tell, here's what's happening.

I have the following pattern:

let AddedComplexity = 40 in {
  def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr:
$src1))),
                                   (v2f64 (scalar_to_vector (loadf64 addr:
$src2))),
                                   SHUFP_shuffle_mask:$sm)),
            (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)),
                       (v2f64 (MOVSD2PDrm addr:$src2)),
                       SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>;
} // AddedComplexity

After much hacking of tblgen, I finally convinced it to generate some
somewhat-seemingly-reasonably-correct matching and generation code.

What's happening is that that generation code constructs two MOVSD2PD
instructions. These are all brand-new SDNodes. The very last step of the
generation code calls

  SDNode *RetVal = CurDAG->SelectNodeTo(N.Val, Opc2, VT2, Tmp1, Tmp3, Tmp5);

where N is the vector_shuffle node. and the Tmp variables are the two
MOVSD2PD instructions and the shuffle mask. SelectNodeTo does an in-place
replacement of the machine-independent SDNode (vector_shuffle)with a
machine-dependent one (the SHUFPD).

When we pop back out to SelectRoot we run into this code:

      if (ResNode != Node) {
        if (ResNode) {
          ReplaceUses(Node, ResNode);
        }
        if (Node->use_empty()) { // Don't delete EntryToken, etc.
          ISelQueueUpdater ISQU(ISelQueue);
          CurDAG->RemoveDeadNode(Node, &ISQU);
          UpdateQueue(ISQU);
        }
      }

Since we did an in-placement replacement the first test fails and we don't
call ReplaceUses. I think this is correct EXCEPT that now we have orphaned
machine-independent loadf64 and scalar_to_vector nodes that got disconnected
from the vector_shuffle by SelectNodeTo. We go ahead and select those
(correctly) to MOVSD instructions which are then dead. But no one cleans them
up.

The root of the problem seems to be the orphand operands of the
vector_shuffle. Given that I had to hack tblgen quite a bit to get it to even
accept and generate mostly-correct code for the pattern, I'm going to need a
little help from the tblgen experts to get this final cleanup accomplished.
How should that happen? Does SelectNodeTo need to take care of it?

I plan to contribute these enhancements to tblgen back upstrream as they are
quite useful for doing cool isel stuff. But I gotta get this case to work
first. :slight_smile:

                                                       -Dave

Looking at your dump() output above, it looks like the pre-selection
loads have multiple uses, so even though you've managed to match a
larger pattern that incorporates them, they still need to exist to
satisfy some other users.

Yes, I looked at that too. It looks like these other uses end up being
chains to TokenFactor nodes that don't go anywhere. They're really,
truly,
dead. Who is supposed to clean those up?

DAGCombine will zap dead nodes, but nodes can also become dead
during selection. Some parts of selection know how to clean up
nodes that become dead during selection, but my guess is that
it's missing some cases.

Ok, as far as I can tell, here's what's happening.

I have the following pattern:

let AddedComplexity = 40 in {
def : Pat<(v2f64 (vector_shuffle (v2f64 (scalar_to_vector (loadf64 addr:
$src1))),
                                  (v2f64 (scalar_to_vector (loadf64 addr:
$src2))),
                                  SHUFP_shuffle_mask:$sm)),
           (SHUFPDrri (v2f64 (MOVSD2PDrm addr:$src1)),
                      (v2f64 (MOVSD2PDrm addr:$src2)),
                      SHUFP_shuffle_mask:$sm)>, Requires<[HasSSE2]>;
} // AddedComplexity

After much hacking of tblgen, I finally convinced it to generate some
somewhat-seemingly-reasonably-correct matching and generation code.

What's happening is that that generation code constructs two MOVSD2PD
instructions. These are all brand-new SDNodes. The very last step of the
generation code calls

SDNode *RetVal = CurDAG->SelectNodeTo(N.Val, Opc2, VT2, Tmp1, Tmp3, Tmp5);

where N is the vector_shuffle node. and the Tmp variables are the two
MOVSD2PD instructions and the shuffle mask. SelectNodeTo does an in-place
replacement of the machine-independent SDNode (vector_shuffle)with a
machine-dependent one (the SHUFPD).

When we pop back out to SelectRoot we run into this code:

     if (ResNode != Node) {
       if (ResNode) {
         ReplaceUses(Node, ResNode);
       }
       if (Node->use_empty()) { // Don't delete EntryToken, etc.
         ISelQueueUpdater ISQU(ISelQueue);
         CurDAG->RemoveDeadNode(Node, &ISQU);
         UpdateQueue(ISQU);
       }
     }

Since we did an in-placement replacement the first test fails and we don't
call ReplaceUses. I think this is correct EXCEPT that now we have orphaned
machine-independent loadf64 and scalar_to_vector nodes that got disconnected
from the vector_shuffle by SelectNodeTo. We go ahead and select those
(correctly) to MOVSD instructions which are then dead. But no one cleans them
up.

The root of the problem seems to be the orphand operands of the
vector_shuffle. Given that I had to hack tblgen quite a bit to get it to even
accept and generate mostly-correct code for the pattern, I'm going to need a
little help from the tblgen experts to get this final cleanup accomplished.
How should that happen? Does SelectNodeTo need to take care of it?

It should. SelectNodeTo is a wrapper around MorphNodeTo, and MorphNodeTo
has code to check for and remove nodes that become dead, specifically to
address this case. If that's not working, it's a bug.

What version of LLVM are you using here? This is code that has changed
substantially over the last few months. And we've fixed a number of
similar-sounding bugs along the way.

I plan to contribute these enhancements to tblgen back upstrream as they are
quite useful for doing cool isel stuff. But I gotta get this case to work
first. :slight_smile:

:slight_smile:

Dan

2.3 here. I'll check the logs to look for stuff I should pull in. Any
pointers?

                                           -Dave