Load serialisation during selection DAG building

I've got a question about how SelectionDAGBuilder treats loads.

The LLVM Language Reference Manual explicitly states that the order of volatile operations may be changed relative to non-volatile operations. However, when the SelectionDAGBuilder in LLVM 3.1 encounters a volatile load, it flushes all pending loads and then chains the volatile load onto them meaning that the volatile load must be scheduled after those loads. While this behaviour isn't wrong, it seems to reduce the scope for efficient instruction selection.

Is there any reason not to modify the behaviour of SelectionDAGBuilder::visitLoad() so that volatile loads don't cause SelectionDAGBuilder::getRoot() to be called. Instead, they can be chained together with the head of the chain being stored in PendingLoads. Then when something else calls SelectionDAGBuilder::getRoot(), the chain of volatile loads is TokenFactored together with the non-volatile loads. I've tried this out and it seems to do what I want but as I'm fairly inexperienced with LLVM, I'm not sure whether there's something else preventing this strategy from working.

The reason I noticed this is because I have been developing a back-end for a target in which some instructions are implemented as pseudos which will be expanded into a pair of instructions. Each of the two operands of the pseudo can be a load but as the expansion accesses the right operand twice, I don't want to match a volatile load for this operand. For example, in:

  %0 = load i16* @y, align 2, !tbaa !0
  %1 = load volatile i16* @x, align 2, !tbaa !0
  %add = add i16 %0, %1

the volatile load is sequenced after the non-volatile load which means that the non-volatile load can't match the left operand of the add because this would create a scheduling cycle. This means both loads are selected as load instructions, resulting in use of an extra register and an extra instruction. If I change the behaviour of SelectionDAGBuilder::visitLoad() as described then I get just two instructions.

Steve,

I had created a patch last year that does something similar to what you
describe for regular loads and stores using aliasing information. I
think that the last message in the thread was:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120402/140299.html

This approach has worked for me, but it is not the preferred solution
going forward. The preferred solution is to keep the "critical
chain" as is (partly because there are places in some of the backend
lowering code that assume it exists in its current form), and just
update the scheduler to be more intelligent about how it forms the
dependency graph. This has since been developed for the new MI
scheduling framework by Sergei Larin, and was committed in r156842
(last message in the discussion thread was
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120507/142659.html)

I would recommend trying something like Sergei's solution first, and
fall back to trying to play with the critical chain only if that can't
or won't work.

-Hal

Thanks Hal,

I'll look into it Sergei's solution.

Steve

I looked into those patches but I don’t think they will help in my situation because my problems occur during instruction selection rather than scheduling.

A simple and concrete example is a pattern like:

[(set GR:$dst (add GR:$src (nvload addr:$mem)))]

where nvload matches a load provided that isVolatile() is false.

If the selection DAG looks like:

LD1 LD2
^ ^

\ /
add
^

\ /
ST

and the chain like:

LD1 LD2
^ ^

\ /
TokenFactor
^

ST

then the add instruction is selected. However, if operand 1 of the add is a volatile load, then the chain looks like:

LD1
^

LD2Volatile

ST

In this case the add instruction cannot be selected. The operator is commutative, so instruction selection matches the non-volatile load against the operand 1 but then fails because to select this instruction would mean that the volatile load into a register (to match operand 0) would occur before the non-volatile load that’s folded into the instruction.

I can get this instruction to be selected by changing the way loads are built into the selection DAG, i.e. making the chain:

LD1 LD2Volatile
^ ^

\ /
TokenFactor
^

ST

Given what the LRM says about volatile accesses, this seems valid but I take Hal’s point about there being code that might not be expecting such a chain.

It’s a matter of changing a few lines in SelectionDAGBuilder.cpp to achieve what I want so I think I’ll go ahead and see what happens.

If Hal or anyone else has any further thoughts on the potential problems with doing so then I’d be glad to know.

Thanks

Steve Montgomery

Further to my earlier question, I’m perhaps a bit confused about memory serialisation. The following example, compiled using clang for the MSP430:

target datalayout = “e-p:16:16:16-i8:8:8-i16:16:16-i32:16:32-n8:16”
target triple = “msp430-??-??”

@y = common global i16 0, align 2
@x = common global i16 0, align 2

define void @f() nounwind {
entry:
%0 = load i16* @y, align 2, !tbaa !0
%1 = load volatile i16* @x, align 2, !tbaa !0
%add = add i16 %1, %0
store i16 %add, i16* @y, align 2, !tbaa !0
ret void
}

has a chain store->load volatile->load. I thought this meant that the load volatile had to occur after the load but the MSP430 backend selects the ADD16mm instruction for which I suspect the order of operand access isn’t specified. So, does the chain mean “no earlier than” rather than “later than”?

No, a chain is supposed to mean "later than". It sounds like MSP430 is bending
the rules here.

Dan

No, a chain is supposed to mean "later than". It sounds like MSP430 is bending
the rules here.

The instruction selector for ADD16mm is autogenerated, so, this is not
MSP430 bug alone :slight_smile:
This is just the single target in the tree which has mem-mem instructions.

OK, I guess the problem is in HandleMergeInputChains() or maybe SelectCodeCommon() so should I file a bug report against SelectionDAGISel.cpp?