alias information on machine instructions

hi,

i`m working on a machine instruction scheduler for an VLIW architecture.
loads are somewhat expensive on this architecture, thus i would like to
reorder unrelated loads/stores to better hide load latencies.

to do this, i would need alias information on machine instructions,
i.e., which machine instructions may access the same memory region.

as far as i know, this is not available at the moment? are there any
plans to get aliasing information on machine instructions?

florian

There are a couple of ways to do this. Is your scheduler a prepass scheduler (before regalloc) or a post-pass scheduler (after regalloc)? If you want to extract maximal parallelism, I assume you want a prepass scheduler. In that case, you should look into the SelectionDAG based schedulers, which do have alias information on them.

-Chris

Chris Lattner wrote:

There are a couple of ways to do this. Is your scheduler a prepass
scheduler (before regalloc) or a post-pass scheduler (after regalloc)?

it is a post-pass scheduler, which operates on MachineInstrs. we need to
run it after register allocation to hide latencies of spill code,
prolog, and epilog.

If you want to extract maximal parallelism, I assume you want a prepass
scheduler. In that case, you should look into the SelectionDAG based
schedulers, which do have alias information on them.

i had a look at the SelectionDAG based schedulers. it seems that
aliasing loads/stores are chained together by the DAGCombiner. after
scheduling, when the MachineInstrs are created, the alias information
cannot be used anymore in the current framework. is this correct?

i don't think that it is a good idea to do alias analysis on
MachineInstrs, thus i would like to preserve the alias information, and
annotate the MachineInstrs explicitly. however, this information might
be invalidated by subsequent passes.

currently, our backend does not emit any additional loads/stores execpt
for spill code, prolog and epilog. thus this approach would work (at
least for now).

do you have any ideas for a better solution?

florian

Chris Lattner wrote:

There are a couple of ways to do this. Is your scheduler a prepass
scheduler (before regalloc) or a post-pass scheduler (after regalloc)?

it is a post-pass scheduler, which operates on MachineInstrs. we need to
run it after register allocation to hide latencies of spill code,
prolog, and epilog.

ok.

If you want to extract maximal parallelism, I assume you want a prepass
scheduler. In that case, you should look into the SelectionDAG based
schedulers, which do have alias information on them.

i had a look at the SelectionDAG based schedulers. it seems that
aliasing loads/stores are chained together by the DAGCombiner. after
scheduling, when the MachineInstrs are created, the alias information
cannot be used anymore in the current framework. is this correct?

Right. The original Value*'s are preserved in the DAG, but dropped when MachineInstrs are created. We could add a machineoperand to capture this Value* if desired.

i don't think that it is a good idea to do alias analysis on
MachineInstrs, thus i would like to preserve the alias information, and
annotate the MachineInstrs explicitly. however, this information might
be invalidated by subsequent passes.

Right, it would have to be added as a Value* operand to MachineInstr. This would not be invalidated by later passes, they would just ignore the field. Targets would have to be updated though, as certain pieces of code "know" about the expected operands.

-Chris

Another benefit of keeping the original Value*'s (and offsets) around is it
would provide more information that could be used for verbose asm output
and MachineInstr-level dumps, which would be quite nice.

Dan

Dan Gohman wrote:

hi,

Florian Brandner wrote:

Dan Gohman wrote:

Right. The original Value*'s are preserved in the DAG, but dropped when
MachineInstrs are created. We could add a machineoperand to capture this
Value* if desired.

Another benefit of keeping the original Value*'s (and offsets) around is it
would provide more information that could be used for verbose asm output
and MachineInstr-level dumps, which would be quite nice.

i'll do this and post a patch, unless someone else is already working on
this.

i have extended the DAG instruction selector and scheduler to preserve
Value*'s. the values are attached to machine instructions as a separate
operand list (at least for now). for now this looks very good.

but i found that the alias analysis is not as good as expected. the
problem seems to be the codegenprepare pass. GEP instructions are
lowered to integer calculations (e.g. ptrtoint, add, inttoptr). this
causes the (basic) alias analysis to answer MayAlias for most queries.

this is also a problem for the DAG combiner, when the
-combiner-global-alias-analysis switch is given to llc.

any ideas how to handle this?
florian

hi,

Florian Brandner wrote:
> Dan Gohman wrote:
>>> Right. The original Value*'s are preserved in the DAG, but dropped when
>>> MachineInstrs are created. We could add a machineoperand to capture this
>>> Value* if desired.
>> Another benefit of keeping the original Value*'s (and offsets) around is it
>> would provide more information that could be used for verbose asm output
>> and MachineInstr-level dumps, which would be quite nice.
>>
>
> i'll do this and post a patch, unless someone else is already working on
> this.
>

i have extended the DAG instruction selector and scheduler to preserve
Value*'s. the values are attached to machine instructions as a separate
operand list (at least for now). for now this looks very good.

Cool!

but i found that the alias analysis is not as good as expected. the
problem seems to be the codegenprepare pass. GEP instructions are
lowered to integer calculations (e.g. ptrtoint, add, inttoptr). this
causes the (basic) alias analysis to answer MayAlias for most queries.

this is also a problem for the DAG combiner, when the
-combiner-global-alias-analysis switch is given to llc.

any ideas how to handle this?

I don't have any that are both easy and clean :-/.

It probably wouldn't be difficult to teach the basic alias analysis how
to dig past adds and casts to find base expressions. Or, make the
CodeGenPrepare pass use getelementptrs directly instead of expanding
addressing into integer operations. Or maybe instcombine could be run
after CodeGenPrepare so it could fold the integer operations into
getelementptrs?

I think the long-term solution is to teach the SelectionDAG framework
how to look beyond basic-block boundaries so that CodeGenPrepare doesn't
have to do this de-hoisting of addressing.

It also might make sense to write a SCEV-based AliasAnalysis some day.

Dan

hi,

Florian Brandner wrote:

Dan Gohman wrote:

Right. The original Value*'s are preserved in the DAG, but dropped when
MachineInstrs are created. We could add a machineoperand to capture this
Value* if desired.

Another benefit of keeping the original Value*'s (and offsets) around is it
would provide more information that could be used for verbose asm output
and MachineInstr-level dumps, which would be quite nice.

i'll do this and post a patch, unless someone else is already working on
this.

i have extended the DAG instruction selector and scheduler to preserve
Value*'s. the values are attached to machine instructions as a separate
operand list (at least for now). for now this looks very good.

Cool!

but i found that the alias analysis is not as good as expected. the
problem seems to be the codegenprepare pass. GEP instructions are
lowered to integer calculations (e.g. ptrtoint, add, inttoptr). this
causes the (basic) alias analysis to answer MayAlias for most queries.

this is also a problem for the DAG combiner, when the
-combiner-global-alias-analysis switch is given to llc.

any ideas how to handle this?

I don't have any that are both easy and clean :-/.

It probably wouldn't be difficult to teach the basic alias analysis how
to dig past adds and casts to find base expressions. Or, make the

I think this is very much doable and would provide much immediate benefit. It's not a trivial task. We have to make sure DAG combiner and legalizer do not muck up alias information (among other challenges).

CodeGenPrepare pass use getelementptrs directly instead of expanding
addressing into integer operations. Or maybe instcombine could be run
after CodeGenPrepare so it could fold the integer operations into
getelementptrs?

I think the long-term solution is to teach the SelectionDAG framework
how to look beyond basic-block boundaries so that CodeGenPrepare doesn't
have to do this de-hoisting of addressing.

Whole-function isel is being considered. No immediate plan yet though.

Evan

i have extended the DAG instruction selector and scheduler to preserve
Value*'s. the values are attached to machine instructions as a separate
operand list (at least for now). for now this looks very good.

but i found that the alias analysis is not as good as expected. the
problem seems to be the codegenprepare pass. GEP instructions are
lowered to integer calculations (e.g. ptrtoint, add, inttoptr). this
causes the (basic) alias analysis to answer MayAlias for most queries.

cool!

this is also a problem for the DAG combiner, when the
-combiner-global-alias-analysis switch is given to llc.

any ideas how to handle this?

Probably the best way is to enhance the LSR pass to insert GEP instructions instead of pointer arithmetic. For example, turn this:

for (int i = 0...; ++i)
   A[i] = 0;

into the equivalent of this:

for (int i = 0, *p = &A[0]; ...; ++i, ++p)
   *p = 0;

Inside the loop, this turns into a:

  next = getelementptr int* prev, 1

instead of a "cast, add 4, cast" sequence.

-Chris

It probably wouldn't be difficult to teach the basic alias analysis how
to dig past adds and casts to find base expressions. Or, make the
CodeGenPrepare pass use getelementptrs directly instead of expanding
addressing into integer operations. Or maybe instcombine could be run
after CodeGenPrepare so it could fold the integer operations into
getelementptrs?

Running instcombine is probably a bad idea, because it makes all sorts of transformations (e.g. constant folding casts) that can undo things carefully done by codegenprepare and lsr.

I think the long-term solution is to teach the SelectionDAG framework
how to look beyond basic-block boundaries so that CodeGenPrepare doesn't
have to do this de-hoisting of addressing.

yes yes yes, absolutely.

It also might make sense to write a SCEV-based AliasAnalysis some day.

That would be interesting as well, we could even hook dependence analysis up to it so that we could prove A[i] and A[j] don't alias etc.

-Chris

hi,

i know it took a while, but here is a patch that adds a list of source
values to machine instructions.

i modified the DAGISelEmiter to automatically catch regular
loads/stores. custom instructions and loads/stores rewritten by the
lowering pass are not automatically captured.

during the instruction selection a source value operand is added to the
DAG for patterns matching a load/store.

after the scheduling these additional operands are added to the machine
instruction automatically.

one can add special source values for spills, using the machine
instruction builder.

i'm not to happy about the representation of the source values in the
selection DAG. i would like to use a new SelectionDAG node, which
then distinguishes between reads and writes, and adds additional
information on the memory access. but so far the current scheme works fine.

i've testet all this for our backend only, which is not public. i do not
know how much has to be done to integrate this with the other, e.g., the
x86, targets. does any of the other targets rewrite loads/stores during
lowering?

any comments how to integrate this into llvm, and how to proceed to get
the alias analysis working after the codegen prepare pass?

florian

Florian Brandner wrote:

minstr_svops.patch (15.2 KB)

hi,

i know it took a while, but here is a patch that adds a list of source
values to machine instructions.

Cool!

i've testet all this for our backend only, which is not public. i do not
know how much has to be done to integrate this with the other, e.g., the
x86, targets. does any of the other targets rewrite loads/stores during
lowering?

I tried out your patch on x86 and it didn't appear to need any special changes.
With this code:

define i32 @foo(i32* %p) {
  %q = getelementptr i32* %p, i32 17
  %r = load i32* %q
  %s = add i32 %r, 10
  ret i32 %s
}

The llc -march=x86 -print-machineinstrs output looks like this:

: 0x8a99fc8, LLVM BB @0x8a94998, ID#0:
        %reg1024 = MOV32ri 10 SV:0
        %reg1025 = MOV32rm <fi#-1>, 1, %NOREG, 0 SV:1[??]
        %reg1026 = ADD32rm %reg1024, %reg1025, 1, %NOREG, 68 SV:1[q]
        %EAX = MOV32rr %reg1026 SV:0
        RET SV:0

(For those following along, the SV:1[??] and SV:1[q] are the new parts here).

For the [??], it looks like the IsFrameIndex isn't getting set for the first
instruction there.

A few quick comments on specific parts of the patch that I noticed so far:

+ TargetSrcValue,

I'm curious why you added a new node kind, TargetSrcValue, instead of just
using the existing SRCVALUE.

+ else if (MRO.SrcValue && !MRO.SrcValue->getName().empty())
+ OS << "[" << MRO.SrcValue->getName() << "]";

This code should also print the SVOffset value.

+ SDOperand getVecLoad(unsigned Count, MVT::ValueType VT, SDOperand Chain,
+ SDOperand Ptr, SDOperand SV);

This is code that was deleted from the LLVM trunk recently; it looks like it
shouldn't be included in this patch.

Dan

Dan Gohman wrote:

I tried out your patch on x86 and it didn't appear to need any special changes.

it might be needed to look at the addressing modes of a load/store to
get the right offset. but i think it should work, if the lowering does
not rewrite loads/stores into custom DAG nodes.

For the [??], it looks like the IsFrameIndex isn't getting set for the first
instruction there.

yes, this needs to be added for each target. for our target i've
modified the loadRegFromStackSlot and storeRegToStackSlot methods to add
information on the frame index:
  BuildMI(MB, MBI, TII.get(STORE_REG_IMM)).addReg(framePointer)
      .addFrameIndex(FrameIndex).addReg(SrcReg).addSVOp(FrameIndex);

I'm curious why you added a new node kind, TargetSrcValue, instead of just
using the existing SRCVALUE.

this is needed to ensure that the lowering pass does not rewrite them. i
don't know if this is actually done, but anyway i wanted to be on the
safe side.

+ else if (MRO.SrcValue && !MRO.SrcValue->getName().empty())
+ OS << "[" << MRO.SrcValue->getName() << "]";

This code should also print the SVOffset value.

this was meant for debugging purpose only, but yes it would be helpful
to print the offset.

this leads to a question on how to query the alias analysis. when i have
a look at the alias set tracker, it seems that only the value and the
size of the type is relevant for the query. but when i look at the
DAGCombiner an "overlap" is calculated from the type size, the offset
and an other offset. how is this supposed to work?

the arguments of the "alias" function are named "V1Size" and "V2Size",
so it would make sense to pass the size only?

+ SDOperand getVecLoad(unsigned Count, MVT::ValueType VT, SDOperand Chain,
+ SDOperand Ptr, SDOperand SV);

This is code that was deleted from the LLVM trunk recently; it looks like it
shouldn't be included in this patch.

you are right, sorry.

florian

Dan Gohman wrote:

I tried out your patch on x86 and it didn't appear to need any special changes.

it might be needed to look at the addressing modes of a load/store to
get the right offset. but i think it should work, if the lowering does
not rewrite loads/stores into custom DAG nodes.

FWIW, I just tried a complex address example on x86 and it looks like it
does work.

define i32 @foo(i32* %base, i32 %i) {
  %a = getelementptr i32* %base, i32 %i
  %b = getelementptr i32* %a, i32 19
  %c = load i32* %b
  %d = add i32 %c, 1
  ret i32 %d
}

In -print-machineinstrs, this comes out as:

     %reg1027 = ADD32rm %reg1024, %reg1026, 4, %reg1025, 76 SV:1[b]

Initially I was thinking it might get a+76 instead of b, however I don't
think that's necessary, or even desirable for alias analysis.

For the [??], it looks like the IsFrameIndex isn't getting set for the first
instruction there.

yes, this needs to be added for each target. for our target i've
modified the loadRegFromStackSlot and storeRegToStackSlot methods to add
information on the frame index:
  BuildMI(MB, MBI, TII.get(STORE_REG_IMM)).addReg(framePointer)
      .addFrameIndex(FrameIndex).addReg(SrcReg).addSVOp(FrameIndex);

I added this to the x86 target (in addFrameReference in
X86InstrBuilder.h), though it still misses some cases.

For example, loads of parameters from the stack are loads with NULL
SrcValues, and they have FrameIndex operands. When
SelectionDAG::getTargetSrcValue sees them, it currently just passes on the
NULL SrcValue; it doesn't notice the FrameIndex operand.

I think getTargetSrcValue needs to check if the address of a load or
store is a FrameIndexSDNode operand.

I'm curious why you added a new node kind, TargetSrcValue, instead of just
using the existing SRCVALUE.

this is needed to ensure that the lowering pass does not rewrite them. i
don't know if this is actually done, but anyway i wanted to be on the
safe side.

Ok. I think it's safe. At a quick glance, I don't see any code that lowers
it, and SRCVALUE should never contribute directly to an instruction, so it
shouldn't need to be lowered.

+ else if (MRO.SrcValue && !MRO.SrcValue->getName().empty())
+ OS << "[" << MRO.SrcValue->getName() << "]";

This code should also print the SVOffset value.

this was meant for debugging purpose only, but yes it would be helpful
to print the offset.

Thanks. Debugging purposes are one of the reasons I'm interested in this :-).

And actually, I have several more specific requests for the code in
MachineInstr.cpp, now that I've looked at it a little more.

The SVOperand printing should be under 'if (getNumSVOperands() > 0)' so that
we don't see SV: printed for instructions that don't have any memory
operands. Don't print the value of getNumSVOperands() itself; it's easy to
see how many operands there are because they're all printed. And it'll
almost always just be 1. And lastly, please change the " SV:" to ", SV:" so
that it's consistent with the rest of the (comma-separated) operands.

this leads to a question on how to query the alias analysis. when i have
a look at the alias set tracker, it seems that only the value and the
size of the type is relevant for the query. but when i look at the
DAGCombiner an "overlap" is calculated from the type size, the offset
and an other offset. how is this supposed to work?

It looks like that second offset being added into the size arguments in
DAGCominer's AliasAnalysis query is bogus, actually...

the arguments of the "alias" function are named "V1Size" and "V2Size",
so it would make sense to pass the size only?

It looks like you need to first extend the SVOperand struct to include the
size of the memory reference, because I don't think that can be easily
deduced otherwise.

Then your MachineInstr-level alias test can do something similar to what
DAGCombiner::isAlias does, minus the FindBaseOffset stuff.
- If the bases are the same, the query can be answered by simply examining
   the offsets and sizes.
- Otherwise, query the AliasAnalysis, using each reference's offset+size
   for its size argument. This is over-conservative, but the previous check
   should protect it in most cases.

Dan