Custom GEP lowering

It looks like I need to be able to intercept GEP lowering (in SelectionDAGLowering::visitGetElementPtr) and insert something other than the shifts and adds. The basic problem is that CellSPU only loads and stores on 16-byte boundaries. Consequently, the SPU backend has to do the load or store differently than most normal architectures that have byte-addressable operations.

Unfortunately, it is ambiguous whether an add is really an add or whether it was generated by GEP lowering. Hence the need to custom lower GEPs.
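A small C illustration of that ambiguity (the struct and function names are made up for the example): the address a GEP computes for `p[i].b` becomes, once lowered, a shift of the index plus adds for the base and the field offset. Nothing in the resulting adds records that they came from a GEP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type: 8 bytes, with field 'b' at byte offset 4. */
struct pair {
    int32_t a;
    int32_t b;
};
_Static_assert(sizeof(struct pair) == 8, "example assumes an 8-byte struct");

/* Address of p[i].b written as pointer arithmetic (what a GEP expresses). */
static uintptr_t gep_form(struct pair *p, size_t i) {
    return (uintptr_t)&p[i].b;
}

/* The same address after lowering: a shift for the scaled index (i * 8)
   and plain integer adds for the base and the constant field offset. */
static uintptr_t lowered_form(struct pair *p, size_t i) {
    return (uintptr_t)p + (i << 3) + offsetof(struct pair, b);
}
```

Both functions return the same value for every in-range `i` on a flat address space, which is why a later pass looking at the adds alone cannot tell them apart.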

From reading the code, hijacking SelectionDAGLowering::visitGetElementPtr() appears to be the only way to pull this off. Is there a better way?

If not, how receptive would the community be to:
a) Creating a GEP DAG node
b) Sending the new GEP node through legalize/custom/promote switch

I'm sure that this will spark a raging debate, with a few comments about how to refactor the whole legalize/custom/promote switch, etc. But all I really want to do is customize GEP processing for Cell.

-scooter

In TOT, load and store instructions have an alignment attribute which is
useful for addressing similar needs on other architectures. For example,
this attribute is used on x86, which also has a bunch of instructions
which require 16-byte alignment. x86 uses it quite late, after legalize,
and I don't know if that's appropriate for the CellSPU target, but
wherever you're doing the lowering, could you use the load and store
alignment attribute?

The alignment attribute can be set by LLVM IR producers (front-ends),
however instcombine also automatically sets alignments on load and store
instructions by looking through GEPs and casts and examining underlying
storage. There's room for improvement, but it gets common cases.
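The reasoning behind inferring alignment through a GEP can be approximated as follows. This is a simplified sketch, not instcombine's actual code, assuming the base's alignment is a known power of two and the GEP's byte offset is a known constant: the resulting address is aligned to the largest power of two dividing both.

```c
#include <assert.h>
#include <stdint.h>

/* Known alignment of (base + offset), given that base is aligned to
   base_align (a power of two). Hypothetical helper name. The lowest set
   bit of (base_align | offset) is the largest power of two dividing both;
   when offset is 0 this is simply base_align. */
static uint64_t known_align(uint64_t base_align, uint64_t offset) {
    uint64_t x = base_align | offset;
    return x & (~x + 1); /* isolate the lowest set bit */
}
```

For a 16-byte-aligned base, an offset of 32 keeps 16-byte alignment, while an offset of 4 only guarantees 4-byte alignment.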

Dan

I'm aware of this attribute, but it doesn't help. The underlying problem is that CellSPU does not know how to natively perform byte-level addressing. For example, here's an indexed stack instruction to load register $3:

  ldq $3, 4($sp)

In reality, the "4($sp)" doesn't mean what you think it means in the PPC and x86 worlds: that's 4 x 16 = 64 bytes -- load quadword (ldq) appends four zero bits to the right of the offset, i.e., scales it by 16. To get at byte 4 requires loading the quadword at 0($sp) and some vector shuffling. (Dan: Think about older Cray hardware... you'll immediately understand!)
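A toy C model of this addressing, offered as a hypothetical simplification: a 256-byte array stands in for the local store, and address wrap-around and the limited immediate range are ignored. The d-form offset is shifted left four bits, the low four bits of the effective address are dropped, and a single byte must be picked out of the loaded quadword.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy local store (assumption: 256 bytes, far smaller than the real one).
   Only whole, 16-byte-aligned quadwords can be read from it. */
#define LS_SIZE 256
static uint8_t local_store[LS_SIZE];

/* Effective address of "ldq rt, imm(ra)": the immediate is scaled by 16
   (four zero bits appended) and the low four bits of the sum are cleared. */
static uint32_t ldq_address(uint32_t ra, int32_t imm) {
    return (ra + ((uint32_t)imm << 4)) & ~(uint32_t)0xF;
}

/* Copy the quadword containing addr (addr must be < LS_SIZE). */
static void load_quadword(uint32_t addr, uint8_t qw[16]) {
    memcpy(qw, &local_store[addr & ~(uint32_t)0xF], 16);
}

/* Read one byte at an arbitrary byte address: load the enclosing quadword
   (for byte 4 of the stack frame, the one at 0($sp)) and select the byte.
   The array index stands in for the shuffle a real SPU would need. */
static uint8_t load_byte(uint32_t addr) {
    uint8_t qw[16];
    load_quadword(addr, qw);
    return qw[addr & 0xF];
}
```

With $sp at 0x80, `ldq_address(0x80, 4)` is 0xC0, i.e. 64 bytes past the stack pointer, not 4.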

I could try custom lowering loads and stores as an interim step and detect if one of the operands is really a frameindex (or global variable or external variable or ... <insert exhaustive list of edge cases here>) added to some offset. Ultimately, custom lowering GEPs is probably the better idea (if not a lot more work).
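A rough sketch of that interim approach: pattern-match a load or store's address operand as a base (frame index or global) plus a constant offset. The node structure below is hypothetical and far simpler than a real SelectionDAG node; anything it does not recognize is rejected, which is exactly where the list of edge cases starts to grow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for DAG address nodes. */
enum node_kind { NODE_FRAME_INDEX, NODE_GLOBAL, NODE_CONSTANT, NODE_ADD, NODE_OTHER };

struct node {
    enum node_kind kind;
    int64_t value;                 /* frame index, global id, or constant */
    const struct node *lhs, *rhs;  /* operands, only used by NODE_ADD */
};

static bool is_base(const struct node *n) {
    return n->kind == NODE_FRAME_INDEX || n->kind == NODE_GLOBAL;
}

/* Split an address into (base, constant offset). Accepts a bare base or an
   add of a base and a constant in either operand order; rejects the rest. */
static bool match_base_plus_offset(const struct node *addr,
                                   const struct node **base, int64_t *offset) {
    if (is_base(addr)) {
        *base = addr;
        *offset = 0;
        return true;
    }
    if (addr->kind != NODE_ADD)
        return false;
    const struct node *l = addr->lhs, *r = addr->rhs;
    if (is_base(l) && r->kind == NODE_CONSTANT) {
        *base = l;
        *offset = r->value;
        return true;
    }
    if (is_base(r) && l->kind == NODE_CONSTANT) {
        *base = r;
        *offset = l->value;
        return true;
    }
    return false;
}
```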

If I go ahead and shuffle around some code (no pun intended), would it be worth my while to prototype some refactoring of the legalize/promote/custom mess, since I'll have to touch it anyway for custom GEP lowering?

-scooter

Isn’t this just an ISel issue? You just have to ISel unaligned loads/stores to more complex code, that’s all. It seems very similar to current targets that support indexed and non-indexed addressing modes. In this case you simply have to implement the non-indexed modes in terms of a more complex expression based on an indexed load.

I tackled a similar issue in Ageia’s back end in just this way.
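A sketch of what that "more complex code" might amount to on a quadword-only memory. Big-endian byte order, a naturally aligned 32-bit word, and all names here are assumptions for the example: load the enclosing quadword, rotate the wanted bytes to the front (bytes 0..3), and read the word from there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy quadword-addressed memory; only aligned 16-byte chunks can be read. */
static uint8_t mem[64];

static void load_qw(uint32_t addr, uint8_t qw[16]) {
    memcpy(qw, &mem[addr & ~(uint32_t)0xF], 16);
}

/* Rotate a quadword left by n bytes, as a byte-rotate instruction would. */
static void rotate_qw(const uint8_t in[16], unsigned n, uint8_t out[16]) {
    for (unsigned i = 0; i < 16; i++)
        out[i] = in[(i + n) & 0xF];
}

/* Load a naturally aligned big-endian 32-bit word. Such a word never
   straddles two quadwords, so a single quadword load plus a rotate by
   (addr & 15) bytes suffices. */
static uint32_t load_word(uint32_t addr) {
    uint8_t qw[16], rot[16];
    assert((addr & 3) == 0);
    load_qw(addr, qw);
    rotate_qw(qw, addr & 0xF, rot);
    return ((uint32_t)rot[0] << 24) | ((uint32_t)rot[1] << 16) |
           ((uint32_t)rot[2] << 8) | (uint32_t)rot[3];
}
```

A word that is not naturally aligned may cross a quadword boundary and would need a second quadword load and a merge.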

Will do a little debugging and investigating and get back to you on this… although I suspect the answer is still going to be custom lowering GEPs. I’d like to be wrong!

-scooter

I agree with Christopher that this is just an unaligned load issue. Consider a RISC chip with only a 32-bit load that requires the pointer to be aligned. If you want to do an unaligned load, you'd have to do something like this:

  t1 = load (p & ~3)
  t2 = load ((p + 4) & ~3)
  t3 = merge t1, t2, p & 3

In the AltiVec world this is a very, very common thing to code up. The nice thing about doing this is that the DAG combiner can then hack away loads if it discovers that p & 3 is zero.
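The three-step load/load/merge sequence can be simulated in C on a toy machine whose only load is an aligned 32-bit one. This sketch assumes little-endian byte order (on AltiVec the merge would typically be a vector permute instead), and the function names are illustrative. The early return when `p & 3` is zero mirrors the case where the DAG combiner can drop the second load.

```c
#include <assert.h>
#include <stdint.h>

/* Toy memory for a machine that only supports aligned 32-bit loads. */
static uint8_t ram[32];

static uint32_t load_aligned32(uintptr_t addr) {
    assert((addr & 3) == 0); /* the hardware restriction */
    /* Assemble the four bytes in little-endian order. */
    return (uint32_t)ram[addr] | ((uint32_t)ram[addr + 1] << 8) |
           ((uint32_t)ram[addr + 2] << 16) | ((uint32_t)ram[addr + 3] << 24);
}

/* Unaligned 32-bit load from two aligned loads plus a merge:
     t1 = load (p & ~3)
     t2 = load ((p + 4) & ~3)
     t3 = merge t1, t2, p & 3 */
static uint32_t load_unaligned32(uintptr_t p) {
    unsigned shift = 8u * (unsigned)(p & 3);
    uint32_t t1 = load_aligned32(p & ~(uintptr_t)3);
    if (shift == 0)
        return t1; /* p & 3 == 0: the second load is unnecessary */
    uint32_t t2 = load_aligned32((p + 4) & ~(uintptr_t)3);
    return (t1 >> shift) | (t2 << (32 - shift));
}
```

The special case for `shift == 0` also avoids shifting a 32-bit value by 32, which is undefined in C.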

I could try custom lowering loads and stores as an interim step and
detect if one of the operands is really a frameindex (or global
variable or external variable or ... <insert exhaustive list of edge
cases here>) added to some offset. Ultimately, custom lowering GEPs
is probably the better idea (if not a lot more work).

You're really asking about alignment. You can take alignment into consideration when you do this.

The bigger problem that you'll hit is that LSR lowers a lot of getelementptr instructions to explicit ptrtoint + add + inttoptr, so you won't get the GEP expressions in lots of cases.

Better yet, you won't have to do major surgery on the code generator. :)

-Chris