Extending Vector GEP - proposal

Hi,

According to the current GEP syntax, a vector GEP requires every index to be a vector, and all of them must have the same number of elements.

%A = getelementptr <4 x i8*> %ptrs, <4 x i64> %offsets

I propose to relax this requirement. Let each index be either a vector or a scalar. All vector indices must have the same number of elements. A scalar value is treated as a splat vector.

%A = getelementptr i8* %ptr, <4 x i64> %offsets
or
%A = getelementptr <4 x i8*> %ptrs, i64 %offset

In this case we don’t have to add a “broadcast” before the GEP. It will be the developer’s decision which form to choose.
I plan to use vector GEP in gather/scatter, and “broadcasting” the scalar value makes it harder to narrow this operation to the “common base, multiple indices” form in the future.
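
For comparison, here is a rough sketch of the “common base” case in both forms. The value names (%ins, %splat, %A.bcast, %A.splat) are illustrative only, and the second form is the proposed syntax, not something the IR accepts today.

```llvm
; Current form: broadcast the scalar base pointer into a vector
; (insertelement + shufflevector), then apply a vector GEP.
%ins     = insertelement <4 x i8*> undef, i8* %ptr, i32 0
%splat   = shufflevector <4 x i8*> %ins, <4 x i8*> undef, <4 x i32> zeroinitializer
%A.bcast = getelementptr <4 x i8*> %splat, <4 x i64> %offsets

; Proposed form: the scalar base is implicitly treated as a splat,
; so no separate broadcast is needed. Result type is still <4 x i8*>.
%A.splat = getelementptr i8* %ptr, <4 x i64> %offsets
```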

What do you think?
Thanks.

- Elena

I don’t have a strong opinion on this. The current GEP syntax is more restrictive, and the single-base-pointer case can be emulated using a broadcast + vector GEP, which can easily be pattern-matched at codegen time. The problem with the current syntax is that the ‘broadcast’ instruction can be hoisted out of loops, and this can be a problem with our “one block at a time” codegen implementation. This problem can be solved by sinking the broadcast instruction at codegen-prepare time.
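
To make the hoisting concern concrete, the following is a sketch of what a loop might look like after loop-invariant code motion. The function @gather_addrs and all value names are hypothetical, chosen only for illustration.

```llvm
define void @gather_addrs(i8* %ptr, <4 x i64>* %idx, i64 %n) {
entry:
  ; The broadcast of the loop-invariant base has been hoisted here.
  %ins   = insertelement <4 x i8*> undef, i8* %ptr, i32 0
  %splat = shufflevector <4 x i8*> %ins, <4 x i8*> undef, <4 x i32> zeroinitializer
  br label %loop

loop:
  %i        = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %idx.addr = getelementptr <4 x i64>* %idx, i64 %i
  %offsets  = load <4 x i64>* %idx.addr
  ; Block-local instruction selection sees only %splat in this block,
  ; not the insert + shuffle in %entry that shows all lanes share one base.
  %A        = getelementptr <4 x i8*> %splat, <4 x i64> %offsets
  ; ... gather/scatter using %A would go here ...
  %i.next   = add i64 %i, 1
  %done     = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}
```

Sinking at codegen-prepare time would move the two broadcast instructions back into %loop so that the pattern is visible within a single block.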

Is there a strong motivation to prefer one representation over the other?

> This problem can be solved by sinking the broadcast instruction at codegen-prepare time.
I considered this option. We currently don’t have target-specific optimizations at codegen-prepare time. (Or am I wrong?)
And it would be a very X86-directed optimization. Even the gather/scatter intrinsics are considered common to all targets.

The second reason why I’d prefer to generate a splat-GEP is the compile-time saving.
Otherwise I would have to generate 2 (or more, for each splat element) redundant instructions (a broadcast is insert+shuffle) and hoist them out of the loop at some stage, then look for them in the CodeGenPrepare pass, sink them back, and rebuild the CFG.

Okay. I think that it’s reasonable to add support for GEP with a single base pointer and a vector of indices.

From: "Nadav Rotem" <nrotem@apple.com>
To: "Elena Demikhovsky" <elena.demikhovsky@intel.com>
Cc: llvmdev@cs.uiuc.edu, "Duncan P. N. Exon Smith"
<dexonsmith@apple.com>, dag@cray.com, "Philip Reames
(listmail@philipreames.com)" <listmail@philipreames.com>, "Hal
Finkel (hfinkel@anl.gov)" <hfinkel@anl.gov>, "Chandler Carruth
(chandlerc@gmail.com)" <chandlerc@gmail.com>
Sent: Tuesday, March 3, 2015 11:38:47 AM
Subject: Re: Extending Vector GEP - proposal

> Okay. I think that it’s reasonable to add support for GEP with a
> single base pointer and a vector of indices.

I agree; the splat case, especially when you're indexing into a structure, seems as though it will be very common.

-Hal

"Demikhovsky, Elena" <elena.demikhovsky@intel.com> writes:

> I should generate 2 (or more, for each splat element) redundant
> instructions (broadcast is insert+shuffle), hoist them outside the
> loop on some stage. Then look for them on CodeGenPrepare pass, sink
> them back and rebuild the CFG.

I agree with Elena. These are common operations and ought to be
directly representable in the IR. Hoisting and sinking have been
constant pain points for us for exactly the reason described. Getting
the sinking right isn't trivial. It's not especially hard but it's
extra work that supporting the operations actually desired in the IR
would eliminate.

                             -David

Nadav Rotem <nrotem@apple.com> writes:

> Okay. I think that it’s reasonable to add support for GEP with a
> single base pointer and a vector of indices.

We should also support a vector of pointers and a scalar index, I think.

                                       -David

Yes, of course. Any parameter of the vector GEP may be scalar.
A GEP that returns a vector of pointers should have one or more vector operands. All vector operands should have the same vector width.
A scalar operand is treated as a splat vector.
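
As a sketch of how these rules would read in practice, here are a few forms they would allow. The type %struct.S and the value names are made up for the example, and the struct case is only my reading of the rule as stated above.

```llvm
%struct.S = type { i32, float }

; Scalar base, vector index, scalar constant field index:
; lane k addresses field 1 of element %idx[k] of the array at %base.
; Result type: <4 x float*>
%fields = getelementptr %struct.S* %base, <4 x i64> %idx, i32 1

; Vector of pointers, scalar offset: the same offset is applied to every lane.
; Result type: <4 x i8*>
%B = getelementptr <4 x i8*> %ptrs, i64 %offset

; All-scalar operands: an ordinary scalar GEP, as today.
; Result type: i8*
%C = getelementptr i8* %ptr, i64 %offset
```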

- Elena