Code review for gather and scatter intrinsics


I have a code ready, waiting for review here:
I presented this work on LLVM Euro and people are interested in this feature.
Can anybody review this code, please?


  • Elena

Hi Elena,

Sorry for the delay, I'm still catching up with my emails after a long
holiday. :slight_smile:

The only concern to this feature I remember was Chandler's comment
that we should try to encode everything into loads and shuffles.

Correct me if I'm wrong, but on the strided vectorizer thread we have
reached a consensus that indexed intrinsics would be the least
problematic (compared to strided access intrinsics or plain
load+shuffle) because they're restricted to load/store of patterns
that could later be lowered to shuffles, if the hardware doesn't
support it, but they'd also keep the pattern intact even after other
optimization passes have gone through the same code.


As I said at EuroLLVM, the fear is that we'd get the pattern destroyed
and lose the ability of using masked / strided access at all. However,
I'm haven't looked at great depth at this problem to know what are the
cases and why they could break. Maybe Elena or James could help with


Hi Renato,

I fully agree with you, but indexed load and store is the next step.
I'm asking to review gather and scatter code.


- Elena

I see. I probably got it wrong, then.

I was under the impression that we wouldn't do stride.load/store or
gather/scatter directly, but implement both of them via
index.load/store and differentiate gather/scatter with the argument as
a vector of pointers, not a pointer of vectors.

I also thought that we'd use the mask to complement the indexed load,
which would in turn help the vectorizer to create prologues/epilogues
and the back end to generate the best instruction.


We had a long discussion in the dev list and I suggested to implement
masked.index.load(%base, %<vector of indices>, %scale, %mask), but people said that it is Intel-wise form.

So we decided to go for gather/scatter -> masked.gather(<vector of pointers>, %mask) - common case of random memory access. For A[B[i]].
The first patch for Gather/Scatter is already submitted.

As the next step, that good for Intel and for ARM we see this form:

masked.index.load(%ptr, <vector of const-indices>, %mask), but unlike the first proposal, the index is not a random variable, but a vector of compile time constants.
ARM does not need masks, but it can be "all-ones".

I suppose, that Hao, from ARM will go forward with index.load, otherwise I'll take it when gather/scatter will be completed.

- Elena

That's my understanding, yes. I'll let Hao review the patch, then, to
make sure the semantics are correct on both sides.

The only comment I had in your patch is to change LangRef adding the
new intrinsics, so it doesn't get lost.