[RFC] Insert casts to increase vectorization of loads and stores

Hi,

I have a LoadStoreVectorizer PR up for review: #134436. Here’s a brief summary of the PR:

The LSV currently works by first grouping load and store instructions into equivalence classes keyed on the base pointer, the address space, and the scalar bitwidth of the loaded or stored value. Each equivalence class is then converted into chains, which are ultimately vectorized. Motivated by issue #97715, this patch attempts to merge classes whose keys differ only in scalar bitwidth by inserting bit or pointer casts.

Assume classes A and B are keyed on the same base pointer and address space, and both contain load instructions.

class A = { %x = load i32, ptr %p.0 }
class B = { %y = load <2 x i16>, ptr %p.1 }

This patch introduces a bitcast of <2 x i16> %y to i32, which allows the two classes to be merged into one, so that %x and %y can be vectorized together.
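To make the keying concrete, here is a toy Python model of the classification and the proposed merge. It is only a sketch: the `Access` record, `classify`, and `merge_ignoring_bitwidth` are hypothetical names invented for illustration, not the pass's actual data structures, and the merge rule assumes that classes may be combined whenever their keys differ only in scalar bitwidth.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """Hypothetical stand-in for one load/store seen by the vectorizer."""
    name: str
    base: str          # base pointer the access is derived from
    addrspace: int
    is_load: bool
    scalar_bits: int   # bitwidth of the scalar element type


def classify(accesses):
    """Group accesses by (base, addrspace, is_load, scalar_bits)."""
    classes = defaultdict(list)
    for a in accesses:
        key = (a.base, a.addrspace, a.is_load, a.scalar_bits)
        classes[key].append(a.name)
    return dict(classes)


def merge_ignoring_bitwidth(classes):
    """Assumed merge rule: combine classes whose keys differ only in
    scalar bitwidth (a real pass would insert the casts at this point)."""
    merged = defaultdict(list)
    for (base, addrspace, is_load, _bits), names in classes.items():
        merged[(base, addrspace, is_load)].extend(names)
    return dict(merged)


accesses = [
    Access("%x", "%p", 0, True, 32),  # load i32
    Access("%y", "%p", 0, True, 16),  # load <2 x i16>
]
classes = classify(accesses)
merged = merge_ignoring_bitwidth(classes)
```

In this model `%x` and `%y` start in two separate classes and end up in a single one, which is the precondition for them to land in the same chain.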

This problem becomes even more interesting because the classes disregard the bit representation of the value operands of loads/stores, i.e. int vs. fp. For instance,

%x = load i32, ptr %p.0
%y = load <2 x half>, ptr %p.1
=> 
%x = load i32, ptr %p.0
%y.cast = load i32, ptr %p.1
%y = bitcast i32 %y.cast to <2 x half>

I would like to spark a discussion on which types should be bitcast for the purpose of increasing load/store vectorization. Any feedback will be appreciated!

Thanks,
Anshil Gandhi

Doesn’t that cause issues if the source value has a poison lane? <2 x i16> tracks poison on a per-lane basis, whereas i32 is either fully poison or not.

In your example at the end, the original program might end up with a y that is partially poison, but in the transformed program a single poison lane would lead to a fully-poison result.
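The difference can be illustrated with a toy Python model of poison tracking. This is a sketch under stated assumptions: `POISON` is a sentinel invented here, the helper names are hypothetical, and lanes are packed little-endian (lane 0 in the low bits). It only models the rule that a scalar integer is poison as a whole, while a vector carries poison per lane.

```python
POISON = object()  # sentinel standing in for an LLVM poison value


def bitcast_vec_to_int(lanes, lane_bits):
    """Model of viewing a vector as one integer: any poison lane
    makes the whole integer poison."""
    if any(lane is POISON for lane in lanes):
        return POISON
    mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * lane_bits)  # little-endian packing
    return value


def bitcast_int_to_vec(value, num_lanes, lane_bits):
    """Model of viewing an integer as a vector: a poison integer
    yields poison in every lane."""
    if value is POISON:
        return [POISON] * num_lanes
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask for i in range(num_lanes)]


# Original program: a two-lane load where only lane 1 is poison.
original = [0x1234, POISON]

# Transformed program: load the same bits as i32, then bitcast back.
transformed = bitcast_int_to_vec(bitcast_vec_to_int(original, 16), 2, 16)

# A fully defined value survives the round trip unchanged.
clean = [0x1234, 0xABCD]
round_trip = bitcast_int_to_vec(bitcast_vec_to_int(clean, 16), 2, 16)
```

In the model, lane 0 of `original` is still a usable value, whereas both lanes of `transformed` are poison, so the rewritten program is more poisonous than the original.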

(Cue my usual comment that many parts of LLVM have a strong need for either a byte type or integer types with per-byte provenance and poison.)

I am not sure whether checks exist for such cases, but they could certainly be added to prevent vectorization of (partially) poison values.

I created a draft PR to introduce casts in LoadStoreVectorizer. One piece of feedback I received on the PR is to introduce casts immediately before vectorization, on chains rather than on equivalence classes. The reason, as I understand it, is to avoid inserting casts on load/store instructions that end up unvectorized. However, my concerns with merging chains instead of classes are the following:

  • Instructions are ordered within chains, so merging chains would require re-sorting or a heap-based data structure to preserve contiguity.
  • Offsets of the load/store instructions would need to be recomputed.
  • Chains may need to be split again after the merge because of MayAlias.
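For reference, the first two concerns can be sketched in Python as follows. This is a toy model, not the pass's implementation: `ChainElem`, `rebase`, `merge_chains`, and `split_contiguous` are hypothetical names, offsets are assumed to be byte offsets relative to each chain's first element, and the distance between the two leaders (`b_leader_delta`) is assumed to be known. Alias analysis is not modeled at all.

```python
import heapq
from typing import NamedTuple


class ChainElem(NamedTuple):
    """Hypothetical chain entry: an access and its offset from the leader."""
    name: str
    offset: int  # bytes from the chain's first element
    size: int    # bytes accessed


def rebase(chain, delta):
    """Recompute offsets so they are relative to another chain's leader."""
    return [e._replace(offset=e.offset + delta) for e in chain]


def merge_chains(a, b, b_leader_delta):
    """Merge two offset-sorted chains into one offset-sorted chain.
    heapq.merge does a streaming merge, so no full re-sort is needed."""
    return list(heapq.merge(a, rebase(b, b_leader_delta),
                            key=lambda e: e.offset))


def split_contiguous(chain):
    """Split a sorted chain into runs with no gaps between accesses."""
    runs = []
    for e in chain:
        if runs and runs[-1][-1].offset + runs[-1][-1].size == e.offset:
            runs[-1].append(e)
        else:
            runs.append([e])
    return runs


chain_a = [ChainElem("%a0", 0, 4), ChainElem("%a1", 8, 4)]
chain_b = [ChainElem("%b0", 0, 4), ChainElem("%b1", 8, 4)]

# Assume chain_b's leader sits 4 bytes after chain_a's leader.
merged_chain = merge_chains(chain_a, chain_b, b_leader_delta=4)
runs = split_contiguous(merged_chain)
```

Here the two chains interleave into a single contiguous 16-byte run. A real merge would additionally have to re-check for aliasing instructions between the merged accesses, which is the third concern.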

Hence, I think merging classes makes more sense. If dead bitcasts are introduced, they could be removed at the end of the pass.

Any comments regarding this approach will be appreciated, thanks.

Anshil

I don’t think these implementation details are important, and you should avoid the tentative bitcasts. Having the casts or not does not change the problem space; I don’t really understand why deferring this would complicate the problem.
