Canonicalizing vector masking.

Hi, I received an internal test case from a game team (it wasn’t about this in particular), and I was wondering if there was maybe an opportunity to canonicalize a particular code pattern:

%inputi = bitcast <4 x float> %input to <4 x i32>

%row0i = and <4 x i32> %inputi, <i32 -1, i32 0, i32 0, i32 0>
%row0 = bitcast <4 x i32> %row0i to <4 x float>

%row1i = and <4 x i32> %inputi, <i32 0, i32 -1, i32 0, i32 0>
%row1 = bitcast <4 x i32> %row1i to <4 x float>

%row2i = and <4 x i32> %inputi, <i32 0, i32 0, i32 -1, i32 0>
%row2 = bitcast <4 x i32> %row2i to <4 x float>

%row3i = and <4 x i32> %inputi, <i32 0, i32 0, i32 0, i32 -1>
%row3 = bitcast <4 x i32> %row3i to <4 x float>

This arises from code that expands a vector of scale factors into the diagonal of a 4x4 diagonal matrix. The pattern comes from intrinsics that explicitly perform the masking this way.
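To make the semantics concrete, here is a minimal C++ model of the masking pattern above. It is a sketch under stated assumptions: std::array<float, 4> stands in for <4 x float>, std::memcpy stands in for bitcast, and the names Vec4, IVec4, bitcastToInt, bitcastToFloat and maskRow are hypothetical, not taken from the attached code. ANDing a lane with i32 -1 (all ones) passes its bits through unchanged, and ANDing with 0 yields the bit pattern of +0.0f.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "model assumes 32-bit float");

using Vec4 = std::array<float, 4>;          // stands in for <4 x float>
using IVec4 = std::array<std::uint32_t, 4>; // stands in for <4 x i32>

// Hypothetical helper modeling: bitcast <4 x float> to <4 x i32>
IVec4 bitcastToInt(const Vec4 &v) {
  IVec4 r;
  std::memcpy(r.data(), v.data(), sizeof(r));
  return r;
}

// Hypothetical helper modeling: bitcast <4 x i32> to <4 x float>
Vec4 bitcastToFloat(const IVec4 &v) {
  Vec4 r;
  std::memcpy(r.data(), v.data(), sizeof(r));
  return r;
}

// Models %rowNi = and <4 x i32> %inputi, <-1 in lane `lane`, 0 elsewhere>,
// followed by the bitcast back to <4 x float>.
Vec4 maskRow(const Vec4 &input, int lane) {
  assert(lane >= 0 && lane < 4);
  IVec4 bits = bitcastToInt(input);
  for (int i = 0; i < 4; ++i) {
    // i32 -1 is all ones, so the selected lane's bits survive untouched;
    // every other lane becomes 0, i.e. +0.0f after the bitcast.
    std::uint32_t mask = (i == lane) ? 0xFFFFFFFFu : 0u;
    bits[i] &= mask;
  }
  return bitcastToFloat(bits);
}
```

For example, with input = {2, 3, 4, 5}, maskRow(input, 1) gives {0, 3, 0, 0}, which is %row1.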

My question is: should we canonicalize this to:

%row0 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 4, i32 4>
%row1 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 4, i32 1, i32 4, i32 4>
%row2 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 4, i32 4, i32 2, i32 4>
%row3 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 4, i32 4, i32 4, i32 3>

which seems to better express the intent, or to a sequence of insertelement and extractelement instructions (which is what we get for the attached code), or should we leave it as is? (Or any better ideas?)
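The proposed shufflevector form can be modeled the same way. In this hypothetical sketch (shuffle and bitsOf are illustrative names, and std::array<float, 4> again stands in for <4 x float>), mask indices 0-3 select lanes of the first operand and 4-7 select lanes of the second, here zeroinitializer. Since a shuffle only moves lanes, the selected lane keeps its exact bits, just as with the AND against all ones; for instance, a -0.0f in lane 0 of %input stays -0.0f in %row0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Vec4 = std::array<float, 4>; // stands in for <4 x float>
using Mask4 = std::array<int, 4>;  // stands in for the constant <4 x i32> mask

// Hypothetical model of:
//   shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> mask
// Index 0-3 reads lane idx of %a; index 4-7 reads lane (idx - 4) of %b.
Vec4 shuffle(const Vec4 &a, const Vec4 &b, const Mask4 &mask) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i) {
    int idx = mask[i];
    assert(idx >= 0 && idx < 8);
    r[i] = idx < 4 ? a[idx] : b[idx - 4];
  }
  return r;
}

// Raw bit pattern of a float lane, for exact (bitwise) comparisons.
std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}
```

With zero = Vec4{}, shuffle(input, zero, {0, 4, 4, 4}) models %row0 and shuffle(input, zero, {4, 4, 4, 3}) models %row3.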

Forgive my naivete if there’s something obvious I’m missing since I haven’t done much w.r.t. vectors in LLVM.

– Sean Silva

diagonalToScalingMatrix.cpp (314 Bytes)

shufflevector does look more canonical. In the past I think we avoided
creating shufflevector for fear of producing bad code in CodeGen, but
I think Chandler just fixed that :)

Cheers,
Rafael

I think that the original pattern should be canonicalized into a vector 'select' instruction with a constant mask. I think that we already have code for canonicalizing select-like shuffles into selects.
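If the pattern is canonicalized to a vector select instead, %row0 would presumably be written as select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x float> %input, <4 x float> zeroinitializer. A minimal C++ model of that per-lane choice follows; selectLanes and diagonalRow are hypothetical names, and std::array stands in for the IR vector types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>; // stands in for <4 x float>
using Cond4 = std::array<bool, 4>; // stands in for the constant <4 x i1> condition

// Hypothetical model of: select <4 x i1> cond, <4 x float> a, <4 x float> b
// Each lane independently takes a[i] when cond[i] is true, else b[i].
Vec4 selectLanes(const Cond4 &cond, const Vec4 &a, const Vec4 &b) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = cond[i] ? a[i] : b[i];
  return r;
}

// Row `lane` of the diagonal matrix: the condition is true only in that lane,
// and the false operand is all zeros (zeroinitializer).
Vec4 diagonalRow(const Vec4 &input, std::size_t lane) {
  assert(lane < 4);
  Cond4 cond{}; // all lanes false
  cond[lane] = true;
  return selectLanes(cond, input, Vec4{});
}
```

The constant condition vector plays the same role as the AND mask: true where the original mask was -1, false where it was 0.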

I think that there is a bug in the shuffle pattern for %row3. It should be <i32 4, i32 5, i32 6, i32 3>.

shufflevector does look more canonical. In the past I think we avoided
creating shufflevector for fear of producing bad code in CodeGen, but
I think Chandler just fixed that :)

Excellent!

It should be canonicalized to select in the *IR*, but absolutely do not
canonicalize to VSELECT in the SelectionDAG. The VSELECT node is
significantly harder to analyze in the backend than the VECTOR_SHUFFLE node.

Aren't 4, 5, and 6 all just elements of the zeroinitializer? I just used
4, 4, 4, which should be the same semantically.
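This equivalence is easy to check with a small C++ model. In this sketch, shuffle is a hypothetical stand-in for shufflevector (indices 0-3 read the first operand, 4-7 the second) and normalizeZeroLanes is an illustrative helper, not an LLVM API: because every lane of zeroinitializer is 0.0f, any index in 4-7 can be rewritten to 4 without changing the result, so <4, 5, 6, 3> and <4, 4, 4, 3> produce the same %row3.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>; // stands in for <4 x float>
using Mask4 = std::array<int, 4>;  // stands in for the <4 x i32> shuffle mask

// Hypothetical model of shufflevector: 0-3 index %a, 4-7 index %b.
Vec4 shuffle(const Vec4 &a, const Vec4 &b, const Mask4 &mask) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] >= 0 && mask[i] < 8);
    r[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
  }
  return r;
}

// When the second operand is zeroinitializer, every index in 4-7 reads 0.0f,
// so those indices are interchangeable; map them all to 4 to compare masks.
Mask4 normalizeZeroLanes(Mask4 mask) {
  for (int &idx : mask)
    if (idx >= 4)
      idx = 4;
  return mask;
}
```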

FWIW, it is trying to take

input = {x,y,z,w}

and output

row0 = {x,0,0,0}
row1 = {0,y,0,0}
row2 = {0,0,z,0}
row3 = {0,0,0,w}
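The intended result, written as a plain scalar reference. This is a hypothetical sketch: diagonalRows, Vec4 and Mat4 are illustrative names, not taken from the attached diagonalToScalingMatrix.cpp. Whatever form the optimizer settles on (AND mask, shufflevector, or select) should compute these rows.

```cpp
#include <array>

using Vec4 = std::array<float, 4>; // {x, y, z, w}
using Mat4 = std::array<Vec4, 4>;  // row0..row3

// Hypothetical scalar reference: expand {x, y, z, w} into the rows of a
// 4x4 diagonal (scaling) matrix.
Mat4 diagonalRows(const Vec4 &input) {
  Mat4 rows{}; // value-initialized: every element starts as 0.0f
  for (int i = 0; i < 4; ++i)
    rows[i][i] = input[i];
  return rows;
}
```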

-- Sean Silva