Hi, I received an internal test case from a game team (it wasn’t about this in particular), and I was wondering whether there is an opportunity to canonicalize a particular code pattern:

%inputi = bitcast <4 x float> %input to <4 x i32>
%row0i = and <4 x i32> %inputi, <i32 -1, i32 0, i32 0, i32 0>
%row0 = bitcast <4 x i32> %row0i to <4 x float>
%row1i = and <4 x i32> %inputi, <i32 0, i32 -1, i32 0, i32 0>
%row1 = bitcast <4 x i32> %row1i to <4 x float>
%row2i = and <4 x i32> %inputi, <i32 0, i32 0, i32 -1, i32 0>
%row2 = bitcast <4 x i32> %row2i to <4 x float>
%row3i = and <4 x i32> %inputi, <i32 0, i32 0, i32 0, i32 -1>
%row3 = bitcast <4 x i32> %row3i to <4 x float>

This arises from code that expands a vector of scale factors into the diagonal of a 4x4 diagonal matrix. The pattern comes from intrinsics that explicitly perform the masking this way.
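To make the source-level pattern concrete, here is a minimal portable C++ sketch of that kind of expansion. It is not the attached file: the names `Vec4f`, `Mask4`, `maskLanes`, and `diagonalRows` are hypothetical, and the lane-by-lane loop only models what a vector intrinsic would do in a single instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec4f = std::array<float, 4>;          // hypothetical stand-in for <4 x float>
using Mask4 = std::array<std::uint32_t, 4>;  // hypothetical stand-in for the <4 x i32> constant

// Models the IR sequence: bitcast to <4 x i32>, `and` with a constant mask, bitcast back.
Vec4f maskLanes(const Vec4f& v, const Mask4& mask) {
    Vec4f out{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &v[i], sizeof bits);    // float -> integer bit pattern
        bits &= mask[i];                           // all-ones keeps the lane, zero clears it
        std::memcpy(&out[i], &bits, sizeof bits);  // integer bit pattern -> float
    }
    return out;
}

// Expands the scale factors into the four rows of a diagonal matrix,
// one masked copy of the input per row.
std::array<Vec4f, 4> diagonalRows(const Vec4f& scale) {
    constexpr std::uint32_t K = 0xFFFFFFFFu;  // the i32 -1 lanes in the IR above
    return {{
        maskLanes(scale, {K, 0, 0, 0}),
        maskLanes(scale, {0, K, 0, 0}),
        maskLanes(scale, {0, 0, K, 0}),
        maskLanes(scale, {0, 0, 0, K}),
    }};
}
```

Calling `diagonalRows` on a scale vector yields four rows, each keeping exactly one lane of the input and zeroing the rest, which is the shape of the `and` sequence in the IR.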

My question is: should we canonicalize this to:

%row0 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 0, i32 4, i32 4, i32 4>
%row1 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 4, i32 1, i32 4, i32 4>
%row2 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 4, i32 4, i32 2, i32 4>
%row3 = shufflevector <4 x float> %input, <4 x float> zeroinitializer, <4 x i32> <i32 4, i32 4, i32 4, i32 3>

which seems to better express the intent, or to a sequence of insertelement and extractelement instructions (which is what we get for the attached code), or should we leave it as is? (Or are there any better ideas?)
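As a sanity check on the three candidate forms, the following C++ sketch models each one lane by lane and compares the results bit for bit. The helper names (`shuffle`, `insertExtract`, `andMask`, `sameBits`) are hypothetical, and the loops only model the IR semantics. They say nothing about which form a backend would lower best.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec4f = std::array<float, 4>;  // hypothetical stand-in for <4 x float>

// Models `shufflevector a, b, mask`: indices 0-3 select from a, 4-7 select from b.
Vec4f shuffle(const Vec4f& a, const Vec4f& b, const std::array<std::size_t, 4>& mask) {
    Vec4f out{};
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
    return out;
}

// Models extractelement from the input followed by insertelement into zeroinitializer.
Vec4f insertExtract(const Vec4f& input, std::size_t lane) {
    Vec4f out{};              // zeroinitializer
    out[lane] = input[lane];  // extract one lane, insert it at the same position
    return out;
}

// Models the original bitcast / and / bitcast with all-ones in exactly one lane.
Vec4f andMask(const Vec4f& input, std::size_t lane) {
    Vec4f out{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &input[i], sizeof bits);
        bits &= (i == lane) ? 0xFFFFFFFFu : 0u;
        std::memcpy(&out[i], &bits, sizeof bits);
    }
    return out;
}

// Compares bit patterns rather than float values, so +0.0 and -0.0 would differ.
bool sameBits(const Vec4f& x, const Vec4f& y) {
    return std::memcmp(x.data(), y.data(), sizeof(float) * 4) == 0;
}
```

For example, `shuffle(in, Vec4f{}, {4, 1, 4, 4})`, `insertExtract(in, 1)`, and `andMask(in, 1)` all produce a vector holding `in[1]` in lane 1 and positive zero elsewhere, matching `%row1` above.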

Forgive my naivete if there’s something obvious I’m missing since I haven’t done much w.r.t. vectors in LLVM.

– Sean Silva

diagonalToScalingMatrix.cpp (314 Bytes)