Hi All,
I have read the links from Joel. It seems one of SVE's main focuses is loop vectorization using vector predicate registers. I am not sure we need the scalable vector type for that. Let's look at a simple example from the white paper.
void example01(int *restrict a, const int *b, const int *c, long N)
{
    long i;
    for (i = 0; i < N; ++i)
        a[i] = b[i] + c[i];
}
Roughly, we could imagine the masked vectorized loop at the IR level as below.
define void @example01(i32* noalias %a, i32* %b, i32* %c, i64 %n) {
header:
  ; Splat N into all eight lanes.
  %n.broadcast.splatinsert = insertelement <8 x i64> undef, i64 %n, i32 0
  %n.vec = shufflevector <8 x i64> %n.broadcast.splatinsert, <8 x i64> undef, <8 x i32> zeroinitializer
  ; First mask: lane j is active iff j < N, so N < 8 is handled too.
  %mask.vec.init = icmp slt <8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>, %n.vec
  br label %loop.body
loop.body:
  %index = phi i64 [ 0, %header ], [ %index.next, %loop.body ]
  %mask.vec = phi <8 x i1> [ %mask.vec.init, %header ], [ %mask.vec.next, %loop.body ]
  %a.addr = getelementptr inbounds i32, i32* %a, i64 %index
  %b.addr = getelementptr inbounds i32, i32* %b, i64 %index
  %c.addr = getelementptr inbounds i32, i32* %c, i64 %index
  %a.vptr = bitcast i32* %a.addr to <8 x i32>*
  %b.vptr = bitcast i32* %b.addr to <8 x i32>*
  %c.vptr = bitcast i32* %c.addr to <8 x i32>*
  ; Inactive lanes are neither loaded nor stored.
  %b.val = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* %b.vptr, i32 4, <8 x i1> %mask.vec, <8 x i32> undef)
  %c.val = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* %c.vptr, i32 4, <8 x i1> %mask.vec, <8 x i32> undef)
  %a.val = add <8 x i32> %b.val, %c.val
  call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> %a.val, <8 x i32>* %a.vptr, i32 4, <8 x i1> %mask.vec)
  ; Mask for the next iteration: lane j is active iff index.next + j < N.
  %index.next = add i64 %index, 8
  %index.broadcast.splatinsert = insertelement <8 x i64> undef, i64 %index.next, i32 0
  %index.vec = shufflevector <8 x i64> %index.broadcast.splatinsert, <8 x i64> undef, <8 x i32> zeroinitializer
  %index.next.vec = add <8 x i64> %index.vec, <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
  %lane.cond.vec = icmp slt <8 x i64> %index.next.vec, %n.vec
  %mask.vec.next = and <8 x i1> %lane.cond.vec, %mask.vec
  ; index.next steps by 8 and can overshoot N, so test >= rather than ==.
  %cond = icmp sge i64 %index.next, %n
  br i1 %cond, label %loop.exit, label %loop.body
loop.exit:
  ret void
}

declare <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>*, i32, <8 x i1>, <8 x i32>)
declare void @llvm.masked.store.v8i32.p0v8i32(<8 x i32>, <8 x i32>*, i32, <8 x i1>)
The vectorized loop above does not need a scalar tail loop. I guess we could map %mask.vec to a predicate register, as a native register class, at the ISelLowering level. The conditional branch could also be mapped to 'whilexx' and 'b.xxx' at the MIR level. To choose the vector type, we could use a per-target cost model, as LLVM's existing vectorizers do. If SVE mainly focuses on loop vectorization, I am not sure why the scalable vector type is needed.

In my personal opinion, the VLA programming model could add ambiguity and complexity to the compiler because the vector type is not concrete at compile time. I am not an expert on SVE and VLA, so I may be missing something important. If so, please let me know.
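As a rough sanity check of the claim that the masked loop needs no tail loop, here is a C sketch that emulates the IR lane by lane. The names example01_masked, make_mask, check_example01, MAX_LANES and CHECK_LEN are hypothetical. make_mask only models the "lane j is active iff i + j < n" condition that a 'whilexx'-style instruction would produce; it is not a real intrinsic. The lane count vl is an ordinary runtime parameter: vl = 8 matches the IR above, and other values illustrate a loop whose width is not fixed at compile time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound on the emulated lane count. */
#define MAX_LANES 64
/* Array length used by the self-check below. */
#define CHECK_LEN 100

/* Models a "while less than" mask: lane j is active iff i + j < n.
   This mirrors %lane.cond.vec in the IR; it is not a real intrinsic. */
static void make_mask(bool *mask, long i, long n, int vl) {
    for (int j = 0; j < vl; ++j)
        mask[j] = (i + j) < n;
}

/* Emulates the masked loop: each iteration covers vl lanes, inactive
   lanes are neither read nor written, and no scalar tail loop follows. */
static void example01_masked(int *restrict a, const int *b, const int *c,
                             long n, int vl) {
    bool mask[MAX_LANES];
    assert(vl > 0 && vl <= MAX_LANES);
    make_mask(mask, 0, n, vl);              /* mask for the first iteration */
    for (long i = 0; ; i += vl) {
        for (int j = 0; j < vl; ++j)        /* one "vector" add under mask */
            if (mask[j])
                a[i + j] = b[i + j] + c[i + j];
        if (i + vl >= n)                    /* exit test, like %cond */
            break;
        make_mask(mask, i + vl, n, vl);     /* mask for the next iteration */
    }
}

/* The original scalar loop from the white paper, used as the reference. */
static void example01(int *restrict a, const int *b, const int *c, long N) {
    for (long i = 0; i < N; ++i)
        a[i] = b[i] + c[i];
}

/* Asserts that the masked loop writes exactly what the scalar loop writes
   and leaves every element at index >= n untouched. */
static void check_example01(long n, int vl) {
    int b[CHECK_LEN], c[CHECK_LEN], ref[CHECK_LEN], out[CHECK_LEN];
    assert(n <= CHECK_LEN);
    for (int k = 0; k < CHECK_LEN; ++k) {
        b[k] = k;
        c[k] = 2 * k + 1;
        ref[k] = out[k] = -1;               /* sentinel for untouched slots */
    }
    example01(ref, b, c, n);
    example01_masked(out, b, c, n, vl);
    for (int k = 0; k < CHECK_LEN; ++k)
        assert(out[k] == ref[k]);
}
```

For example, check_example01(13, 8) exercises a trip count that is not a multiple of 8, and check_example01(0, 8) shows that an empty loop still runs one fully inactive iteration and exits safely. The exit test compares with >= because the index advances by vl and may step past n.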
Thanks,
JinGu Kang