Authors: Hongbin @zhanghb97, Yingchi @inclyc
Context
VP intrinsic discussion in the RISC-V Vector Dialect RFC
VP intrinsic discussion in the vector masking representation RFC
In recent discussions about vector abstractions, VP intrinsics have been a critical point in the lowering path. However, MLIR has no integration tests for VP intrinsics. We mainly focus on the RVV side, so we tested all the MLIR VP ops with both fixed and scalable vector types on the RVV backend.
Integration Test
You can find the test cases in our buddy-mlir repo and run them in our web application buddy-caas (Buddy Compiler As A Service). We provide a table that shows the test cases and results. In short, llc crashes when it legalizes VPFRemOp, VPIntToPtrOp, VPPtrToIntOp, VPReduceFMulOp, and VPReduceMulOp.
- VPFRemOp
PromoteIntegerOperand Op #2: t188: f32 = vp_frem t185, t186, Constant:i1<-1>, Constant:i64<8>
Do not know how to promote this operator's operand!
UNREACHABLE executed at llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1629!
- VPPtrToIntOp and VPIntToPtrOp - MLIR Test Case - Web (run the compile box to reproduce the error)
PromoteIntegerOperand Op #0: t231: i64 = vp_inttoptr t229, Constant:i1<-1>, Constant:i64<8>
Do not know how to promote this operator's operand!
UNREACHABLE executed at llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1629!
- VPReduceFMulOp - MLIR Test Case - Web (run the compile box to reproduce the error)
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:309:
unsigned int llvm::EVT::getVectorNumElements() const: Assertion `isVector() && "Invalid vector type!"' failed.
- VPReduceMulOp - MLIR Test Case - Web (run the compile box to reproduce the error)
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:309:
unsigned int llvm::EVT::getVectorNumElements() const: Assertion `isVector() && "Invalid vector type!"' failed.
Discussion and Fix
Yingchi (@inclyc) looked deeply into the issues and came to the following conclusions.
- VPReduceFMulOp & VPReduceMulOp
The LLVM IR example:
declare i32 @llvm.vp.reduce.mul.v4i32(i32, <4 x i32>, <4 x i1>, i32)
define signext i32 @vpreduce_mul_v4i32(i32 signext %s, <4 x i32> %v, <4 x i1> %m, i32 zeroext %evl) {
  %r = call i32 @llvm.vp.reduce.mul.v4i32(i32 %s, <4 x i32> %v, <4 x i1> %m, i32 %evl)
  ret i32 %r
}
There is no mul reduction instruction in RVV. For this reason, the VP_REDUCE_{F,}MUL ops must be unrolled in LLVM. The unrolling of these two VP intrinsics could be shared among backends, so we need to decide which part of LLVM should implement it: ExpandVectorPredicationPass or TLI.expandVecReduce?
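To make concrete what an unrolled expansion has to compute, here is a minimal scalar reference model in C++. It is only a sketch of the intrinsic's semantics, not code from LLVM: `vpReduceMulModel` is a hypothetical name, and the model assumes i32 lanes. A lane contributes only if its index is below `evl` and its mask bit is set; skipping a lane is the same as multiplying by the neutral value 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): scalar model of
// llvm.vp.reduce.mul on i32 lanes, i.e. what an unrolled
// expansion would have to compute lane by lane.
std::uint32_t vpReduceMulModel(std::uint32_t start,
                               const std::vector<std::uint32_t> &v,
                               const std::vector<bool> &mask,
                               std::size_t evl) {
  assert(v.size() == mask.size() && "one mask bit per lane");
  assert(evl <= v.size() && "evl must not exceed the lane count");
  std::uint32_t acc = start;
  for (std::size_t i = 0; i < evl; ++i) {
    // Masked-off lanes are skipped, which equals multiplying by 1.
    if (mask[i])
      acc *= v[i]; // unsigned arithmetic wraps, like LLVM's `mul`
  }
  return acc;
}
```

A real expansion would emit one extract-and-multiply per lane (or a loop for scalable vectors); the model only fixes the expected result.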
The SelectionDAGBuilder converts VP_REDUCE_MUL to VP_REDUCE_AND if the vector elements are i1. In this specific scenario, the VP intrinsic is compiled into the RVV vredand.vs instruction.
declare i1 @llvm.vp.reduce.mul.v4i1(i1, <4 x i1>, <4 x i1>, i32)
define signext i1 @vpreduce_mul_v4i1(i1 signext %s, <4 x i1> %v, <4 x i1> %m, i32 zeroext %evl) {
  %r = call i1 @llvm.vp.reduce.mul.v4i1(i1 %s, <4 x i1> %v, <4 x i1> %m, i32 %evl)
  ret i1 %r
}
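The i1 special case works because multiplication on one-bit values is the same operation as a bitwise AND. The following C++ sketch checks that claim exhaustively for four lanes. All function names are hypothetical, and the encoding (bit i of `v` and `mask` is lane i) is an assumption made for brevity.

```cpp
#include <cassert>

constexpr unsigned kLanes = 4; // matches the <4 x i1> example

// Hypothetical model of vp.reduce.mul on i1 lanes.
bool reduceMulI1(bool start, unsigned v, unsigned mask, unsigned evl) {
  assert(evl <= kLanes && "evl must not exceed the lane count");
  unsigned acc = start ? 1u : 0u;
  for (unsigned i = 0; i < evl; ++i)
    if ((mask >> i) & 1u)
      acc = (acc * ((v >> i) & 1u)) & 1u; // multiply, keep one bit
  return acc != 0;
}

// Hypothetical model of vp.reduce.and on i1 lanes.
bool reduceAndI1(bool start, unsigned v, unsigned mask, unsigned evl) {
  assert(evl <= kLanes && "evl must not exceed the lane count");
  unsigned acc = start ? 1u : 0u;
  for (unsigned i = 0; i < evl; ++i)
    if ((mask >> i) & 1u)
      acc &= (v >> i) & 1u;
  return acc != 0;
}

// Compares both models over every start value, vector, mask and evl.
bool i1MulMatchesAnd() {
  for (unsigned start = 0; start < 2; ++start)
    for (unsigned v = 0; v < (1u << kLanes); ++v)
      for (unsigned mask = 0; mask < (1u << kLanes); ++mask)
        for (unsigned evl = 0; evl <= kLanes; ++evl)
          if (reduceMulI1(start != 0, v, mask, evl) !=
              reduceAndI1(start != 0, v, mask, evl))
            return false;
  return true;
}
```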
- VPFRemOp
The LLVM IR example:
; ModuleID = 'LLVMDialectModule'
define <8 x float> @vpfrem_v8f32(<8 x float> %v1, <8 x float> %v2, <8 x i1> %m, i32 %evl) {
  %ret = call <8 x float> @llvm.vp.frem.v8f32(<8 x float> %v1, <8 x float> %v2, <8 x i1> %m, i32 %evl)
  ret <8 x float> %ret
}
declare <8 x float> @llvm.vp.frem.v8f32(<8 x float>, <8 x float>, <8 x i1>, i32)
The LLVM community has discussed this problem in D104327. The frem node is unsupported due to a lack of available instructions.
For fixed-length vectors we could scalarize, but that option is not (currently) available for scalable-vector types. The support is intentionally left out so that it is equivalent for both vector types.
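The scalarization option mentioned for fixed-length vectors can be pictured as a per-lane loop. The sketch below is a C++ reference model under stated assumptions, not LLVM code: `vpFRemScalarized` is a hypothetical name, `std::fmod` stands in for the scalar frem (LLVM documents frem as producing the same result as libm's fmod), and disabled lanes, which are poison in LLVM, are given an arbitrary placeholder of 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): scalarized model of
// llvm.vp.frem on a fixed-length float vector.
std::vector<float> vpFRemScalarized(const std::vector<float> &a,
                                    const std::vector<float> &b,
                                    const std::vector<bool> &mask,
                                    std::size_t evl) {
  assert(a.size() == b.size() && a.size() == mask.size());
  assert(evl <= a.size() && "evl must not exceed the lane count");
  // Placeholder 0 for disabled lanes; LLVM leaves them as poison.
  std::vector<float> out(a.size(), 0.0f);
  for (std::size_t i = 0; i < evl; ++i) {
    if (mask[i])
      out[i] = std::fmod(a[i], b[i]); // one scalar frem per active lane
  }
  return out;
}
```

The loop bound depends on a lane count known at compile time, which is why the same approach does not carry over directly to scalable vectors.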
- VPPtrToIntOp & VPIntToPtrOp
VPPtrToIntOp and VPIntToPtrOp were introduced in D122291. The scalar inttoptr instruction is lowered to zext / trunc in SelectionDAGBuilder. Introducing similar logic in SelectionDAGBuilder for the VP variants is a possible solution. The DAG builder can generate redundant instructions, e.g. zext (trunc), which should be reduced in InstCombine. Currently we do not have similar logic for VP intrinsics. That is to say, vp.zext & vp.trunc may not be reduced/eliminated by such logic.
@inclyc has submitted a candidate patch here!
The LLVM IR example in the patch:
declare <4 x ptr> @llvm.vp.inttoptr.v4p0.v4i32(<4 x i32>, <4 x i1>, i32)
define <4 x ptr> @inttoptr_v4p0_v4i32(<4 x i32> %va, <4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: inttoptr_v4p0_v4i32:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli zero, a0, e64, m2, ta, ma
; CHECK-NEXT: vzext.vf2 v10, v8, v0.t
; CHECK-NEXT: vmv.v.v v8, v10
; CHECK-NEXT: ret
  %v = call <4 x ptr> @llvm.vp.inttoptr.v4p0.v4i32(<4 x i32> %va, <4 x i1> %m, i32 %evl)
  ret <4 x ptr> %v
}
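The lowering idea discussed above, treating an integer-to-pointer conversion as a zero-extension or truncation to the pointer width, can be modelled lane by lane. The C++ sketch below is only an illustration: `zextOrTrunc` and `vpIntToPtrModel` are hypothetical names, pointers are assumed to be 64 bits wide (as on RV64), and disabled lanes get a placeholder of 0. For i32 inputs the model zero-extends, which matches the `vzext.vf2` in the expected output of the test case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interpret `value` as a srcBits-wide integer and
// zero-extend or truncate it to dstBits (both in the range 1..64).
std::uint64_t zextOrTrunc(std::uint64_t value, unsigned srcBits,
                          unsigned dstBits) {
  assert(srcBits >= 1 && srcBits <= 64 && dstBits >= 1 && dstBits <= 64);
  auto lowMask = [](unsigned bits) -> std::uint64_t {
    return bits == 64 ? ~std::uint64_t{0}
                      : ((std::uint64_t{1} << bits) - 1);
  };
  value &= lowMask(srcBits);       // upper bits become zero: zext
  return value & lowMask(dstBits); // drops high bits if dst is narrower
}

// Hypothetical model of llvm.vp.inttoptr from i32 lanes to 64-bit
// pointers, represented here as plain 64-bit integers.
std::vector<std::uint64_t>
vpIntToPtrModel(const std::vector<std::uint32_t> &v,
                const std::vector<bool> &mask, std::size_t evl) {
  assert(v.size() == mask.size() && evl <= v.size());
  std::vector<std::uint64_t> out(v.size(), 0); // placeholder for poison
  for (std::size_t i = 0; i < evl; ++i)
    if (mask[i])
      out[i] = zextOrTrunc(v[i], 32, 64);
  return out;
}
```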