Authors: Hongbin @zhanghb97, Yingchi @inclyc
Context
VP intrinsic discussion in the RISC-V Vector Dialect RFC
VP intrinsic discussion in the vector masking representation RFC
In recent discussions about the vector abstractions, VP intrinsics are a critical point in the lowering path. However, MLIR has no integration tests for VP intrinsics. Since we mainly focus on the RVV side, we tested all the MLIR VP ops with both fixed and scalable vector types on the RVV backend.
Integration Test
You can find the test cases in our buddy-mlir repo and run them in our web application buddy-caas (Buddy Compiler As A Service). We provide a table showing the test cases and results. In short, llc crashes when it legalizes the code generated for VPFRemOp, VPIntToPtrOp, VPPtrToIntOp, VPReduceFMulOp, and VPReduceMulOp.
- VPFRemOp - MLIR Test Case - Web (run the compile box to reproduce the error)
```
PromoteIntegerOperand Op #2: t188: f32 = vp_frem t185, t186, Constant:i1<-1>, Constant:i64<8>
Do not know how to promote this operator's operand!
UNREACHABLE executed at llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1629!
```
- VPPtrToIntOp and VPIntToPtrOp - MLIR Test Case - Web (run the compile box to reproduce the error)
```
PromoteIntegerOperand Op #0: t231: i64 = vp_inttoptr t229, Constant:i1<-1>, Constant:i64<8>
Do not know how to promote this operator's operand!
UNREACHABLE executed at llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1629!
```
- VPReduceFMulOp - MLIR Test Case - Web (run the compile box to reproduce the error)
```
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:309:
unsigned int llvm::EVT::getVectorNumElements() const: Assertion `isVector() && "Invalid vector type!"' failed.
```
- VPReduceMulOp - MLIR Test Case - Web (run the compile box to reproduce the error)
```
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:309:
unsigned int llvm::EVT::getVectorNumElements() const: Assertion `isVector() && "Invalid vector type!"' failed.
```
Discussion and Fix
Yingchi (@inclyc) looked into these issues in depth and reached the following conclusions.
- VPReduceFMulOp & VPReduceMulOp
The LLVM IR example:
```llvm
declare i32 @llvm.vp.reduce.mul.v4i32(i32, <4 x i32>, <4 x i1>, i32)
define signext i32 @vpreduce_mul_v4i32(i32 signext %s, <4 x i32> %v, <4 x i1> %m, i32 zeroext %evl) {
  %r = call i32 @llvm.vp.reduce.mul.v4i32(i32 %s, <4 x i32> %v, <4 x i1> %m, i32 %evl)
  ret i32 %r
}
```
RVV has no multiply-reduction instruction, so VP_REDUCE_MUL and VP_REDUCE_FMUL must be unrolled in LLVM. The unrolling of these two VP intrinsics could be shared among backends, which raises the question of where in LLVM to implement it: in ExpandVectorPredicationPass, or in TLI.expandVecReduce?
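To make the unrolling concrete, here is a minimal Python reference model of what an expansion of vp.reduce.mul and vp.reduce.fmul has to compute. It is a sketch of the semantics only, not LLVM code. The function names are hypothetical, integer lanes are assumed to be i32 with wrap-around arithmetic, and the float version assumes the ordered form (no `reassoc` flag). That is why it folds the active lanes strictly left to right as a serial chain of multiplies.

```python
from __future__ import annotations


def to_i32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value, like LLVM `mul i32`."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def vp_reduce_mul_i32(start: int, v: list[int], mask: list[bool], evl: int) -> int:
    """Hypothetical model: multiply `start` by every lane i with i < evl and mask[i]."""
    # This model simply rejects an EVL larger than the vector length.
    assert 0 <= evl <= len(v) == len(mask)
    acc = to_i32(start)
    for i in range(len(v)):
        # Lanes beyond EVL or with a false mask bit do not participate.
        if i < evl and mask[i]:
            acc = to_i32(acc * v[i])
    return acc


def vp_reduce_fmul_ordered(start: float, v: list[float], mask: list[bool], evl: int) -> float:
    """Hypothetical model of the ordered float product: one fmul per active lane, in order."""
    assert 0 <= evl <= len(v) == len(mask)
    acc = start
    for i in range(len(v)):
        if i < evl and mask[i]:
            acc = acc * v[i]
    return acc
```

For example, `vp_reduce_mul_i32(2, [3, 4, 5, 6], [True, False, True, True], 3)` yields 2 * 3 * 5 = 30. Lane 1 is masked off, and lane 3 lies beyond the EVL. Whether the expansion is written as such a serial chain or as a tree is exactly the kind of choice a shared implementation would have to make. Only the integer and reassoc-enabled float cases permit reordering.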
SelectionDAGBuilder converts VP_REDUCE_MUL to VP_REDUCE_AND when the vector elements are i1. In this specific case, the VP intrinsic is compiled into the RVV vredand.vs instruction.
```llvm
declare i1 @llvm.vp.reduce.mul.v4i1(i1, <4 x i1>, <4 x i1>, i32)
define signext i1 @vpreduce_mul_v4i1(i1 signext %s, <4 x i1> %v, <4 x i1> %m, i32 zeroext %evl) {
  %r = call i1 @llvm.vp.reduce.mul.v4i1(i1 %s, <4 x i1> %v, <4 x i1> %m, i32 %evl)
  ret i1 %r
}
```
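The i1 special case works because, over a single bit, multiplication modulo 2 is exactly logical AND. The sketch below brute-forces that identity and the resulting masked reduction for every `<4 x i1>` input. It assumes tuples of 0/1 integers as a stand-in for i1 vectors, and its helper names are hypothetical.

```python
from itertools import product


def reduce_mul_i1(start, v, mask, evl):
    """Masked, EVL-limited product of i1 lanes (multiply wraps modulo 2)."""
    acc = start
    for i, bit in enumerate(v):
        if i < evl and mask[i]:
            acc = (acc * bit) & 1
    return acc


def reduce_and_i1(start, v, mask, evl):
    """Masked, EVL-limited AND of i1 lanes."""
    acc = start
    for i, bit in enumerate(v):
        if i < evl and mask[i]:
            acc &= bit
    return acc


# Element-wise identity: a * b (mod 2) == a & b for every pair of bits.
assert all((a * b) & 1 == (a & b) for a, b in product((0, 1), repeat=2))

# Exhaustive check over every 4-lane input, mask, start value and EVL.
for start in (0, 1):
    for v in product((0, 1), repeat=4):
        for mask in product((False, True), repeat=4):
            for evl in range(5):
                assert reduce_mul_i1(start, v, mask, evl) == reduce_and_i1(start, v, mask, evl)
```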
- VPFRemOp
The LLVM IR example:
```llvm
; ModuleID = 'LLVMDialectModule'
define <8 x float> @vpfrem_v8f32(<8 x float> %v1, <8 x float> %v2, <8 x i1> %m, i32 %evl) {
  %ret = call <8 x float> @llvm.vp.frem.v8f32(<8 x float> %v1, <8 x float> %v2, <8 x i1> %m, i32 %evl)
  ret <8 x float> %ret
}
declare <8 x float> @llvm.vp.frem.v8f32(<8 x float>, <8 x float>, <8 x i1>, i32)
```
The LLVM community discussed this problem in D104327. The frem node is unsupported because no suitable instructions are available. As noted in that review, fixed-length vectors could be scalarized, but that option is not (currently) available for scalable-vector types. Support is therefore intentionally left out so that the behavior is equivalent for both vector types.
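Here is what "scalarize" would mean for the fixed-length case, as a minimal sketch under stated assumptions. Because the lane count of `<8 x float>` is a compile-time constant, the operation can be split into one scalar remainder per lane. LLVM's `frem` produces the same result as libm `fmod`, and Python's `math.fmod` follows C `fmod`, except that it raises `ValueError` where C returns NaN, so the helper maps that case back to NaN. The function names are hypothetical. Inactive lanes (index >= EVL or mask bit clear) are poison in the VP semantics, and this model represents them as `None`.

```python
import math


def fmod_like_libm(x: float, y: float) -> float:
    """Per-lane remainder with C fmod semantics (result has the sign of x)."""
    try:
        return math.fmod(x, y)
    except ValueError:
        # Python raises for fmod(x, 0) or fmod(inf, y); C fmod returns NaN.
        return math.nan


def scalarized_vp_frem(v1, v2, mask, evl):
    """Hypothetical scalarization of a fixed-length vp.frem, one fmod per lane."""
    assert len(v1) == len(v2) == len(mask), "fixed-length operands must match"
    assert 0 <= evl <= len(v1)
    out = []
    for i in range(len(v1)):  # the bound is known at compile time for fixed vectors
        if i < evl and mask[i]:
            out.append(fmod_like_libm(v1[i], v2[i]))
        else:
            out.append(None)  # poison lane: any value is acceptable
    return out
```

For a scalable type such as `<vscale x 8 x float>`, the loop bound would depend on the runtime `vscale`. The simple per-lane expansion above therefore cannot be emitted as straight-line code at compile time.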
- VPPtrToIntOp & VPIntToPtrOp
VPPtrToIntOp and VPIntToPtrOp were introduced in D122291. The scalar inttoptr instruction is lowered to zext/trunc in SelectionDAGBuilder, so introducing similar logic there for the VP versions is a possible solution. One drawback is that the DAG builder then generates redundant instruction pairs, e.g. zext (trunc), which for ordinary instructions should be reduced in InstCombine. Currently there is no similar logic for VP intrinsics, which means that vp.zext and vp.trunc pairs may not be reduced or eliminated.
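The scalar rule referred to above can be sketched as follows. An integer/pointer cast becomes a zero-extension, a truncation, or nothing, depending on how the integer width compares with the pointer width. This is a model of the semantics, not SelectionDAGBuilder code. The helper names are hypothetical, and 64-bit pointers are assumed to match RV64 (the e64 setting in the CHECK lines of the patch).

```python
from __future__ import annotations

PTR_BITS = 64  # assumption: RV64 pointers


def zext_or_trunc(value: int, from_bits: int, to_bits: int) -> tuple[str, int]:
    """Return (opcode, result) for reinterpreting a from_bits value as to_bits."""
    value &= (1 << from_bits) - 1
    if to_bits > from_bits:
        return "zext", value  # high bits are filled with zeros
    if to_bits < from_bits:
        return "trunc", value & ((1 << to_bits) - 1)  # high bits are dropped
    return "noop", value


def inttoptr(value: int, int_bits: int) -> tuple[str, int]:
    return zext_or_trunc(value, int_bits, PTR_BITS)


def ptrtoint(ptr: int, int_bits: int) -> tuple[str, int]:
    return zext_or_trunc(ptr, PTR_BITS, int_bits)


# ptrtoint to i32 followed by inttoptr back is trunc then zext. The pair only
# keeps the low 32 bits, so it behaves like a single AND with a mask: the kind
# of redundancy that needs a combine to clean it up.
ptr = 0x1_2345_6789
_, as_i32 = ptrtoint(ptr, 32)
_, back = inttoptr(as_i32, 32)
assert back == ptr & 0xFFFFFFFF
```

Applied lane by lane, the same rule would give the vp.zext / vp.trunc form. That is where the missing VP-level combines matter.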
@inclyc has submitted a candidate patch here!
The LLVM IR example in the patch:
```llvm
declare <4 x ptr> @llvm.vp.inttoptr.v4p0.v4i32(<4 x i32>, <4 x i1>, i32)
define <4 x ptr> @inttoptr_v4p0_v4i32(<4 x i32> %va, <4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: inttoptr_v4p0_v4i32:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli zero, a0, e64, m2, ta, ma
; CHECK-NEXT: vzext.vf2 v10, v8, v0.t
; CHECK-NEXT: vmv.v.v v8, v10
; CHECK-NEXT: ret
  %v = call <4 x ptr> @llvm.vp.inttoptr.v4p0.v4i32(<4 x i32> %va, <4 x i1> %m, i32 %evl)
  ret <4 x ptr> %v
}
```
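The CHECK lines can also be read as a small simulation. The sketch below models `vzext.vf2 v10, v8, v0.t` under `vsetvli zero, a0, e64, m2, ta, ma`, assuming a 4-lane vector and Python lists for registers. Only lanes below vl (set from `%evl`) whose v0 bit is set are zero-extended. Under the tail- and mask-agnostic policies, every other lane may either keep its old value or be overwritten with all 1s. Those lanes are poison in vp.inttoptr, so either outcome is acceptable. The function name is hypothetical, and the model applies one policy choice to all inactive lanes, whereas hardware may mix the two per element.

```python
ALL_ONES_64 = (1 << 64) - 1


def vzext_vf2_masked(src32, old_dst, mask, vl, fill_ones):
    """Zero-extend 32-bit lanes to 64 bits where active; apply an agnostic policy elsewhere."""
    out = []
    for i, (s, old) in enumerate(zip(src32, old_dst)):
        if i < vl and mask[i]:
            out.append(s & 0xFFFFFFFF)  # active lane: zero-extension
        else:
            # Tail (i >= vl) or masked-off lane: "agnostic" allows either choice.
            out.append(ALL_ONES_64 if fill_ones else old)
    return out


src = [1, 0xFFFFFFFF, 7, 9]
old = [11, 22, 33, 44]
mask = [True, True, False, True]
for policy in (False, True):
    result = vzext_vf2_masked(src, old, mask, 3, policy)
    # Active lanes are identical under both policies, which is all vp.inttoptr requires.
    assert result[:2] == [1, 0xFFFFFFFF]
```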