SiFive VCIX (Xsfvcp) is a RISC-V extension that makes it easy to add custom vector instructions and/or interact with a custom co-processor through special instructions.
Motivation
The extension has been supported by Clang and LLVM IR for a while, so C/C++ users can efficiently use a co-processor that supports the Xsfvcp extension, but MLIR users cannot.
The purpose of this RFC is to add a VCIX dialect to MLIR so that MLIR users can use VCIX-compatible co-processors.
Proposal
The PR implements a VCIX dialect covering the entire set of VCIX instructions:
Mnemonic     funct6  vm  rs2    rs1   funct3  rd     Destination  Sources
sf.vc.x      0000--  1   -----  xs1   100     -----  none         scalar xs1
sf.vc.i      0000--  1   -----  simm  011     -----  none         simm[4:0]
sf.vc.v.x    0000--  0   -----  xs1   100     vd     vector vd    scalar xs1
sf.vc.v.i    0000--  0   -----  simm  011     vd     vector vd    simm[4:0]
sf.vc.vv     0010--  1   vs2    vs1   000     -----  none         vector vs1, vector vs2
sf.vc.xv     0010--  1   vs2    xs1   100     -----  none         scalar xs1, vector vs2
sf.vc.iv     0010--  1   vs2    simm  011     -----  none         simm[4:0], vector vs2
sf.vc.fv     0010--  1   vs2    fs1   101     -----  none         scalar fs1, vector vs2
sf.vc.v.vv   0010--  0   vs2    vs1   000     vd     vector vd    vector vs1, vector vs2
sf.vc.v.xv   0010--  0   vs2    xs1   100     vd     vector vd    scalar xs1, vector vs2
sf.vc.v.iv   0010--  0   vs2    simm  011     vd     vector vd    simm[4:0], vector vs2
sf.vc.v.fv   0010--  0   vs2    fs1   101     vd     vector vd    scalar fs1, vector vs2
sf.vc.vvv    1010--  1   vs2    vs1   000     vd     none         vector vs1, vector vs2, vector vd
sf.vc.xvv    1010--  1   vs2    xs1   100     vd     none         scalar xs1, vector vs2, vector vd
sf.vc.ivv    1010--  1   vs2    simm  011     vd     none         simm[4:0], vector vs2, vector vd
sf.vc.fvv    10101-  1   vs2    fs1   101     vd     none         scalar fs1, vector vs2, vector vd
sf.vc.v.vvv  1010--  0   vs2    vs1   000     vd     vector vd    vector vs1, vector vs2, vector vd
sf.vc.v.xvv  1010--  0   vs2    xs1   100     vd     vector vd    scalar xs1, vector vs2, vector vd
sf.vc.v.ivv  1010--  0   vs2    simm  011     vd     vector vd    simm[4:0], vector vs2, vector vd
sf.vc.v.fvv  10101-  0   vs2    fs1   101     vd     vector vd    scalar fs1, vector vs2, vector vd
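The table follows a regular pattern: funct3 is chosen by the kind of the first source, vm is 0 exactly when a destination vector is written, and the upper bits of funct6 depend on the number of operands. The Python sketch below only restates that pattern for illustration. The helper name vcix_fields and its dictionaries are hypothetical and not part of the PR, and the funct6 bits that the table leaves unspecified are not modelled.

```python
# funct3 is selected by the kind of the first source operand (the rs1 field).
FUNCT3 = {"v": "000", "x": "100", "i": "011", "f": "101"}

# Upper four bits of funct6, selected by the number of operand letters.
FUNCT6_HI = {1: "0000", 2: "0010", 3: "1010"}


def vcix_fields(mnemonic: str) -> dict:
    """Return the table fields implied by a mnemonic such as 'sf.vc.v.xv'."""
    parts = mnemonic.split(".")
    if parts[:2] != ["sf", "vc"] or len(parts) not in (3, 4):
        raise ValueError(f"not a VCIX mnemonic: {mnemonic}")
    # A fourth component means the form 'sf.vc.v.<operands>', which writes vd.
    has_dest = len(parts) == 4
    if has_dest and parts[2] != "v":
        raise ValueError(f"not a VCIX mnemonic: {mnemonic}")
    operands = parts[-1]
    if not 1 <= len(operands) <= 3 or operands[0] not in FUNCT3:
        raise ValueError(f"unknown operand pattern: {operands}")
    # Every operand after the first one is a vector register in the table.
    if any(kind != "v" for kind in operands[1:]):
        raise ValueError(f"unknown operand pattern: {operands}")
    # The unary rows in the table only take a scalar or an immediate.
    if len(operands) == 1 and operands not in ("x", "i"):
        raise ValueError(f"unknown operand pattern: {operands}")
    return {
        "funct6_hi": FUNCT6_HI[len(operands)],
        "vm": 0 if has_dest else 1,
        "funct3": FUNCT3[operands[0]],
        "writes_vd": has_dest,
    }
```

For instance, vcix_fields("sf.vc.v.xv") reports vm = 0 and funct3 = "100", matching the corresponding table row.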
The VCIX dialect consists of unary, binary, ternary, and wide.ternary operations and their read-only variants (i.e. variants that have no destination vector register). For example, the sf.vc.v.vv instruction is represented as
%0 = vcix.binary %const, %op2, %rvl {opcode = 3 : i2} : (i5, vector<[4]xf32>, ui32) -> vector<[4]xf32>
The operations of the VCIX dialect accept fixed or scalable vectors when RVV encoding is possible.
The PR also implements conversion of the VCIX dialect to LLVM IR. Because the conversion needs the correct bit width for the VL parameter, which is determined by the target, RV64 is assumed by default. To convert for an RV32 target, the function attribute vcix.target_features="+32bit" must be set.
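The width selection described above amounts to a single check on the feature string. The following Python sketch is only an illustration of that rule: the function name vl_bit_width is hypothetical, and treating the attribute value as a comma-separated feature list is an assumption borrowed from LLVM's target-features convention, not something the PR states.

```python
def vl_bit_width(target_features: str = "") -> int:
    """Return the integer width assumed for the VL parameter.

    RV64 (64-bit VL) is the default; '+32bit' selects RV32 (32-bit VL).
    Assumption: features are comma-separated, as in LLVM's target-features.
    """
    features = [f.strip() for f in target_features.split(",") if f.strip()]
    return 32 if "+32bit" in features else 64
```

With no attribute set the sketch returns 64, and with "+32bit" it returns 32.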
Use in MLIR ecosystem
Because the dialect operates only on scalable or fixed vector types, it cannot be used by high-level dialects that operate on Tensor or MemRef types, such as TOSA, StableHLO, or ONNX.
Example
The following simple example demonstrates a possible conversion of math.exp on fixed and scalable vector types to VCIX operations:
func.func @exp(%arg0: vector<32xf32>) -> vector<32xf32> {
%0 = math.exp %arg0 : vector<32xf32>
return %0 : vector<32xf32>
}
func.func @exp_scalable(%arg0: vector<[16]xf32>, %arg1: ui32) -> vector<[16]xf32> {
%0 = math.exp %arg0 : vector<[16]xf32>
return %0 : vector<[16]xf32>
}
After conversion to VCIX dialect
func.func @exp(%arg0: vector<32xf32>) -> vector<32xf32> {
%const = arith.constant 1 : i5
%0 = vcix.binary %const, %arg0 {opcode = 1 : i2, rs2 = 0 : i5} : (i5, vector<32xf32>) -> vector<32xf32>
return %0 : vector<32xf32>
}
func.func @exp_scalable(%arg0: vector<[16]xf32>, %arg1: ui32) -> vector<[16]xf32> {
%const = arith.constant 1 : i5
%0 = vcix.binary %const, %arg0, %arg1 {opcode = 1 : i2, rs2 = 0 : i5} : (i5, vector<[16]xf32>, ui32) -> vector<[16]xf32>
return %0 : vector<[16]xf32>
}
Compiling this down to machine code with the Zvl256b extension enabled produces:
…
li a0, 32
li a1, 1
vsetvli zero, a0, e32, m4, ta, ma
sf.vc.v.xv 1, v8, v8, a1
…
…
slli a0, a0, 32
srli a0, a0, 32
li a1, 1
vsetvli zero, a0, e32, m8, ta, ma
sf.vc.v.xv 1, v8, v8, a1
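The m4 and m8 register-group settings in the two vsetvli lines follow from the vector types. The Python sketch below reproduces that arithmetic under two assumptions that the post does not state: that the LLVM RISC-V backend defines vscale as VLEN / 64, and that a fixed vector is sized against the minimum VLEN implied by Zvl256b (256 bits). The function name lmul is hypothetical, and the sketch ignores rounding and the legal LMUL range.

```python
from fractions import Fraction

# Assumption: the LLVM RISC-V backend maps one vscale unit to 64 bits of VLEN.
RVV_BITS_PER_BLOCK = 64


def lmul(num_elts: int, elt_bits: int, scalable: bool, min_vlen: int) -> Fraction:
    """Estimate the register-group multiplier (LMUL) for a vector type."""
    if scalable:
        # vector<[N]xT> holds N * bits(T) * (VLEN / 64) bits, so dividing
        # by VLEN leaves N * bits(T) / 64 registers, independent of VLEN.
        return Fraction(num_elts * elt_bits, RVV_BITS_PER_BLOCK)
    # A fixed vector is sized against the guaranteed minimum VLEN.
    return Fraction(num_elts * elt_bits, min_vlen)


fixed = lmul(32, 32, scalable=False, min_vlen=256)     # vector<32xf32>
scalable = lmul(16, 32, scalable=True, min_vlen=256)   # vector<[16]xf32>
```

Under these assumptions the fixed case evaluates to 4 and the scalable case to 8, consistent with the m4 and m8 settings in the listing.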
NOTE: Ideally, the operations should use RVV Dialect instead of scalable vectors to minimize verification and conversion logic.