Hi everyone, I’m an undergraduate student submitting an RFC to the community for the first time. I’m happy to be here to talk to you all. Feel free to comment below if you think there’s a problem. Thanks.
Overview
Previously my research was based on RISC-V AI chips. For a matrix multiplication or convolution operation, the bias addition may be built into the chip. One example is ucb-bar/gemmini: Berkeley’s Spatial Array Generator (github.com). When doing matrix multiplication there, you can pass in a nullptr, which means no bias. When targeting the chip through MLIR, however, you need to pass in a bias even if you don’t have one, because MLIR’s memref dialect doesn’t support nullptr, and this causes unnecessary performance loss. The specific reason is that some chips rely on addresses for data transfer: the chip is implemented using the RoCC interface, its instruction set is an extension of the RISC-V R-type format, and the rs1 and rs2 fields may hold an address.
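To make the nullable-bias convention concrete, here is a minimal C sketch. `matmul_bias` is a hypothetical software stand-in written for this post, not Gemmini's actual API; it only illustrates that a NULL bias lets the callee skip the bias entirely, whereas a caller that cannot express NULL has to allocate and pass a zero-filled buffer to get the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an accelerator matmul on n x n row-major
 * matrices: c = a * b (+ bias). A NULL bias means "no bias", so no
 * bias buffer is allocated or read. */
void matmul_bias(size_t n, const float *a, const float *b,
                 const float *bias, float *c) {
  assert(a != NULL && b != NULL && c != NULL);
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) {
      /* Start from the bias element if one was supplied, else from 0. */
      float acc = bias ? bias[i * n + j] : 0.0f;
      for (size_t k = 0; k < n; ++k)
        acc += a[i * n + k] * b[k * n + j];
      c[i * n + j] = acc;
    }
  }
}
```

Calling `matmul_bias(n, a, b, NULL, c)` and calling it with a zero-filled bias give the same `c`, but only the second call needs the extra buffer and the extra reads. That extra buffer is the overhead the memref-based path currently forces.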
memref.null
Call it that for now. This op produces a memref whose address is nullptr, so taking the address of that memref yields nullptr. Here is the specific test.
// test.mlir
func.func @main() {
  %1 = memref.null : memref<4x4xf32>
  %2 = memref.extract_aligned_pointer_as_index %1 : memref<4x4xf32> -> index
  vector.print %2 : index
  return
}
// ./mlir-opt test.mlir -convert-func-to-llvm -finalize-memref-to-llvm -reconcile-unrealized-casts
module {
  llvm.func @main() {
    %0 = llvm.mlir.constant(4 : index) : i64
    %1 = llvm.mlir.constant(4 : index) : i64
    %2 = llvm.mlir.constant(1 : index) : i64
    %3 = llvm.mlir.constant(16 : index) : i64
    %4 = llvm.mlir.zero : !llvm.ptr
    %5 = llvm.getelementptr %4[16] : (!llvm.ptr) -> !llvm.ptr, f32
    %6 = llvm.ptrtoint %5 : !llvm.ptr to i64
    %7 = llvm.mlir.zero : !llvm.ptr
    %8 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %9 = llvm.insertvalue %7, %8[0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %10 = llvm.insertvalue %7, %9[1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %11 = llvm.mlir.constant(0 : index) : i64
    %12 = llvm.insertvalue %11, %10[2] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %13 = llvm.insertvalue %0, %12[3, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %14 = llvm.insertvalue %1, %13[3, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %15 = llvm.insertvalue %1, %14[4, 0] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %16 = llvm.insertvalue %2, %15[4, 1] : !llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>
    %17 = llvm.ptrtoint %7 : !llvm.ptr to i64
    llvm.return
  }
}
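The lowered IR above fills the usual memref descriptor struct, with both pointer fields set to zero. As a rough illustration, the same descriptor can be written as a C struct. The names `MemRef2DF32`, `make_null_memref_4x4`, and `aligned_pointer_as_index` are made up for this sketch and are not part of MLIR; the field order follows the `!llvm.struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>` type in the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the lowered descriptor for memref<4x4xf32>:
 * (allocated ptr, aligned ptr, offset, sizes[2], strides[2]). */
typedef struct {
  float *allocated;
  float *aligned;
  intptr_t offset;
  intptr_t sizes[2];
  intptr_t strides[2];
} MemRef2DF32;

/* Models the proposed memref.null for a 4x4 memref: the shape and
 * strides are filled in as usual, but both pointers are NULL. */
MemRef2DF32 make_null_memref_4x4(void) {
  MemRef2DF32 m = {NULL, NULL, 0, {4, 4}, {4, 1}};
  return m;
}

/* Models memref.extract_aligned_pointer_as_index: returns the aligned
 * pointer as an integer, which is 0 for a null memref on common targets. */
uintptr_t aligned_pointer_as_index(const MemRef2DF32 *m) {
  assert(m != NULL);
  return (uintptr_t)m->aligned;
}
```

A backend that passes the extracted address in rs1 or rs2 would then see 0 and could treat it as "no bias", which is the behavior the test is meant to demonstrate.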
Advantages
- Enriches the semantics of memref. memref provides semantics in terms of memory, but they are not complete. I think implementing this in memref is a modest change, not a radical one.
- Extends the functionality of memref.extract_aligned_pointer_as_index.
- Many companies are now using MLIR to build their AI compilers, with a wide variety of chip implementations. Since MLIR is a compiler infrastructure, I think it’s useful to provide nullptr functionality at the memref level. Gemmini shouldn’t be a special case, and I shouldn’t be the only one to benefit.