Hi,
I am writing to propose a RISC-V Vector extension (RVV) dialect. The RISC-V vector extension v1.0 release candidate has been published, and LLVM currently supports the stable release v0.10. RVV is rapidly emerging, and I believe applications and optimizations will benefit from its features, but RVV is currently absent from MLIR's architecture-specific vector dialects. In MLIR, there are two types of vector-related dialects:
- Virtual vector level / general vector dialect: Vector Dialect
- Hardware vector level / architecture-specific vector dialects: amx Dialect, x86-vector Dialect, arm-neon Dialect, and arm-sve Dialect.
This RFC proposes an initial RVV dialect. Fortunately, the SVE dialect has already explored scalable vector types and operations, allowing me to refer to its design and simplify the implementation on the RVV side.
Motivation and Goal
RVV is a vector instruction set with scalable vector types, designed for vector architectures. Unlike SIMD, RVV can group multiple registers to provide a scalable vector length. It can also hide the length of the physical vector register and let us set the vector length we are operating on. These features help us avoid many of the disadvantages of SIMD:
- SIMD needs new instructions to support longer vector registers, while RVV instructions are not bound to the vector register length.
- SIMD requires more effort than RVV to deal with loop tails because of its fixed vector length.
- SIMD consumes more power on instruction fetch and decode than RVV because it needs more instructions to process a long vector.
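The tail-handling point above can be sketched in plain C. This is a minimal model, not real RVV code: `set_vl` is a hypothetical stand-in for the `vsetvl` instruction, and the inner loop models a single vector operation of `vl` elements.

```c
#include <stddef.h>

/* Hypothetical stand-in for the vsetvl instruction: returns the number of
 * elements the hardware will process this iteration, capped by both the
 * remaining work and the hardware maximum (VLMAX). */
static size_t set_vl(size_t remaining, size_t vlmax) {
    return remaining < vlmax ? remaining : vlmax;
}

/* Vector-length-agnostic addition: the same loop works for any VLMAX,
 * and the final partial iteration needs no separate scalar tail loop,
 * because set_vl simply returns a shorter vector length at the end. */
void vadd(const int *a, const int *b, int *c, size_t n, size_t vlmax) {
    for (size_t i = 0; i < n;) {
        size_t vl = set_vl(n - i, vlmax); /* elements this iteration */
        for (size_t j = 0; j < vl; ++j)   /* models one vector op */
            c[i + j] = a[i + j] + b[i + j];
        i += vl;
    }
}
```

A fixed-width SIMD version of the same loop would instead need a main loop plus an explicit scalar remainder loop for `n % vlmax` elements.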
RVV can thus do better than SIMD in tasks such as machine learning and multimedia processing. I propose the RVV dialect to expose these vector processing features to MLIR, which gives applications and compilers more optimization options and methods.
RVV Dialect First Patch
I have completed the RFC patch, which includes:
- RVV Dialect Definition
- RVV Scalable Vector Type
- RVV Operations
- RVV Intrinsic Operations
- Translation from RVV Dialect to LLVM Dialect
1. RVV Dialect
(1) RVV Scalable Vector Type
Before introducing the scalable type, let’s see some basic concepts for RVV.
- VLEN: the number of bits in a single vector register.
- ELEN: the maximum size, in bits, of a vector element that any operation can produce or consume.
- SEW: the dynamically selected element width.
- LMUL: the vector length multiplier, i.e., the number of vector registers combined to form a vector register group.
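These parameters determine how many elements a single operation can process: the specification defines VLMAX = LMUL * VLEN / SEW. A small sketch of that formula, with LMUL passed as a fraction so that the fractional settings (MF2/MF4/MF8) are representable:

```c
/* VLMAX = LMUL * VLEN / SEW, the maximum number of elements one
 * vector operation can process. LMUL is given as lmul_num/lmul_den
 * so that fractional register grouping (e.g. 1/8) is expressible. */
static unsigned vlmax(unsigned vlen, unsigned sew,
                      unsigned lmul_num, unsigned lmul_den) {
    return vlen * lmul_num / (sew * lmul_den);
}
```

For example, with VLEN=128, SEW=32, and LMUL=1, each operation covers 4 elements; raising LMUL to 8 raises that to 32.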
The mapping relationship between the RVV and LLVM types can be seen here. The following is the mapping table.
|                 | MF8 (LMUL=1/8) | MF4 (LMUL=1/4) | MF2 (LMUL=1/2) | M1 (LMUL=1) | M2 (LMUL=2) | M4 (LMUL=4) | M8 (LMUL=8) |
|-----------------|----------------|----------------|----------------|-------------|-------------|-------------|-------------|
| i64 (SEW=64)    | N/A            | N/A            | N/A            | nxv1i64     | nxv2i64     | nxv4i64     | nxv8i64     |
| i32 (SEW=32)    | N/A            | N/A            | nxv1i32        | nxv2i32     | nxv4i32     | nxv8i32     | nxv16i32    |
| i16 (SEW=16)    | N/A            | nxv1i16        | nxv2i16        | nxv4i16     | nxv8i16     | nxv16i16    | nxv32i16    |
| i8 (SEW=8)      | nxv1i8         | nxv2i8         | nxv4i8         | nxv8i8      | nxv16i8     | nxv32i8     | nxv64i8     |
| double (SEW=64) | N/A            | N/A            | N/A            | nxv1f64     | nxv2f64     | nxv4f64     | nxv8f64     |
| float (SEW=32)  | N/A            | N/A            | nxv1f32        | nxv2f32     | nxv4f32     | nxv8f32     | nxv16f32    |
| half (SEW=16)   | N/A            | nxv1f16        | nxv2f16        | nxv4f16     | nxv8f16     | nxv16f16    | nxv32f16    |
Therefore, we can infer the number of registers in a group and the element type from the LLVM scalable vector type. Similarly, we also need a scalable vector type in MLIR. The SVE dialect currently has a scalable vector type, but it is a dialect-specific version, so I define an RVV scalable vector type using the same method as on the SVE side. The built-in and scalable vector types share the same syntax but have different semantics.
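The `n` in each `nxv<n>` entry of the table follows a simple pattern: LLVM models RVV with a 64-bit scalable-vector unit (vscale = VLEN / 64), so the minimal element count is n = 64 * LMUL / SEW. A small sketch of that derivation, with LMUL again given as a fraction:

```c
/* Minimal element count n in the LLVM type <vscale x n x iSEW>,
 * assuming the 64-bit-per-vscale convention used in the table above:
 * n = 64 * LMUL / SEW, with LMUL passed as lmul_num/lmul_den. */
static unsigned min_elts(unsigned sew, unsigned lmul_num, unsigned lmul_den) {
    return 64 * lmul_num / (sew * lmul_den);
}
```

For instance, SEW=32 with LMUL=4 gives n = 64 * 4 / 32 = 8, matching the `nxv8i32` entry in the table.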
For example, if we want four vector registers to form a group (LMUL=4) operating on the i32 element type, we can use the following type.
!rvv.vector<8xi32>
Corresponding Type in LLVM Dialect:
!llvm.vec<? x 8 x i32>
Corresponding Type in LLVM IR:
<vscale x 8 x i32>
(2) Operations in RVV Dialect
The operations in RVV dialect can be divided into two categories:
- RVV Operations: interoperate with higher-level abstractions.
- RVV Intrinsic Operations: interoperate with LLVM IR and intrinsic.
In the RFC patch, I define basic arithmetic and memory access operations for the integer types. The arithmetic operations can work with a mask and support the vector-scalar form, which means we can operate on a vector with a scalar under a mask. The following table shows all the operations in my initial version.
RVV Operations | RVV Intrinsic Operations |
---|---|
rvv.load | rvv.intr.vle |
rvv.store | rvv.intr.vse |
rvv.add | rvv.intr.vadd |
rvv.sub | rvv.intr.vsub |
rvv.mul | rvv.intr.vmul |
rvv.div | rvv.intr.vdiv |
rvv.masked.add | rvv.intr.vadd_mask |
rvv.masked.sub | rvv.intr.vsub_mask |
rvv.masked.mul | rvv.intr.vmul_mask |
rvv.masked.div | rvv.intr.vdiv_mask |
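The intended semantics of a masked vector-scalar operation such as `rvv.masked.add` can be sketched with a scalar reference model. This is my own simplified model, not the dialect's implementation: active elements receive `src[i] + scalar`, inactive elements take the masked-off value, and tail policy (elements beyond `vl`) is ignored for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model for a masked vector-scalar addition
 * (rvv.masked.add, lowering to a masked vadd intrinsic):
 * out[i] = mask[i] ? src[i] + scalar : maskedoff[i], for i < vl.
 * Elements at or beyond vl are left untouched in this sketch. */
void masked_add_vx(int64_t *out, const int64_t *maskedoff,
                   const int64_t *src, int64_t scalar,
                   const bool *mask, size_t vl) {
    for (size_t i = 0; i < vl; ++i)
        out[i] = mask[i] ? src[i] + scalar : maskedoff[i];
}
```

The example values used below are assumptions for illustration only; the MLIR example later in this post exercises exactly this pattern.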
2. Lowering Path
There are two steps to lower the RVV operations to LLVM IR:
- RVV operations to RVV intrinsic operations: for the basic arithmetic operations, the conversion is a one-to-one lowering; for the memory access operations, the conversion adds some extra operations to convert a memref into a pointer type.
- RVV intrinsic operations to LLVM IR: RVV intrinsic operations sit at the same abstraction level as LLVM dialect operations. By definition, each RVV intrinsic operation binds one-to-one to an LLVM IR intrinsic, so the translation is naturally a one-to-one mapping.
Here I show the lowering paths of the load and add operations.
RVV Load Operation
%0 = rvv.load %m[%c0], %vl : memref<?xi64>, !rvv.vector<4xi64>, i64
RVV Load Intrinsic Operation
%1 = llvm.extractvalue %arg1[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
%2 = llvm.getelementptr %1[%0] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
%3 = llvm.bitcast %2 : !llvm.ptr<i64> to !llvm.ptr<vec<? x 4 x i64>>
%4 = "rvv.intr.vle"(%3, %arg2) : (!llvm.ptr<vec<? x 4 x i64>>, i64) -> !llvm.vec<? x 4 x i64>
LLVM IR Load Intrinsic
%4 = extractvalue { i64*, i64*, i64, [1 x i64], [1 x i64] } %1, 1
%5 = getelementptr i64, i64* %4, i64 0
%6 = bitcast i64* %5 to <vscale x 4 x i64>*
%7 = call <vscale x 4 x i64> @llvm.riscv.vle.nxv4i64.i64(<vscale x 4 x i64>* %6, i64 %2)
RVV Addition Operation
%0 = rvv.add %a, %b, %vl : !rvv.vector<4xi64>, !rvv.vector<4xi64>, i64
RVV Addition Intrinsic Operation
%0 = "rvv.intr.vadd"(%arg0, %arg1, %arg3) : (!llvm.vec<? x 4 x i64>, !llvm.vec<? x 4 x i64>, i64) -> !llvm.vec<? x 4 x i64>
LLVM IR Addition Intrinsic
%5 = call <vscale x 4 x i64> @llvm.riscv.vadd.nxv4i64.nxv4i64.i64(<vscale x 4 x i64> %0, <vscale x 4 x i64> %1, i64 %3)
The specific tools and commands used on the lowering path can be seen in the next section. How RVV dialect interoperates with higher-level dialects needs to be explored in the future, especially considering scalable vector types.
An Example
To demonstrate an executable version, I prepared an example (including the mask, mixed-precision, and vector-scalar forms). I define an MLIR function to perform an RVV addition operation and call the function from a C++ program to execute it.
func @vadd(%in1: memref<?xi64>, %in2: i32, %out: memref<?xi64>, %maskedoff: memref<?xi64>, %mask: memref<?xi1>) {
%c0 = constant 0 : index
%vl = constant 6 : i64
%input1 = rvv.load %in1[%c0], %vl : memref<?xi64>, !rvv.vector<4xi64>, i64
%off = rvv.load %maskedoff[%c0], %vl : memref<?xi64>, !rvv.vector<4xi64>, i64
%msk = rvv.load %mask[%c0], %vl : memref<?xi1>, !rvv.vector<4xi1>, i64
%output = rvv.masked.add %off, %input1, %in2, %msk, %vl: !rvv.vector<4xi64>, i32, !rvv.vector<4xi1>, i64
rvv.store %output, %out[%c0], %vl : !rvv.vector<4xi64>, memref<?xi64>, i64
return
}
The C++ program can be found here. Now let's start the journey.
Lowering to LLVM Dialect with MLIR Tools
$ <mlir-opt> <mlir file> -convert-vector-to-llvm="enable-rvv" -convert-scf-to-std -convert-memref-to-llvm -convert-std-to-llvm='emit-c-wrappers=1' | <mlir-translate> -mlir-to-llvmir -o <llvm file>
Translate to LLVM IR and Generate Object File with LLVM Tools
$ <llc> -mtriple riscv64 -target-abi lp64d -mattr=+m,+d,+experimental-v <llvm file> --filetype=obj -o <object file>
Compile and Link with RISC-V GNU Compiler Toolchain
$ <riscv64-unknown-linux-gnu-g++> -mabi=lp64d <C++ file> <object file> -o <executable file>
Run and Simulate with QEMU
Note that QEMU should be built from source code (the rvv-intrinsic branch of the RISC-V GNU compiler toolchain).
$ <qemu-riscv64> -L <sysroot path> -cpu rv64,x-v=true <executable file>
Then you can get the result. According to the mask, the first and last elements are the results of adding the scalar to the vector, and the middle four come from the masked-off register.
[ 7 99 99 99 99 17 ]
Future Work
This RFC only includes the basic arithmetic and memory access operations to express the main idea. In the future, the main direction is to explore how the RVV dialect can benefit higher-level dialects and workloads. There will be a project exploring how to improve convolution with the RVV dialect. We will add more RVV operations for our optimization algorithms, and I hope my group can make discoveries and keep improving the RVV dialect.
I am looking forward to receiving comments and suggestions.
Thanks!
Hongbin