[RFC] Add RISC-V Vector Extension (RVV) Dialect

Hi Hongbin,

I’m the author of the ArmSVE dialect, so naturally I’m also interested in scalable vectorization in MLIR, and I’ll be happy to discuss the topic and possible solutions :slight_smile: I see you’ve based this vector dialect on mine, so I thought I should give you a couple of warnings about the decisions I made and their implications, since you have implicitly accepted them for RVV.

First of all, the obvious one is that the dialect is quite disconnected from the rest of the infrastructure. It works as a back-end to generate scalable vector code, but none of the existing transformations will work with it. Adapting existing passes & ops to work with both fixed-length and scalable vectors, even when possible, is not trivial. And, as things stand, you can’t even do that without making those passes dependent on a close-to-hardware back-end dialect (be it RVV or SVE).

I went this way because it was the fastest, easiest, least intrusive way to get started with scalable vectors, but I think we should start thinking about how to promote scalable vectors to a built-in type. There are a bunch of arithmetic and comparison ops that are there as a workaround, simply because the ones in StandardOps won’t accept scalable vectors as operands (again, without making them dependent on a back-end dialect), but all of those are unnecessary and should go if scalable vectors become a built-in type.

This means that there’s a lot of work left to do on the dialect from a maintenance point of view, work that requires a long-term commitment. Correct me if I am wrong, but I believe you’re doing this work as part of an internship; are there any stakeholders on your side who can commit to “inherit” the responsibility once you’ve finished? It might be worth reaching out to people in industry and public research institutions with a long-term interest in RISC-V Vector; now that the extension looks ready to leave the “Draft” state, there should be a few.

That aside, I’ll be happy to discuss and collaborate with you on the topic :smiley:

2 Likes

Looping in @clattner @topperc

Hi Javier,

Thanks for your reply! I am very willing to discuss this topic with you :grin:

Now, I am a PhD student in the PLCT Lab, ISCAS (the Institute of Software, Chinese Academy of Sciences). Supporting RVV in MLIR is part of my work, and I am interested in exploring compilation technology for vector architectures. I have about four years until I graduate, and I can contribute to this direction during that time. My laboratory also has plans for continuous contributions. Our lab has extensive RVV development experience, including the LLVM RVV backend and OpenCV RVV support. Apart from that, we also have a project on the LFX platform to attract more contributors to explore how to make good use of RVV.

Hi @javiersetoain,

That’s true. Implementing and maintaining the RVV dialect is a long-term project, and one contributor cannot get everything done alone. This project is supported by the PLCT Lab. Hongbin Zhang is a PhD candidate who is leading the MLIR-related projects in the PLCT Lab.

My name is Wei Wu, and I’m the director and co-founder of the PLCT Lab. PLCT has an engineering team with 30+ staff and 50+ students, focusing on compilers, simulators and language virtual machines (VMs), and devotes significant effort to fundamental open source projects, especially for the RISC-V ecosystem, including GCC, LLVM, OpenCV, V8 and MLIR. The PLCT Lab is also one of the first Development Partners of RISC-V International, contributing to the implementations of Bitmanip, Scalar Crypto, Zfinx, Zce, and many other unratified specs. We also maintained an RVV implementation in LLVM (spec versions 0.8, 0.9, ~0.10) until early 2021, when we merged our efforts with the team from SiFive and EPI/BSC. We contributed the RVV support for OpenCV, which is believed to be one of the first uses of RVV in a large open source project.

The PLCT Lab has several success stories of continuously contributing to and maintaining open source projects. Take OpenCV as an example: another of our graduates, Yin Zhang, contributed the initial RVV support for OpenCV as a GSoC project in 2020, and he has remained an active contributor since. Furthermore, a new contributor, Liutong Han, has been working on extending the RVV support for OpenCV since 2021. Each project in PLCT has at least one senior staff member supervising it. Mingjie Xing is the senior staff member supervising the RISC-V support projects for MLIR, LLVM, and OpenCV.

Feel free to contact me if you have any further concerns on the long term support. :slight_smile:

3 Likes

Brilliant! :smiley: I believe the dialect is in good hands :slight_smile: Once this has landed I’ll reach out to Hongbin; there’s a lot of work to do around scalable vectors outside of backend dialects, and we should coordinate :slight_smile:

Thanks for taking the time to answer!

2 Likes

FYI, the type mapping from LMUL and SEW to LLVM vscale types falls apart if VLEN==32 instead of >= 64. We haven’t figured out how to address this yet. The implementation defines vscale as VLENB/8, but if VLEN==32 then VLENB==4 and VLENB/8==0. Changing the mapping to support VLEN=32 leaves us no way to encode LMUL=1/8 for SEW=8.
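Spelling out the arithmetic behind that (roughly, and only to illustrate the problem):

; an RVV value with a given SEW and LMUL holds VLEN*LMUL/SEW elements; with
; vscale = VLENB/8 = VLEN/64 that is vscale * (64*LMUL/SEW) elements, e.g.
;   SEW=64, LMUL=1   ->  <vscale x 1 x i64>
;   SEW=8,  LMUL=1/8 ->  <vscale x 1 x i8>
;   SEW=8,  LMUL=8   ->  <vscale x 64 x i8>
; with VLEN==32, VLENB==4 and VLENB/8 == 0, so vscale degenerates to 0;
; redefining vscale = VLEN/32 instead would need half an element per vscale
; for SEW=8, LMUL=1/8, which <vscale x N x i8> cannot express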

Is the plan to support every RISC-V vector operation, or just the basic arithmetic, loads, stores, and conversions? There is an ongoing effort to add intrinsic versions of basic IR instructions that take a mask and vector length argument (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). It might make sense to target those instead of the RISC-V vector intrinsics. In theory they are supposed to work on multiple targets.
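For reference, a predicated add on scalable operands looks roughly like this in LLVM IR (simplified from the LangRef example; the exact overload mangling may differ):

; %mask selects the active lanes, %evl is the explicit vector length
%sum = call <vscale x 2 x i64> @llvm.vp.add.nxv2i64(
           <vscale x 2 x i64> %a, <vscale x 2 x i64> %b,
           <vscale x 2 x i1> %mask, i32 %evl)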

Thanks for the RFC!
I’m trying to support multiple backends with MLIR, and this could be really helpful!

IMO, supporting all the RVV operations is the ideal end state, but we should add frequently used operations first and then gradually support the others. The reason I only implemented basic arithmetic, loads, and stores in the initial patch is that I want to keep the RFC simple enough to show the basic idea (and flexible enough to modify or change direction), and these operations are sufficient to build an executable example.

Thanks for pointing this out! I think this work can help us create a unified vector abstraction layer in MLIR. I will learn more about the details of this work.

1 Like

Is the plan to support every RISC-V vector operation, or just the basic arithmetic, loads, stores, and conversions?

There is no need for hw-specific basic arithmetic operations. Standard arithmetic ops on scalable vectors already map neatly to whatever scalable hardware you want to target through LLVM IR. We should only need hw-specific instructions for those operations that don’t map cleanly onto LLVM IR ones (e.g., matrix multiply or dot products). If we find ourselves with a 1-1 map between an MLIR dialect and a whole ISA, we’re very likely doing something wrong. 99% of the work will be adapting passes to work with scalable vectors and building new passes to deal with scalable vectorization. These dialects should be just an outlet for specialized instructions. The only reason we need them right now is that MLIR builtin vector types are fixed-length only.
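As a sketch of what that could look like once the builtin vector type supports scalable dimensions (the syntax below is tentative), a plain arith op on a scalable vector is all we should need, and only the LLVM backend decides which ISA it targets:

// generic scalable add, no hw-specific op involved
%sum = arith.addi %a, %b : vector<[4]xi32>
// lowers through the LLVM dialect to a plain LLVM IR instruction:
//   %sum = add <vscale x 4 x i32> %a, %b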

There is an ongoing effort to add intrinsic versions of basic IR instructions that take a mask and vector length argument (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). It might make sense to target those instead of the RISC-V vector intrinsics. In theory they are supposed to work on multiple targets.

Indeed, those are the natural target for all masked vector operations. The reason the “masked” instructions in the Arm SVE dialect map to SVE intrinsics (a choice that ended up replicated in RISC-V Vector) is that something was failing in instruction selection; I was advised it’s a work in progress, and I decided to work around it. Eventually, just like basic arithmetic instructions, masked operations in the Vector dialect should map to masked vector operations in LLVM IR. Whether those are fixed-length or scalable vectors, RISC-V or SVE, can be determined by the type of the vector operands in the Vector dialect and the target hw in LLVM, respectively.

1 Like

Please read and provide feedback on [RFC] Add built-in support for scalable vector types. If that patch or something to that effect gets accepted, it would significantly simplify this change, as well as the approval process for it.

Thank you!
Javier

1 Like

The built-in support will be very helpful for the RVV side. I have replied to your RFC and shared my thoughts. In general, I think designing a unified scalable vector type is challenging, and I am very willing to discuss and contribute to this direction :grin:

1 Like

Indeed, I’m counting on that :smiley: Thanks, Hongbin!

I am writing to share the current state of the dialect. This work relies on two ongoing pieces:

  • Built-in Scalable Vector Type

We have discussed this part in @javiersetoain’s RFC. After the patch lands, I will replace the current RVV-specific type with the built-in scalable vector type.

  • Integration Test

The integration tests need lli or mlir-cpu-runner to work for the RISC-V backend. However, RuntimeDyld doesn’t support the RISC-V side at the moment. My teammate suggests that we use JITLink, and we are working to support this. Once the JIT supports the RISC-V backend, the integration testing problem can be solved.

7 Likes

Update

  • Sync to the vector type with scalable dimensions.
  • Set the vta as an attribute.
  • Add setvl operation.
  • Some RISC-V + JIT progress (needed for the integration tests)

Here is the current RISCVV dialect patch.

Sync to the vector type with scalable dimensions.

According to the previous discussion, I synced the type to the built-in vector type with scalable dimensions.
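For example, an old target-specific type now corresponds to a built-in scalable vector type along these lines (under the vscale = VLEN/64 mapping, so treat the exact shape as illustrative):

// !riscvv.vector<!riscvv.m4,i64>  ->  vector<[4]xi64>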

Set the vta as an attribute.

The LLVM intrinsics add a vta argument to let users control the tail policy; see the patch for more details. Here I quote the patch to explain the meaning of tail agnostic and tail undisturbed:

Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation.

Since the vta parameter is a tail policy option, it is more appropriate to model it as an attribute in MLIR. The lowering pass is then responsible for converting the attribute into an intrinsic argument.
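As a rough sketch of the difference (op, attribute, and type syntax here are illustrative, not the exact syntax in the patch):

// vta carried as an attribute instead of an SSA operand
%res = riscvv.masked.add %off, %a, %b, %mask, %vl { vta = 1 : i64 }
         : vector<[4]xi64>, vector<[4]xi64>, vector<[4]xi1>, i64
// the lowering pattern materializes { vta = 1 } as the trailing policy
// argument of the corresponding LLVM intrinsic call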

Add setvl operation.

vsetvli is a useful instruction in the RISC-V vector extension: it sets the vector length according to the AVL, SEW, and LMUL configuration. RVV uses it to achieve a direct and portable strip-mining approach intended to handle a large number of elements. The return value of the instruction is the number of elements processed in a single iteration. vsetvli can therefore drive strip-mined loop iterations, which is different from the SIMD style (using masks for tail processing).

After adding this operation, we can write strip-mining style loop iterations in MLIR for the RVV target. I have prepared an example to show this.

https://gist.github.com/zhanghb97/db87cd22d330ba6424b31c70b135b0ca#file-test-rvv-stripmining-mlir
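A simplified sketch of the strip-mining pattern (op names and types are illustrative; the gist above contains the real, complete example):

// %n is the total number of elements to process
%c0 = arith.constant 0 : index
%last = scf.while (%i = %c0) : (index) -> index {
  %cond = arith.cmpi slt, %i, %n : index
  scf.condition(%cond) %i : index
} do {
^bb0(%i: index):
  // ask the hardware how many elements it will handle in this iteration
  %avl = arith.subi %n, %i : index
  %vl = riscvv.setvl %avl : index        // illustrative syntax
  // ... riscvv.load / compute / riscvv.store on %vl elements ...
  %next = arith.addi %i, %vl : index
  scf.yield %next : index
}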

Some RISC-V + JIT progress (needed for the integration tests)

My teammate has submitted some patches hoping to add JIT support for the RISC-V side.
Here I quote part of his summary to show where the challenge lies:

In RISCV, temporary symbols will be used to generate dwarf, eh_frame sections…, and will be placed in object code’s symbol table. However, LLVM does not use names on these temporary symbols.

For more details, please see his patches:

https://reviews.llvm.org/D116475

https://reviews.llvm.org/D116794

Update

Here is the current RISCVV dialect patch.

Integration Test

  • Build the integration test environment

Currently, there is no RVV hardware available, so an emulator is required for the integration tests. I provide an environment setup document that shows how to build the toolchain and run the integration tests.

  • Test cases

I added three cases for the integration tests.

  1. test-riscvv-arithmetic
  2. test-riscvv-memory
  3. test-riscvv-stripmining

Patterns for the mask/tail policy strategies

There are two strategies to control the mask/tail policy in the RISC-V LLVM IR intrinsics:

  1. Use the “policy” argument at the end of the argument list.
  2. Use the “passthrough” argument at the beginning of the argument list.

I added two patterns (“ConvertPolicyOperandOpToLLVMPattern” and “ConvertPassthruOperandOpToLLVMPattern”) to handle these two strategies.
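In LLVM IR terms, the two shapes look roughly like this (intrinsic names and type mangling are simplified; sketch only):

; strategy 1: masked intrinsic with a trailing "policy" operand
;   %r = call ... @llvm.riscv.vadd.mask(%maskedoff, %a, %b, %mask, %vl, %policy)
; strategy 2: unmasked intrinsic with a leading "passthrough" operand
;   %r = call ... @llvm.riscv.vadd(%passthru, %a, %b, %vl)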

Discussion

  • Unified integration test configurations for the emulator

The emulator configurations for the integration tests are target-specific. I use similar configurations on the RVV side for now, but it seems a little cumbersome. Should we design unified configurations for integration tests that run under emulators?

1 Like

Just wanted to chime in and say thank you for the good work. I can help set up access to actual hardware if you want to test on real hardware with RVV support.

1 Like

Hi @powderluv, thanks a lot for your help! I am very excited to hear that RVV hardware is available, and I would love to test the RFC patch on it! With hardware support, the entire lowering process and the integration testing part can be tested further. Maybe we can discuss the details of how to access the hardware through private messages.

Hi @zhanghb97,
I built the RISCVV dialect patch with the following instructions.

  1. Clone the patch files
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
arc patch D108536
  2. Build local MLIR
cd llvm-project
mkdir build-local-mlir
cd build-local-mlir
cmake -G Ninja ../llvm \
   -DLLVM_ENABLE_PROJECTS=mlir \
   -DLLVM_TARGETS_TO_BUILD="host;RISCV" \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON
ninja check-mlir
  3. Export mlir-opt path
export PATH=/llvm-project/build-local-mlir/bin:$PATH

But while lowering the test example vadd.mlir with the following instructions, some errors occur.

  • vadd.mlir
func @vadd(%in1: memref<?xi64>, %in2: i32, %out: memref<?xi64>, %maskedoff: memref<?xi64>, %mask: memref<?xi1>) {
  %c0 = arith.constant 0 : index
  %vta = arith.constant 1 : i64
  %vl = arith.constant 6 : i64
  %input1 = riscvv.load %in1[%c0], %vl : memref<?xi64>, !riscvv.vector<!riscvv.m4,i64>, i64
  %off = riscvv.load %maskedoff[%c0], %vl : memref<?xi64>, !riscvv.vector<!riscvv.m4,i64>, i64
  %msk = riscvv.load %mask[%c0], %vl : memref<?xi1>, !riscvv.vector<!riscvv.mask16,i1>, i64
  %output = riscvv.masked.add %off, %input1, %in2, %msk, %vl, %vta : !riscvv.vector<!riscvv.m4,i64>, i32, !riscvv.vector<!riscvv.mask16,i1>, i64
  riscvv.store %output, %out[%c0], %vl : !riscvv.vector<!riscvv.m4,i64>, memref<?xi64>, i64
  return
}
  • Lowering instructions
mlir-opt vadd.mlir -convert-vector-to-llvm="enable-riscvv" -convert-scf-to-std -convert-memref-to-llvm -convert-std-to-llvm='emit-c-wrappers=1' | mlir-translate -mlir-to-llvmir -o vadd.ll
mlir-opt: Unknown command line argument '-convert-scf-to-std'.  Try: 'mlir-opt --help'
mlir-opt: Did you mean '--convert-scf-to-cf'?
mlir-opt: Unknown command line argument '-convert-std-to-llvm=emit-c-wrappers=1'.  Try: 'mlir-opt --help'
mlir-opt: Did you mean '--convert-cf-to-llvm=emit-c-wrappers=1'?

So I changed the instructions to

mlir-opt vadd.mlir -convert-vector-to-llvm="enable-riscvv" -convert-scf-to-cf -convert-memref-to-llvm -convert-cf-to-llvm='emit-c-wrappers=1' | mlir-translate -mlir-to-llvmir -o vadd.ll

or

mlir-opt vadd.mlir -convert-vector-to-llvm="enable-riscvv"

And the error is

vadd.mlir:5:65: error: dialect 'riscvv' provides no type parsing hook
  %input1 = riscvv.load %in1[%c0], %vl : memref<?xi64>, !riscvv.vector<!riscvv.m4,i64>, i64

Please give me some help.

For mlir-opt, I also tried the version with the -reconcile-unrealized-casts flag, but the same error occurs.

mlir-opt vadd.mlir -convert-vector-to-llvm="enable-riscvv" -convert-scf-to-cf -convert-memref-to-llvm -convert-cf-to-llvm='emit-c-wrappers=1' -reconcile-unrealized-casts | mlir-translate -mlir-to-llvmir -o vadd.ll
vadd.mlir:5:65: error: dialect 'riscvv' provides no type parsing hook
  %input1 = riscvv.load %in1[%c0], %vl : memref<?xi64>, !riscvv.vector<!riscvv.m4,i64>, i64
mlir-opt vadd.mlir -convert-vector-to-llvm="enable-riscvv" -convert-scf-to-std -convert-memref-to-llvm -convert-std-to-llvm='emit-c-wrappers=1' -reconcile-unrealized-casts | mlir-translate -mlir-to-llvmir -o vadd.ll
mlir-opt: Unknown command line argument '-convert-scf-to-std'.  Try: 'mlir-opt --help'
mlir-opt: Did you mean '--convert-scf-to-cf'?
mlir-opt: Unknown command line argument '-convert-std-to-llvm=emit-c-wrappers=1'.  Try: 'mlir-opt --help'
mlir-opt: Did you mean '--convert-cf-to-llvm=emit-c-wrappers=1'?

Hi @njru8cjo, thanks for reporting this!

I notice that you are using an old version of the example, which still uses the target-specific type (!riscvv.vector<!riscvv.m4,i64>). The RFC patch now uses the unified scalable vector type.

For the latest example, see the integration tests in the patch; you can also find the lowering pass pipeline at the head of the test file. Furthermore, if you want to cross-compile and run the example, please follow this doc to build the RISC-V environment.

1 Like