My goal is to optimize the performance of a softmax kernel. While comparing the LLVM IR generated by the NVPTX backend with that generated by my own GPU backend, I found that NVPTX produces vectorized load and store instructions.
Part of the NVPTX LLVM IR:
%31 = load <4 x half>, ptr addrspace(1) %scevgep91, align 8, !invariant.load !5
%32 = fmul <4 x half> %31, <half 0xH211F, half 0xH211F, half 0xH211F, half 0xH211F>
%33 = fcmp oge <4 x half> %19, %32
%34 = or <4 x i1> %30, %33
%35 = select <4 x i1> %34, <4 x half> %19, <4 x half> %32
%scevgep85 = getelementptr i8, ptr addrspace(1) %scevgep89, i64 %27
%scevgep86 = getelementptr i8, ptr addrspace(1) %scevgep85, i64 -524288
Further analysis showed that the NVPTX backend runs a separate LoadStoreVectorizer pass. However, when I added this pass directly to my own pipeline, I got incorrect results. My question is: what other passes or dependencies need to be enabled to make load and store vectorization work correctly?
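For context, here is a rough sketch of how a back-end can register the pass and answer the queries the vectorizer makes, loosely modeled on what NVPTX does in its TargetMachine. This is a hedged illustration, not a drop-in fix: the "MyGPU" class names are placeholders, and the key point is that LoadStoreVectorizer consults TargetTransformInfo hooks (such as getLoadStoreVecRegBitWidth) to decide how wide a merged access may be. If those hooks, or the target's datalayout alignment rules, overstate what the hardware supports, the pass can form accesses the target cannot actually perform, which is one plausible source of incorrect results.

// Hypothetical sketch (LLVM back-end fragment; "MyGPU" names are placeholders).
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"

void MyGPUPassConfig::addIRPasses() {
  // Run the same IR-level vectorizer NVPTX uses.
  addPass(createLoadStoreVectorizerPass());
  TargetPassConfig::addIRPasses();
}

// One of the TargetTransformInfo hooks the vectorizer consults:
unsigned MyGPUTTIImpl::getLoadStoreVecRegBitWidth(unsigned AddrSpace) const {
  // Widest vector memory access the hardware supports, in bits.
  // Claiming more than the hardware can do is a recipe for miscompiles.
  return 128;
}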
NVPTX is a somewhat odd back-end, so there’s a chance that not everything there is applicable to other back-ends. AMDGPU may be a somewhat better reference.
Is that the IR you want to lower to vector instructions for your target? It looks pretty well vectorized already, so there's not much for the IR-level LoadStoreVectorizer pass to do here.
If your problem is that load <4 x half> gets lowered to four individual loads, then you need to make sure your <target>ISelLowering.cpp marks the operation as Legal or Custom, and that the rest of your back-end handles the lowering of the vector load.
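As a hedged sketch of that last point, the legality of a v4f16 load/store is typically declared in the target's TargetLowering constructor. The "MyGPU" names and the register class are placeholders; the real class, subtarget, and register class names depend on your back-end:

// Hypothetical sketch (LLVM back-end fragment; "MyGPU" names are placeholders).
MyGPUTargetLowering::MyGPUTargetLowering(const TargetMachine &TM,
                                         const MyGPUSubtarget &STI)
    : TargetLowering(TM) {
  // Tell the legalizer that v4f16 values live in some register class...
  addRegisterClass(MVT::v4f16, &MyGPU::VReg64RegClass);
  // ...and that vector loads/stores of that type must not be scalarized.
  setOperationAction(ISD::LOAD,  MVT::v4f16, Legal);
  setOperationAction(ISD::STORE, MVT::v4f16, Legal);
  // Use Custom instead of Legal if the hardware needs special handling,
  // and implement that handling in LowerOperation().
}

With neither Legal nor Custom set, the type legalizer will split the <4 x half> access back into scalar loads, which would undo whatever the IR-level vectorizer achieved.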