Convert NVIDIA GPU LLVM IR (NVVM) alloca instructions to AMDGPU's

Zeyuli · August 15, 2022, 5:49pm

I use clang++ and hipcc to generate LLVM IR for NVIDIA and AMD GPUs. They generate different alloca instructions.
Here is an example:

__device__ void int_a_kernel() {
    int a = 1;
}

LLVM IR for NVIDIA GPUs:

define dso_local void @_Z12int_a_kernelv() #0 {
  %1 = alloca i32, align 4
  store i32 1, i32* %1, align 4
  ret void
}

But the AMDGPU IR carries more memory information:

define dso_local void @_Z12int_a_kernelv() #0 {
  %1 = alloca i32, align 4, addrspace(5)
  %2 = addrspacecast i32 addrspace(5)* %1 to i32*
  store i32 1, i32* %2, align 4
  ret void
}

Is there a way to convert LLVM IR for NVIDIA GPUs to IR for AMDGPU, or a way to add the addrspace and addrspacecast instructions?

jvesely · August 15, 2022, 6:00pm

Clang has an undocumented __attribute__((address_space(N))); you can try using that.
See [0] for the semantics of different address space numbers.

[0] clang: include/clang/Basic/AddressSpaces.h Source File

Zeyuli · August 15, 2022, 6:07pm

Thank you very much for your reply and the useful information you provided. But I would prefer to have a way to handle LLVM IR directly.

Artem-B · August 15, 2022, 6:24pm

I believe NVPTX does eventually add device-specific ASCs – see llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

Zeyuli · August 16, 2022, 2:32am

Thanks! I think that will help me a lot

arsenm · September 8, 2022, 4:08pm

It’s not more information, it’s a fundamental part of the value.

The NVPTX alloca handling is basically an old hack to avoid changing the IR: alloca keeps producing a generic pointer even though the value is really allocated in ADDRESS_SPACE_LOCAL. NVPTXLowerAlloca introduces a pair of casts to recover some of the optimization benefit of the specific address space. Really replacing the address space of the value requires transitively rewriting all users.
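
For readers who only need the IR shape that hipcc emits earlier in this thread, here is a minimal textual sketch (not an LLVM pass, and not how either backend does it). It moves each simple alloca into addrspace(5) and immediately casts it back to a generic pointer under the original name, so existing users stay untouched and no transitive rewrite is needed. The function name rewrite_allocas and the %priv. prefix are hypothetical. It assumes typed-pointer IR as shown in this thread, matches only the form "%x = alloca <type>, align <n>", and does not change the target triple or data layout, which a real conversion would presumably also need.

```python
import re

# Matches only the simplest form: "%x = alloca <type>, align <n>".
# Allocas with an element count, a literal struct type, or an existing
# addrspace(...) suffix do not match and are left unchanged.
ALLOCA = re.compile(r"^(\s*)%([\w.$-]+) = alloca ([^,]+), align (\d+)\s*$")


def rewrite_allocas(ir: str, private_as: int = 5) -> str:
    """Rewrite simple allocas to the addrspace + addrspacecast pattern.

    The alloca gets a new name (hypothetical "%priv." prefix; a real tool
    would have to check for name collisions), and the original name is
    redefined as the generic pointer so later instructions still work.
    """
    out = []
    for line in ir.splitlines():
        m = ALLOCA.match(line)
        if m is None:
            out.append(line)
            continue
        indent, name, ty, align = m.groups()
        priv = f"%priv.{name}"
        out.append(
            f"{indent}{priv} = alloca {ty}, align {align}, addrspace({private_as})"
        )
        out.append(
            f"{indent}%{name} = addrspacecast {ty} addrspace({private_as})* "
            f"{priv} to {ty}*"
        )
    return "\n".join(out)
```

Calling rewrite_allocas on the NVIDIA function from the first post yields an alloca in addrspace(5) followed by an addrspacecast that defines %1, with the store left as it was. Running it a second time changes nothing, because the rewritten alloca no longer matches the pattern. For anything beyond experiments, the same idea would be better expressed as a pass over the in-memory IR than as a regex over text.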

Zeyuli · September 9, 2022, 5:59am

Thanks! This conflicts somewhat with my intuition for optimizing CUDA programs. For instance, when I use shared memory to accelerate repeated memory accesses, the pointer to shared memory is converted to a generic pointer in the LLVM IR. However, it still works.

Zeyuli · September 9, 2022, 6:00am

I guess NVVM uses annotations to carry the address space information.
