How to add a new memory qualifier like CUDA's "__constant__"

Hi, I'm a beginner to Clang. I want to add some new kinds of memory and support them in compilation, just like CUDA's
"__constant__" and "__global__".
I have studied some test cases such as clang/test/CodeGenCUDA/address-spaces.cu:

// RUN: %clang_cc1 -emit-llvm %s -o - -fcuda-is-device -triple nvptx-unknown-unknown | FileCheck %s
// RUN: %clang_cc1 -emit-llvm %s -o - -fcuda-is-device -triple amdgcn | FileCheck %s

// Verifies Clang emits correct address spaces and addrspacecast instructions
// for CUDA code.

#include "Inputs/cuda.h"

// CHECK: @i ={{.*}} addrspace(1) externally_initialized global
__device__ int i;

// CHECK: @j ={{.*}} addrspace(4) externally_initialized global
__constant__ int j;

// CHECK: @k ={{.*}} addrspace(3) global
__shared__ int k;

struct MyStruct {
  int data1;
  int data2;
};

// CHECK: @_ZZ5func0vE1a = internal addrspace(3) global %struct.MyStruct undef
// CHECK: @_ZZ5func1vE1a = internal addrspace(3) global float undef
// CHECK: @_ZZ5func2vE1a = internal addrspace(3) global [256 x float] undef
// CHECK: @_ZZ5func3vE1a = internal addrspace(3) global float undef
// CHECK: @_ZZ5func4vE1a = internal addrspace(3) global float undef
// CHECK: @b ={{.*}} addrspace(3) global float undef

__device__ void foo() {
  // CHECK: load i32, ptr addrspacecast (ptr addrspace(1) @i to ptr)
  i++;

  // CHECK: load i32, ptr addrspacecast (ptr addrspace(4) @j to ptr)
  j++;

  // CHECK: load i32, ptr addrspacecast (ptr addrspace(3) @k to ptr)
  k++;

  __shared__ int lk;
  // CHECK: load i32, ptr addrspacecast (ptr addrspace(3) @_ZZ3foovE2lk to ptr)
  lk++;
}

Some of the definitions in Inputs/cuda.h:

/* Minimal declarations for CUDA support. Testing purposes only. */

#include <stddef.h>

#if __HIP__ || __CUDA__

#define __constant__ __attribute__((constant))

#define __device__ __attribute__((device))

#define __global__ __attribute__((global))

#define __host__ __attribute__((host))

#define __shared__ __attribute__((shared))

#if __HIP__

#define __managed__ __attribute__((managed))

#endif

#define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))

#else

#define __constant__

#define __device__

#define __global__

#define __host__

#define __shared__

#define __managed__

#define __launch_bounds__(...)

#endif

The output LLVM IR:

; ModuleID = 'sample.cu'
source_filename = "sample.cu"
target datalayout = "e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx-unknown-unknown"

@i = addrspace(1) externally_initialized global i32 0, align 4
@j = addrspace(4) externally_initialized global i32 0, align 4
@k = addrspace(3) global i32 undef, align 4
@_ZZ3foovE2lk = internal addrspace(3) global i32 undef, align 4
@llvm.compiler.used = appending global [2 x ptr] [ptr addrspacecast (ptr addrspace(1) @i to ptr), ptr addrspacecast (ptr addrspace(4) @j to ptr)], section "llvm.metadata"

I tried imitating CUDA and made an attempt of my own:

#define SIZE 10

#define __wBuff__ __attribute__((wBuff))

int aaa[SIZE]={0};
int bbb[SIZE]={1};
__wBuff__ int ddd[SIZE];

Clang emits a warning, and the output LLVM IR has no addrspace qualifier:

test.c:12:1: warning: unknown attribute 'wBuff' ignored [-Wunknown-attributes]
__wBuff__ int ddd[SIZE];
^~~~~~~~~
test.c:8:34: note: expanded from macro '__wBuff__'
#define __wBuff__ __attribute__((wBuff))
                                 ^~~~~
1 warning generated.

So how should I define 'wBuff' so that it generates the expected LLVM IR? Are there any documents or examples I can reference?
I would greatly appreciate any help.

Address spaces are qualifiers in LLVM IR that instruct the backend to generate different kinds of instructions for different kinds of memory. See the "User Guide for NVPTX Back-end" in the LLVM documentation for the numerical values that the NVPTX backend supports. The CUDA constant attribute, for example, is just a special attribute that puts the global in address space 4, which the backend treats as constant memory.

In this example I'm just using the OpenCL attributes in lieu of the CUDA ones, but they do effectively the same thing (see Compiler Explorer). Note how the different address spaces on the globals cause the backend to emit different assembly to support them.

#define __wBuff__ __attribute__((wBuff))

This isn't a recognized attribute. If you really want to, you can use __attribute__((address_space(N))) where N is some number, but that is very likely to break, because the numerical meaning depends entirely on the backend.

Does that answer your question? I can provide more examples if you’re still confused.

@jhuber6
first , thank you very much for your help.
I made modifications based on your suggestions,

#define SIZE 10

#define __wBuff__ __attribute__((address_space(1)))

int aaa[SIZE]={0};
int bbb[SIZE]={1};
__wBuff__ int ddd[SIZE];

and got the following LLVM IR:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

@aaa = dso_local global [10 x i32] zeroinitializer, align 4
@bbb = dso_local global <{ i32, [9 x i32] }> <{ i32 1, [9 x i32] zeroinitializer }>, align 4
@ddd = dso_local addrspace(1) global [10 x i32] zeroinitializer, align 4

Please allow me to ask another question:

Is a definition like #define __wBuff__ __attribute__((address_space(1))) enough to generate the LLVM IR I want without any core development in Clang? Or should I do something like what CUDA does in
clang/include/clang/Basic/Attr.td:

// CUDA attributes are spelled __attribute__((attr)) or __declspec(__attr__),
// and they do not receive a [[]] spelling.
def CUDAConstant : InheritableAttr {
  let Spellings = [GNU<"constant">, Declspec<"__constant__">];
  let Subjects = SubjectList<[Var]>;
  let LangOpts = [CUDA];
  let Documentation = [Undocumented];
}

clang/lib/Sema/SemaDeclAttr.cpp:

  case ParsedAttr::AT_CUDAConstant:
    handleConstantAttr(S, D, AL);
    break;
------------------------------------------------------------
static void handleConstantAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
  const auto *VD = cast<VarDecl>(D);
  if (VD->hasLocalStorage()) {
    S.Diag(AL.getLoc(), diag::err_cuda_nonstatic_constdev);
    return;
  }
  // constexpr variable may already get an implicit constant attr, which should
  // be replaced by the explicit constant attr.
  if (auto *A = D->getAttr<CUDAConstantAttr>()) {
    if (!A->isImplicit())
      return;
    D->dropAttr<CUDAConstantAttr>();
  }
  D->addAttr(::new (S.Context) CUDAConstantAttr(S.Context, AL));
}

If you want to communicate to the backend that the variable is in the global memory space then yes, it is sufficient. The CUDA attribute provides a more direct implementation and has more semantic checks associated with it, but fundamentally it ends up looking the same to the backend.

I do not have a full understanding of what you are trying to do, so I can't make any suggestions beyond that. Here's a place in-tree that uses these attributes directly: llvm-project/libc/src/__support/GPU/nvptx/utils.h. But you are probably going to have a much better time sticking with something like CUDA. My suggestion is to check the output PTX and see if it matches what you expect it to be.
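If you do eventually want a first-class attribute with its own semantic checks, a hypothetical Attr.td entry modeled on CUDAConstant might look like the sketch below. This is untested and the names are made up; you would also need a handler in SemaDeclAttr.cpp (like handleConstantAttr above) plus CodeGen support mapping the attribute to your address space:

```
def WBuff : InheritableAttr {
  let Spellings = [GNU<"wBuff">, Declspec<"__wBuff__">];
  let Subjects = SubjectList<[Var]>;
  let Documentation = [Undocumented];
}
```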

@jhuber6

If you want to communicate to the backend that the variable is in the global memory space then yes, it is sufficient.

That's great!

I do not have a full understanding of what you are trying to do, so I can’t make any suggestions beyond that.

Simply put, I want to add some static memory regions to a RISC-V target and have the compiler support compiling for them, with corresponding address segments in the generated binary.

RISC-V? I'm not an expert, but I don't think RISC-V has a concept of address spaces the way GPU targets do. Everything there would just go into the default address space.

@jhuber6
My description was not accurate. RISC-V is only one part of our DSA architecture, similar to Google's TPU v5: the RISC-V core performs scalar and vector calculations, and in addition to the default address space, additional memory blocks need to be added for tensor calculations.