[RFC] Add per-global code model attribute

Hello everyone,

I would like to propose adding a code model attribute for global variables, which can override the target’s default code model.

Motivation

There is an edge case where only certain global variables need a larger code model, while everything else can use a smaller one. Restricting the larger code model to those variables reduces overhead.

Consider the Linux kernel module percpu case on LoongArch:

static __attribute__((section(".data..percpu" ""))) int pcpu;

int fun(void)
{
    return pcpu;
}

For statically linked code, the small code model sequence generated to access pcpu is sufficient, because the variable is within reach of the code in the same link unit. The corner case with kernel modules is that after the module is loaded (dynamic linking), the actual location of the per-cpu variables can be so far from the code that the default (small) code model no longer works. This only happens for per-cpu variables, so changing the code model for just those variables, rather than for everything, helps reduce overhead.

Proposal

In a previous discussion created by @xen0n, @aeubanks proposed adding a code model attribute to global variables. Backends will need to be updated to respect this.

@pcpu = internal global i32 0, section ".data..percpu", code_model "large"

How does it work?

For the LoongArch case, the backend generates the large code model access sequence for the pcpu_large variable, while pcpu_small keeps the default (small) sequence.

static __attribute__((section(".data..percpu" ""))) int pcpu_small;

int fun_small(void)
{
    return pcpu_small;
}

static __attribute__((section(".data..percpu" ""))) __attribute__((model("large"))) int pcpu_large;

int fun_large(void)
{
    return pcpu_large;
}

@pcpu_small = internal global i32 0, align 4

define dso_local signext i32 @fun_small() #0 {
; CHECK-LABEL: fun_small:
; CHECK:       # %bb.0:
; CHECK-NEXT:    pcalau12i $a0, %pc_hi20(pcpu_small)
; CHECK-NEXT:    addi.d $a0, $a0, %pc_lo12(pcpu_small)
; CHECK-NEXT:    ld.w $a0, $a0, 0
; CHECK-NEXT:    ret
  %1 = load i32, ptr @pcpu_small, align 4
  ret i32 %1
}

@pcpu_large = internal global i32 0, code_model "large", align 4

define dso_local signext i32 @fun_large() #0 {
; CHECK-LABEL: fun_large:
; CHECK:       # %bb.0:
; CHECK-NEXT:    pcalau12i $a0, %pc_hi20(pcpu_large)
; CHECK-NEXT:    addi.d $a1, $zero, %pc_lo12(pcpu_large)
; CHECK-NEXT:    lu32i.d $a1, %pc64_lo20(pcpu_large)
; CHECK-NEXT:    lu52i.d $a1, $a1, %pc64_hi12(pcpu_large)
; CHECK-NEXT:    add.d $a0, $a1, $a0
; CHECK-NEXT:    ld.w $a0, $a0, 0
; CHECK-NEXT:    ret
  %1 = load i32, ptr @pcpu_large, align 4
  ret i32 %1
}

Pull requests

Part 1/3: [llvm][IR] Add per-global code model attribute
Part 2/3: [clang] Add per-global code model attribute
Part 3/3: [llvm][LoongArch] Get the code model from the global object

We (Google) also want something like this for x86-64’s medium code model, which has a small/large (near/far) data split based on the size of the global. We want to be able to specify that a global variable is unconditionally small or large regardless of its size. The annoying thing about mapping this onto your proposed code_model is that the split is specific to the medium code model (although the large code model can also split small/large data for some reason). Since, across all x86-64 code models, all that matters for a specific global variable or function is whether or not it’s considered “large”, we could overload code_model to mean something slightly different on x86-64: code_model = "small" means the global is unconditionally small, and code_model = "large" means the global is unconditionally large.

Summarizing the x86-64 code models:
small: small text and small data
medium: small text and large data, unless the data is smaller than some threshold
large: large text and large data (data can also be split into small/large?)
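
A rough sketch of how the overloaded meaning could be decided, assuming the summary above. The enum names, the function name, and the threshold parameter are all hypothetical and do not correspond to any existing LLVM API; the large code model is simplified here to "always large" even though it may also split data:

```c
#include <stdbool.h>
#include <stddef.h>

/* Target-wide code model chosen for the whole compilation. */
enum code_model { CM_SMALL, CM_MEDIUM, CM_LARGE };

/* Hypothetical per-global override: none, forced small, or forced large. */
enum global_override { OVR_NONE, OVR_SMALL, OVR_LARGE };

/* Returns true if the global should be treated as "large" (far) data.
 * A per-global override always wins; otherwise the target-wide code
 * model decides, with the medium model using a size threshold. */
bool is_large_data(enum code_model cm, enum global_override ovr,
                   size_t size, size_t threshold)
{
    if (ovr == OVR_SMALL)
        return false;
    if (ovr == OVR_LARGE)
        return true;

    switch (cm) {
    case CM_SMALL:
        return false;
    case CM_MEDIUM:
        /* Small only when the data is below the threshold. */
        return size >= threshold;
    case CM_LARGE:
        /* Simplification: ignores any small/large split here. */
        return true;
    }
    return false;
}
```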

This RFC should state that this goes on a GlobalObject.

I think it’s important to make a distinction between a “code model” and where a global is laid out (and by extension what code sequence is used to access it). It sounds like for LoongArch there’s a one to one mapping between the two, but as I said before that’s not true for x86-64. x86-64’s medium code model can consider a global to be either “large” or “small” and lay it out differently.

It’s possible that some architecture has a bunch of different ways to lay out and access globals for performance reasons and using the existing code models is insufficient for that. In that case we can extend the values that code_model accepts to also include architecture-specific values (e.g. “x86-64-humongous” if we needed it). But reusing the existing code models is likely convenient for most use cases, as your PR 72079 shows.

But specifically for LoongArch’s “percpu” case, if “percpu” is known to be special then perhaps the backend should automatically infer “large” if it sees that the section name contains “percpu” and we wouldn’t need this whole mechanism for your case?
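
To make the idea concrete, such an inference would amount to a substring check on the section name. This is only an illustrative sketch; the function name is hypothetical and nothing like it is claimed to exist in the backend:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: guess that a global needs the large code model
 * purely from its section name containing "percpu". */
bool section_implies_large(const char *section)
{
    return section != NULL && strstr(section, "percpu") != NULL;
}
```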

Sorry for the late reply.

I believe extending the code model’s values is a fantastic approach to expanding functionality and meeting requirements similar to those of x86.

For the percpu case in LoongArch, relying on section names for inference might not be a reliable approach due to potential instability in the internal implementation of the Linux kernel. This concern has been mentioned in prior discussions, including those predating GCC.

If percpu is an internal implementation detail of just the kernel and not LoongArch in general, then this makes sense.