Optimization of the generated LLVM IR for a function that returns a struct type

I wrote a function that returns the following structure and compiled it to LLVM IR using clang++ with -O3.

#include <cstdint>

typedef struct {
    uint64_t a;
    uint32_t b;
} BBB;

[[gnu::used]] BBB fun2(uint64_t a, uint32_t b) {
    return BBB{a, b};
}

As a result, the following LLVM IR was generated, and this code performs no memory accesses (at least in the LLVM IR) and calculates using only virtual registers.

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable
define dso_local [2 x i64] @_Z4fun2mj(i64 noundef %0, i32 noundef %1) #4 {
  %3 = insertvalue [2 x i64] poison, i64 %0, 0
  %4 = zext i32 %1 to i64
  %5 = insertvalue [2 x i64] %3, i64 %4, 1
  ret [2 x i64] %5
}

However, when I compiled a function that returns a structure with three members in the same way:

typedef struct {
  uint64_t a;
  uint64_t b;
  uint64_t c;
} BIGS;

[[gnu::used]] BIGS func3(uint64_t a, uint64_t b, uint64_t c) {
  return BIGS{a, b, c};
}

The following code was generated. This code adds an unnecessary pointer as an argument and performs memory accesses. After some experiments, I found that this happens when the number of members is three or more. Why does this result occur? I thought that performing memory accesses would generally make it slower. Could anyone give me some advice?

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable
define dso_local void @_Z5func3mmm(ptr noalias nocapture writeonly sret(%struct.BIGS) align 8 %0, i64 noundef %1, i64 noundef %2, i64 noundef %3) #5 {
  store i64 %1, ptr %0, align 8, !tbaa !16
  %5 = getelementptr inbounds %struct.BIGS, ptr %0, i64 0, i32 1
  store i64 %2, ptr %5, align 8, !tbaa !18
  %6 = getelementptr inbounds %struct.BIGS, ptr %0, i64 0, i32 2
  store i64 %3, ptr %6, align 8, !tbaa !19
  ret void
}

It’s actually related to the ABI of the architecture you are targeting, which I assume is x86_64. Here is the description from Wikipedia (my apologies for the lack of a better source) of how a struct is returned under the x86_64 SysV ABI: “…Struct and union return types with sizes of two pointers or fewer are returned in RAX and RDX (or XMM0 and XMM1). When an oversized struct return is needed, another pointer to a caller-provided space is prepended as the first argument…”
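
To make the size rule quoted above concrete, here is a minimal sketch that applies it to the two structs from the question. It assumes a typical 64-bit target where two pointers occupy 16 bytes; `kRegisterReturnLimit` and `fits_in_return_registers` are hypothetical names introduced for illustration, and the check is size-only, whereas the real ABI classification has further rules (floating-point members, non-trivial copy constructors, and so on).

```cpp
#include <cstddef>
#include <cstdint>

struct BBB  { uint64_t a; uint32_t b; };              // 8 + 4 bytes, padded to 16 on typical 64-bit targets
struct BIGS { uint64_t a; uint64_t b; uint64_t c; };  // 3 * 8 = 24 bytes

// Assumption: the limit is "two pointers", i.e. 16 bytes on a 64-bit target.
constexpr std::size_t kRegisterReturnLimit = 16;

// Hypothetical helper: a simplified, size-only version of the ABI rule.
// It ignores everything else the real classification looks at.
constexpr bool fits_in_return_registers(std::size_t size) {
    return size <= kRegisterReturnLimit;
}

// BBB is small enough to come back in two general-purpose registers,
// which matches the [2 x i64] return type in the first IR listing.
static_assert(fits_in_return_registers(sizeof(BBB)), "BBB should fit");

// BIGS exceeds the limit, so the caller passes a hidden pointer instead,
// which matches the sret parameter in the second IR listing.
static_assert(!fits_in_return_registers(sizeof(BIGS)), "BIGS should not fit");
```

Under this reading, the trigger is the total size rather than the number of members: a struct of three `uint32_t` fields is only 12 bytes, so it would be expected to stay under the limit.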

In your second example, the pointer has the attribute sret. You can search for sret in this document for more details:
https://llvm.org/docs/LangRef.html
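
As a rough illustration of what the sret parameter means, the sketch below writes the same function twice in C++: once returning by value, and once with the hidden pointer made explicit. `func3_out` is a hypothetical hand-written equivalent, not something Clang emits under that name; it only mirrors the shape of the three stores in the IR above.

```cpp
#include <cstdint>

struct BIGS { uint64_t a; uint64_t b; uint64_t c; };

// What the source code says: return the struct by value.
BIGS func3(uint64_t a, uint64_t b, uint64_t c) {
    return BIGS{a, b, c};
}

// Hypothetical hand-written equivalent of the sret form: the caller
// provides the storage and the callee writes the three fields into it.
void func3_out(BIGS* out, uint64_t a, uint64_t b, uint64_t c) {
    out->a = a;  // corresponds to the store at offset 0
    out->b = b;  // corresponds to the store to field 1
    out->c = c;  // corresponds to the store to field 2
}
```

The stores are only needed at a real call boundary. When the call is inlined, the optimizer can usually keep the fields in registers, so the memory traffic tends to matter less than the IR of the standalone function suggests.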

@mshockwave Thank you for your response. I compiled this code for ARM64, so the behavior is probably due to ABI rules similar to those of x86-64. Additionally, I think it is possible to use the LLVM IR API to forcibly define a function that returns a relatively large structure using only virtual registers. For such a function, however, is LLVM unlikely to allocate registers so that the return value is computed entirely in physical registers?

This is a well-known limitation of LLVM. Clang, and only Clang, knows the ABIs for Arm32, AArch64, AArch64-Darwin, 32-bit x86, 64-bit x86, 64-bit x86 Windows, …
You can fiddle with LLVM IR, but you should leave the ABI decisions to Clang.
