About the optimization of the generated LLVM IR function that returns struct type

Masashi · July 23, 2024, 7:45pm

I wrote a function that returns the following structure and compiled it to LLVM IR using clang++ with -O3.

#include <cstdint>

typedef struct {
    uint64_t a;
    uint32_t b;
} BBB;

[[gnu::used]] BBB fun2(uint64_t a, uint32_t b) {
    return BBB{a, b};
}

As a result, the following LLVM IR was generated. This code performs no memory accesses (at least at the LLVM IR level) and computes its result using only virtual registers.

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable
define dso_local [2 x i64] @_Z4fun2mj(i64 noundef %0, i32 noundef %1) #4 {
  %3 = insertvalue [2 x i64] poison, i64 %0, 0
  %4 = zext i32 %1 to i64
  %5 = insertvalue [2 x i64] %3, i64 %4, 1
  ret [2 x i64] %5
}
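For reference, the [2 x i64] return value above can be modeled in C++ as two 64-bit words. This is only a sketch: `TwoWords` and `coerce_bbb` are hypothetical names, and the layout simply mirrors the insertvalue/zext instructions in the listing rather than any Clang API.

```cpp
#include <cstdint>

typedef struct {
    uint64_t a;
    uint32_t b;
} BBB;

// Hypothetical model of the [2 x i64] return value: two 64-bit words.
struct TwoWords {
    uint64_t w[2];
};

// Word 0 holds `a` (insertvalue ..., 0); word 1 holds `b` zero-extended
// to 64 bits (zext i32 %1 to i64, then insertvalue ..., 1).
constexpr TwoWords coerce_bbb(BBB s) {
    return TwoWords{{s.a, static_cast<uint64_t>(s.b)}};
}
```

Because BBB occupies 16 bytes, both words fit in two integer return registers, so no memory is involved.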

However, when I compiled a function that returns a structure with three members in the same way:

typedef struct {
  uint64_t a;
  uint64_t b;
  uint64_t c;
} BIGS;

[[gnu::used]] BIGS func3(uint64_t a, uint64_t b, uint64_t c) {
  return BIGS{a, b, c};
}

The following code was generated. It adds a seemingly unnecessary pointer argument and performs memory accesses. After some experiments, I found that this happens when the struct has three or more members. Why does this happen? I thought that memory accesses would generally make the code slower. Could anyone give me some advice?

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable
define dso_local void @_Z5func3mmm(ptr noalias nocapture writeonly sret(%struct.BIGS) align 8 %0, i64 noundef %1, i64 noundef %2, i64 noundef %3) #5 {
  store i64 %1, ptr %0, align 8, !tbaa !16
  %5 = getelementptr inbounds %struct.BIGS, ptr %0, i64 0, i32 1
  store i64 %2, ptr %5, align 8, !tbaa !18
  %6 = getelementptr inbounds %struct.BIGS, ptr %0, i64 0, i32 2
  store i64 %3, ptr %6, align 8, !tbaa !19
  ret void
}

mshockwave · July 23, 2024, 8:15pm

It’s actually related to the ABI of the architecture you are targeting, which I assumed is x86_64. Here is the description from Wikipedia (my apologies for the lack of a better source) regarding returning a struct in the x86_64 SysV ABI: “…Struct and union return types with sizes of two pointers or fewer are returned in RAX and RDX (or XMM0 and XMM1). When an oversized struct return is needed, another pointer to a caller-provided space is prepended as the first argument…”
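Under this rule the deciding factor is the struct's size, not its member count: three uint64_t members simply happen to exceed 16 bytes. The sketch below checks the sizes with static_assert, assuming a 64-bit target (LP64, as on x86-64 and ARM64). `SMALL3` and `fits_in_two_regs` are hypothetical names, and the one-line size rule is a simplification that ignores floating-point-only aggregates (which AArch64 can return in up to four SIMD registers) and C++ types that are not trivially copyable.

```cpp
#include <cstddef>
#include <cstdint>

typedef struct {
    uint64_t a;
    uint32_t b;
} BBB;     // 8 + 4 bytes, padded to 16 on a 64-bit target

typedef struct {
    uint64_t a;
    uint64_t b;
    uint64_t c;
} BIGS;    // 24 bytes

// Hypothetical three-member struct that still fits in 16 bytes.
typedef struct {
    uint32_t a;
    uint32_t b;
    uint32_t c;
} SMALL3;  // 12 bytes

// Simplified rule (assumption): integer aggregates of at most 16 bytes come
// back in two registers (RAX/RDX on x86-64, x0/x1 on ARM64); larger ones go
// through a caller-provided (sret) pointer.
constexpr bool fits_in_two_regs(std::size_t size) { return size <= 16; }

static_assert(sizeof(BBB) == 16, "BBB is padded to 16 bytes");
static_assert(sizeof(BIGS) == 24, "BIGS needs 24 bytes");
static_assert(fits_in_two_regs(sizeof(BBB)), "BBB: returned in registers");
static_assert(!fits_in_two_regs(sizeof(BIGS)), "BIGS: returned via sret");
```

If the simplified rule holds, a function returning SMALL3 should still get a register return despite having three members; inspecting the -O3 IR on a given target would confirm that.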

tschuett · July 23, 2024, 8:19pm

In your second example, the pointer has the attribute sret. You can search for sret in this document for more details:
https://llvm.org/docs/LangRef.html
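As a rough picture of what sret means, here is a hand-written C++ equivalent of the second IR listing: the caller owns the storage and passes its address as an extra pointer parameter (the first parameter in the IR), and the callee only stores the three fields through it. `func3_sret` and `use_func3_result` are hypothetical names for illustration, not what Clang emits; they are marked constexpr only so the result can be checked at compile time.

```cpp
#include <cstdint>

typedef struct {
    uint64_t a;
    uint64_t b;
    uint64_t c;
} BIGS;

// Hypothetical hand-lowered form of func3: the result is written through
// a pointer to caller-provided space instead of being returned in registers.
constexpr void func3_sret(BIGS* out, uint64_t a, uint64_t b, uint64_t c) {
    out->a = a;  // store i64 %1, ptr %0
    out->b = b;  // store i64 %2 into field 1
    out->c = c;  // store i64 %3 into field 2
}

// Roughly what a call site of `BIGS r = func3(1, 2, 3);` amounts to.
constexpr uint64_t use_func3_result() {
    BIGS result{};                 // caller-provided space, typically a stack slot
    func3_sret(&result, 1, 2, 3);  // address of that space passed as the extra pointer
    return result.a + result.b + result.c;
}
```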

Masashi · July 24, 2024, 12:15am

@mshockwave Thank you for your response. I verified this code on ARM64, so the cause is probably a similar rule in the ARM64 ABI. Additionally, I think it is possible to use the LLVM IR API to force a function to return a relatively large structure using only virtual registers. However, for such a function, is it unlikely that LLVM's register allocation would compute the return value entirely in physical registers?

tschuett · July 24, 2024, 5:53am

This is a well-known limitation of LLVM. Clang, and only Clang, knows the ABIs for Arm32, AArch64, AArch64-Darwin, 32-bit X86, 64-bit X86, 64-bit X86 Windows, …
You can fiddle with LLVM-IR, but you should leave the ABI decisions to Clang.
