I wrote a function that returns the following struct by value and compiled it to LLVM IR using clang++ with -O3:
#include <cstdint>
typedef struct {
    uint64_t a;
    uint32_t b;
} BBB;
[[gnu::used]] BBB fun2(uint64_t a, uint32_t b) {
    return BBB{a, b};
}
The resulting LLVM IR is shown below. It performs no memory accesses (at least at the IR level) and works entirely in virtual registers:
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none) uwtable
define dso_local [2 x i64] @_Z4fun2mj(i64 noundef %0, i32 noundef %1) #4 {
  %3 = insertvalue [2 x i64] poison, i64 %0, 0
  %4 = zext i32 %1 to i64
  %5 = insertvalue [2 x i64] %3, i64 %4, 1
  ret [2 x i64] %5
}
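Read as C++, the IR above amounts to roughly the following sketch. The name `fun2_as_words` is hypothetical and used only for illustration; the point is that the 32-bit member is zero-extended into a second 64-bit slot, so the whole struct travels as two 64-bit values.

```cpp
#include <array>
#include <cstdint>

// Hypothetical C++ rendering of the [2 x i64] return value in the IR above.
// Slot 0 holds `a`; slot 1 holds `b` zero-extended to 64 bits (the zext).
std::array<std::uint64_t, 2> fun2_as_words(std::uint64_t a, std::uint32_t b) {
    return {a, static_cast<std::uint64_t>(b)};
}
```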
However, when I compiled a function that returns a structure with three members in the same way:
typedef struct {
    uint64_t a;
    uint64_t b;
    uint64_t c;
} BIGS;
[[gnu::used]] BIGS func3(uint64_t a, uint64_t b, uint64_t c) {
    return BIGS{a, b, c};
}
The following code was generated. It adds a seemingly unnecessary pointer parameter (marked sret) and writes the result to memory through it. After some experiments, I found that this happens whenever the struct has three or more members. Why does this happen? I assumed that going through memory would generally be slower. Could anyone explain this or give me some advice?
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable
define dso_local void @_Z5func3mmm(ptr noalias nocapture writeonly sret(%struct.BIGS) align 8 %0, i64 noundef %1, i64 noundef %2, i64 noundef %3) #5 {
  store i64 %1, ptr %0, align 8, !tbaa !16
  %5 = getelementptr inbounds %struct.BIGS, ptr %0, i64 0, i32 1
  store i64 %2, ptr %5, align 8, !tbaa !18
  %6 = getelementptr inbounds %struct.BIGS, ptr %0, i64 0, i32 2
  store i64 %3, ptr %6, align 8, !tbaa !19
  ret void
}
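One way to narrow this down might be to compare the sizes of the two structs rather than their member counts. The sketch below is only an experiment under stated assumptions: the 16-byte limit (two 64-bit registers) is what common 64-bit calling conventions such as x86-64 System V and AArch64 AAPCS64 use for integer-only structs, and I have not confirmed it is the rule applied here. The names `kAssumedRegisterReturnLimit` and `fits_in_return_registers` are hypothetical helpers, not part of clang or LLVM.

```cpp
#include <cstddef>
#include <cstdint>

struct BBB  { std::uint64_t a; std::uint32_t b; };
struct BIGS { std::uint64_t a; std::uint64_t b; std::uint64_t c; };

// Assumption: the target ABI returns integer-only aggregates of at most
// 16 bytes in registers and passes anything larger through a hidden pointer.
constexpr std::size_t kAssumedRegisterReturnLimit = 16;

// Hypothetical helper: true if T is small enough under the assumed limit.
template <typename T>
constexpr bool fits_in_return_registers() {
    return sizeof(T) <= kAssumedRegisterReturnLimit;
}
```

Under that assumption, `fits_in_return_registers<BBB>()` is true and `fits_in_return_registers<BIGS>()` is false, since `BIGS` holds three 8-byte members and occupies 24 bytes. That would be consistent with the two IR listings above, although it does not prove that size is the deciding factor.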