Optimizing pass-by-value structs for le64 target

Eric_Hein · July 2, 2019, 2:55pm

Consider the following small example:

struct wrapper {
    long value;
};
long read_wrapper(wrapper w) { return w.value; }
long read_primitive(long x) { return x; }

When compiling for x86-64 at -O1, both functions reduce to a single IR instruction. It looks like -sroa performs this transformation, but even at -O0 Clang has already lowered the argument to a plain i64.
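Some background on "really just an i64": on a typical x86-64 Linux target, the System V ABI classifies an eight-byte struct holding a single long as INTEGER. Clang therefore passes its bits in a general-purpose register and emits the parameter as a plain i64 before any optimization runs. The C++ sketch below models that coercion with std::memcpy. It assumes an LP64 target (64-bit long), and the helper names coerce_to_i64, coerce_from_i64 and read_wrapper_coerced are hypothetical illustrations, not Clang APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct wrapper {
    long value;
};

// Assumption: LP64 target, so long (and therefore wrapper) is 64 bits wide.
static_assert(sizeof(wrapper) == sizeof(std::int64_t),
              "wrapper must fit exactly in one 64-bit register");

// Hypothetical helper: the caller side of the coercion. The struct's bytes
// are reinterpreted as a single 64-bit integer that travels in a register.
std::int64_t coerce_to_i64(const wrapper& w) {
    std::int64_t bits;
    std::memcpy(&bits, &w, sizeof bits);
    return bits;
}

// Hypothetical helper: the callee side at -O0. The incoming integer is
// stored into a local wrapper, mirroring the alloca + store in the IR.
wrapper coerce_from_i64(std::int64_t bits) {
    wrapper w;
    std::memcpy(&w, &bits, sizeof w);
    return w;
}

// Model of read_wrapper after coercion: it only ever sees an integer,
// and reading the field back corresponds to the load that -sroa folds away.
long read_wrapper_coerced(std::int64_t arg) {
    return coerce_from_i64(arg).value;
}
```

The store-then-load round trip in coerce_from_i64 is the same shape as the alloca/store/load sequence in the unoptimized IR, which is why -sroa can reduce it to returning the argument directly.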

Before -sroa:
define dso_local i64 @_Z12read_wrapper7wrapper(i64) #0 {
  %2 = alloca %struct.wrapper, align 8
  %3 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
  store i64 %0, i64* %3, align 8
  %4 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
  %5 = load i64, i64* %4, align 8
  ret i64 %5
}

After -sroa:
define dso_local i64 @_Z12read_wrapper7wrapper(i64 returned) local_unnamed_addr #0 {
  ret i64 %0
}

But when I add -target le64, read_wrapper instead takes a %struct.wrapper* byval argument, a pointer to a copy in the caller's stack frame. No optimization level makes this function as simple as read_primitive.

define dso_local i64 @_Z12read_wrapper7wrapper(%struct.wrapper* byval nocapture readonly align 8) local_unnamed_addr #0 {
  %2 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %0, i64 0, i32 0
  %3 = load i64, i64* %2, align 8, !tbaa !2
  ret i64 %3
}
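In C++ terms, the byval signature behaves roughly like the model below. The caller materializes a copy of the struct in its own stack frame and passes that copy's address, and the callee reads the field back through the pointer. Because the value lives in memory rather than in a register, the callee cannot avoid the getelementptr + load pair seen above. The names read_wrapper_byval and call_read_wrapper are hypothetical; this illustrates the shape of the calling convention, not the exact code the backend emits.

```cpp
#include <cassert>

struct wrapper {
    long value;
};

// Model of the le64 lowering: the callee receives a pointer to a
// caller-owned copy (LLVM's `byval` attribute) and must load through it.
long read_wrapper_byval(const wrapper* w) {
    return w->value;  // corresponds to the getelementptr + load in the IR
}

// Model of the caller side: `byval` gives the callee its own copy,
// so the argument is first written to a stack temporary.
long call_read_wrapper(wrapper w) {
    wrapper tmp = w;  // stack copy made for the call
    return read_wrapper_byval(&tmp);
}

// For comparison: a scalar argument needs no memory round trip at all.
long read_primitive(long x) {
    return x;
}
```

Both paths compute the same value; the difference is purely in where the argument lives at the call boundary, which is decided by the ABI lowering rather than by the optimizer.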

We're writing our own LLVM backend for a new architecture. We started with the generic little-endian 64-bit target (le64) and customized it from there. What needs to be done to re-enable this optimization for our target?

TNorthover · July 4, 2019, 8:50am

Hi Eric,


This particular detail is down to the ABI handling in Clang's lib/CodeGen/TargetInfo.cpp. There, each target has code that decides how a given C or C++ type is mapped to LLVM IR at function call boundaries. The main purpose is of course to follow an externally specified ABI, but as you've discovered, there is some leeway for performance gains when multiple options are equally valid.
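If I read getTargetCodeGenInfo correctly, le64 has no dedicated case in that file and falls back to DefaultABIInfo, whose classifyArgumentType passes every aggregate indirectly via getNaturalAlignIndirect (the byval pointer shown earlier). A target-specific ABIInfo subclass can instead return ABIArgInfo::getDirect with an integer coercion type for small aggregates, which is the effect seen on x86-64. The standalone sketch below models only that decision. ArgKind, ArgClassification, TypeInfo, classify_default and classify_small_aggregates are hypothetical stand-ins for Clang's ABIArgInfo machinery, and the 64-bit threshold is an assumption chosen to match one general-purpose register on a 64-bit target. Records with a non-trivial copy constructor or destructor must stay indirect (Clang checks this through getRecordArgABI).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for clang::ABIArgInfo.
enum class ArgKind {
    Direct,    // passed as a value (in registers when the backend can)
    Indirect,  // passed as a pointer to a copy, i.e. LLVM `byval`
};

struct ArgClassification {
    ArgKind kind;
    std::uint64_t coerce_bits;  // integer width to coerce to; 0 = natural type
};

// Hypothetical summary of the facts the classifier needs about a type.
struct TypeInfo {
    bool is_aggregate;        // struct, class, union or array
    bool trivially_copyable;  // false if copy constructor/destructor is non-trivial
    std::uint64_t size_bits;
};

// Assumed to approximate the DefaultABIInfo rule that le64 ends up with:
// every aggregate is passed indirectly.
ArgClassification classify_default(const TypeInfo& t) {
    if (t.is_aggregate)
        return {ArgKind::Indirect, 0};
    return {ArgKind::Direct, 0};  // scalars keep their natural LLVM type
}

// Sketch of a target-specific rule: a trivially copyable aggregate that
// fits in one 64-bit register is coerced to an integer of its own width
// (i64 for `wrapper`); everything else falls back to the default rule.
ArgClassification classify_small_aggregates(const TypeInfo& t) {
    if (t.is_aggregate && t.trivially_copyable &&
        t.size_bits > 0 && t.size_bits <= 64)
        return {ArgKind::Direct, t.size_bits};
    return classify_default(t);
}
```

Whichever rule you pick becomes part of your target's ABI: Clang's lowering and your backend's calling-convention code must agree on it, and changing it later breaks binary compatibility with code built under the old rule.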

Cheers.

Tim.