Optimizing pass-by-value structs for the le64 target

Consider the following small example:

struct wrapper {
    long value;
};
long read_wrapper(wrapper w) { return w.value; }
long read_primitive(long x) { return x; }

When compiling for x86 at -O1, both functions reduce to a single IR instruction. It looks like -sroa performs this transformation, but even at -O0 the frontend has already lowered the struct argument to a plain i64.

Before -sroa:
define dso_local i64 @_Z12read_wrapper7wrapper(i64) #0 {
  %2 = alloca %struct.wrapper, align 8
  %3 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
  store i64 %0, i64* %3, align 8
  %4 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
  %5 = load i64, i64* %4, align 8
  ret i64 %5
}

After -sroa:
define dso_local i64 @_Z12read_wrapper7wrapper(i64 returned) local_unnamed_addr #0 {
  ret i64 %0
}

But when I add -target le64, read_wrapper instead takes a %struct.wrapper* byval argument, i.e. a pointer to a copy of the struct in the caller's stack frame. No optimization level makes this function as simple as read_primitive:

define dso_local i64 @_Z12read_wrapper7wrapper(%struct.wrapper* byval nocapture readonly align 8) local_unnamed_addr #0 {
  %2 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %0, i64 0, i32 0
  %3 = load i64, i64* %2, align 8, !tbaa !2
  ret i64 %3
}
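To see why no optimization level helps, it may help to write out what each lowering means at the source level. The following is only a C++ analogue of the two IR signatures above, not what clang emits; read_wrapper_direct, read_wrapper_byval and call_byval are illustrative names. In the direct form the field value itself arrives as an argument; in the byval form the caller owns a copy and the callee only sees its address, so the load can only disappear if the function's signature changes, which the optimizer generally may not do for an externally visible function.

```cpp
#include <cassert>

struct wrapper {
    long value;
};

// Analogue of the x86-64 lowering: the struct's single field is passed
// as a plain 64-bit integer, so the callee has nothing to load.
long read_wrapper_direct(long coerced) { return coerced; }

// Analogue of the le64 byval lowering: the callee receives the address of
// a caller-owned copy and must load the field through that pointer.
long read_wrapper_byval(const wrapper *copy) {
    assert(copy != nullptr && "a byval argument always points at a valid copy");
    return copy->value;
}

// Caller side of the byval form: make a temporary copy in this frame,
// then pass its address.
long call_byval(const wrapper &w) {
    wrapper tmp = w;
    return read_wrapper_byval(&tmp);
}
```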

We're writing our own LLVM backend for a new architecture; we started with the generic little-endian 64-bit target (le64) and customized it from there. What needs to be done to re-enable this optimization for our target?

Hi Eric,

This particular detail is down to ABI handling in clang's
lib/CodeGen/TargetInfo.cpp. There each target has code that decides how
a given C or C++ type is mapped to LLVM IR at function call
boundaries. The main purpose is of course to follow an externally
specified ABI, but as you've discovered there's a certain amount of
leeway for performance gains when multiple options are equally valid.
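To make the decision concrete, here is a minimal, self-contained sketch of the kind of rule such ABI code encodes. It deliberately does not use clang's API: ArgType, ArgKind, ArgClassification and classify_argument are hypothetical stand-ins. In clang, the equivalent logic would live in your target's ABIInfo subclass (in its computeInfo override; many targets factor this into a classifyArgumentType helper), returning ABIArgInfo::getDirect with an integer coerce type instead of ABIArgInfo::getIndirect. If, as it appears, le64 has no dedicated ABIInfo and falls back to clang's default classification, which passes aggregates indirectly, that would explain the byval pointer; treat this as an assumption to check against your clang version.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified description of an argument's type; clang's real
// classification works on clang::QualType instead.
struct ArgType {
    bool is_aggregate;             // struct/class/union passed by value
    uint64_t size_bits;            // storage size of the type in bits
    bool nontrivial_copy_or_dtor;  // C++ class with a non-trivial copy ctor or dtor
};

enum class ArgKind {
    Direct,    // passed as a value, possibly coerced to an iN integer
    Indirect,  // passed as a pointer to a caller-owned copy (byval)
};

struct ArgClassification {
    ArgKind kind;
    uint64_t coerce_bits;  // width of the iN to coerce to; 0 = no coercion
};

// Sketch of one possible per-target rule: aggregates that fit in a single
// register are coerced to the smallest power-of-two integer (at least i8)
// that holds them; larger aggregates are passed indirectly.
ArgClassification classify_argument(const ArgType &ty, uint64_t reg_bits = 64) {
    assert(ty.size_bits > 0 && "zero-sized types are not modeled in this sketch");
    // The Itanium C++ ABI requires such classes to be passed via a
    // caller-allocated temporary, regardless of size.
    if (ty.nontrivial_copy_or_dtor)
        return {ArgKind::Indirect, 0};
    // Scalars such as long are already passed directly.
    if (!ty.is_aggregate)
        return {ArgKind::Direct, 0};
    if (ty.size_bits <= reg_bits) {
        uint64_t bits = 8;
        while (bits < ty.size_bits)
            bits *= 2;
        return {ArgKind::Direct, bits};
    }
    return {ArgKind::Indirect, 0};
}
```

Under this rule, struct wrapper (64 bits, trivially copyable) would be classified as Direct with a 64-bit coerce type, matching the x86 IR above. Because this choice defines the calling convention, every compiler producing code for your architecture has to make the same decision; for a new architecture, you are the one specifying it.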

Cheers.

Tim.