Consider the following small example:
struct wrapper {
    long value;
};
long read_wrapper(wrapper w) { return w.value; }
long read_primitive(long x) { return x; }
When compiling for x86 at -O1, both functions reduce nicely to a single IR instruction. It looks like -sroa is the pass performing this transformation, but even at -O0, before any optimization has run, the argument has already been lowered to a plain i64.
Before -sroa:
define dso_local i64 @_Z12read_wrapper7wrapper(i64) #0 {
  %2 = alloca %struct.wrapper, align 8
  %3 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
  store i64 %0, i64* %3, align 8
  %4 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %2, i32 0, i32 0
  %5 = load i64, i64* %4, align 8
  ret i64 %5
}
After -sroa:
define dso_local i64 @_Z12read_wrapper7wrapper(i64 returned) local_unnamed_addr #0 {
  ret i64 %0
}
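As a sanity check on the source side, here is a minimal sketch confirming that wrapper has the same size, alignment, and bit pattern as a bare long, which is consistent with the x86 lowering above treating the argument as an i64. The helpers roundtrip and check_equivalence are hypothetical names of mine, added only for illustration; this says nothing about how any particular target chooses to pass the struct.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct wrapper {
    long value;
};

long read_wrapper(wrapper w) { return w.value; }
long read_primitive(long x) { return x; }

// wrapper adds no padding, no stricter alignment, and no non-trivial members.
static_assert(sizeof(wrapper) == sizeof(long), "wrapper is exactly one long wide");
static_assert(alignof(wrapper) == alignof(long), "wrapper aligns like a long");
static_assert(std::is_trivially_copyable<wrapper>::value, "wrapper can be copied bitwise");
static_assert(std::is_standard_layout<wrapper>::value, "value sits at offset 0");

// Hypothetical helper: copy the bits of a long into a wrapper, then read it back.
long roundtrip(long x) {
    wrapper w;
    std::memcpy(&w, &x, sizeof w);
    return read_wrapper(w);
}

// Hypothetical helper: both entry points must agree for the same input.
void check_equivalence(long x) {
    assert(roundtrip(x) == read_primitive(x));
}
```

Since the two functions are indistinguishable at the source level, any difference in the generated IR has to come from how the target passes the struct argument.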
But when I add -target le64, read_wrapper instead accepts a %struct.wrapper* byval, a pointer to the caller’s stack. No optimization level I have tried makes this function look as simple as read_primitive.
define dso_local i64 @_Z12read_wrapper7wrapper(%struct.wrapper* byval nocapture readonly align 8) local_unnamed_addr #0 {
  %2 = getelementptr inbounds %struct.wrapper, %struct.wrapper* %0, i64 0, i32 0
  %3 = load i64, i64* %2, align 8, !tbaa !2
  ret i64 %3
}
We’re writing our own LLVM backend for a new architecture. We started with the generic little-endian 64-bit target (le64) and made customizations from there. What needs to be done to re-enable this optimization for our target?