Hi Sean,
I have looked into the code size issues and identified their root cause.
The biggest code size increase (from 35512 bytes to 44184 bytes, +24%) is for MultiSource/Benchmarks/MiBench/automotive-susan (http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/MiBench/automotive-susan/susan.c?view=markup).
Compiler options: clang -O3 -DNDEBUG -mcpu=cortex-a57 -fomit-frame-pointer -c MultiSource/Benchmarks/MiBench/automotive-susan/susan.c
The problem is that the partial inliner “duplicates” huge functions. I put “duplicates” in quotes because the original function and the one created by the partial inliner differ only slightly.
For example:
define dso_local i32 @susan_edges_small…{
entry:
%0 = bitcast i32* %r to i8*
%mul = mul nsw i32 %y_size, %x_size
%conv = sext i32 %mul to i64
%mul1 = shl nsw i64 %conv, 2
tail call void @llvm.memset.p0i8.i64(i8* align 4 %0, i8 0, i64 %mul1, i1 false)
%cmp645 = icmp sgt i32 %y_size, 2
br i1 %cmp645, label %for.cond3.preheader.lr.ph, label %for.end398
…
<<A lot of code: ~500 lines of IR>>
…
for.end398: ; preds = %for.inc396, %entry, %for.cond84.preheader
ret i32 undef
}
The partial inliner creates @susan_edges_small.50_for.cond3.preheader.lr.ph, into which those ~500 lines of IR are placed. This results in two huge functions.
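As a rough C-level sketch of the effect: the names edges, edges_outlined_body and caller_after_partial_inlining are hypothetical, and a toy loop stands in for the real ~500 lines. This is written by hand to show the shape of the result and is not taken from susan.c or from the compiler output. The original function keeps its full body, the outlined function holds a second copy of it, and the caller gets only the cheap entry block inlined.

```c
#include <stddef.h>
#include <string.h>

/* Original function: a cheap entry block (memset plus a guard) followed by
   a large body. The short loop nest is a stand-in for the real code. */
int edges(int *r, int x_size, int y_size)
{
    memset(r, 0, (size_t)x_size * (size_t)y_size * sizeof *r);
    if (y_size > 2) {
        for (int y = 1; y < y_size - 1; ++y)
            for (int x = 1; x < x_size - 1; ++x)
                r[y * x_size + x] = x + y;
    }
    return 0; /* the IR above returns undef; 0 is used here for simplicity */
}

/* Hand-written equivalent of the outlined function: a second copy of the
   large body, reached only when the guard succeeds. */
static void edges_outlined_body(int *r, int x_size, int y_size)
{
    for (int y = 1; y < y_size - 1; ++y)
        for (int x = 1; x < x_size - 1; ++x)
            r[y * x_size + x] = x + y;
}

/* A caller after partial inlining: the entry block is inlined here and the
   body is reached through a call to the outlined copy. */
int caller_after_partial_inlining(int *r, int x_size, int y_size)
{
    memset(r, 0, (size_t)x_size * (size_t)y_size * sizeof *r);
    if (y_size > 2)
        edges_outlined_body(r, x_size, y_size);
    return 0;
}
```

Both paths compute the same result, but the object file now carries the large body twice: once in edges, which must be kept because it is externally visible, and once in edges_outlined_body.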
There are four such big functions in susan.c: susan_edges_small, susan_edges, susan_thin and susan_principle.
IMHO the issue can be solved if functions are placed in their own sections (e.g. with -ffunction-sections; this mode is off by default) and the unused ones are then removed by the linker. However, this imposes additional requirements on how an application is built and linked, which cannot be met in all cases.
Thanks,
Evgeny Astigeevich