[RFC] Enable Partial Inliner by default

Hello,

I’ve posted a preliminary patch on Phabricator

https://reviews.llvm.org/D40477

that enables the partial inliner by default and disables the partial inlining pass during the ThinLTO prepare/prelink step, leaving that work to the actual ThinLTO pass. The regular LTO prepare/prelink pass has been left alone, since ThinLTO has its own customized one.

I’ll gather some SPEC numbers this week to make sure that disabling partial inlining during ThinLTO prepare/prelink has a positive or neutral impact.

Cheers,

Graham Yiu
LLVM Compiler Development
IBM Toronto Software Lab
Office: (905) 413-4077 C2-707/8200/Markham
Email: gyiu@ca.ibm.com

Graham Yiu, 11/14/2017 04:40:28 PM: Hi Evgeny, I agree that we probably need to tweak when the partial inliner should run when using LTO

Hello All,

This conversation seems to have fizzled out and I would like to try to revive it. My intention is to pick up where Graham left off with enabling partial inlining by default.

Hi Sean,

Thank you for reminding me.

It looks like it got lost among tons of emails and other tasks.

I’ll check if the code size issues still exist.

Thanks,

Evgeny Astigeevich

Hi Sean,

I have looked at the code size issues and identified the root cause of them.

The biggest code size increase (from 35512 bytes to 44184 bytes, +24%) is for MultiSource/Benchmarks/MiBench/automotive-susan (http://www.llvm.org/viewvc/llvm-project/test-suite/trunk/MultiSource/Benchmarks/MiBench/automotive-susan/susan.c?view=markup).

Compiler options: clang -O3 -DNDEBUG -mcpu=cortex-a57 -fomit-frame-pointer -c MultiSource/Benchmarks/MiBench/automotive-susan/susan.c

The problem is that the partial inliner “duplicates” huge functions. I say “duplicates” because the difference between the original function and the one created by the partial inliner is very small.

For example:

define dso_local i32 @susan_edges_small…{
entry:
  %0 = bitcast i32* %r to i8*
  %mul = mul nsw i32 %y_size, %x_size
  %conv = sext i32 %mul to i64
  %mul1 = shl nsw i64 %conv, 2
  tail call void @llvm.memset.p0i8.i64(i8* align 4 %0, i8 0, i64 %mul1, i1 false)
  %cmp645 = icmp sgt i32 %y_size, 2
  br i1 %cmp645, label %for.cond3.preheader.lr.ph, label %for.end398

<<A lot of code: ~500 lines of IR>>

for.end398: ; preds = %for.inc396, %entry, %for.cond84.preheader
  ret i32 undef
}

The partial inliner creates @susan_edges_small.50_for.cond3.preheader.lr.ph where those 500 lines of IR are put. This results in two huge functions.

There are four such big functions in susan.c: susan_edges_small, susan_edges, susan_thin and susan_principle.
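As a rough illustration of the pattern described above, here is a hand-written C sketch (not compiler output, and not code from susan.c). All function names are hypothetical, and the tiny loop nest only stands in for the ~500 lines of IR. It shows a cheap entry block guarding a large region, the region outlined into its own function, and a call site where only the entry block has been inlined. If the original function has external linkage and therefore must be kept, both it and the outlined function carry a copy of the large region.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a function shaped like susan_edges_small:
   a cheap entry block (memset + size check) guarding a large loop nest. */
int edges_original(int *r, int x_size, int y_size) {
    assert(r != NULL && x_size > 0 && y_size > 0);
    memset(r, 0, (size_t)x_size * (size_t)y_size * sizeof *r);
    if (y_size > 2) {
        /* Stands in for the large guarded region. */
        for (int y = 1; y < y_size - 1; ++y)
            for (int x = 1; x < x_size - 1; ++x)
                r[y * x_size + x] = x + y;
    }
    return 0;
}

/* Conceptual result of outlining: the guarded region becomes a separate
   function, so the large body now exists a second time in the object file. */
void edges_outlined_body(int *r, int x_size, int y_size) {
    for (int y = 1; y < y_size - 1; ++y)
        for (int x = 1; x < x_size - 1; ++x)
            r[y * x_size + x] = x + y;
}

/* A call site after partial inlining: the entry block is inlined here and
   the large region is reached through a call to the outlined function. */
int caller_after_partial_inlining(int *r, int x_size, int y_size) {
    assert(r != NULL && x_size > 0 && y_size > 0);
    memset(r, 0, (size_t)x_size * (size_t)y_size * sizeof *r);
    if (y_size > 2)
        edges_outlined_body(r, x_size, y_size);
    return 0;
}

/* Returns 1 when both forms fill the buffer identically. */
int results_match(int x_size, int y_size) {
    int a[64], b[64];
    assert(x_size > 0 && y_size > 0 && x_size * y_size <= 64);
    edges_original(a, x_size, y_size);
    caller_after_partial_inlining(b, x_size, y_size);
    return memcmp(a, b, (size_t)x_size * (size_t)y_size * sizeof a[0]) == 0;
}
```

The sketch only models the structure of the transformation. Whether the partial inliner actually performs it for a given function depends on its cost heuristics.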

IMHO the issue can be solved if functions are placed in their own sections (this mode is off by default) so that the linker can remove the unused ones. However, this imposes additional requirements on how an application is built and linked, and these cannot be met in all cases.
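A minimal sketch of the section-based approach, assuming an ELF target, a GNU-compatible linker, and an executable (not a shared library) as the final link output. The file name demo.c and both function names are hypothetical. The build commands in the comment use -ffunction-sections and --gc-sections, which are existing Clang/GCC and GNU ld/lld options. Whether the leftover function is actually discarded depends on nothing else referencing or exporting it.

```c
#include <assert.h>
#include <stddef.h>

/* Build sketch (assumed file name demo.c; a main that calls only sum_used
   is assumed to live in another translation unit):
     clang -O3 -ffunction-sections -c demo.c
     clang -Wl,--gc-sections demo.o main.o -o demo
   With -ffunction-sections each function is emitted into its own section,
   so the linker may drop sections that nothing references. */

/* Still called by the program, so its section is kept. */
int sum_used(const int *v, size_t n) {
    assert(v != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Stand-in for a leftover original function whose call sites were all
   partially inlined: externally visible, but no longer referenced. */
int sum_unreferenced_original(const int *v, size_t n) {
    assert(v != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

In a shared library, symbols with default visibility are normally treated as roots by the linker, so the unreferenced copy would be kept. That is one example of a build requirement that cannot always be met.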

Thanks,

Evgeny Astigeevich