[RFC] Enable Partial Inliner by default

Forgot to add that all experiments were done with ‘-O3 -m64 -fexperimental-new-pass-manager’.

Graham Yiu
LLVM Compiler Development
IBM Toronto Software Lab
Office: (905) 413-4077 C2-707/8200/Markham
Email: gyiu@ca.ibm.com

(In reply to Graham Yiu, 11/02/2017 05:26:58 PM: "Hello, I’d like to propose turning on the partial inliner (-enable-partial-inlining) by default.")

Hi Graham,

Is your RFC to enable it with the current pass manager? If so, do you have benchmark data for it?

Am I correct that the new pass manager turns the partial inliner on by default?

Thanks,

Evgeny Astigeevich

Hi Evgeny,

Ah, yes, I guess I wasn’t clear in my original email. I am proposing to enable it by default on both the new and current pass managers. However, I didn’t collect any data for the current pass manager, since I’m assuming (hoping) the new pass manager will be the new default at some point in the future.

I don’t think the partial inliner is enabled by default on the new pass manager (unless something changed recently). I do know it requires a slightly different option to enable (-enable-npm-partial-inlining).

Cheers,

Graham Yiu

(In reply to Evgeny Astigeevich, 11/02/2017 06:19:18 PM)

Hi Graham,

I think this is a good idea. It is also useful for libquantum, where,
together with some other changes, it enables Polly to perform loop fusion.

The ARM people also played with the partial inliner and might have
feedback.

Best,
Tobias

Hi

We have been using the partial inliner on a range of large benchmarks internally for a while now. AFAIK the only problem we found was fixed upstream in https://reviews.llvm.org/rL317084.

Compile time is not our primary concern, so I cannot really comment on the impact there.

Cheers,
Florian

Hi,

We'd like to check impact on armv7m and armv6m targets, especially code size. We have not tried the partial inliner on them.

Could a decision to turn it on by default wait for results?

Thanks,
Evgeny Astigeevich
The Arm Compiler Optimization team

-----Original Message-----

From: Graham Yiu/Toronto/IBM
To: llvm-dev@lists.llvm.org
Cc: junbuml@codeaurora.org, xinliangli@gmail.com
Date: 11/02/2017 05:26 PM
Subject: [RFC] Enable Partial Inliner by default

Hello,

I'd like to propose turning on the partial inliner (-enable-partial-inlining) by default.

We've seen small gains on SPEC2006/2017 runtimes as well as lnt compile-times with a 2nd stage bootstrap of LLVM. We also saw positive gains on our internal workloads.

-------------------------------------
Brief description of Partial Inlining
-------------------------------------
A pass in opt that runs after the normal inlining pass. Looks for branches to a return block in the entry and immediate successor blocks of a function. If found, it outlines the rest of the function using the CodeExtractor.
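
To make the description above concrete, here is a minimal hand-written sketch in C of the shape of the transformation. It is not output from the pass: the function names (sum_positive, sum_positive_outlined, caller_before, caller_after) are hypothetical, and the real pass works on LLVM IR, not on C source.

```c
#include <assert.h>
#include <stddef.h>

/* Original callee: the entry block and its immediate successor each
 * branch straight to a return; the loop is the larger remainder. */
static int sum_positive(const int *data, size_t n) {
    if (data == NULL)
        return 0;
    if (n == 0)
        return 0;
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        if (data[i] > 0)
            total += data[i];
    return total;
}

/* Hypothetical outlined remainder: everything after the early-return checks. */
static int sum_positive_outlined(const int *data, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        if (data[i] > 0)
            total += data[i];
    return total;
}

/* Caller before partial inlining: a plain call to the whole function. */
static int caller_before(const int *data, size_t n) {
    return sum_positive(data, n);
}

/* Caller after partial inlining (written by hand): the cheap checks are
 * inlined, and only the remainder is still reached through a call. */
static int caller_after(const int *data, size_t n) {
    if (data == NULL)
        return 0;
    if (n == 0)
        return 0;
    return sum_positive_outlined(data, n);
}

/* Both forms must agree on every input. */
static void check_equivalence(void) {
    int values[] = {3, -1, 4, -5, 9};
    assert(caller_before(values, 5) == caller_after(values, 5));
    assert(caller_after(NULL, 5) == 0);
    assert(caller_after(values, 0) == 0);
}
```

The point of the split is that callers taking the early-return path no longer pay for a call, while the bulk of the function body is not duplicated into every call site.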

Since you mention outlining of code: Does this negatively affect the debug info quality?

-- adrian

Hi Adrian,

As far as I know, the code extractor takes care of fixing up the debug information, if necessary. However, I haven’t verified this myself and to be honest my knowledge of how debug information is represented in LLVM is limited.

Cheers,

Graham Yiu

(In reply to Adrian Prantl, 11/03/2017 12:21:16 PM)

Hi Evgeny,

Yes, please do. It was our hope that folks would verify the impact of the partial inliner on the platforms they’re currently working on.

Cheers,

Graham Yiu

(In reply to Evgeny Astigeevich, 11/03/2017 12:18:05 PM)

> Since you mention outlining of code: Does this negatively affect the debug
> info quality?
>
> -- adrian

It's not merging anything together, so line information is always preserved.
For dbg.declare/dbg.addr intrinsics, it depends on whether the allocas are
shrink-wrapped into the outlined function; if they are not, the address is
replaced with "metadata !{}". I'm not entirely sure how dbg.value looks off the
top of my head. I haven't actually debugged partial-inlined code, so I can't
say anything about loss of context from the outlining, but those are some
observations from the code itself.

-- River Riddle

Hi Evgeny,

When do you think the experiments on armv7m and armv6m targets will be complete? We’re looking to turn this on sooner rather than later, if there aren’t objections from folks running on other platforms.

Cheers,

Graham Yiu

(In reply to Graham Yiu, 11/03/2017 12:40:10 PM)

Hi Graham,

I need two to three days to complete the runs and compare results. Because the runs are on bare-metal boards, benchmarking takes more time than on hardware with an OS.

Thanks,

Evgeny

Hi Graham,

I’ve almost finished my runs. However, I’ve got a couple of compiler crashes:

!dbg attachment points at wrong subprogram for function

LLVM ERROR: Broken module found, compilation aborted!

This will take some time to investigate.

Thanks,

Evgeny Astigeevich

Thanks, Evgeny.

Let me know if there’s something in the partial inlining code that is causing the issue(s) you’re seeing.

Cheers,

Graham Yiu

(In reply to Evgeny Astigeevich, 11/08/2017 05:13:09 PM)

Hi Evgeny,

I just realized that if these are compile-time errors I can help investigate on my end. Do you have something I can use to reproduce?

Cheers,

Graham Yiu

(In reply to Graham Yiu, 11/08/2017 06:00:05 PM)

Hi Graham,

Thank you for offering help. I am trying to create a reproducer. The problem is that the crashes happen only when LTO is used. One thing I am sure about is that the IR is broken at compile time.

Thanks,

Evgeny

Hi Graham,

I’ve got the benchmarking results. Armv7m and armv6m are not affected: there are no changes in scores or code sizes.

I did some additional benchmark runs for AArch64 and AArch32.

LNT test suite, AArch32, Cortex-A57, -O3 -mcpu=cortex-a57 -mthumb -fomit-frame-pointer

[Figure: benchmark results image (image002.gif), not preserved]

Hi Graham,

I created a bug report with a reproducer for the failures I’ve got: https://bugs.llvm.org/show_bug.cgi?id=35288

I have also found that LTO reverts everything the partial inliner has done. Maybe the partial inliner should not be used at the first LTO phase (compilation).

I hope I’ll have a chance to look at the code size regressions this week.

Thanks,

Evgeny Astigeevich

Hi Evgeny,

I agree that we probably need to tweak when the partial inliner should run when using LTO/thinLTO. The easiest thing to do is likely to just disable partial inlining in the pre-LTO pass during compilation, so we don’t outline things that the LTO inliner will eventually inline again.
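
The gating described above could look roughly like the following. This is only a sketch in C under stated assumptions, not LLVM's pass-manager code: the pipeline_phase enum, its members, and should_run_partial_inliner are hypothetical names made up for illustration, and letting the pass run at the LTO link step is an assumption that the paragraph above does not settle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pipeline phases; illustrative names only, not LLVM's API. */
enum pipeline_phase {
    PHASE_NON_LTO,  /* ordinary compile, no LTO involved */
    PHASE_PRE_LTO,  /* per-file compile step of an LTO/ThinLTO build */
    PHASE_LTO_LINK  /* link-time optimization step */
};

/* Decide whether the partial inliner should run in a given phase.
 * It is skipped in the pre-LTO compile step so that code is not outlined
 * only for the LTO inliner to inline it again later. Running it at the
 * link step is an assumption of this sketch. */
static bool should_run_partial_inliner(enum pipeline_phase phase, bool enabled) {
    if (!enabled)
        return false;
    return phase != PHASE_PRE_LTO;
}

/* Small self-check of the policy above. */
static void check_policy(void) {
    assert(should_run_partial_inliner(PHASE_NON_LTO, true));
    assert(!should_run_partial_inliner(PHASE_PRE_LTO, true));
    assert(should_run_partial_inliner(PHASE_LTO_LINK, true));
    assert(!should_run_partial_inliner(PHASE_NON_LTO, false));
}
```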

As for the code size increases you’re seeing, it’s not too surprising, though it would’ve been nice to see some performance speed-ups. Do you know if we just happen to increase the size of functions that are infrequently executed?

Graham Yiu

(In reply to Evgeny Astigeevich, 11/13/2017 09:47:45 AM)

Yes, disabling the partial inliner in the pre-LTO phase is likely the
right choice. It would be great to get a patch in that implements this.

Best,
Tobias