LLVM is getting slower, January edition

Hi,

Continuing recent efforts in understanding compile time slowdowns, I looked at some historical data: I picked one test and tried to pinpoint the commits that affected its compile time. The data I have is not 100% accurate, but hopefully it helps to provide an overview of what's going on with compile time in LLVM and a better understanding of which changes usually impact compile time.

Configuration:
The test I used is tramp3d-v4 from the LLVM test suite. It consists of a single source file, but still takes a noticeable time to compile, which makes it very convenient for this kind of experiment. The file was compiled with -Os for arm64 on an x86 host.

Results:
The attached PDF has a compile time graph, on which I marked the points where compile time changed, with a list of the corresponding commits. A textual version of the list is available below, but it might be much harder to comprehend the data without the graph. The number at the end of each entry shows the compile time change after the given commit:

1. r239821: [InstSimplify] Allow folding of fdiv X, X with just NaNs ignored. +1%
2. r241886: [InstCombine] Employ AliasAnalysis in FindAvailableLoadedValue. +1%
3. r245118: [SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl. +2%
4. r246694: [RemoveDuplicatePHINodes] Start over after removing a PHI. -1%
5. r247269: [ADT] Rewrite the StringRef::find implementation to be simpler... +1%
   r247240: [LPM] Use a map from analysis ID to immutable passes in the legacy pass manager... +3%
   r247264: Enable GlobalsAA by default. +1%
6. r247674: [GlobalsAA] Disable globals-aa by default. -1%
7. r248638: [SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts'. +2%
8. r249802: [SCEV] Call `StrengthenNoWrapFlags` after `GroupByComplexity`; NFCI. +4%
9. r250157: [GlobalsAA] Turn GlobalsAA on again by default. +1%
10. r251049: [SCEV] Mark AddExprs as nsw or nuw if legal. +23%
11. No data
12. r259252: AttributeSetImpl: Summarize existing function attributes in a bitset. -1%
    r259256: Add LoopSimplifyCFG pass. -2%
13. r262250: Enable LoopLoadElimination by default. +3%
14. r262839: Revert "Enable LoopLoadElimination by default". -3%
15. r263393: Remove PreserveNames template parameter from IRBuilder. -3%
16. r263595: Turn LoopLoadElimination on again. +3%
17. r267672: [LoopDist] Add llvm.loop.distribute.enable loop metadata. +4%
18. r268509: Do not disable completely loop unroll when optimizing for size. -34%
19. r269124: Loop unroller: set thresholds for optsize and minsize functions to zero. +50%
20. r269392: [LoopDist] Only run LAA for loops with the pragma. -4%
21. r270630: Re-enable "[LoopUnroll] Enable advanced unrolling analysis by default" one more time. -28%
22. r270881: Don't allocate in APInt::slt. NFC. -2%
    r270959: Don't allocate unnecessarily in APInt::operator[+-]. NFC. -1%
    r271020: Don't generate unnecessary signed ConstantRange during multiply. NFC. -3%
23. r271615: [LoopUnroll] Set correct thresholds for new recently enabled unrolling heuristic. +22%
24. r276942: Don't invoke getName() from Function::isIntrinsic(). -1%
    r277087: Revert "Don't invoke getName() from Function::isIntrinsic().", rL276942. +1%
25. r279585: [LoopUnroll] By default disable unrolling when optimizing for size.
26. r286814: [InlineCost] Remove skew when calculating call costs. +3%
27. r289755: Make processing @llvm.assume more efficient by using operand bundles. +6%
28. r290086: Revert @llvm.assume with operator bundles (r289755-r289757). -6%
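A side note on reading lists like this: successive percentage deltas compound multiplicatively rather than adding up, so the net effect over a period is the product of the per-commit factors, not their sum. A tiny Python sketch with made-up deltas (illustrative numbers, not the measurements above):

```python
# Illustrative per-commit compile-time deltas as fractions (+0.01 == +1%).
# These are made-up numbers, not the measurements listed above.
deltas = [0.01, 0.02, -0.34, 0.50, -0.28, 0.22]

# Each commit scales the previous compile time, so the net effect
# is the product of the individual factors, not their sum.
net = 1.0
for d in deltas:
    net *= 1.0 + d

print(f"sum of deltas: {sum(deltas) * 100:+.1f}%")
print(f"net change:    {(net - 1.0) * 100:+.1f}%")
```

Note how the naive sum here is positive while the compounded net change is negative; with swings as large as -34% and +50%, the two can disagree substantially.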

CompileTime.pdf (514 KB)

Hi Mikhail,

Thank you for doing this!

On a quick scan of just the SCEV commits, this already highlights some things:

- https://reviews.llvm.org/rL249802 has a 4% regression, when it was "NFCI".
- The 23% regression on https://reviews.llvm.org/rL251049 is also
very suspicious.

I'll take a closer look at both.

Is it possible to run some of this as a Jenkins job? Running just
what you ran, and getting a graph that we can view at some URL, would
be great.

-- Sanjoy

Hi,

Hi Mikhail,

Thank you for doing this!

On a quick scan of just the SCEV commits, this already highlights some things:

- https://reviews.llvm.org/rL249802 has a 4% regression, when it was "NFCI".

Yep, this surprised me as well. Please let me know if you can reproduce it too.

- The 23% regression on https://reviews.llvm.org/rL251049 is also very suspicious.

Hmm, this should be r251048 (or at least go with it), which we discussed some time ago.

I’ll take a closer look at both.

Thank you!

Is it possible to run some of this as a Jenkins job? Running just
what you ran, and getting a graph that we can view at some URL, would
be great.

We have it on green dragon now: http://lab.llvm.org:8080/green/view/Compile%20Time/
See e.g. http://104.154.54.203/db_default/v4/nts/machine/1336 for Os ARM64 compile time data. Did you mean just that, or something else? LNT should also automagically detect regressions and allow tracking them (the links to the regressions list should be on the green dragon page; I CCed ChrisM, who can provide more details about this). The main problem is that it was set up relatively recently, and thus it doesn't have much history. For the graph that I sent I used our internal data, and in some cases I needed to manually apply/revert commits as we didn't have enough data points.
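For reference, this kind of automatic detection essentially boils down to comparing windows of noisy timing samples. A rough, hypothetical Python sketch of the idea (not LNT's actual algorithm):

```python
from statistics import median

def detect_regressions(times, window=5, threshold=0.02):
    """Flag indices where the median of the next `window` samples differs
    from the median of the previous `window` by more than `threshold`
    (a fraction, e.g. 0.02 == 2%).  A crude stand-in for what a tool
    like LNT does; medians make the comparison robust to outlier runs."""
    flagged = []
    for i in range(window, len(times) - window + 1):
        before = median(times[i - window:i])
        after = median(times[i:i + window])
        change = (after - before) / before
        if abs(change) > threshold:
            flagged.append((i, change))
    return flagged

# Synthetic series: stable around 10s, then roughly a 5% jump.
series = [10.0, 10.1, 9.9, 10.0, 10.05, 10.0, 9.95, 10.1, 10.0, 10.0,
          10.5, 10.55, 10.45, 10.5, 10.6]
for i, change in detect_regressions(series):
    print(f"sample {i}: {change * 100:+.1f}%")
```

The detector flags the samples around the jump; in practice the hard part is picking `window` and `threshold` so that machine noise doesn't drown real 1-2% regressions like several of the ones listed above.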

Michael

This is a good opportunity to remind everyone about compile time tracking:

CTMark announcement: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107087.html
Jenkins job: http://lab.llvm.org:8080/green/view/Compile%20Time/
LNT: http://104.154.54.203/db_default/v4/nts/recent_activity
tramp3d-v4 Os: http://104.154.54.203/db_default/v4/nts/graph?plot.0=1336.1604487.2&highlight_run=25082

Hi Mikhail,


This is an amazing set of data!

Disclaimer:
The data is specific to this particular test, so I could have missed some commits affecting compile time on other workloads/configurations.
The data I have is not perfect, so I could have missed some commits even if they impacted compile time on this test case.
The same commits might have a different impact on a different test/configuration, possibly even the opposite of the one listed.
I didn't mean to label any commits as 'good' or 'bad' by posting these numbers. It's expected that some commits increase compile time; we just need to be aware of it and avoid unnecessary slowdowns.

Conclusions:
Changes in optimization thresholds/cost-models usually have the biggest impact on compile time. However, usually they are well-assessed and trade-offs are discussed and agreed on.

My impression is that most of the time they are assessed, discussed, and agreed on based solely on the "performance" expectation, without necessarily looking at the compile time impact.
For example, a change in a threshold in the loop unroller may trigger a pattern that makes SCEV blow up later. Looking at this only from the "performance of the generated code" point of view is a mistake in my opinion, and hopefully closer tracking like you've been doing will help prevent these situations. So thanks a lot for this!

Introducing a pass doesn't necessarily mean a compile time slowdown. Sometimes the total compile time might even decrease because we're saving some work for later passes.
There are many commits that individually have a low compile time impact but together add up to a noticeable slowdown.
Conscious efforts on reducing compile time definitely help - thanks to everyone who's been working on this!

Thanks for reading - any comments or suggestions on how to make LLVM faster are welcome! I hope we'll see this graph going down this year :)

Looking forward to this!
Do you plan to generate a report like that frequently (weekly? Whenever you notice a regression?)

Thanks,

Hi Mehdi,


This is an amazing set of data!

Thanks for the interest in this!


Conclusions:
Changes in optimization thresholds/cost-models usually have the biggest impact on compile time. However, usually they are well-assessed and trade-offs are discussed and agreed on.

My impression is that most of the time they are assessed, discussed, and agreed on based solely on the "performance" expectation, without necessarily looking at the compile time impact.

Runtime performance has definitely been getting more attention, but I think the people who changed heuristics usually looked at compile time too. In fact, I think changes that are not expected to affect performance much are more likely to go in without thorough compile time testing. Hopefully, improved regular tracking will help to detect such undesired side effects in the future.

For example, a change in a threshold in the loop unroller may trigger a pattern that makes SCEV blow up later. Looking at this only from the "performance of the generated code" point of view is a mistake in my opinion, and hopefully closer tracking like you've been doing will help prevent these situations.

That's true, but I think it's an accepted requirement for this sort of change to come with compile-time testing results as well.


Looking forward to this!
Do you plan to generate a report like that frequently (weekly? Whenever you notice a regression?)

I didn't plan to send such a report regularly, but if I find something interesting, I'll definitely share it. It will also make sense to compare releases, so that's probably something I'll do as well.

Michael

This is great, thanks for the January update :)
Do you mind sharing how you collected the numbers (scripts, etc.) and
how you plotted the graph, so I can try repeating this at home with my
test cases?

Thanks,


Out of pure curiosity, I would love to see the performance of the resulting binary co-plotted with the same horizontal axis as this compile duration data.

Jon

LNT doesn't allow plotting them on the same graph, so the best I could do was align one graph with the other:

PerformanceAndCompileTime.pdf (794 KB)


This is great, thanks for the January update :)
Do you mind sharing how you collected the numbers (scripts, etc.) and
how you plotted the graph, so I can try repeating this at home with my
test cases?

It involved a lot of manual work, so I'm not sure there is anything to share.
For the graph I just used LNT and some mad skills to mark the points of interest. Then I checked out LLVM from the date I wanted to examine (near jumps in the graph), built it, and ran the test 20 times to verify the change and find the commit responsible for it. As I said, a lot of manual work, but we're working on some infrastructure to automate some of this.
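The "ran the test 20 times" step amounts to taking a robust statistic over noisy runs. A minimal Python sketch of just that part, using a trivial command as a stand-in for the real clang invocation (the stand-in command is an assumption for illustration, not the actual harness):

```python
import subprocess
import sys
import time

def best_time(cmd, runs=20):
    """Run `cmd` repeatedly and return the minimum wall-clock time.
    The minimum is a common robust choice for compile-time measurement,
    since system noise only ever makes a run slower, never faster."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in command; the real experiment would time something like
# `clang -Os -c tramp3d-v4.cpp` (for an arm64 target) instead.
elapsed = best_time([sys.executable, "-c", "pass"], runs=5)
print(f"best of 5 runs: {elapsed:.3f}s")
```

With the minimum in hand for two adjacent revisions, the per-commit delta is just the ratio of the two numbers; repeating this while bisecting around a jump in the graph identifies the responsible commit.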

Michael

PS: If you haven't used LNT before, then definitely try using it - at least it'll take care of plotting graphs. If you need any guidance on this part, I can try to help.

Am I reading this right that over the course of the graph we have gotten about 50% slower compiling this benchmark, and the execution time of the benchmark has tripled? Those are significant regressions along both dimensions.

– Sean Silva

I hope that the documentation at http://lnt.llvm.org/contents.html is enough to get you going. It definitely is a bit rough and could use polishing.
If you have any feedback on what's most confusing about that documentation, or what's lacking the most, I'm interested in hearing it and will try to improve the documentation based on it.
FWIW, in a few weeks at FOSDEM, I'll be demonstrating how we're using LNT to do performance tracking; see https://fosdem.org/2017/schedule/event/lnt/.

Thanks,

Kristof

Hi,

On this topic, I just tried to build ToT with clang-3.9.1 and clang-4.0, and the total time to complete `ninja clang` on this machine went from 12m54s to 13m44s for RelWithDebInfo (6.5% slower!) and from 11m18s to 12m06s for Release (7% slower!).

Am I reading this right that over the course of the graph we have gotten about 50% slower compiling this benchmark, and the execution time of the benchmark has tripled? Those are significant regressions along both dimensions.

This reading is correct, yes (though I haven't rechecked the performance numbers; I only verified the compile-time results).

Michael

Ah but how did you compile the clang-4.0 you were using? Does it run faster if you compile it with clang-4.0? :slight_smile:

Both clang-3.9.1 and clang-4.0.0 were built with the same compiler and the same options (ThinLTO Release build, without PGO).


(Same compiler being clang-4.0, itself bootstrapped with ThinLTO)