Interprocedural DSE for -ftrivial-auto-var-init

Hi JF,

I’ve heard that you are interested in DSE improvements, so maybe we should sync up.

So far I have experimented with the following DSE improvements:

  • Cross-block DSE. It eliminates an additional 7% of stores compared to the existing DSE, but this is not visible on benchmarks.

  • Cross-block + interprocedural analysis that annotates each function argument with:

      • can read before write
      • will always write

These annotations get me 20% more deleted stores on top of the current DSE.

This is on the LLVM codebase with -ftrivial-auto-var-init=pattern.
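
To make the two argument annotations concrete, here is a minimal Python sketch of how such summaries could be computed. It is only an illustration under stated assumptions, not the experimental pass: the toy IR (a function is a parameter list plus a list of paths, each path a list of `read`/`write`/`call` operations), the names `ArgSummary`, `summarize` and `caller_init_store_is_dead`, and the example functions are all hypothetical. It also assumes no recursion, whole-object accesses, no aliasing, and that a pointer is passed at most once per call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgSummary:
    can_read_before_write: bool
    will_always_write: bool


# Conservative answer for callees whose body is not visible.
UNKNOWN = ArgSummary(can_read_before_write=True, will_always_write=False)


def summarize(functions, name, cache=None):
    """Return {param: ArgSummary} for functions[name] (toy IR, no recursion)."""
    cache = {} if cache is None else cache
    if name in cache:
        return cache[name]
    params, paths = functions[name]
    result = {}
    for p in params:
        may_read = False
        always_writes = True
        for path in paths:
            written = False
            for op in path:
                if op[0] == "call":
                    _, callee, args = op
                    if p not in args:
                        continue
                    if callee in functions:
                        callee_param = functions[callee][0][args.index(p)]
                        s = summarize(functions, callee, cache)[callee_param]
                    else:
                        s = UNKNOWN
                    reads, writes = s.can_read_before_write, s.will_always_write
                else:
                    kind, target = op
                    if target != p:
                        continue
                    reads, writes = kind == "read", kind == "write"
                if reads and not written:
                    may_read = True  # some path reads the incoming value
                written = written or writes
            always_writes = always_writes and written  # must hold on every path
        result[p] = ArgSummary(may_read, always_writes)
    cache[name] = result
    return result


def caller_init_store_is_dead(summary):
    """A store made just before the call is dead if the callee overwrites it unread."""
    return summary.will_always_write and not summary.can_read_before_write


# Hypothetical functions: name -> (params, paths).
functions = {
    "fill": (["buf"], [[("write", "buf")]]),
    "maybe_fill": (["buf"], [[("write", "buf")], []]),
    "peek_then_fill": (["buf"], [[("read", "buf"), ("write", "buf")]]),
    "wrapper": (["out"], [[("call", "fill", ["out"])]]),
}

for fn in functions:
    print(fn, summarize(functions, fn))
```

In this toy model, an auto-init store placed right before a call to `fill` or `wrapper` would be removable, while one before `maybe_fill` or `peek_then_fill` would have to stay. A real implementation would additionally need to reason about sizes and offsets, aliasing, and unwinding.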

As is, that is less than I expected, so I would like to find a good benchmark to decide whether it is worth turning my experiment into production code.

I am also planning to try extending this to whole-program analysis.

I will clean up my code and upload it this week, in case anyone wants to try it.

Vitaly.

This is great! I’ll try out the patches when you post them and see if they resolve the issues I’d been seeing. I don’t think we need benchmark gains from this for it to be worthwhile, since variable auto-init adds slightly unusual code. I think it’s aggravating cases where the current DSE was already failing in innocuous ways.

> Hi JF,
>
> I've heard that you are interested in DSE improvements, so maybe we should sync up.
> So far I have experimented with the following DSE improvements:
>
> * Cross-block DSE. It eliminates an additional 7% of stores compared to the existing DSE, but this is not visible on benchmarks.

I take it you couldn’t see any runtime impact? If there are code size improvements, that could also be useful. CTMark in the LLVM test suite is a useful subset of benchmarks to check this on (use -Os as the baseline when comparing code size).

Thanks,
Amara

> * Cross-block + interprocedural analysis that annotates each function argument with:
>   - can read before write
>   - will always write
> These annotations get me 20% more deleted stores on top of the current DSE.

I believe we can only benefit from removing extra stores.
Hot functions in existing benchmarks are probably optimized well
enough already, but speeding up the long tail is also important.
Also, at least the repro in
https://bugs.llvm.org/show_bug.cgi?id=40527 was extracted from a
real kernel benchmark (hackbench), where this extra store cost us
0.45%.

I tried -Os, and the effect of the new approach increases significantly.
I run the regular DSE and then immediately my DSE. With -Os, my DSE removes more than 50% as many stores as the regular DSE does.
This is expected, as -Os inlines less and the regular DSE can’t remove stores across a function call.
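
The inlining argument can be illustrated with a small Python sketch. Everything here is a hypothetical toy model, not LLVM code: a caller is a list of operations on one local variable `x` (`store`, `load`, or `call` passing `&x`), `dead_stores` is a simplified backward scan, and each summary is a `(can_read_before_write, will_always_write)` pair. The sketch assumes the callee does not keep the pointer and that every access covers the whole variable.

```python
def dead_stores(ops, summaries=None):
    """Return indices of stores to the local x that are dead (backward scan)."""
    summaries = summaries or {}
    dead = []
    # True when, from this point on, x is overwritten (or goes out of scope)
    # before anything can read it.
    overwritten_before_read = True
    for index in range(len(ops) - 1, -1, -1):
        op = ops[index]
        if op[0] == "store":
            if overwritten_before_read:
                dead.append(index)
            overwritten_before_read = True
        elif op[0] == "load":
            overwritten_before_read = False
        elif op[0] == "call":
            # Without a summary the call is opaque and may read x.
            can_read_first, always_writes = summaries.get(op[1], (True, False))
            if can_read_first:
                overwritten_before_read = False
            elif always_writes:
                overwritten_before_read = True
    return sorted(dead)


def inline(ops, bodies):
    """Replace calls to known callees with the callee's operations on x."""
    out = []
    for op in ops:
        if op[0] == "call" and op[1] in bodies:
            out.extend(bodies[op[1]])
        else:
            out.append(op)
    return out


# x is auto-initialized, passed to fill(), then read.
caller = [("store",), ("call", "fill"), ("load",)]
bodies = {"fill": [("store",)]}
summaries = {"fill": (False, True)}

print(dead_stores(caller))                  # opaque call: nothing removable
print(dead_stores(inline(caller, bodies)))  # inlined: the init store is dead
print(dead_stores(caller, summaries))       # not inlined, but summarized
```

When the callee is inlined, a purely local scan already sees the overwrite, so summaries add little. When it is not inlined, as is more common at -Os, only the summary lets the initial store be removed.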

Can you post numbers for how many stores get eliminated from CTMark?

Sorry for the delay, I was busy with other stuff.

CTMark results.

dse is the current DSE.
dsem is my experimental module-level DSE.
dsem runs after dse, so its numbers are stores deleted in addition to those deleted by dse.

-O3
dse - Number of stores deleted 3033
dsem - Number of deleted writes 3148

-O3 -ftrivial-auto-var-init=pattern
dse - Number of stores deleted 5618
dsem - Number of deleted writes 3840

-O3 -flto
dse - Number of stores deleted 3985
dsem - Number of deleted writes 3838

-O3 -flto -ftrivial-auto-var-init=pattern
dse - Number of stores deleted 6461
dsem - Number of deleted writes 4215

-Os
dse - Number of stores deleted 1443
dsem - Number of deleted writes 1517

-Os -ftrivial-auto-var-init=pattern
dse - Number of stores deleted 3951
dsem - Number of deleted writes 2259

-Oz
dse - Number of stores deleted 1072
dsem - Number of deleted writes 574

-Oz -ftrivial-auto-var-init=pattern
dse - Number of stores deleted 3420
dsem - Number of deleted writes 1637
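
To compare configurations more easily, the short Python snippet below expresses each dsem count as a percentage of the corresponding dse count. The numbers are copied from the results above; the names `results` and `extra_percent` are just illustrative.

```python
# (dse, dsem) store counts per configuration, copied from the CTMark results.
results = {
    "-O3": (3033, 3148),
    "-O3 -ftrivial-auto-var-init=pattern": (5618, 3840),
    "-O3 -flto": (3985, 3838),
    "-O3 -flto -ftrivial-auto-var-init=pattern": (6461, 4215),
    "-Os": (1443, 1517),
    "-Os -ftrivial-auto-var-init=pattern": (3951, 2259),
    "-Oz": (1072, 574),
    "-Oz -ftrivial-auto-var-init=pattern": (3420, 1637),
}


def extra_percent(config):
    """Stores deleted by dsem as a percentage of those deleted by dse."""
    dse, dsem = results[config]
    return 100.0 * dsem / dse


for config in results:
    print(f"{config:45s} {extra_percent(config):6.1f}%")
```

By this measure dsem deletes roughly as many additional stores as dse at plain -O3 and -Os, and between about half and two thirds as many when -ftrivial-auto-var-init=pattern is enabled.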

This looks great! Do you have a patch ready to go?

I have a dirty proof-of-concept patch. I am going to rewrite pieces of it during May, starting now.

Today it’s a new pass which does cross-block DSE, module DSE, and global DSE.
So far the module DSE is the most useful and is probably easy to integrate into the existing DSE.

https://reviews.llvm.org/D61879

Great, thank you for getting this started!