Optimization remarks for non-temporal stores

Hi,

There was a recent discussion on generating non-temporal stores automatically[1]. This is a hard problem to get right. What seems like a much easier but still useful problem to solve is to provide diagnostics to the user where NT stores may help. Then the user can add the corresponding builtins and see if they are beneficial.
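To illustrate what "adding the corresponding builtin" might look like on the user side, here is a minimal sketch. It assumes Clang's __builtin_nontemporal_store; the function name fill_streaming and the fallback to a plain store on other compilers are made up for this example. Whether it is actually faster depends on the target and the access pattern, which is exactly what the user would have to measure.

```cpp
#include <cstddef>

// Not every compiler provides __has_builtin, so give it a safe default.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical example: fill a large buffer that will not be read again soon.
// With Clang, each store is marked non-temporal; elsewhere it is a plain store.
void fill_streaming(float *dst, float value, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
#if __has_builtin(__builtin_nontemporal_store)
    __builtin_nontemporal_store(value, &dst[i]);
#else
    dst[i] = value;
#endif
  }
}
```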

My hope is that with the work to add profile-driven optimization diagnostics[2] we could make the number of false positives manageable (eliminate cold loops or hot low-trip count loops, etc.).
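A rough sketch of the kind of profile-based filter I have in mind follows. Everything in it is an assumption for illustration: LoopProfile is not an LLVM type, and the two thresholds are arbitrary placeholders rather than tuned values.

```cpp
#include <cstdint>

// Hypothetical per-loop profile summary; not an existing LLVM data structure.
struct LoopProfile {
  std::uint64_t entryCount;     // how many times the loop was entered
  std::uint64_t iterationCount; // total iterations summed over all entries
};

// Placeholder thresholds, chosen only to make the example concrete.
constexpr std::uint64_t kMinIterations = 1000000; // fewer: treat loop as cold
constexpr std::uint64_t kMinAvgTripCount = 1024;  // fewer: trip count too low

// Returns true if a suggestion for this loop is worth showing to the user.
bool worthSuggestingNTStores(const LoopProfile &p) {
  if (p.entryCount == 0 || p.iterationCount < kMinIterations)
    return false; // never executed, or cold
  // Hot overall, but each visit may still be short (hot low-trip-count loop).
  return p.iterationCount / p.entryCount >= kMinAvgTripCount;
}
```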

On the other hand, this requires an interesting new class of optimization remarks. Normally, remarks exist only for optimizations that we perform automatically: they report success, or failure plus the reason. E.g. -Rpass-analysis=loop-vectorize will report why loops weren’t vectorized. In this case, however, we don’t really have a corresponding optimization.

Note that NT stores are just one example here. This is a general problem. We will probably run into it again when we try to report data-layout transformation opportunities, etc.

I see three ways to solve this:

  1. Report it from an existing, similar pass. In this case LoopDataPrefetch may be a candidate. It would still be somewhat strange to ask the user to pass -Rpass-analysis=loop-data-prefetch to see NT store opportunities, but it may not be too bad.

  2. Report it as part of the corresponding analysis. This would probably be LoopAccessAnalysis in this case. We would have to make sure that we only report once even if the analysis is run multiple times. (I am not sure whether we already had to solve this for the current set of opt remarks, or whether the front-end already has this capability.)

  3. Add a new (all-analysis-preserving) pass where we emit this and perhaps other remarks like this. It is a sort of read-only pass whose sole purpose is to present optimization suggestions to the user.
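For the report-once requirement in option 2, one possible approach is to key each remark on its pass name and source location and drop repeats. The sketch below is self-contained and does not use any LLVM API; RemarkDeduplicator and its key layout are hypothetical names introduced only for this example.

```cpp
#include <set>
#include <string>
#include <tuple>

// Hypothetical remark identity: pass name, file, line, column.
using RemarkKey = std::tuple<std::string, std::string, unsigned, unsigned>;

class RemarkDeduplicator {
public:
  // Returns true the first time a given key is seen and false on every
  // later call with the same key, so a re-run analysis stays silent.
  bool shouldEmit(const std::string &pass, const std::string &file,
                  unsigned line, unsigned column) {
    return seen_.emplace(pass, file, line, column).second;
  }

private:
  std::set<RemarkKey> seen_;
};
```

A caller would check shouldEmit(...) right before emitting and skip the remark when it returns false.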

I think 3 is the best option, but it could also be considered overkill at this early stage.

We may also want a new subclass for this type of diagnostic. Something like DiagnosticInfoOptimizationRemarkSuggestions, in case we want to filter these differently from Remark/Analysis/Missed.
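To make the filtering idea concrete, here is a simplified model with a fourth remark kind next to the existing three. It is only a sketch: the enum, the Remark struct and filterByKind are stand-ins invented for this example and do not mirror the real DiagnosticInfo class hierarchy.

```cpp
#include <string>
#include <vector>

// Stand-ins for the existing remark flavors plus the proposed one.
enum class RemarkKind { Passed, Missed, Analysis, Suggestion };

// Hypothetical, simplified remark record.
struct Remark {
  RemarkKind kind;
  std::string pass;
  std::string message;
};

// Keep only the remarks of the requested kind, preserving their order.
std::vector<Remark> filterByKind(const std::vector<Remark> &all,
                                 RemarkKind kind) {
  std::vector<Remark> out;
  for (const Remark &r : all)
    if (r.kind == kind)
      out.push_back(r);
  return out;
}
```

With a separate kind, a front-end could show or hide suggestions without touching how Remark/Analysis/Missed are handled.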

Please let me know if you have any comments or suggestions.

Thanks,
Adam

[1] http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98232
[2] http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334

From: "Adam Nemet" <anemet@apple.com>
To: "llvm-dev (llvm-dev@lists.llvm.org)" <llvm-dev@lists.llvm.org>
Cc: "Hal Finkel" <hfinkel@anl.gov>
Sent: Wednesday, May 18, 2016 3:41:48 PM
Subject: Optimization remarks for non-temporal stores

Hi,

There was a recent discussion on generating non-temporal stores
automatically[1]. This is a hard problem to get right. What seems
like a much easier but still useful problem to solve is to provide
diagnostics to the user where NT stores *may* help. Then the user
can add the corresponding builtins and see if they are beneficial.

My hope is that with the work to add profile-driven optimization
diagnostics[2] we could make the number of false positives
manageable (eliminate cold loops or hot low-trip count loops, etc.).

This seems like a really interesting idea.

On the other hand this requires an interesting new class of
optimization remarks. We normally have remarks for optimizations
that we automatically perform. Opt remarks report success,
failure+reason for automatically performed optimizations. E.g.
-Rpass-analysis=loop-vectorize will report why loops weren’t
vectorized. In this case however we don’t really have a
corresponding optimization.

Note that NT stores is just one example here. This is a general
problem. We will probably have the same problem when we try to
report data-layout transformation opportunities, etc.

I see three ways to solve this:

1. Report it from an existing, similar pass. In this case
LoopDataPrefetch may be a candidate. It would still be somewhat
strange to ask the user to pass -Rpass-analysis=loop-data-prefetch
to see NT store opportunities but may be not too bad.

2. Report it as part of the corresponding analysis. This would
probably be LoopAccessAnalysis in this case. We would have to make
sure that we only report once even if the analysis is run multiple
times. (I am not sure if we had to already solve this for the
current set of opt remarks or whether the front-end already has this
capability.)

3. Add a new (all-analysis-preserving) pass where we emit this and
perhaps other remarks like this. It is sort of read-only pass whose
sole purpose is to present optimization suggestions to the user.

I think the best is 3 but it could also be considered an overkill at
this early stage.

I think this makes sense. I have a student who is currently making good progress on a similar recommendation pass for OpenMP pragmas (or, more generally, loops that could be worth executing in parallel) -- also involving a combination of profiling data with other analysis -- and having a real infrastructure for this (i.e. a good place to put this) seems like it would be quite helpful.

We may also want a new subclass for this type of diagnostics.
Something like DiagnosticInfoOptimizationRemarkSuggestions, in case
we want to filter these differently from Remark/Analysis/Missed.

Agreed.

Thanks again,
Hal