RFC - Adding an optimization report facility?

The context of this is performance analysis of generated code. My interest is to trace at a high level the major decisions made by the various optimizers. For instance, when the inliner decides to inline foo into bar, or the loop unroller decides to unroll a loop N times, or the vectorizer decides to vectorize a loop body.

Many of these details are usually available via -debug-only. However, this has several limitations:

  1. The output is generally too detailed. Passes will often emit the result of their analysis, what failed, what worked, etc. This is often fine when debugging the pass itself, but it’s too noisy for initial analysis.
  2. The output is unstructured and it often uses pass-specific lingo which may confuse someone who just started looking at it.
  3. It forces a debug+asserts build (or at least a build with -UNDEBUG).
  4. Only one pass at a time can be asked to produce debugging output.

Additionally, I don’t think it’s right to co-opt the -debug-only facility for implementing optimization reports. They have different purposes. An optimization report should simply state what the pass did in a fairly terse way. This facilitates initial and comparative analysis. If I want to compare what one compiler did versus the current version, it should be easy to spot what decisions were made by each one.

Clearly, the quality of the information will depend on how many decisions are reported. Not every difference in performance will be detected by comparing optimization reports. But major differences are often due to major passes making slightly different decisions (e.g., the inliner).
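
As an illustration of the comparative analysis described above, here is a minimal Python sketch that diffs two optimization reports. The `file:line:col: remark: message` line format, the function names, and the sample messages are all assumptions made for this example; they are not taken from any existing LLVM or GCC output.

```python
import re

# Hypothetical report line format, modeled on ordinary compiler diagnostics:
#   file:line:col: remark: message
LINE_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): remark: (?P<msg>.*)$"
)


def parse_report(text):
    """Return the set of (file, line, col, message) decisions in a report."""
    decisions = set()
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            decisions.add((m["file"], int(m["line"]), int(m["col"]), m["msg"]))
    return decisions


def compare_reports(old_text, new_text):
    """Return (only_in_old, only_in_new), each as a sorted list."""
    old, new = parse_report(old_text), parse_report(new_text)
    return sorted(old - new), sorted(new - old)


# Made-up reports from two compiler versions.
OLD = """\
a.c:10:3: remark: foo inlined into bar
a.c:20:5: remark: loop unrolled 4 times
"""
NEW = """\
a.c:10:3: remark: foo inlined into bar
a.c:20:5: remark: loop vectorized
"""

only_old, only_new = compare_reports(OLD, NEW)
for decision in only_old:
    print("-", decision)
for decision in only_new:
    print("+", decision)
```

Decisions common to both reports drop out, so only the places where the two compilers disagreed remain. This only works if each line is terse and carries a stable location, which is the property argued for above.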

My intent is to introduce an optimization report option to LLVM which passes will be able to use to indicate the major decisions they make. Initially, I am tempted to mimic GCC’s -fopt-info (http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#index-fopt-info-747).

Before I get too far into this, do folks think this is a good idea? I’m open to reasonable requests on how the facility should work, etc.

Thanks. Diego.

Yes, and see Tobias's recent work to add "remark" diagnostics to LLVM and
Clang. I think it is targeted at precisely these kinds of use cases.

Before I get too far into this, do folks think this is a good idea? I'm open
to reasonable requests on how the facility should work, etc.

Yes please.

-eric

From: "Chandler Carruth" <chandlerc@google.com>
To: "Diego Novillo" <dnovillo@google.com>, "Tobias Grosser" <tobias@grosser.es>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Thursday, March 6, 2014 1:11:30 PM
Subject: Re: [LLVMdev] RFC - Adding an optimization report facility?

Before I get too far into this, do folks think this is a good idea?
I'm open to reasonable requests on how the facility should work,
etc.
Yes, and see Tobias's recent work to add "remark" diagnostics to LLVM
and Clang. I think it is targeted at precisely these kinds of use
cases.

+1

-Hal

Hi Diego,

as others already pointed out, I committed the first piece of such an infrastructure in LLVM commit 202474 and clang commit 202475. This is mostly the backend and printing infrastructure.

This should already be functional, even though Alp pointed out there are still some open issues with the C bindings and the verify diagnostic consumer (I still need to look into those). The command line options to enable this currently work, but are still very basic. Several people had ideas on how to improve this, but we have not yet agreed on a solution. If you are interested in contributing here, that would be great. I have a patch on the clang-commits mailing list ('tblgen: Modularize the diagnostic emitter') which prepares the command line infrastructure for possible changes, but the more complicated issue of finding a good interface without increasing complexity is still open. Also, we still need more in-tree users to better test the infrastructure and to get an idea of what kind of reports we would like to emit (which in turn can help shape the command line interface). Your thoughts and contributions are highly appreciated.

Cheers,
Tobias

as others already pointed out, I committed the first piece of such an
infrastructure in LLVM commit 202474 and clang commit 202475. This is
mostly the backend and printing infrastructure.

Thanks, Tobias. I've browsed the two patches and I think they're going to
be exactly what I need. IIUC, the patches add two new LLVM instructions,
remark and note. These are inserted in the IL by the passes and the
compiler emits them as diagnostics if the right -W flag is enabled?

One question I have from the Clang patch. If I compile with -Weverything,
will this enable all warnings *and* remarks? In this context, I only want
to enable all remarks. In fact, I want to enable a family of remarks: the
optimization remarks.

There will be other modifiers to these remarks as well:

   1. Report *missed* optimizations, instead of the successful ones.
   2. Increase verbosity of the report. This would be done using note
   nodes, I expect. But we may want varying degrees of verbosity.
   3. Group families of optimizations. For example, I want to report on all
   loop-related optimizations.
   4. IIRC, GCC's -fopt-info will also allow you to collect the reports
   into a separate text file. Not sure how useful I find this feature myself.

Thanks. Diego.

Re. 4, separate report files can be handy for a few reasons: 1) they do not
contaminate stderr; 2) per-TU reports; 3) per-opt-group reports, etc.

Also in GCC's original design, per-pass report filtering was supported, but
that was considered too developer oriented.

David

Ah, thanks. The stderr and per-TU report points are handy. Your #3 seems
similar to my #3 (group families of optimizations).

Diego.

For 3), I mean you can redirect the optimization reports in this way:

$COMPILER -fopt-info-loop=a.c.loop.report -fopt-info-vect=a.c.vect.report
   ... -c a.c
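
To make the per-group redirection concrete, here is a rough Python sketch of the idea behind the command above: each remark carries a group, and each group can be routed to its own sink. The `route_remarks` helper, the group names, and the remark texts are hypothetical, and in-memory buffers stand in for the report files.

```python
import io


def route_remarks(remarks, sinks, default):
    """Write each (group, text) remark to the sink registered for its group.

    Remarks whose group has no dedicated sink go to `default`, which plays
    the role of stderr in this sketch.
    """
    for group, text in remarks:
        out = sinks.get(group, default)
        out.write(text + "\n")


# In-memory stand-ins for a.c.loop.report, a.c.vect.report and stderr.
loop_report = io.StringIO()
vect_report = io.StringIO()
stderr_like = io.StringIO()

sinks = {"loop": loop_report, "vect": vect_report}

# Made-up remarks, each tagged with a hypothetical optimization group.
remarks = [
    ("loop", "a.c:12:3: loop unrolled 4 times"),
    ("vect", "a.c:30:5: loop vectorized"),
    ("inline", "a.c:40:1: foo inlined into bar"),
]

route_remarks(remarks, sinks, stderr_like)
print(loop_report.getvalue(), end="")
print(vect_report.getvalue(), end="")
```

Keeping the routing table separate from the remark producers means a pass never needs to know where its output ends up.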

David

as others already pointed out, I committed the first piece of such an
infrastructure in LLVM commit 202474 and clang commit 202475. This is
mostly the backend and printing infrastructure.

Thanks, Tobias. I've browsed the two patches and I think they're going to
be exactly what I need. IIUC, the patches add two new LLVM instructions,
remark and note. These are inserted in the IL by the passes and the
compiler emits them as diagnostics if the right -W flag is enabled?

Close.

The patch adds one new LLVM diagnostic called 'remark'. 'note' has already been available to add extended information to warnings, but it could not appear independently. This still holds, but 'notes' can now be used to add additional information to remarks, too.

At the moment, remarks can, like warnings, belong to a diagnostic group, which is enabled with a '-Wgroupname' flag.
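
The relationship between remarks and notes can be pictured with a small Python model. This is only a sketch of the concept, not the LLVM or Clang data structures: the `Diagnostic` class, its fields, and the example text are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnostic:
    """Toy model of a diagnostic; 'kind' would be 'warning' or 'remark'."""

    kind: str
    location: str
    message: str
    # Notes exist only as attachments to a parent diagnostic. There is
    # deliberately no way to create a free-standing note in this model.
    notes: list = field(default_factory=list)

    def add_note(self, message):
        self.notes.append(message)
        return self

    def render(self):
        lines = [f"{self.location}: {self.kind}: {self.message}"]
        lines += [f"{self.location}: note: {n}" for n in self.notes]
        return "\n".join(lines)


# A hypothetical remark with one attached note.
report = Diagnostic("remark", "a.c:20:5", "loop unrolled 4 times").add_note(
    "trip count is a known multiple of 4"
)
print(report.render())
```

A remark without notes renders as a single line, so notes are one possible way to layer extra verbosity on top of a terse report.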

One question I have from the Clang patch. If I compile with -Weverything,
will this enable all warnings *and* remarks?

Yes, currently this enables all diagnostics, which means all warnings and remarks.

In this context, I only want
to enable all remarks.

I can see that. How to do this is really open for discussion. I could see us establishing two new flags '-Wall-remarks' for remarks and '-Wall-warnings' for warnings. Or, what I prefer, we could establish a new flag hierarchy e.g. with '-Reverything'.

In fact, I want to enable a family of remarks: the
optimization remarks.

There will be other modifiers to these remarks as well:

    1. Report *missed* optimizations, instead of the successful ones.
    2. Increase verbosity of the report. This would be done using note
    nodes, I expect. But we may want varying degrees of verbosity.
    3. Group families of optimizations. For example, I want to report on all
    loop-related optimizations.
    4. IIRC, GCC's -fopt-info will also allow you to collect the reports
    into a separate text file. Not sure how useful I find this feature myself.

5. Enable/Disable remarks with pragmas / per function

Yes, there are a lot of things to do here. The diagnostic infrastructure
already provides great features, e.g.:

   - Enabling/Disabling diagnostics with pragmas and per function
   - Nested diagnostic groups
   - Enabling a diagnostic group, but disabling individual diagnostics
     of this group.
   - Good integration in tools through libclang
   - ...
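
The nested-group behaviour listed above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Clang's implementation: the `DiagnosticGroups` class and every group and diagnostic name below are hypothetical.

```python
class DiagnosticGroups:
    """Toy model of nested diagnostic groups with per-diagnostic overrides."""

    def __init__(self):
        self.children = {}    # group name -> list of subgroups or diagnostics
        self.enabled = set()  # leaf diagnostics that are currently enabled

    def add_group(self, name, members):
        self.children[name] = list(members)

    def _expand(self, name):
        """Yield every leaf diagnostic reachable from `name`."""
        if name not in self.children:
            yield name  # not a group, so it is a leaf diagnostic
            return
        for member in self.children[name]:
            yield from self._expand(member)

    def enable(self, name):
        self.enabled.update(self._expand(name))

    def disable(self, name):
        self.enabled.difference_update(self._expand(name))

    def is_enabled(self, diagnostic):
        return diagnostic in self.enabled


groups = DiagnosticGroups()
# Hypothetical hierarchy: "optimization" contains "loop" and "inline".
groups.add_group("optimization", ["loop", "inline"])
groups.add_group("loop", ["loop-unroll", "loop-vectorize"])

# Flags are applied in order, so a later flag overrides an earlier one.
groups.enable("optimization")
groups.disable("loop-unroll")

for name in ["inline", "loop-unroll", "loop-vectorize"]:
    print(name, groups.is_enabled(name))
```

Because group flags are expanded into leaf diagnostics as they are applied, enabling a whole family and then switching off one member needs no special casing.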

However, the command line interface might need to be improved before we start adding a large number of remarks. Some goals I see:

1) We want to reuse as much of the diagnostic infrastructure as possible
2) We do not want to increase the complexity of the warning flags too much
3) We still want to have a similar user interface as the warning flags provide
4) People raised interest in upgrading remarks to warnings or errors
    (I am personally not too sure about this one, but it already works
     with -Werror=groupname)

Several people (including Chris) warned against building up an additional complex diagnostic system on the side, but as Chandler pointed out, we can most likely reuse most of the existing infrastructure while still adjusting the user interface if needed.

I personally believe the best approach is to start small and let the design be driven by the intended use cases. If you are interested, I would propose adding a single diagnostic for the loop inliner using just the existing '-W' infrastructure, e.g. with '-Winline-remarkers'. This both increases test coverage of the existing features and will help to get an idea of the needed features. At the same time, you could help review my first patch on modularizing the clang diagnostic infrastructure. Again, something that may help to get an idea of where to head on this side.

Cheers,
Tobias

The context of this is performance analysis of generated code. My interest is to trace at a high-level the major decisions done by the various optimizers. For instance, when the inliner decides to inline foo into bar, or the loop unroller decides to unroll a loop N times, or the vectorizer decides to vectorize a loop body.

Before I get too far into this, do folks think this is a good idea? I’m open to reasonable requests on how the facility should work, etc.

This is a great idea, and many people would welcome it. Please write up a concrete proposal about how this will work, and we can iterate on that.

My intent is to introduce an optimization report option to LLVM which passes will be able to use to indicate the major decisions they make. Initially, I am tempted to mimic GCC’s -fopt-info (http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#index-fopt-info-747).

I’m not sure if this is the best design or not (having never used it) - what feedback have you heard from (non-compiler-hacker) people trying to use it?

IMO, the hard part of doing something like this is getting the user experience right. It does you no good to say “hey I unrolled a loop” if you don’t have enough location information to tell the user which loop got unrolled. The key is to give them actionable information, not just the output of -debug :)

-Chris

From: "Chris Lattner" <clattner@apple.com>
To: "Diego Novillo" <dnovillo@google.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Thursday, March 6, 2014 5:54:02 PM
Subject: Re: [LLVMdev] RFC - Adding an optimization report facility?

IMO, the hard part of doing something like this is getting the user
experience right. It does you no good to say "hey I unrolled a loop"
if you don't have enough location information to tell the user
*which* loop got unrolled. The key is to give them actionable
information, not just the output of -debug :)

My suggestion is that we start attaching loop-id metadata to loops in the frontend, and then also start attaching 'srcloc' metadata, just like we do for inline asm statements. This way we can pass back the information we need to the frontend for it to identify the loop without too much trouble. There may be a better long-term design, but this seems, at least, like an easy thing to do in the short term.

-Hal

IMO, the hard part of doing something like this is getting the user
experience right. It does you no good to say "hey I unrolled a loop" if
you don't have enough location information to tell the user *which* loop
got unrolled. The key is to give them actionable information, not just the
output of -debug :)

yep, the information will be useless without source/location information.

David

I second Diego's notion that this would be incredibly useful.

Hal's suggestion of attaching loop-id metadata and 'srcloc' data to candidate loops in the frontend is also sane. I've seen several older vector compilers do this in order to generate simple compile-time vector reports (loads, stores, ops, idioms, ops-under-mask, etc.). They turn out to be incredibly useful for users. It would be an interesting bit of research to come up with a relatively standard reporting format for scalar, vector [or SIMD] and MIMD constructs [such as OpenMP].

cheers
john

In this context, I only want
to enable all remarks.

I can see that. How to do this is really open for discussion. I could see
us establishing two new flags '-Wall-remarks' for remarks and
'-Wall-warnings' for warnings. Or, what I prefer, we could establish a new
flag hierarchy e.g. with '-Reverything'.

I quite like the notion of a new flag hierarchy. Using -W seems like
semantic overload. -W already implies diagnostics about dubious constructs
in the code. These notices are pure reporting on optimization activities.

5. Enable/Disable remarks with pragmas / per function

Yes, there are a lot of things to do here. The diagnostic infrastructure
already provides great features, e.g.:

  - Enabling/Disabling diagnostics with pragmas and per function
  - Nested diagnostic groups
  - Enabling a diagnostic group, but disabling individual diagnostics
    of this group.
  - Good integration in tools through libclang
  - ...

However, the command line interface might need to be improved before we
start adding a large number of remarks. Some goals I see:

1) We want to reuse as much of the diagnostic infrastructure as possible
2) We do not want to increase the complexity of the warning flags too much
3) We still want to have a similar user interface as the warning flags
provide
4) People raised interest in upgrading remarks to warnings or errors
   (I am personally not too sure about this one, but it already works
    with -Werror=groupname)

Yes to all. Not so sure about #4. Convert the reports into errors? As
in, fail the build if an inline didn't happen?

Several people (including Chris) warned against building up an additional
complex diagnostic system on the side, but as Chandler pointed out, we can
most likely reuse most of the existing infrastructure but could still adjust
the user interface if needed.

Yes, the one additional ability we'll likely want is to direct these
reports to files. Though that could be added later.

I personally believe the best approach is to start small and let the
design be driven by the intended use cases. If you are interested, I would
propose to add a single diagnostic for the loop inliner using just the
existing '-W' infrastructure e.g. with '-Winline-remarkers'.

Agreed. I prefer an evolutionary approach.

This both increases test coverage of the existing features and also
will help to get an idea of the needed features. At the same time, you
could help reviewing my first patch on modularizing the clang diagnostic
infrastructure. Again, something that may help to get an idea of where to
head at this side.

Sure. You mean r202475?

Diego.

This is a great idea, and many people would welcome it. Please write up a
concrete proposal about how this will work, and we can iterate on that.

Yup, working on one. Will send it out in the coming days.

My intent is to introduce an optimization report option to LLVM which
passes will be able to use to indicate the major decisions they make.
Initially, I am tempted to mimic GCC's -fopt-info
(http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#index-fopt-info-747).

I'm not sure if this is the best design or not (having never used it) -
what feedback have you heard from (non-compiler-hacker) people trying to
use it?

Used heavily in our optimization team. It's not altogether different from
what I've seen in other compilers in the past. I don't intend to mimic the
complete UI, I only want to convey the same information.

IMO, the hard part of doing something like this is getting the user
experience right. It does you no good to say "hey I unrolled a loop" if
you don't have enough location information to tell the user *which* loop
got unrolled. The key is to give them actionable information, not just the
output of -debug :)

Absolutely. Every note must always be emitted with location information on
the instruction that generates it. In GCC, the compiler keeps track of
source LOCs at all times. But in Clang, we will want to turn on
-gline-tables-only when reports are requested.

The UI is based exclusively on source LOCs. We want to preserve the exact
same formatting used for warnings/errors so that we can feed these notices
back to editors, IDEs, etc.

Diego.

My suggestion is that we start attaching loop-id metadata to loops in the
frontend, and then also start attaching 'srcloc' metadata, just like we do
for inline asm statements. This way we can pass back the information we
need to the frontend for it to identify the loop without too much trouble.
There may be a better long-term design, but this seems, at least, like an
easy thing to do in the short term.

Why not just use the line table in -gline-tables-only? These reports will
need to latch on to arbitrary instructions, not just loop headers. As more
transformations use the infrastructure, they will want to emit the report
on whatever instruction they triggered on.

Diego.

From: "Diego Novillo" <dnovillo@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chris Lattner" <clattner@apple.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, March 7, 2014 8:07:19 AM
Subject: Re: [LLVMdev] RFC - Adding an optimization report facility?

My suggestion is that we start attaching loop-id metadata to loops in
the frontend, and then also start attaching 'srcloc' metadata, just
like we do for inline asm statements. This way we can pass back the
information we need to the frontend for it to identify the loop
without too much trouble. There may be a better long-term design,
but this seems, at least, like an easy thing to do in the short
term.

Why not just use the line table in -gline-tables-only? These
reports will need to latch on to arbitrary instructions, not just loop
headers. As more transformations use the infrastructure, they will
want to emit the report on whatever instruction they triggered on.

I'd prefer that we not do that; although we can certainly use debugging information to enhance the reporting (to include variable names and the like). I prefer the 'srcloc' on loops solution for two reasons:

1. It does not force users to include debugging symbols just to get optimization reports and, more importantly,

2. Using srcloc is more accurate: If we include only line table information then we miss column information, and so we can't correctly identify a loop with multiple loops per line (and those that arise from macro expansion). This is a real deal-breaker for me.

-Hal

I'd prefer that we not do that; although we can certainly use debugging information to enhance the reporting (to include variable names and the like). I prefer the 'srcloc' on loops solution for two reasons:

1. It does not force users to include debugging symbols just to get optimization reports and, more importantly,

Neither does -gline-tables-only. In fact, we could silently turn on
just the generation of src locs and not emit them to the object.

2. Using srcloc is more accurate: If we include only line table information then we miss column information, and so we can't correctly identify a loop with multiple loops per line (and those that arise from macro expansion). This is a real deal-breaker for me.

We don't need column information. This situation is why I added DWARF
discriminator support recently. It doesn't matter if the whole
program is in one line, we will be able to distinguish the location of
the loops via the loop hierarchy and the discriminator values. This
is similar to how we use discriminators for sample profiling.
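
The disagreement here comes down to what key identifies a loop. The following Python sketch, with made-up loops and made-up column and discriminator values, shows why a line number alone is ambiguous when two loops share a line, and how either extra field (a column, as with 'srcloc', or a discriminator) removes the ambiguity. It models only the lookup problem, not how Clang or LLVM actually assign these values.

```python
from collections import defaultdict

# Two hypothetical loops written on the same source line, for example:
#   for (...) a[i] = 0; for (...) b[j] = 1;
# The column and discriminator values are invented for this sketch.
loops = [
    {"name": "loop A", "line": 7, "col": 1, "discriminator": 0},
    {"name": "loop B", "line": 7, "col": 38, "discriminator": 1},
]


def group_by(loops, key_fields):
    """Group loop names by the tuple of the requested location fields."""
    table = defaultdict(list)
    for loop in loops:
        key = tuple(loop[f] for f in key_fields)
        table[key].append(loop["name"])
    return dict(table)


by_line = group_by(loops, ["line"])
by_column = group_by(loops, ["line", "col"])
by_discriminator = group_by(loops, ["line", "discriminator"])

print("line only:           ", by_line)
print("line + column:       ", by_column)
print("line + discriminator:", by_discriminator)
```

Both richer keys separate the two loops. The difference that remains is presentational: a column can be shown to the user directly, while a discriminator has to be mapped back to something the user can recognize.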

I don't want to focus just on loops. We should be able to do
optimization reports on arbitrary instructions. It's true that we
could limit generation of src locs for major constructs at first, but
I'm not sure it's worth the effort.

Diego.

Yes to all. Not so sure about #4. Convert the reports into errors? As
in, fail the build if an inline didn't happen?

Yes, some people requested/suggested this, as it might make sense for certain remarks and as, for example, the EDG frontend has a generic diagnostic system where all diagnostics can be upgraded/downgraded at will. Though, as said above, I am not fully convinced yet. I would not make it a priority now.

Several people (including Chris) warned against building up an additional
complex diagnostic system on the side, but as Chandler pointed out, we can
most likely reuse most of the existing infrastructure but could still adjust
the user interface if needed.

Yes, the one additional ability we'll likely want is to direct these
reports to files. Though that could be added later.

I personally believe the best approach is to start small and let the
design be driven by the intended use cases. If you are interested, I would
propose to add a single diagnostic for the loop inliner using just the
existing '-W' infrastructure e.g. with '-Winline-remarkers'.

Agreed. I prefer an evolutionary approach.

Cool. If you need anyone to review patches, I am very interested to do so.

This both increases test coverage of the existing features and also
will help to get an idea of the needed features. At the same time, you
could help reviewing my first patch on modularizing the clang diagnostic
infrastructure. Again, something that may help to get an idea of where to
head at this side.

Sure. You mean r202475?

My patch is currently unreviewed on clang-commits. It's an email with the title '[PATCH] tblgen: Modularize the diagnostic emitter'.

Cheers,
Tobias