RFC - Adding an optimization report facility?

From: "Diego Novillo" <dnovillo@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chris Lattner" <clattner@apple.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, March 7, 2014 9:25:12 AM
Subject: Re: [LLVMdev] RFC - Adding an optimization report facility?

>
>
> I'd prefer that we not do that; although we can certainly use
> debugging information to enhance the reporting (to include
> variable names and the like). I prefer the 'srcloc' on loops
> solution for two reasons:
>
> 1. It does not force users to include debugging symbols just to
> get optimization reports and, more importantly,

Neither does -gline-tables-only. In fact, we could silently turn on
just the generation of src locs and not emit them to the object.

>
> 2. Using srcloc is more accurate: If we include only line table
> information then we miss column information, and so we can't
> correctly identify a loop with multiple loops per line (and those
> that arise from macro expansion). This is a real deal-breaker for
> me.

We don't need column information. This situation is why I added
DWARF discriminator support recently. It doesn't matter if the whole
program is in one line; we will be able to distinguish the location
of the loops via the loop hierarchy and the discriminator values. This
is similar to how we use discriminators for sample profiling.
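
As a rough illustration of the point above (not taken from the thread or from LLVM itself), the sketch below models two loops that share one source line, for example after a macro expansion. The file name, line number, discriminator values, and the `group_remarks` helper are all hypothetical; the only claim is that a line-only key merges the two loops while a (line, discriminator) key keeps them apart.

```python
from collections import defaultdict

# Hypothetical remarks for two loops that share source line 12
# (for example, both produced by one macro expansion).
# The discriminator values are made up for illustration.
remarks = [
    {"file": "kernel.c", "line": 12, "discriminator": 1, "msg": "loop vectorized"},
    {"file": "kernel.c", "line": 12, "discriminator": 2, "msg": "loop not vectorized"},
]


def group_remarks(remarks, use_discriminator):
    """Group remark messages by location key."""
    groups = defaultdict(list)
    for r in remarks:
        key = (r["file"], r["line"])
        if use_discriminator:
            # Extend the key so loops on the same line stay distinct.
            key += (r["discriminator"],)
        groups[key].append(r["msg"])
    return dict(groups)


# Line-only keys collapse both loops into one entry.
by_line = group_remarks(remarks, use_discriminator=False)
# Adding the discriminator yields one entry per loop.
by_disc = group_remarks(remarks, use_discriminator=True)
```

With line-only keys a report cannot say which of the two loops was vectorized; with the discriminator in the key, each message stays attached to its own loop.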

Ah, neat! :-)

I don't want to focus just on loops. We should be able to do
optimization reports on arbitrary instructions. It's true that we
could limit generation of src locs for major constructs at first, but
I'm not sure it's worth the effort.

I agree that we don't want to focus just on loops, but loops and functions are obviously a major use case, and tagging them is feasible. If using discriminators can do the same and more, then I'm fine with that too!

Thanks again,
Hal

This is an interesting discussion to have. I currently use line-table debug info and this works very well, with the only exception that line-table information currently needs to be enabled manually. As this could probably be enabled automatically for users, this does not seem like an unsolvable problem.

Using srcloc sounds like an interesting alternative, but I have some questions.

1) How does this work with LTO?

Do srclocs actually store information about the source file? Or just the offset?

2) Why are srclocs better when macro-expanding?

I really have no idea. A brief explanation may be sufficient.

3) What about implicit loops, e.g. ->begin, ->end?

Can we even emit srclocs for implicit loops, e.g. those formed by C++ iterators or range loops?

4) Does the preserving behavior of srclocs differ from debug info?

Will they also be kept on 'best effort'? Do we need all passes (e.g. loop rotate) to preserve srclocs similar to debug info?

Thanks,
Tobias

Yes. I don't think enabling -gline-tables-only would be a problem when
-R is used. It gives us precisely the information we need.
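
To make the idea concrete, here is a minimal sketch of the option handling being discussed, assuming a hypothetical driver function `resolve_line_info`. It is not Clang's actual driver logic: treating any argument that starts with `-R` as a report request is a stand-in for the proposed flags, and the returned pair (generate locations, emit them to the object file) is an illustrative simplification of the suggestion made earlier in the thread.

```python
def resolve_line_info(args):
    """Return (generate_locations, emit_to_object) for a hypothetical driver.

    Assumptions for illustration only:
    - "-g" and "-gline-tables-only" are explicit user requests, so the
      location information is both generated and written to the object.
    - Any argument starting with "-R" stands in for the proposed report
      flags; it silently turns on location generation without emitting it.
    """
    explicit_debug = any(a in ("-g", "-gline-tables-only") for a in args)
    wants_report = any(a.startswith("-R") for a in args)
    if explicit_debug:
        return (True, True)
    if wants_report:
        return (True, False)
    return (False, False)
```

Under these assumptions, `resolve_line_info(["-O2", "-R"])` generates locations for the report but keeps them out of the object file, while a plain `-O2` build is unaffected.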

Diego.

If you do that, then -R is going to be substantially different than -W flags, and we should design it as such. This means that none of the warning level control should apply to it, -Weverything (etc) shouldn’t apply to it, and we should not implement these with the diagnostics subsystem.

-Chris

I have written down my thoughts and added some of the feedback from this thread. The proposal is probably still sparse, but I would like to start the discussion so I can shape it into final documentation when we are done painting the bike shed.

https://docs.google.com/document/d/1FYUatSjZZO-zmFBxjOiuOzAy9mhHA8hqdvklZv68WuQ/edit?usp=sharing

Please add your feedback on the document itself (anyone should be able to comment) or to this thread.

Tobias, I will look at your patches this week and start producing my own changes on top of them.

Thanks. Diego.

This feature will be very useful and I suggest taking a bigger perspective.

Not only the major decisions but also simple optimization metrics like #<opt_instances>, #spills, etc. can be useful for performance analysis, comparison and tracking. In addition to metrics, locality is helpful, specifically a report at a function level. A good place to start and test the design could be to issue the current --stats per function. In practice, the users of these reports know the hot functions and like to drill down.

The reports themselves could offer various levels of verbosity for the user to pick (and the implementer to decide which information to report at which level).

For performance tuning a compiler that also reports why it didn’t apply an optimization can be very useful (like in “I didn’t vectorize because of this dependency.” etc.).
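
The per-function, verbosity-controlled report described above could look roughly like the following sketch. Everything here is hypothetical: the `Remark` fields, the two verbosity levels, the function names, and the sample messages are invented for illustration and do not reflect any existing LLVM interface.

```python
from dataclasses import dataclass


@dataclass
class Remark:
    function: str    # function the remark belongs to
    line: int        # source line of the construct
    pass_name: str   # pass that produced the remark
    applied: bool    # True if the optimization was applied, False if missed
    message: str     # human-readable explanation
    level: int       # hypothetical verbosity: 1 = major decision, 2 = detail


def report_for(remarks, function, verbosity):
    """Return formatted report lines for one function up to a verbosity level."""
    lines = []
    for r in remarks:
        if r.function != function or r.level > verbosity:
            continue
        status = "applied" if r.applied else "missed"
        lines.append(f"{r.function}:{r.line}: {r.pass_name} {status}: {r.message}")
    return lines


# Made-up sample data, including a "missed" remark that explains why.
remarks = [
    Remark("hot_kernel", 12, "vectorizer", False, "loop-carried dependency on 'a'", 1),
    Remark("hot_kernel", 30, "inliner", True, "inlined call to 'helper'", 1),
    Remark("hot_kernel", 30, "inliner", True, "cost below threshold", 2),
    Remark("cold_init", 5, "inliner", True, "inlined call to 'setup'", 1),
]

hot_brief = report_for(remarks, "hot_kernel", verbosity=1)
hot_full = report_for(remarks, "hot_kernel", verbosity=2)
```

A user who already knows that `hot_kernel` is hot can ask for only that function, start with the brief level, and raise the verbosity when a missed optimization needs more context.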

-Gerolf

> This feature will be very useful and I suggest taking a bigger perspective.
>
> Not only the major decisions but also simple optimization metrics like
> #<opt_instances>, #spills, etc. can be useful for performance analysis,
> comparison and tracking. In addition to metrics, locality is helpful,
> specifically a report at a function level. A good place to start and test
> the design could be to issue the current --stats per function. In practice,
> the users of these reports know the hot functions and like to drill down.

Right. Once the base reporting harness is in place, adding calls from
the optimizers can be done incrementally. Initially, the major passes
(inliner, vectorizer, scalar loop optimizers, etc). Mapping stats info
into this report may be doable, but it needs to be something
actionable and understandable by end-users. A stats report that refers
to something obscure like 'eliminated N% of phi nodes' does not seem
to be something useful. Stats also tend to be harder to place at a src
loc.

> The reports themselves could offer various levels of verbosity for the user
> to pick (and the implementer to decide which information to report at which
> level).

That's reflected in the document, yes. Adding levels of verbosity
could be doable.

> For performance tuning a compiler that also reports why it didn't apply an
> optimization can be very useful (like in "I didn't vectorize because of this
> dependency." etc.).

Yes. Also reflected in the document.

Diego.