Recent changes in -gmlt break sample profiling

I’m not sure if this was intended, but it’s going to be a problem for sample profiles.

When we compile with -gmlt, the profiler expects to find the line numbers of all the function headers so that it can compute relative line locations for the profile.

The tool that reads the ELF binary is not finding them, so it writes out absolute line numbers, which are impossible to match during the profile-use phase.
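
To make the "relative line locations" concrete, this is roughly the computation the converter wants to do (a minimal sketch with made-up names and data layout, not the tool's actual code): the key stored for each sample is the sampled line minus the line of the function header, and the header line is what would normally come from DW_AT_decl_line on the function's DW_TAG_subprogram.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Sketch: turn absolute sampled lines into function-relative offsets.
// HeaderLine is the function's declaration line (normally DW_AT_decl_line
// on DW_TAG_subprogram). When it is unknown -- the -gmlt case described
// above -- all we can record is the absolute line, which stops matching as
// soon as the source shifts.
std::map<uint32_t, uint64_t>
makeProfileLines(std::optional<uint32_t> HeaderLine,
                 const std::vector<std::pair<uint32_t, uint64_t>> &Samples) {
  std::map<uint32_t, uint64_t> Out; // offset (or absolute line) -> count
  for (const auto &S : Samples) {
    uint32_t Line = S.first;
    uint64_t Count = S.second;
    uint32_t Key = (HeaderLine && Line >= *HeaderLine) ? Line - *HeaderLine
                                                       : Line;
    Out[Key] += Count;
  }
  return Out;
}
```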

The problem seems to be that we are missing DW_TAG_subprogram for all the functions in the file.

Attached are the DWARF dumps of the same program: one compiled with my system's clang 3.4 and the other with today's trunk. In both compiles, I used -gline-tables-only.

The trunk version is missing all the subprogram tags for the functions in the file. This breaks the sample profiler.

Should I file a bug, or is -gmlt going to be like this from now on? The latter would be a problem for us.

Thanks. Diego.

fnptr-clang36.bad.dwarfdump (4.11 KB)

fnptr-clang34.good.dwarfdump (6.28 KB)

Open to negotiation, but this change is intentional (for details, see the
commit: http://llvm.org/viewvc/llvm-project?rev=218129&view=rev).

Even when a subprogram does have inlined subroutines, I've omitted the
line/file as well, to reduce -gmlt size. (Actually, I imagine we could even
remove the top-level subroutines entirely, just emitting the inlined
subroutines at the top level. That would be completely garbage DWARF, but it
would be smaller, and easy enough to teach symbolizers to deal with if they
don't already.)

To be honest, optimized -gmlt is still pretty big, but these changes do
make unoptimized -gmlt essentially no cost over the line table itself,
which is nice. A longer-term approach to solving this is two-level line
tables, but they would probably present a problem for the algorithm you've
described too.

- David

Well, this breaks us. Very hard. It absolutely blocks us from using SamplePGO with LLVM.

The alternative would be to make the compiler use absolute line numbers, but in the experience we’ve collected with GCC, this makes the profiles very brittle wrt source changes.

I don’t have a better idea atm. Would there be any other way to generate relative line numbers? Perhaps we could use the first line number we find with samples? The problem here is that this line will shift, depending on how the profile was obtained.

Would it be possible to add these line numbers in some other way?

Thanks. Diego.

Well, this breaks us. Very hard. It absolutely blocks us from using
SamplePGO with LLVM.

Sorry about that - I knew ASan needed gmlt data of a certain form and
worked carefully to ensure llvm-symbolizer could still symbolize with these
changes, but wasn't aware of this particular dependence from PGO (just
assumed it used the line table directly).

The alternative would be to make the compiler use absolute line numbers,
but in the experience we've collected with GCC, this makes the profiles
very brittle wrt source changes.

It'd be interesting to see the data - I guess you throw out profiles that
don't match on a per-function basis? So adding a few lines to one function
will invalidate the profile for that function, but not all the subsequent
ones in the file?

I don't have a better idea atm. Would there be any other way to generate
relative line numbers?

Possibly.

  Perhaps we could use the first line number we find with samples?

Probably - I guess you use the ELF symbol table to compute the address
range of each function? Given that, you could find the smallest line number
in that address range in the line table, and make everything else relative
to that.
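
Roughly, something like this (an illustrative sketch only; the containers and names are made up, and a real implementation would go through the tool's existing ELF/DWARF readers):

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// From the ELF symbol table: a function's [Start, End) address range.
struct FuncRange { uint64_t Start, End; };
// Flattened DWARF line table: (address, line) rows.
using LineTable = std::vector<std::pair<uint64_t, uint32_t>>;

// Smallest line number whose row falls inside the function's address range;
// use it as the base that everything else is made relative to.
std::optional<uint32_t> baseLineFor(const LineTable &Rows, const FuncRange &F) {
  std::optional<uint32_t> Min;
  for (const auto &Row : Rows)
    if (Row.first >= F.Start && Row.first < F.End && Row.second != 0)
      Min = Min ? std::min(*Min, Row.second) : Row.second;
  return Min;
}

// Rebase an absolute sampled line against the recovered base line.
uint32_t relativeLine(uint32_t SampledLine, uint32_t BaseLine) {
  return SampledLine >= BaseLine ? SampledLine - BaseLine : 0;
}
```

One caveat: with inlining, rows inside the range can belong to inlined code from other files, so a real version would probably also filter rows by the function's file before taking the minimum.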

The problem here is that this line will shift, depending on how the
profile was obtained.

Not sure what you're referring to here ^

It does, but it uses relative line numbers. That’s why it needs the source locs for the function headers.

Right. Dehao started using absolute numbers, but then moved to relative numbers when he saw that the performance drops were fairly pronounced as the profile aged. I don’t recall the exact drops he noticed.

That could probably work, but I’m not sure how the ELF reading code in the conversion tool does this calculation. I’ll check it out.

Say the function starts at line 20. If in one profile run, we get samples at line 20 and line 23, then we’ll compute the relative locations as 0 and 3. But if the first sample you get is at line 21, then you’ll compute the relative locations as 0 and 2.

Using the address ranges in the line table may be the way to go. I’ll look at this next week.
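
Writing the two anchoring choices out side by side (purely illustrative; the numbers in the comments are the ones from the example above):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Offsets anchored at the smallest *sampled* line (Lines assumed non-empty):
// run A {20, 23} -> {0, 3}, run B {21, 23} -> {0, 2}; line 23 moves.
std::vector<uint32_t> offsetsFromFirstSample(std::vector<uint32_t> Lines) {
  uint32_t Base = *std::min_element(Lines.begin(), Lines.end());
  for (uint32_t &L : Lines)
    L -= Base;
  return Lines;
}

// Offsets anchored at the function's real starting line (recovered from the
// line table over its address range): run A -> {0, 3}, run B -> {1, 3};
// line 23 keeps the same offset in every run, so profiles stay comparable.
std::vector<uint32_t> offsetsFromFunctionStart(uint32_t StartLine,
                                               std::vector<uint32_t> Lines) {
  for (uint32_t &L : Lines)
    L -= StartLine;
  return Lines;
}
```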

Diego.

Diego,

I think sampleFDO needs to be designed in a way that can protect itself from future breakage like this. The root cause is the unnecessary dependency of sample FDO on the -gmlt setting. It is totally reasonable to tune debug binary size with changes like this.

The right way to fix this is to introduce an internal -g<…> flag for use by sampleFDO; it would have a fixed definition of what needs to be emitted.

David

This sounds like a problem best solved by tracking source code movement via your source control system.

If you know the commit of the code that produced the sample, you should be able to use source control history / diffs to translate absolute line numbers to the location where the source has moved.

This would have the added advantage of highlighting where those samples are likely to be useless or completely misleading.
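
As a very rough illustration of that translation (a sketch only, assuming the simplest possible input: the hunk headers of a unified diff between the two revisions; everything here is hypothetical, and lines that land inside a changed hunk are simply given up on):

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// One hunk header from a unified diff: "@@ -oldStart,oldCount +newStart,newCount @@".
struct Hunk { unsigned OldStart, OldCount, NewStart, NewCount; };

std::vector<Hunk> parseHunkHeaders(const std::string &Diff) {
  // A missing count defaults to 1, e.g. "@@ -12 +14 @@".
  static const std::regex Re(R"(@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@)");
  std::vector<Hunk> Hunks;
  for (std::sregex_iterator I(Diff.begin(), Diff.end(), Re), E; I != E; ++I) {
    const std::smatch &M = *I;
    Hunks.push_back({(unsigned)std::stoul(M[1].str()),
                     M[2].matched ? (unsigned)std::stoul(M[2].str()) : 1u,
                     (unsigned)std::stoul(M[3].str()),
                     M[4].matched ? (unsigned)std::stoul(M[4].str()) : 1u});
  }
  return Hunks;
}

// Map a line number in the old revision to the new revision. Lines inside a
// changed hunk are ambiguous, so we return nothing for those; lines outside
// hunks just shift by the lines added/removed before them.
std::optional<unsigned> remapLine(unsigned OldLine, const std::vector<Hunk> &Hunks) {
  int Delta = 0;
  for (const Hunk &H : Hunks) {
    // A zero count means "insertion after this line", so the affected
    // region effectively starts one line later.
    unsigned OldBegin = H.OldCount ? H.OldStart : H.OldStart + 1;
    unsigned NewBegin = H.NewCount ? H.NewStart : H.NewStart + 1;
    if (OldLine < OldBegin)
      break;                                  // before this hunk: done
    if (OldLine < OldBegin + H.OldCount)
      return std::nullopt;                    // inside a changed region
    Delta = (int)(NewBegin + H.NewCount) - (int)(OldBegin + H.OldCount);
  }
  return (unsigned)((int)OldLine + Delta);
}
```

Anything fancier (moves across files, renames) would need the source control system's own history, as suggested above.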

Diego,

I think sampleFDO needs to be designed in a way that can protect itself
from future breakage like this. The root cause is the unnecessary dependency
of sample FDO on the -gmlt setting. It is totally reasonable to tune debug
binary size with changes like this.

FWIW, it's not 100% clear to me that this is the right change. Might be
worth chatting about on Monday.

The right way to fix this is to introduce an internal -g<...> flag for
use by sampleFDO; it would have a fixed definition of what needs to be
emitted.

This doesn't seem worth having an internal flag for. Google is probably the
primary consumer of Clang's -gline-tables-only data. If it doesn't meet our
needs, we can lobby to make it behave the way we need it to.

This sounds like a problem best solved by tracking source code movement via
your source control system.
If you know the commit of the code that produced the sample, you should be
able to use source control history / diffs to translate absolute line
numbers to the location where the source has moved.
This would have the added advantage of highlighting where those samples are
likely to be useless or completely misleading.

This particular scheme might not work, but I was thinking of something
similar last week.

Basically you'd use the same algorithms that the source control system
uses when merging or doing whitespace-only diffs, etc., to construct a
better correspondence between source lines. The problem then comes
down to "we added some hot code that doesn't have any samples", at
which point you'd want the optimizer to pay attention to propagating
weights along edges without them.

Anyhow, some food for thought.

-eric

What you propose here works fine if the behavior of -gmlt satisfies the
needs of the optimizer and is well documented with good regression tests. :)

David

I'm nearly certain this is the way to go here.

-eric

I'm nearly certain this is the way to go here.

How does it solve the problem?

David

FWIW, I strongly disagree.

The more modes we have, the harder everything will be to support and keep
track of. The wide variety of modes used for debug information is already
really challenging to support and maintain. We shouldn't make it harder,
and adding less-used flags seems like the wrong direction.

At a higher level, I don't think "sample profiling" or "asan" are good ways
to design a set of debug information that we want emitted in a particular
mode. Instead, we should look at what fundamental information a collection
of tools needs access to. Sample profiling, the sanitizers, and crash
backtraces all need access to a very minimal amount of information
consisting of line tables, and that is how we designed '-gline-tables-only'
(the LLVM flag name).

And I think that both this design and Dave's change are totally fine. There
was only one thing we missed: a very particular use case for line tables
that had a particular usage pattern. The problem here is that we don't have
any profile *collection* tests in LLVM. There are some reasons for that
(its hard, etc) but we could probably work on improving it. But the correct
path is the one Dave and Diego identified -- the information *is* still
there, we just need a more clever way of extracting it. And (in addition)
we should probably add some testing strategy for this. =]

The fundamental questions we need to answer are the following:

  1. What is the -gmlt option designed for? Debugging, profiling, AutoFDO? Do we expect more use cases for -gmlt?
  2. Can gmlt's behavior be standardized? The meaning of 'minimal' really varies depending on the target use of the information. What is minimally enough today may no longer be enough tomorrow if a new target use case is identified.
  3. Do we have regression tests for other well-established use cases, such as ASan?
  4. When we need to add more debug info to -gmlt in the future to enhance one of the existing use cases, will it be considered a memory and object-size regression and get rejected?

For now, if the minimal debug info (after stripping out DW_TAG_subprogram, etc.) already contains enough information for our needs, that will be fine. What if it does not (now, or in the future)?

David

Alternatives like this were discussed during AutoFDO's original design, for instance fetching the source diff and establishing a line mapping between two revisions. Another was to use more fine-grained offsets in the profile for better tolerance of source changes. We ended up with the function-level relative-offset solution, which is the simplest and works fine in practice. (It is not designed to handle large-scale refactoring.)

David

Figured it had been. Might still be interesting, but I can see where the function-level relative offset is a decent compromise.

-eric

You can get the address of the function from the symbol table, get the starting line from the line table using that address, and then use the line table normally for lookups.
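
In other words, something like this, where LookupLine stands in for whatever address-to-line query the conversion tool already has over the DWARF line table (a hypothetical sketch, not an existing API):

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Base the offsets on one extra line-table query: the line at the function's
// start address (which comes from the ELF symbol table). Each sampled
// address is resolved through the same line table and rebased against it.
using LineLookup = std::function<std::optional<uint32_t>(uint64_t Addr)>;

std::optional<uint32_t> sampleOffset(uint64_t FuncStartAddr, uint64_t SampleAddr,
                                     const LineLookup &LookupLine) {
  std::optional<uint32_t> Base = LookupLine(FuncStartAddr);
  std::optional<uint32_t> Line = LookupLine(SampleAddr);
  if (!Base || !Line || *Line < *Base)
    return std::nullopt; // cannot rebase; caller could fall back to absolute lines
  return *Line - *Base;
}
```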

-eric

The fundamental questions we need to answer are the following:

FWIW, there was a longer discussion on the lists when the actual option was
added. I'll try to relay my memory of that discussion, but you might need
to track it down or involve some of the other people who participated in
it. Honestly, this thread is probably not the best place to rehash the
entire design of this feature...

1) What is the -gmlt option designed for? Debugging, profiling, AutoFDO? Do we
expect more use cases for -gmlt?

It was designed for all users of debug information that merely need
location information rather than variable, type, or other debug
information. We expect these to at least include anything and everything
that is primarily symbolizing backtraces and analyzing those. I think
sample-based profiling and PGO are definitely in scope.

2) Can gmlt's behavior be standardized? The meaning of 'minimal' really
varies depending on the target use of the information. What is minimally
enough today may no longer be enough tomorrow if a new target use case is
identified.

Again, the Clang flag is not '-gmlt'. It is '-gline-tables-only'. But I
think both are clear in their intent... Perhaps we could add some
documentation for the flag, but I don't think we can or need to try to
standardize this... It's a pragmatic thing and should continue to be driven
by pragmatic decisions.

3) Do we have regression tests for other well-established use cases, such
as ASan?

Both the ASan and TSan test suites should cover this, yes.

4) When we need to add more debug info to -gmlt in the future to enhance
one of the existing use cases, will it be considered a memory and
object-size regression and get rejected?

This is really open-ended and impossible to answer. It would depend
entirely on the nature of the enhancement and the impact it had on the
debug info size. It would likely be discussed and a decision made on the
lists.

Yep, as long as inline_subroutine is emitted, this should work.

David

It should be, yes. :)

-eric