Track locations of individual loads in -Rpass remarks (Re: [polly] Add diagnostic remark for ReportVariantBasePtr)

Move to cfe-dev@

Some context:

When reporting aliasing or too complicated data access functions we would like to emit remarks that point to the relevant load instructions. At the moment, the column info clang emits is not precise enough such that we only point to the beginning of a line.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --- examples/polli » pclang -S -g -gcolumn-info -emit-llvm -O3 -mllvm
- -polly-detect-track-failures -mllvm -polly-detect-keep-going -mllvm
- -polly -Xclang -Rpass-missed="polly-detect" -c tobi-scop.c
tobi-scop.c:6:8: remark: The following errors keep this region from
being a Scop.
        [-Rpass-missed=polly-detect]
    for (int i = 0; i < 32; i++)
         ^
tobi-scop.c:7:5: remark: The base address of this array is not
invariant inside the loop
        [-Rpass-missed=polly-detect]
      A->b[i] = A->b[i + 1];
      ^
tobi-scop.c:7:5: remark: The base address of this array is not
invariant inside the loop
        [-Rpass-missed=polly-detect]

In this case here I have -Rpass-missed -g -gcolumn-info specified.

I would expect the first remark pinned at line 7 column 5
and the second remark pinned at the read in line 7 column 17.

However, the IR only contains dbg nodes for line 7 column 5 inside the
loop body. Tobi's patch further up in the thread fixes it.

To add relevant information to this thread, Diego committed -gcolumn-info in
r212781, so our caret diagnostic is now at the right place.

clang still does not emit separate debug locations for individual
loads/stores. We could now add this on top of -gcolumn-info.

Right. One of the current limitations is that the backend receives
degraded location information. Clang's tracking of locations is
mapped to dwarf locs, which are not as precise.

One possible future direction would be to transfer Clang's source
location tables to the backend, but that seems pretty convoluted and
it would not survive without the front end. A bitcode file with the
source location annotations would not be able to provide actual
file/line/column info.

I came up with the following patch to emit accurate column-info for each load instruction. The patch seems to work for my limited test cases (the once above), but as I did not have worked in this area of
clang, I would appreciate some cross-checks.

Diego, you just recently run the -gcolumn-info test cases on your internal code base and on the gdb test suite. Would it be difficult for
you to create similar information for the attached patch to get an idea of how much overhead this patch would cause?

Cheers,
Tobias

0001-Emit-column-information-for-loads.patch (914 Bytes)

Sure. If I haven't posted anything by Fri, please ping me.

Diego.

A full comparison build is now running. I should have results soon. In the meantime, I tried it on a target that generates a 30Mb shared object:

Without your patch, I see these segment sizes:

text data bss debug debug.dwo symtab strtab other total filename
8505260 444768 308624 12140854 0 1226952 5631742 1632669 29588116 file.so

With your patch:
text data bss debug debug.dwo symtab strtab other total filename
8505260 444768 308624 12565490 0 1226952 5631742 1632669 30012748 file.so

That’s a 3.5% growth in debug info size, for a total size increase of ~1%.

I’ll send numbers over a full build as soon as I get them.

Diego.

I've got full numbers now. The extreme I chose was an outlier. I've
done full builds over our code base. The increase on average debug
sizes is 0.1%. The total file size increase is ~0%.

I don't think your change will make a substantial difference in debug
info sizes. At least, not in what I've observed.

Diego.

I'd have been surprised if it had, but thanks for running the numbers.

Tobi: Go ahead and commit it if you'd like. Please go ahead and watch
the bots to make sure there aren't any failures (they'll likely be
spurious, but we should watch out for them.

One bit of review on the patch, looks like there are extra braces
since the bit after the conditional is a single line?

-eric

Diego, you just recently run the -gcolumn-info test cases on your internal
code base and on the gdb test suite. Would it be difficult for
you to create similar information for the attached patch to get an idea of
how much overhead this patch would cause?

A full comparison build is now running. I should have results soon. In the
meantime, I tried it on a target that generates a 30Mb shared object:

Without your patch, I see these segment sizes:

      text data bss debug debug.dwo symtab strtab
other total filename
   8505260 444768 308624 12140854 0 1226952 5631742
1632669 29588116 file.so

With your patch:
      text data bss debug debug.dwo symtab strtab
other total filename
   8505260 444768 308624 12565490 0 1226952 5631742
1632669 30012748 file.so

That's a 3.5% growth in debug info size, for a total size increase of ~1%.

I'll send numbers over a full build as soon as I get them.

I've got full numbers now. The extreme I chose was an outlier. I've
done full builds over our code base. The increase on average debug
sizes is 0.1%. The total file size increase is ~0%.

I don't think your change will make a substantial difference in debug
info sizes. At least, not in what I've observed.

I'd have been surprised if it had, but thanks for running the numbers.

Thanks Diego for testing. This is very much appreciated. It is good to run this on a large code basis to gain confidence in this change.

Tobi: Go ahead and commit it if you'd like. Please go ahead and watch
the bots to make sure there aren't any failures (they'll likely be
spurious, but we should watch out for them.

Thanks. As I am not a debug system expert, it is good to confirm this is a reasonable approach.

One bit of review on the patch, looks like there are extra braces
since the bit after the conditional is a single line?

Right, I also added a test case and updated three other test cases
which checked full debug informations and where the column information
of loads now changed.

r214162

Thanks,
Tobias

I was too quick and broke five gdb debug info tests:

http://lab.llvm.org:8011/builders/clang-x86_64-ubuntu-gdb-75/builds/16376

I looked in the consecutive reverse test. The expected execution is:

Sorry, it was:

Breakpoint 3, foo () at ./gdb.reverse/consecutive-reverse.c:27

So it is the right breakpoint, but the address is not shown by gdb.

Cheers,
Tobias