compiler-rt RFE to support creating profraw for each Shared Library

Greetings!

Please review our compiler-rt RFE patch here: https://reviews.llvm.org/D110232

The patch creates a separate .profraw file for each shared library.

<patch_info>

Hello Vedant,

> Could you share some of the use cases for this feature?
> What were the pros/cons of any alternatives you considered (e.g. the %m/%c modes)?

We collect coverage for shared libraries (i.e. Linux .so files) and dump it into files named after each .so.

With the proposed patch, coverage for the sample code is written as shown below (please refer to compiler-rt/test/profile/Linux/instrprof-shared-nProfraws.test):

libhi.so → will dump into libhi.so.profraw

libhello.so → will dump into libhello.so.profraw

main.out → will dump into main.out.profraw
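The naming scheme listed above can be sketched in a few lines of shell. This is only an illustration of the resulting file names, not the runtime's implementation; the helper name `profraw_name` is hypothetical, and it assumes the proposed "%n" simply yields the basename of the binary or shared library.

```shell
#!/bin/sh
# Hypothetical helper: given the path of a binary or shared library, print
# the .profraw name the proposed "%n" scheme is expected to produce.
# Assumption: "%n" corresponds to the basename of the module.
profraw_name() {
  printf '%s.profraw\n' "$(basename "$1")"
}

# Show the one-to-one mapping for the sample modules from the test.
for module in ./libhi.so ./libhello.so ./main.out; do
  printf '%s -> %s\n' "$module" "$(profraw_name "$module")"
done
```

Because the mapping is a pure function of the module name, the reverse lookup (from a .profraw back to its module) only requires stripping the `.profraw` suffix.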

Regarding %m and %c (from https://clang.llvm.org/docs/SourceBasedCodeCoverage.html): %m expands to the instrumented binary’s signature (and “%Nm” dumps into a pool of N profraws), while %c seems very different from the proposed “%n”.

With the proposed patch (i.e. using “%n”), one gets a one-to-one mapping between each shared library and its generated .profraw.

This makes it easy to backtrack and to instrument the final binary based on feedback.

Please let me know if you have further questions or would like any changes to the proposed patch.

Thank you.

-Hiral

Hi Oza,

The mechanics of the patch are clear. What's not clear to me is why dso-specific .profraw files are helpful for code coverage, since merged .profraw's should work just as well.

Have you encountered issues using merged .profraws? Could you clarify what's meant by backtracking and instrumenting [the] final binary?

thanks,
vedant

Hi Oza,

Apologies, I meant to write Hiral here -- sorry for getting this wrong.

Hello Vedant,

> What’s not clear to me is why dso-specific .profraw files are helpful for code coverage, since merged .profraw’s should work just as well.
> Have you encountered issues using merged .profraws? Could you clarify what’s meant by backtracking and instrumenting [the] final binary?

There were problems when decoding merged profraw files. I can’t recall the exact error messages, but symbols in the profraws could not be matched at all to their definitions in the .so files.

We also tried %m, which worked, except that we had no way to know which .so file corresponded to which .profraw file when decoding them.

Hence we added %n.

Thank you.

-Hiral

> Hello Vedant,
>
> > What's not clear to me is why dso-specific .profraw files are helpful for code coverage, since merged .profraw's should work just as well.
> > Have you encountered issues using merged .profraws? Could you clarify what's meant by backtracking and instrumenting [the] final binary?
>
> There were problems when decoding merged profraw files. I can’t recall the exact error messages, but symbols in the profraws could not be matched at all to their definitions in the .so files.

I think it'd be instructive to dig into these problems a bit more. How were .profraw contents matched to symbols in a .so? Can you share a minimal test case with (say) two .so's that illustrates the issue with merged .profraws?

To add some context for the line of inquiry: llvm's infrastructure has supported collecting profile data from processes with multiple instrumented DSOs for quite a while. If the existing flow to emit & use merged .profraws for coverage reporting has stopped working, or is buggy, that would indicate a serious regression which we should fix.

> We also tried %m, which worked, except that we had no way to know which .so file corresponded to which .profraw file when decoding them.

Why is it necessary to know the precise mapping of .profraw files to DSOs? Typically, .profraw files are merged together into an indexed .profdata (via `llvm-profdata merge ...`): that in turn supports coverage analysis for all of the DSOs that contributed profile data.
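For reference, the merged flow described above might look roughly like the sketch below. The function only prints the commands rather than running them; the function name and the file names (`merged.profdata`, `main.out`, `libhi.so`, `libhello.so`) are assumptions for illustration. It relies on `llvm-profdata merge` and on llvm-cov's `-object` option, which names additional instrumented binaries to include in the report.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for the merged-profile workflow.
# First argument: the main instrumented binary.
# Remaining arguments: additional instrumented DSOs.
merged_workflow_cmds() {
  main_bin=$1
  shift
  # Step 1: merge every raw profile into one indexed profile.
  printf '%s\n' 'llvm-profdata merge -sparse *.profraw -o merged.profdata'
  # Step 2: report coverage for the main binary plus each DSO.
  cmd="llvm-cov report $main_bin"
  for dso in "$@"; do
    cmd="$cmd -object $dso"
  done
  printf '%s\n' "$cmd -instr-profile=merged.profdata"
}

merged_workflow_cmds ./main.out ./libhi.so ./libhello.so
```

In this flow the individual .profraw names never need to be mapped back to a DSO, since the indexed profile covers every module that contributed data.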

> Hence we added %n.

I'd be hesitant towards adding this to the profile runtime as a workaround for a deeper issue.

thanks,
vedant

Thanks for the feedback, Vedant.

I am trying to analyse this using %m (please expect some delay).

Can %m generate profraw files named after the basename of the DSO and/or binary (instead of the random basename)?

Thank you.

Hello Vedant,

Sorry for the long delay!

Large projects usually generate lots of libraries and binaries. In such an environment, you can assume a layout like the following:

bin/{Foo, Bar,…}

lib/{libA.so, libB.so.1,…}

In this environment, running llvm-cov show/export is difficult because llvm-cov requires Filename=“binary/lib-name-for-which-profraw-generated” to be passed.

(Note: a profdata can be generated from each profraw based on its filename; the examples below use profdata.)

For example: $ llvm-cov export <binary-name-for-which-profraw-generated> -instr-profile=one-binary.profdata

The proposed patch https://reviews.llvm.org/D110232 tries to address this by storing profraw files as binary.profraw or libname.profraw, e.g. libA.so.profraw or Foo.profraw.

We can then easily build the llvm-cov command as: $ llvm-cov export Foo -instr-profile=Foo.profdata

Without the patch, we get lots of profraws (9161986135738019531_0.profraw, 3161383135738013531_0.profraw, etc.), and it is difficult to map them back to a binary or library.
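As a rough sketch of the per-binary flow this enables: given a profraw named after its binary, the matching `llvm-profdata merge` and `llvm-cov export` commands can be derived from the file name alone. The helper name `per_binary_cmds` is hypothetical, and it only prints the commands (it does not run them or locate the binary under bin/ or lib/).

```shell
#!/bin/sh
# Dry-run sketch: for a profraw named after its binary or library
# (e.g. libA.so.profraw), print the commands that would index it and
# export coverage for that same binary.
per_binary_cmds() {
  profraw=$1
  # Strip the directory and the .profraw suffix: libA.so.profraw -> libA.so
  name=$(basename "$profraw" .profraw)
  printf 'llvm-profdata merge -sparse %s -o %s.profdata\n' "$profraw" "$name"
  printf 'llvm-cov export %s -instr-profile=%s.profdata\n' "$name" "$name"
}

per_binary_cmds Foo.profraw
per_binary_cmds libA.so.profraw
```

With signature-based names such as 9161986135738019531_0.profraw, the `name` derived this way would not identify any binary, which is the mapping problem described above.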

Thank you.

-Hiral