[RFC] Path mappings for reproducible builds and debugging hacks

Hello all,
I've been discussing this topic on #llvm with some of the regulars, but
it merits a wider audience. As some of you might know, NetBSD
allows doing a full release build with GCC in a reproducible way,
even when the source location varies. This is currently not
possible with clang and I want to fix that. I have identified four
primary points where absolute path names leak into the output:

(1) .file
(2) DWARF
(3) __FILE__
(4) __PRETTY_FUNCTION__ for lambdas etc

We have -fdebug-prefix-map [-fdpm from here] for (2), with some
limitations. I created -iremap for GCC for (3) years ago; the patch
is still in review limbo. We don't have anything for (1) and (4) right
now in clangland. I've started to write patches for those, but this is a
bit messy as it tends to duplicate code. This made me want to step back
and review whether we need/want many different switches in the first place.
I couldn't come up with a very good reason, but it has been mentioned
that Facebook is using -fdpm for unspeakable hacks to reserve space in the
binaries for patching in real paths later. That seems abusive to me,
even if I can somewhat understand the motivation.

My proposal going forward is:
(1) Tighten the definition of -fdpm to mean prefix paths, i.e. the next
character must be a path separator:
  -fdebug-prefix-map=/foo=/bar should not change /foobar into /barbar
(2) Introduce a new option -frewrite-path=src=dst and make -fdpm an
alias of it. The translation applies to all four points above.
(3) Introduce a new -gdwarf-path-padding=$n option to prefix all path
names encoded in DWARF with $n path separators.
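
To make the intended matching rule in (1) and (2) concrete, here is a
minimal sketch in Python. It is not clang's implementation; the function
name rewrite_path and the (src, dst) tuple format are hypothetical and
only illustrate the "next character must be a path separator" rule.

```python
def rewrite_path(path, mappings):
    """Rewrite `path` using the first mapping whose source is a
    whole-component prefix of it. `mappings` is a list of (src, dst)
    pairs (hypothetical format, for illustration only)."""
    for src, dst in mappings:
        src = src.rstrip("/")  # treat "/foo/" and "/foo" alike
        if path == src:
            return dst
        # Match only if the character right after the prefix is a
        # path separator, so "/foo" never matches "/foobar".
        if path.startswith(src) and path[len(src):len(src) + 1] == "/":
            return dst + path[len(src):]
    return path


mappings = [("/foo", "/bar")]
print(rewrite_path("/foo/a.c", mappings))     # rewritten
print(rewrite_path("/foobar/a.c", mappings))  # left unchanged
```

With one such mapping per checkout, two builds from different source
locations end up recording the same canonical path, which is the
property needed for reproducible output.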

The goal going forward is that path references in output files generated by
clang (except dependency files) should be easily adjustable to a canonical
location, independent of where the sources are located.

Joerg

> My proposal going forward is:
> (1) Tighten the definition of -fdpm to mean prefix paths, i.e. the next
> character must be a path separator:
>   -fdebug-prefix-map=/foo=/bar should not change /foobar into /barbar
> (2) Introduce a new option -frewrite-path=src=dst and make -fdpm an
> alias of it. The translation applies to all four points above.

These two SGTM. I also don't see why you'd want different path translations
for the different kinds of data. Controlling it all with one flag seems
like a very reasonable proposal. Suggestion: "-frewrite-source-path" may be
a clearer name for the flag.

I'd also like to point out the (proposed, but not accepted) patch to GCC
here to do something similar:
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html
Note that the discussion does continue in some of the later months too. If
we decide to go for this, it would be nice to respond to the gcc thread
saying what we're planning to do, in case they'd like to be compatible with
it.

> (3) Introduce a new -gdwarf-path-padding=$n option to prefix all path
> names encoded in DWARF with $n path separators.

This part sounds pretty horrible. =)

Even after the discussion on #llvm earlier, I still don't really understand
why Facebook has a need to mangle DWARF paths in place, instead of using a
debugger source mapping feature.
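
For readers who have not seen the padding idea before, here is a rough
Python sketch of what prefixing with $n separators and a later in-place
patch might look like. Both function names are hypothetical, and the
patch step is only my assumption about how the reserved space would be
used; it is not taken from any actual tool.

```python
def pad_path(path, n):
    """Prefix `path` with n extra path separators, as the proposed
    -gdwarf-path-padding=$n option describes."""
    return "/" * n + path


def patch_in_place(padded, real):
    """Hypothetical post-build step: replace a padded path with a real
    path while keeping the string length unchanged."""
    if len(real) > len(padded):
        raise ValueError("real path does not fit in the reserved space")
    # Fill the unused space with separators so the length is preserved.
    return "/" * (len(padded) - len(real)) + real


padded = pad_path("/src/a.c", 8)
patched = patch_in_place(padded, "/data/src/a.c")
print(len(padded) == len(patched))  # the record keeps its size
```

The point of keeping the length constant is that nothing else in the
file has to move, which is what makes patching the binary directly
feasible at all.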

I haven’t looked deeply into this issue, but I thought -no-canonical-prefixes was the flag for this use case. Can you elaborate on what makes it unsuitable?