controlling the value of DW_AT_comp_dir

This is a follow-up to the thread that started here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110725/125101.html

Would anyone object to a “-fdwarf-compilation-dir=…” clang flag that would clobber the DW_AT_comp_dir setting?

This is a gcc compatibility issue for us, as we’re relying on our ability to control this part of the debug output by setting $PWD. A flag would also work for me.

Nick

I think a command-line flag is fine, but could you propose the same flag for gcc too?

Thanks,
Rafael

> Would anyone object to a flag “-fdwarf-compilation-dir=…” clang flag that would clobber the DW_AT_comp_dir setting?

My first reaction is to resist such a flag. It is most likely solving the wrong problem. IMO, DW_AT_comp_dir is something no compiler user should need to know exists.

> This is a gcc compatibility issue for us as we’re relying on our ability to control this part of the debug output by setting $PWD. A flag would also work for me.

Can you elaborate? What exactly is missing in clang’s debug output today? How does it differ from gcc’s, and why does it matter?

Gladly! The build doesn’t happen on the machine where the debugging happens, so the path computed for DW_AT_comp_dir is useless. More concretely, clang is producing a DW_AT_comp_dir which looks like this:

<60> DW_AT_comp_dir : /some/path/ac64e1e113026f77bb1834d0e3ad1410/thesource/

To understand why, you first need to know that we run builds on hermetic build machines. Everything is uploaded to these machines, including the source to be built, the headers it requires, the system headers, the compiler and linker, etc. It’s a content-addressable storage system, so the files are named by the md5sums of their contents. That’s where the “ac64e1e113026f77bb1834d0e3ad1410” came from.

When we build, we cache the .o produced and use its md5sum to determine whether we need to relink or already have the linked output. Because clang outputs its own md5sum in its .o files (since it outputs its path in DW_AT_comp_dir), a change to clang will always trigger a relink, even if nothing else about the .o file changed. But that’s the smaller issue; it only means that clang will be slower than gcc for our users.

Much worse is that when I try to debug a clang-built binary, gdb will take DW_AT_comp_dir + DW_AT_name and create “/some/path/ac64e1e113026f77bb1834d0e3ad1410/thesource/llvm/lib/VMCore/Function.cpp” which of course doesn’t exist on my system. This has got to be fixed.
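The path gdb constructs can be modeled in a couple of lines. This is a simplified sketch, assuming the usual DWARF rule that a relative DW_AT_name is interpreted relative to DW_AT_comp_dir while an absolute one is used unchanged; `resolve_source_path` is a hypothetical helper, not gdb's code.

```python
import posixpath


def resolve_source_path(comp_dir: str, name: str) -> str:
    """Approximate how a debugger locates a compile unit's primary source file.

    A relative DW_AT_name is joined onto DW_AT_comp_dir; an absolute
    DW_AT_name wins outright (posixpath.join discards comp_dir in that case).
    """
    return posixpath.join(comp_dir, name)


# The comp dir recorded on the hermetic build machine is meaningless elsewhere:
print(resolve_source_path(
    "/some/path/ac64e1e113026f77bb1834d0e3ad1410/thesource/",
    "llvm/lib/VMCore/Function.cpp"))
# -> /some/path/ac64e1e113026f77bb1834d0e3ad1410/thesource/llvm/lib/VMCore/Function.cpp

# An absolute DW_AT_name ignores the comp dir entirely:
print(resolve_source_path("/anything", "/usr/include/stdio.h"))
# -> /usr/include/stdio.h
```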

To make this work with GCC, we run “PWD=/proc/self/cwd gcc …” which causes gcc to put “/proc/self/cwd” in DW_AT_comp_dir. Yep, that means gdb opens “/proc/self/cwd/llvm/lib/VMCore/Function.cpp” which works fine because /proc/self/cwd is a kernel-provided symlink to the process’ current directory, which will be /home/nlewycky/thesource/.
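The /proc/self/cwd trick can be checked with a small Linux-only script: build a scratch source tree, chdir into it, and open a file through /proc/self/cwd. When gdb resolves such a path, /proc/self refers to gdb itself, so gdb has to be started from the source root. `opens_via_proc_self_cwd` is a hypothetical helper written for this illustration; without procfs it simply returns False.

```python
import os
import tempfile


def opens_via_proc_self_cwd(rel_name: str) -> bool:
    """Linux-only: check that /proc/self/cwd/<rel_name> reaches a file that
    exists only under this process's current working directory."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as src_root:
        # Stand-in for a checkout such as /home/nlewycky/thesource/.
        path = os.path.join(src_root, rel_name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("// placeholder source\n")
        os.chdir(src_root)
        try:
            # /proc/self/cwd is a kernel-provided symlink to the cwd of the
            # process doing the lookup, so one fixed string works anywhere.
            return os.path.isfile(os.path.join("/proc/self/cwd", rel_name))
        finally:
            os.chdir(old_cwd)  # restore before the temp dir is deleted


print(opens_via_proc_self_cwd("llvm/lib/VMCore/Function.cpp"))  # True on Linux
```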

My proposal is that we’ll pass “-fdwarf-compilation-dir=/proc/self/cwd” to clang. What won’t work is passing “-fdwarf-compilation-dir=/home/nlewycky/thesource”, because that path is specific to my checkout, so the same sources would produce different .o files and we would get cache misses again.
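To see why a per-user directory defeats the cache, here is a toy model of the md5-keyed object cache described above. It assumes, for illustration only, that an object file is just its code bytes followed by the comp dir string; `toy_object`, `cache_key`, and the /home/alice and /home/bob paths are hypothetical.

```python
import hashlib


def toy_object(source: bytes, comp_dir: str) -> bytes:
    """Hypothetical stand-in for a .o file: code bytes plus the comp dir
    string that a real object carries in its debug info."""
    return source + b"\0DW_AT_comp_dir\0" + comp_dir.encode()


def cache_key(obj: bytes) -> str:
    """Content address of an object file, as in an md5-keyed build cache."""
    return hashlib.md5(obj).hexdigest()


src = b"int f(void) { return 0; }"
shared = cache_key(toy_object(src, "/proc/self/cwd"))
alice = cache_key(toy_object(src, "/home/alice/thesource"))
bob = cache_key(toy_object(src, "/home/bob/thesource"))

print(shared == cache_key(toy_object(src, "/proc/self/cwd")))  # True: one key for every user
print(alice == bob)  # False: a per-user path changes the bytes, so the key changes
```

The design point is that any byte in the object that varies per user or per build machine leaks into the cache key; a fixed string like /proc/self/cwd keeps the object identical across users.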

(For more background on our build system, see http://google-engtools.blogspot.com/2011/09/build-in-cloud-distributing-build-steps.html .)

Nick

I’m not really sure why our build system is relevant here. This has been a problem for me many times using very mundane and ordinary build systems. If I build on machine X and then copy the binary to machine Y, it can’t find the source code if it is stored in a different directory, even if the directory structure is entirely compatible.

Why can’t we follow GCC’s lead here, and use PWD (when on a system with such a concept) as the basis for DW_AT_comp_dir? I think what I’m missing is why doing that causes problems…

Chris’s objections (which seem reasonable) are to always using PWD. To be clear, I’m not suggesting that. I’m suggesting that the Clang driver, which is already quite aware of the user’s shell, can inspect PWD and getcwd and consult any other oracle needed to determine a valid working directory, and then pass it via an internal-only flag to the CC1 layer IFF it differs from getcwd.
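The driver-side approach proposed here can be sketched as follows. It is a minimal model under stated assumptions, not the actual clang patch: $PWD is trusted only when it is absolute and names the same directory as "." (a check modeled on what GCC's libiberty `getpwd()` does), and the internal -cc1 flag name is hypothetical, borrowed from the name proposed earlier in this thread.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical internal -cc1 spelling, borrowed from the name proposed in
# this thread; the real option name is whatever the patch settles on.
COMP_DIR_FLAG = "-fdwarf-compilation-dir"


def pick_compilation_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Prefer $PWD when it is absolute and names the same directory as '.',
    otherwise fall back to getcwd()."""
    env = os.environ if environ is None else environ
    pwd = env.get("PWD")
    if pwd and os.path.isabs(pwd):
        try:
            # samefile compares device and inode, so a stale $PWD is rejected.
            if os.path.samefile(pwd, "."):
                return pwd
        except OSError:
            pass  # $PWD names a missing or unreadable path
    return os.getcwd()


def cc1_comp_dir_args(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Pass the directory down only if it differs from getcwd(), so the
    common case adds nothing to the -cc1 command line."""
    chosen = pick_compilation_dir(environ)
    return [COMP_DIR_FLAG, chosen] if chosen != os.getcwd() else []
```

On Linux, `cc1_comp_dir_args({"PWD": "/proc/self/cwd"})` yields the flag followed by /proc/self/cwd, because that symlink resolves to the current directory; a relative, stale, or missing $PWD yields an empty list, and the frontend keeps using getcwd(). On a system without $PWD, such as Windows, the same fallback applies.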

>> To understand why, you first need to know that we run builds on hermetic build machines.

> I’m not really sure why our build system is relevant here.

Only because there are ways to fix that problem which would still break caching in our build system. I wanted to steer us away from that.

> This has been a problem for me many times using very mundane and ordinary build systems. If I build on machine X and then copy the binary to machine Y, it can’t find the source code if it is stored in a different directory, even if the directory structure is entirely compatible.

> Why can’t we follow GCC’s lead here, and use PWD (when on a system with such a concept) as the basis for DW_AT_comp_dir? I think what I’m missing is why doing that causes problems…

Works for me. I just want agreement for what to do (flag, using PWD, whatever). I honestly don’t care how it works as long as it works. I can propose a patch using PWD if you want, the plumbing through -cc1 will be the same either way.

Nick

I have a small preference for the original proposal (a command-line flag), but mostly because I have been bitten by environment variables too many times in the past.

Cheers,
Rafael

[+cfe-commits now that this has a patch.]

Ok, here’s a patch that passes PWD through from the driver into the .bc, and it comes out in the .o files. Yay!

Please review!

Nick

dwarf-comp-dir-1.patch (3.37 KB)

What about the following approach…

Index: Support/Unix/Path.inc

> What about the following approach…

I don’t like it because we’ve been nearly successful avoiding things which would diverge a Google-build of clang from an open-source build of clang. Is there no way we can get this functionality without an ifdef? Is there a reason you don’t like adding a flag?

Nick

Also, as Chris has already pointed out, PWD inspection doesn’t belong in Support/Path… It’s the driver that interfaces with the user’s shell.

Essentially, what you’re looking to say is, “on this platform, use PWD”. Have you considered the alternative of using a configure check to enable use of PWD?

I want you to exhaust all alternatives before deciding to add a command-line flag. Adding a command-line flag is usually the easy way out, but removing a command-line flag is almost impossible.

If you add a compiler option, would it be confusing or useful to people on other platforms?

If you still think a command-line flag is the way to go, then pick a name that is obvious to someone who does not know what DWARF is. Something like -fcurrent-working-dir=… or similar.

But hold on, the latest proposal has no such flag. We have an internal flag for the CC1 layer merely to factor the logic that deals with platforms and shells and other such oddities into the driver. Users will never see or use this flag.

Also, PWD is a useful thing to base the working directory on for many platforms: Mac, Linux, BSD, etc. I think Nick’s latest patch is very clean and minimal. The Frontend has a clear, narrow, and explicit interface with no system knowledge (it uses a flag, but an internal one). The Driver automatically sets that flag appropriately based on the user’s system, with no flags or other changes necessary.

As far as I’m concerned this probably isn’t too bad. Let’s give Daniel and Devang a chance to object horribly though.

Thanks!

-eric

Ok, fine with me!

It would be best if this were cross-platform.

On POSIX systems, the PWD environment variable is set from the current
working directory of the shell.

The current working directory is also a POSIX concept. Do Win32 and
Win64 have an equivalent?

Windows does have environment variables but I don't think it has
anything like $PWD (or, for Windows, %PWD%).

But do Windows programs have a current working directory that is part
of the process state?

If so, Clang's Windows driver could look up the Win32 or Win64 current
working directory and pass it down the line, or perhaps create a %PWD%
environment variable.

What other operating systems do we support that are neither Windows nor POSIX?

> The current working directory is also a POSIX concept. Do Win32 and
> Win64 have an equivalent?

Yes, I know of two: %CD% and the much better GetCurrentDirectoryW, followed by conversion to UTF-8.

> Windows does have environment variables but I don’t think it has
> anything like $PWD (or, for Windows, %PWD%).

I think %CD% is quite similar to $PWD.

Ruben