Clang tooling and absolute filenames

Hi,

I've written a Clang tool that does some analysis of a C program and uses the results to guide instrumentation of the resulting LLVM IR. This tool (and the attendant research) wouldn't have happened without Clang, so thanks to all you maintainers! I have encountered one problem, however, which is trivial to fix but seems to be deeply embedded in someone's philosophy of The Way Things Should Be Done.

My issue is that Clang and libClangTooling disagree about the name of the C file they operate on: Clang is (quite sensibly) happy to compile "foo.c", but when I run my Clang tool, it is converted to "/Users/jon/Documents/.../myproject/foo.c" (on line 242 of lib/Tooling/Tooling.cpp). Since Clang (the binary) disagrees with Clang (the library), I have to resort to a sed script that strips `pwd` out of my Clang tool's output!

I can easily modify Tooling.cpp to use the filename passed to libClangTooling, and I would've submitted a patch with this e-mail, but there are tests and documentation that assume compile_commands.json will always contain absolute filenames. This seems wrong to me: shouldn't the canonical name of a file be whatever was passed to the compiler at the command line? Since tools like clang-check already provide arguments that separate the working directory from filenames, why do we have to refer to absolute filenames in compile_commands.json?

Jon

Hi,

> I've written a Clang tool that does some analysis of a C program and uses
> the results to guide instrumentation of the resulting LLVM IR. This tool
> (and the attendant research) wouldn't have happened without Clang, so
> thanks to all you maintainers! I have encountered one problem, however,
> which is trivial to fix but seems to be deeply embedded in someone's
> philosophy of The Way Things Should Be Done.

That would be me :slight_smile:

> My issue is that Clang and libClangTooling disagree about the name of the C
> file they operate on: Clang is (quite sensibly) happy to compile "foo.c",
> but when I run my Clang tool, it is converted to
> "/Users/jon/Documents/.../myproject/foo.c" (on line 242 of
> lib/Tooling/Tooling.cpp). Since Clang (the binary) disagrees with Clang
> (the library), I have to resort to a sed script that strips `pwd` out of my
> Clang tool's output!

What breaks due to that disagreement?

> I can easily modify Tooling.cpp to use the filename passed to
> libClangTooling, and I would've submitted a patch with this e-mail, but
> there are tests and documentation that assume compile_commands.json will
> always contain absolute filenames. This seems wrong to me: shouldn't the
> canonical name of a file be whatever was passed to the compiler at the
> command line? Since tools like clang-check already provide arguments that
> separate the working directory from filenames, why do we have to refer to
> absolute filenames in compile_commands.json?

The problem with relative paths is when you get a path handed in, and
you're trying to figure out which command line applies to it.

Cheers,
/Manuel

> That would be me :slight_smile:

Nice to meet you. :slight_smile:

> What breaks due to that disagreement?

I have a string constant in foo.ll that is generated from __FILE__ ("foo.c"). This string constant is used during instrumentation to look up information in my analysis results file, which contains information from all translation units. The trouble is, the analysis doesn't know about "foo.c", even though I ran it with "analyse foo.c"; it thinks the file is called "/Users/jon/[…]/foo.c".

I currently get around this by running my analysis results through sed to remove the "/Users/jon/[…]/" part, but that's pretty hacky (and it won't work if I switch my protobuf output from text representation to binary). Another approach would be to modify my makefile to pass absolute filenames to Clang (in which case its __FILE__ would match what's in the analysis file), but I might like to distribute the analysis file along with the source files it describes, in which case I really, really don't want to be stuck with absolute filenames.
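As a minimal sketch of the normalisation the sed script performs, the same rewrite can be applied to the file-name keys themselves rather than to the serialised text, which would keep working if the output switched to a binary encoding. Everything here is an assumption for illustration: the helper names, the `/home/user/myproject` root, and the idea that the results can be viewed as a mapping from file name to data. It also assumes POSIX-style paths and a compiler invoked from the project root with plain relative names, so that `__FILE__` is "foo.c".

```python
import posixpath


def to_project_relative(filename, project_root):
    """Return filename relative to project_root, or unchanged if it lies outside.

    project_root is assumed to be an absolute path.
    """
    if not posixpath.isabs(filename):
        # Already relative: only normalise "./" and "a/../" segments.
        return posixpath.normpath(filename)
    rel = posixpath.relpath(filename, project_root)
    # Leave files outside the project (e.g. system headers) absolute.
    if rel == ".." or rel.startswith("../"):
        return posixpath.normpath(filename)
    return rel


def relativise_keys(results, project_root):
    """Rewrite a {filename: data} mapping so its keys are project-relative."""
    return {to_project_relative(name, project_root): data
            for name, data in results.items()}


# Hypothetical analysis results keyed by the absolute name the tool reports.
results = {"/home/user/myproject/foo.c": {"functions": ["main"]}}
portable = relativise_keys(results, "/home/user/myproject")
```

This only papers over the mismatch after the fact; it does not change which name the tool sees in the first place.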

> The problem with relative paths is when you get a path handed in, and you're trying to figure out which command line applies to it.

Sure, it's the same issue that I have, but the relative-absolute problem goes the other way. The problem with the current approach is that the desire to use absolute filenames in one case *precludes* the ability to use relative filenames in the other case. If Clang's Tooling framework defaulted to using "whatever string is passed on the command line" as the filename, however, projects that require absolute filenames could use them but other projects (like mine) that prefer relative names can use those.
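To illustrate how relative and absolute names could coexist, here is a rough sketch of the "which command line applies to this path" lookup. It relies on each compile_commands.json entry carrying a `directory` field, against which a relative `file` can be resolved before comparison. The function names and the example database are hypothetical, POSIX-style paths are assumed, and this is not a description of how libClangTooling performs the lookup.

```python
import posixpath


def resolve(directory, filename):
    """Resolve filename against directory; absolute names pass through."""
    return posixpath.normpath(posixpath.join(directory, filename))


def find_commands(database, path, cwd):
    """Return every entry in database whose file refers to the same path."""
    wanted = resolve(cwd, path)
    return [entry for entry in database
            if resolve(entry["directory"], entry["file"]) == wanted]


# Hypothetical database mixing a relative and an absolute "file" entry.
database = [
    {"directory": "/home/user/myproject",
     "command": "clang -c foo.c",
     "file": "foo.c"},
    {"directory": "/home/user/myproject",
     "command": "clang -c /home/user/myproject/bar.c",
     "file": "/home/user/myproject/bar.c"},
]

# The same entry is found whether the caller hands in a relative or an
# absolute path, because both sides are resolved before they are compared.
by_relative = find_commands(database, "foo.c", "/home/user/myproject")
by_absolute = find_commands(database, "/home/user/myproject/foo.c", "/tmp")
```

With this kind of comparison the name stored in the database (and hence the name handed to the compiler) can stay exactly as it was written on the command line.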

Jon

> > That would be me :slight_smile:
>
> Nice to meet you. :slight_smile:
>
> > What breaks due to that disagreement?
>
> I have a string constant in foo.ll that is generated from __FILE__
> ("foo.c"). This string constant is used during instrumentation to look up
> information in my analysis results file, which contains information from
> all translation units. The trouble is, the analysis doesn't know about
> "foo.c", even though I ran it with "analyse foo.c"; it thinks the file is
> called "/Users/jon/[…]/foo.c".
>
> I currently get around this by running my analysis results through sed to
> remove the "/Users/jon/[…]/" part, but that's pretty hacky (and it won't
> work if I switch my protobuf output from text representation to binary).
> Another approach would be to modify my makefile to pass absolute filenames
> to Clang (in which case its __FILE__ would match what's in the analysis
> file), but I might like to distribute the analysis file along with the
> source files it describes, in which case I really, really don't want to be
> stuck with absolute filenames.

I still don't understand how the compilation and the analysis are related,
*but*...

> > The problem with relative paths is when you get a path handed in, and
> > you're trying to figure out which command line applies to it.
>
> Sure, it's the same issue that I have, but the relative-absolute problem
> goes the other way. The problem with the current approach is that the
> desire to use absolute filenames in one case *precludes* the ability to use
> relative filenames in the other case. If Clang's Tooling framework
> defaulted to using "whatever string is passed on the command line" as the
> filename, however, projects that require absolute filenames could use them
> but other projects (like mine) that prefer relative names can use those.

Perhaps I don't need to - as far as I understand what you want to change is
basically the filename that gets passed to clang as the "main source file
name". That sounds like it makes sense. If you whip up a patch and send it
to me (preferred on llvm-reviews.chandlerc.com) I think that makes sense...

Or correct me if I seem to have misunderstood your goal...

Cheers,
/Manuel

Thanks! I’ve sent two separate patches to you on Phabricator: the first patch lays some “don’t-fail-silently” groundwork (which should be useful even without the second patch) and the second patch makes the logical change I’m looking for.

Jon