Tools, resource dir and toolchains

Hi all,

This is a somewhat thorny and detail-ridden question, so please bear with me.

For IWYU, we've recently started work on addressing the
"builtin-headers" problem, where a Clang tool can't find e.g.
<stddef.h> as described here:
https://clang.llvm.org/docs/LibTooling.html#builtin-includes

One of our goals has been to allow users to install IWYU to a
non-Clang prefix, but I think we've run into a complete road-block.

- The Driver constructor initializes the resource dir to $(dirname
/path/to/clang)/../lib/clang/$version
- This can be overridden with the `-resource-dir` switch or, if we're
a tool building on Clang, in any bootstrap code where the Driver is
created
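
As a rough illustration of the default in the first bullet, the path
arithmetic can be sketched as below. The function name and the example
paths are hypothetical, and the real logic lives in Clang's Driver and
may differ in detail; this only mirrors the formula quoted above.

```python
import posixpath


def default_resource_dir(executable_path: str, version: str) -> str:
    """Mimic $(dirname executable)/../lib/clang/$version (illustrative only)."""
    bin_dir = posixpath.dirname(executable_path)
    return posixpath.normpath(
        posixpath.join(bin_dir, "..", "lib", "clang", version)
    )


# clang installed under /usr: the resource dir sits next to it.
print(default_resource_dir("/usr/bin/clang", "7.0.0"))
# prints: /usr/lib/clang/7.0.0

# A tool installed to its own (hypothetical) prefix gets a path that
# most likely does not exist.
print(default_resource_dir("/opt/iwyu/bin/include-what-you-use", "7.0.0"))
# prints: /opt/iwyu/lib/clang/7.0.0
```

The second call shows why a tool outside the Clang prefix ends up
looking in the wrong place unless it overrides the resource dir.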

So far, so good. If we're careful to overwrite
`compiler->getHeaderSearchOpts().ResourceDir` with a path before using
the Driver to set up a compilation, Clang will use our custom resource
dir.

For the builtin headers, this works well; the path calculation there
is just `$resourcedir/include`.

But the resource dir is used in all sorts of places in the Toolchain
hierarchy and in InitHeaderSearch.cpp to locate various SDKs,
sysroots, auxiliary include paths, etc., often by stepping up from
the resource dir to some (seemingly) unrelated directory. For
example:

Here lies the challenge --

- For the `clang` binary, all path resolution will work out, because
its resource dir and itself are installed together to a common prefix
(I assume?)
- All of this path arithmetic is done in the Clang libraries and
always relative to the running executable
- When the Clang libraries are linked into a Clang tool, that tool
must also be installed to the same prefix or the Clang libraries'
assumptions about where to find toolchain paths are broken
- And if a tool overrides the resource dir, it basically pulls the rug
out from under itself -- there's no realistic way it can duplicate all
the subtrees necessary (this is probably mostly problematic for tools
that need to run in a cross-compilation scenario, but there are
examples in InitHeaderSearch.cpp above where target-specific include
paths break down)

It seems none of this is discoverable at build-time, either;
everything is figured out at runtime based on the running executable.
That makes sense, as the library code doesn't know where it will
ultimately be installed.

It would be nice if a tool was able to say "I depend on Clang being
installed", and it had some magic way to find out *where* a matching
Clang version was installed. That way it could just set its
resource-dir to that Clang path and everything would work. I wonder if
tools could use something like `llvm-config` at runtime to figure out
the install prefix for a given version?

If you've read this far, thanks!

- Kim

I've had a few days to mull on this, and I think it should be pretty
easy to locate the prefix of a given Clang version with some basic
probing similar to what's done in Clang's Toolchain classes for GCC
sysroots, etc.

- Find all 'clang*' on the PATH
- Look for a version suffix matching what we depend on (e.g. clang-7)
- Invoke best-match clang with --version and parse
- Step up twice from that path, e.g. /usr/bin/clang/../../ -> /usr/
- Append lib/clang/<version>/

That's the resource dir a tool should use.
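
The steps above could be sketched roughly as follows. This is only a
sketch under assumptions: the helper names are made up, the regular
expressions for executable names and for `--version` output are
guesses at common formats, and symlinked or vendor-patched installs
are not handled.

```python
import os
import posixpath
import re
import subprocess
from typing import List, Optional


def find_clang_candidates(path_env: str) -> List[str]:
    """Return executables named clang or clang-<version> in path_env."""
    name_re = re.compile(r"^clang(-\d+(\.\d+)*)?$")
    found = []
    for directory in path_env.split(os.pathsep):
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            if name_re.match(name) and os.access(full, os.X_OK):
                found.append(full)
    return found


def parse_clang_version(version_output: str) -> Optional[str]:
    """Extract e.g. '7.0.0' from the text printed by `clang --version`."""
    match = re.search(r"clang version (\d+\.\d+\.\d+)", version_output)
    return match.group(1) if match else None


def guess_resource_dir(clang_path: str, version: str) -> str:
    """Step up twice from the binary, then append lib/clang/<version>."""
    prefix = posixpath.dirname(posixpath.dirname(clang_path))
    return posixpath.join(prefix, "lib", "clang", version)


def probe(wanted_major: str) -> Optional[str]:
    """Return a resource dir for the first clang on PATH with that major."""
    for clang in find_clang_candidates(os.environ.get("PATH", "")):
        try:
            output = subprocess.run(
                [clang, "--version"],
                capture_output=True, text=True, check=True,
            ).stdout
        except (OSError, subprocess.CalledProcessError):
            continue  # not runnable; try the next candidate
        version = parse_clang_version(output)
        if version and version.split(".")[0] == wanted_major:
            return guess_resource_dir(clang, version)
    return None


if __name__ == "__main__":
    print(probe("7"))  # None if no matching clang is on the PATH
```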

So as long as `clang` is available on the PATH, this should just work.
Otherwise, users can always provide the resource dir with
`-resource-dir`.

And we'd need to make the relevant clang packages prerequisites of the tool.

Does that make sense?

- Kim

New versions of clang (I think 6.x and later) have a flag `-print-resource-dir` that can give you the resource directory path without needing to do crazy text parsing.
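
A minimal sketch of calling that flag from a tool's wrapper script, assuming a `clang` binary is on the PATH; the function name is hypothetical, and the function returns None when the binary is missing or the flag is rejected.

```python
import subprocess
from typing import Optional


def print_resource_dir(clang: str = "clang") -> Optional[str]:
    """Ask the given clang binary for its resource dir, or return None."""
    try:
        result = subprocess.run(
            [clang, "-print-resource-dir"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Binary not found, not executable, or flag not supported.
        return None
    path = result.stdout.strip()
    return path or None


if __name__ == "__main__":
    print(print_resource_dir())
```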

-Chris

Thanks, that's definitely usable, but we'd still need to make sure we
have the right version of Clang.

But since the Clang version is part of the path emitted by
`-print-resource-dir`, maybe it's easier to just get it from there.

So as long as it's a reasonable assumption that a matching Clang is on
the path, we can just invoke each candidate and do a best match on the
version part of the resource dir.
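
One way the "best match" could look, as a sketch only: the helper
names are hypothetical, and the scoring rule (same major version
required, then prefer the longest matching version prefix) is an
assumption, not something Clang or IWYU defines.

```python
import posixpath
import re
from typing import Iterable, Optional, Tuple

Version = Tuple[int, ...]


def version_from_resource_dir(resource_dir: str) -> Optional[Version]:
    """Parse the last path component, e.g. .../lib/clang/7.0.0 -> (7, 0, 0)."""
    leaf = posixpath.basename(resource_dir.rstrip("/"))
    if not re.fullmatch(r"\d+(\.\d+)*", leaf):
        return None
    return tuple(int(part) for part in leaf.split("."))


def common_prefix_len(a: Version, b: Version) -> int:
    """Count how many leading version components are equal."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def best_match(resource_dirs: Iterable[str], wanted: Version) -> Optional[str]:
    """Pick the dir sharing the most leading components with `wanted`.

    A score of 0 means the major version differs, so such dirs are
    never selected. Ties go to the first candidate seen.
    """
    best, best_score = None, 0
    for candidate in resource_dirs:
        version = version_from_resource_dir(candidate)
        if version is None:
            continue
        score = common_prefix_len(version, wanted)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Example paths are made up for illustration.
dirs = ["/usr/lib/clang/6.0.1", "/usr/lib/llvm-7/lib/clang/7.0.0"]
print(best_match(dirs, (7, 0, 1)))
# prints: /usr/lib/llvm-7/lib/clang/7.0.0
```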

- Kim