Clangd: Compilation unit incorrectly inferred for a header file

Firstly, thanks for producing clangd and its associated VSCode extension!

I’m using clangd 12 with VSCode on Linux. My project is essentially a monorepo (i.e. tens of thousands of files, with many subdirectories representing libraries written by many different people). I frequently see clangd infer the wrong compilation unit to use as the basis (i.e. the include path) for indexing a given header file. Often it chooses a compilation-database entry that is entirely unrelated to the header it has been asked to index. I believe this is due to the heuristics clangd uses to select a compilation-database entry. IMHO, when clangd is asked to index a header file H and needs to choose a compilation-database entry C to help index it, it should choose C based on whether H is actually “reachable” from C (e.g., naively, by checking that preprocessing each candidate C actually results in H being included).

I created a small (admittedly contrived) reproduction of this behaviour. Create the files below under /var/tmp/clangd, open the directory in VSCode with the clangd extension installed, accept the workspace settings override, and open include/foo/A.hpp. clangd incorrectly infers X.cpp, whose command has no -I flags, so the resulting include path is wrong: the #include of B.hpp cannot be found and you get an error squiggle underneath it. I think clangd should instead infer Y.cpp, because that entry does have the correct include path (i.e. A.hpp is “reachable” from Y.cpp but not from X.cpp).

$ pwd
/var/tmp/clangd
$ cat .vscode/settings.json 
{
    "clangd.onConfigChanged": "restart",
    "clangd.arguments": [
        "--query-driver=/usr/bin/*",
        "--log=verbose",
        "--background-index",
        "--compile-commands-dir=."
    ]
}
$ cat compile_commands.json 
[
    {
        "command": "/usr/bin/g++ -Wall -Wextra -Werror -pedantic -c X.cpp",
        "directory": "/var/tmp/clangd",
        "file": "X.cpp"
    },
    {
        "command": "/usr/bin/g++ -Wall -Wextra -Werror -pedantic -Iinclude/foo -Iinclude/bar -c Y.cpp",
        "directory": "/var/tmp/clangd",
        "file": "Y.cpp"
    }
]
$ cat X.cpp
int main() {
  return 0;
}
$ cat Y.cpp 
#include <A.hpp>

int main() {
  return 1;
}
$ cat include/foo/A.hpp
// foo/A.hpp
#include <B.hpp>
$ cat include/bar/B.hpp
// bar/B.hpp

Thanks!

Chris

Hey Chris, this is a known issue. See “Use parsed files to improve header compile commands” (clangd/clangd issue #123 on GitHub) for the mitigation added so far and some discussion of potential further work.