clangd, completion in header files

Hi,

I recently tried to use clangd for Chromium in vscode and hit the same issue.
I experimented a bit with generating a compile_commands.json file with valid rules for headers and came to this solution: https://gist.github.com/xchrdw/bfd2b3a5f765f4195a55d6351daf1b48
I sorted all .cc filenames and then looked up the index of the closest match for each header using binary search. As there were some edge cases like the first or last file in a directory, I additionally compared which file at the index has the largest prefix with the header.
I found a few issues with system headers and some generated protobuf headers but otherwise it works well.
Native support from clangd would be amazing :slight_smile:

Thanks,
Christian

Are there any interesting codebases I should be trying this stuff out on? Ideally - builds on linux without too much fuss, uses cmake and generates compile_commands.json. Preferably not incredibly huge (though I will try chromium).

I’m playing with heuristics; and have a rough tool to validate them.This seems to work pretty well:

  • 2 points if basenames match (without extension). 1 point for substrings, in either direction.
  • for /a/b/c/d/foo.h, 1 point each for “/c/” and “/d/” in the candidate path, and 1 point if it starts with “/a/b”
  • break ties by longest common prefix

I’ve been testing using the LLVM codebase, which has some tricky cases. (nontrivial directory layout, lots of headers without matching cc files)

Out of ~3000 headers (i.e. any file transitively #included) in my llvm checkout, working accuracy seems to be extremely good:
PERFECT (~2200): we pick a TU that transitively included the header. (When there’s a corresponding cc file, it’s usually that on.)
GOOD (~650): we pick a TU whose args “obviously” work (it contains every arg that was defined by every transitively including TU)
MAYBE_BAD (150):

  • about half of these are *.def and *.inc which I haven’t studied, they’re not relevant to clangd
  • most of the remainder are false alarms: the flags are fine, but my heuristic can’t tell
  • I haven’t found any that would give wrong results, though some results are confused (clang/Parse/ParseDiagnostic.h gets associated with clang/lib/Basic/Diagnostic.cpp rather than e.g. clang/lib/Parse/ParseAST.cpp).

Another heuristic solution:

Enumerate all compilation database entries and calculate the matching score for each entry with the target file
Bonus for common leading components
Penalty for diverged path components
Bonus for common trailing characters sans filename extension (e.g. Match.cc Match.h are the same sans extension)

This heuristic is used in cquery.

Add another point for textDocument/didOpen on an unseen filename:
The inferred command line options are not authoritative, they should be overriden by other translation units when later it turns out the header file is included by some entry in the compilation database.

Looks like our mail crossed :slight_smile:
Cquery’s success with this is another point in favor.

I have a prototype in https://reviews.llvm.org/D45006 (+ 2-line patch to enable in clangd: https://reviews.llvm.org/D45007).
It seems to work pretty well at first glance, but it needs rigorous testing. If anyone’s feeling this pain, feel free to try it out!

It does a little bit of indexing to avoid traversing all the entries every time, not sure if that’s worth it.

Sound great, thanks!

One interesting case in Chromium might be generated jni headers. I would be interested what the heuristic finds for these files:

src/out/Android/gen/chrome/browser/jni_headers/chrome/jni/BrowsingDataBridge_jni.h

the implementation is in:

src/chrome/browser/android/browsing_data/browsing_data_bridge.cc

I don’t think that will get matched.

I don’t really care if code completion works in generated files but at least there shouldn’t be incorrect error messages in vscode. Is it possible to exclude a folder from being analyzed by clangd?

Sound great, thanks!

One interesting case in Chromium might be generated jni headers. I would be interested what the heuristic finds for these files:

src/out/Android/gen/chrome/browser/jni_headers/chrome/jni/BrowsingDataBridge_jni.h

Indeed, it doesn’t do well here, but still compiles fine - this seems to be the case when the exact flags used don’t matter much as long as the general project setup is right. Probably quite common in generated files - I guess these tend to have relatively few predictable dependencies, and might prefer unwieldy fully-qualified include paths where a human would add a -I entry. This one includes <jni.h> and “…/…/…/…/…/…/…/…/base/android/jni_generator/jni_generator_helper.h”.

The heuristic fares poorly here because the filename is unique, the last few path components (chrome, jni, jni_headers) are unhelpful in locating related cc files, and the “nearby in the tree” logic doesn’t handle the separate root for genfiles well.
The right file turns out to be e.g. “chrome/browser/android/browsing_data/browsing_data_bridge.cc”. The only useful hints here are chrome/browser (not very specific) and BrowsingDataBridge if we used fuzzy word matching to account for case differences. I’m inclined not to try to solve this case without more data.

The only diagnostics produced are a bunch of “internal linkage… but not defined” where the header forward declares static functions whose implementations are meant to be provided by the including CC file.
This is an artifact of assuming the .h file is standalone, and it seems like a form of non-modularity, but this is a minor issue we can deal with in a few ways and unrelated to the choice of flags.

Relevant logs are:

Chose /usr/local/google/home/sammccall/src/chromium/src/out/Android/gen/chrome/browser/android/proto/client_discourse_context.pb.cc as proxy for /usr/local/google/home/sammccall/src/chromium/src/out/Android/gen/chrome/browser/jni_headers/chrome/jni/BrowsingDataBridge_jni.h

Rebuilding file /usr/local/google/home/sammccall/src/chromium/src/out/Android/gen/chrome/browser/jni_headers/chrome/jni/BrowsingDataBridge_jni.h with command [/usr/local/google/home/sammccall/src/chromium/src/out/Android] …/…/third_party/llvm-build/Release+Asserts/bin/clang++ -x c++ -MMD -MF obj/chrome/browser/client_discourse_context_proto/client_discourse_context.pb.o.d -D V8_DEPRECATION_WARNINGS -D NO_TCMALLOC -D SAFE_BROWSING_DB_REMOTE -D CHROMIUM_BUILD -D FIELDTRIAL_TESTING_ENABLED -D ANDROID -D HAVE_SYS_UIO_H -D ANDROID_NDK_VERSION_ROLL=r16_1 -D CR_CLANG_REVISION=“328575-1” -D __STDC_CONSTANT_MACROS -D __STDC_FORMAT_MACROS -D COMPONENT_BUILD -D __GNU_SOURCE=1 -D CHROMIUM_CXX_TWEAK_INLINES -D _DEBUG -D DYNAMIC_ANNOTATIONS_ENABLED=1 -D WTF_USE_DYNAMIC_ANNOTATIONS=1 -D GOOGLE_PROTOBUF_NO_RTTI -D GOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -D HAVE_PTHREAD -D PROTOBUF_USE_DLLS -I …/… -I gen -I …/…/third_party/protobuf/src -I gen/protoc_out -I …/…/third_party/protobuf/src -fno-strict-aliasing --param ssp-buffer-size=4 -fstack-protector -Wno-builtin-macro-redefined -D DATE= -D TIME= -D TIMESTAMP= -funwind-tables -fPIC -pipe -fcolor-diagnostics -no-canonical-prefixes -ffunction-sections -fno-short-enums --target=arm-linux-androideabi -isystem …/…/third_party/android_ndk/sysroot/usr/include/arm-linux-androideabi -D ANDROID_API=16 -D NDK_FPABI= -D HAVE_PTHREAD_COND_TIMEDWAIT_MONOTONIC=1 -march=armv7-a -mfloat-abi=softfp -mtune=generic-armv7-a -mfpu=neon -mthumb -Wall -Werror -Wextra -Wimplicit-fallthrough -Wthread-safety -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-covered-switch-default -Wno-unneeded-internal-declaration -Wno-inconsistent-missing-override -Wno-undefined-var-template -Wno-nonportable-include-path -Wno-address-of-packed-member -Wno-unused-lambda-capture -Wno-user-defined-warnings -Wno-enum-compare-switch -Wno-null-pointer-arithmetic -Wno-ignored-pragma-optimize -Oz -fno-ident -fdata-sections -ffunction-sections -fomit-frame-pointer -gdwarf-3 -g2 -ggnu-pubnames -fvisibility=hidden -Xclang -load -Xclang …/…/third_party/llvm-build/Release+Asserts/lib/libFindBadConstructs.so -Xclang -add-plugin -Xclang find-bad-constructs -Xclang -plugin-arg-find-bad-constructs -Xclang check-ipc -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wno-undefined-bool-conversion -Wno-tautological-undefined-compare -std=gnu++14 -fno-exceptions -fno-rtti -isystem …/…/third_party/android_ndk/sources/cxx-stl/llvm-libc++/include -isystem …/…/third_party/android_ndk/sources/cxx-stl/llvm-libc++abi/include -isystem …/…/third_party/android_ndk/sources/android/support/include --sysroot=…/…/third_party/android_ndk/sysroot -fvisibility-inlines-hidden -c /usr/local/google/home/sammccall/src/chromium/src/out/Android/gen/chrome/browser/jni_headers/chrome/jni/BrowsingDataBridge_jni.h -resource-dir=/usr/local/google/home/sammccall/llvmbuild/bin/…/lib/clang/7.0.0

(Yes, the -M flags should really be filtered)

the implementation is in:

src/chrome/browser/android/browsing_data/browsing_data_bridge.cc

I don’t think that will get matched.

I don’t really care if code completion works in generated files but at least there shouldn’t be incorrect error messages in vscode. Is it possible to exclude a folder from being analyzed by clangd?

Not currently, it’ll look at anything with a supported extension.
It’s not obvious to me how to manage either heuristically or via configuration in a way that’s easy to use, so unless people have ideas I’d punt on this and try to improve the results instead.

Hi,
thanks for implementing the heuristics. Header files that have a cc file with the same name are working great now :slight_smile:

I found a few issues that I can’t really explain. It looks like some header-only files are matched with a basically random file:
base/no_destructor.h, base/macros.h and base/callback_forward.h are compiled with commands for third_party/freetype/freetype_source/ftbase.c.
(according to the “-MF obj/third_party/freetype/freetype_source/ftbase.o.d” parameter)

components/content_settings/core/browser/user_modifiable_provider.h is compiled with commands for components/autofill/core/browser/browser/address.cc

There are much better matches such as base/callback_helpers.cc or components/content_settings/core/browser/website_settings_info.cc.

There are also headers where good commands are choosen, e.g. components/browsing_data/core/clear_browsing_data_tab.h with components/browsing_data/core/core/browsing_data_utils.cc, so I wonder what is going wrong with the others files?

I’m using the clangd build 20180424.RC0-1

Ah, this is a fun case… logs from a debug clangd with -debug-only=interpolate:

interpolate: chose …/…/third_party/freetype/src/src/base/ftbase.c as proxy for /usr/local/google/home/sammccall/src/chromium/src/base/no_destructor.h preferring none score=3

The trick is the good match between src/src/base and src/base yields 3 points (should probably be two) - clangd doesn’t know that “src” is semantically meaningless, and “base” is at least ambiguous.

This yields at least a few few possible approaches:

  1. just fix the scorer to award 2 points instead of 3, which will make this rarer (should happen in any case)
  2. blacklist words like “src” from getting points
  3. dynamically detect which words are significant at startup time based on frequency
  4. more substantially change the approach to path matching

I quite like the idea of 3, we could index the N rarest directory segments for each command, rather than the N last.