[clang-tools-extra] A Clang extra tool to find missing includes

Crossposting from cfe-commits, as suggested.

The code is up at http://llvm-reviews.chandlerc.com/D1538 .

This is a tool that uses an external database, like a ctags file, to suggest missing includes to fix a broken translation unit. It can produce diagnostics like:

TESTFILE:6:9: note: Inferring that M names a namespace.
using ::m::X;
        ^
TESTFILE:6:12: note: For M::X, include m_x_empty_class.h.
using ::m::X;
           ^

or:

TESTFILE:6:1: note: For M::Q::Y, include m_q_y_empty_class.h.
Y a;
^
TESTFILE:6:1: note: Alternately, for N::Q::Y, include n_q_y_empty_class.h.

It works by looping a bounded number of times. On each iteration, the broken TU is parsed and passed through semantic analysis. When a typo is encountered, or when an incomplete type is detected where a complete type is required, the database is consulted to try to find a header with an appropriate declaration or definition. The header is then inserted into the TU (currently using a fixed formatting rule).
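To make the loop concrete, here is a minimal, self-contained sketch of the bounded iteration. It is not the code in the patch: ToyTU, HeaderDatabase, Analyzer, makeToyAnalyzer and fixMissingIncludes are hypothetical names, and the "parse and analyze" step is replaced by a toy callback that reports the first unresolved name.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a translation unit: just the headers it includes.
struct ToyTU {
  std::vector<std::string> Includes;
};

// Hypothetical database: fully-qualified name -> header that declares it.
using HeaderDatabase = std::map<std::string, std::string>;

// Stand-in for "parse and run semantic analysis": returns the first name that
// is still unresolved, or an empty string if the TU is clean.
using Analyzer = std::function<std::string(const ToyTU &)>;

// Builds a toy analyzer: a used name counts as resolved once the header the
// database maps it to has been included.
Analyzer makeToyAnalyzer(std::vector<std::string> UsedNames,
                         HeaderDatabase DB) {
  return [UsedNames, DB](const ToyTU &TU) -> std::string {
    for (const std::string &Name : UsedNames) {
      auto It = DB.find(Name);
      if (It == DB.end())
        return Name; // Unknown to the database, so it can never be fixed.
      if (std::find(TU.Includes.begin(), TU.Includes.end(), It->second) ==
          TU.Includes.end())
        return Name; // Known, but its header is not included yet.
    }
    return std::string();
  };
}

// Re-analyzes the TU at most MaxIterations times, adding one header per
// round. Returns true if the TU ends up clean.
bool fixMissingIncludes(ToyTU &TU, const Analyzer &Analyze,
                        const HeaderDatabase &DB, unsigned MaxIterations) {
  for (unsigned I = 0; I < MaxIterations; ++I) {
    std::string Missing = Analyze(TU);
    if (Missing.empty())
      return true;
    auto It = DB.find(Missing);
    if (It == DB.end())
      return false; // No suggestion available; give up.
    TU.Includes.push_back(It->second);
  }
  return Analyze(TU).empty();
}
```

The iteration bound is what keeps the sketch from spinning forever when each added header only uncovers another missing name.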

The primary components of the tool and their roles are:

SemaSearchPlugin: An interface to a semantic database providing name-to-definition-file queries and limited name-to-kind (here, namespace) queries. It’s useful to know whether a fully-qualified name refers to a namespace because it allows the tool to ‘fake out’ missing namespace declarations without confusing them with static members (Foo::Bar vs ClassFoo::InnerClass::Bar). The tool does this when a typo correction hook isn’t able to find a name during a lookup of kind Sema::LookupNestedNameSpecifierName.

CtagsSemaSearchPlugin: A specialization for SemaSearchPlugin that works over Exuberant Ctags files. (Parsing these files is handled by the readtags library.)
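As a rough illustration of the two kinds of query described above, the following sketch shows one possible shape for such an interface. The names SymbolSearch and InMemorySymbolSearch are hypothetical and are not the classes in the patch; the in-memory backend only stands in for a real one such as a ctags reader.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface; the names are illustrative, not the patch's.
class SymbolSearch {
public:
  virtual ~SymbolSearch() {}

  // Headers that may define the fully-qualified name. The order carries no
  // meaning unless the backend chooses to rank its results.
  virtual std::vector<std::string>
  findDefinitionFiles(const std::string &QualifiedName) const = 0;

  // True if the fully-qualified name is known to name a namespace.
  virtual bool isNamespace(const std::string &QualifiedName) const = 0;
};

// Trivial backend that keeps everything in memory.
class InMemorySymbolSearch : public SymbolSearch {
  std::multimap<std::string, std::string> Definitions;
  std::set<std::string> Namespaces;

public:
  void addDefinition(const std::string &Name, const std::string &Header) {
    Definitions.insert(std::make_pair(Name, Header));
  }

  void addNamespace(const std::string &Name) { Namespaces.insert(Name); }

  std::vector<std::string>
  findDefinitionFiles(const std::string &QualifiedName) const override {
    std::vector<std::string> Result;
    auto Range = Definitions.equal_range(QualifiedName);
    for (auto It = Range.first; It != Range.second; ++It)
      Result.push_back(It->second);
    return Result;
  }

  bool isNamespace(const std::string &QualifiedName) const override {
    return Namespaces.count(QualifiedName) != 0;
  }
};
```

Keeping the namespace query separate from the definition query is what lets a caller decide whether a missing qualifier should be treated as a namespace before any header is chosen.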

FileTracker: Keeps track of data associated with a particular TU being fixed across tool iterations. Holds a MemoryBuffer that contains the most recent version of the file that was rewritten (if rewriting occurred).

FyiAction: Hooks MaybeDiagnoseMissingCompleteType and CorrectTypo calls from Sema.

In the former case, since a complete type has a fully qualified name, a simple query can be made on the SemaSearchPlugin (e.g., if we see that Foo::Bar is incomplete in a context where a complete type is necessary, we can query for that fully-qualified name). The database may contain multiple entries for a definition site; ranking these is the plugin’s duty. The highest-ranked suggestion is used to add an include, while other suggestions may be displayed as diagnostic notes. (The ctags plugin simply returns an unordered vector.)
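A small sketch of how a ranked list could be turned into the notes shown earlier: the first suggestion supplies the header to insert, and the rest become "Alternately" notes. Suggestion, renderNotes and headerToInsert are hypothetical names, and the input is assumed to be ranked already.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one database hit.
struct Suggestion {
  std::string QualifiedName;
  std::string Header;
};

// Renders one note per suggestion. The input is assumed to be ranked, best
// first; only the first entry is phrased as the primary suggestion.
std::vector<std::string> renderNotes(const std::vector<Suggestion> &Ranked) {
  std::vector<std::string> Notes;
  for (std::size_t I = 0; I < Ranked.size(); ++I) {
    const Suggestion &S = Ranked[I];
    std::string Prefix = (I == 0) ? "For " : "Alternately, for ";
    Notes.push_back(Prefix + S.QualifiedName + ", include " + S.Header + ".");
  }
  return Notes;
}

// The header that would actually be inserted: the top-ranked one, or an
// empty string if there are no suggestions.
std::string headerToInsert(const std::vector<Suggestion> &Ranked) {
  return Ranked.empty() ? std::string() : Ranked.front().Header;
}
```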

In the latter case, the mapping between (un)qualified name and fully-qualified name is ambiguous. The tool searches DeclContexts from innermost to outermost, appending the typo’s NestedNameSpecifier as appropriate (and taking care to first expand namespace aliases). If it fails a lookup in the database, it will then (as a heuristic) look in the namespaces of available ‘using’ declarations. Finally, it will try simple unqualified lookup.
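The search order above can be sketched as a function that lists candidate fully-qualified names in the order they would be tried against the database. This is a simplification under stated assumptions: scopes and namespaces are plain strings, namespace aliases are assumed to be expanded already, and candidateNames is a hypothetical name rather than anything in the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Lists candidate fully-qualified names in the order they would be tried:
// enclosing scopes from innermost to outermost, then the namespaces named by
// visible using declarations, then the name exactly as written.
// EnclosingScopes is ordered outermost first, e.g. {"a", "b"} for code that
// sits inside namespace a { namespace b { ... } }.
std::vector<std::string>
candidateNames(const std::vector<std::string> &EnclosingScopes,
               const std::vector<std::string> &UsingNamespaces,
               const std::string &Name) {
  std::vector<std::string> Result;

  // Innermost first: a::b::Name, then a::Name.
  for (std::size_t Depth = EnclosingScopes.size(); Depth > 0; --Depth) {
    std::string Qualified;
    for (std::size_t I = 0; I < Depth; ++I)
      Qualified += EnclosingScopes[I] + "::";
    Result.push_back(Qualified + Name);
  }

  // Heuristic step: namespaces made visible by using declarations.
  for (const std::string &NS : UsingNamespaces)
    Result.push_back(NS + "::" + Name);

  // Last resort: the name as written.
  Result.push_back(Name);
  return Result;
}
```

A caller would query the database with each candidate in turn and stop at the first hit.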

FyiDiagnosticHooks: Registers custom diagnostics, acts as a diagnostic sink to collect diagnostics (so that intermediate steps in the outer tool loop don’t litter the terminal), and flags some warnings and notes as errors to force the tool to spin around again. For example, if the compiler generates a diag::warn_delete_incomplete, the tool should treat this as seriously as any other incomplete-type error.
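The buffering and promotion behaviour can be sketched without any Clang types. BufferingSink, Severity and BufferedDiagnostic below are hypothetical names, and diagnostics are identified by plain strings rather than Clang's diagnostic IDs; a real implementation would plug into Clang's diagnostic machinery instead.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Severity { Note, Warning, Error };

// Hypothetical record of one collected diagnostic.
struct BufferedDiagnostic {
  std::string ID;
  Severity Level;
  std::string Message;
};

// Collects diagnostics instead of printing them, and treats a configurable
// set of IDs as errors regardless of their reported severity.
class BufferingSink {
  std::set<std::string> Promoted;
  std::vector<BufferedDiagnostic> Buffer;
  bool SawError = false;

public:
  explicit BufferingSink(std::set<std::string> PromotedIDs)
      : Promoted(std::move(PromotedIDs)) {}

  void handle(const std::string &ID, Severity Level,
              const std::string &Message) {
    if (Promoted.count(ID) != 0)
      Level = Severity::Error; // Force another pass of the outer loop.
    if (Level == Severity::Error)
      SawError = true;
    Buffer.push_back(BufferedDiagnostic{ID, Level, Message});
  }

  // True if anything error-like was seen since the last takeDiagnostics().
  bool needsAnotherIteration() const { return SawError; }

  // Hands the collected diagnostics to the caller and resets the sink.
  std::vector<BufferedDiagnostic> takeDiagnostics() {
    std::vector<BufferedDiagnostic> Out;
    Out.swap(Buffer);
    SawError = false;
    return Out;
  }
};
```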

FyiPreprocessorHooks: Listens for file-entered and file-left events, watches for inclusion directives, and keeps track of insertion positions in target TUs. The rewrite and header-equality mechanisms currently assume a particular programming style (system first, then qualified-from-project-root) and that the missing headers will be quoted.
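For illustration only, here is a text-based approximation of the insertion rule: the new header is added as a quoted include after the last existing include line, which keeps it below the system headers under the style assumed above. The tool itself tracks positions through preprocessor events; insertQuotedInclude is a hypothetical helper that works on plain lines of text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Inserts `#include "Header"` after the last line that starts with #include,
// or at the top of the file if there is none. Returns false and leaves Lines
// untouched if the exact directive is already present.
bool insertQuotedInclude(std::vector<std::string> &Lines,
                         const std::string &Header) {
  const std::string Directive = "#include \"" + Header + "\"";
  std::size_t InsertAt = 0;
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    if (Lines[I] == Directive)
      return false;
    if (Lines[I].compare(0, 8, "#include") == 0)
      InsertAt = I + 1; // Remember the slot just past this include.
  }
  Lines.insert(Lines.begin() + static_cast<std::ptrdiff_t>(InsertAt),
               Directive);
  return true;
}
```

Comparing whole directive strings is a much weaker notion of header equality than the one described above; it is only meant to show where the inserted line lands.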

FyiActionFactory: Manages the map from filenames to FileTrackers and the global iteration count. FyiActionFactory can save out changes to files (optionally with new suffixes or after moving the old file to a backup).

FindYourIncludes: Contains the tool’s entry point.

More information on the various components is available in Doxygen comments throughout the source code.

  Rebased the code.

http://llvm-reviews.chandlerc.com/D1538

CHANGE SINCE LAST DIFF
  http://llvm-reviews.chandlerc.com/D1538?vs=4176&id=4558#toc

Hi Luke,

I think in general this makes sense. One problem is that it’s a really large change, so be prepared for it to require a longer time for the review - if there’s any way to split it up into “manageable” chunks, that provide incremental functionality (note that splitting up vertically will not help at all), that would be super useful…

One concern I have is the ctags database. I haven’t seen useful ctags tools for C++ yet, and I’d rather we implement something simple of our own that works well for C++ than import a ctags parser. That said, I’ve not yet looked at the code in detail, as the change is large enough that it doesn’t load in phabricator for me (I filed an upstream bug ;)

Cheers,
/Manuel

A few more high level comments:
- method names in clang should be camelCase() (http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly)
- dumping a "public domain" implementation of a ctags parser will not fly (according to Chandler) - if we want to support ctags, we'll probably reimplement it to make sure it's provided with an LLVM-compatible license
- why do we want a tool instead of providing an extension point in libclang to hook this directly into the parser's recovery?

Cheers,
/Manuel

Thanks for taking the time to look at this!

The ctags database seemed like a good way to get a simple backend up and running. I agree that it is not an ideal tool for C++. Of the various efforts going on to build this sort of database, is there one that is generally preferred by the community? For example, there exists cldoc <http://jessevdk.github.io/cldoc/>, which claims to serialize to an XML-based representation. Maybe going from this representation to something like JSON (since there’s already parsing code for this checked in to support compilation databases) would not be too bad.

I thought I’d normalized the function names to the surrounding Clang code, but it looks like I was following an out-of-date standard. (Maybe someone someday will work out a good way of getting clang-format to understand the morphology and case rules for identifiers.)

As to whether this should be part of libclang or a separate tool, I will need to learn more about the former before I can think of a good answer! I regret that I have significantly less time to work on this now than when I was building it as a summer project, so any large refactoring or split work may be blocked for a while, but I am still happy to receive comments.

Thank you again for your time.

Luke

I think we'll eventually want to build something in clang/Tooling or
clang-tools-extra to provide good C++ project indexing.

Hi Luke,
   That sounds like a very interesting tool to me, especially for more interactive use cases like ours.
   In cling, when we hit a missing or incomplete name, we would like to #include the header file (if one exists) that provides the definition.

   If you are interested in such use cases, I will be happy to discuss, adapt and use your tool within cling's context.
Vassil