Design proposal: Add custom compilation database formats as plugins

Hi all,

to make the tooling library more widely applicable, we’d like to make the CompilationDatabase architecture extensible, i.e. provide the option of implementing custom adapters to other kinds of compilation databases (other than the currently available JSONCompilationDatabase).

In r160061, I provided a hack that enables this, but we definitely want something nicer.

In the attached patch, I am using LLVM’s plugin registry to enable registering new compilation databases. To provide a new CompilationDatabase, you’d need to subclass CompilationDatabasePlugin and CompilationDatabase and register the former with CompilationDatabasePluginRegistry::Add (as shown for JSONCompilationDatabase). This is not a finished patch for review, I’d just like to get feedback on the approach. What do you think?

Also, several points that need to be addressed:

  • Error handling: In CompilationDatabase::loadFromDirectory, I currently simply try each plugin until I find a compilation database. Any ideas, what I should do with the error messages, especially if I do not find any compilation database. Should I concatenate them, potentially with the plugin-name as a prefix?
  • Should I move the JSONCompilationDatabase to a separate set of files as it is now loosely coupled?

Cheers,
Daniel

compilationdatabaseplugins.patch (6.66 KB)

Hi all,

to make the tooling library more widely applicable, we'd like to make the CompilationDatabase architecture extensible, i.e. provide the option of implementing custom adapters to other kinds of compilation databases (other than the currently available JSONCompilationDatabase).

In r160061, I provided a hack that enables this, but we definitely want something nicer.

In the attached patch, I am using LLVM's plugin registry to enable registering new compilation databases. To provide a new CompilationDatabase, you'd need to subclass CompilationDatabasePlugin and CompilationDatabase and register the former with CompilationDatabasePluginRegistry::Add (as shown for JSONCompilationDatabase). This is not a finished patch for review, I'd just like to get feedback on the approach. What do you think?

I definitely prefer this approach!

Also, several points that need to be addressed:
- Error handling: In CompilationDatabase::loadFromDirectory, I currently simply try each plugin until I find a compilation database. Any ideas, what I should do with the error messages, especially if I do not find any compilation database. Should I concatenate them, potentially with the plugin-name as a prefix?

Yes, I think it's reasonable to concatenate them with the name of the plug-in, because we don't know which plug-in the user intended to use.

- Should I move the JSONCompilationDatabase to a separate set of files as it is now loosely coupled?

Sure!

  - Doug

Hi all,

to make the tooling library more widely applicable, we'd like to make the CompilationDatabase architecture extensible, i.e. provide the option of implementing custom adapters to other kinds of compilation databases (other than the currently available JSONCompilationDatabase).

In r160061, I provided a hack that enables this, but we definitely want something nicer.

In the attached patch, I am using LLVM's plugin registry to enable registering new compilation databases. To provide a new CompilationDatabase, you'd need to subclass CompilationDatabasePlugin and CompilationDatabase and register the former with CompilationDatabasePluginRegistry::Add (as shown for JSONCompilationDatabase). This is not a finished patch for review, I'd just like to get feedback on the approach. What do you think?

I definitely prefer this approach!

My main argument against static registries is (bad) experience with
using them in Windows / DLL environments. Looping in Michael Spencer
to verify that this will not be a problem here (I really want the
tooling stuff to not run into problems with Windows down the line).

Cheers,
/Manuel

Hi all,

to make the tooling library more widely applicable, we'd like to make the CompilationDatabase architecture extensible, i.e. provide the option of implementing custom adapters to other kinds of compilation databases (other than the currently available JSONCompilationDatabase).

In r160061, I provided a hack that enables this, but we definitely want something nicer.

In the attached patch, I am using LLVM's plugin registry to enable registering new compilation databases. To provide a new CompilationDatabase, you'd need to subclass CompilationDatabasePlugin and CompilationDatabase and register the former with CompilationDatabasePluginRegistry::Add (as shown for JSONCompilationDatabase). This is not a finished patch for review, I'd just like to get feedback on the approach. What do you think?

I definitely prefer this approach!

My main argument against static registries is (bad) experience with
using them in Windows / DLL environments. Looping in Michael Spencer
to verify that this will not be a problem here (I really want the
tooling stuff to not run into problems with Windows down the line).

Cheers,
/Manuel

This will work if statically linked, but not dynamically unless the
__declspec(dll{export,import}) stuff is sprinkled on. However in this
case it may be possible as the interface seems quite small. Although
it would also require adding sprinkles to any non-stdlib c++ types
that are passed through the interface.

- Michael Spencer

Being limited to static linking on Windows seems fine for now; it's what Daniel's prior solution was doing already, and we know roughly what we need to do to get dynamic linking working better on Windows.

  - Doug

Cool, so I’ll go ahead and finish this patch. There is one remaining issue: If I don’t use anything from a .o file, the linker will simply disregard it, thereby rendering the contained compilation database useless. Is there a standard way to prevent that?

I was looking at adding compilation database support for ninja build
files. However, I ran into the problem that it basically amounted to
rewriting the entire ninja build file parser that is already
implemented inside of ninja. As such, it seems like the very nature of
compilation databases is something that needs to be output by the tool
configuring/doing the build.

Is your thought with this that, for example, ninja would provide a
compilation database plugin reusing its own parser?

--Sean Silva

For ninja in particular, I have long thought that the correct approach is
to write something which can convert 'ninja -t commands' into the JSON
format, or to build a tool for writing the JSON database directly into
ninja. Then CMake (or anything else that generates the build file) can
inject targets to automatically generate and update this file. This
includes re-building generated source files as necessary.

I think for OSS build systems and systems that are largely
command-line-driven like ninja, the file output and interchange system is a
much better alternative. The plugin support I think is mostly useful for
stateful build systems which may really need to implement the database
using custom logic.

For ninja in particular, I have long thought that the correct approach is to write something which can convert 'ninja -t commands' into the JSON format, or to build a tool for writing the JSON database directly into ninja.

This was my line of thought as well.

Since the C++ stdlib unfortunately doesn't have JSON support, what do
you think about a simpler, more plaintext-y compilation database
(PlaintextCompilationDatabase?); it seems like that might be a win
since it is simpler to output. For example, the format would just be

/path/to/dir/
clang++ foo.cpp
foo.cpp
<empty line>

Just a mirror of the JSON one, but with a format easier to output (god
forbid there's a newline in one of the command lines or filenames). On
the other hand, throwing together a simple "good-enough" JSON writer
isn't *too* difficult; nonetheless, for tools written in C/C++, this
could lower the bar to entry. Do you know what CMake currently does?
Does it have its own mini JSON writer?

--Sean Silva

We didn't go with the plaintext route for three important reasons:

1) Filenames do have whitespace in them. The format should be robust
against that.
2) Many other tools and languages should have a minimal barrier to read
them. An existing format eases this.
3) It is *incredibly* easy to write a minimal JSON writer that only
supports the features we need.

Manuel contributed the CMake support for the JSON database, and it includes
just such a trivial JSON writer. =] We can easily place a copy of the code
into LLVM or re-license it however it helps.

That said, the latest version of CMake already has support for JSON + Ninja
-- we didn't contribute it, so I don't know what strategy they followed,
but you should look at that and talk to the ninja and CMake developers
before going too far here.

Er, I should clarify, the latest git snapshot of the 'next' branch or
whatever. =] Not at all released yet.

3) It is *incredibly* easy to write a minimal JSON writer that only supports the features we need.

Ah, ok, I seem to have overestimated how difficult/annoying this would
be to write.

We can easily place a copy of the code into LLVM or re-license it however it helps.

This might be good, to help keep things consistent. Are you thinking
something with a role like ninja's ninja_syntax.py?

--Sean Silva

3) It is *incredibly* easy to write a minimal JSON writer that only supports the features we need.

Ah, ok, I seem to have overestimated how difficult/annoying this would
be to write.

Writing JSON is simple: it's a well-defined format with very little
chance that it ever changes. No need to have a "writer", just print it
out:

stream << "[\n";
for (tu : tus) {
  stream << " {\n " << "\"directory\": " << shellEscape(tu.directory())
  ...
}
stream << "]\n";

The hard part is the shell escaping. CMake already had that built in.

While I also like the plugin concept much better, I am still unsure about the fundamental problem of getting other libraries linked in. The problem is that the linker will simply throw away any object file that has no references to it. After some discussions on IRC, I see two options:

  1. Conditionally compile in a custom registration method, i.e.:
    in CompilationDatabase.cpp:
    void someInit() {
    #ifdef USE_CUSTOM_COMPILATION_DATABASES
    RegisterCompilationDatabases();

#endif
}
And then implement RegisterCompilationDatabases() together with the custom compilation database.
Pros: Very easy to understand and to use right.
Cons: It would be hard to link together multiple such implementations.

  1. Conditionally compile in custom anchors, i.e;.:
    in CompilationDatabase.cpp:
    #ifdef COMPILATION_DATABASE_ANCHORS
    COMPILATION_DATABASE_ANCHORS;
    #endif
    And then e.g. “-DCOMPILATION_DATABASE_ANCHORS=‘extern volatile int MyAnchorSource; static int MyAnchorDest = MyAnchorSource;’”
    Pros: Multiple custom compilation databases can be added. Nice clean separation (conditional code can be somewhere on the bottom of “CompilationDatabase.cpp”.
    Cons: Seems a bit tangled and easy to get the -D part wrong.

I will try to get #2 to work (possibly simplifying it a bit with more macros :-/). Does anyone see a better solution?

Or maybe:
- add a private field 'hook' in CompilationDatabase (a function pointer
to the user callback),
- initialize it by default to NULL
- add a setter to CompilationDatabase for this hook field
- and use the hook if non null in findCompilationDatabaseFromDirectory

The burden of linking in the other required libraries is left to the
user of the hook --- which seems good to me : we can not generally
decide for the user which libraries should be linked in.

This can probably be extended by changing the hook to a vector (of
hooks). This would allow registering several user backends to a specific
CompilationDatabase instance, and have it try to use them in turn, until
one succeed.

Cheers,

Or maybe:
- add a private field 'hook' in CompilationDatabase (a function pointer to
the user callback),
- initialize it by default to NULL
- add a setter to CompilationDatabase for this hook field

This is the problematic part, as it requires changing the tool's code
for each environment, if I understand you correctly. One of the
reasons we want the plugins is that we can use upstream clang tools
without keeping a branch that changes their code to register our
internal environment stuff.

Cheers,
/Manuel

Chandler Carruth wrote:

That said, the latest version of CMake already has support for JSON +
Ninja -- we didn't contribute it, so I don't know what strategy they
followed, but you should look at that and talk to the ninja and CMake
developers before going too far here.

I wrote it and pretty much followed the same thing Manuel did in the
Makefile generator.

The commit which actually adds the feature is trivial:

http://cmake.org/gitweb?p=cmake.git;a=commitdiff;h=db839bec7d076b54c5e9ad0d19386a26557a509e

Manuel mentioned before that he'd like to see ninja being able to generate a
database without cmake too though:

http://thread.gmane.org/gmane.comp.programming.tools.cmake.devel/3678/focus=3697

As Chandler said, it's not in a release yet, but will be in the next release
in a few weeks. Feel free to test the release candidate (I would appreciate
if you did)

I know multiple people who are refusing to work without this any more,
I've been using it since it landed in "next". So consider this part
pretty well tested (at least on the llvm codebase).

Cheers,
/Manuel

Good to hear :slight_smile:

Thanks,

Steve.

Attached is a patch to do most of this restructuring, kindly asking for review.

In particular, it does:

  • Restructure the compilation database architecture to using LLVM’s registry concept. It should now be possible to link in additional compilation databases.
  • Separate the JSONCompilationDatabase from CompilationDatabase to show the loose coupling and serve as an example.

compilationdatabases.patch (32.7 KB)

I'd like the class-comment for JSONCompilationDatabase to still
include what's now in the file comment, so it's visible in doxygen.

Also, I'd prefer to use a result value to capture the error message
instead of llvm::outs'ing in findCompilationDatabaseFromDirectory.
Perhaps we should also switch this to Diagnostics?

Cheers,
/Manuel