(LibTooling) (scan-build-py) Link commands in the CompilationDatabase JSON

Dear Members,
Dear Manuel Klimek and László Nagy (rizsotto),

I'm resurrecting an older discussion
(http://clang-developers.42468.n3.nabble.com/compilation-db-question-td4054364.html)
and following up on my request to include link commands in
intercept-build's build.json
(https://github.com/rizsotto/scan-build/issues/80). Here at Ericsson
we develop the tools CodeChecker and CodeCompass. We use the
compilation commands JSON to get the information we need, but neither
CMake's generated database (related discussion:
http://clang-developers.42468.n3.nabble.com/Extending-CMAKE-EXPORT-COMPILE-COMMANDS-td4024793.html)
nor those produced by intercept-build contain linkage commands, which,
in certain cases, our tools need.
For this reason, we have been supplying our own interceptor, LD-LOGGER
(https://github.com/Ericsson/codechecker/tree/master/vendor/build-logger),
but it's messy and unmaintained as of now.

This is why the request to rizsotto's project has been posted, and he
pointed me in the direction of Clang, but I'd like to get some
pointers before I delve into changing the code.
I've tried some dummy build.json files and scenarios with the current
(as of this morning, UTC) LibTooling-based tools such as clang-check
and clang-tidy.

Let's consider a simple example project, which is compiled (clang++
-c a.cpp) and then linked (clang++ a.o -o main.out). This writes
TWO entries into the build.json, one with the compile command and one
with the link command. LibTooling programs work with it perfectly; the
link command (valid as per the Compilation Command Database
specification) does not cause any mayhem.

Now consider the following build commands in the project.

    clang++ a.cpp b1.cpp -o ab1.o
    clang++ a.cpp b2.cpp -o ab2.o
    clang++ ab1.o c.cpp -o one.out
    clang++ ab2.o d.cpp -o two.out

If this is logged, either by our tool (with the linkage commands) or
via intercept-build, or -even- if I create a valid build.json for this
project in an editor, the tools clang-tidy and clang-check fail with
the error
     error: unable to handle compilation, expected exactly one compiler job

Which is understandable, because as of now, a.cpp exists twice in the
compile commands. Actually, the first two commands produce four
entries: two with a.cpp as the file, and one each for b1.cpp and
b2.cpp, so each of the two commands is recorded twice. This is the
expected result, seeing how the project is built in our example.
(This, to my understanding, fits the specification of a CCDb.)
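To make the shape of such a database concrete, here is a minimal Python sketch that groups entries by their "file" field and lists the files that appear more than once. The build.json content is hand-written for the first two commands above, and the helper names (commands_by_file, ambiguous_files) are hypothetical. It only counts entries; it makes no claim about what exactly triggers the clang-tidy/clang-check error.

```python
import json
from collections import defaultdict

# Hand-written, hypothetical build.json for the first two commands above.
# Only the "directory", "file" and "command" fields are used.
BUILD_JSON = """
[
  {"directory": ".", "file": "a.cpp",  "command": "clang++ a.cpp b1.cpp -o ab1.o"},
  {"directory": ".", "file": "b1.cpp", "command": "clang++ a.cpp b1.cpp -o ab1.o"},
  {"directory": ".", "file": "a.cpp",  "command": "clang++ a.cpp b2.cpp -o ab2.o"},
  {"directory": ".", "file": "b2.cpp", "command": "clang++ a.cpp b2.cpp -o ab2.o"}
]
"""


def commands_by_file(entries):
    """Map each "file" value to the list of commands recorded for it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["file"]].append(entry["command"])
    return dict(grouped)


def ambiguous_files(entries):
    """Return the files that have more than one recorded command."""
    grouped = commands_by_file(entries)
    return sorted(name for name, cmds in grouped.items() if len(cmds) > 1)


entries = json.loads(BUILD_JSON)
print(ambiguous_files(entries))  # ['a.cpp']
```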

My questions are:

1. Is the "only one compiler job" expectation specific to tools like
clang-tidy and clang-check, which want to "query" the proper
compilation command line from the build.json and fall into ambiguity
if there are more, or is this a more general expectation?

2. Rizsotto said, and I quote

"But very little (or none) support for it in the current Clang tooling
library. (I would call the compilation database parser in Clang very
picky/strict.)
Currently I'm busy to merge this code into Clang repository. Would not
implement this feature now. [...] I can put more effort into it, when
there is a more generic driver from Clang side too. As far as I can
see Manuel (one of the guy behind Clang tooling) is supporting it, but
lack resource to implement it. (Be the change you want! ;))"

Assuming that I implement logging the build commands into
intercept-build (or Bear), which crucial parts of Clang should I
expect to be broken by the fact that linker commands are in the
database? Should there be a filter somewhere, in some project of
Clang, which filters out the link commands based on some criteria? (In
our tools, we implemented rules that decide whether an entry in
ld-logger's output is a compile or a link command.)

As seen above, to my current understanding, having link commands does
not make LibTooling's head spin around --- but having the same file
referenced multiple times does, at least for some tools.

3. (This is more directed at Manuel)
Has the train of thought moved forward since November? What is the current
consensus on this approach? We would like to increase our tools'
support for what is generally used and more maintained in the
community.

Best regards,
Whisperity.

Hi Whisperity,

I thought I understood your question on github, but this email is confusing me… Can I ask simple questions to clarify a few things?

Your examples are not valid compilations. Did you try them?

$ clang -c a.c b.c -o ab.o
clang-3.8: error: cannot specify -o when generating multiple output files

$ clang a.c b.c -o ab.o

The second one builds iff a.c or b.c contains an implementation of main. Then ab.o is not an object file, but an executable. So that’s already linking!

Having duplicated entries in the compilation database is not a problem. So, if you have the same module multiple times, that’s just fine.

$ clang -c a.c -o a.o

$ clang -c a.c -Dkey=value -o a.o

will result in two entries where the “file” attribute is the same.

$ clang a.c b.c -o ab

As previously explained, this is two compilations and one link step. Current tools will generate a compilation database with two entries only.

[
{"directory": ".", "file": "a.c", "command": "cc -c a.c"},
{"directory": ".", "file": "b.c", "command": "cc -c b.c"}
]

My earlier understanding was that you want this to be a three-element list. Is that correct? Or do you want a single-element list?

Or even simpler, shall we make an entry for this too?

$ clang a.o b.o -o ab

But then, shall we record linker commands like this?

$ ld a.o b.o -o ab

Or even this?

$ ar crf lib.a a.o b.o

How would you represent these commands in the JSON compilation database?

Because my main point was: you need to define this first (on this list, with consensus and implementation support) so that the tools (cmake, intercept-build, etc.) can generate the desired output.

Regards,

Laszlo

Generally, I want the "(>=)three rows" version. Currently our tool
represents these the same way as compile commands are represented,
except that the "file" isn't a source file but an object file.

So recording this:

  $ clang++ -c main.cpp a.cpp
  $ clang++ main.o a.o -o main.out

Should result in something like this:

  [
   {"directory": ".", "file": "main.cpp", "command": "clang++ -c main.cpp"},
   {"directory": ".", "file": "a.cpp", "command": "clang++ -c a.cpp"},
   {"directory": ".", "file": "main.o", "command": "clang++ main.o a.o -o main.out"},
   {"directory": ".", "file": "a.o", "command": "clang++ main.o a.o -o main.out"}
  ]

This, to my understanding, complies with the file's specification: it
is clearly visible that the source file for the command is the object
file, and by parsing the command we can be sure that it was a link
command. As long as this can be done concisely, it should be fine. But
I'm not sure which would be the -best available- way for this.
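As an illustration of what "parsing the command" could mean, here is a rough Python sketch of such a rule. It is not the rule set our tools use. The extension lists, the set of "stop before linking" flags and the function names (input_files, classify) are assumptions made for this example, and it only understands the simple clang/gcc-style command lines shown in this thread.

```python
import shlex

# Assumed extension lists; a real tool would need a longer, configurable set.
SOURCE_EXTS = (".c", ".cc", ".cpp", ".cxx")
OBJECT_EXTS = (".o", ".a", ".so")

# Driver flags after which no link step is run.
NO_LINK_FLAGS = {"-c", "-S", "-E", "-fsyntax-only"}


def input_files(args):
    """Collect positional arguments, skipping the value that follows -o."""
    inputs = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False
        elif arg == "-o":
            skip_next = True
        elif not arg.startswith("-"):
            inputs.append(arg)
    return inputs


def classify(command):
    """Label a command as 'compile', 'link', 'compile+link' or 'unknown'."""
    args = shlex.split(command)[1:]  # drop the compiler executable
    if any(arg in NO_LINK_FLAGS for arg in args):
        return "compile"
    inputs = input_files(args)
    if any(name.endswith(SOURCE_EXTS) for name in inputs):
        return "compile+link"
    if any(name.endswith(OBJECT_EXTS) for name in inputs):
        return "link"
    return "unknown"


print(classify("clang++ -c main.cpp a.cpp"))       # compile
print(classify("clang++ main.o a.o -o main.out"))  # link
print(classify("clang++ main.cpp 1.cpp"))          # compile+link
```

A tool that does not understand link commands could use the same kind of check to drop every entry that is not labelled "compile".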

But simply running two one-liner commands which include linkage:

// I purposefully omitted the "-o" argument.
$ clang++ main.cpp 1.cpp
$ clang++ main.cpp 2.cpp

Should also show that these files were compiled together. (Right now,
as far as I can see, there is some deduplication happening. And
without this deduplication, clang-check and clang-tidy seem to fail.)

  [
   {"directory": ".", "file": "main.cpp", "command": "clang++ main.cpp 1.cpp"},
   {"directory": ".", "file": "main.cpp", "command": "clang++ main.cpp 2.cpp"},
   {"directory": ".", "file": "1.cpp", "command": "clang++ main.cpp 1.cpp"},
   {"directory": ".", "file": "2.cpp", "command": "clang++ main.cpp 2.cpp"}
  ]

This is why I originally wanted, and still want, this to NOT be the
default behaviour, but something that is triggered by a switch. If
your tool can understand linker commands in the build.json, you flick
this switch, and then you (can) expect an output that contains link
commands. Other tools simply do not flick the switch, and retain the
"compilation commands only" view.

So what I want to see is the linkage graph represented in the file. By
parsing the file and examining the data in it, I want to be able to
model which translation units and objects were linked together when
the project was built.
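A minimal Python sketch of that idea, assuming the extended layout from my first example (link commands stored as ordinary entries). The function names (produced_by, linkage_graph, sources_of) are hypothetical, the "directory" field is ignored, and only -c and -o are interpreted, so this is an illustration and not a proposal for the final format.

```python
import json
import os
import shlex

# The extended database from the main.cpp / a.cpp example above.
BUILD_JSON = """
[
  {"directory": ".", "file": "main.cpp", "command": "clang++ -c main.cpp"},
  {"directory": ".", "file": "a.cpp", "command": "clang++ -c a.cpp"},
  {"directory": ".", "file": "main.o", "command": "clang++ main.o a.o -o main.out"},
  {"directory": ".", "file": "a.o", "command": "clang++ main.o a.o -o main.out"}
]
"""


def produced_by(command):
    """Return {output: [inputs]} for one simple clang/gcc-style command."""
    output, inputs, compile_only = None, [], False
    args = iter(shlex.split(command)[1:])
    for arg in args:
        if arg == "-o":
            output = next(args, None)
        elif arg == "-c":
            compile_only = True
        elif not arg.startswith("-"):
            inputs.append(arg)
    if compile_only:
        if output is not None and len(inputs) == 1:
            return {output: inputs}
        # Without -o, each source yields <stem>.o in the working directory.
        return {os.path.splitext(os.path.basename(src))[0] + ".o": [src]
                for src in inputs}
    return {output or "a.out": inputs}


def linkage_graph(entries):
    """Merge all commands into one output -> inputs mapping."""
    graph = {}
    for entry in entries:
        graph.update(produced_by(entry["command"]))
    return graph


def sources_of(target, graph):
    """Walk the graph down to the files that no command produces."""
    if target not in graph:
        return [target]
    found = []
    for dep in graph[target]:
        for src in sources_of(dep, graph):
            if src not in found:
                found.append(src)
    return found


graph = linkage_graph(json.loads(BUILD_JSON))
print(sources_of("main.out", graph))  # ['main.cpp', 'a.cpp']
```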

Hi Whisperity,

Another idea about the linkage graph… which might contradict what I’ve just said earlier. :)

Why not use a ninja build file as the compilation database? That already stores the linkage graph. All that is needed is a reader which takes a ninja build file and implements the CompilationDatabase interface to work with libtooling-based tools. And for your project, another reader that can extract not only the compilations but the linkages too.

How does that sound to you?

The problem with this is that using the "ninja build file" implies
that a "ninja build file" exists, or that we are able to generate it.

Which might not be the case. A (very) quick Google search didn't turn
up any meaningful way to transform an existing Makefile into a ninja
build file, or even to make an autotools project build with ninja.

Of course, CMake can generate a ninja build file, but while that
resolves the assumption/expectation that a "ninja build file exists",
it creates a new expectation: that there is a CMake project from which
the ninja build file could be generated.