[PATCH] Implement a sane plugin API for clang

While I'm all for the idea of improving the plugin API, I think that a modest reduction in boilerplate is not sufficiently compelling to foist a new plugin API on people who already have existing code. The funny thing about boilerplate is that it's easy to copy-paste, so it doesn't really impede people from achieving their goals since they can just copy the code that already works. The primary problem of boilerplate is that it has the effect of deterring newbies, and that issue can be easily combated with improved documentation, which avoids breaking every external plugin and tutorial on plugins.

One thing that REALLY sucks with the current approach is the need to specify clang -Xclang -load -Xclang <plugin tarball> -Xclang -add-plugin <plugin name> -Xclang -plugin-<name>-arg -Xclang <blah> ...
With the new approach, the command line is clang -fplugin=<tarball> -fplugin-arg-<name>-<arg>=<blah>, which is a much shorter command line and can actually be passed into CFLAGS/CXXFLAGS without driving libtool bonkers (I detest the need for wrapper scripts just to pass arguments) and also eliminates warnings whenever people use $(CXX) $(CXXFLAGS) as the linker. The present crappy command lines really make plugins feel like second-class citizens to Clang.

The changes in this patch retain almost all of the same functionality as the original plugin approach (including the ability to do things like add custom compile passes via the RegisterPass statics) while wrapping it in a much saner wrapper.

We really ought to have a preliminary discussion on cfe-dev to map out the directions we want to take the plugin API, and what *compelling*, *enabling* changes we can make to it.

I have often heard that the static analyzer in Clang should be remodeled as a plugin instead of its current approach, but I'm not exactly sure what it was meant by that.

Some changes I have personally wanted to be able to do with plugins include the ability to add custom attribute handlers (using annotate("") for extra attributes is a very gross hack), as well as be able to manipulate the pass manager, to make it easier to test custom optimization or analysis patches--this would even imply, in the long run, the ability to add custom passes at LTO time. Modelling as plugins would greatly reduce barriers for other out-of-tree projects that use LLVM and Clang.

While I'm all for the idea of improving the plugin API, I think that a
modest reduction in boilerplate is not sufficiently compelling to foist a
new plugin API on people who already have existing code. The funny thing
about boilerplate is that it's easy to copy-paste, so it doesn't really
impede people from achieving their goals since they can just copy the code
that already works. The primary problem of boilerplate is that it has the
effect of deterring newbies, and that issue can be easily combated with
improved documentation, which avoids breaking every external plugin and
tutorial on plugins.

One thing that REALLY sucks with the current approach is the need to
specify clang -Xclang -load -Xclang <plugin tarball> -Xclang -add-plugin
<plugin name> -Xclang -plugin-<name>-arg -Xclang <blah> ...
With the new approach, the command line is clang -fplugin=<tarball>
-fplugin-arg-<name>-<arg>=<**blah>, which is a much shorter command line
and can actually be passed into CFLAGS/CXXFLAGS without driving libtool
bonkers (I detest the need for wrapper scripts just to pass arguments) and
also eliminates warnings whenever people use $(CXX) $(CXXFLAGS) as the
linker.

Realistically a tiny script might be a better long-term design, allowing
e.g. "clang++ `clang-plugin-config myPlugin.so arg1 arg2` foo.cpp".
Remember that the primary advantage of plugins vs libtooling/libclang is
that they are run as part of a build process, meaning that in reality these
command lines are meant to be generated by a "configure" step and not by
hand. So really the "user friendliness" is determined by "how easy is it to
integrate a clang plugin into my build", and not by the exact commandline
syntax per se. This kind of script could also serve as a useful layer of
indirection and "user friendliness", e.g. it could recognize a
"CLANG_PLUGIN_PATH" or other niceties that would be dubious to add to clang
itself.

The present crappy command lines really make plugins feel like
second-class citizens to Clang.

TBQH it is a second-class citizen if you consider how much development
effort and dogfooding go into it compared to e.g. libtooling or libclang.

Of course, I think it would be great to lift it from this status! But I
think that we need some more discussion and a better sense of direction for
that to happen.

If you really want to immediately push plugins forward in a big way, it
would be monumental to set up a buildbot that runs a clang plugin that does
extra checking that isn't really appropriate for being integrated as a
diagnostic into the compiler proper. For example, a plugin that warns on
incorrect uses of dyn_cast<>. For maximum effect this should be developed
in-tree (probably in clang-tools-extra. Even though it has "tools" in the
name, I don't think anybody would be opposed to developing plugins in
there). It should also have an easy way for people in our community to come
up with and implement good extra checks and get them integrated into that
buildbot.

The changes in this patch retain almost all of the same functionality as

the original plugin approach (including the ability to do things like add
custom compile passes via the RegisterPass statics) while wrapping it in a
much saner wrapper.

My opposition to the current patch is that it does not provide enough value
to our users to compensate for the inconvenience that it will cause them
(by breaking their code). My opposition is not technical; I don't doubt
that your approach here is an improvement from a purely technical
standpoint.

We really ought to have a preliminary discussion on cfe-dev to map out

the directions we want to take the plugin API, and what *compelling*,
*enabling* changes we can make to it.

I have often heard that the static analyzer in Clang should be remodeled
as a plugin instead of its current approach, but I'm not exactly sure what
it was meant by that.

Some changes I have personally wanted to be able to do with plugins
include the ability to add custom attribute handlers (using annotate("")
for extra attributes is a very gross hack), as well as be able to
manipulate the pass manager, to make it easier to test custom optimization
or analysis patches--this would even imply, in the long run, the ability to
add custom passes at LTO time. Modelling as plugins would greatly reduce
barriers for other out-of-tree projects that use LLVM and Clang.

These are all excellent ideas. Please start a new thread so that we can
have a focused discussion about these and other ideas.

-- Sean Silva

A “tiny script” sounds to me a much worse option: you now have to run two processes per compiler invocation instead of one (and on Windows, process creation can take hundreds of milliseconds, so it’s a noticeable slowdown). Also, you have to find more binaries to run it: if I specify CXX via a path, how should a build system know where to run clang-plugin-config from? You could guess by looking up the dirname of CXX and hoping it’s there, but you are also advocating using shell scripts to represent CXX in another email, which renders this approach impossible. I am working on adding a compiler static checker plugin to Mozilla that would check the guarantees our old dehydra plugin used to check: a “must override” annotation (all subclasses must provide their own implementation of this method), a “stack class” annotation (this class cannot be allocated from the heap), and a warning that gets emitted every time you emit a static initializer. The current plugin approach presumes that it is a pure consumer of the AST, which isn’t a viable option in my opinion. One thing I would like to do in the future is be able to map Decls in the AST to functions emitted in the LLVM IR, which is completely impossible under the current architecture. Note also that I’m not removing the current (more or less broken) plugin architecture, so I’m not compelling people to switch. Rather, this is about enabling future changes that permit plugins to not take the view that they happen independently of code generation.

While I'm all for the idea of improving the plugin API, I think that a
modest reduction in boilerplate is not sufficiently compelling to foist a
new plugin API on people who already have existing code. The funny thing
about boilerplate is that it's easy to copy-paste, so it doesn't really
impede people from achieving their goals since they can just copy the code
that already works. The primary problem of boilerplate is that it has the
effect of deterring newbies, and that issue can be easily combated with
improved documentation, which avoids breaking every external plugin and
tutorial on plugins.

One thing that REALLY sucks with the current approach is the need to
specify clang -Xclang -load -Xclang <plugin tarball> -Xclang -add-plugin
<plugin name> -Xclang -plugin-<name>-arg -Xclang <blah> ...
With the new approach, the command line is clang -fplugin=<tarball>
-fplugin-arg-<name>-<arg>=<blah>, which is a much shorter command line and
can actually be passed into CFLAGS/CXXFLAGS without driving libtool bonkers
(I detest the need for wrapper scripts just to pass arguments) and also
eliminates warnings whenever people use $(CXX) $(CXXFLAGS) as the linker.

Realistically a tiny script might be a better long-term design, allowing
e.g. "clang++ `clang-plugin-config myPlugin.so arg1 arg2` foo.cpp".
Remember that the primary advantage of plugins vs libtooling/libclang is
that they are run as part of a build process, meaning that in reality these
command lines are meant to be generated by a "configure" step and not by
hand. So really the "user friendliness" is determined by "how easy is it to
integrate a clang plugin into my build", and not by the exact commandline
syntax per se. This kind of script could also serve as a useful layer of
indirection and "user friendliness", e.g. it could recognize a
"CLANG_PLUGIN_PATH" or other niceties that would be dubious to add to clang
itself.

A "tiny script" sounds to me a much worse option: you now have to run two
processes per compiler invocation instead of one (and on Windows, process
creation can take hundreds of milliseconds, so it's a noticeable slowdown).

The clang driver already forks at least one process per compiler
invocation. Your comments apply equally to that and I don't see anybody
running to fix them (or even complaining), so I'm not convinced that this
is really as significant of an issue as you make it out to be.

Also, you have to find more binaries to run it: if I specify CXX via a
path, how should a build system know where to run clang-plugin-config from?
You could guess by looking up the dirname of CXX and hoping it's there,

I'm not sure I follow your point here. I image clang-plugin-config and the
wrapper to be installed next to clang and be looked up/executed as usual.

but you are also advocating using shell scripts to represent CXX in another

email, which renders this approach impossible.

I also don't see the connection with my suggestion in the other email. In
fact, the wrapper script for plugins and the compile_commands.json
harvester could probably be the same script, and at configure time
clang-plugin-config (or, perhaps better just `clang-config` now that it is
going beyond plugins) would arrange for the wrapper script to perform the
requested actions.

  If you really want to immediately push plugins forward in a big way, it
would be monumental to set up a buildbot that runs a clang plugin that does
extra checking that isn't really appropriate for being integrated as a
diagnostic into the compiler proper. For example, a plugin that warns on
incorrect uses of dyn_cast<>. For maximum effect this should be developed
in-tree (probably in clang-tools-extra. Even though it has "tools" in the
name, I don't think anybody would be opposed to developing plugins in
there). It should also have an easy way for people in our community to come
up with and implement good extra checks and get them integrated into that
buildbot.

I am working on adding a compiler static checker plugin to Mozilla that
would check the guarantees our old dehydra plugin used to check: a "must
override" annotation (all subclasses must provide their own implementation
of this method), a "stack class" annotation (this class cannot be allocated
from the heap), and a warning that gets emitted every time you emit a
static initializer.

Awesome. Please keep us up to date with this work. Some of these checks
seem like they could be relevant to llvm/clang too.

  The changes in this patch retain almost all of the same functionality

as the original plugin approach (including the ability to do things like
add custom compile passes via the RegisterPass statics) while wrapping it
in a much saner wrapper.

My opposition to the current patch is that it does not provide enough
value to our users to compensate for the inconvenience that it will cause
them (by breaking their code). My opposition is not technical; I don't
doubt that your approach here is an improvement from a purely technical
standpoint.

The current plugin approach presumes that it is a pure consumer of the
AST, which isn't a viable option in my opinion. One thing I would like to
do in the future is be able to map Decls in the AST to functions emitted in
the LLVM IR, which is completely impossible under the current architecture.
Note also that I'm not removing the current (more or less broken) plugin
architecture, so I'm not compelling people to switch.

You did delete the only code (PrintFunctionNames) in tree that AFAIK tests
the previous functionality, which I interpreted as meaning that it was dead
to you.

Rather, this is about enabling future changes that permit plugins to not
take the view that they happen independently of code generation.

This did not get through to me from the OP. Could you explain how the
design you implement in this patch achieves that? It should be the emphasis
of the review (and IMHO warrants a "does this direction and implementation
approach sound good to everyone" cfe-dev discussion before proposing code
to be committed).

Also, the command line parsing stuff should be in a separate patch, and IMO
the -fplugin should be just a driver arg: that way, the previous
commandline args for plugins (directly via cc1) remains in a live code path.

As I said earlier, the compatibilty stuff also deserves a rehash, since I'm
still not convinced that it is really useful.

-- Sean Silva

If I were to add a command in my Makefiles to spawn a $(shell), the reviewers would throw a hissy fit. I’ll also point out that Clang is not a widely-used compilers on Windows systems, where this kind of stuff matters more. The logic I currently use to look up llvm-config for building the plugin is as follows: if test -z “$CXX”; then CXX=which clang++ fi if test -z “$LLVMCONFIG”; then LLVMCONFIG=which llvm-config fi if test -z “$LLVMCONFIG”; then LLVMCONFIG=dirname $CXX/llvm-config fi The ideas is to try to make this “just work” if the compiler to be used is clang. However, if CXX is a shell script and clang is not specifically in PATH (the latter case is not an esoteric situation–it’s how our own builders get to clang), then the value returned is wrong. It’s also wrong if people start using clang with versioning numbers: consider clang symlinked to a clang-3.2, but you’re building with clang-3.3. Looking up llvm-config in the path would find the llvm-config for 3.2 here instead of 3.3, which would be wrong. IMHO, gcc’s -print-file-name=plugin is much better (you don’t need to guess at the locations of other tools!). The biggest stumbling block to implementing useful checkers is the inability to add custom annotations… annotate(string) is currently being used as a hack, but what is really needed is the ability to specify custom C++11 attributes. Actually committing the static checker can be found in , but there is a long list of desired analyses here . The old API I consider deprecated, but deprecated does not mean imminent removal. Also, the examples directory isn’t built by default, so I doubt it’s actually really being tested. If you pay careful attention, you’d notice that the plugins are kept around in the CompilerInstance object, which is passed around to all the AST actions, including the CodeGen AST action; the old plugin API stores everything as separate AST actions and instead multiplexes all the AST actions together, so the CodeGen AST action is unaware of the existence of plugins, short of creating Yet Another Static Initializer attachment point. That isn’t feasible here, as the two plugin loading paths are actually doing rather different things, and I don’t think it is desirable to attempt to merge what are conceptually different models of plugins. The primary purpose of compatibility checking is to detect a situation that would almost certainly lead to crash instead of crashing. Users deserve to get useful error messages instead of panicking crash dumps.

The clang driver already forks at least one process per compiler
invocation. Your comments apply equally to that and I don't see anybody
running to fix them (or even complaining), so I'm not convinced that this
is really as significant of an issue as you make it out to be.

If I were to add a command in my Makefiles to spawn a $(shell), the
reviewers would throw a hissy fit. I'll also point out that Clang is not a
widely-used compilers on Windows systems, where this kind of stuff matters
more.

I'm not sure how I got derailed, but for the plugins there is no "wrapper
script", just `clang-plugin-config myPlugin.so arg1 arg2` once at configure
time. (the wrapper script would be for harvesting the compile commands, but
that's a separate discussion and I'm not sure how it got brought into this
one).

Also, you have to find more binaries to run it: if I specify CXX via a
path, how should a build system know where to run clang-plugin-config from?
You could guess by looking up the dirname of CXX and hoping it's there,

I'm not sure I follow your point here. I image clang-plugin-config and
the wrapper to be installed next to clang and be looked up/executed as
usual.

but you are also advocating using shell scripts to represent CXX in

another email, which renders this approach impossible.

I also don't see the connection with my suggestion in the other email.
In fact, the wrapper script for plugins and the compile_commands.json
harvester could probably be the same script, and at configure time
clang-plugin-config (or, perhaps better just `clang-config` now that it is
going beyond plugins) would arrange for the wrapper script to perform the
requested actions.

The logic I currently use to look up llvm-config for building the plugin
is as follows:
if test -z "$CXX"; then
  CXX=`which clang++`
fi
if test -z "$LLVMCONFIG"; then
  LLVMCONFIG=`which llvm-config`
fi
if test -z "$LLVMCONFIG"; then
  LLVMCONFIG=`dirname $CXX`/llvm-config
fi

The ideas is to try to make this "just work" if the compiler to be used is
clang. However, if CXX is a shell script and clang is not specifically in
PATH (the latter case is not an esoteric situation--it's how our own
builders get to clang), then the value returned is wrong. It's also wrong
if people start using clang with versioning numbers: consider clang
symlinked to a clang-3.2, but you're building with clang-3.3. Looking up
llvm-config in the path would find the llvm-config for 3.2 here instead of
3.3, which would be wrong. IMHO, gcc's -print-file-name=plugin is much
better (you don't need to guess at the locations of other tools!).

Sorry I was confusing myself earlier. As I said before the
"clang-plugin-config" runs once at configure time. Let's keep the
discussion of the pros/cons of the wrapper script in the other thread.

     If you really want to immediately push plugins forward in a big way,

it would be monumental to set up a buildbot that runs a clang plugin that
does extra checking that isn't really appropriate for being integrated as a
diagnostic into the compiler proper. For example, a plugin that warns on
incorrect uses of dyn_cast<>. For maximum effect this should be developed
in-tree (probably in clang-tools-extra. Even though it has "tools" in the
name, I don't think anybody would be opposed to developing plugins in
there). It should also have an easy way for people in our community to come
up with and implement good extra checks and get them integrated into that
buildbot.

I am working on adding a compiler static checker plugin to Mozilla that
would check the guarantees our old dehydra plugin used to check: a "must
override" annotation (all subclasses must provide their own implementation
of this method), a "stack class" annotation (this class cannot be allocated
from the heap), and a warning that gets emitted every time you emit a
static initializer.

Awesome. Please keep us up to date with this work. Some of these checks
seem like they could be relevant to llvm/clang too.

The biggest stumbling block to implementing useful checkers is the
inability to add custom annotations... annotate(string) is currently being
used as a hack, but what is really needed is the ability to specify custom
C++11 attributes. Actually committing the static checker can be found in
<767563 - Build a clang static checker,
but there is a long list of desired analyses here
<Bug List;
.

   The changes in this patch retain almost all of the same functionality

as the original plugin approach (including the ability to do things like
add custom compile passes via the RegisterPass statics) while wrapping it
in a much saner wrapper.

My opposition to the current patch is that it does not provide enough
value to our users to compensate for the inconvenience that it will cause
them (by breaking their code). My opposition is not technical; I don't
doubt that your approach here is an improvement from a purely technical
standpoint.

The current plugin approach presumes that it is a pure consumer of the
AST, which isn't a viable option in my opinion. One thing I would like to
do in the future is be able to map Decls in the AST to functions emitted in
the LLVM IR, which is completely impossible under the current architecture.
Note also that I'm not removing the current (more or less broken) plugin
architecture, so I'm not compelling people to switch.

You did delete the only code (PrintFunctionNames) in tree that AFAIK
tests the previous functionality, which I interpreted as meaning that it
was dead to you.

The old API I consider deprecated, but deprecated does not mean imminent
removal. Also, the examples directory isn't built by default, so I doubt
it's actually really being tested.

Rather, this is about enabling future changes that permit plugins to not
take the view that they happen independently of code generation.

This did not get through to me from the OP. Could you explain how the
design you implement in this patch achieves that? It should be the emphasis
of the review (and IMHO warrants a "does this direction and implementation
approach sound good to everyone" cfe-dev discussion before proposing code
to be committed).

If you pay careful attention, you'd notice that the plugins are kept
around in the CompilerInstance object, which is passed around to all the
AST actions, including the CodeGen AST action; the old plugin API stores
everything as separate AST actions and instead multiplexes all the AST
actions together, so the CodeGen AST action is unaware of the existence of
plugins, short of creating Yet Another Static Initializer attachment point.

Is there any way we could fuse the old functionality into the new
functionality?

  Also, the command line parsing stuff should be in a separate patch, and
IMO the -fplugin should be just a driver arg: that way, the previous
commandline args for plugins (directly via cc1) remains in a live code path.

That isn't feasible here, as the two plugin loading paths are actually
doing rather different things, and I don't think it is desirable to attempt
to merge what are conceptually different models of plugins.

  As I said earlier, the compatibilty stuff also deserves a rehash, since
I'm still not convinced that it is really useful.

The primary purpose of compatibility checking is to detect a situation
that would almost certainly lead to crash instead of crashing. Users
deserve to get useful error messages instead of panicking crash dumps.

It does nothing against subtle memory corruption, which is the real issue.
A crash is a *best-case scenario* and is not a priority to protect against
IMO. Blatant crashes are easy to catch and remedy. Regardless, it is
deceptive to have something like DECLARE_PLUGIN_COMPATIBILITY() which still
permits silent corruption to happen, even though the user thinks they are
"protected".

The only satisfactory solution that I can think of is to have a special
configure option that generates and embeds a unique ID (sha1sum of all the
headers?) for a specific build which guarantees compatibility (it could be
called ENABLE_PLUGINS, with the option ENABLE_UNSAFE_PLUGINS available to
imitate the current plugin situation). ENABLE_PLUGINS not be on by default
(presumably it would have some impact on build time), but we could document
that when preparing binary packages of clang that clang should be built
with this flag.

-- Sean Silva

The present entry point is a fundamentally different registration functionality from the new one, both in terms of how a plugin specifies the entry point and in terms of how clang is invoked to load the plugins–it’s not possible to run them the same way and keep the same semantics. It is possible to make #if-guarded wrappers to switch between the two APIs for the plugin code in the common case however, but it’s not something that can automatically be done. That isn’t a perfect solution: it does nothing against guaranteeing, e.g., that clang and the plugin are using ABI-compatible standard libraries. But if we hold new APIs hostage to perfect solutions, then nothing will get added. What I want to protect against is what I believe would be the most common mistake: someone will use a heuristic to figure out which version of clang to compile against, and that heuristic will be wrong. In such a scenario, the only case the current compatibility would fail is if someone has a revision of clang installed to their path but is using a different revision of clang (not in the path) from the same branch to build–at which point I am happy to say “you’re on the bleeding edge, so you should expect to bleed every now and then”.