A plugin proposal for clang

This is a document in partial response to Douglas Gregor’s recent comment about needing someone to push forward a plugin proposal for clang.

What kinds of use cases would people have for a compiler plugin?

  • Add a static analyzer

  • Production of ancillary metadata for types and declarations (e.g., adding reflective capabilities to C/C++)

  • Custom optimization passes

The current architecture appears to assume that clang plugins and clang itself are primarily useful for static analysis and not for production. If I wish to use a plugin, I have to use:
clang -cc1 -load /path/to/plugin.so -plugin Foo -plugin-arg-Foo Bar

But if I’m trying to drop in this plugin in a regular build, I need to do:
clang -Xclang -load -Xclang /path/to/plugin.so -Xclang -add-plugin -Xclang Foo -Xclang -plugin-arg-Foo -Xclang Bar

By comparison, a gcc plugin lets me do:
gcc -fplugin=/path/to/plugin.so -fplugin-arg-plugin-bar=baz

The key things that gcc lets me do in a plugin that I think are necessary for clang are to be able to define custom attributes and pragmas, which are largely necessary for custom static analysis scripts: for example, Mozilla defines several attributes to allow checking of properties in their gcc plugins (e.g., enforcing outparameter guidelines). All of its other features appear to be either an artifact of its code design (e.g., GC interactions) or features that are already supported (e.g., adding a new optimization pass).

Another thing which I think is necessary is a macro for the version of clang being built, so that plugins can support multiple versions of clang at the compiler level (supporting multiple versions in binary form is inadvisable, I think).
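As a rough illustration of how a plugin might use such a version macro, here is a minimal sketch. The macro names (EXAMPLE_CLANG_VERSION_MAJOR and friends) and the function pluginApiFlavor are made up for this example; a real build would take the values from whatever macro clang ends up exporting, and the defaults below exist only so the sketch compiles on its own.

```cpp
#include <string>

// Hypothetical macros: in a real build these would come from a clang header
// or the build system. The defaults only keep this sketch compilable.
#ifndef EXAMPLE_CLANG_VERSION_MAJOR
#define EXAMPLE_CLANG_VERSION_MAJOR 3
#endif
#ifndef EXAMPLE_CLANG_VERSION_MINOR
#define EXAMPLE_CLANG_VERSION_MINOR 0
#endif

// Pack major/minor into one comparable number, so that 3.0 becomes 300.
#define EXAMPLE_CLANG_VERSION \
  (EXAMPLE_CLANG_VERSION_MAJOR * 100 + EXAMPLE_CLANG_VERSION_MINOR)

// Select a code path at compile time depending on the targeted version.
inline std::string pluginApiFlavor() {
#if EXAMPLE_CLANG_VERSION >= 300
  return "new-api";
#else
  return "old-api";
#endif
}
```

The point is only that the choice is made at compile time, so one source tree can be built against several clang versions without any binary compatibility promise.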

With that background, here is my rough proposal for plugins; there are other changes that I think many plugin writers might like to see, but I’ll hold off on them, since they don’t inhibit writing useful plugins:

  1. Add in -fplugin= and -fplugin-arg- options for the compiler driver. All of the plugins in the library would be loaded and run as if specified with -add-plugin (in other words, clang still compiles the code).
  2. Add examples of plugins that illustrate iterating over the AST and getting IR for functions for further passes.
  3. Ensure existence of support in clang for custom attributes.
  4. No guarantee of API or ABI compatibility between different versions of clang; the onus is on the plugin to figure out how to support multiple versions. Note that this implies that a version macro is needed to report the version of clang/llvm that the plugin is being compiled for.
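To make item 1 more concrete, the driver-side translation could look roughly like the sketch below. This is an assumption about how the mapping might work, not existing driver code: the function name expandPluginFlags is hypothetical, and the sketch assumes that plugin names contain no '-' so that -fplugin-arg-<name>-<value> can be split at the first dash.

```cpp
#include <string>
#include <vector>

// Sketch only: expand the proposed driver-level options into the -cc1
// options quoted earlier in this message. The mapping is an assumption.
std::vector<std::string>
expandPluginFlags(const std::vector<std::string> &driverArgs) {
  const std::string loadPrefix = "-fplugin=";
  const std::string argPrefix = "-fplugin-arg-";
  std::vector<std::string> cc1Args;
  for (const std::string &arg : driverArgs) {
    if (arg.compare(0, loadPrefix.size(), loadPrefix) == 0) {
      // -fplugin=<path>  becomes  -load <path>
      cc1Args.push_back("-load");
      cc1Args.push_back(arg.substr(loadPrefix.size()));
    } else if (arg.compare(0, argPrefix.size(), argPrefix) == 0) {
      // -fplugin-arg-<name>-<value>  becomes  -plugin-arg-<name> <value>
      // Assumes the plugin name itself contains no '-'.
      const std::string rest = arg.substr(argPrefix.size());
      const std::string::size_type dash = rest.find('-');
      if (dash == std::string::npos)
        continue; // malformed (no value part); ignored in this sketch
      cc1Args.push_back("-plugin-arg-" + rest.substr(0, dash));
      cc1Args.push_back(rest.substr(dash + 1));
    } else {
      cc1Args.push_back(arg); // everything else passes through untouched
    }
  }
  return cc1Args;
}
```

The sketch deliberately emits no -add-plugin option: under the proposal every plugin found in the loaded library would be run, so the driver would not need to know the plugin names up front.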

Thoughts/questions/comments/concerns?

This is a document in partial response to Douglas Gregor's recent comment about needing someone to push forward a plugin proposal for clang.

What kinds of use cases would people have for a compiler plugin?

    * Add a static analyzer
    * Production of ancillary metadata for types and declarations
      (e.g., adding reflective capabilities to C/C++)
    * Custom optimization passes

* anything we haven't thought of yet

The current architecture appears to assume that clang plugins and clang itself are primarily useful for static analysis and not for production. If I wish to use a plugin, I have to use:
clang -cc1 -load /path/to/plugin.so -plugin Foo -plugin-arg-Foo Bar

But if I'm trying to drop in this plugin in a regular build, I need to do:
clang -Xclang -load -Xclang /path/to/plugin.so -Xclang -add-plugin -Xclang Foo -Xclang -plugin-arg-Foo -Xclang Bar

By comparison, a gcc plugin lets me do:
gcc -fplugin=/path/to/plugin.so -fplugin-arg-plugin-bar=baz

The key things that gcc lets me do in a plugin that I think are necessary for clang are to be able to define custom attributes and pragmas, which are largely necessary for custom static analysis scripts: for example, Mozilla defines several attributes to allow checking of properties in their gcc plugins (e.g., enforcing outparameter guidelines). All of its other features appear to be either an artifact of its code design (e.g., GC interactions) or features that are already supported (e.g., adding a new optimization pass).

Another thing which I think is necessary is a macro for the version of clang being built, so that plugins can support multiple versions of clang at the compiler level (supporting multiple versions in binary form is inadvisable, I think).

That wouldn't help with future versions.

With that background, here is my rough proposal for plugins; there are other changes that I think many plugin writers might like to see, but I'll hold off on them, since they don't inhibit writing useful plugins:
1. Add in -fplugin= and -fplugin-arg-<plugin> options for the compiler driver. All of the plugins in the library would be loaded and run as if specified with -add-plugin (in other words, clang still compiles the code).
2. Add examples of plugins that illustrate iterating over the AST and getting IR for functions for further passes.
3. Ensure existence of support in clang for custom attributes.
4. No guarantee of API or ABI compatibility between different versions of clang; the onus is on the plugin to figure out how to support multiple versions. Note that this implies that a version macro is needed to report the version of clang/llvm that the plugin is being compiled for.

Thus offloading responsibility from one central point (clang) to N plugin developers with varying levels of ability.
This doesn't consider the possibility that one plugin might require another plugin / interface implementation, or that there could be "standard" plugins as part of clang.

It sounds a lot like apache modules, but without the configuration file.

Instead I'd recommend an interface based approach, like COM.
The compiler driver looks up a registry when it needs to create anything,
and gets a factory object from which it creates instances.

Even control of what steps are performed by an object constructed from
a factory obtained from the registry.

This could allow control of what the compiler does, even without adding
other components.
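A minimal sketch of the registry-and-factory idea, to show the shape of it. All names here (Stage, Registry, BuiltinParser, the role strings) are invented for illustration and are not taken from clang or from any COM implementation.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface that a compiler stage would implement.
struct Stage {
  virtual ~Stage() = default;
  virtual std::string name() const = 0;
};

// Registry mapping a role (e.g. "parser") to a factory for that role.
class Registry {
public:
  using Factory = std::function<std::unique_ptr<Stage>()>;

  // A later registration for the same role replaces the earlier one.
  void registerFactory(const std::string &role, Factory factory) {
    factories_[role] = std::move(factory);
  }

  // Returns nullptr when nothing is registered for the role.
  std::unique_ptr<Stage> create(const std::string &role) const {
    auto it = factories_.find(role);
    if (it == factories_.end())
      return nullptr;
    return it->second();
  }

private:
  std::map<std::string, Factory> factories_;
};

// A built-in implementation registers itself the same way a plugin would.
struct BuiltinParser : Stage {
  std::string name() const override { return "builtin-parser"; }
};
```

The driver would ask the registry for a role instead of constructing a concrete class, which is what lets a plugin substitute its own implementation.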

Thoughts/questions/comments/concerns?
--
Joshua Cranmer
News submodule owner
DXR coauthor

_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Also, instead of supplying a long list of plugins with their arguments, you
could use a sandbox - a registry in a file, to which plugins are registered,
and checked for conflicting roles at registration time.

Then you just provide the name of the registry sandbox file to the compiler,
and off it goes.
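As a sketch of what checking for conflicting roles at registration time could mean, assume a sandbox file with one "<role> <plugin-path>" pair per line. That file format and the function name loadSandbox are assumptions made up for this example, not the format of any existing implementation.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical sandbox format: one "<role> <plugin-path>" pair per line.
// Registration fails if two plugins claim the same role.
std::map<std::string, std::string> loadSandbox(std::istream &in) {
  std::map<std::string, std::string> roles;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string role, path;
    if (!(fields >> role >> path))
      continue; // skip blank or incomplete lines
    // emplace leaves the map unchanged and reports false on a duplicate key.
    if (!roles.emplace(role, path).second)
      throw std::runtime_error("conflicting plugins for role: " + role);
  }
  return roles;
}
```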

You can check out

    http://sourceforge.net/projects/v3c-dcom/

for examples and a sandbox implementation.

Last but not least, plugins could support different clang versions by
implementing more than one interface.

Philip

These are more likely to be LLVM plugins than clang plugins. We already have a mechanism for optimisation plugins to automatically insert themselves into the optimiser chain - I'm using this for some Objective-C optimisations.

These would be better loaded from a config file by the LLVM plugin loader, because there is no sensible way for clang to pass optimisation plugins to the linker for LTO.

David

This is a document in partial response to Douglas Gregor's recent comment about needing someone to push forward a plugin proposal for clang.

What kinds of use cases would people have for a compiler plugin?

* Add a static analyzer
* Production of ancillary metadata for types and declarations (e.g., adding reflective capabilities to C/C++)
* Custom optimization passes

The current architecture appears to assume that clang plugins and clang itself are primarily useful for static analysis and not for production. If I wish to use a plugin, I have to use:
clang -cc1 -load /path/to/plugin.so -plugin Foo -plugin-arg-Foo Bar

But if I'm trying to drop in this plugin in a regular build, I need to do:
clang -Xclang -load -Xclang /path/to/plugin.so -Xclang -add-plugin -Xclang Foo -Xclang -plugin-arg-Foo -Xclang Bar

By comparison, a gcc plugin lets me do:
gcc -fplugin=/path/to/plugin.so -fplugin-arg-plugin-bar=baz

I've proposed it a while ago:

http://llvm.org/bugs/show_bug.cgi?id=9621

The key things that gcc lets me do in a plugin that I think are necessary for clang are to be able to define custom attributes and pragmas, which are largely necessary for custom static analysis scripts: for example, Mozilla defines several attributes to allow checking of properties in their gcc plugins (e.g., enforcing outparameter guidelines). All of its other features appear to be either an artifact of its code design (e.g., GC interactions) or features that are already supported (e.g., adding a new optimization pass).

+1

Instead I'd recommend an interface based approach, like COM.
The compiler driver looks up a registry when it needs to create anything,
and gets a factory object from which it creates instances.

One good property to have is for plugins to use the same API as builtin code. That makes it easy to refactor code as a plugin or move in a plugin that was found to be really useful.

Given that, your proposal would mean that different parts of clang would use a COM-like interface. Having seen a codebase that does something similar, I would say it is not worth it. It complicates the code and actually makes code harder to factor, as it moves compile-time checks to runtime.

Philip

Cheers,
Rafael

• Custom optimization passes

These are more likely to be LLVM plugins than clang plugins. We
already have a mechanism for optimisation plugins to automatically
insert themselves into the optimiser chain - I'm using this for some
Objective-C optimisations.

These would be better loaded from a config file by the LLVM plugin
loader, because there is no sensible way for clang to pass
optimisation plugins to the linker for LTO.

Running the linker is the clang driver job, so I think there is some work to be done in there.

LLVM itself is already fairly modular, so I think the above item can be replaced with

* Make sure the plugin has access to the pass manager.
* Make it simpler to pass options to the linker plugins.

David

Cheers,
Rafael

  • Custom optimization passes

These are more likely to be LLVM plugins than clang plugins. We already have a mechanism for optimisation plugins to automatically insert themselves into the optimiser chain - I'm using this for some Objective-C optimisations.

Is there currently a way to do that based on compiler flags? For SAFECode, we added a -fmemsafety flag to enable/disable the transforms that add memory safety checks. Being able to write a plugin for Clang that adds command-line options that can enable new transformation passes would alleviate our need to copy Clang into the SAFECode project.

-- John T.

You can do -Xclang -load -Xclang /path/to/plugin.so. This is pretty ugly though. I'd love to see this improved, but it keeps getting pushed down my TODO list...

David

Instead I'd recommend an interface based approach, like COM.
The compiler driver looks up a registry when it needs to create anything,
and gets a factory object from which it creates instances.

One good property to have is for plugins to use the same API as builtin
code. That makes it easy to refactor code as a plugin or move in a
plugin that was found to be really useful.

Isn't NaCl doing something like this?

Given that, your proposal would mean that different parts of clang would
use a COM-like interface. Having seen a codebase that does something
similar, I would say it is not worth it. It complicates the code and
actually makes code harder to factor, as it moves compile-time checks to
runtime.

Philip

Cheers,
Rafael

I also liked the ability to do COM interception. This could let you see the call
sequence visually.

I hadn't thought of going the full scripting route - implementing plugins
with scripting, which would definitely require runtime checks.

Philip

This is a document in partial response to Douglas Gregor’s recent comment about needing someone to push forward a plugin proposal for clang.

What kinds of use cases would people have for a compiler plugin?

  • Add a static analyzer

  • Production of ancillary metadata for types and declarations (e.g., adding reflective capabilities to C/C++)

  • Custom optimization passes

The current architecture appears to assume that clang plugins and clang itself are primarily useful for static analysis and not for production. If I wish to use a plugin, I have to use:
clang -cc1 -load /path/to/plugin.so -plugin Foo -plugin-arg-Foo Bar

But if I’m trying to drop in this plugin in a regular build, I need to do:
clang -Xclang -load -Xclang /path/to/plugin.so -Xclang -add-plugin -Xclang Foo -Xclang -plugin-arg-Foo -Xclang Bar

By comparison, a gcc plugin lets me do:
gcc -fplugin=/path/to/plugin.so -fplugin-arg-plugin-bar=baz

The key things that gcc lets me do in a plugin that I think are necessary for clang are to be able to define custom attributes and pragmas, which are largely necessary for custom static analysis scripts: for example, Mozilla defines several attributes to allow checking of properties in their gcc plugins (e.g., enforcing outparameter guidelines). All of its other features appear to be either an artifact of its code design (e.g., GC interactions) or features that are already supported (e.g., adding a new optimization pass).

Yes, I think it would be wonderful if plugins can define custom attributes/pragmas.

Another thing which I think is necessary is a macro for the version of clang being built, so that plugins can support multiple versions of clang at the compiler level (supporting multiple versions in binary form is inadvisable, I think).

This seems fine, although with the amount of churn in Clang’s ASTs, I doubt that any non-trivial plugin will be able to work with two released versions of Clang.

With that background, here is my rough proposal for plugins; there are other changes that I think many plugin writers might like to see, but I’ll hold off on them, since they don’t inhibit writing useful plugins:

  1. Add in -fplugin= and -fplugin-arg- options for the compiler driver. All of the plugins in the library would be loaded and run as if specified with -add-plugin (in other words, clang still compiles the code).

Sure, this makes sense. At this point, it also makes sense to make sure that multiple plugins get chained together properly. For example, plugins are added in the order they are specified on the command line, and any implicitly-created consumers (e.g., the ASTConsumer for generating IR) go at the end of this list.
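The ordering rule described above can be sketched as follows. Consumer here is a plain stand-in, not clang::ASTConsumer, and the class and method names are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for an AST consumer; the real clang::ASTConsumer is not used here.
struct Consumer {
  explicit Consumer(std::string n) : name(std::move(n)) {}
  std::string name;
};

class ConsumerChain {
public:
  // Called once per plugin, in the order plugins appear on the command line.
  void addPlugin(Consumer c) { plugins_.push_back(std::move(c)); }

  // Called for consumers the compiler creates itself (e.g. IR generation).
  void addImplicit(Consumer c) { implicit_.push_back(std::move(c)); }

  // Names in execution order: plugins first, implicit consumers last,
  // regardless of the order in which the two kinds were added.
  std::vector<std::string> executionOrder() const {
    std::vector<std::string> order;
    for (const Consumer &c : plugins_)
      order.push_back(c.name);
    for (const Consumer &c : implicit_)
      order.push_back(c.name);
    return order;
  }

private:
  std::vector<Consumer> plugins_;
  std::vector<Consumer> implicit_;
};
```

Keeping the two lists separate means the IR-generating consumer always runs after every plugin, which is what a plugin that inspects or annotates the AST before code generation would rely on.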

  2. Add examples of plugins that illustrate iterating over the AST and getting IR for functions for further passes.

What do you mean by “getting IR for functions”? This makes sense if you’re adding a LLVM optimization pass of some sort through the plugin mechanism.

One useful example would be to add an “annotate” attribute to various declarations, and verify that the attribute made it through to the IR.
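For reference, the source side of such a test could be as small as the sketch below. The annotation string "checked_outparam", the wrapper macro EXAMPLE_ANNOTATE, and the function fillValue are made-up names for illustration; the macro expands to nothing on compilers other than clang so the snippet stays portable.

```cpp
// Wrap the attribute so that non-clang compilers simply ignore it.
#if defined(__clang__)
#define EXAMPLE_ANNOTATE(text) __attribute__((annotate(text)))
#else
#define EXAMPLE_ANNOTATE(text)
#endif

// The annotation string is an arbitrary, made-up example value.
EXAMPLE_ANNOTATE("checked_outparam")
int fillValue(int *out) {
  *out = 42; // write the result through the out-parameter
  return 0;  // 0 signals success in this sketch
}
```

Compiling this with clang -S -emit-llvm and looking for the string in the llvm.global.annotations entry of the output would be one way to verify that the attribute reached the IR.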

I think it’s useful to categorize the various existing and intended extension points for a plugin interface. ASTConsumer and PPCallbacks come to mind immediately, but what else?

  3. Ensure existence of support in clang for custom attributes.

This is its own side-topic. At the moment, the “annotate” attribute is the only generally customizable attribute.

  4. No guarantee of API or ABI compatibility between different versions of clang; the onus is on the plugin to figure out how to support multiple versions. Note that this implies that a version macro is needed to report the version of clang/llvm that the plugin is being compiled for.

We’ll want some API/ABI compatibility at the plugin interface layer (just for the entry points), but otherwise I agree fully. We don’t want the existence of plugins to keep us from doing refactoring of the ASTs or other core data structures.

  - Doug

2. Add examples of plugins that illustrate iterating over the AST and getting IR for functions for further passes.

What do you mean by "getting IR for functions"? This makes sense if you're adding a LLVM optimization pass of some sort through the plugin mechanism.

Effectively, an example of how to do LLVM optimization passes via a clang plugin.

One useful example would be to add an "annotate" attribute to various declarations, and verify that the attribute made it through to the IR.

I think it's useful to categorize the various existing and intended extension points for a plugin interface. ASTConsumer and PPCallbacks come to mind immediately, but what else?

Diagnostics (so plugins can add warnings/errors) come to mind quickly as well. I can imagine that some clever people might need to know about some specific codegen details if they are using plugins to generate reflective metadata; everything else I can think of is more or less covered by LLVM.

4. No guarantee of API or ABI compatibility between different versions of clang; the onus is on the plugin to figure out how to support multiple versions. Note that this implies that a version macro is needed to report the version of clang/llvm that the plugin is being compiled for.

We'll want some API/ABI compatibility at the plugin interface layer (just for the entry points), but otherwise I agree fully. We don't want the existence of plugins to keep us from doing refactoring of the ASTs or other core data structures.

For the moment, I had thought of merely making the existing infrastructure (clang::FrontendPluginRegistry) easier to use, but, on reflection, it seems a better idea to do a more proper initialization approach. The minimal approach amounts to declaring a plugin initialization function that lets users modify global registry values; another approach is to provide callbacks that amount to the virtual functions in FrontendAction.
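To give the minimal approach some shape, here is a sketch of an initialization entry point. Everything in it is hypothetical: PluginHost, its actions member, and the symbol name example_plugin_init are invented for this example and do not correspond to anything in clang today.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical host-side registry handed to each plugin at load time.
struct PluginHost {
  // Callbacks keyed by plugin name; each receives one plugin argument
  // and reports whether it accepted it.
  std::map<std::string, std::function<bool(const std::string &)>> actions;
};

// Hypothetical entry point a plugin library would export. Under this
// sketch it is the only symbol the compiler looks up in the library.
extern "C" void example_plugin_init(PluginHost *host) {
  host->actions["Foo"] = [](const std::string &arg) { return arg == "Bar"; };
}
```

With this shape the compiler would load the library, look up the one entry point, and call it; everything else the plugin does goes through the registry it was handed.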

Being able to generate new chunks of AST would also be helpful. That would make it possible to turn source-code generators into plug-ins that generate AST on the fly, without needing to generate code.

-- Erik.

..and rewrite chunks of old AST before it gets into CodeGen.

IMO, both of these should be left for "later", and certainly not part of the set of requirements for a plugin system. Modifying or adding to the AST looks easy to do, but it is hard to do *well*. It's going to require a number of specific hooks in Sema.

  - Doug