Checker Plugins

The idea's come up before, right? I feel like having pluggable check-ins is one of the goals of the analyzer, even as everything is refactored over and over. It would be convenient for organizations to make their own API checks, like we do for Unix and Cocoa (a little). Right? Attributes only go so far.

Anyway, I hacked up a quick plug-in system over the last few days. It approximates Argyrios' very nice group/package system with a poor text-matching version of the registration system, and supports multiple new checkers in one loadable library.

CheckerPlugins.2.patch (15.4 KB)

Hi Jordy,

I think this looks pretty awesome.

My only initial concern is the code in:

+//===----------------------------------------------------------------------===//
+// Printing Help.
+//===----------------------------------------------------------------------===//

Hi Jordy,

Thanks for getting the initiative to start work & discussion on it.
Here's my thoughts on it, let me know what you think.

I'd personally prefer to move away from static registration for the checker plugins but, instead, use an exported C function that has a "standardized" name.
This function will return a CheckerProvider which we can use during checker registration. Moving away from static registration objects gives us much more precise control of the lifetime of CheckerProviders and is more portable (easier to work on windows as well).

Currently the CheckerProvider interface is very minimal and it doesn't enforce any specific naming organization for checkers. I kinda like it this way but then we have the issue of not easily allowing plugins to use the group/package system.
I'd suggest that we provide 2 kinds of examples for plugins.

One should be in the 'examples' directory and it should do the minimum work necessary to get an example checker plugin up & running (e.g. it will just look for one specific checker string in order to be enabled and it will have a very simple printHelp).
Apart from that, we create a directory for a dynamically loaded plugin inside the lib/StaticAnalyzer hierarchy that will contain the required [c]make files to allow using the package system. Also we would need to expose some functions from ClangSACheckerProvider.cpp to avoid duplication of registration/help printing when using the package system.
This lib/StaticAnalyzer plugin will, at least, contain the LLVMConventionsChecker. Not sure if OSX/Cocoa should move to a plugin eventually but at least getting the 'LLVM' package into a plugin would be a good first step.

Thoughts ?

Sorry for not responding to this for a month...my mail filters decided to throw away replies that also showed up on cfe lists. I think I have it fixed now...

I'd personally prefer to move away from static registration for the checker plugins but, instead, use an exported C function that has a "standardized" name. This function will return a CheckerProvider which we can use during checker registration. Moving away from static registration objects gives us much more precise control of the lifetime of CheckerProviders and is more portable (easier to work on windows as well).

The "easier to work on Windows" is the clincher here for me, even though I develop on a Mac!

Currently the CheckerProvider interface is very minimal and it doesn't enforce any specific naming organization for checkers. I kinda like it this way but then we have the issue of not easily allowing plugins to use the group/package system.

What if we made CheckerProvider instances one-to-one with packages? We could make the osx.cocoa package map to OSX_Cocoa (or osx::Cocoa, but I don't think we want to rely on namespace mangling being consistent). Then we have getCheckerProvider_OSX_Cocoa, or whatever—not particularly pretty, but not hard, either.

It wouldn't be hard to maintain the hierarchy part, either: just include a list of subpackages in the parent provider.

As a bonus, this means that groups and packages can have a single common lookup path, even if we want to keep them distinct in the help output.

The problem with all of this is that loading a plugin won't actually add anything to -analyzer-checker-help without some kind of registration. Maybe that has to stay on llvm::Registry...and once that's true, there's no reason not to make the main CheckerProviders use the same registration.

Alternately, we require a -checker-load (instead of plain -load), which uses the name of the *plugin file* to find the provider in it. Or providers. I guess the CheckerProvider-equals-package proposal is a little orthogonal to the plugin architecture.

I'd suggest that we provide 2 kinds of examples for plugins.

One should be in the 'examples' directory and it should do the minimum work necessary to get an example checker plugin up & running (e.g. it will just look for one specific checker string in order to be enabled and it will have a very simple printHelp).

Apart from that, we create a directory for a dynamically loaded plugin inside the lib/StaticAnalyzer hierarchy that will contain the required [c]make files to allow using the package system. Also we would need to expose some functions from ClangSACheckerProvider.cpp to avoid duplication of registration/help printing when using the package system.

If we can come up with a naming-scheme-based package system, it would probably work for external plug-ins as well as internal ones. Then we can pretty much just make ClangSACheckerProvider into a subclass of CheckerProvider with no customization. Or generate subclasses of CheckerProvider from Checkers.td and ditch ClangSACheckerProvider entirely.

If we pull *current* analyzer plugins (besides LLVMConventionsChecker) into dynamic libraries, how does that affect distribution? Do they go in /usr/lib/clang/checkers, or something?

Jordy

Okay, I've come up with a naming-based version of this, probably not final but working fairly well. The plugin client's job is fairly simple:

using clang::ento::CheckerInfo;
extern "C" clang_getCheckersForPackage_PACKAGE (CheckerInfo::List &out) {
  // First include checkers from subpackages.
  clang_getCheckersForPackage_SUBPACKAGE1(out);
  clang_getCheckersForPackage_SUBPACKAGE2(out);
  CheckerInfo::markAllSubpackage(out);

  // Then add checkers from this package.
  out.push_back(CheckerInfo::create<MyChecker>("MyChecker",
    "short description"));
  // The name of the checker class does not have to match the name the
  // client uses to enable your checker.
  out.push_back(CheckerInfo::create<MyOtherChecker>("OtherChecker",
    "short description"));
  // "Hidden" checkers are only enabled if they or their package is
  // explicitly requested, not if they are in a subpackage.
  out.push_back(CheckerInfo::create<HiddenChecker>("HiddenChecker",
    "short description", true));
}

The use of a clang_getCheckersForPackage_PACKAGE function means that we could eventually migrate all internal checkers to this format as well.

The main thing that this doesn't handle is that -analyzer-checker-help doesn't know what packages a plugin might have, so it just uses the basename of the plugin you're loading. The next step would be to look for a clang_checkerPackages_PLUGIN array that contains the names of any packages in PLUGIN.dylib. (Or maybe libPLUGIN.dylib.)

Where should I be going with this?
Jordy

CheckerPlugins.3.patch (23.2 KB)

Hi Jordy,

I'm still forming some thoughts on this. My initial impression while I think this is a good direction, it really does feel overly complicated, particularly with the use of clang_getCheckersForPackage_PACKAGE functions.

I'm most mixed about the CheckerPackage abstraction. How would it actually be used? It seems to me that:

(a) First a plugin must be loaded, e.g. using llvm::sys::DynamicLibrary.

(b) Then the CheckerPackage::create() function is called to register checkers, but this seems to require a priori knowledge of the package name.

In your code snippet below, it's not clear to me where (b) lives. Does it live in the plugin? If so, this seems like a lot of indirection, when we could possibly instead use the plugin registry mechanism instead to find the CheckerManager and then register the checkers. What's the benefit of the clang_getCheckersForPackage abstraction?

The mechanism I had more in mind was that a user writes a plugin, which links against clang and the static analyzer libraries. It uses the llvm::Registry mechanism to load checkers using static constructors when the plugin is loaded. This would be very simple, akin to what we can do with optimization passes:

http://llvm.org/docs/WritingAnLLVMPass.html#registering

In that example, we have RegisterRegAlloc, but instead we could have something like "RegisterCheckers". It could take as an argument a function that returns a factory object (perhaps CheckerPackage?) that allows one to instantiate checkers given a CheckerManager. The key thing to keep in mind is that his simple approach is very flexible. One could possibly register several checkers into different packages. We can also probably decouple checker registration from package registration. Packages are just groups; when a checker gets loaded, it could just declare that it goes into a specific package. If that group doesn't exist, the CheckerManager just creates one with the default visibility. If the package gets explicitly registered (using a similar mechanism, e.g. "RegisterCheckerPackage"), that registration could provide enabled/disabled information for that package, visibility, and strings to describe that package.

In this mode, we get a nice decoupling. Checker writers only have to declare the package the checker goes into by providing a package string. They can also provide strings that describe the checker, which is then used by the registration mechanism. Packages can be managed and registered separately.

This simpler, declarative approach also resolves the -print-help problem. Instead of -print-help, I would expect we move in a direction where checkers just provide descriptive strings, and the help is formatted by whoever cares about it. Checkers, and packages, should not get in the business of printing out anything. That should be up to the client of the CheckerManager.

Anyhow, I'm stilling mulling this over, but I think we want to keep this simple and flexible.

What do you think?

I'm happy to keep revising this; I'm hoping for a good constraint-satisfying solution, and being pointed in the right direction and then let go is fine.

My original design used llvm::Registry, with the registered object implementing a single method that was basically a clang_getCheckersForEntirePlugin. Each plugin then had its own CheckerProvider doing the same sort of name-based package inference that ClangSACheckerProvider does.

Argyrios then pointed out that llvm::Registry doesn't really work well on Windows. I don't entirely understand the thread at http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-May/thread.html#14886 , but it sounds like you have to explicitly instantiate the llvm::Registry templates (more work than ideal). So his idea (which I think is a good one) was to move to a single entry point for each plugin.

To make a detour, the reasons I decided on clang_getCheckersForPackage_PACKAGE instead of clang_getCheckers_PLUGIN were:
(1) it made groups easy -- just delegate
(2) it's uniform; it could work just as well for internal packages as well as external
(3) no string matching against plugin full-names -- the only string manipulation here is splitting off the name of the plugin itself and constructing the name of the function entry point
(4) possibly more efficient (cost of a symbol lookup vs. cost of comparing all possible plugin names)

The disadvantages:
(1) requires redundant information if a /single/ checker is in multiple packages
(2) probably requires a little more memory
(3) still requires a single entry point (a list of packages) for -analyzer-checker-help
(4) somewhat complex compared to a single "here are all my checkers"

It's worth noting that CheckerPackage was entirely an internal helper class for CheckerPackageProvider. It's just a wrapper around the list of classes in a package. Neither CheckerPackage nor Checker are printing anything. :slight_smile:

Still, I guess I'm going at this the wrong way. If I change it back to the "one list per plugin" version, what should it look like from the client side?

void clang_getCheckers_PLUGIN (CheckerInfo::List &checkers) {
  checkers.push_back(CheckerInfo::create<MyChecker>("myplugin.MyChecker", "description goes here"));
  checkers.push_back(CheckerInfo::create<AnotherChecker>("secret.MyChecker", "description goes here"));
}

Advantages:
(1) Only one function per plugin, ever.
(2) Multiple plugins can add to the same package. (possibly a disadvantage?)

Disadvantages:
(1) Child packages are easy, but cross-package groups are impossible
(2) Namespace clashes are more likely (two plugins with the same file name)

This basically what my original code allowed, but without llvm::Registry. Argyrios's original suggestion was that the clang_getCheckers_PLUGIN equivalent should return a CheckerProvider, but I think even that is work the plugin writer shouldn't need to worry about. There's no reason we can't create the checker provider clang-side. (Actually, there's no reason there can't just be a SINGLE CheckerProvider that gathers EVERY plugin's checkers.)

(A note: If DynamicLibrary was shored up to handle private_extern symbols, each library could simply have its own clang_getCheckers function, rather than clang_getCheckers_PLUGIN, which has potential name clash issues. But maybe that's a problem for later.)

Jordy

Here's my latest work of CheckerPlugin, as per off-list discussion on what checker registration should look like. This time I haven't included any dynamic loading yet, only switched the current CheckerProvider interface over to a more useful CheckerRegistry. Plugin checker registration, based on a single per-library entry point (clang_getCheckers?), should pretty much not affect what's in this design at all.

The actual plugin loading code is blocked by the DynamicLibrary patch I sent to llvm-commits a few days ago.

There's a lot of infrastructure that might seem like premature optimization (in particular, keeping around the size of every package to avoid string comparisons when selecting checkers). This was mostly me freaking out about the speed hit on "make test" until I realized that my baseline was wrong. (I recently started building with CXX=clang++, Ted rewrote the CFG traversal, speed of a Debug+Asserts build isn't representative anyway...whatever.) So I can go ahead and take all that out again.

What do you think?
Jordy

CheckerPlugins.4.patch (35 KB)

Whoops, forgot to include the removal of CheckerProvider from CheckerManager.cpp.

CheckerPlugins.4.patch (35.7 KB)

Hey Jordy,

Thanks for your work!

As a general comment I think there should a be clear separation (naming and functionality) between the concept of "registering" checkers (information of available checkers is put into the CheckerRegistry) and "loading" of checkers (checker instances are created and put into CheckerManager). "Registering" will happen first, then "loading" will depend on CheckerRegistry and CheckersControlList.

If you agree, please do related changes like:

void loadBuiltinCheckers(CheckerRegistry &registry); -> void registerBuiltinCheckers(CheckerRegistry &registry);

typedef void (*RegistrationFunction)(CheckerManager &); -> typedef void (*LoadFunction)(CheckerManager &);

allCheckers.registerCheckers(*checkerMgr, checkerOpts); -> allCheckers.loadCheckers(*checkerMgr, checkerOpts);

checkerMgr->finishedCheckerRegistration(); -> checkerMgr->finishedCheckerLoading();

etc..

And ento::registerCheckers combines the two concepts; I'd suggest separating into a registerCheckers function to fill out a CheckerRegistry that will get used for loading or help printing.

Let me know what you think.

-Argyrios

I think it's a good idea to keep these two concepts separate, but "registering" with the CheckerManager is the already-established term. I was using "load" to mean "collect information about what checkers are available". Is there a better name for CheckerRegistry that captures the sense of "what checkers are available"? (Go back to "CheckerProvider"?)

Jordy

I think it's a good idea to keep these two concepts separate, but "registering" with the CheckerManager is the already-established term.

Well we we can always un-establish it :slight_smile:

I was using "load" to mean "collect information about what checkers are available". Is there a better name for CheckerRegistry that captures the sense of "what checkers are available"? (Go back to "CheckerProvider"?)

Across llvm the "registry" concept is well established, I think we should follow the same term for "registering" available checkers and when actually creating concrete checker objects that the CheckerManager will use we can say this is "loading" or "initializing" or something else..

-Argyrios

All right, sounds good. I'll use "initialization". (And all the individual registerXChecker functions will become initializeXChecker in a separate patch.)

Jordy

Jordy,

I'm happy with these changes (coupled with Argiris's feedback). I'm think it's time to start putting them into mainline.

Thanks for working on this.

Ted