Accessing Taint status outside of clang (or: combining AST Matchers with Taint Status)

My high-level goal is to be able to combine Taint Status with AST Matchers. For example, we have a static analysis check that warns you against comparing a float to itself for NaN checking. Hypothetically (as a simple example) I’d like to apply this check only to tainted variables.
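For concreteness, here is a minimal sketch of the code pattern that kind of check targets. The function names are hypothetical and only illustrate the self-comparison idiom next to the explicit alternative.

```cpp
#include <cmath>
#include <limits>

// The idiom the check warns about: NaN is the only float value that
// compares unequal to itself, so `x != x` is an implicit NaN test.
bool isNaNBySelfCompare(float x) { return x != x; }

// The explicit form such a check would typically suggest instead.
bool isNaNExplicit(float x) { return std::isnan(x); }
```

Both functions agree under default floating-point semantics; the goal described above is to flag the first form only when `x` is tainted.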

My understanding (from the dated clang-analyzer-guide) is that AST Matchers and Taint checking occur at different times/phases of analysis. Because AST Matchers are syntax checkers, they can run in clang-tidy, or at other times. But Taint checking only occurs during path-sensitive analysis (I’m not sure if there’s a better name for this phase), which happens only with an invocation like clang --analyze.

However, that guide mentions that it is possible to match a particular section of the AST. It also talks about how you can query whether a certain symbolic value is tainted. My plan had been to e.g. sub-class Checker<check::BranchCondition> and, inside checkBranchCondition(), check if a variable in the branch condition is tainted; and if so, try to match the AST of the branch condition against a Matcher. If it matches, then it both fits the code pattern and involves a tainted variable. I got the matching part of this plan working, but not the tainting part.

The low-level problem I encountered with this plan is that D59861 appears to have removed the isTainted() API from ProgramState and moved it into a Taint.h file, which is not exported/exposed from clang the way, say, ProgramPoint.h is. This is a problem because I had been planning on building an external library and loading it with -Xclang -load -Xclang ./libclang-plugin.so. It’s not a big deal for me to patch clang locally to expose the header/methods if that’s not a bad idea, and I can upstream the patch if desired. If that is a bad idea, though, should I plan on building the checks into clang itself? (I can do that, but my inclination is not to if I don’t have to.)
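A rough sketch of what that plan might look like as a plugin, assuming Taint.h is reachable at clang/StaticAnalyzer/Checkers/Taint.h (it is not in the setup described above, so this include path is an assumption). The checker name, bug text, and the exact matcher are hypothetical, and the sketch only compiles against a matching LLVM/Clang build tree.

```cpp
// Sketch only: assumes Taint.h is visible to out-of-tree code.
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/StaticAnalyzer/Checkers/Taint.h" // assumed public location
#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
#include <memory>

using namespace clang;
using namespace clang::ento;
using namespace clang::ast_matchers;

namespace {
// Hypothetical checker, named for illustration only.
class TaintedSelfCompareChecker : public Checker<check::BranchCondition> {
  const BugType BT{this, "Self-comparison of a tainted float", "Example"};

public:
  void checkBranchCondition(const Stmt *Condition, CheckerContext &C) const {
    // Syntactic half: `v != v`, where both sides name the same float variable.
    auto SelfCompare =
        binaryOperator(
            hasOperatorName("!="),
            hasLHS(ignoringParenImpCasts(declRefExpr(
                to(varDecl(hasType(realFloatingPointType())).bind("var"))))),
            hasRHS(ignoringParenImpCasts(
                declRefExpr(to(varDecl(equalsBoundNode("var")))))))
            .bind("cmp");

    // Only the condition node itself is tested, not its sub-expressions.
    auto Matches = match(SelfCompare, *Condition, C.getASTContext());
    if (Matches.empty())
      return;
    const auto *Cmp = Matches[0].getNodeAs<BinaryOperator>("cmp");

    // Path-sensitive half: is the value read on the left-hand side tainted
    // in the current program state?
    if (!taint::isTainted(C.getState(), Cmp->getLHS(),
                          C.getLocationContext()))
      return;

    if (ExplodedNode *N = C.generateNonFatalErrorNode())
      C.emitReport(std::make_unique<PathSensitiveBugReport>(
          BT, "Tainted float compared to itself", N));
  }
};
} // namespace

// Entry points the analyzer looks for when a plugin is loaded.
extern "C" void clang_registerCheckers(CheckerRegistry &Registry) {
  Registry.addChecker<TaintedSelfCompareChecker>(
      "example.TaintedSelfCompare",
      "Warn on self-comparison of tainted floats (sketch)", "");
}

extern "C" const char clang_analyzerAPIVersionString[] =
    CLANG_ANALYZER_API_VERSION_STRING;
```

Taint is only present in the program state if a taint-propagating checker is also enabled for the run, so a check like this would report nothing on its own. How well the analyzer tracks taint through floating-point values is a separate question that this sketch does not address.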

So at a high-level - am I approaching this problem in a reasonable way? And at a low-level - what would be the recommended way to access the taint information?

Hi!

ASTMatchers are a library that can be used any time you have an AST available. You can use AST matchers in Clang Static Analyzer checks. But it looks like matching is not really your problem:

From the last half of your message:

Is there actually a problem other than the said header not being exposed to plugins? From your description I cannot see any other obstacle. If that is the case, matchers have nothing to do with your question; you would have the same problem if you tried to write a check as a plugin without the matchers.

I think out-of-tree plugins are almost always second-class citizens. It would make sense to move Taint.h to a public place to unblock this scenario, and I support this idea. You could open a patch doing that move, or you could simply add your check to Clang.

Could you elaborate on that? Are you afraid of compilation times or distribution? Unfortunately, there are very few guarantees when it comes to plugins. Basically, if you did not compile the plugin against the exact same version of clang, with the exact same version of the compiler and the same set of flags, you might experience some problems. As a result, distributing a plugin might not be easier than distributing a modified version of the clang binary.

Is there actually a problem other than the said header not being exposed to plugins?

I don’t think so? I’ll find out :)

Could you elaborate on that?

The fewer patches to clang we have, the less we have to rebase on a new clang version. Yes, the static analysis might break on a clang upgrade, but pinning the plugin to a clang version is easy for us. (And then we’d fix the plugin and upgrade clang. It all just happens to be easier for us in our workflow.)

I have put D123155 "[analyzer] Expose Taint.h to plugins" up for review, but I am still trying to test it locally to confirm that the patch is okay and does what I want; I am having some difficulty getting an LLVM build to work.