Checker - taint analysis with virtual functions (runtime polymorphism handling)

Hello,

I'm experimenting with the Checker framework to implement taint analysis for C++
applications. Referring to GenericTaintChecker.cpp, I've implemented a
simple taint analysis, but it seems that tainted symbols are not
propagated through virtual function calls. Is it the case that a Checker
cannot handle C++ runtime polymorphism?

From my understanding, when a Checker sees a virtual function call
expression, it only knows the declared (static) class type, not the actually
allocated (dynamic) class type. In the example code I've written, when the
Checker sees g_table->append(), it only knows that append() is a member
function of ShapeTable, not of ShapeTableArray.

Could you tell me how to handle this C++ runtime polymorphism issue?
Can I force the analyzer to visit all the possible (or the concrete)
virtual function implementations when a Checker sees a virtual function call?
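
The example code mentioned in this message is not part of the archived thread. As a rough, hypothetical sketch of the kind of program being described: only the names ShapeTable, ShapeTableArray, g_table and append() come from the message, while the class bodies, readUntrusted() and addShape() are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base class: only the names ShapeTable and append() come from
// the thread; the rest of the interface is assumed.
class ShapeTable {
public:
  virtual ~ShapeTable() = default;
  virtual void append(const std::string &name) = 0;
  virtual std::size_t size() const = 0;
};

// Hypothetical concrete class that actually stores the data.
class ShapeTableArray : public ShapeTable {
public:
  void append(const std::string &name) override { names_.push_back(name); }
  std::size_t size() const override { return names_.size(); }

private:
  std::vector<std::string> names_;
};

// Static type: ShapeTable*. Dynamic type: ShapeTableArray.
ShapeTable *g_table = new ShapeTableArray();

// Assumed stand-in for a taint source (a real checker would typically treat
// the result of something like getenv() or read() as tainted).
std::string readUntrusted() { return "user-controlled"; }

// The call below is dispatched at runtime to ShapeTableArray::append, but
// the call expression itself only names ShapeTable::append.
std::size_t addShape() {
  assert(g_table != nullptr);
  std::string name = readUntrusted();
  g_table->append(name);
  return g_table->size();
}
```

Calling addShape() twice should leave two entries in the table; the question is whether taint on `name` can follow the call into ShapeTableArray::append.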

Hi Byoungyoung,

Taint analysis relies on the general clang infrastructure for propagating the taint through/into virtual (and regular) calls. Currently, the static analyzer core is not smart enough to de-virtualize the call in this example. However, we are actively working on better IPA support for C++.

That said, we would only resolve the function if the analyzer has enough information to de-virtualize the call. By default (when not enough information is available), the analyzer core treats the call as opaque. This is the desired behavior. Even for the taint checker, you might only want to propagate the taint into a specific function if you are sure that there is a path on which that would occur.

Cheers,
Anna.

Hello Anna,

Thanks for your answer. I just wanted to make sure that there's
currently no support for handling virtual function calls in the taint
analysis.

I'm thinking of implementing a taint analysis that supports
virtual function calls. When the Checker sees a statement like

A *a = new B(); // A is a parent class of B

I would re-assign a's clang::Decl to class B so that subsequent
virtual function calls would be resolved using class B's declarations.
Would this approach work? Or could you suggest a
better way to handle this issue?

Thanks,
Byoungyoung

The logic for reasoning about polymorphism should live inside the analyzer core engine, as it is not specific to the taint checker. Essentially, when 'a->foo()' is called, we check whether 'a' points to an object of type 'B' at runtime. If it does, we inline the call using B's implementation of 'foo'. Jordan has started adding reasoning about polymorphism to the analyzer. I am not sure why this particular case is not handled yet, but there can be a lot of edge cases one might need to handle.

You can read clang/docs/analyzer/IPA.txt for more information on how we deal with interprocedural analysis (including polymorphism and dynamic dispatch). This is one of the more complex areas of the analyzer.
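
To make the distinction concrete, here is a minimal sketch of the two situations described above. The types A and B and the method foo follow the names used in this thread; the function names and bodies are hypothetical, and whether a given analyzer version actually inlines the first call is an assumption rather than a guarantee.

```cpp
#include <cassert>

struct A {
  virtual ~A() = default;
  virtual int foo(int x) { return x; }  // base implementation
};

struct B : A {
  int foo(int x) override { return x + 1; }  // derived implementation
};

// The object is created on the same path as the call, so the dynamic type
// of *a is known to be B. This is the situation in which de-virtualizing
// the call (and inlining B::foo) is possible in principle.
int callOnKnownObject(int input) {
  B b;
  A *a = &b;
  return a->foo(input);
}

// Analyzed on its own, this function says nothing about the dynamic type of
// *a: it could be A, B, or any other subclass. Treating the call as opaque
// is the conservative choice here.
int callOnUnknownObject(A *a, int input) {
  assert(a != nullptr);
  return a->foo(input);
}
```

At runtime both functions dispatch correctly, of course; the difference is only in how much a path-sensitive analysis can know about the receiver at the call site.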

Cheers,
Anna.

I really appreciate your help! With the IPA.txt documentation, I now
understand how Clang's IPA works. The reason taint was not propagated in my
test program before was that I hadn't provided any control flow
between the class allocation routines and the tainting routines. After
connecting these routines with explicit control flow, it works great!

However, it seems that the current IPA does not handle global
constructors, i.e. a constructor that is invoked to initialize
a global variable (A *a = new B(); // "a" is a global
variable). Since there's no explicit control flow into these
constructors (I think they are compiled into the .ctors section for
ELF binaries and invoked by the loader), I guess the current IPA cannot
handle this case. Is this on a TODO list, or could you
point me to any documentation that mentions it?
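
A hypothetical sketch of what "giving the control flow" can look like: an allocation routine and a tainting routine that are only connected through a driver function. All names here (Sink, LengthSink, makeSink, process, driver) are invented for illustration and are not taken from the test program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Sink {
  virtual ~Sink() = default;
  virtual std::size_t consume(const std::string &data) = 0;
};

struct LengthSink : Sink {
  std::size_t consume(const std::string &data) override { return data.size(); }
};

// "Allocation routine": the only place where the dynamic type is visible.
Sink *makeSink() { return new LengthSink(); }

// "Tainting routine": analyzed on its own, the dynamic type of *sink is
// unknown, so the virtual call cannot be resolved.
std::size_t process(Sink *sink, const std::string &untrusted) {
  assert(sink != nullptr);
  return sink->consume(untrusted);
}

// Driver that puts allocation and use on one path. Analyzing from here, a
// path-sensitive engine can (assuming it inlines both helpers) see that
// *sink is a LengthSink when consume() is called.
std::size_t driver(const std::string &untrusted) {
  Sink *sink = makeSink();
  std::size_t n = process(sink, untrusted);
  delete sink;
  return n;
}
```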

Thanks,
Byoungyoung

Hi, Byoungyoung. I’m not sure I quite understand your objection. If you have this:

A *a = new B();

where ‘a’ is a global variable, we have no guarantee that it is still the same ‘B’ object when we actually analyze any specific function. After all, some other function could have an ‘a = nullptr;’ pretty much anywhere. Without cross-translation-unit, whole-program analysis, we can’t make a definite guarantee that ‘a’ points to the same object for the entire lifetime of the program.

We currently don’t model this “correctly” even if you use a constant pointer:

A * const a = new B();

because we do per-function analysis, and by the time we’ve entered any function body, ‘a’ has already been initialized. We ought to treat it the same as this:

B a;

which is probably what you should be doing anyway if you really have a global object that lasts for the lifetime of the program. Using ‘new’ may not be safe during static initialization (i.e. before ‘main’), and if the constructor takes (tainted?) arguments, then you’re potentially subject to the static initialization order fiasco.
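
As an illustration of the suggestion above, here is a small sketch contrasting a global object of concrete type with an accessor that uses a function-local static. The accessor is a common C++ idiom for sidestepping static initialization order problems; it is an addition for illustration, not something the analyzer requires, and all names other than A and B are hypothetical.

```cpp
#include <cassert>

struct A {
  virtual ~A() = default;
  virtual int id() const { return 1; }
};

struct B : A {
  int id() const override { return 2; }
};

// A global object rather than a global pointer: its dynamic type is fixed
// for the lifetime of the program and nothing can re-point it.
B g_b;

// Function-local static: constructed on first use, which avoids depending
// on the initialization order of globals in other translation units.
B &instance() {
  static B b;
  return b;
}

int idViaGlobal() { return g_b.id(); }

int idViaAccessor() {
  A &a = instance();
  // The referenced object is always a B, whatever the static type says.
  assert(dynamic_cast<B *>(&a) != nullptr);
  return a.id();
}
```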

Currently, the only sort of global values we handle are constant integers. The general issue is tracked internally at Apple by rdar://problem/11720796 and a similar specific case is on our public bug tracker at http://llvm.org/bugs/show_bug.cgi?id=13673.

Our “todo list” of sorts would be our bug-tracking system at http://llvm.org/bugs. If you have an entire self-contained concrete example that you believe should work, please file a bug.

Sorry for the negative answers,
Jordan

P.S. Our handling of ‘new’ in particular is fairly weak right now; there’s some infrastructure work discussed on the bug tracker at http://llvm.org/bugs/show_bug.cgi?id=12014 that ought to result in us correctly tracking the types of objects that come from ‘new’.