[Analyzer] How to deal with lazy compound values when tracking state

Hi there,

I'm implementing a kind of taint propagation and I'm running into a
bit of trouble when some of the tainted values are structs, because of
the LazyCompoundValue optimization. First, to illustrate the kind of
thing I want to do, consider:

extern int tainted_function1();
void foo() {
    int x = tainted_function1();
    clang_analyzer_explain(x);
}

I try to annotate the taint upon returning from the call (in a
check::PostCall callback).
Now, this works well, because `x` is `symbol of type 'int' conjured at
statement 'tainted_function1()'`, so I can store its taintedness in a
SymbolRef->bool map and everyone is happy. However, I'm having trouble
extending the same logic to:

struct foo {
    int a;
    int b;
};
extern struct foo tainted_function2();
void foo() {
    struct foo val = tainted_function2();
    clang_analyzer_explain(val);
}

because `val` is then a `lazily frozen compound value of local
variable 'val'`. I tried playing with that a bit, but I'm having
trouble getting at the symbol from the lazy compound val (I tried
getBinding with the Store and the region from the lazy compound val,
but that just gives me another lazy compound value). How do I
de-lazify the value?

Hello Keno,

There seems to be work in progress on this: https://reviews.llvm.org/D28445 (with very thorough comments in the discussion). You can take a look.

On 03.03.2017 09:02, Keno Fischer via cfe-dev wrote:

Hello, we're close to landing a patch that makes this easier: https://reviews.llvm.org/D28445

(the discussion should be helpful as well)

Whoops, didn't notice Aleksei's comment, sorry. I should unwrap the thread before answering.

Yes, I saw it and have read the thread with interest, though admittedly
I am not yet entirely clear on how to use it. Before asking for further
help, I'll play with it a bit and see whether I can apply it to my use
case.

Well, you can either take the values of the structure's fields and mark them as tainted: construct sub-regions of its base region (field or element regions, through MemRegionManager; SValBuilder's helper methods might also be useful) and getSVal() them. Read them from the lazy compound value's store, which contains the contents of the structure, rather than from the store in the program state, which contains the current contents of the region the structure was copied from.
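To make the two-stores point concrete, here's a toy model in plain C++. It is NOT the Clang API: `Store`, `LazyCompound`, `getFieldSymbol` and `taintFields` are hypothetical names, regions are plain strings like "tmp.a", and symbols are ints. What it shows is that the lazy value carries its own snapshot of the store, so field lookups have to go through that snapshot, not through whatever the source region holds now.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model, NOT the Clang API: a "store" maps region names such as
// "tmp.a" to symbol IDs.
using Store = std::map<std::string, int>;

// Hypothetical stand-in for a lazy compound value: rather than copying
// each field, it remembers the source region plus a snapshot of the
// store taken at the moment of the copy.
struct LazyCompound {
  Store snapshot;
  std::string region;
};

// Look up one field of the lazily copied struct in the snapshot (not in
// the current store, where the source region may have been overwritten
// since the copy). Returns nullptr if the field has no binding.
const int *getFieldSymbol(const LazyCompound &lcv, const std::string &field) {
  auto it = lcv.snapshot.find(lcv.region + "." + field);
  return it == lcv.snapshot.end() ? nullptr : &it->second;
}

// Mark the symbol bound to every listed field as tainted.
void taintFields(const LazyCompound &lcv, const std::set<std::string> &fields,
                 std::set<int> &tainted) {
  for (const std::string &f : fields) {
    const int *sym = getFieldSymbol(lcv, f);
    if (sym)
      tainted.insert(*sym);
  }
}
```

In a real checker the corresponding pieces would be the LazyCompoundVal's getStore()/getRegion() and the StoreManager, but double-check the exact signatures against the Clang version you're building with.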

Or, with D28445 applied, you'd also be able to taint the whole structure at once by tainting its default conjured symbol and letting the taint auto-propagate to all symbols derived from it. (This auto-propagation is how our default taint analysis works. It sounds like you're implementing your own taint analysis, so I wonder what's wrong with the default one.)
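Here's the same kind of toy model for the auto-propagation idea (again NOT the Clang API; `SymbolTable`, `conjure`, `derive` and `isTainted` are hypothetical names). Each derived symbol remembers the symbol it was derived from, roughly in the spirit of Clang's SymbolDerived, and a taint query walks up that chain, so tainting the struct's default symbol once covers every field value later read out of it.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model, NOT the Clang API: a derived symbol stands for "the value of
// some field of whatever the parent symbol denotes".
struct SymbolTable {
  std::map<int, int> parentOf;  // derived symbol ID -> parent symbol ID
  std::set<int> tainted;        // symbols tainted directly
  int next = 0;

  // Fresh symbol, e.g. the default binding for a struct returned by a call.
  int conjure() { return next++; }

  // Symbol for a value read out of the thing `parent` denotes.
  int derive(int parent) {
    int sym = next++;
    parentOf[sym] = parent;
    return sym;
  }

  // Tainted if tainted directly or if any ancestor it derives from is.
  bool isTainted(int sym) const {
    for (;;) {
      if (tainted.count(sym))
        return true;
      auto it = parentOf.find(sym);
      if (it == parentOf.end())
        return false;
      sym = it->second;
    }
  }
};
```

The design trade-off is that nothing has to be recorded per field when the taint is introduced; the cost moves to the query, which walks the parent chain.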