[analyzer][RFC] Design idea: separate modelling from checking

It seems like it is the analyzer RFC week so I'll contribute a bit. :slight_smile:

TL;DR: I think we may need to split up checkers from execution modeling in order to resolve unclear dependencies between checkers, improve checker development process and resolve some non-trivial issues. There is no any proof-of-concept implementation, just some thoughts I (and our team) wish to summarize and to describe the issue. Feel free to share your opinion or correct me if I made wrong assumptions or wrong conclusions.

0. Pre-history.
There were some thoughts on this topic shared personally during meetings. I and Artem Dergachev agreed on the fact that checker capabilities to do everything that make implementation of some common approaches to simplify the checker API much harder; unfortunately, I don't know if any decision about this was made on latest LLVM devmeeting because its results were not shared.

First things first: right now literally nothing prevents us from re-using state traits from multiple checkers or translation units. As in:

// Model.cpp
REGISTER_MAP_WITH_PROGRAMSTATE(StreamState, SymbolRef, int)
namespace stream_state {
int get(ProgramStateRef State, SymbolRef Sym) {
return State->get<StreamState>(Sym);
}
ProgramStateRef set(ProgramStateRef State, SymbolRef Sym, int Val) {
return State->set<StreamState>(Sym, Val);
}
}

// Model.h
namespace stream_state {
int get(ProgramStateRef State, SymbolRef Sym);
ProgramStateRef set(ProgramStateRef State, SymbolRef Sym, int Val);
}

// Checker.cpp
#include "Model.h"
void Checker::checkPreCall(const CallEvent &Call, CheckerContext &C) {
SymbolRef Sym = Call.getArgSVal(0).getAsSymbol();
int Val = stream_state::get(C.getState(), Sym);
}

This approach is, i guess, even already used, for dynamic type info. I guess nobody paid attention to that, so i was constantly running around and whining how much we need it, and also taint checker wasn't made this way for the same reason, i guess. One more item to refactor into a proper trait that way would be region extent, which is essentially a trait over the region - and it is probably even a mutable trait (see jump-over-VLA bug).

The only downside i see with the implementation above is that we no longer enjoy our nice object oriented syntax, i.e. get(State, Key) instead of State->get(Key). We could probably improve upon that. And, yeah, there's one more function call that wouldn't be inlined, to it may be a tiny bit slower.

So if we are mostly concerned about huge overgrown and poorly-interoperating checkers (MallocChecker, CStringChecker, RetainCountChecker, GenericTaintChecker), we already have everything we need, and i'm all for splitting them up.

Hi!

In general, I like the idea of separating modeling and diagnostics.