I’m fed up with the boilerplate code that manages the checker names and bug types in cases when a single checker class implements multiple checkers.
The problem
(Feel free to skip this section if you’re familiar with the code of checkers that have multiple sub-checkers.)
As a relatively simple example, DivZeroChecker
contains the lines
enum CheckKind { CK_DivideZero, CK_TaintedDivChecker, CK_NumCheckKinds };
bool ChecksEnabled[CK_NumCheckKinds] = {false};
CheckerNameRef CheckNames[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BugTypes[CK_NumCheckKinds];
and then the initialization of the bug types performed by ugly scattered lazy blocks like
if (!BugTypes[CK_TaintedDivChecker])
BugTypes[CK_TaintedDivChecker].reset(
new BugType(CheckNames[CK_TaintedDivChecker], "Division by zero",
categories::TaintedData));
while the checker names are filled manually querying mgr.getCurrentCheckerName()
as
checker->CheckNames[DivZeroChecker::CK_TaintedDivChecker] =
mgr.getCurrentCheckerName();
in the registerXXX
functions.
However, there are much more complex situations, e.g. in MallocChecker
where the bug type data members look like
mutable std::unique_ptr<BugType> BT_DoubleFree[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BT_DoubleDelete;
mutable std::unique_ptr<BugType> BT_Leak[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BT_UseFree[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BT_BadFree[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BT_FreeAlloca[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BT_MismatchedDealloc;
mutable std::unique_ptr<BugType> BT_OffsetFree[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BT_UseZerroAllocated[CK_NumCheckKinds];
mutable std::unique_ptr<BugType> BT_TaintedAlloc;
(and their lazy initializations are scattered everywhere in the source file…)
My goals and plans
- I want to replace all
std:unique_ptr<BugType>
members with plainBugType
s that are initialized eagerly where they are declared. - I want to get rid of direct calls to the magical method
mgr.getCurrentCheckerName()
and ad hoc data members that storeCheckerNameRef
s in checker-specific ways. - I don’t want to touch the source code of “simple” checker classes that only define a single checker.
- I won’t perform large-scale refactoring like splitting big fat checker classes into multiple smaller classes – and extrapolating from the history of the analyzer, it’s unlikely that anybody else will do so in the future. The perfect is the enemy of the good; I want practical incremental improvements instead of dreams that will are postponed forever.
- I’m planning to standardize the practical pattern that checker classes with multiple checkers define the possible
CheckKinds
as enumerators in anenum
that can also be used as array indexes. - I’m planning to tweak the constructor of
BugType
and theregisterChecker
machinery to interact with this standardizedCheckKinds
. (The name “CheckKinds” may change.) - I may introduce a new base class (e.g.
MultiChecker
) for checker classes that have multiple sub-checkers; but I’m also thinking about making the plainChecker
smart enough to provide either “simple” checkers or multicheckers.
I know about the old (abandoned) commit ⚙ D67336 [analyzer][NFC] Introduce SuperChecker<>, a convenient alternative to Checker<> for storing subcheckers which had similar goals.
However there are several differences between that change and my plans:
- It introduces a hierarchical model where a
SuperChecker
has multipleSubChecker
s, while in my suggestion theMultiChecker
has an array of equally valid checker names (corresponding to different check kinds).- In the super/sub checker model, the subcheckers cannot be registered without also registering the superchecker; my plan doesn’t have this kind of limitation.
- The
SuperChecker
commit does not help with the clear construction ofBugType
s (although it could be extended to do so).
@Szelethus @steakhal @NoQ @Xazax-hun What do you think about this plan? Do you see any fundamental issues, or is it OK if I start working on implementing this?