`checkLocation` vs `checkBind` when `isLoad=false`

Hi all, while doing some refactorings, I noticed that checkLocation is not always called on storing a value to a location.
I was expecting that checkLocation can be replaced by checkBind assuming that the isLoad=false of course. However, it turns out there are cases when checkBind gets called but the checkLocation doesn’t. I could probably get a concrete example, but I already moved on, but l still post this question.

After all this, I wonder, why do we even have these two if they essentially achieve the same thing (for stores), or at least they should do the same?

Shouldn’t we “merge” the two and create a separate callback, say checkLoad for the cases when we would have called checkLocation with isLoad=true?

@Xazax-hun @NoQ @DonatNagyE

This sounds to be a reasonable idea. I don’t know too much about the relationship and small differences between these two callbacks, so I cannot contribute much at this early stage.

Renaming checkLocation to checkLoad and remove the isLoad=false case sounds good to me. It will make the API simpler. The difference between checkBind and checkLocation with isLoad=false was never documented, and I am not aware of any deliberate decisions that would prevent us from making this simplification.

It turns out it’s quite easy to make the transition for almost all checkers. There are two exceptions:

  • MallocChecker: When replacing checkLocation(/*isLoad=*/false) callbacks with checkBind, we need to adjust the S if it’s an assignment operator to BinOp->getLHS(). After this, these tests also pass.
  • DereferenceChecker: Well, this has problems. I couldn’t yet figure out how to split checkLocation here.

When I was reading through the code, I found this inside evalStore():

///  @param AssignE The assignment expression if the store happens in an assignment.
///  @param LocationE The location expression that is stored to.
void ExprEngine::evalStore(...) {
  // Proceed with the store.  We use AssignE as the anchor for the PostStore
  // ProgramPoint if it is non-NULL, and LocationE otherwise.
  const Expr *StoreE = AssignE ? AssignE : LocationE;

  // Evaluate the location (checks for bad dereferences).
  ExplodedNodeSet Tmp;
  evalLocation(Tmp, AssignE, LocationE, Pred, state, location, false);

  if (Tmp.empty())

  if (location.isUndef())

  for (const auto I : Tmp)
    evalBind(Dst, StoreE, I, location, Val, false);

This is probably (the/one) snippet that causes me headaches, and passes a different Stmt than I’m expecting in evalBind. Aside from that, it’s also weird to me that evalStore actually reads from a location. I think, if we get this separation done, this load will also disappear from here.

I might be missing something, but where is the read in evalStore? Since we invoke evalLocation with isLoad=false, I’d expect that to be a store.

There is also another difference here. It looks like evalLocation is called unconditionally, but evalBind is only called when we have a proper location. This almost sounds like as if evalLocation would have the purpose of emitting diagnostics (e.g., checking if we are loading an undef value), and evalBind would be for modeling extra logic on a successful store.

I am not 100% convinced whether this distinction is useful, but if we want evalBind to replace checkLocation, we’d need to invoke checkBind unconditionally here.

Indeed, it looks like this one is not a trivial change to make.

In some rare cases, some of the ArrayBoundV2 hits are replaced by cplusplus.NewDelete, e.g.:

void top() {
  int *p = new int[0];
  int n = *p; // warn: either ArrayBoundV2 or NewDelete checkers

This is likely because of how checker callbacks are ordered, and the MallocChecker wins over ArrayBoundV2. If I enable/disable the V2 checker, the other report shows up, see.
The new report offers more specific messages, thus I favor this change.

Now, the numbers look like this:
Out of like 180+ projects, we have differences in only 18 projects, constituting to around 40 report diffs.
Many (if not all) of the report diffs are of the mentioned case, where we have new T[0], and dereference the resulting pointer; and we have a better message.

I also experimented with making the change completely NFC, but it’s really tedious doing so I dropped the idea.

So, if this is still a favored refactoring, I could prepare a PR.
Let me know.