Potentially expensive analysis for missing `[]` in delete expression

Hi,

During a demo, audience caught that I forgot [] in delete expression. clang didn’t warn about it. I was wondering whether this is due to the cost of analysis. My sample was something like:

class A {
int a;
public:
A() : a(new int[5]) { }
~A() { delete /
[]*/ a; }
};

but following doesn’t seem like to analyze missing [] either:

% clang++ -fsyntax-only -x c++ -pedantic -Wall -Wextra -

void f() {
int *a = new int[5];
delete a;
}
%

Is this analysis omitted due to performance concerns or simply an oversight?

Analyzing my sample requires clang to analyze all ctor-initializers, then in-class initializers, to check whether member was new[]'ed each time it sees delete expression where pointee is a MemberExpr. For the latter case, I think checking whether VarDecl initialized with new[] is enough. Did I get this right? I already have a patch for this, but I didn’t test its performance impact on a sufficiently large code base. Do we have previous implementation experience with this?

Ismail

This is an interesting analysis that would really help a lot of people. However, if the analysis intend to be truly accurate, I think it should be implemented in the static analyzer.

For example:

if (cond)
a = new int[5];
else
a = (int*) malloc(sizeof(int)*5);

would easily trick a CFG-based analysis because it can’t know at compile-time whether cond is true or false.

Nevertheless, I still believe that a front-end analysis can help in simpler cases, but we can’t expect to be 100% accurate.

The static analyzer does have this check:

:3:5: warning: Memory allocated by ‘new[]’ should be deallocated by
’delete[]’, not 'delete’
delete a;
**^~~~~~~~**

…but only if it can see both the allocation and deallocation sites in the same path.

Jordan

Hi, I am new here and I am wondering… does the frontend or the static analyzer have any support for generic abstract interpretation?

I would imagine most of the static analysis done in the frontend is abstract interpretation in some form, but I am utterly lost in the code so I have no clue how much of it may be generic and how much is just hardcoded special cases.

I'm not quite sure what you mean here. If what you're asking is whether the static analyzer is a generic virtual machine, then no—it operates on Clang CFGs and ASTs, meaning it's "limited" to C, C++, and Objective-C. (And in theory OpenCL and CUDA.) The downside of this is that it can't handle arbitrary LLVM IR, from other languages or even from C-family language constructs that are hard to model. The upside is that it has a much stronger understanding of the intent of the user's code, and can do a better job presenting issues it finds.

The general design of the analyzer (graph traversal exploring a state space, informed by callbacks) could apply to any language, but the current implementation is not immediately reusable.

Jordan

That is not what I mean.

“Abstract interpretation” is a generic framework for expressing static analyzers, it has nothing to do with the language being analyzed.
In essence, the analysis has its information bound to edges in CFG, and for each kind of CFG node there is a callback that changes information on one side of the node according to information on the other side. For example in variable liveness analysis, you have code that says “the variable is live before an assignment iff it is live after the assignment and is not assigned by it, or it is used by the rhs of the assignment”. With this, you compute a fixed point over the entire CFG. The evaluation strategy is important for performance, but not for correctness, and every analysis expressed as abstract interpretation is simply a bunch of short visitors that get called repeatedly by a generic framework until the analysis stabilizes.

Now to answer my own question, I looked at the liveness analysis in Clang in particular (LiveVariables.cpp) and it seems to me it implements all the responsibilities of a generic framework all within itself. As a result, there is much more code than there need be, and that code is difficult to understand. Therefore, either I am reading the code wrong, or there is no generic framework in clang for abstract interpretation-based analyzers, the existence of which I was asking about.

Ah, thanks for the clarification. You are correct; the infrastructure in LiveVariables, CFG, and CoreEngine is not really generic enough at the moment to be used as the basis of another analysis engine, and ExprEngine, RegionStore, and SVal are very intricately related to the C family of languages.

Patches to make parts of the analyzer more general would probably be accepted as long as they didn’t regress on performance.

Jordan

Jordan,

I am trying to make my patch submission-ready. If I try to diagnose
this case in frontend and issue a warning on the problematic line,
analyzer doesn't issue anything, as far as I can see. I ran tests, and
it failed in test/Analysis/MismatchedDeallocator-path-notes.cpp:
% clang -cc1 -analyze
-analyzer-checker=core,unix.MismatchedDeallocator
-analyzer-output=text MismatchedDeallocator-path-notes.cpp
MismatchedDeallocator-path-notes.cpp:10:3: warning: 'delete' applied
to a pointer that was allocated with 'new[]' treated as 'delete[]'
  delete p; // expected-warning {{Memory allocated by 'new[]' should
be deallocated by 'delete[]', not 'delete'}}
  ^ ~
        []
MismatchedDeallocator-path-notes.cpp:7:12: note: allocated with 'new[]' here
  int *p = new int[1]; // expected-note {{allocated with 'new[]' here}}
           ^
1 warning generated.

With my patch, frontend issues a new warning at line 10, and this
prevents analyzer to issue its own. Presumably, this is a known
behavior of the analyzer. What can I do in this case? Does disabling
my warning in analyzer tests make sense?

Ismail

The reason analyzer doesn't issue its warning is because my patch
causes delete expression to be treated as 'delete[]'.