Ownership attribute for malloc etc. checking

I do think this is pretty cool stuff -- like Ted said, being able to
generalize the leak-checking code would be nice (perhaps even unifying
malloc/free and Cocoa retain counting someday).

I have a number of minor suggestions, such as asking the attribute which
arguments matter instead of iterating over them all and seeing if they're
important, and following the LLVM comment style (complete, capitalized
sentences ending in [.?!] whenever possible). Also using "!" instead of
"not". *grin*

I think you have a logic error dealing with held memory. You're allowing a
double-free if the current free is a hold, but I think what you want is to
allow /uses/ if the /previous/ free was a hold. Perhaps the RefState kind
needs a new enum value, Relinquished or something.

It'd be nice to write a couple of tests for this new attribute. It's
probably sufficiently new to warrant its own test file in the test/Analysis
directory.

Secretly I would love for the module name /not/ to be checked at all, only
associated with a pointer when ownership_returns is encountered. It would
then be expected that any non-escaping marked pointer is cleaned up by the
end of the function body. This would let people define their own modules
without any extra configuration at all.

const char * __attribute__((ownership_returns(my_app_intern)))
intern(const char *);
void __attribute__((ownership_takes(my_app_intern))) release(const char
*);

void useInput (const char *input) {
  const char *str = intern(input);
  // do complicated stuff with str
  release(str);
}

The downside, of course, is that /any/ use of the pointer as a function
argument would count as escaping, making it not very useful. Perhaps there
could be heuristics for this sort of thing -- a non-builtin module name
/must/ be balanced or explicitly escape, or else be passed to a free-like
function. For things like "malloc" and "fopen", we probably have to be more
lenient.

If this is extended to work with multiple owners, you'll also run into the
problem of two kinds of "hold" -- in both cases, there's a new owner, but
the current owner doesn't necessarily give up ownership. But that's a
problem for another day.

Last, as a small note, I don't think ownership_returns should require a
size argument at all -- it falls down for something as simple as this:

struct Node *allocNode() {
  return (struct Node *)malloc(sizeof(struct Node));
}

I see you're using MallocMemAux(); if you use the second form, which takes
a size SVal instead of an Expr*, then you can handle cases where the size
is unknown.

Overall, I like it! Perhaps clean it up a bit, but having a general
"ownership checker" seems like a great idea.

Jordy

Ok, here’s an updated patch that should cover all your suggestions.

Tests are included, they pass.

It also adds a check for functions returning memory that has been freed/taken (but not held, because while this is somewhat dangerous, it is in fact what malloc itself might do).

Andrew

clang-ownership-withtests.patch (30 KB)

Looking good! Some remaining comments below.

I'm also a reasonably new committer, so someone else should probably look
over this as well (particularly the Attr hunks, which I'm not very familiar
with).

Jordy

General:
- It would be nice to make sure one argument isn't both ownership_takes
and ownership_holds.
- LLVM's convention is to use spaces rather than tabs.
- There's a hunk where the only thing that changed is indentation; it'd be
nice to take that out.
- Two more useful test would be free(p); my_free(p) and free(p);
my_hold(p). Both of which should warn, of course.

(MallocChecker::EvalCallExpr)
+ for (const Attr *attr = FD->getAttrs(); attr; attr = attr->getNext())
{
+ if (const OwnershipReturnsAttr* Att =
dyn_cast<OwnershipReturnsAttr>(attr)) {
+ MallocMemReturnsAttr(C, CE, Att);
+ rv = true;
+ }
+ if (const OwnershipTakesAttr* Att = dyn_cast<OwnershipTakesAttr>
(attr)) {
+ FreeMemTakesAttr(C, CE, Att);
+ rv = true;
+ }
+ if (const OwnershipHoldsAttr* Att = dyn_cast<OwnershipHoldsAttr>
(attr)) {
+ FreeMemHoldsAttr(C, CE, Att);
+ rv = true;
+ }
+ }

Here's the dispatch from EvalCallExpr(). I think even though it's
technically a little less efficient, you should just simplify this and use
FD->getAttr<OwnershipTakesAttr>(), etc. It's a little simpler, and it lets
you order the attributes.

Why does that matter? Because an ownership_takes() or ownership_holds()
function might have a return value! If it's also ownership_returns(), then
you're fine, but otherwise you need to set a symbolic return value. (You
can see how this is done in MallocMemAux(), or in StreamChecker.cpp with
fopen.) There should probably be a test for this as well.

Actually, because these attributes don't affect the return value, they
should probably go in a PreVisitCallExpr() callback, rather than
EvalCallExpr(). But for now it's probably okay, as long as you handle that
return value.

(MallocChecker::MallocMemReturnsAttr)
+ for (const unsigned *I = Att->begin(), *E = Att->end(); I != E; ++I,
++count) {
+ const GRState *state =
+ MallocMemAux(C, CE, CE->getArg(*I), UndefinedVal(),
C.getState());
+ C.addTransition(state);
+ }
+ if (count == 0) {
+ const GRState *state = MallocMemAux(C, CE, UndefinedVal(),
UndefinedVal(),
+ C.getState());
+ C.addTransition(state);
+ }

What does it mean to have multiple size fields? There can still only be
one return value. Maybe you should only allow zero or one positions for
ownership_returns (which would make this section simpler).

(MallocChecker::PreVisitReturnStmt)
+ if (RS->isReleased()) {
+ ExplodedNode *N = C.GenerateSink();
+ if (!BT_UseFree)
+ BT_UseFree = new BuiltinBug("Use dynamically allocated memory
after"
+ " it is freed.");

+ BugReport *R = new BugReport(*BT_UseFree,
BT_UseFree->getDescription(),
+ N);
+ C.EmitReport(R);
+ }

This section checks whether the return value is a released address.
Unfortunately this isn't a 100% unambiguous case; consider the following
(contrived) example:

struct Node *freeSmallest () {
  struct Node *min;
  // find the minimum node
  free(min);
  return min; // so the caller knows the ID of the deleted node
}

As long as the caller doesn't dereference the returned pointer, it could
still be a useful value. 95% of the time, this check would be useful, but
since a false positive is worse than a false negative for the analyzer, we
should probably leave it out.

Oops, forgot to reply-all…

clang-ownership-withtests-r1.patch (30.7 KB)

Hi Andrew,

In addition to Jordy’s excellent comments, here are my comments on your patch. Comments inline:

Index: test/Analysis/malloc.c

A few last comments (along with what Ted had to say). Also, I second being
able to put the attributes on the arguments themselves. (Why didn't the GCC
people do nonnull like that?)

Here's the dispatch from EvalCallExpr(). I think even though it's
technically a little less efficient, you should just simplify this and
use
FD->getAttr<OwnershipTakesAttr>(), etc. It's a little simpler, and it
lets
you order the attributes.

However, I did it this way because I wanted to allow two or more

attributes

of the same kind (for the benefit of macros).

In other words, __attribute((ownership_holds, 1))
__attribute((ownership_holds, 2)) should be the same
as __attribute((ownership_holds, 1, 2)).

Ah, good point. Okay, as long as return values are still handled (see
below).

Why does that matter? Because an ownership_takes() or ownership_holds()
function might have a return value! If it's also ownership_returns(),
then
you're fine, but otherwise you need to set a symbolic return value.

(You

can see how this is done in MallocMemAux(), or in StreamChecker.cpp

with

fopen.) There should probably be a test for this as well.

It isn't necessarily the case that an ownership_takes() function that
returns a pointer generated that pointer from an ownership_takes

argument,

or allocated anything, so what would you set the region to, in the

absence

of other information? They default to Unknown, yes?

I believe so, but for return values from function calls we prefer to use a
"conjured symbol value". This is a kind of value that can be reasoned about
symbolically in later statements -- for example, we can bound its value if
it's an integer, or mark that it's null or non-null if it's a pointer.

What does it mean to have multiple size fields? There can still only be
one return value. Maybe you should only allow zero or one positions for
ownership_returns (which would make this section simpler).

The intention here was to implement something for calloc-like functions

and

matrix allocators; one argument is a special case, more than one the

size

of
the region is the product of the arguments times sizeof(pointee of the
return value), taking sizeof(void) as 1. If it isn't worth doing that,
fair
enough, but that's what I was thinking this could lead to.

It's a good idea, but consider the following examples:

Node *getNode (size_t extra) {
  return malloc(sizeof(Node)+extra);
}

Node *getNode (size_t total) {
  assert(total >= extra);
  return malloc(total);
}

Node *getNodes (size_t count) {
  return malloc(sizeof(Node)*count);
}

Your proposal is to use the second or third meaning, based on whether one
or more arguments are included. I think we should just pick one meaning and
stick with it (i.e. not special-case the one-argument case). Or just take
it out, though as long as we allow the zero-argument case it doesn't hurt
anyone to handle arguments in one particular way.

However, the parser only allows the one argument at the moment, so this

is

FWIW, Andrew replied privately to me with,

"I agree, but so far as I can see that is not possible presently."

which strikes me as a good reason not to do it until Sean's done
making it possible.

I think the checker itself is general enough to handle this… if we find one attached to an argument, we could just generate the index in the attribute parser.

Hmm, but which is cleaner and clearer… have the checker know that arguments might be attributed as well, or have the parser push attributes from arguments onto the function? The latter is probably less code, but might be rather obscure. On the other hand, that whole attribute parser is pretty obscure, and is full of 'fixme’s saying it deserves redesigned.

BTW, all the private replies are unintentional, I’m getting used to some new mail clients at present.

Andrew

I don't think we want to duplicate attributes; attributes on the ParmVarDecls should only be on the ParmVarDecls, unless of course we want to incorporate the the attributes into the function's type. The AST should have high fidelity to the actual source code.

That said, I don't think the checker should be in the business of having to reason about how the function was annotated. The checker just cares about the semantics, while it's up to the parser and the attributes to handle the syntax.

One possible solution is to provide a static function(s) to OwnershipAttr et al that allows one to query a FunctionDecl for the relevant annotations. Then the checker just cares if a parameter is annotated, but it doesn't care how it was written.

So, thinking about this led me to this idea:

These attributes actually have minimal differences so far as the checker is concerned. So how about there being only one attribute object type, probably an OwnershipAttr, with an enum kind embedded in it, that can be applied to an argument. Each one would apply to just one argument or the return value, but may have several arguments (the so far unimplemented calloc case will need this, as will pointer-to-pointer malloc). The parser would have to construct two attribute objects for __attribute((ownership_holds, 1, 2)). That way the dispatch can be a case statement on the enum rather than the current mess of casts and tests. This would also save some of the methods from being templates, which I think would be unnecessary complexity.

Or, again in order to reduce the duplication, I could make templates of some of the methods and use the type to represent the attribute kind the same way as I have so far. Disadvantage: this is probably going to bloat.

Andrew

I'm fine with having one attribute with an enum, but I don't think it is mutually exclusive from having subclasses. You can keep all the key logic in one class, and for clients that want to do queries that are specific to a particular attribute they can do downcast. For example:

class OwnershipAttr {
  enum OwnershipKind { Holds, Takes, ... };
  OwnershipKind OKind;
...
};

class OwnershipReturnsAttr : public OwnershipAttr {
public:
  OwnershipReturnsAttr() : OwnershipAttr(Returns) {}

  static bool classof(OwnershipAttr *A) { return A->OKind == Returns; }
};

This is basically the same idiom we use with Stmts and Exprs in Clang's ASTS. This gives you the best of both worlds. You can use the 'enum' when you want switch on the ownership kind, but still can use the classes to encapsulate the relevant logic and add APIs that are only meaningful for a particular attribute. For example, consider the main analysis function in the static analyzer. It is littered with case statements like:

   case Stmt::BlockExprClass:
      VisitBlockExpr(cast<BlockExpr>(S), Pred, Dst);
      break;

The strength of this approach is you get strong-typing when you want it, but it doesn't add a burden if you don't need it (e.g., if you only care that something is a OwnershipAttr* instead of a OwnershipReturnsAttr*).

Ok, so I implemented what I think is all the suggestions…

The patch is around 100 lines shorter, so that’s a definite improvement.

I’m not sure this is as minimal as it could be.

While I was there, I also changed the use of std::sort in the nonnull parsing, since that’s where I picked it up from in the first place.

Andrew

clang-ownership-withtests-r2.patch (26.7 KB)

So, I’m still not sure I understand what you mean me to do here… could you elaborate?

Since you seem to be the symbolic value expert, I’ll ask about the case I’m trying to solve now. I added some symbolic checks to the free handling, similar to realloc, to prevent warnings in the common case of error handling for allocation handlers.

So now this works:

char * __attribute((ownership_returns(malloc))) foo(void) {
char textString = malloc(128sizeof(char));
if(textString == NULL)
return NULL; // No warning here
return textString;
}

But now I have this:

struct it {
char * s;
};

struct it * __attribute((ownership_returns(malloc))) foo(void) {
struct it *rv = malloc(sizeof(struct it));
if (!rv)
return NULL; // Does not warn here.
char textString = malloc(128sizeof(char));
if(textString == NULL)
free(rv);
return NULL; // Warns about a memory leak here
rv->s = textString;
return rv; // Does NOT warn here
}

>> Why does that matter? Because an ownership_takes() or
>> ownership_holds()
>> function might have a return value! If it's also

ownership_returns(),

>> then
>> you're fine, but otherwise you need to set a symbolic return value.
(You
>> can see how this is done in MallocMemAux(), or in StreamChecker.cpp
with
>> fopen.) There should probably be a test for this as well.
>
>
> It isn't necessarily the case that an ownership_takes() function that
> returns a pointer generated that pointer from an ownership_takes
argument,
> or allocated anything, so what would you set the region to, in the
absence
> of other information? They default to Unknown, yes?

I believe so, but for return values from function calls we prefer to

use

a
"conjured symbol value". This is a kind of value that can be reasoned
about
symbolically in later statements -- for example, we can bound its value
if
it's an integer, or mark that it's null or non-null if it's a pointer.

So, I'm still not sure I understand what you mean me to do here... could
you
elaborate?

Since you seem to be the symbolic value expert, I'll ask about the case

I'm

trying to solve now. I added some symbolic checks to the free handling,
similar to realloc, to prevent warnings in the common case of error
handling
for allocation handlers.

Sorry, I should have been clearer! If you take a look at MallocMemAux,
you'll see these lines:

  unsigned Count = C.getNodeBuilder().getCurrentBlockCount();
  SVal RetVal = ValMgr.getConjuredSymbolVal(NULL, CE, CE->getType(),
Count);
  state = state->BindExpr(CE, RetVal);

This sort of code is used for any function call that returns a value. It's
better than just leaving the expression as UnknownVal because it lets us
track assumptions about the value from then on.

Consider this code:

int freeNode (Node *n) __attribute__(ownership_takes(1)) {
  int id = n->id;
  free(n);
  return id;
}

// in some other function
int x = freeNode(last);
if (x < 0) {
  log("invalid node: %d", x);
  return;
}
// from here on x is known to be positive

So all you would need to add is those three lines, for the case in which
you /are/ handling the function (returning true from EvalCallExpr), /not/
calling MallocMemAux (which is already handling the return value), and the
function's return type isn't void.

Alternately, go ahead and make the ownership_takes/ownership_holds cases
part of a new PreVisitCallExpr method. PreVisitCallExpr doesn't require you
to handle the return value, and allows you to stack multiple checks. This
is probably more correct anyway -- it would let users annotate methods that
may eventually need another checker to actually do the evaluation.

As for the example you showed, it should work, except:

struct it * __attribute((ownership_returns(malloc))) foo(void) {
  struct it *rv = malloc(sizeof(struct it));
  if (!rv)
    return NULL; // Does not warn here.
  char *textString = malloc(128*sizeof(char));
  if(textString == NULL)
    free(rv);
    return NULL; // Warns about a memory leak here
  rv->s = textString;
  return rv; // Does NOT warn here
}

...the code is just missing braces around the second if -- the second
"return NULL" is unconditional!

Clang should catch this. Filing a bug. *grin*

Ok, I get what you’re saying, PreVisit seems the right answer.

struct it * __attribute((ownership_returns(malloc))) foo(void) {
struct it *rv = malloc(sizeof(struct it));
if (!rv)
return NULL; // Does not warn here.
char textString = malloc(128sizeof(char));
if(textString == NULL)
free(rv);
return NULL; // Warns about a memory leak here
rv->s = textString;
return rv; // Does NOT warn here
}

…the code is just missing braces around the second if – the second
“return NULL” is unconditional!

Clang should catch this. Filing a bug. grin

D’oh!

So looking at this version:

void __attribute((ownership_returns(malloc))) foo2(void) {
struct it *rv = malloc(sizeof(struct it));
if (!rv)
return NULL;
char textString = malloc(128sizeof(char));
if(textString == NULL) {
free(rv);
return NULL;
}
rv->s = textString;
return rv; // warns of a leak here
}

How could I make the assignment before the final return relinquish ownership of the pointer?

Andrew

CheckerVisitor also supports PreVisitBind (which is callback that occurs before the RHS gets bound to the LHS). You can use that to monitor ownership transfer. We can also add PostVisitBind if that would be useful.

That said, what are the semantics of the ownership algorithm? Does a leak get flagged here, or does the escape of the value to a field silence the warning?

FWIW, ownership checking in the presence of data containers has been researched quite a bit. Here’s some off-hand references that might be useful:

Static Detection of Leaks in Polymorphic Containers, ICSE 2006
http://suif.stanford.edu/~dlheine/icse06-preprint.pdf

A practical flow-sensitive and context-sensitive C and C++ memory leak detector
http://portal.acm.org/citation.cfm?doid=781131.781150

So, prior to deducing ownership annotations (which I think I see how to do now, for non-pathological code), here’s my latest version of the attributes.

The PreVisitBind implements the same algorithm as already used by the Objective C ownership checker: if the pointer escaped from this scope by assignment, let it go. However, assigning to fields of a stack-storage structure does not transfer ownership.

The remaining issue is still that void foo(void ** it) {it=malloc(42);} warns. How would I check for assignment to a pointee of an argument in PreVisitBind?

This is a git diff, if that won’t apply I have plenty of options for regenerating it. (As an aside, why isn’t the project using git?)

Andrew

clang-ownership-pointers.patch (31.7 KB)

The remaining issue is still that void foo(void ** it) {it=malloc(42);} warns.

Do you mean void foo(void ** it) {*it=malloc(42);} ?

Because the one you mentioned should warn. :wink:

-Alexei

Yes, I did of course mean that :slight_smile:

However… never mind, I’ve sorted that case out myself. Revised patch attached.

clang-ownership-pointers2.patch (32 KB)

I can better answer this question if I knew your desired semantics. Does it have to do with 'it' being a 'void**', that it is a function parameter, or both?