[Analyzer] Pointer escape vs. pointer invalidation on function call

Hi all,

I have some questions about CSA invalidation behaviour for the case where some arguments can escape after call.

1. There is a condition in CallEvent::invalidateRegions():

   if (!argumentsMayEscape())
     findPtrToConstParams(PreserveArgs, *this);

The contents of PreserveArgs changed by findPtrToConstParams() is used later for setting a special invalidation trait for its items: TK_PreserveContents. But, as I understand, if some pointer passed to function can escape, all the pointers passed to function get invalidated independently on can they escape or not. Why we don't just filter the escaping regions and invalidate them but invalidate all the pointers instead?

2. For AnyFunctionCall, we think that void* arguments of can escape:

   if (CallEvent::argumentsMayEscape() || hasVoidPointerToNonConstArg())
     return true;

But because of (1), this means that all other pointers passed to such function (including pointers to const) are invalidated. Checkers that use argumentsMayEscape() method explicitly check that the call is located in system header. So, should we move the check for system header into argumentsMayEscape()? It looks like the commit that introduced this behaviour was targeting system header functions only. And should we avoid the invalidation of pointers to constant memory if some pointer argument can escape?

+Devin.

Yep, this behavior can be improved quite a bit. It's indeed overly conservative to believe that a single escaping pointer argument would cause other const pointer arguments' pointed-to regions invalidated.

argumentsMayEscape() should still be relied upon, it seems, because it contains a nice blacklist of weird functions, eg. c++ - What is the purpose of const pointer to void - Stack Overflow

However, making argumentsMayEscape() more fine-grained, eg. letting it provide a list of arguments that may escape, should be quite an improvement.

Additionally, when structures are passed into functions, we could consider const-qualifiers on pointer fields in these structures and avoid invalidating through pointers stored in const-pointer fields - might be an improvement as well. Direct parameters are not special.

Possible aliasing between arguments should still be taken into account, eg. tests with testMixedConstNonConstCalls() added in https://reviews.llvm.org/D19057 should remain "UNKNOWN". So i'd imagine a two-pass procedure, where on the first pass we mark base regions of all const pointers passed into the function with TK_PreserveContents, and on the second pass we remove TK_PreserveContents from base regions of non-const pointers.

Finally, i agree that argumentsMayEscape() makes sense for plain C system library functions only. It should ideally check that the function is within the standard library, and is a plain-C function, and avoid jumping to conclusions if this is not the case.

25/05/2017 8:07 PM, Aleksei Sidorin wrote:

+Devin.

Yep, this behavior can be improved quite a bit. It's indeed overly
conservative to believe that a single escaping pointer argument would cause
other const pointer arguments' pointed-to regions invalidated.

argumentsMayEscape() should still be relied upon, it seems, because it
contains a nice blacklist of weird functions, eg.
c++ - What is the purpose of const pointer to void - Stack Overflow
ose-of-const-pointer-to-void/7726359

However, making argumentsMayEscape() more fine-grained, eg. letting it
provide a list of arguments that may escape, should be quite an improvement.

Additionally, when structures are passed into functions, we could consider
const-qualifiers on pointer fields in these structures and avoid
invalidating through pointers stored in const-pointer fields - might be an
improvement as well. Direct parameters are not special.

Possible aliasing between arguments should still be taken into account,
eg. tests with testMixedConstNonConstCalls() added in
https://reviews.llvm.org/D19057 should remain "UNKNOWN". So i'd imagine a
two-pass procedure, where on the first pass we mark base regions of all
const pointers passed into the function with TK_PreserveContents, and on
the second pass we remove TK_PreserveContents from base regions of
non-const pointers.

Finally, i agree that argumentsMayEscape() makes sense for plain C system
library functions only. It should ideally check that the function is within
the standard library, and is a plain-C function, and avoid jumping to
conclusions if this is not the case.

We do rely on argumentsMayEscape() to support other languages,
specifically, there is ObjC specific code. I suspect, we would benefit from
reasoning about C++ as well, but currently, C++ method decls overloads are
not implemented. I suspect we need to do more investigation on libc++.

As Alexey pointed out, argumentsMayEscape() is used/called mainly after we
know that the function comes from the system header. I do not recall if
there are cases where it is needed for non-system calls, but we could
envision having heuristics and treating some non-system calls with specific
names the same way we treat system calls.

Note, that the implementation itself does include generic heuristics,
applicable to any functions, not just system functions. For example, it
call:
/// \brief Returns true if any of the arguments appear to represent
callbacks.
  bool hasNonZeroCallbackArg() const;

Further investigation would be great here as that code is somewhat
convoluted!

30/05/2017 10:12 PM, Anna Zaks wrote:

    +Devin.

    Yep, this behavior can be improved quite a bit. It's indeed overly
    conservative to believe that a single escaping pointer argument
    would cause other const pointer arguments' pointed-to regions
    invalidated.

    argumentsMayEscape() should still be relied upon, it seems,
    because it contains a nice blacklist of weird functions, eg.
    c++ - What is the purpose of const pointer to void - Stack Overflow
    <c++ - What is the purpose of const pointer to void - Stack Overflow;

    However, making argumentsMayEscape() more fine-grained, eg.
    letting it provide a list of arguments that may escape, should be
    quite an improvement.

    Additionally, when structures are passed into functions, we could
    consider const-qualifiers on pointer fields in these structures
    and avoid invalidating through pointers stored in const-pointer
    fields - might be an improvement as well. Direct parameters are
    not special.

    Possible aliasing between arguments should still be taken into
    account, eg. tests with testMixedConstNonConstCalls() added in
    https://reviews.llvm.org/D19057
    should remain "UNKNOWN". So i'd imagine a two-pass procedure,
    where on the first pass we mark base regions of all const pointers
    passed into the function with TK_PreserveContents, and on the
    second pass we remove TK_PreserveContents from base regions of
    non-const pointers.

    Finally, i agree that argumentsMayEscape() makes sense for plain C
    system library functions only. It should ideally check that the
    function is within the standard library, and is a plain-C
    function, and avoid jumping to conclusions if this is not the case.

We do rely on argumentsMayEscape() to support other languages, specifically, there is ObjC specific code. I suspect, we would benefit from reasoning about C++ as well, but currently, C++ method decls overloads are not implemented. I suspect we need to do more investigation on libc++.

Yep. My point was mostly about the fact that in AnyFunctionCall::argumentsMayEscape() (which i incorrectly referred to as argumentsMayEscape()) we check the identifier of the function, but we don't check if it's a plain C function (a C++ method may have the same identifier, and we're not checking that it's not a method; overrides are not necessary to break the contract of this method, just add some classes with methods of given names; ObjC messages cannot make it into AnyFunctionCall::argumentsMayEscape() because ObjC message isn't any function) or if it comes from a system library (probably an ObjC library in case of NS*Insert*, but it's still a C-style function, not a method, not a message), so the code in this function doesn't make much sense unless it is called in the context in which we have already checked all this stuff.