[analyzer] MallocChecker & checker dependencies

Hi there!

While thinking about how it would be possible to implement various smart ptr related checkers I tried to review current state of MallocChecker and came up with that it would be great to have RegionState info available to other checkers. Could you please share your points of view and comments on the following statements / questions:

  1. Is there any right way for chaining checkers? How they are expected to communicate between each other (excluding generation of nodes / ProgramStates). I’ve heard that there are couple of problems caused by inlining functions, constructors / descructors.

  2. What do you think about moving RegionState to the Core of CSA or providing optional extended info in MemRegion about the source of such region (new opearator / array new, malloc, alloca, etc). So it would be available to all checkers.

  3. Is there any roadmap for CSA and especially for dynamic memory management modeling & related checkers?

Regards, Alexey K

Hi Alexey!

Could you share your ideas of checker design? It is possible that problems you met can be solved in different (maybe even better) way if we know the whole picture.
Regarding your questions, you can find some answers below.

1. a) The first way for checker communication is to share their program state trait. You can see an example of how state trait is exported in GenericTaintChecker or MPIChecker. Generally, you just create a named ProgramStateTrait in the header. You can take a look at TaintManager.h and MPITypes.h and how they are used.
b) To set a dependency from another checker, you can just register it while registering your checker. An example can be found in MallocChecker where register$Checker also calls registerCStringCheckerBasic to register a checker it depends on.
As you pointed, inter-checker communication can become a source of some problems. Most of them are discussed in this conversation: http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html

2. I think there is nothing bad in sharing RegionState across checkers in the way shown in 1a.

3. Artem Dergachev has done some excellent work on improvement of operator 'new' processing in CSA engine. Regarding checkers, I can see some on https://clang-analyzer.llvm.org/potential_checkers.html: for example, undefbehavior.AutoptrsOwnSameObj. You can search this list to find more.

15.03.2018 23:54, Alexey Knyshev via cfe-dev пишет:

Dynamic memory management is pretty much done at this point - we have very good support for malloc()-like stuff and reasonably good support for operator new() (support for operator new() is currently missing because it requires calling an indefinite number of constructors which is not yet implemented). There is also a known issue with custom operator new(...) returning null pointers - in this case we should not be calling the constructor but for now we evaluate the constructor conservatively.

Your real problem will be managing smart pointers themselves because they are C++ objects that have a myriad of methods, they can be copied, moved, assigned, move-assigned, they destroyed, they lifetime-extended, you need to keep all of this in mind when writing a checker. This is slowly getting easier because i'm currently working on that, but until recently it wasn't working correctly in the core, let alone checkers, which is why the few experimental C++ checkers that we currently have required heavy hacks to become possible. But for now you'd rather keep an eye on these problems.

Hello guys,

And thanks for your attention

Aleksei,

Could you share your ideas of checker design? It is possible that problems you met can be solved in different (maybe even better) way if we know the whole picture.

As I said, I’m interested in improving current state of dynamic memory modeling. Especially stuff related to smart pointers (e.g. shared, unique) which also mentioned in list of potential checkers as smartptr.SmartPtrInit

You can see an example of how state trait is exported in GenericTaintChecker or MPIChecker. Generally, you just create a named ProgramStateTrait in the header.

Briefly looked at MPITypes.h. Does it mean we should move RegionState to separate file and register it via Traits in the same manner to make it avaliable from other checkers (not other TU as mentioned in ento::mpi::RequestMap)?

Artem,

you need to keep all of this in mind when writing a checker.

Sure, but on the other hand I think it’s possible to implement and improve modeling of various API calls’ effects step-by-step. Let’s say, it case of SmartPtrInit checker mentioned before the bare minimum would be modeling of construction (and destruction to avoid leak report from existing related checkers).

which is why the few experimental C++ checkers that we currently have required heavy hacks to become possible. But for now you’d rather keep an eye on these problems.

Does it mean it’s better to wait a bit for Core improvements from main contributors? Does it make sense to make an efforts to implement SmartPtrItnit checker in current state of things?

Thanks in advance and regards,

Alexey K

Sorry, accidentally removed part of message before sending.

Aleksei,

Could you share your ideas of checker design? It is possible that problems you met can be solved in different (maybe even better) way if we know the whole picture.

I’m currently have no clear idea how to implement it in the “right manner”. But first of all I would aim to track the origin (malloc, new, new , etc) of SVal and check if Deleter is the expected corresponding way to deallocate such SVal.

Regards, Alexey K

Sorry, accidentally removed part of message before sending.

Aleksei,

    Could you share your ideas of checker design? It is possible that
    problems you met can be solved in different (maybe even better)
    way if we know the whole picture.

I'm currently have no clear idea how to implement it in the "right manner". But first of all I would aim to track the origin (malloc, new, new , etc) of SVal and check if Deleter is the expected corresponding way to deallocate such SVal.

This should already be supported by MallocChecker - i.e. it warns when you allocate memory with, say, new() and release it with free() or delete().

If methods of a smart pointer are "inlined" during analysis, these bugs will already be caught automatically.

Regards, Alexey K

2018-03-16 22:22 GMT+03:00 Alexey Knyshev <alexey.knyshev@gmail.com <mailto:alexey.knyshev@gmail.com>>:

    Hello guys,

    And thanks for your attention

    Aleksei,

        Could you share your ideas of checker design? It is possible
        that problems you met can be solved in different (maybe even
        better) way if we know the whole picture.

    As I said, I'm interested in improving current state of dynamic
    memory modeling. Especially stuff related to smart pointers (e.g.
    shared, unique) which also mentioned in list of potential checkers
    as smartptr.SmartPtrInit
    <https://clang-analyzer.llvm.org/potential_checkers.html&gt;\*

    *

        **You can see an example of how state trait is exported in
        GenericTaintChecker or MPIChecker. Generally, you just create
        a named ProgramStateTrait in the header.

    Briefly looked at MPITypes.h. Does it mean we should move
    RegionState to separate file and register it via Traits in the
    same manner to make it avaliable from other checkers (not other TU
    as mentioned in ento::mpi::RequestMap)?

GenericTaintChecker is made in a fairly intrusive manner by putting stuff into the program state class directly, which isn't very sane. I'd definitely prefer something similar to dynamic type propagation (see DynamicTypeMap.cpp, DynamicTypeMap.h). You can use the usual REGISTER_MAP_WITH_PROGRAMSTATE (etc.) macros in the .cpp file as long as you provide accessor methods declared in the .h file that other checkers could include.

    Artem,

        you need to keep all of this in mind when writing a checker.

    Sure, but on the other hand I think it's possible to implement and
    improve modeling of various API calls' effects step-by-step. Let's
    say, it case of SmartPtrInit checker mentioned before the bare
    minimum would be modeling of construction (and destruction to
    avoid leak report from existing related checkers).

        which is why the few experimental C++ checkers that we
        currently have required heavy hacks to become possible. But
        for now you'd rather keep an eye on these problems.

    Does it mean it's better to wait a bit for Core improvements from
    main contributors? Does it make sense to make an efforts to
    implement SmartPtrItnit checker in current state of things?

You should feel free to start working on the checker in an incremental manner (everything is better in an incremental manner!), but in some cases i may prefer improving the checker API or fixing core modeling bugs instead of writing large portions of checker code to work around inconvenient APIs or bugs, and this is something i might not be immediately capable of doing myself (due to being busy with other things), which may potentially delay your progress.

This should already be supported by MallocChecker - i.e. it warns when you allocate memory with, say, new() and release it with free() or delete.

If methods of a smart pointer are “inlined” during analysis, these bugs will already be caught automatically.

I took a look at your recent work & status report posted there in mailing lists. And I have a question about how it would work in particular cases, e.g.:

  1. Returning smart_ptr by value from function that has it’s implementation in analyzed translation unit but it isn’t called from any other function in the same TU.

std::shared_ptr createInstance() {
return std::shared_ptr(new S, free);

}

If I understand correctly, compiler is free to apply copy-elison optimization on shared_ptr which is returned by value. So destructor call won’t be modeled properly because it’s not inlined during analysis. And bug won’t be caught.

  1. As you mentioned before, report would be fired only if constructor and destructor have been inlined (which is not guaranteed to be done). As the result such check won’t work in general case.

Actually my point is about modeling smart pointers behavior specifically and don’t inline any calls to constructor / destructor & methods those modeled by checker. And the question immediately arises: is there way to control inline policy?

And sorry for any possible misunderstanding from my side as I’m not quite familiar with CSA internals.

Regards, Alexey K

A bit out of scope question. Has someone ever faced problem with declaring/implementing ento::register${CHECKER_NAME} function (if I recall correctly Aleksei already has). I don’t understand what’s wrong with it, but I have to wrap it into ‘namespace clang { namespace ento { … } }’ explicitly which probably works only because linker finds it somehow. In addition I have no chance to register any other checker as the dependency via calling register${OTHER_CHECKER_NAME} even if I add appropriate includes. Thanks in advance!

Sample clang output:

error: ‘void clang::ento::registerSmartPtrInitChecker(clang::ento::CheckerManager&)’ should have been declared inside ‘clang::ento’

Regards, Alexey K

    This should already be supported by MallocChecker - i.e. it warns
    when you allocate memory with, say, new() and release it with
    free() or delete().

    If methods of a smart pointer are "inlined" during analysis, these
    bugs will already be caught automatically.

I took a look at your recent work & status report posted there in mailing lists. And I have a question about how it would work in particular cases, e.g.:
1. Returning smart_ptr by value from function that has it's implementation in analyzed translation unit but it isn't called from any other function in the same TU.

std::shared_ptr<S> createInstance() {
return std::shared_ptr<S>(new S, free);
}

If I understand correctly, compiler is free to apply copy-elison optimization on shared_ptr which is returned by value. So destructor call won't be modeled properly because it's not inlined during analysis. And bug won't be caught.

Indeed, the tempting - but risky - thing to do here is to declare construction of "std::shared_ptr<S>(new S, free);" to be a bug on its own. It is risky because it leaves us with a checker that finds non-bugs; there is a chance that the destructor will never be called while the pointer is still being owned by the shared_ptr. Such code would be ridiculous but at the same time we wouldn't like to drive away users who had to write this to work around an equally ridiculous proprietary API that they have no way of changing.

In this particular case i'd have tried to take the risk, keeping in mind that i might end up rewriting the checker back to the conservative behavior of only warning at the destructor.

Note that copy elision is irrelevant, because in any case we won't free() the pointer at the return site, because reference count will never reach zero within the function. That's the whole point of the shared pointer.

2. As you mentioned before, report would be fired only if constructor and destructor have been inlined (which is not guaranteed to be done). As the result such check won't work in general case.

Yeah, that's pretty much the primary motivation for making an API-specific checker. It's not super strong in this case because this would likely give a very slight increase over the existing coverage.

In this case, however, the much more important motivation is to avoid false positives that occur when we don't know the original reference count of a shared pointer that was constructed before analysis has started - and we start "assuming..." that any destructor of a shared pointer could release memory. This is currently planned to be suppressed in a fairly gross manner (https://reviews.llvm.org/D44281). For this reason we believe that having explicit modeling for various smart pointers is a must.

Actually my point is about modeling smart pointers behavior specifically and don't inline any calls to constructor / destructor & methods those modeled by checker. And the question immediately arises: is there way to control inline policy?

It is indeed possible to make the checker "take over" function call modeling by subscribing on the `eval::Call` callback. In this case both inlining and conservative evaluation by the core are avoided, and the checker who evaluates the call is responsible for modeling *all* effects of the call (because two checkers cannot evalCall the same call, even though they can do the usual checkPreCall/checkPostCall thingy as much as they want).

There is also the BodyFarm mechanism (which allows you to mock simplified ASTs for standard functions) but you DO NOT WANT to use it for C++ because it'd take you forever to construct the correct C++ AST with templates manually.

Hi all,

During last few days I was working on std::shared_ptr constructor modeling and trying to understand how path sensitive analysis works. After some time I faced unexpected modelling of Call arguments SVals. Here is the portion of code that caused analyzer behavior that I don’t understand:

auto del = std::default_delete();
return std::shared_ptr((S*)malloc(sizeof(S)), del);

The case I observed: first argument SVal is modeled properly and I can get RegionState info from program state (provided by MallocChecker). The second one (del symbol) is unknown (DelSVal.dump() shows me that). On the other hand, the following code doesn’t cause such problem and prints ‘&code{free}’ as expected:

const auto &del = free;
return std::shared_ptr((S*)malloc(sizeof(S)), del);

So, the question is: what can cause such imprecise modeling of SVal (Deleter - second constructor arg)?

P.S. I’ve changed shared_ptr constructor declaration in ‘test/Analysis/Inputs/system-header-simulator-cxx-std-suppression.h’:

shared_ptr(_Tp* __p, void(deleter)(void))

to

template

shared_ptr(_Tp* __p, Deleter d)

and I don’t understand why it was declared with function pointer argument instead template. But looks like it isn’t the source of my problem at first sight.

Regards, Alexey K

Hi all,
Any ideas on it?

Regards, Alexey K

“Well,” said Owl, “the customary procedure in such cases is as follows:” 1. View the exploded graph in GraphViz (as described in ). 2. Find the top-most node on it that has incorrect or missing info. 3. Does the node have a checker tag at the bottom? If so, find the respective code that adds transitions in the checker. 4. Otherwise see what statement is being evaluated at this node. The code you’re interested in lies somewhere within ExprEngine::Visit() for the respective statement kind. 5. Put a breakpoint in the code you’ve found and debug step-by-step until you see the problem. Exploded graph is quite overwhelming at first, but it’s not as hard as it looks. In your case the call argument is an UnknownVal - which means that you know that we couldn’t evaluate argument expression. If you’re not sure what kind of expression it is, consult the -ast-dump. The (last) node that evaluates that expression would not contain the value for the expression (because UnknownVal is dropped in Environment - but not in Store). If any of its sub-expressions are also incorrectly dropped, you might want to look at them first. You should only use these simulators for regression testing. They’re simplified standard library headers that try to be as accurate as possible but don’t guarantee that. After all, For your actual development, you should use the real system headers, because that’s what the users will care about, and then try to update the simulators to simulate them as closely as you need.