[GSoC] Interested in idea: "Find null smart pointer dereferences with the Static Analyzer"

Greetings,

I am interested to participate in GSoC 2020. I am particularly interested in the project idea “Find null smart pointer dereferences with the Static Analyzer”. I am doing my masters in computer science and interested in program analysis and verification. I thought GSoC2020 will be a wonderful opportunity to learn more about Clang Static Analyzer and contribute.

I have started reading about smart pointers in C++ to get a good grasp of the concepts. Also, has some experience in implementing Clang Static Analyzer simple checks(similar to SimpleStreamChecker) from the tutorials. I read through few available tutorials and have some basic idea about Control Flow Graph, Exploded Graph and Symbolic Values. I have read the paper “A memory model for static analysis of C programs” to get some theoretical background. I also started looking into NullabilityChecker.cpp to understand the codebase.

I would like to know is this the right place to look?

Could anyone please help me on what should I do next?

Hey!

This is a tiny bit outdated, but very great booklet which is must have:
https://github.com/haoNoQ/clang-analyzer-guide

I believe you have missed that tutorial, and there is another cool checker which tries to model the inner pointer of a string, something similar idea to check the guts of a smart pointer:
InnerPointerChecker.cpp

That was another GSoC project: https://rnkovacs.com/gsoc2018/

Hi,

Thanks for the help!

I have followed this guide(https://github.com/haoNoQ/clang-analyzer-guide) also. Even though it was little outdated, it helped to learn about basic ideas like Exploded Graph and Symbolic Values. And with a little bit of googling I was able to implement the simple checker (to find a main call) given in the guide.
As you suggested, I will have a look at InnerPointerChecker.cpp implementation.

Hey!

Welcome. Let's see.

Nullability checker isn't the one that you're looking for. It's a different beast that governs hunt for null dereferences via so-called "nullability annotations". Like, a language extension is provided through which the programmer can tell the analyzer which variables / functions may or may not hold / produce null pointers, and the analyzer checks whether it makes sense how these nullable and non-null values propagate from one function to another. So it's the same problem but a different technique. It is targeted mostly at finding crashes in Objective-C apps that pass a lot of pointers around across many user-defined functions.

The proposed GSoC project is of a different nature: we want to teach the static analyzer about a very specific C++ API, but we want to teach it much more thoroughly. It's not enough to know that std::unique_ptr::operator->() may occasionally return a null pointer; we'd much rather know when exactly does it return a null pointer (eg., if the smart pointer is freshly default-constructed).

If you want to study existing checkers, check out:
- MoveChecker - the use-after-move checker which already finds *some* null smart pointer dereferences, given that they're guaranteed to be null after move.
- SmartPtrChecker currently does almost nothing, but that's probably where you put your code into :slight_smile:
- IteratorChecker is a large ongoing pioneer project to find iterator and container related bugs such as dereferencing vector.end(). It's the closest thing to what you'll be implementing, but its handling of C++ objects is outdated and overly complicated because some new facilities for C++ support (mostly the ones explained in the second half of 2018 LLVM Developers’ Meeting: A. Dergachev “Faster, Stronger C++ Analysis with the Clang Static..” - YouTube) weren't in place yet when it all started.

Once you understand the project a bit better and like it, the next step is to discuss here (in this mailing list) what is the best way to implement the checker. The ultimate outcome of this discussion will be a so-called "GSoC proposal". It's a few pages of text that you write, post here for more discussion, and eventually upload to the GSoC website. According to the GSoC timeline, the proposal should be submitted by the end of March. The proposal summarizes how *you* understand the project and how *you* plan to tackle it during the summer.

Good luck on your GSoC path!
Artem.

Hi Artem,

Thank you very much for this detailed information and help.
I will checkout the existing checkers you mentioned and try to get a better understanding of the problem.

Hello Artem,

I went through the checkers you suggested. I found this project seems interesting to me and I got a very basic idea about it.

I tried to find out few cases where unique_ptr::operator->() returns null apart from default constructed unique_ptr.
Case 1: Use of std::move on std::unique_ptr

It seems its already covered in the MoveChecker.
Case 2: Use after calling release() on std::unique_ptr

When I ran the analyzer for this scenario, it did produce any warnings
Case 3: Use up.reset() or up.reset(nullptr)

Similar to release() case it seems this case also not covered.
Case 4: Get raw pointer via std::unique_ptr.get() then delete

I am not sure about this case. It seems user explicitly trying to break the code.
Case 5: Use after swap(std::unique_ptr, null)

In case we swap a std::unique_ptr with another std::unique_ptr with pointing null.

I am guessing the list is not complete and this will be a first task, to figure out all possible cases.

And some what same we have to come up with for other smart pointers.

Regarding the implementation part, similar to move checker we have to keep a map for memory region and state (whether it is null or not).
States should be updated based on the changes in MemRegion. I was wondering is this the right way? (I know I still have to figure out lot of details regarding concrete implementations)

In case of default-constructed std::unique_ptr object, why can’t we get symbolic value as null and do a check same as what we are doing for raw pointer?
Is it because some limitations on tracking the symbolic values of std::unique_ptr objects?

Hello Artem,

I went through the checkers you suggested. I found this project seems interesting to me and I got a very basic idea about it.

I tried to find out few cases where unique_ptr::operator->() returns null apart from default constructed unique_ptr.
*Case 1: *Use of std::move on std::unique_ptr
It seems its already covered in the MoveChecker.
*Case 2:* Use after calling release() on std::unique_ptr
When I ran the analyzer for this scenario, it did produce any warnings
*Case 3: *Use up.reset() or up.reset(nullptr)
Similar to release() case it seems this case also not covered.
*Case 4:* Get raw pointer via std::unique_ptr.get() then delete
I am not sure about this case. It seems user explicitly trying to break the code.

No-no, that's not how C++ works :slight_smile: the smart pointer wouldn't be aware that the raw pointer is deleted, so it'll keep hosting the pointer and cause a use-after-free instead. We could warn about those as well though; it might turn out to be an easy addition once you get the checker running.

*Case 5:* Use after swap(std::unique_ptr, null)
In case we swap a std::unique_ptr with another std::unique_ptr with pointing null.

I am guessing the list is not complete and this will be a first task, to figure out all possible cases.
And some what same we have to come up with for other smart pointers.

Regarding the implementation part, similar to move checker we have to keep a map for memory region and state (whether it is null or not).
States should be updated based on the changes in MemRegion. I was wondering is this the right way? (I know I still have to figure out lot of details regarding concrete implementations)

Yup, I think that's the most solid and straightforward solution. Note that you will have to not only enumerate all situations when the smart pointer becomes null, but also all the situations when the smart pointer becomes non-null.

In case of default-constructed std::unique_ptr object, why can't we get symbolic value as null and do a check same as what we are doing for raw pointer?
Is it because some limitations on tracking the symbolic values of std::unique_ptr objects?

Manipulating symbolic values inside the smart pointer is indeed another possible solution. The annoying limitation that we run into here is that our memory model ("RegionStore") doesn't currently allow setting a "default" binding to a whole object when it's a part (say, a field) of a bigger object. This basically means that we have to understand how does the smart pointer work internally (which field corresponds to what) in order to manipulate its symbolic value, which ties us to a specific implementation of the C++ standard library. This might still work for a unique pointer which probably always has exactly one field, but for shared pointers things get complicated.

You can try to overcome this limitation of RegionStore if you're eager enough but that'll be challenging and potentially a lot of work. And even if there wasn't this limitation, this approach isn't necessarily much easier than the first approach.

Hi Artem,

Thank you very much for the help and explanation.

I have created a draft for my GSoC proposal and I am sharing the link with this mail.
Could you please have a look and let me know the feedback for the draft proposal?
Draft proposal: https://docs.google.com/document/d/1HORDm6cq3YolYTPGFLCF5gLjIyf6bygHeWzG1kd1T-g/edit?usp=sharing

The proposal looks good to me with your latest fixes! I encourage you to upload it to the GSoC website; their deadline is tomorrow. Thanks again for your interest :slight_smile:

Hi Artem,

I have uploaded the final proposal.
Once again, thank you very much for your help and support.