Google Summer of Code 2018

Dear All,

I’m Réka Kovács, a final-year M.S. student from Eötvös Loránd University, Budapest, and I would love to work on a Clang SA-related GSoC project this summer.

I’ve been working on static analysis for the past half a year and started meddling in Clang by submitting a few patches:

  • 3 Clang-Tidy checks [1][2][3],

  • a Clang SA check [4],

  • a diagnostic flag extension [5][6], and

  • a tiny tweak in the core [7].

I’m currently studying constraint solving issues in symbolic execution as part of a university project, and plan to continue with a PhD focusing on Clang-related stuff.

I was initially most interested in the Z3 integration project, but I’ve noticed that Mikhail has applied already. Creating a checker for dangling string pointers would also be an interesting challenge, so I’d like to express my enthusiasm for that project.

The main goal for me would be to get more comfortable with the inner workings of the analyzer and to learn as much as possible along the way.

I’m also open to any other suggestions, so please be so kind as to share your thoughts with me.


[1] bugprone-suspicious-memset-usage:
[2] bugprone-undefined-memory-manipulation:
[3] bugprone-integer-division:
[4] alpha.cplusplus.DeleteWithNonVirtualDtor:
[5] -Wenum-compare:
[6] -Wenum-compare-switch:
[7] model unrepresentable left shifts:

Hey, welcome!

First of all, it's great that you let us know about your interest in the Z3 integration project. It might be puzzling for us to come up with the final arrangement, given that the project doesn't seem easy to split up for cooperative work, but a lot of things may change by the time everything is settled, and your enthusiasm is an important piece of the puzzle!

We haven't come up with other exciting project ideas so far, but the list of projects is definitely not set in stone. I'll let you know if anything shows up, and please feel free to share your ideas on how the analyzer could be improved or what features you want it to have :) I guess I'll explain the other project a little bit, for completeness.

The use-after-free-like checker for values managed by temporary objects should be an easier and more straightforward project than Z3, but there are quite a lot of unknowns here as well. The internals of std::string and other similar classes are too hard for the analyzer's generic use-after-free checker to understand (partially because of the lack of a good solver, but mostly because STL's internal invariants are hard to track and not all of the code is necessarily present in the header), so an API-specific checker seems to be necessary. The original plan we had in mind was to keep track of dangerous values like str.c_str() in the program state (similarly to how SimpleStreamChecker tracks file descriptors) and then see if any of them are still present in memory at the end of the original value's lifetime (similarly to how the StackAddrEscape checker finds stack pointers at the end of a function's stack frame).

With this description and your knowledge, you'd probably be able to think of how the checker might be implemented (and whether it's of interest to you) - though also feel free to ask if you have any questions! The unknowns here include:

  • how easy it would be to track scopes (for now we only track function scopes, but if the fairly old but recently reincarnated patches [1] and [2] land any time soon, we may get much better granularity),

  • how easy it would be to track objects when they are moved or lifetime-extended by binding to references, which was a large problem for other C++ object checkers, though we may work our way around it to some extent (or do it properly, depending on my current work outlined in [3] and in follow-up mails in February), and

  • how helpful inlining would be (e.g. would we be able to automagically support string_view-like classes by inlining their methods?).

So the checker would need an almost indefinite amount of incremental improvements once the initial prototype is done, some of which should be fairly interesting and would certainly expose you to some of the analyzer's internals.