Static Analysis in Clang Round Table - Notes


Here are the notes from the static analysis round table at the 2022 LLVM Developers' Meeting.
Note that the round tables were a challenging environment: several of them ran at the same time in the same room, which made it really hard to hear. Some of the notes are based on participants' recollections, so apologies for any inaccuracies.

Attendees: Artem Dergachev, Yitzhak Mandelbaum, Bruno Lopes, Jan Korous, Kostya Serebryany, Paul Kirth, Prabhu Rajasekaran, Vince Bridgers, Devin Coughlin, Ziqing Luo, Balazs Benics (online), Gabor Horvath, and more…

First, we started discussing Apple’s proposal for bounds-safe C++.

Jan: Apple is working on hardening C++ code. The approach combines a hardened libc++ (dynamic checks) with FixIts that automatically modify code to use the hardened interfaces (e.g., span); the FixIts need to be conservative so they don’t risk introducing bugs. Apple is working to upstream both the tooling and the hardened libc++. Adoption is expensive and some cases can’t be automated, though most should be. The performance regression is worth it to avoid the security bugs. There are also new Clang Static Analyzer checks to find bugs in transformed code (e.g., a span created with the wrong size).
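
A minimal sketch of the kind of transformation described above, under stated assumptions: `CheckedSpan`, `sum_raw`, and `sum_checked` are hypothetical names, not part of libc++ or Apple's tooling, and the check throws an exception only so that the failure is observable in a small example (the notes do not say how the hardened libc++ reports a violation).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a hardened span: a pointer plus an extent,
// with every element access checked at run time.
struct CheckedSpan {
    const int *data;
    std::size_t size;

    int at(std::size_t i) const {
        if (i >= size)
            throw std::out_of_range("CheckedSpan: index out of range");
        return data[i];
    }
};

// Before: raw pointer interface. Nothing stops `n` from exceeding the buffer.
int sum_raw(const int *p, std::size_t n) {
    assert(p != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

// After: the same loop over the checked view. An `n` larger than the
// extent is caught by the dynamic check instead of reading out of bounds.
int sum_checked(CheckedSpan s, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += s.at(i);
    return total;
}
```

Note that a view built with the wrong extent, e.g. `CheckedSpan{buf, 8}` over a 4-element buffer, would pass every check and still read out of bounds. That is the class of bug the new analyzer checks mentioned above are meant to find.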

Vince: What about C?
Jan: Label unsafe blocks and prohibit pointer arithmetic in the rest. Then prioritize the unsafe blocks in reviews and testing.

Apple is also working on a different approach for C; an RFC is coming.

Gabor: When there are a small number of options for the size of the span, could we offer multiple fixits and let the user pick the right one (instead of just adding a placeholder for the size that the user needs to edit)?

Artem: Maybe, but many functions pass a “size” and use complicated expressions like “size/2” or “size*2” as the actual bounds. It would be hard to handle those.
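
A small illustration of the pattern Artem describes, as a sketch with a hypothetical function `write_pairs`: the parameter is a count of pairs, so the real extent of the buffer is `count * 2`, and a tool that simply built a span of `count` elements would pick the wrong size.

```cpp
#include <cassert>
#include <cstddef>

// `count` is the number of pairs, not the number of ints in `out`.
// The buffer must hold count * 2 elements, so the "size" parameter
// alone does not tell a FixIt what extent the span should have.
void write_pairs(int *out, std::size_t count) {
    assert(out != nullptr || count == 0);
    for (std::size_t i = 0; i < count * 2; ++i)
        out[i] = static_cast<int>(i / 2);  // pair index: 0, 0, 1, 1, ...
}
```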

Gabor: The project is open-ended; it looks like we could spend an unbounded amount of time recognizing more and more patterns. What is a reasonable cutoff point?

Artem: We plan to increase the number of patterns over time; there is no well-defined cutoff point at the moment.

Kostya: Thanks for working on this, Apple beat us to this project, but this is something that someone would have volunteered to do sooner or later. It is a great project.

Jan: If you looked into something similar internally, did you have requirements?

Kostya: ??? (Had a hard time hearing over the noise)

Q: What about the code size?

Jan: Don’t have exact numbers, but so far it did not look too bad.

Later, we continued discussing the new dataflow framework in Clang:

Yitzie: Gave a short introduction to the project. It is based on abstract interpretation and soundness is a goal; Clang has only had some ad-hoc solutions so far. Summarized the differences between the Clang Static Analyzer (path-sensitive, symbolic execution) and the new framework (all-path properties). There are already some checks in Clang Tidy based on this framework.
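
To make "all-path properties" concrete, here is a minimal sketch of the core idea in abstract interpretation: the states reaching a merge point are joined in a lattice, and a property holds only if it holds in the joined state. This is not the Clang dataflow framework's API; `Nullness`, `join`, `safe_to_dereference`, and `example` are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// A tiny lattice describing what is known about a pointer at a program point.
enum class Nullness { Unreachable, NonNull, Null, MaybeNull };

// Join of two states arriving at a control-flow merge point.
// Unreachable is the bottom element, MaybeNull is the top element.
Nullness join(Nullness a, Nullness b) {
    if (a == Nullness::Unreachable) return b;
    if (b == Nullness::Unreachable) return a;
    if (a == b) return a;
    return Nullness::MaybeNull;
}

// An all-path property: the dereference is safe only if the pointer is
// non-null on every path that reaches it.
bool safe_to_dereference(Nullness state) {
    return state == Nullness::NonNull;
}

// Models `if (c) p = &x; else p = nullptr; *p;`
void example() {
    Nullness merged = join(Nullness::NonNull, Nullness::Null);
    assert(merged == Nullness::MaybeNull);
    assert(!safe_to_dereference(merged));
}
```

Unlike a path-sensitive analysis, the joined state does not remember which branch produced the null value.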

Artem: One problem with dataflow is that it is harder to explain the problem to users, because it is harder to get a “path” that shows what the problem is.

Yitzie: Could we hand over the reports found by the dataflow framework to the Clang Static Analyzer to get an easy-to-understand example?

Gabor: Handing over “goals” to the analyzer could be useful. E.g., when someone is interested in division-by-zero problems, we could guide the analysis to maximize the coverage of division operators. Concolic execution is another option: if we can harvest paths from runtime using instrumentation, we could avoid a whole class of false positives in the analyzer (sidestepping the infeasible-path problem).

Artem: The analyzer’s current architecture might not be a good fit for such guided analysis. It is designed to run with many checks collaborating to produce good results, so running only a single check could invalidate the expectations or internal invariants of some checks. But it might be worth looking into in the future.

Paul: Question about some capabilities of the dataflow framework (information leak?)

Yitzie: It does not have such capabilities out of the box, but they could be added.

Q: Inter-procedural analysis?

Yitzie: There are some capabilities already, but they are limited.

Q: What other synergies are there between the Clang Static Analyzer and the Dataflow Framework?

Artem: Loop widening, …

Balazs: Can we use the dataflow analysis to create function summaries for the Clang Static Analyzer?

Artem: Yes! Such summaries can be used to improve the precision of the analysis and make interprocedural analysis faster. Using summaries to find more bugs is a different matter, however: the Clang Static Analyzer has an easy-to-use API for writing checks, and introducing summaries would make this API more complicated to use. Making checks easy to write was a goal. On the other hand, sometimes we do want to have some summaries (e.g., a function returns null when a null pointer was passed to it).
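
A minimal sketch of what such a summary could look like, using the "returns null when a null pointer was passed" example. Nothing here reflects an existing analyzer data structure; `PointerFact`, `FunctionSummary`, `apply_summary`, and `example` are hypothetical names.

```cpp
#include <cassert>

// What the caller knows about a pointer value.
enum class PointerFact { Null, NonNull, Unknown };

// Hypothetical summary of a callee, computed once and reused at each call
// site instead of re-analyzing the function body.
struct FunctionSummary {
    bool returns_null_if_arg_null;
};

// Derives what is known about the return value from the summary and the
// argument, without looking at the callee's body.
PointerFact apply_summary(const FunctionSummary &summary, PointerFact arg) {
    if (summary.returns_null_if_arg_null && arg == PointerFact::Null)
        return PointerFact::Null;
    return PointerFact::Unknown;  // the summary says nothing about other cases
}

void example() {
    FunctionSummary s{true};
    assert(apply_summary(s, PointerFact::Null) == PointerFact::Null);
    assert(apply_summary(s, PointerFact::NonNull) == PointerFact::Unknown);
}
```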

Gabor: I really like the API of the Clang Static Analyzer. It introduces an abstraction layer above C++: we don’t need to pattern match on a big language (e.g., how many ways are there in C++ to write to a memory location?); instead we can subscribe to callbacks like OnStore and get notified each time a write happens. This makes writing checks much easier, and we don’t need to be C++ experts.
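
The callback style Gabor describes can be sketched as follows. This is a toy model, not the analyzer's real checker API: `Engine`, `StoreEvent`, `onStore`, and `simulateStore` are hypothetical names, and the engine is assumed to have already reduced every syntactic form of a write to a single store event.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One normalized "write to memory" event, whatever syntax produced it.
struct StoreEvent {
    std::string location;
    int value;
};

// Toy engine: checks subscribe once and are notified on every store.
class Engine {
public:
    using StoreCallback = std::function<void(const StoreEvent &)>;

    void onStore(StoreCallback cb) {
        assert(cb && "callback must not be empty");
        callbacks_.push_back(std::move(cb));
    }

    // Stands in for the engine modeling an assignment, a compound
    // assignment, an initialization, and so on.
    void simulateStore(const std::string &location, int value) {
        StoreEvent ev{location, value};
        for (const auto &cb : callbacks_)
            cb(ev);
    }

private:
    std::vector<StoreCallback> callbacks_;
};
```

A check then only needs one subscription, e.g. `engine.onStore([](const StoreEvent &ev) { /* inspect ev */ });`, rather than a pattern for each way of writing to memory.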

Yitzie: It would be nice to have similar abstractions for the dataflow framework, which currently uses AST matchers heavily in its checks. This is absolutely in scope, but the team is short on resources at the moment.