Hi CSA developers,
I’ve submitted a round table request for the upcoming Dev Meeting (Nov 17-19). Would be great to have a discussion.
Please also invite colleagues who you think might be interested.
Thanks,
Gabor
Hi CSA developers,
I’ve submitted a round table request for the upcoming Dev Meeting (Nov 17-19). Would be great to have a discussion.
Please also invite colleagues who you think might be interested.
Thanks,
Gabor
+1
+Deep because he expressed interest.
Yay! Yes, absolutely, let's have that.
+Andrew and Bruno who attended our tiny cozy static analyzer round table at the bay area meetup!
Andrew, you have some notes from that round table, do you think it makes sense to share them in this mailing list thread?
Hi Randell,
Thank you for your email, I am forwarding it to the list.
Is the list set up to block new subscribers from posting until moderators review?
I really don’t know, maybe someone in the list will know.
Are we having this? If so, when?
It’s 9:30 - 10:30 Wednesday (PST / UTC-8), already up on
Hi!
There are some notes from the CSA roundtable. The meeting was pretty long and we had a chance to get a glimpse into GCC’s analyzer as well, thanks a lot to David for sharing all that info! The notes might be a bit spotty, feel free to add more info in replies or correct anything that I got wrong.
People introducing themselves
Question from the chat: Is analyzer for LLVM IR? No, for source code.
Discussing best way to invoke the analyzer using command line, scan-build, codechecker.
Should we adopt -fanalyze from gcc? It is good for compatibility but we might have different output so we are conflicted. Want to keep this option open.
David, GCC’s analyzer’s author is interested in implementing SARIF so we have a common output format across GCC and Clang.
Discussing what are the best way to get started with CSA development
Short status report on Deep’s GSoC project
Can we use the static analyzer to transform source code? Technically, it can emit fixits, but we would not recommend doing it from the analyzer. For fixits you need to understand all execution paths, analyzer does not do that. And it is also often ambiguous on a single path that which step introduced the problem.
Max mentioned using CSA in education. One of the problems he runs into is regarding the dependencies between checks and he would like to have a way to disable some of the sinks.There were some discussions whether it is possible to declare dependencies among checks when using plugins.
Mohannad shared more context why he is interested in source-to-source transformation.
Tim asked how CSA compares to other analyzers like CppCheck and the participants shared some experiences but it is really hard to compare analyzers objectively.
Anton asked if clang-tidy would be a good place to start working on a tool that can rewrite #pragma directives into builtin calls. We believe it would be a better place than CSA.
We discussed why CSA is not the right tool to reason about reachability. We can use ExprInspectionChecker to see what parts of the code are explored by the analyzer.
Max mentions students sometimes hitting some false positives when they implement non-trivial structures like doubly linked lists. The CSA team suggests reporting all of the false positives.
Max also was interested to learn more about the “mark interesting” features. Artem explained how it is used to construct additional notes when building bug reports.
David shared that GCC’s analyzer work on the gimple layer can piggy-back on LTO, but the representation is already somewhat lowered, which can also introduce problems obscuring what the user wrote.
David color codes the exploded graph nodes in the dumps to make it easier to identify points of interests.
We also discussed how loops are modeled (heuristics to completely unroll certain loops, and do something akin to widening in abstract interpretation). After that we briefly discussed supporting inline assembly and doing taint analysis.
Are there benchmarks for the static analyzer? We have some docker based solutions to easily run analyzer over some project, it is in the LLVM repo, utils/analyzer/SATest.py docker
.
One can also use csa-testbench from github.
Max brought up the conservative evaluation that invalidates a large part of the execution state. He advocated for emulating certain system calls more precisely to avoid excessive invalidations. Artem mentioned there are already facilities for doing that but there is always some room for improvements.
David described how GCC’s analyzer works on the high level. The transfer functions can generate both mergeable and unmergeable symbolic values to control what states can be merged and what state cannot.
Exceptions are a pain point for both analyzers at the moment. Both analyzers have context sensitive program points for doing inter-procedural analysis.
We discussed the analyzer’s inline defensive check heuristics to suppress false positives.
David explained how he got into creating a static analyzer in GCC (after being the CPython maintainer and being fed up with certain reference counting bugs in badly written CPython extensions and wanted to have an automated solution to find those errors).
In GCC, the states are mutated in place as opposed to in Clang where most data structures are immutable (that are used by CSA). Bifurcating state was not trivial with in-place mutated state.
Using gimple David feels like he side-stepped many problems, e,g, how constructors/destructors are represented is more uniform with regular functions calls. But many aggressive optimizations happen early, like some early inlining. Unfortunately, there is no easy way to get a nice, completely unoptimized IR in GCC at the moment. There is a mark-and-sweep GC in GCC that is used in various places. But the analyzer is rarely using the GC for cleaning up memory, it is mostly relying on RAII.
In CSA many of the objects are reference counted. The symbolic values in CSA are pass by value.
Symbolic values are internalized in a manager for GCC, so there is only one instance of every symbolic expression, they are deduplicated.
GCC’s analyzer looks into the memory space and if they are different, it assumes they will not alias.
We discussed how CSA’s model is bad at recovering from wrong assumptions about aliasing.
GCC’s analyzer has a way to trigger a break when reaching a certain program point marked with analyzer_break().
Ideas to deal with aliasing: