Static Analysis Roundtable Notes from EuroLLVM 2025

We had a round table on static analysis at EuroLLVM this year. I wanted to share my notes here. Feel free to post corrections, comments, or additional notes in the replies.

Lifetimes

First, we discussed a flow-sensitive analysis to find bugs using [[clang::lifetimebound]] and [[clang::lifetime_capture_by]] annotations. Google’s Clang team has an early prototype; they plan to start upstreaming it early and do most of the development upstream. They use the Clang CFG but not Clang’s new dataflow analysis framework. The main motivation is that the framework uses SAT solvers and has some performance cliffs. They want the new flow-sensitive analysis to be fast enough to run as a regular compiler diagnostic, and potentially even to be on by default (some flow-sensitive checks, like uninitialised variable analysis, are on by default at Google). Some round table participants expressed concerns about the current status of the Clang CFG, e.g., handling exception paths, but none of these seemed serious enough to warrant a different approach. ClangIR is too early, and Google wants to deploy this analysis widely sooner than ClangIR would be mature enough for this use case.
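For context, here is a minimal sketch (my own illustration, not code from the prototype) of the kind of dangling-reference bug these annotations target. Clang’s existing statement-local check already flags the first case; a flow-sensitive analysis could additionally catch the second, which spans multiple statements:

```cpp
#include <string>
#include <string_view>

// The return value borrows from the parameter: [[clang::lifetimebound]]
// tells the compiler that the result must not outlive *s*.
std::string_view firstWord(const std::string &s [[clang::lifetimebound]]) {
  return std::string_view(s).substr(0, s.find(' '));
}

int main() {
  // Clang's existing statement-local check already warns here: the temporary
  // std::string is destroyed at the end of the full expression.
  std::string_view w = firstWord(std::string("hello world"));

  // A flow-sensitive analysis could also catch dangling uses that span
  // multiple statements, like this one:
  std::string_view later;
  {
    std::string local = "short lived";
    later = firstWord(local);
  } // *local* is destroyed here, so *later* dangles.

  return static_cast<int>(w.size() + later.size()); // uses of dangling views
}
```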

The analysis would do a very limited form of local alias analysis. It is intra-procedural. The main goal is to have very few false positives (this is a corollary of the semantics of the existing lifetime annotations). The main motivation for the dataflow analysis is the significant reduction in crashes within Google after deploying the lifetimebound annotations along with the existing statement-local analysis. Hopefully, we will get an RFC soon with the details of this analysis, and a blog post about the experience of deploying lifetime annotations at Google.

The same analysis will probably be capable of some annotation inference, injecting new lifetimebound annotations. Identifying erroneous annotations was not yet considered as a use case. The plan is to write the analysis such that most of the code can be reused for named lifetimes later on, when they are introduced. The analysis could be region-based, similar to what Rust does.
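As a hypothetical example of inference, an accessor whose return value plainly borrows from the object is a natural candidate for an injected annotation:

```cpp
#include <string>

class Widget {
  std::string Name;

public:
  // The returned reference borrows from *this*, so an inference pass could
  // propose annotating the implicit object parameter, e.g.:
  //   const std::string &name() const [[clang::lifetimebound]];
  const std::string &name() const { return Name; }
};
```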

Since this analysis will have a low false-positive rate, the main risk is that it does not find enough bugs to carry its weight. If that happens, it could be removed from upstream.

Summaries

For more context, I recommend checking out Devin’s keynote from EuroLLVM 2025 once the recording is available. The proposal is:

  • Relatively simple flow-insensitive and context-insensitive summaries (e.g., does not read/write globals, noescape), similar to what the Attributor is doing.
  • We need a general infrastructure, similar to MapReduce, that would produce these summaries for all functions as a first pass and do fixed-point iteration in a second pass; a bit like ThinLTO (see the sketch after this list).
  • Ideally, the summaries could be consumed by many different components of Clang: Tidy checks, compiler warnings, the Clang Static Analyzer, maybe even codegen. All of these would become more powerful once summaries are available.
  • Devin believes we need this framework in the front end because some of the properties we want to infer are language specific, especially if we want to use these summaries for propagating/inferring annotations for safety and language interoperability.
  • Inferring these annotations is inherently a whole-program analysis problem; this is why we need summaries.
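To make the two-pass idea concrete, here is a small sketch of what a summary record and the fixed-point pass might look like. This is purely illustrative; the field names and the shape of the data are my assumptions, not a proposed API:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-function summary record; the fields mirror the simple,
// flow-insensitive properties mentioned above.
struct FunctionSummary {
  bool WritesGlobals = false;
  std::vector<std::string> Callees; // kept for fixed-point propagation
};

// Second pass (sketch): propagate "writes globals" over the call graph until
// nothing changes. The first pass (not shown) would emit one summary per
// function while each translation unit is compiled.
void propagate(std::unordered_map<std::string, FunctionSummary> &Summaries) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (auto &Entry : Summaries) {
      FunctionSummary &S = Entry.second;
      for (const std::string &Callee : S.Callees) {
        auto It = Summaries.find(Callee);
        // Treat unknown callees conservatively: assume they write globals.
        bool CalleeWrites = It == Summaries.end() || It->second.WritesGlobals;
        if (CalleeWrites && !S.WritesGlobals) {
          S.WritesGlobals = true;
          Changed = true;
        }
      }
    }
  }
}
```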

The Clang Static Analyzer has existing cross-translation-unit analysis support (based on the ASTImporter), but it does not scale to large projects.

If the summaries are simple enough, we should not run into problems generating user-friendly diagnostics when applying them.

These summaries encode all-paths properties, so the Clang Static Analyzer is not suitable for inferring them. We would probably have independent analyses producing them, and the symbolic execution in the Clang Static Analyzer would benefit from consuming this all-paths information.
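A small, hypothetical example of the distinction: an all-paths property must hold on every path through a function, whereas path-exploring symbolic execution reasons about individual paths.

```cpp
int G;

// "Does not write globals" is an all-paths property: it only holds if *no*
// path writes a global. Here one branch does, so the summary for f must be
// "may write globals", even though most paths through f are write-free.
void f(bool b) {
  if (b)
    G = 1;
}
```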

A question is how to make this work in actual builds. We could either design the user-facing interface to look like LTO so that existing build systems work unchanged, make it work based on compilation databases, or talk to the build system people about proper support. Plugging into the existing LTO-like model might not work well, as most static analysis tools, like Tidy, do not produce any output files.

We briefly touched on some existing inefficiencies in how some Clang tools deal with modules, deserialising code too eagerly, which wastes time for no reason.

Sonar is also looking at making cross-translation-unit analysis more scalable internally.
Summaries vs. inlining is a trade-off: summaries are more scalable but lose more information; depending on the use case, different approaches might be best.
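A small, hypothetical example of the information loss:

```cpp
// A context-insensitive summary for *use* would say "may dereference
// parameter 0", losing the path condition that guards the dereference.
// Inlining (or an inlining-style analysis) keeps the correlation between
// *ok* and *p* and sees that this particular call is safe.
void use(int *p, bool ok) {
  if (ok)
    *p = 1;
}

void caller() {
  use(nullptr, /*ok=*/false); // safe, but a coarse summary cannot tell
}
```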

Could we reuse some of the attributes that the Attributor can already infer, somehow consuming those in the front end?

We talked about ClangIR as a potential alternative to the ASTImporter-based cross-translation-unit analysis, and its back references to the AST. Our understanding was that those back references are optional and the AST does not get serialised alongside ClangIR, so deserialised ClangIR might not have these back references.

Fixed-point iteration across TUs is likely to be really expensive. To make this work, we either need to focus on analyses that are guaranteed to converge in a small number of iterations (like 2), or the fixed-point iteration needs to happen very deep in the reduction part, where we have already abstracted most of the details away.

Google has a global call-graph analysis that is used to verify properties such as certain banned functions never being called in certain contexts.
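I don’t know the details of Google’s implementation, but the shape of such a check might look like this sketch (all names illustrative):

```cpp
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

using CallGraph = std::unordered_map<std::string, std::vector<std::string>>;

// Walk the global call graph from a restricted entry point (e.g., a signal
// handler) and report whether any banned function is reachable from it.
bool callsBanned(const CallGraph &CG, const std::string &Entry,
                 const std::set<std::string> &Banned) {
  std::vector<std::string> Worklist{Entry};
  std::set<std::string> Seen;
  while (!Worklist.empty()) {
    std::string Fn = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(Fn).second)
      continue; // already visited
    if (Banned.count(Fn))
      return true;
    if (auto It = CG.find(Fn); It != CG.end())
      Worklist.insert(Worklist.end(), It->second.begin(), It->second.end());
  }
  return false;
}
```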

Enhancing Clang Static Analysis Tools with Machine Learning

The Clang Static Analyzer has many cut heuristics that are fairly arbitrary. It would be nice to learn the best cut heuristics using ML, but it is expensive to evaluate whether the quality of the analysis has improved; this often requires manual labor. We need ideas for making this evaluation automatic in order to leverage ML:

  • We could mine historical fixes from software repositories and try to correlate them with the analysis results on old versions of the software, before the fixes.
  • Somehow collect data on existing diagnostics. E.g., if a CI bot comments with analysis results, users could upvote/downvote diagnostics, and we could feed that data back into ML models.
  • Google has data like this internally, but they cannot share it.
  • It would also be hard to handle data like this within the LLVM organisation for legal reasons.

These cut heuristics trade analysis time against accuracy.
It would be nice if we could just specify a time budget (e.g., 12 hours) and the analyzer would spend that amount of time analysing, finding as many bugs as possible.
To make this work, we need to find a proxy for progress that is more accurate than time, because we don’t want flaky results.

Another problem is bug ranking. Tools like Tidy or the Clang Static Analyzer might produce a large number of warnings when a new project adopts them. A single low-quality warning could give the wrong impression and reduce the user’s trust in the tool. It would be valuable to help users start with the most impactful bugs:

  • We could try using ML or LLMs to rank bugs according to severity.
  • We could try to compute confidence levels using LLMs.
  • Could we make the analyzer emit additional context that is not entirely human readable, or is too much information for a human, but would aid LLMs? It could aid ranking, computing confidence/severity, or explaining why a certain diagnostic was raised.

The Clang Static Analyzer can emit really long error traces. Sometimes these include parts that are not relevant to the original bug. Could we ask an LLM to hide the irrelevant parts?

Someone mentioned there are experiments using LLMs to make code changes based on optimisation remarks to improve performance. Could we do something similar with static analysis warnings that do not have fix-its?


Sounds like it was a lot of fun, thanks for sharing!

The understanding is right; I just want to mention that it’s straightforward to attach AST nodes to operations/types if/when needed. The serialization part will probably be a longer route, since we haven’t yet taught CIR to use MLIR’s serialization, but for the AST itself the plan is to reuse Clang’s existing AST artifacts mechanism and find a way for CIR to refer to them (not sure how hard this part should actually be).

Thank you for the post. This is very informative. Would it be possible to use simple but precise algorithms for the analysis first and then fine-tune the analysis later?

Having precise semantics for the analysis helps make it easy to understand. A lightweight analysis with precise semantics could run first, followed by a heuristics-based analysis that does a deep dive into the bugs found by the precise analysis.