In my previous RFC I promised to come back to the last remaining running time regression, so I’m opening the discussion here.
After the clang-18 release, we ran our benchmarks and noticed some serious slowdowns. Most of them are fixed by now, or will be fixed if the patches I proposed in my previous RFC land.
Assuming they do, there is only one critical slowdown left that I know of. It is reproducible on FFmpeg sheervideo.c. In this particular case, I can trace the problem back to much more complicated symbols, of complexity 30 or 31 at times. This interacts poorly with the current isTainted() APIs, which recursively traverse a lot of the nodes of such complicated SymExprs.
This is likely due to BinarySymExpr::computeComplexity, but I’ll leave this intentionally vague as I have not gathered indisputable evidence.
This was reported here.
Here are some numbers for the time spent inside the static analyzer:
- clang-17 variant of OOBv2: 33 seconds
- clang-18 variant of OOBv2 & MaxSymbolComplexity=10: 33 seconds (on par with baseline)
- clang-18 variant of OOBv2: 927 seconds (28x baseline)
Here are the corresponding flame graphs:
clang-17 variant of OOBv2:
clang-18 variant of OOBv2 & MaxSymbolComplexity=10
clang-18 variant of OOBv2:
These numbers suggest to me that the ballooned symbol complexity, as currently present on trunk, is a real problem: we can hit poor performance characteristics under certain workloads.
I can see some ways to mitigate this:
- Limit the MaxSymbolComplexity. (Easy, quick and dirty. Doesn’t really solve the issue.)
- Apply some caching or other performance optimizations to isTainted(). (One needs to be really careful not to use local statics, but rather properly lifetime-bound caches.)
- Find and roll back the hunk which causes the appearance of such complicated symbols. (This doesn’t really solve the issue.)
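To make the caching option a bit more concrete, here is a minimal, self-contained sketch of a memoized recursive taint query whose cache is owned by a query object instead of living in a local static. It is not the analyzer's code: `Expr`, `TaintQuery` and `buildSharedChain` are hypothetical stand-ins I made up for illustration, and the sketch assumes that sub-expressions are shared between nodes, which I have not verified for the sheervideo.c case.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a symbolic expression node (NOT clang's SymExpr).
struct Expr {
  bool taintedLeaf = false;   // only meaningful for leaf nodes
  const Expr *lhs = nullptr;  // operands; both null for a leaf
  const Expr *rhs = nullptr;
};

// Hypothetical query object. The cache is a member, so cached answers
// cannot outlive the object, unlike a function-local static.
class TaintQuery {
public:
  bool isTainted(const Expr *e) {
    if (!e)
      return false;
    auto it = cache_.find(e);
    if (it != cache_.end())
      return it->second;  // already answered for this node
    ++visits_;
    bool result = e->taintedLeaf || isTainted(e->lhs) || isTainted(e->rhs);
    cache_.emplace(e, result);
    return result;
  }

  // Number of nodes that were actually traversed (cache misses).
  std::size_t visits() const { return visits_; }

private:
  std::unordered_map<const Expr *, bool> cache_;
  std::size_t visits_ = 0;
};

// Builds a chain in which every node uses the previous node as both
// operands. Without a cache, an untainted chain of depth N costs on the
// order of 2^N recursive calls; with the cache each node is visited once.
std::vector<std::unique_ptr<Expr>> buildSharedChain(std::size_t depth,
                                                    bool leafTainted) {
  std::vector<std::unique_ptr<Expr>> nodes;
  nodes.push_back(std::make_unique<Expr>());
  nodes.back()->taintedLeaf = leafTainted;
  for (std::size_t i = 0; i < depth; ++i) {
    const Expr *prev = nodes.back().get();
    nodes.push_back(std::make_unique<Expr>());
    nodes.back()->lhs = prev;
    nodes.back()->rhs = prev;
  }
  return nodes;
}
```

In the real analyzer the taint status of a symbol depends on the program state, so a cache like this would presumably have to be keyed on (or scoped to) the state being queried. That is the part where the lifetime question gets tricky.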
To wrap up, I’d like to invite you to discuss this.
@DonatNagyE @Xazax-hun @NoQ @Szelethus
Personally, I’d probably prefer option no. 2, but I’m also open to option no. 1, assuming it does not degrade the quality of the analysis.


