I think you folks brought up a fairly fascinating subject here.
My biggest criticism of the fundamental design behind the static analyzer is that it performs too many âstate splitsâ. Basically every time it simulates an if-statement with unknown condition, it âforksâ the analysis and considers the remaining code twice: once under the assumption that the true branch is taken, once under the assumption that the false branch is taken. Then once it finds a bug, the tree of these decisions manifests in âAssumingâŚâ path note bubbles.
This approach is great because it simplifies implementation of static analysis dramatically, allowing us to have a hundred checks that were very easy to develop on top of fascinatingly simple and straightforward API, and these checks do find a lot of bugs, where many other similar tools give up. But itâs also fundamentally incorrect because not all execution paths in this exponentially growing tree actually make sense given the readerâs domain-specific knowledge about the program. Depending on the nature of unknown values, such domain-specific knowledge may range from âWell I agree that itâs not exactly obvious that this canât happen, an assertion would indeed be a great way to document thisâ to âItâs painfully obvious that this situation is unimaginable, literally nobody writes code expecting this to be possibleâ. I even go as far as argue that if it wasnât for these overly eager state splits, weâd have easily reached zero false positives, no matter how conservatively-approximately the rest of our engine understands the code. This is literally the root cause of all our problems.
This problem also starts hitting even harder when we âstep intoâ a function call and find, say, a defensive check there, followed by an early return. We canât continue until we fork our analysis over considering whether the check is hit. But the check isnât necessarily written for this caller, it could be defending against other, completely unrelated call sites. We have a number of safeguards against that, so-called âinlined defensive check suppressionsâ, where a bug report is thrown away when its critical bug condition can be tracked back to an assumption made for the purpose of forking over such inlined defensive check.
Very deeply related to that, we typically blindly trust asserts. To us an assert is a hint that we probably ran into this code path because we did way more state splits than we could justify. Asserts act as an escape hatch for the users to suppress our false positives, while hopefully also improving the code and making its underlying contracts more apparent to the human reader, not just the tool. In this sense, to develop a checker that emits a warning every time an assert is violated, is technically possible but in practice itâs a terrible idea. Itâd pretty much warn on every assert, because âAssuming the assert condition is false, assert violation handler will be triggeredâ is a technically correct statement.
Now, as you folks correctly pointed out, when taint comes into play, everything changes completely.
Itâs suddenly perfectly fine to fork analysis every time the unknown condition value is tainted. Indeed, since the value is attacker-controlled, we quite literally have a perfectly correct proof that both branches of the if-statement are truly reachable, regardless of any prior path conditions or assumptions. Weâre no longer worried that weâre exploring too many paths; in fact, we know for sure we arenât. We no longer need to have inlined defensive check suppressions; if the checked condition is tainted, we know that both branches could be taken right here right now, no matter how deep we are into the call stack. We no longer need to trust assertions; instead, we can emit a warning every time a taint-based assumption leads to assertion failure, because this means that weâve found a perfectly real input to the program that causes the assertion to fail.
Obviously this is very hand-wavy and I need to think more about this, for properly formalizing it. But I think this is a fascinating conversation that we need to have more of as we consider stabilizing taint analysis for everyday use.