This is definitely theoretically possible but I wouldn’t recommend it.
As a moderately “aggressive” bug-finding tool, no matter how sophisticated its techniques are, the static analyzer relies on the code generally “making sense”, so that it can focus on finding code that “doesn’t make sense”. One example of such behavior is relying on the absence of dead code in the program. E.g., in the toy example
01 int foo(int *x) {
02 if (x == nullptr) { /*...*/ }
03 return *x;
04 }
the code does not make sense, because the null check for x co-exists with an unchecked use of x. The static analyzer emits a warning: “Assuming x is null on line 02, there’s a null pointer dereference on line 03”. But another possibility is that the check on line 02 is redundant because nobody ever passes nullptr into the function. That makes the warning a “vacuous truth” at best, but from the static analyzer’s perspective it’s a victory nonetheless, because one way or another the code doesn’t make sense. Like, if it can’t be null, why check? And if it can be null, why use it unchecked?
Now, if you feed the static analyzer machine-generated code, the assumption that “the code typically makes sense” becomes invalid. Machine-generated code often fails to make sense, and that’s OK; nobody expects it to make sense to begin with, they simply expect it to have the desired properties. So I suspect that you’ll get a massive number of false positives. Unless the translator produces code very similar to the original, or otherwise code of very high quality, the static analyzer will needlessly freak out about every redundant operation or defensive check inserted by the machine to make sure the code behaves correctly in the most ridiculous cases.
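To make that concrete, here’s a hand-written sketch of the kind of log-and-continue guard a mechanical translator might emit to mimic the source language’s runtime checks (this is not the output of any real translator, just an illustration). To the analyzer it looks exactly like the toy example above, so it warns even if the original program can never reach this code with a null pointer:

#include <cstdio>

// Hypothetical translator output: the defensive check mirrors a runtime check
// from the source language, but it doesn't stop execution.
int get_value(const int *node_value) {
    if (node_value == nullptr) {
        std::fprintf(stderr, "unexpected null in generated code\n");  // log and keep going
    }
    return *node_value;  // the analyzer assumes the check can fail and warns here
}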
On top of that, some information that’s available in the original code will be lost in translation, as higher-level constructs of languages like Java will have to be modeled with lower-level constructs of C/C++. This further disconnects the generated source code from the original programmer’s intent, leading to more code that doesn’t necessarily make sense after translation.
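As an illustration of that loss (a hypothetical translation with made-up runtime types like jobject and jlist, not the output of any particular tool), a Java enhanced-for loop over a List&lt;String&gt; might come out as an index-based loop over an erased container, with the cast that the Java compiler had already proven safe re-inserted and re-checked on every access:

#include <cstring>

// Illustrative runtime types a Java-to-C++ translator might emit.
struct jobject { virtual ~jobject() = default; };
struct jstring : jobject { const char *chars; };
struct jlist   { int size; jobject **elems; };

// Original Java:  int total = 0; for (String s : names) total += s.length();
int total_length(jlist *names) {
    int total = 0;
    for (int i = 0; i < names->size; ++i) {
        // The element type was erased, so the downcast comes back here even
        // though the original program guaranteed every element is a String.
        jstring *s = dynamic_cast<jstring *>(names->elems[i]);
        if (s != nullptr) {
            total += static_cast<int>(std::strlen(s->chars));
        }
    }
    return total;
}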
This is not only about the static analyzer understanding the programmer’s intent, but also about the static analyzer being able to explain the bug to the developer in the developer’s own language. The static analyzer doesn’t simply emit one-line warnings; instead it explains the execution paths on which problematic things happen, often attaching dozens of notes to every warning. If a source-to-source transpiler is used, these notes will need to be translated back to the original language, something the static analyzer can’t do on its own. Without these notes it would be virtually impossible to understand any of the warnings.
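For what it’s worth, a transpiler could remap source positions with the standard #line directive, which compiler diagnostics generally honor, but at best that fixes the file-and-line part of each note; the wording still describes the generated C++ entities, and whether the analyzer’s path notes pick up the remapped locations at all is something you’d have to verify. A hypothetical fragment (the names Account__getBalance and get_balance_field are mine):

double *get_balance_field(void *self);   // hypothetical generated helper

double Account__getBalance(void *self) {
#line 27 "Account.java"
    double *balance = get_balance_field(self);
#line 28 "Account.java"
    return *balance;   // a warning here would point at Account.java:28, but its
                       // text would still talk about the C++ pointer 'balance'
}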
So I think this is going to be bad for the same reason the static analyzer was built over the Clang AST as opposed to, say, LLVM IR. You can think of translation to LLVM IR as just another source-to-source transpiler, with all the same downsides: the original intent is often lost in translation, some of the machine-generated code doesn’t necessarily make sense, and explaining the problem in the programmer’s preferred terms becomes extremely cumbersome.
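Roughly what that does to the toy example above (a caricature written in C++, not actual LLVM IR): the structure dissolves into anonymous comparisons, branches, and loads, and nothing in the result says “this was a defensive check the programmer deliberately wrote”.

// A C++ caricature of foo() after lowering to an IR-like form; the comments
// show roughly what the corresponding LLVM IR instructions would be.
int foo_lowered(int *x) {
    bool cmp = (x == nullptr);    // %cmp = icmp eq ptr %x, null
    if (cmp) goto then_block;     // br i1 %cmp, label %then_block, label %end_block
    goto end_block;
then_block:
    /* ... */
    goto end_block;
end_block:
    return *x;                    // %v = load i32, ptr %x ; ret i32 %v
}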
Can you build static analysis directly on top of LLVM IR? Absolutely. It’s probably much easier to achieve as well, given that LLVM IR is much simpler than C++. But it’s going to be a very different beast: much less user-friendly, and often less impactful on overall software quality. With these drawbacks taken into account, though, such an LLVM IR-based static analyzer would be a much better candidate for your experiment, as it wouldn’t rely on an intimate connection to the original source code as much as our static analyzer does.
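If you do go that route, the entry point isn’t too scary. Here’s a rough sketch, assuming LLVM’s new pass manager (deliberately naive, flow-insensitive, and not registered as a plugin), of the flavor of check you could write directly over the IR:

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Naive sketch: flag any load through a pointer that is also compared against
// null somewhere in the same function, ignoring control flow entirely.
struct NullCheckedLoadSketch : PassInfoMixin<NullCheckedLoadSketch> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &) {
    SmallPtrSet<Value *, 8> CheckedPtrs;
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (auto *Cmp = dyn_cast<ICmpInst>(&I))
          if (isa<ConstantPointerNull>(Cmp->getOperand(1)))
            CheckedPtrs.insert(Cmp->getOperand(0));
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (auto *Load = dyn_cast<LoadInst>(&I))
          if (CheckedPtrs.count(Load->getPointerOperand()))
            errs() << F.getName()
                   << ": pointer is both null-checked and dereferenced\n";
    return PreservedAnalyses::all();
  }
};

Note that all it can talk about is IR values and function names, which is exactly the user-friendliness problem above: it finds the inconsistency, but it can’t explain it in terms of anything the developer actually wrote.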