Patches inspired by the Juliet benchmark

I had the pleasure of looking at some of the false negatives (FNs) on the Juliet benchmark for a week.
I focused on the following CWEs in this order:

  1. CWE-787: Out-of-bounds Write (Juliet cases CWE-123, CWE-124, CWE-121, CWE-122)
  2. CWE-78: OS Command Injection (Juliet cases CWE-78)
  3. CWE-125: Out-of-bounds Read (Juliet cases CWE-126)
  4. CWE-22: Path Traversal (Juliet cases CWE-23, CWE-36)

While looking at the cases, I noticed some repeating patterns and found a few easy fixes, along with some hacky, debatable workarounds for catching those FNs.
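For context, Juliet test cases pair a flawed “bad” function with a fixed “good” counterpart exercising the same data flow. Below is a minimal sketch in the spirit of a CWE-787 case; the function bodies and string contents are my own illustration, not copied from the suite:

```cpp
#include <cstring>

// "bad": the flawed variant that a Juliet case pairs with a fixed one.
void bad() {
  char data[10];
  const char *source = "This string does not fit";  // longer than data
  std::strcpy(data, source);  // out-of-bounds write (CWE-787 / CWE-121)
}

// "good": the same flow, but with a bounded copy.
void good() {
  char data[10];
  const char *source = "This string does not fit";
  std::strncpy(data, source, sizeof(data) - 1);
  data[sizeof(data) - 1] = '\0';
}
```

Patterns of this shape are the territory of checkers like CStringChecker and ArrayBoundsV2, which show up in the commit list below.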

Given that I only spent a single week on this evaluation, I don’t want to share the results just yet, and the title reflects that. Rather, my intention is to announce that I’d like to upstream some of the quick fixes I applied during that week, so that everyone can benefit from them.

I have 13 commits in total, related to:

  • CStringChecker (2)
  • ArrayBoundsV2 (3)
  • GenericTaintChecker (8)
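To give a flavour of the taint-related part: GenericTaintChecker is about following attacker-controlled data from a source to a dangerous sink. Here is a hedged sketch of a CWE-78-style flow (the environment-variable source and the command string are illustrative, not taken from Juliet or from the checker’s tests):

```cpp
#include <cstdlib>
#include <cstring>

void runUserCommand() {
  char cmd[256] = "ls ";
  // Attacker-controlled input acts as the taint source.
  const char *dir = std::getenv("USER_DIR");
  if (!dir)
    return;
  // The tainted string is propagated into the command buffer...
  std::strncat(cmd, dir, sizeof(cmd) - std::strlen(cmd) - 1);
  // ...and reaches a command-execution sink (CWE-78).
  std::system(cmd);
}
```

The point of the sketch is just the shape of the flow: a taint source, some propagation through string manipulation, and a command-execution sink.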

If that’s okay, I’d like to upstream them in two batches, so that I don’t need to split them out of the branch and test them one by one.
The first batch would cover the CStringChecker and ArrayBoundsV2 changes together, and the second batch would cover the taint improvements.


Here is the first batch:


Here is the second batch, for improving taint analysis:


To set expectations: landing these 13 commits won’t improve the FN rate by much; it would take a bit more engineering to cover a significant portion of the FNs. Stay tuned for a follow-up post with the details of what needs to be done to improve further on the Juliet benchmark. No ETA.

I’ll post the evaluation of the first batch here once I’ve backported the patches to our fork and scheduled and evaluated the diff.

Wow, sounds exciting! Do you think we would/should look at those benchmarks more often? Would it make sense to have some scripts in the repo that would make running them straightforward?

I don’t have a harness that invokes CSA natively, and I don’t plan to implement one, so scripting is out of scope for me.
We use CSA as a library, and the invocation is custom to our use case. Because of this, I can’t really publish the diff results: we don’t have the tooling for exporting them out of our ecosystem.
By the way, we use SARIF as the output format and diff the SARIF files directly, but I believe SARIF isn’t commonly used here for sharing analysis results.

Actually, one could probably use scan-build or CodeChecker to get some reports (though I’ve only used the latter). Even then, the findings would need to be posted somewhere, and I don’t have the means for that. What I can say is that a lot of code gets generated, so storage might be an issue for anyone who wants to host it.

Maybe Ericsson folks could give it a try and post the results on their demo server. WDYT @DonatNagyE?

My experience so far is that we are not doing great, let’s put it that way.

See the evaluation of the first batch here. FYI, nothing groundbreaking there.