Symbolic value assumption for some libc functions

Hi, I was exploring the ArrayBoundChecker and TaintPropagation. I tried to make the return value of the read() system call a taint source, and I see that it is already defined in GenericTaintChecker.cpp. However, the checker does not raise a warning for the following code, even if I replace .Case("read", TaintPropagationRule({0, 2}, {1, ReturnValueIndex})) with .Case("read", TaintPropagationRule({}, {ReturnValueIndex})):

char buf[20];
int ret = read(0, buf, 3);
buf[ret] = 0; // expect to get warning: index is tainted

I think this is because the tool assumes somewhere that the return value is less than or equal to 3 after the read() call. But I am having a hard time locating the code that handles this specific case. Could you suggest which files I should look into in order to turn off the assumption on read() return values?

Thank you!
Regards,
Gavin

Never mind, I have found it in StdLibraryFunctionsChecker. Sorry for bothering you with a silly question.
By the way, I notice that the Clang Static Analyzer currently does not support analysis across translation units, because of scalability concerns. Do you have any suggested direction if I really want to do taint tracking across files?

Thank you,

Sincerely,
Gavin

Yup, you're correct, the analyzer knows that read() doesn't return an arbitrary integer. If your `buf` was only, say, 2 characters long, you should have received the warning that the attacker can trigger a buffer overflow by forging a successful read of 3 bytes.
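
As a rough illustration of that reasoning (a toy model only, not how the analyzer is implemented): if the return value of read(fd, buf, count) is assumed to be at most count, then buf[ret] can only go out of bounds when count reaches the buffer length. The helper name below is made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the analyzer.
 * Assumes the modeled constraint ret <= count for read(fd, buf, count).
 * Returns 1 if buf[ret] could land past the end of a buf_len-byte buffer. */
static int index_may_overflow(size_t buf_len, size_t count) {
    /* The largest index the write can reach is count itself. */
    return count >= buf_len;
}
```

With the 20-byte buffer from the question, index_may_overflow(20, 3) is 0, which matches the missing warning; with a 2-byte buffer, index_may_overflow(2, 3) is 1.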

You've correctly pin-pointed the entity that's responsible for that. The easiest way to figure this stuff out is to pay attention to checker tags in the Exploded Graph dump (https://clang-analyzer.llvm.org/checker_dev_manual.html#visualizing).
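
If reading the graph dump is too heavy for a quick check, another option is to ask the analyzer directly what it assumes, using clang_analyzer_eval() from the debug.ExprInspection checker (enabled with something like `-analyzer-checker=debug.ExprInspection`). The sketch below is only a suggestion: the function name probe_read is made up, and the stub under `#else` exists purely so the file still builds outside the analyzer.

```c
#include <sys/types.h>
#include <unistd.h>

#ifdef __clang_analyzer__
/* Declared only; interpreted by the debug.ExprInspection checker. */
void clang_analyzer_eval(int);
#else
/* No-op stub so the file still compiles and links in a normal build. */
static void clang_analyzer_eval(int cond) { (void)cond; }
#endif

/* Hypothetical probe function; the name is made up for this sketch. */
ssize_t probe_read(int fd) {
    char buf[20];
    ssize_t ret = read(fd, buf, 3);
    /* If the return value is constrained as discussed above, the analyzer
     * should report TRUE here; UNKNOWN would suggest no such constraint. */
    clang_analyzer_eval(ret <= 3);
    return ret;
}
```

Running the analyzer on this file with and without the modeling checker enabled should show whether the constraint on the return value comes from that checker.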

The reason for not having cross-translation-unit analysis is mostly because it's technically annoying to pass data between multiple clang processes. But, yeah, also scalability. There's an experimental attempt to set up the infrastructure that might help you - see `-analyzer-config experimental-enable-naive-ctu-analysis=true`.

You might be interested in https://reviews.llvm.org/D59516 - it's a series of patches under review that adds support for loading taint propagation rules from yaml files (uhm, I also need to take a look at that). If you happen to develop something that auto-generates such yaml files and turn it into some sort of pre-analysis pass across the whole project, you'll essentially develop a certain kind of summary-based cross-translation-unit analysis for the taint checker. But if you dive into that, please also consider discussing how to make it re-usable enough for other checkers to use :)
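
To make the idea of a per-function taint summary a bit more concrete, here is a toy model in C of what such a rule expresses: which arguments are checked for taint, and which arguments (or the return value) become tainted when the rule fires. The struct, field names, and bitmask encoding are all made up for this sketch; it is not the format used in the patches or in GenericTaintChecker. It also assumes that an empty source set means "always a taint source", which is how the modified read rule in the question is meant to behave.

```c
/* Hypothetical summary record; field names are made up for this sketch. */
struct taint_rule {
    const char *name;   /* function the rule applies to */
    unsigned src_mask;  /* bit i set: argument i is checked for taint */
    unsigned dst_mask;  /* bit i set: argument i becomes tainted */
    int taints_return;  /* nonzero: the return value becomes tainted */
};

/* Applies one rule to a call. tainted_args has bit i set when argument i
 * is already tainted. Returns the updated mask and stores whether the
 * return value is tainted in *ret_tainted. */
static unsigned apply_rule(const struct taint_rule *rule,
                           unsigned tainted_args, int *ret_tainted) {
    /* Assumption: an empty source set means an unconditional source. */
    int fires = rule->src_mask == 0 ||
                (tainted_args & rule->src_mask) != 0;
    *ret_tainted = fires && rule->taints_return;
    return fires ? (tainted_args | rule->dst_mask) : tainted_args;
}
```

In this encoding, the rule for read with sources {0, 2} and destinations {1, return value} would be {"read", 0x5, 0x2, 1}. A pre-analysis pass would have to produce one such record per function defined elsewhere in the project.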

Thank you so much for the information. Automatically generating those yaml configuration files could help the checker understand how taint propagates after calling a function defined in another file.