I'm new to LLVM passes. I wonder if I can use a pass to dynamically analyze a program.
As Bekket points out, the way to use DataFlowSanitizer is to modify
your program to call functions in sanitizer/dfsan_interface.h.
Let me explain what I want to do in the following example.
//"Result" is an object that doesn't have a fixed length like int and float.
DataFlowSanitizer tracks the bytes of memory that store a variable.
"Result" may not have a fixed length at compile time, but at runtime
it presumably has a fixed set of bytes that hold its contents. If you
can find those bytes, you can read the label.
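To illustrate the point about finding the bytes, here is a minimal sketch in plain C++ (no DFSan calls, so it builds without -fsanitize=dataflow). It assumes, purely for illustration, that Result is a std::vector<int>; the real type is not shown in this thread. ByteRange, HandleBytes and ContentBytes are hypothetical names. The sketch shows that sizeof(result) covers only the vector's own bookkeeping, while the elements live in a separate buffer whose length is known only at runtime.

```cpp
#include <cstddef>
#include <vector>

// Assumption: a stand-in for the thread's Result type, which is not shown.
using Result = std::vector<int>;

// A contiguous run of bytes: where it starts and how long it is.
struct ByteRange {
  const void* start;
  std::size_t size;
};

// The bytes of the Result object itself (pointers and size bookkeeping).
// This is what (&result, sizeof(result)) refers to, and it has the same
// size for every Result regardless of how many elements it holds.
ByteRange HandleBytes(const Result& result) {
  return ByteRange{&result, sizeof(result)};
}

// The bytes that hold the elements. These live in a separate buffer whose
// length is only known at runtime.
ByteRange ContentBytes(const Result& result) {
  return ByteRange{result.data(), result.size() * sizeof(Result::value_type)};
}
```

In a DFSan build, the range returned by ContentBytes() is what you would pass to dfsan_read_label(start, size), rather than &result and sizeof(result).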
//The program is as follows.
Result retrieve_data_fun (){
//the function retrieves some sensitive data from files/DB
return result_info;
}
void main(){
...
Result var1, var2, var3;
var1 = retrieve_data_fun();
...
var2 = retrieve_data_fun();
printf (var1);
var3 = var2+'xxx';
print (var3);
}
My goal is to track all the returned data from "retrieve_data_fun" and monitor actions on them.
You would need to test for the label on each action.
So, whenever the data is used (e.g., printed), I want to detect that, maybe by printing a message.
Could you please help me do that?
Here is a sketch of how this might be done:
#include <iostream>
#include <sanitizer/dfsan_interface.h>
static dfsan_label sensitive_data_label =
dfsan_create_label("Data returned by retrieve_data_fun()", NULL);
// The function retrieves some sensitive data from files or a DB.
// Users should call retrieve_data_fun() instead of this
// implementation, to ensure results are labeled.
Result retrieve_data_fun_impl () {
...
}
Result retrieve_data_fun() {
// All the real work is done in retrieve_data_fun_impl(). This
// function calls it, labels the result, and returns it.
// This style ensures that all return paths in
// retrieve_data_fun_impl() have the label applied to the result.
Result result = retrieve_data_fun_impl();
dfsan_set_label<Result>(sensitive_data_label, result);
return result;
}
bool IsResultSensitive(const Result& result) {
// There are types for which sizeof(result) does not give the
// bytes needed. For example, if Result is a vector,
// you may need to iterate over the elements in the vector, and
// read the labels of the elements.
dfsan_label label = dfsan_read_label(&result, sizeof(result));
return dfsan_has_label(label, sensitive_data_label);
}
bool IsIntSensitive(int i) {
dfsan_label label = dfsan_read_label(&i, sizeof(i));
return dfsan_has_label(label, sensitive_data_label);
}
int main() {
// 'sensitive_result' is sensitive. It was labeled as sensitive
// by 'retrieve_data_fun()'.
Result sensitive_result = retrieve_data_fun();
// 'not_a_sensitive_result' is built without reading any sensitive
// data. It is not labeled.
Result not_a_sensitive_result = Result();
// Should print "1".
std::cout << "Is 'sensitive_result' sensitive? "
<< IsResultSensitive(sensitive_result) << std::endl;
// Should print "0".
std::cout << "Is 'not_a_sensitive_result' sensitive? "
<< IsResultSensitive(not_a_sensitive_result) << std::endl;
// Assume getIntFromResult() reads bytes from the result.
int derived_from_result = sensitive_result.getIntFromResult();
// Should print "1".
std::cout << "Is 'derived_from_result' sensitive? " <<
IsIntSensitive(derived_from_result) << std::endl;
// This call is refused: it prints a warning and returns false.
// See the definition of TakeSomeActionOnResult().
TakeSomeActionOnResult(sensitive_result);
// This will print "REDACTED" because of the operator<< method below.
std::cout << "Sensitive result is "<< sensitive_result << std::endl;
}
bool TakeSomeActionOnResult(const Result& result) {
// Guard against taking this action on sensitive data.
if (IsResultSensitive(result)) {
std::cerr << "WARNING: Tried to call TakeSomeActionOnResult() on "
<< "a sensitive result. This is not allowed!";
return false;
}
...
return true;
}
// Overload the << operator so that sensitive data is not printed.
std::ostream& operator<<(std::ostream& os, const Result& result)
{
if (IsResultSensitive(result)) {
os << "REDACTED";
} else {
// This result is not sensitive, construct a string form.
...
}
return os;
}
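Since the label has to be tested on each action, one way to avoid scattering the checks is to route actions through a single helper. A minimal sketch in plain C++ follows. RunUnlessSensitive is a hypothetical name, not part of DFSan, and the predicate passed in stands in for a check such as IsResultSensitive() above.

```cpp
#include <iostream>

// Hypothetical helper, not part of DFSan. Runs action(value) only when
// is_sensitive(value) returns false. Returns true if the action ran.
// In a DFSan build, is_sensitive would wrap a label check such as
// IsResultSensitive(); here it is just any callable returning bool.
template <typename T, typename IsSensitive, typename Action>
bool RunUnlessSensitive(const T& value, IsSensitive is_sensitive,
                        Action action) {
  if (is_sensitive(value)) {
    std::cerr << "WARNING: blocked an action on sensitive data" << std::endl;
    return false;
  }
  action(value);
  return true;
}
```

With this shape, the sensitivity check lives in one place, and each action only has to be passed through the helper.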