"compiler-rt" - DataFlowSanitizer

Hi all,

I have some questions about “DataFlowSanitizer” from “compiler-rt”.
I want to know how I can test the “DataFlowSanitizer”?
Can I configure it to label only some values, e.g., the return values from specific functions?
Also, how can I print these labels?

Thanks,

Dareen

I want to know how I can test the "DataFlowSanitizer"?

This document is a good reference for DataFlowSanitizer:
  https://clang.llvm.org/docs/DataFlowSanitizer.html

Can I configure it to label only some values,

The section named "Example" in the document above shows a simple
program that sets and tests for labels.

dfsan_create_label() creates a label.

dfsan_set_label() applies a label to the memory holding a variable.

e.g., the return values from specific functions?

To label the return value of a function, add a call to
dfsan_set_label() on the return value of the function:

  // Outside the function:
  dfsan_label return_label = dfsan_create_label("return_label", 0);

  // An example function:
  int MyFunction(int a, int b) {
    ...
    int result = ...;

    // Set a label on the returned value:
    dfsan_set_label(return_label, &result, sizeof(result));

    return result;
  }

Also, how can I print these labels?

To discover the label on a variable, you can test for it and print the result:

  int var = ...;

  // Does 'var' have label 'return_label'?
  dfsan_label var_label = dfsan_get_label(var);
  if (dfsan_has_label(var_label, return_label)) {
    printf("'var' has the label 'return_label'\n");
  }

To see the state of all labels when the program exits, set the
environment variable DFSAN_OPTIONS to "dump_labels_at_exit=<file path>".
For example, suppose the example program in the document is in a file
named "dfsan.c". Here are the commands I ran to see the state of all
labels when it exits:

  # Compile dfsan.c into a binary named "dfsan":
  $ clang -g -fsanitize=dataflow dfsan.c -o dfsan

  # Run it. There is no output because all assertions pass:
  $ ./dfsan

  # Run it again with DFSAN_OPTIONS set to dump the label state to
  # standard output on exit:
  $ env DFSAN_OPTIONS=dump_labels_at_exit=/dev/stdout ./dfsan
  ==21994==INFO: DataFlowSanitizer: dumping labels to /dev/stdout
  1 0 0 i
  2 0 0 j
  3 0 0 k
  4 1 2
  5 3 4
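
As far as I can tell from the sample output, each row of the dump starts
with the label's number, followed by the two labels that were combined to
produce it (both 0 for a label created directly), and then the description,
if there is one. Read that way, label 4 is the union of labels 1 and 2
("i" and "j"), and label 5 is the union of labels 3 and 4. Here is a
minimal sketch that parses one row under that assumption. DumpedLabel and
ParseDumpLine are names made up for illustration; they are not part of the
DataFlowSanitizer interface.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One row of the label dump, assumed to be "<id> <left> <right> [description]".
struct DumpedLabel {
  unsigned id = 0;
  unsigned left = 0;        // First combined label; 0 for a base label.
  unsigned right = 0;       // Second combined label; 0 for a base label.
  std::string description;  // Empty when the row has no description.

  // A row describes a union when it names the labels it was built from.
  bool IsUnion() const { return left != 0; }
};

// Parses one dump row into *out. Returns false (and leaves *out untouched)
// if the row does not start with three unsigned integers.
bool ParseDumpLine(const std::string& line, DumpedLabel* out) {
  assert(out != nullptr);
  std::istringstream in(line);
  DumpedLabel parsed;
  if (!(in >> parsed.id >> parsed.left >> parsed.right)) {
    return false;
  }
  // Whatever follows the three numbers is the description. If nothing
  // follows, this read fails and the description stays empty.
  std::getline(in >> std::ws, parsed.description);
  *out = parsed;
  return true;
}
```

Feeding it the rows above, "1 0 0 i" parses as a base label described as
"i", and "4 1 2" parses as a union of labels 1 and 2 with no description.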

If you tell us more about what you are trying to accomplish with
DataFlowSanitizer, we may be able to give more specific advice.

I’m new to LLVM passes. I wonder if I can use the pass to dynamically analyze a program.

Let me explain what I want to do with the following example.

// "Result" is an object that doesn't have a fixed length like int and float.
// The program is as follows.

Result retrieve_data_fun() {
  // The function retrieves some sensitive data from files/DB.
  return result_info;
}

int main() {
  Result var1, var2, var3;
  var1 = retrieve_data_fun();
  var2 = retrieve_data_fun();
  printf(var1);
  var3 = var2 + "xxx";
  printf(var3);
}

My goal is to track all the returned data from “retrieve_data_fun” and monitor actions on them.

So, whenever the data is used (e.g., printed), I want to detect that, maybe by printing a statement or anything.

Could you please help me to do that?

Thanks,
Daren

I haven’t used DataFlowSanitizer yet, but I don’t think you need to write a pass.

You just need to include <sanitizer/dfsan_interface.h> and insert calls to the dfsan_* functions in the source code you want to inspect, following the instructions at https://clang.llvm.org/docs/DataFlowSanitizer.html or Sam’s instructions in the previous email in this thread.

Bekket

I'm new to LLVM passes. I wonder if I can use the pass to dynamically analyze a program.

As Bekket points out, the way to use DataFlowSanitizer is to modify
your program to call the functions declared in
<sanitizer/dfsan_interface.h>.

Let me explain what I want to do with the following example.

//"Result" is an object that doesn't have a fixed length like int and float.

DataFlowSanitizer tracks the bytes of memory that store a variable.
"Result" may not have a fixed length at compile time, but at runtime
it presumably has a fixed set of bytes that hold its contents. If you
can find those bytes, you can read the label.
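
As a rough illustration of "finding those bytes", suppose Result holds a
list of strings. That is purely an assumption, since the real type was not
given. The string contents then live in separate buffers, not inside
sizeof(Result), so each buffer has to be visited on its own. The sketch
below is plain C++ and does not call DataFlowSanitizer itself; the Result
layout, ForEachContentRange, and CountContentBytes are all hypothetical
names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for "Result": a variable-length list of strings.
struct Result {
  std::vector<std::string> rows;
};

// Callback invoked with the start address and size of one block of bytes.
using ByteRangeVisitor =
    std::function<void(const void* addr, std::size_t size)>;

// Calls visit(addr, size) once for each non-empty row, passing the bytes
// that hold that row's characters.
void ForEachContentRange(const Result& result, const ByteRangeVisitor& visit) {
  assert(visit && "visitor must be callable");
  for (const std::string& row : result.rows) {
    if (!row.empty()) {
      visit(row.data(), row.size());
    }
  }
}

// Example visitor use: total number of content bytes across all rows.
std::size_t CountContentBytes(const Result& result) {
  std::size_t total = 0;
  ForEachContentRange(result, [&total](const void*, std::size_t size) {
    total += size;
  });
  return total;
}
```

In an instrumented build, the visitor could instead call
dfsan_read_label(addr, size) on each range and combine the results with
dfsan_union(), or call dfsan_set_label() on each range to apply a label.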

//The program is as follows.
Result retrieve_data_fun (){
//the function retrieves some sensitive data from files/DB

return result_info;
}

int main() {
  ...
  Result var1, var2, var3;
  var1 = retrieve_data_fun();
  ...
  var2 = retrieve_data_fun();

  printf(var1);
  var3 = var2 + "xxx";
  printf(var3);
}

My goal is to track all the returned data from "retrieve_data_fun" and monitor actions on them.

You would need to test for the label on each action.

So, whenever the data is used (e.g., printed), I want to detect that, maybe by printing a statement or anything.
Could you please help me to do that?

Here is a sketch of how this might be done:

  #include <sanitizer/dfsan_interface.h>

  #include <iostream>

  // 'Result' is assumed to be defined elsewhere.
  static dfsan_label sensitive_data_label = dfsan_create_label(
      "Data returned by retrieve_data_fun()", nullptr);

  // The function retrieves some sensitive data from files or a DB.
  // Users should call retrieve_data_fun() instead of this
  // implementation, to ensure results are labeled.
  Result retrieve_data_fun_impl () {
     ...
  }

  Result retrieve_data_fun() {
    // All the real work is done in retrieve_data_fun_impl(). This
    // function calls it, labels the result, and returns it.
    // This style ensures that all return paths in
    // retrieve_data_fun_impl() have the label applied to the result.
    Result result = retrieve_data_fun_impl();
    dfsan_set_label(sensitive_data_label, &result, sizeof(result));
    return result;
  }

  bool IsResultSensitive(const Result& result) {
    // There are types for which sizeof(result) does not give the
    // bytes needed. For example, if Result is a vector,
    // you may need to iterate over the elements in the vector, and
    // read the labels of the elements.
    dfsan_label label = dfsan_read_label(&result, sizeof(result));
    return dfsan_has_label(label, sensitive_data_label);
  }

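  bool IsIntSensitive(int i) {
    dfsan_label label = dfsan_read_label(&i, sizeof(i));
    return dfsan_has_label(label, sensitive_data_label);
  }

  // Defined below main().
  bool TakeSomeActionOnResult(const Result& result);
  std::ostream& operator<<(std::ostream& os, const Result& result);

  int main() {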
    // 'sensitive_result' is sensitive. It was labeled as sensitive
    // by 'retrieve_data_fun()'.
    Result sensitive_result = retrieve_data_fun();

    // 'not_a_sensitive_result' is built without reading any sensitive
    // data. It is not labeled.
    Result not_a_sensitive_result = Result();

    // Should print "1".
    std::cout << "Is 'sensitive_result' sensitive? "
                   << IsResultSensitive(sensitive_result) << std::endl;

    // Should print "0".
    std::cout << "Is 'not_a_sensitive_result' sensitive? "
                   << IsResultSensitive(not_a_sensitive_result) << std::endl;

    // Assume getIntFromResult() reads bytes from the result.
    int derived_from_result = sensitive_result.getIntFromResult();

    // Should print "1".
    std::cout << "Is 'derived_from_result' sensitive? "
              << IsIntSensitive(derived_from_result) << std::endl;

    // This will fail. See the definition of TakeSomeActionOnResult().
    TakeSomeActionOnResult(sensitive_result);

    // This will print "REDACTED" because of the operator<< method below.
    std::cout << "Sensitive result is " << sensitive_result << std::endl;
  }

  bool TakeSomeActionOnResult(const Result& result) {
    // Guard against taking this action on sensitive data.
    if (IsResultSensitive(result)) {
      std::cerr << "WARNING: Tried to call TakeSomeActionOnResult() on "
                << "a sensitive result. This is not allowed!" << std::endl;
      return false;
    }
    ...
    return true;
  }

  // Overload the << operator to not print sensitive data.
  std::ostream& operator<<(std::ostream& os, const Result& result)
  {
    if (IsResultSensitive(result)) {
      os << "REDACTED";
    } else {
      // This result is not sensitive, construct a string form.
      ...
    }
    return os;
  }