clang dataflow sanitizer vs. shared objects

I’m running into some pain with the dataflow sanitizer, and I’m wondering if anyone has found a good workaround.

I’m trying to analyze a code base that delegates a lot of functionality to shared objects. The application looks up functions within the .so using hard-coded function names, for example dlsym(…, "foo");

Unfortunately for me, the dataflow sanitizer prepends "dfs$" to the name of any function compiled with the dataflow sanitizer enabled. So if function "foo" was compiled with the sanitizer enabled, I’d need to change these dlsym invocations to something like dlsym(…, "dfs$foo");

For now, I’m just blacklisting (via -fsanitize-blacklist) every function that’s exported by one of the application’s shared objects. This addresses the symbol lookup problem, but it means my dataflow labels are lost on data passed through these blacklisted functions.

Does anyone know of a good workaround to this problem, and/or what a longer-term solution might look like?


I'm not aware of a good solution to this problem at the moment. One possibility
is to write a custom wrapper for the dlsym function that tries the symbol
name both with and without the "dfs$" prefix, but this would potentially
allow uninstrumented function pointers to leak into the program.
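
A minimal sketch of such a wrapper, assuming the "dfs$" prefix mentioned above. The names dfsan_prefixed_name and dlsym_try_dfsan are hypothetical and not part of DFSan or libdl. As noted, the fallback path can hand back a pointer to an uninstrumented function, so callers may want to log or reject that case.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prefix as described in this thread; adjust it if your DFSan build
 * renames instrumented functions differently. */
#define DFSAN_PREFIX "dfs$"

/* Hypothetical helper: returns a malloc'd "dfs$<name>" string,
 * or NULL if allocation fails. The caller frees the result. */
static char *dfsan_prefixed_name(const char *name) {
    size_t len = strlen(DFSAN_PREFIX) + strlen(name) + 1;
    char *buf = malloc(len);
    if (buf != NULL)
        snprintf(buf, len, "%s%s", DFSAN_PREFIX, name);
    return buf;
}

/* Hypothetical wrapper: try the prefixed (instrumented) name first,
 * then fall back to the plain name. The fallback may return an
 * uninstrumented function, which is the leak mentioned above. */
static void *dlsym_try_dfsan(void *handle, const char *name) {
    void *sym = NULL;
    char *prefixed = dfsan_prefixed_name(name);
    if (prefixed != NULL) {
        sym = dlsym(handle, prefixed);
        free(prefixed);
    }
    if (sym == NULL)
        sym = dlsym(handle, name);
    return sym;
}
```

Call sites would then use dlsym_try_dfsan(handle, "foo") in place of dlsym(handle, "foo").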


Using --defsym to create aliases during linking is another option. Perhaps DFSan should do this itself?


Yury, thanks, that’s an interesting idea.

Speaking only for myself, I think the ideal behavior would be the following, assuming there’s a reasonable way to implement it:

(1) Libraries (both static and dynamic) provide both a normal and dfs-enabled version of each exported function.

(2) The compiler and linker work together so that a function call site calls the dfs-enabled version of the target function if and only if the caller was also compiled with dfs.

I realize this is non-trivial for a few reasons; I’m just thinking about what my ideal endpoint would be.

Following up on this thread: I found that Peter’s instructions for getting dfsan support in libcxx/libcxxabi [1] do not work with the latest upstream commits (LLVM: 1f22900; Clang: 3457cd5; compiler-rt: 7bbc72c; libcxx: da1818a; libcxxabi: 75a7bf6).

I have attached the stack trace with diagnostics. Basically, an assertion fails here:

lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:1404: void {anonymous}::DFSanVisitor::visitCallSite(llvm::CallSite): Assertion `!(cast<FunctionType>( CS.getCalledValue()->getType()->getPointerElementType())->isVarArg() && dyn_cast<InvokeInst>(CS.getInstruction()))' failed.

Pointers are much appreciated.



