Debug Info and DFSan

Hi Peter,

After discovering several bugs in ArgumentPromotion and DeadArgumentElimination where llvm::Functions were replaced with similar functions (with the same name) to transform their type in some way, I started looking at all calls to llvm::Function::takeName to see if there were any other debug info quality bugs in similar callers.

One such caller is the DataFlowSanitizer, and I don’t see any debug info tests for this so I’m wondering what /should/ happen here.

Is DFSan+DebugInfo something that matters? I assume so.

It looks like DFSan is creating wrappers (in/around DataFlowSanitizer.cpp:680-700) - when it does this, should it update the debug info for these functions? Or are these internal instrumentation functions & nothing to do with the code the user wrote? I can’t quite tell from the code.

Could you provide any C/C++ source examples whis part of DFSan fires reliably, so I could experiment with some examples and see how the debug info looks?

Thanks,

  • David

Hi Peter,

After discovering several bugs in ArgumentPromotion and
DeadArgumentElimination where llvm::Functions were replaced with similar
functions (with the same name) to transform their type in some way, I
started looking at all calls to llvm::Function::takeName to see if there
were any other debug info quality bugs in similar callers.

One such caller is the DataFlowSanitizer, and I don't see any debug info
tests for this so I'm wondering what /should/ happen here.

Is DFSan+DebugInfo something that matters? I assume so.

It may be important in the future, but at the moment the dfsan runtime library
does not make use of debug info. The debug info could still be useful for
regular debugging tasks though.

It looks like DFSan is creating wrappers (in/around
DataFlowSanitizer.cpp:680-700) - when it does this, should it update the
debug info for these functions? Or are these internal instrumentation
functions & nothing to do with the code the user wrote? I can't quite tell
from the code.

The functions created by that part of the code replace the original functions,
so they should inherit the debug info for those functions.

But the code below that can also create wrapper functions which do not need
debug info (lines 712-746). Wrappers normally show up for uninstrumented
functions (e.g. main and many libc functions).

Could you provide any C/C++ source examples whis part of DFSan fires
reliably, so I could experiment with some examples and see how the debug
info looks?

This is an example of a program that exercises the replacement function and
wrapper features.

> Hi Peter,
>
> After discovering several bugs in ArgumentPromotion and
> DeadArgumentElimination where llvm::Functions were replaced with similar
> functions (with the same name) to transform their type in some way, I
> started looking at all calls to llvm::Function::takeName to see if there
> were any other debug info quality bugs in similar callers.
>
> One such caller is the DataFlowSanitizer, and I don't see any debug info
> tests for this so I'm wondering what /should/ happen here.
>
> Is DFSan+DebugInfo something that matters? I assume so.

It may be important in the future, but at the moment the dfsan runtime
library
does not make use of debug info. The debug info could still be useful for
regular debugging tasks though.

Yeah - that'd be the baseline. I assume some people probably care about
being able to debug a dfsan binary. Not sure if your users have
specifically highlighted this situation or come across the bugs I'm
speculating about.

> It looks like DFSan is creating wrappers (in/around
> DataFlowSanitizer.cpp:680-700) - when it does this, should it update the
> debug info for these functions? Or are these internal instrumentation
> functions & nothing to do with the code the user wrote? I can't quite
tell
> from the code.

The functions created by that part of the code replace the original
functions,
so they should inherit the debug info for those functions.

Yep, that won't happen for free - we have to stitch it back into the debug
info.

But the code below that can also create wrapper functions which do not need
debug info (lines 712-746). Wrappers normally show up for uninstrumented
functions (e.g. main and many libc functions).

> Could you provide any C/C++ source examples whis part of DFSan fires
> reliably, so I could experiment with some examples and see how the debug
> info looks?

This is an example of a program that exercises the replacement function and
wrapper features.

--------------------------------------------------------------------------------
#include <stddef.h>
#include <string.h>

size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
  return strlen_ptr(str);
}

int main(void) {
  return len(strlen, "foo");
}

--------------------------------------------------------------------------------

In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its original
name (the pass treats it as an uninstrumented function), and wrappers are
created for 'main' and 'strlen' (the wrapper for 'main' is unused as the
C runtime calls the regular 'main' function directly).

OK, so when you say wrappers are created - where does the name go? "main"
keeps the name? (it's important which way this transformation is done - if
the guts of main are moved to a new function and the name moved as well,
that's not the same as keeping the same function as far as debug info
metadata is concerned)

I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont'
gives a relevant stack trace:

#0 __strlen_sse2_pminub () at
../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
#1 0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
s_label=<optimized out>, ret_label=0x7fffffffddee)
    at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
#2 0x000055555556bbdc in dfsw$strlen ()
#3 0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>,
str=0x55555556fe17 "foo") at strlen.c:5
#4 0x000055555556bb96 in main ()

In this stack trace, #2 is the compiler-generated wrapper function for
strlen.

It looks like the debug info for 'len' is preserved correctly, but I don't
know why the debug info for 'main' is missing.

Yeah - not quite sure either. I'll poke around with it for a little bit.

> Hi Peter,
>
> After discovering several bugs in ArgumentPromotion and
> DeadArgumentElimination where llvm::Functions were replaced with similar
> functions (with the same name) to transform their type in some way, I
> started looking at all calls to llvm::Function::takeName to see if there
> were any other debug info quality bugs in similar callers.
>
> One such caller is the DataFlowSanitizer, and I don't see any debug info
> tests for this so I'm wondering what /should/ happen here.
>
> Is DFSan+DebugInfo something that matters? I assume so.

It may be important in the future, but at the moment the dfsan runtime
library
does not make use of debug info. The debug info could still be useful for
regular debugging tasks though.

Yeah - that'd be the baseline. I assume some people probably care about
being able to debug a dfsan binary. Not sure if your users have
specifically highlighted this situation or come across the bugs I'm
speculating about.

> It looks like DFSan is creating wrappers (in/around
> DataFlowSanitizer.cpp:680-700) - when it does this, should it update the
> debug info for these functions? Or are these internal instrumentation
> functions & nothing to do with the code the user wrote? I can't quite
tell
> from the code.

The functions created by that part of the code replace the original
functions,
so they should inherit the debug info for those functions.

Yep, that won't happen for free - we have to stitch it back into the debug
info.

But the code below that can also create wrapper functions which do not
need
debug info (lines 712-746). Wrappers normally show up for uninstrumented
functions (e.g. main and many libc functions).

> Could you provide any C/C++ source examples whis part of DFSan fires
> reliably, so I could experiment with some examples and see how the debug
> info looks?

This is an example of a program that exercises the replacement function
and
wrapper features.

--------------------------------------------------------------------------------
#include <stddef.h>
#include <string.h>

size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
  return strlen_ptr(str);
}

int main(void) {
  return len(strlen, "foo");
}

--------------------------------------------------------------------------------

In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its
original
name (the pass treats it as an uninstrumented function), and wrappers are
created for 'main' and 'strlen' (the wrapper for 'main' is unused as the
C runtime calls the regular 'main' function directly).

OK, so when you say wrappers are created - where does the name go? "main"
keeps the name? (it's important which way this transformation is done - if
the guts of main are moved to a new function and the name moved as well,
that's not the same as keeping the same function as far as debug info
metadata is concerned)

I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont'
gives a relevant stack trace:

#0 __strlen_sse2_pminub () at
../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
#1 0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
s_label=<optimized out>, ret_label=0x7fffffffddee)
    at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
#2 0x000055555556bbdc in dfsw$strlen ()
#3 0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>,
str=0x55555556fe17 "foo") at strlen.c:5
#4 0x000055555556bb96 in main ()

In this stack trace, #2 is the compiler-generated wrapper function for
strlen.

It looks like the debug info for 'len' is preserved correctly, but I don't
know why the debug info for 'main' is missing.

Yeah - not quite sure either. I'll poke around with it for a little bit.

DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst);

replaces the debug info metadata's pointer to main with a pointer to
dfsw$main (this issue doesn't come up with DAE or ArgPromo because they
both don

> Hi Peter,
>
> After discovering several bugs in ArgumentPromotion and
> DeadArgumentElimination where llvm::Functions were replaced with
similar
> functions (with the same name) to transform their type in some way, I
> started looking at all calls to llvm::Function::takeName to see if
there
> were any other debug info quality bugs in similar callers.
>
> One such caller is the DataFlowSanitizer, and I don't see any debug
info
> tests for this so I'm wondering what /should/ happen here.
>
> Is DFSan+DebugInfo something that matters? I assume so.

It may be important in the future, but at the moment the dfsan runtime
library
does not make use of debug info. The debug info could still be useful for
regular debugging tasks though.

Yeah - that'd be the baseline. I assume some people probably care about
being able to debug a dfsan binary. Not sure if your users have
specifically highlighted this situation or come across the bugs I'm
speculating about.

> It looks like DFSan is creating wrappers (in/around
> DataFlowSanitizer.cpp:680-700) - when it does this, should it update
the
> debug info for these functions? Or are these internal instrumentation
> functions & nothing to do with the code the user wrote? I can't quite
tell
> from the code.

The functions created by that part of the code replace the original
functions,
so they should inherit the debug info for those functions.

Yep, that won't happen for free - we have to stitch it back into the
debug info.

But the code below that can also create wrapper functions which do not
need
debug info (lines 712-746). Wrappers normally show up for uninstrumented
functions (e.g. main and many libc functions).

> Could you provide any C/C++ source examples whis part of DFSan fires
> reliably, so I could experiment with some examples and see how the
debug
> info looks?

This is an example of a program that exercises the replacement function
and
wrapper features.

--------------------------------------------------------------------------------
#include <stddef.h>
#include <string.h>

size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
  return strlen_ptr(str);
}

int main(void) {
  return len(strlen, "foo");
}

--------------------------------------------------------------------------------

In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its
original
name (the pass treats it as an uninstrumented function), and wrappers are
created for 'main' and 'strlen' (the wrapper for 'main' is unused as the
C runtime calls the regular 'main' function directly).

OK, so when you say wrappers are created - where does the name go? "main"
keeps the name? (it's important which way this transformation is done - if
the guts of main are moved to a new function and the name moved as well,
that's not the same as keeping the same function as far as debug info
metadata is concerned)

I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont'
gives a relevant stack trace:

#0 __strlen_sse2_pminub () at
../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
#1 0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
s_label=<optimized out>, ret_label=0x7fffffffddee)
    at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
#2 0x000055555556bbdc in dfsw$strlen ()
#3 0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>,
str=0x55555556fe17 "foo") at strlen.c:5
#4 0x000055555556bb96 in main ()

In this stack trace, #2 is the compiler-generated wrapper function for
strlen.

It looks like the debug info for 'len' is preserved correctly, but I
don't
know why the debug info for 'main' is missing.

Yeah - not quite sure either. I'll poke around with it for a little bit.

DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst);

replaces the debug info metadata's pointer to main with a pointer to
dfsw$main (this issue doesn't come up with DAE or ArgPromo because they
both don

... they both don't use RAUW because they have to visit each call site to
do special stuff anyway. (they're replacing one function with another of a
different type, RAUW isn't suitable)

I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and
thus the debug info doesn't describe any function at all. It's possible
that there's some other reason we don't end up describing any function at
all...

In any case if we did have "main" in the debug info describe a function, at
this point, it would be describing the dfsw$main, not main main. So we'd
need to handle remapping the debug info back to the original uninstrumented
function anyway, I assume? (that's the expected behavior? That the
uninstrumented function is the one described by debug info, not the
instrumented wrapper?)

>
>
>
>>
>>
>>
>>> > Hi Peter,
>>> >
>>> > After discovering several bugs in ArgumentPromotion and
>>> > DeadArgumentElimination where llvm::Functions were replaced with
>>> similar
>>> > functions (with the same name) to transform their type in some way, I
>>> > started looking at all calls to llvm::Function::takeName to see if
>>> there
>>> > were any other debug info quality bugs in similar callers.
>>> >
>>> > One such caller is the DataFlowSanitizer, and I don't see any debug
>>> info
>>> > tests for this so I'm wondering what /should/ happen here.
>>> >
>>> > Is DFSan+DebugInfo something that matters? I assume so.
>>>
>>> It may be important in the future, but at the moment the dfsan runtime
>>> library
>>> does not make use of debug info. The debug info could still be useful for
>>> regular debugging tasks though.
>>>
>>
>> Yeah - that'd be the baseline. I assume some people probably care about
>> being able to debug a dfsan binary. Not sure if your users have
>> specifically highlighted this situation or come across the bugs I'm
>> speculating about.
>>
>>
>>> > It looks like DFSan is creating wrappers (in/around
>>> > DataFlowSanitizer.cpp:680-700) - when it does this, should it update
>>> the
>>> > debug info for these functions? Or are these internal instrumentation
>>> > functions & nothing to do with the code the user wrote? I can't quite
>>> tell
>>> > from the code.
>>>
>>> The functions created by that part of the code replace the original
>>> functions,
>>> so they should inherit the debug info for those functions.
>>>
>>
>> Yep, that won't happen for free - we have to stitch it back into the
>> debug info.
>>
>>
>>> But the code below that can also create wrapper functions which do not
>>> need
>>> debug info (lines 712-746). Wrappers normally show up for uninstrumented
>>> functions (e.g. main and many libc functions).
>>>
>>> > Could you provide any C/C++ source examples whis part of DFSan fires
>>> > reliably, so I could experiment with some examples and see how the
>>> debug
>>> > info looks?
>>>
>>> This is an example of a program that exercises the replacement function
>>> and
>>> wrapper features.
>>>
>>>
>>> --------------------------------------------------------------------------------
>>> #include <stddef.h>
>>> #include <string.h>
>>>
>>> size_t len(size_t (*strlen_ptr)(const char *), const char *str) {
>>> return strlen_ptr(str);
>>> }
>>>
>>> int main(void) {
>>> return len(strlen, "foo");
>>> }
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> In this example, 'len' is rewritten to 'dfs$len', 'main' keeps its
>>> original
>>> name (the pass treats it as an uninstrumented function), and wrappers are
>>> created for 'main' and 'strlen' (the wrapper for 'main' is unused as the
>>> C runtime calls the regular 'main' function directly).
>>>
>>
>> OK, so when you say wrappers are created - where does the name go? "main"
>> keeps the name? (it's important which way this transformation is done - if
>> the guts of main are moved to a new function and the name moved as well,
>> that's not the same as keeping the same function as far as debug info
>> metadata is concerned)

In this case the function keeps its original name and the wrapper is created
separately.

>>> I compile this with '-O0 -g'. A 'break main'/'run'/'break strlen'/'cont'
>>> gives a relevant stack trace:
>>>
>>> #0 __strlen_sse2_pminub () at
>>> ../sysdeps/x86_64/multiarch/strlen-sse2-pminub.S:33
>>> #1 0x00005555555587ff in __dfsw_strlen (s=0x55555556fe17 "foo",
>>> s_label=<optimized out>, ret_label=0x7fffffffddee)
>>> at llvm/projects/compiler-rt/lib/dfsan/dfsan_custom.cc:203
>>> #2 0x000055555556bbdc in dfsw$strlen ()
>>> #3 0x000055555556bb51 in len (strlen_ptr=0x55555556bbc0 <dfsw$strlen>,
>>> str=0x55555556fe17 "foo") at strlen.c:5
>>> #4 0x000055555556bb96 in main ()
>>>
>>> In this stack trace, #2 is the compiler-generated wrapper function for
>>> strlen.
>>>
>>> It looks like the debug info for 'len' is preserved correctly, but I
>>> don't
>>> know why the debug info for 'main' is missing.
>>>
>>
>> Yeah - not quite sure either. I'll poke around with it for a little bit.
>>
>
> DataFlowSanitizer.cpp:719: F.replaceAllUsesWith(WrappedFnCst);
>
> replaces the debug info metadata's pointer to main with a pointer to
> dfsw$main (this issue doesn't come up with DAE or ArgPromo because they
> both don
>

... they both don't use RAUW because they have to visit each call site to
do special stuff anyway. (they're replacing one function with another of a
different type, RAUW isn't suitable)

I /guess/ we never end up codegen'ing dfsw$main? (I haven't looked) and
thus the debug info doesn't describe any function at all. It's possible
that there's some other reason we don't end up describing any function at
all...

In any case if we did have "main" in the debug info describe a function, at
this point, it would be describing the dfsw$main, not main main. So we'd
need to handle remapping the debug info back to the original uninstrumented
function anyway, I assume? (that's the expected behavior? That the
uninstrumented function is the one described by debug info, not the
instrumented wrapper?)

Yes, the debug info describes 'main', not 'dfsw$main'. So I agree that
the references in the debug info would need to refer to 'main' instead of
'dfsw$main'. Not sure if there is a good way to do that. Perhaps we could
manually RAUW and skip metadata, but I'm not sure if that would work if the
debug metadata is not a direct user.

Thanks,

Here’s a basic patch which would solve it in sort of the same way as the other optimizations I was fixing (just special case the debug info & fix it up). I can work up a test case for this as well, or you can, if you like/this seems reasonable.

dfsan-debug.diff (1.59 KB)

Looks good, thanks!

Can you write the test case, please? You probably have more experience
writing debug info tests than I do.

Looks good, thanks!

Can you write the test case, please? You probably have more experience
writing debug info tests than I do.

Sure - though how would I get the pre-dfsan .ll file to produce this
behavior? I've tried compiling to a .ll file without dfsan, then feeling
that .ll through opt -dfsan, and I got different output. Instead of
wrapping main, it just replaced it. Is there some attribute or other thing
the frontend is adding to ensure main is wrapped, rather than converted?

> Looks good, thanks!
>
> Can you write the test case, please? You probably have more experience
> writing debug info tests than I do.
>

Sure - though how would I get the pre-dfsan .ll file to produce this
behavior? I've tried compiling to a .ll file without dfsan, then feeling
that .ll through opt -dfsan, and I got different output. Instead of
wrapping main, it just replaced it. Is there some attribute or other thing
the frontend is adding to ensure main is wrapped, rather than converted?

The pass takes an ABI list file, which in this case needs to contain the
following two entries in order to tell the pass that main is uninstrumented:

fun:main=uninstrumented
fun:main=discard

Please take a look at the test/Instrumentation/DataFlowSanitizer/abilist.ll
test case for an example of how to specify an ABI list (you only need the
-dfsan-abilist flag).

Peter

>
> > Looks good, thanks!
> >
> > Can you write the test case, please? You probably have more experience
> > writing debug info tests than I do.
> >
>
> Sure - though how would I get the pre-dfsan .ll file to produce this
> behavior? I've tried compiling to a .ll file without dfsan, then feeling
> that .ll through opt -dfsan, and I got different output. Instead of
> wrapping main, it just replaced it. Is there some attribute or other
thing
> the frontend is adding to ensure main is wrapped, rather than converted?

The pass takes an ABI list file, which in this case needs to contain the
following two entries in order to tell the pass that main is
uninstrumented:

fun:main=uninstrumented
fun:main=discard

Please take a look at the test/Instrumentation/DataFlowSanitizer/abilist.ll
test case for an example of how to specify an ABI list (you only need the
-dfsan-abilist flag).

Lovely. Committed in r219249.

Thanks!