My firmware use case is quite sensitive to code size. Does UBSan have a build option to use callbacks instead of inline code for its instrumentation (like ASan's -asan-instrumentation-with-call-threshold=0)?
Most of UBSan runs in the late parts of Clang, see the lib/CodeGen directory.
I think your best bet for controlling code bloat is to compile with -fsanitize=undefined -fsanitize-trap=undefined.
Also you may not need all of UBSan's checks at the same time -- so pick
and choose among its checks using the finer-grained flags.
If you're really stuck against a hard limit on code size, try applying
UBSan to a subset of files in your project at a time.
Hi John,
Thank you for your suggestion. I like the trap-function approach. With the compile options "-fsanitize=undefined -fsanitize-trap=undefined -ftrap-function=__my_trap_function", my firmware saves over 40% in code size. It is great!
But I have another question about the trap function. I would like to print the instruction address of the offending code in my trap function (so that I can use llvm-symbolizer to find its source location), and I would also like to print the specific type of undefined behavior, e.g. add_overflow, type_mismatch_v1, etc., as defined in compiler-rt\lib\ubsan\ubsan_interface.inc. How should I define __my_sanitizer_trap() so that clang/llvm passes the relevant information into my trap function?
IIRC nothing gets pushed onto the stack before your custom trap function is called. A bit of Clang hacking will be required if you want to alter this behavior.
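For what it's worth, here is a rough sketch of how the trap function could at least recover the address of the failing check without any compiler changes. It assumes the trap is lowered to an ordinary call (so the GCC/Clang builtin __builtin_return_address(0) yields an address just past the call site) and that the trap function must never return. The names last_trap_site and record_trap_site are made up for illustration; this does not recover the kind of check that failed.

```c
#include <stdint.h>

/* Hypothetical log slot; real firmware would print the address over
 * its own console instead of just storing it. */
static volatile uintptr_t last_trap_site;

/* Hypothetical helper: remember where the trap came from. */
void record_trap_site(void *return_address)
{
    last_trap_site = (uintptr_t)return_address;
}

/* The name must match -ftrap-function=__my_trap_function.
 * Assumption: Clang calls it with no arguments and treats the call
 * as not returning, so this function must not return either. */
void __my_trap_function(void)
{
    /* Address of the instruction after the call that got us here. */
    record_trap_site(__builtin_return_address(0));

    for (;;) {
        /* Halt here; firmware might reset or wait for a debugger. */
    }
}
```

The recorded value is a return address, so it may need to be decremented by one before it is passed to llvm-symbolizer in order to land inside the instrumented instruction's line.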
You might try compiling with '-fsanitize=undefined -fno-sanitize-recover=undefined'. You should still see some code size savings with this option. You'll need to link in the ubsan runtime when compiling in no-recovery mode, but the diagnostics will be better.
UBSan doesn't provide an option of using callbacks to implement its instrumentation. If the no-recovery mode won't work for you, it's pretty simple to write a custom ubsan runtime that fits in a single object file. That's what I ended up doing to sanitize our kernel (xnu), so I can offer help if you decide to go down that path.
Vedant,
Thank you. I'm OK with writing a customized runtime library for UBSan. In fact, I did it this way when I enabled ASan in my firmware. My problem is how to correctly implement the UBSan C++ runtime library with pure C functions in my firmware. As you know, UBSan defines its runtime interface in C++, which is different from the ASan extern "C" interface. Many of the UBSan runtime functions take parameters whose types are C++ classes, and I'm not sure how to correctly interpret a C++ class as a C structure. For example, many UBSan runtime functions use the class SourceLocation to pass source location info. How should I parse or map the layout of the class SourceLocation to a C structure?
The UBSan runtime uses `extern "C"' too, though this may not be apparent due to the use of macros to generate the handler declarations. The full ubsan interface is listed in the ubsan_interface.inc file.
It isn't necessary to change the definition of SourceLocation, since it's only used by the runtime. The handler functions accept pointers to SourceLocation objects, so as long as the runtime's definition matches the structures the compiler emits, things work. The only thing to note about SourceLocation is that you'll need to use __sync_lock_test_and_set in the acquire() method.
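As a rough illustration of the point above, here is a minimal C sketch of one handler. The struct layouts are my assumption of what the compiler emits (a filename pointer followed by 32-bit line and column for SourceLocation, and a location plus a type descriptor pointer for the overflow data); please check them against ubsan_value.h and ubsan_handlers.h in your compiler-rt tree. The names last_report, report_count and acquire_location are made up for the example.

```c
#include <stdint.h>

/* Assumed C mirror of the runtime's SourceLocation. */
struct source_location {
    const char *filename;
    uint32_t line;
    uint32_t column;
};

/* Assumed C mirror of the runtime's TypeDescriptor. */
struct type_descriptor {
    uint16_t type_kind;
    uint16_t type_info;
    char type_name[1];
};

/* Assumed C mirror of the data passed to the overflow handlers. */
struct overflow_data {
    struct source_location loc;
    const struct type_descriptor *type;
};

/* A column of all ones marks a location that was already reported. */
#define COLUMN_DISABLED UINT32_MAX

/* Hypothetical report sink; firmware would print these instead. */
static struct source_location last_report;
static unsigned report_count;

/* Atomically take the column and mark the location as disabled,
 * so each source location is reported at most once. */
static struct source_location acquire_location(struct source_location *loc)
{
    struct source_location copy = *loc;
    copy.column = __sync_lock_test_and_set(&loc->column, COLUMN_DISABLED);
    return copy;
}

/* Handler name as listed in ubsan_interface.inc; the operands are
 * passed as pointer-sized value handles and are ignored here. */
void __ubsan_handle_add_overflow(struct overflow_data *data,
                                 uintptr_t lhs, uintptr_t rhs)
{
    struct source_location loc = acquire_location(&data->loc);
    (void)lhs;
    (void)rhs;
    if (loc.column == COLUMN_DISABLED)
        return; /* this location was already reported */
    last_report = loc;
    report_count++;
}
```

The other handlers would follow the same pattern with their own data structs; only the layout of the first argument changes from check to check.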
Vedant,
Yes, I don't need to change the SourceLocation definition. I found some rules for mixing C and C++ at the link below (I hope this rule is not another "undefined behavior"), so my C code should be able to directly access the data in the SourceLocation C++ class object. OK, I know how to do it now.