[RFC] Instrumented versions of libc++ for different sanitizers

The purpose of this RFC is to discuss shipping instrumented versions of libc++ with LLVM for different sanitizers. The motivation is to add ASan annotations to libc++ that require the whole library to be instrumented. Using an instrumented libc++ is already a good idea with ASan, and MSan, for example, requires it. Currently this is handled at the build-system level or by the sanitizer*bootstrap build bots, which already build the library that way.

Background

I want to add ASan annotations to libc++ that require full libc++ instrumentation to work (the change in question is the std::basic_string annotations). The discussion about that started here; further in this post, I’m going to use observations made there by @ldionne and @vitalybuka. Thank you!

Therefore, it would be good to ship ASan libc++ with LLVM.

Additional effects

The results of this discussion may also be helpful in other situations, such as shipping a library with an unstable ABI:

I think knowing how to do this would benefit us since we have similar needs for other efforts. For example, it would be conceivable to ship an unstable-ABI version of the library, and a hardened version as well, and doing that would follow the exact same steps.
ldionne

Also, while my main goal is libc++ with ASan, I want to include msan/hwasan/tsan in the patch as well.

What we want to discuss

  1. Multiple CMake invocations or building multiple libraries from a single invocation?

We need to build this new version of the library (do we do that with multiple CMake invocations or do we do like compiler-rt and allow building multiple libraries from a single invocation?)
ldionne

Single invocation would be nice, as it scales better to other sanitizers.
vitalybuka

  2. Inform vendors. Is there anything we can do to make it easier?

We need to ship this library as part of the LLVM release and vendors need to also start shipping it with their own releases if they want -fsanitize=address to work
ldionne

CC: @EricWF

  3. How should we handle other components in the LLVM stack that use libc++ and may need to be instrumented as well? Should we provide instrumented versions of them too?

Probably yes, otherwise it’s too easy to create a situation with a false positive while using container annotations (especially important for strings). It’s a known limitation of vector annotations – everything has to be instrumented.

  4. How should we name and locate the instrumented libraries? Should we use prefixes or suffixes (e.g., libc++asan.dylib or asan/libc++.dylib) or some other scheme?

  5. How should we modify the compiler driver to link the correct version of libc++ depending on the sanitizer flags?

  6. Everything else related to the topic.
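
To make the naming and driver questions above more concrete, here is a minimal sketch of the selection logic, written in Python only for brevity (the real driver is C++). The variant names (asan, msan, tsan, hwasan), the function names, and both path layouts are assumptions for illustration; only the -fsanitize= and -fno-sanitize= flag spellings are existing Clang options. Treating more than one of these sanitizers as an error is a simplification.

```python
# Sketch only: variant names and path layouts are hypothetical, not an agreed scheme.
SANITIZER_TO_VARIANT = {
    "address": "asan",
    "memory": "msan",
    "thread": "tsan",
    "hwaddress": "hwasan",
}


def enabled_sanitizers(args):
    """Collect sanitizers from -fsanitize=/-fno-sanitize= flags, left to right."""
    enabled = set()
    for arg in args:
        if arg.startswith("-fsanitize="):
            enabled.update(arg[len("-fsanitize="):].split(","))
        elif arg.startswith("-fno-sanitize="):
            enabled.difference_update(arg[len("-fno-sanitize="):].split(","))
    return enabled


def libcxx_variant(args):
    """Return the instrumented libc++ variant to link, or None for the plain one."""
    variants = sorted(
        SANITIZER_TO_VARIANT[s]
        for s in enabled_sanitizers(args)
        if s in SANITIZER_TO_VARIANT
    )
    if len(variants) > 1:
        # Simplification: this sketch does not try to combine variants.
        raise ValueError(f"conflicting sanitizers: {variants}")
    return variants[0] if variants else None


def candidate_paths(variant, ext="dylib"):
    """Spell out the two layouts mentioned in question 4 for one variant."""
    return {
        "suffix": f"libc++{variant}.{ext}",
        "subdir": f"{variant}/libc++.{ext}",
    }
```

For example, `libcxx_variant(["-O2", "-fsanitize=address,undefined"])` picks the asan variant, and `candidate_paths("asan")` yields the two spellings from question 4, libc++asan.dylib and asan/libc++.dylib.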

All suggestions are welcome!

I have experience with building libc++ with different sanitizers, but I am not very familiar with the details of the libc++ build system implementation. I would appreciate any guidance or pointers on where to look and how to implement those changes. That may save me (or whoever is going to implement it, but probably me) a lot of time!

Thank you for your feedback and suggestions!

Summary

The proposal is to ship instrumented versions of libc++ for different sanitizers as part of the LLVM release and to make the compiler driver link them automatically depending on the sanitizer flags. This would improve the usability and reliability of sanitizers with libc++ and allow ABI-breaking changes for ASan. This is a preliminary proposal; feedback and suggestions from the community are more than welcome.

CC: @philnik

Thanks for looking into this!
I would really love it if I could easily use sanitizers without having to recompile libc++.

I am not completely sure about ABI compatibility, though. Do all relevant sanitizers guarantee a stable ABI across clang releases? Could I link a libc++tsan.dylib compiled with clang 15 against my main executable compiled with clang 16?

If I am reading D143675 ("Discussion: Darwin Sanitizers Stable ABI") correctly, then ASan is only ABI-stable on Darwin but not on Linux and Windows. Is the plan to ship the annotated libc++ versions only on Darwin? Will we force matching versions between libc++ and clang on the other systems if I want to use sanitizers?

Furthermore, I could not find anything on the ABI stability of TSan, MSan and other sanitizers.

I would propose to use the same naming scheme as libclang_rt. E.g., on my machine I see the following variants of libclang_rt:

$ ls lib/clang/16/lib/darwin
libclang_rt.asan_osx_dynamic.dylib              libclang_rt.lsan_osx_dynamic.dylib              libclang_rt.tsan_osx_dynamic.dylib              libclang_rt.xray-fdr_osx.a
libclang_rt.cc_kext.a                           libclang_rt.osx.a                               libclang_rt.ubsan_minimal_osx.a                 libclang_rt.xray-profiling_osx.a
libclang_rt.fuzzer_interceptors_osx.a           libclang_rt.profile_osx.a                       libclang_rt.ubsan_minimal_osx_dynamic.dylib     libclang_rt.xray_osx.a
libclang_rt.fuzzer_no_main_osx.a                libclang_rt.stats_client_osx.a                  libclang_rt.ubsan_osx_dynamic.dylib             liborc_rt_osx.a
libclang_rt.fuzzer_osx.a                        libclang_rt.stats_osx_dynamic.dylib             libclang_rt.xray-basic_osx.a

The compiler driver apparently already picks the correct version of libclang_rt based on the provided compiler flags. Can this logic be reused?
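
As a rough illustration of what following the libclang_rt naming could look like, the sketch below composes names using the pattern visible in the listing above, `<base>.<component>_<os>[_dynamic].<ext>`. The helper name and the libc++ result are hypothetical; only the libclang_rt names come from the listing, and the extensions are Darwin-specific.

```python
def clang_rt_style_name(base, component, os_name, shared):
    """Compose a library name in the style of the libclang_rt listing (Darwin only)."""
    if shared:
        return f"{base}.{component}_{os_name}_dynamic.dylib"
    return f"{base}.{component}_{os_name}.a"


# Reproduces two names from the listing above.
asan_rt = clang_rt_style_name("libclang_rt", "asan", "osx", shared=True)
fuzzer_rt = clang_rt_style_name("libclang_rt", "fuzzer", "osx", shared=False)

# Hypothetical libc++ analogue; no such file is shipped today.
tsan_libcxx = clang_rt_style_name("libc++", "tsan", "osx", shared=True)
```

Under this scheme, a TSan build of libc++ would be called libc++.tsan_osx_dynamic.dylib, next to the existing libclang_rt.tsan_osx_dynamic.dylib.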

Making it work only on Darwin is not my goal for sure!

If the ASan ABI is not stable yet, maybe requiring matching versions for sanitized binaries is not a bad choice? If I understand correctly, that is already required today, just with the additional step of compiling everything manually. In the end, everything has to be compiled with ASan anyway. A ready-made libc++ makes that a little easier, and when the sanitizer ABI becomes stable, shipping a sanitized libc++ will already be in place. I don’t see many downsides here. @ldionne what’s your opinion about it?

@vitalybuka do you expect any issues unique to sanitizers other than ASan?

It makes sense, thank you!

I will take a closer look at it. If (re)using it is possible, it would probably be the right choice. @MaskRay do you have any suggestions? I’m not very familiar with the driver.

FWIW I wouldn’t put too much time in ABI stability. Most people don’t ship with any sanitizers, since they aren’t meant for that. I don’t see a problem with just providing static libraries and making it clear that the ABI of the sanitized libraries is not stable.

Furthermore, I could not find anything on the ABI stability of TSan, MSan and other sanitizers.

The ABI does not change very frequently, but I guess it’s good enough if we just require that instrumented libc++ version N is compiled with clang version N, so a sanitized program should also use the same compiler.
If mismatched versions work for you, good; if not, it’s unsupported anyway.
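
A minimal sketch of that rule, assuming the major version is read from the `clang version N.x.y` line that `clang --version` prints. The function names are hypothetical, and comparing only the major version is an assumption about what "ver. N" means here.

```python
import re


def clang_major(version_text):
    """Extract the major version from text such as 'clang version 16.0.0'."""
    match = re.search(r"clang version (\d+)", version_text)
    if match is None:
        raise ValueError(f"not a clang version string: {version_text!r}")
    return int(match.group(1))


def is_supported_pairing(libcxx_built_with, program_built_with):
    """True when the instrumented libc++ and the program used the same clang major."""
    return clang_major(libcxx_built_with) == clang_major(program_built_with)
```

With this check, a libc++ built by clang 15 paired with a program built by clang 16 would be reported as an unsupported combination. It might still work, but nobody promises it.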

wouldn’t put too much time in ABI stability.

Agreed. We can reconsider if we find a common use case.

Thank you @philnik @vitalybuka! If we don’t need to consider ABI stability, it makes it easier.