Building Type Sanitizer (TySan) from patches, fails due to `CompilerRt` issues in `tsan` CMakeLists, anyone with experience here?

Hey all,


I’m really interested in experimenting with TySan.


I’ve never built anything from patches before, but I found a helpful comment from @MaskRay on one of the PRs that gave me an idea of how it could be done:


With this in mind, I wrote a small script to aggregate the patches and also fix one small conflict with current mainline (attribute kind code 86 is now in use by the memory attribute, so the script assigns 87 to ATTR_KIND_SANITIZE_TYPE instead):

#!/bin/bash
set -u -x

patch_list=("D32197" "D32198" "D32199" "D137414")
for patch in "${patch_list[@]}"; do
  # Apply patch in non-interactive mode, skipping any patches that fail to apply or have already been applied.
  # (The first patch has a change that will fail due to CMake changes, but the latest patch from Nov. 2022 fixes this.)
  curl -L "https://reviews.llvm.org/$patch?download=1" | patch --strip=1 --forward --fuzz 3 --ignore-whitespace --silent
done

bitcodes_patch=$(cat <<'EOF'
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 2b474b67425c..cd14780783cc 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -690,6 +690,7 @@ enum AttributeKindCodes {
   ATTR_KIND_NO_MERGE = 66,
   ATTR_KIND_NULL_POINTER_IS_VALID = 67,
   ATTR_KIND_NOUNDEF = 68,
+  ATTR_KIND_SANITIZE_TYPE = 87,
   ATTR_KIND_BYREF = 69,
   ATTR_KIND_MUSTPROGRESS = 70,
   ATTR_KIND_NO_CALLBACK = 71,
EOF
)

echo "$bitcodes_patch" | patch --strip=1 --forward --silent
echo "Done."

However, even with this patch applied, the build still fails:

$ cmake -S llvm -B build -G Ninja \
	-DLLVM_ENABLE_PROJECTS="clang" \
	-DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi" \
	-DLLVM_USE_LINKER=mold \
	-DCMAKE_BUILD_TYPE=Release \
	-DCMAKE_C_COMPILER=clang \
	-DCMAKE_CXX_COMPILER=clang++ 
$ cd build
$ sudo cmake --build . --target install

Here is some of the error output I see:

CMake Error at /home/user/projects/llvm-project/compiler-rt/cmake/Modules/AddCompilerRT.cmake:368 (add_library):
  add_library cannot create target "clang_rt.tsan-x86_64" because another
  target with the same name already exists.  The existing target is a static
  library created in source directory
  "/home/user/projects/llvm-project/compiler-rt/lib/tsan/rtl".  See
  documentation for policy CMP0002 for more details.
Call Stack (most recent call first):
  /home/user/projects/llvm-project/compiler-rt/lib/tsan/rtl/CMakeLists.txt:239 (add_compiler_rt_runtime)


CMake Error at /home/user/projects/llvm-project/compiler-rt/cmake/Modules/CompilerRTUtils.cmake:526 (add_custom_target):
  add_custom_target cannot create target "install-clang_rt.tsan-x86_64"
  because another target with the same name already exists.  The existing
  target is a custom target created in source directory
  "/home/user/projects/llvm-project/compiler-rt/lib/tsan/rtl".  See
  documentation for policy CMP0002 for more details.
Call Stack (most recent call first):
  /home/user/projects/llvm-project/compiler-rt/cmake/Modules/AddCompilerRT.cmake:436 (add_compiler_rt_install_targets)
  /home/user/projects/llvm-project/compiler-rt/lib/tsan/rtl/CMakeLists.txt:239 (add_compiler_rt_runtime)

Does anyone have suggestions on how this could be fixed? It’d be greatly appreciated :smiley:
Thank you!

I don’t know why, but commenting out everything after if(APPLE) in compiler-rt/lib/tsan/rtl/CMakeLists.txt made the build work.


(Lines 117-288)


Super puzzled as to why this is though – it seems like Thread Sanitizer is getting added twice, i.e. the calls to this:

add_compiler_rt_runtime(clang_rt.tsan

And others were being invoked from somewhere else during the build. :confused:

EDIT: Never mind, I get:

/usr/local/bin/ld: cannot find /usr/local/lib/clang/16/lib/linux/libclang_rt.tysan-x86_64.a: No such file or directory

Check this patch: ⚙ D137414 [TySan] Fix Type Sanitizer build on Linux
Edited: I just saw you have added D137414 in your script.
In my environment I use arc patch D<Revision> to apply patch.
Note that the order of D32198, D32199, and D32197 may matter.


Ah I did not know there was a command (arc) to apply the whole stack of changes, thank you @Enna1
I figured there had to be something like this.


There was one other issue: while patching, compiler-rt/cmake/Modules/AllSupportedArchDefs.cmake wound up with an incorrect line, which was causing the failure above.


For some reason though, even though the build now succeeds, it still does not produce libclang_rt.tysan.


Do you know how I can debug this by chance?
[user@MSI build]$ ls lib/clang/16/lib/x86_64-unknown-linux-gnu/
clang_rt.crtbegin.o                      libclang_rt.dfsan.a.syms                 libclang_rt.hwasan_cxx.a.syms            libclang_rt.profile.a                    libclang_rt.ubsan_minimal.so
clang_rt.crtend.o                        libclang_rt.dyndd.so                     libclang_rt.hwasan-preinit.a             libclang_rt.safestack.a                  libclang_rt.ubsan_standalone.a
libclang_rt.asan.a                       libclang_rt.fuzzer.a                     libclang_rt.hwasan.so                    libclang_rt.scudo_standalone.a           libclang_rt.ubsan_standalone.a.syms
libclang_rt.asan.a.syms                  libclang_rt.fuzzer_interceptors.a        libclang_rt.lsan.a                       libclang_rt.scudo_standalone_cxx.a       libclang_rt.ubsan_standalone_cxx.a
libclang_rt.asan_cxx.a                   libclang_rt.fuzzer_no_main.a             libclang_rt.memprof.a                    libclang_rt.scudo_standalone.so          libclang_rt.ubsan_standalone_cxx.a.syms
libclang_rt.asan_cxx.a.syms              libclang_rt.gwp_asan.a                   libclang_rt.memprof.a.syms               libclang_rt.stats.a                      libclang_rt.ubsan_standalone.so
libclang_rt.asan-preinit.a               libclang_rt.hwasan.a                     libclang_rt.memprof_cxx.a                libclang_rt.stats_client.a               libclang_rt.xray.a
libclang_rt.asan.so                      libclang_rt.hwasan_aliases.a             libclang_rt.memprof_cxx.a.syms           libclang_rt.tsan.a                       libclang_rt.xray-basic.a
libclang_rt.asan_static.a                libclang_rt.hwasan_aliases.a.syms        libclang_rt.memprof-preinit.a            libclang_rt.tsan.a.syms                  libclang_rt.xray-fdr.a
libclang_rt.builtins.a                   libclang_rt.hwasan_aliases_cxx.a         libclang_rt.memprof.so                   libclang_rt.tsan_cxx.a                   libclang_rt.xray-profiling.a
libclang_rt.cfi.a                        libclang_rt.hwasan_aliases_cxx.a.syms    libclang_rt.msan.a                       libclang_rt.tsan_cxx.a.syms              liborc_rt.a
libclang_rt.cfi_diag.a                   libclang_rt.hwasan_aliases.so            libclang_rt.msan.a.syms                  libclang_rt.tsan.so
libclang_rt.dd.a                         libclang_rt.hwasan.a.syms                libclang_rt.msan_cxx.a                   libclang_rt.ubsan_minimal.a
libclang_rt.dfsan.a                      libclang_rt.hwasan_cxx.a                 libclang_rt.msan_cxx.a.syms              libclang_rt.ubsan_minimal.a.syms

Okay, I think I see what is going on.


There is a CMake variable, COMPILER_RT_SANITIZERS_TO_BUILD in compiler-rt/cmake/config-ix.cmake:

set(ALL_SANITIZERS asan;dfsan;msan;hwasan;tsan;safestack;cfi;scudo_standalone;ubsan_minimal;gwp_asan)
set(COMPILER_RT_SANITIZERS_TO_BUILD all CACHE STRING
    "sanitizers to build if supported on the target (all;${ALL_SANITIZERS})")
list_replace(COMPILER_RT_SANITIZERS_TO_BUILD all "${ALL_SANITIZERS}")

I need to set this to include tysan and it should work, I think.


Edit: SUCCESS!! I will make a blogpost with details on how to build TySan against Clang trunk so others can experiment and avoid the pitfalls :raised_hands:


Edit2: It doesn’t appear to actually do anything :slightly_frowning_face: I’ve tried the examples from the YouTube talks and the slides, but I’m unable to get any kind of assertion to kick in. Maybe because of the opaque pointer changes since the original development?

You can try the tests in compiler-rt/test/tysan/. TySan will catch and report type-based aliasing violations.
I also tried the examples from the YouTube talks, and TySan caught the bug in my test.


Oh, I didn’t know about those tests, it seems that it does in fact work – thank you!! :tada:


Feel free to ask me anything about TySan. Hope I can help! :grinning:


This weekend I start my holiday break from work, so I plan on writing a blogpost about how to build TySan and sharing it on my Twitter and some other places (I am not very popular but maybe it’ll help a little :sweat_smile:)


Hopefully some more people will give it some attention, I think it’s a very cool project. Myself, I do not fully understand Type-Based Aliasing yet, so I think I need to study it some more.


One question I have about the diagnostic messages – are they sometimes “backwards”? For instance, to take one of the test examples:

#include <stdio.h>

long foo(int *x, long *y) {
  *x = 0;
  *y = 1;
  // CHECK: ERROR: TypeSanitizer: type-aliasing-violation
  // CHECK: WRITE of size 8 at {{.*}} with type long accesses an existing object of type int
  // CHECK: {{#0 0x.* in foo .*int-long.c:}}[[@LINE-3]]

  return *x;
}

int main(void) {
  long l;
  printf("%ld\n", foo((int *)&l, &l));
}

Here we pass a pointer-to-long to both the int and long params. The pointer originates from main so it exists “first”. But the diagnostic says:

WRITE of size 8 at .. with type long accesses an existing object of type int

Which makes it seem as though the int in the argument were the original object. Or maybe I don’t understand :thinking:


This weekend I start my holiday break from work, so I plan on writing a blogpost about how to build TySan and sharing it on my Twitter and some other places (I am not very popular but maybe it’ll help a little :sweat_smile:)

I wrote a blog post in Chinese about strict aliasing, TBAA, and TypeSanitizer: “Strict Aliasing, TBAA and TypeSanitizer” on Enna1's website. Maybe it can help a little.

Which makes it seem as though the int in the argument were the original object.

This comes from the algorithm behind TypeSanitizer.
Let’s say long l is 8 bytes.

  1. *x = 0; accesses the first 4 bytes of the 8 bytes. TySan stores the type information in long l's metadata, which indicates that the first 4 bytes of the 8 bytes were first accessed as type int.
  2. *y = 1; accesses all 8 bytes. TySan finds that we accessed the same address as y with type int before, and this time we access it as long. Under the strict-aliasing rule, an object of type int and an object of type long cannot alias, so TySan reports this as a type-aliasing-violation.

which indicates the first 4-bytes of 8-bytes were first accessed as int type.

TySan found that we accessed the same address of y as type int before, and this time we access it as long.

Ahh okay, now I see, the other errors make a lot more sense now too. Thanks for explaining! :smiley: