Reviving TypeSanitizer - a sanitizer to catch type-based aliasing violations

fhahn · October 20, 2022, 8:06pm

Hi,

A while ago, @hfinkel proposed a new sanitizer for type-based aliasing violations. He also shared an initial implementation as well:

LLVM part: ⚙ D32198 [TySan] A Type Sanitizer (LLVM)
Clang part: ⚙ D32199 [TySan] A Type Sanitizer (Clang)
runtime part: ⚙ D32197 [TySan] A Type Sanitizer (Runtime Library)

Recently I spent some time rebasing the patches, ported them to current main and also made them work in macOS.

At the moment, there are a few key limitations on type-based aliasing in LLVM/Clang and one factor blocking progress in that area is concerns for correctness; in particular the fact that we don’t have a way to check a program is free of type-based aliasing violations makes it difficult the determine if a mis-compile is caused by a new bug in the TBAA implementation or UB in the source.

Therefore I propose to revive the type sanitizer patches and try to get an initial version submitted. To do that I would need help, especially on the sanitizer runtime side. At the moment, the all tysan tests added in the runtime patch work on macOS, but it would be great if someone could give it a try on Linux as well.

Please let me know if you are interested in helping out, there certainly are many areas that would need further tuning and testing.

cc’ing some of the people on the original review on discourse: @jdoerfert @kcc

Enna1 · October 21, 2022, 1:37am

Hi @fhahn, I’m familiar with ASan and TSan but not familiar with TySan. I’m very interested in helping this.

leonardchan · October 31, 2022, 9:25pm

Glad to see this is getting picked up again. We had this on queue as an intern project. What kind of help would you need landing these patches?

fhahn · November 2, 2022, 12:23pm

Thanks both @Enna1 & @leonardchan!

I think one of the main things is that I am only able to test this on macOS so far. Any help with checking the tests (in particular the end-to-end tests in compiler-rt) on Linux would be great. Then there’s the technical review of the implementation.

It would also be great if we could collect additional test cases for violations we want to detect or should not detect.

Finally, performance is also an area that will likely need improvements. Both that and increasing the test coverage is probably easier done once we have a baseline in tree.

Enna1 · November 4, 2022, 1:08pm

Hi @fhahn, I tried TypeSanitizer on my Linux environment, I think I make TySan build and tests passed on Linux.
Should I update the diff in the initial implementation patch or send a new patch( ⚙ D137414 [TySan] Fix Type Sanitizer build on Linux)?
Thanks!

fhahn · November 15, 2022, 11:33pm

That’s great, thank you very much!

I think I’ll commandeer the reviews and restart the reviews.

leonardchan · February 24, 2023, 8:42pm

Hiya. What’s the current status of this? If there’s any blockers or reviewers are needed, I’d be happy to volunteer.

fhahn · March 10, 2023, 8:36pm

I recently rebased the patches and addressed the latest comments (thanks everyone!).

Any additional help with the reviews + testing on Linux/X86 and Linux/AArch64 would be very much appreciated

xry111 · December 3, 2023, 12:45pm

Any hope to continue the work in GitHub PR era?

IMO this would be very helpful for “legacy” code like OpenSSL…

fhahn · December 22, 2023, 7:23pm

Yes hopefully!

Just moved the patches to GH as PRs:

github.com/llvm/llvm-project

[TySan] A Type Sanitizer (LLVM)

llvm:main ← llvm:users/fhahn/tysan-a-type-sanitizer-llvm

opened 07:18PM - 22 Dec 23 UTC

fhahn

+2137 -6

This patch introduces the LLVM components of a type sanitizer: a sanitizer for t…ype-based aliasing violations. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. When the sanitizer is active, we disable actually using the TBAA metadata for AA. This way we're less likely to use TBAA to remove memory accesses that we'd like to verify. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. [TySan] LLVM part. Move from https://reviews.llvm.org/D32198

github.com/llvm/llvm-project

[TySan] A Type Sanitizer (Clang)

llvm:users/fhahn/main.tysan-a-type-sanitizer-clang ← llvm:users/fhahn/tysan-a-type-sanitizer-clang

opened 07:18PM - 22 Dec 23 UTC

fhahn

+190 -30

This patch introduces the runtime components of a type sanitizer: a sanitizer fo…r type-based aliasing violations. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. The Clang changes seems mostly formulaic, the one specific change being that when the TBAA sanitizer is enabled, TBAA is always generated, even at -O0. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. Based on https://reviews.llvm.org/D32199. https://github.com/llvm/llvm-project/pull/76259 (LLVM support)

github.com/llvm/llvm-project

[TySan] A Type Sanitizer (Runtime Library)

llvm:users/fhahn/main.tysan-a-type-sanitizer-runtime-library ← llvm:users/fhahn/tysan-a-type-sanitizer-runtime-library

opened 07:18PM - 22 Dec 23 UTC

fhahn

+1504 -2

This patch introduces the runtime components of a type sanitizer: a sanitizer fo…r type-based aliasing violations. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the byes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. (This includes build fixes for Linux from Mingjie Xu) Based on https://reviews.llvm.org/D32197.

fhahn · April 19, 2024, 11:36am

Just rebased the patches again and addressed the current comments.

I also did a quick search of the issue tracker to find strict-alias violations that caused mis-compiles (reported by users) to check if the current implementation catches them.

Caught:
https://github.com/llvm/llvm-project/issues/86685
https://github.com/llvm/llvm-project/issues/68655
https://github.com/llvm/llvm-project/issues/62828
https://github.com/llvm/llvm-project/issues/62544
https://github.com/llvm/llvm-project/issues/47137
https://github.com/llvm/llvm-project/issues/45282

Not caught (Did not check closely yet if they may not be violations)
https://github.com/llvm/llvm-project/issues/78154
https://github.com/llvm/llvm-project/issues/51837

On the SingleSource subset of the test-suite on ARM64 macOS, there currently are the following failures. I had a quick look at some of them (SingleSource/UnitTests/Vectorizer/* and the violations there are in libc++'s random number generator, not yet confirmed if this is a false positive)

Failed Tests (30):
  test-suite :: SingleSource/Benchmarks/BenchmarkGame/fannkuch.test
  test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test
  test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test
  test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test
  test-suite :: SingleSource/Benchmarks/Misc/flops.test
  test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test
  test-suite :: SingleSource/Benchmarks/Misc/richards_benchmark.test
  test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists.test
  test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists1.test
  test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-sieve.test
  test-suite :: SingleSource/Benchmarks/Shootout/Shootout-methcall.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-class_hierarchy.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-ctor_dtor_count.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-exception_spec_test.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-inlined_cleanup.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-throw_rethrow_test.test
  test-suite :: SingleSource/Regression/C/Regression-C-2003-05-21-BitfieldHandling.test
  test-suite :: SingleSource/UnitTests/2003-05-02-DependentPHI.test
  test-suite :: SingleSource/UnitTests/2004-11-28-GlobalBoolLayout.test
  test-suite :: SingleSource/UnitTests/2006-01-23-UnionInit.test
  test-suite :: SingleSource/UnitTests/2009-04-16-BitfieldInitialization.test
  test-suite :: SingleSource/UnitTests/2020-01-06-coverage-008.test
  test-suite :: SingleSource/UnitTests/SignlessTypes/Large/cast.test
  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test
  test-suite :: SingleSource/UnitTests/Vectorizer/dependences.test
  test-suite :: SingleSource/UnitTests/Vectorizer/index-select.test
  test-suite :: SingleSource/UnitTests/Vectorizer/recurrences.test
  test-suite :: SingleSource/UnitTests/Vectorizer/runtime-checks.test
  test-suite :: SingleSource/UnitTests/byval-alignment.test
  test-suite :: SingleSource/UnitTests/initp1.test

gbMattN · April 19, 2024, 1:13pm

I’ve tested TySan on Linux/X86 and the tests are passing. I also tested it on some historical internal issues that involved type aliasing and it caught all of them.
The two cases you say aren’t caught both involve unions. In the pull requests you mention that Clang’s TBAA representation currently has issues representing unions. Do we know if that’s still the case?

fhahn · May 2, 2024, 6:12pm

Great, thanks for checking!

That’s an interesting observation! Clang doesn’t generate TBAA metadata when accessing union fields, but at least in wrong code at -O2 and above on x86_64-linux-gnu (affecting clang-6.0 and above versions) · Issue #51837 · llvm/llvm-project · GitHub the problematic accesses are through short & long pointer variables, which get assigned a pointer to the union members.

Need to check if there’s anything that’s causing the missed tracking of the issue.

gbMattN · May 16, 2024, 1:41pm

I’ve looked into this and Clang is generating metadata for the union access, just the wrong metadata. It defines accesses to union fields to be “omnipotent char”. All scalar types alias with omnipotent char.
While the problematic accesses are through short and long pointer variables, the first access is through a union field. This means the shadow memory says this memory space is of type omnipotent char. As short and long both alias with omnipotent char, the alias violation is not caught.

I went through the IR generated pre-TySan passes, changed the omnipotent char tags to be the correct types, and then ran the TySan passes. When running the resulting program, TySan correctly pick up the strict aliasing violations. Gist is linked below.
Old llvm issues which have strict aliasing violations, yet are not caught by the current TySan. They are put here with edited LLVM IR explaining why they are not caught. (github.com)

It would be nice if we could catch issues like this, but I think this is currently a limitation of Clang

jyknight · May 16, 2024, 2:29pm

From Clang’s side, accesses to a union fields visibly through the union type are indeed allowed to type-pun between any of the types in the union. That’s why it annotates such accesses as ‘omnipontent char’.

But, I’d expect the behavior from TySan to be:

d.c = 42;, stored through the union: OK, and sets the memory type to omnipotent char.
*e = 413;: OK, and modifies the type to unsigned long.
*i = 612;: report an error, as it’s attempting to store type short to memory typed with unsigned long.

gbMattN · May 30, 2024, 9:06am

Hi @fhahn, in light of the test investigation, do you feel more work on TySan is needed for its current state to be accepted into upstream?

TheLastRar · June 3, 2024, 12:02pm

Hi

Was trying to test this on a project called PCSX2 (https://github.com/PCSX2/pcsx2/tree/master)

When specifying -fsanitize=type, Clang gets stuck compiling MTGS.cpp (MTGS.cpp)
Clang also consumes up an excessive amount of ram attempting to compile it.

The clang command is as follows;

/home/air/llvm-project/build/bin/clang++ -DCPUINFO_SUPPORTED_PLATFORM=1 -DENABLE_OPENGL -DENABLE_VULKAN -DPCSX2_DEVBUILD -DSOUNDTOUCH_FLOAT_SAMPLES -DST_NO_EXCEPTION_HANDLING -DWAYLAND_API -DX11_API -DXBYAK_NO_EXCEPTION -DZYCORE_STATIC_DEFINE -DZYDIS_STATIC_DEFINE -D_DEVEL -D_M_X86=1 -D__POSIX__ -I/home/air/pcsx2/pcsx2 -I/home/air/pcsx2/build/pcsx2 -I/home/air/pcsx2/build/common/include -I/home/air/pcsx2/3rdparty/xbyak -I/home/air/pcsx2/3rdparty/glad/include -I/home/air/pcsx2/3rdparty/vulkan-headers/include -I/home/air/pcsx2/common/../3rdparty/include -I/home/air/pcsx2/common/.. -I/home/air/pcsx2/3rdparty/fmt/fmt/include -I/home/air/pcsx2/3rdparty/rapidyaml/rapidyaml/ext/c4core/src/c4/ext/fast_float/include -I/home/air/pcsx2/3rdparty/imgui/include -I/home/air/pcsx2/3rdparty/rapidyaml/rapidyaml/src -I/home/air/pcsx2/3rdparty/rapidyaml/rapidyaml/ext/c4core/src -I/home/air/pcsx2/3rdparty/libchdr/include -I/home/air/pcsx2/3rdparty/libzip/lib -I/home/air/pcsx2/build/3rdparty/libzip -I/home/air/pcsx2/3rdparty/cpuinfo/include -I/home/air/pcsx2/3rdparty/zydis/dependencies/zycore/include -I/home/air/pcsx2/3rdparty/zydis/include -I/home/air/pcsx2/3rdparty/cubeb/include -I/home/air/pcsx2/3rdparty/rcheevos/include -I/home/air/pcsx2/3rdparty/discord-rpc/include -I/home/air/pcsx2/3rdparty/rapidjson/include -I/home/air/pcsx2/3rdparty/demangler/include -I/home/air/pcsx2/3rdparty/simpleini/include -I/home/air/pcsx2/3rdparty/freesurround/include -I/home/air/pcsx2/3rdparty/soundtouch/soundtouch -I/home/air/pcsx2/3rdparty/lzma/include -isystem /home/air/pcsx2/deps/include -isystem /usr/include/freetype2 -isystem /home/air/pcsx2/deps/include/SDL2 -DFMT_EXCEPTIONS=0 -O2 -g -DNDEBUG -march=native -pipe -fvisibility=hidden -pthread -fno-rtti -fno-exceptions -Wall -Wextra -Wno-unused-function -Wno-unused-parameter -Wno-missing-field-initializers -ffp-contract=fast -fno-strict-aliasing -fsanitize=type -Wstrict-aliasing -Wno-parentheses -Wno-missing-braces -Wno-unknown-pragmas -std=gnu++20 -Winvalid-pch -Xclang -include-pch -Xclang /home/air/pcsx2/build/pcsx2/CMakeFiles/PCSX2.dir/cmake_pch.hxx.pch -Xclang -include -Xclang /home/air/pcsx2/build/pcsx2/CMakeFiles/PCSX2.dir/cmake_pch.hxx -MD -MT pcsx2/CMakeFiles/PCSX2.dir/MTGS.cpp.o -MF pcsx2/CMakeFiles/PCSX2.dir/MTGS.cpp.o.d -o pcsx2/CMakeFiles/PCSX2.dir/MTGS.cpp.o -c /home/air/pcsx2/pcsx2/MTGS.cpp

PCSX2 is configured for build type Devel
This project is a little unusual, as you need to build some of its dependences.
You can use the dependences build script as mentioned here (build-dependencies) to do so

For Clang, I had locally merged the LLVM, Clang & Runtime TySAN PRs (in that order) onto commit 5f243b3fffca42ed320529a54aefd86087aa85f8

Edit: Should mention I’m running on Ubuntu 22.04 LTS

fhahn · June 5, 2024, 9:59am

Thanks for looking into this. As long as we understand why we don’t catch those cases I don’t think that should be a blocker.

fhahn · June 5, 2024, 10:01am

Thanks for trying this out! TySan generates a large amount of instrumentation code, so in some cases that can be a challenge w.r.t. to compile-time and memory usage. I don’t think we will be able to mitigate this in all circumstances, but I think there are a number of efficiency improvements we could iteration on.

TheLastRar · June 5, 2024, 10:59am

I had looked further into it

Modifying this structure in MTGS.cpp

struct BufferedData
{
	u128 m_Ring[RingBufferSize];
	u8 Regs[Ps2MemSize::GSregs];

	u128& operator[](uint idx)
	{
		pxAssert(idx < RingBufferSize);
		return m_Ring[idx];
	}
};

to this

struct BufferedData
{
	u128* m_Ring;
	u8 Regs[Ps2MemSize::GSregs];

	BufferedData()
	{
		m_Ring = new u128[RingBufferSize];
	}

	u128& operator[](uint idx)
	{
		pxAssert(idx < RingBufferSize);
		return m_Ring[idx];
	}

	~BufferedData()
	{
		delete m_Ring;
	}
};

Resolved the excessive memory utilisation during compile

RingBufferSize is 524288, and with the size u128, gives an array that is 8mb big.

It seems there is an issue dealing with (the use of) large static arrays, or large structures/classes.

Edit: Also building as Debug rather than Devel/ReleaseWithDebug Info seems to help

Topic		Replies	Views
Building Type Sanitizer (TySan) from patches, fails due to `CompilerRt` issues in `tsan` CMakeLists, anyone with experience here? Clang Frontend	10	472	December 22, 2022
Contributing a new sanitizer for pointer casts LLVM Dev List Archives	4	137	April 26, 2017
crashing w/sanitizers Using Clang	0	145	February 20, 2018
LLVMdev Digest, Vol 89, Issue 60 LLVM Dev List Archives	1	114	November 25, 2011
adding sanitizers for llvm generated code LLVM Dev List Archives	0	97	July 15, 2016

Reviving TypeSanitizer - a sanitizer to catch type-based aliasing violations

Related topics