Reviving TypeSanitizer - a sanitizer to catch type-based aliasing violations

Hi,

A while ago, @hfinkel proposed a new sanitizer for type-based aliasing violations. He also shared an initial implementation as well:

Recently I spent some time rebasing the patches, ported them to current main and also made them work in macOS.

At the moment, there are a few key limitations on type-based aliasing in LLVM/Clang and one factor blocking progress in that area is concerns for correctness; in particular the fact that we don’t have a way to check a program is free of type-based aliasing violations makes it difficult the determine if a mis-compile is caused by a new bug in the TBAA implementation or UB in the source.

Therefore I propose to revive the type sanitizer patches and try to get an initial version submitted. To do that I would need help, especially on the sanitizer runtime side. At the moment, the all tysan tests added in the runtime patch work on macOS, but it would be great if someone could give it a try on Linux as well.

Please let me know if you are interested in helping out, there certainly are many areas that would need further tuning and testing.

cc’ing some of the people on the original review on discourse: @jdoerfert @kcc

14 Likes

Hi @fhahn, I’m familiar with ASan and TSan but not familiar with TySan. I’m very interested in helping this.

Glad to see this is getting picked up again. We had this on queue as an intern project. What kind of help would you need landing these patches?

Thanks both @Enna1 & @leonardchan!

I think one of the main things is that I am only able to test this on macOS so far. Any help with checking the tests (in particular the end-to-end tests in compiler-rt) on Linux would be great. Then there’s the technical review of the implementation.

It would also be great if we could collect additional test cases for violations we want to detect or should not detect.

Finally, performance is also an area that will likely need improvements. Both that and increasing the test coverage is probably easier done once we have a baseline in tree.

Hi @fhahn, I tried TypeSanitizer on my Linux environment, I think I make TySan build and tests passed on Linux.
Should I update the diff in the initial implementation patch or send a new patch( ⚙ D137414 [TySan] Fix Type Sanitizer build on Linux)?
Thanks!

1 Like

That’s great, thank you very much!

I think I’ll commandeer the reviews and restart the reviews.

3 Likes

Hiya. What’s the current status of this? If there’s any blockers or reviewers are needed, I’d be happy to volunteer.

I recently rebased the patches and addressed the latest comments (thanks everyone!).

Any additional help with the reviews + testing on Linux/X86 and Linux/AArch64 would be very much appreciated

1 Like

Any hope to continue the work in GitHub PR era?

IMO this would be very helpful for “legacy” code like OpenSSL…

2 Likes

Yes hopefully!

Just moved the patches to GH as PRs:

2 Likes

Just rebased the patches again and addressed the current comments.

I also did a quick search of the issue tracker to find strict-alias violations that caused mis-compiles (reported by users) to check if the current implementation catches them.

Caught:
https://github.com/llvm/llvm-project/issues/86685
https://github.com/llvm/llvm-project/issues/68655
https://github.com/llvm/llvm-project/issues/62828
https://github.com/llvm/llvm-project/issues/62544
https://github.com/llvm/llvm-project/issues/47137
https://github.com/llvm/llvm-project/issues/45282

Not caught (Did not check closely yet if they may not be violations)
https://github.com/llvm/llvm-project/issues/78154
https://github.com/llvm/llvm-project/issues/51837

On the SingleSource subset of the test-suite on ARM64 macOS, there currently are the following failures. I had a quick look at some of them (SingleSource/UnitTests/Vectorizer/* and the violations there are in libc++'s random number generator, not yet confirmed if this is a false positive)

Failed Tests (30):
  test-suite :: SingleSource/Benchmarks/BenchmarkGame/fannkuch.test
  test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test
  test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test
  test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test
  test-suite :: SingleSource/Benchmarks/Misc/flops.test
  test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test
  test-suite :: SingleSource/Benchmarks/Misc/richards_benchmark.test
  test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists.test
  test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists1.test
  test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-sieve.test
  test-suite :: SingleSource/Benchmarks/Shootout/Shootout-methcall.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-class_hierarchy.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-ctor_dtor_count.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-exception_spec_test.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-inlined_cleanup.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-throw_rethrow_test.test
  test-suite :: SingleSource/Regression/C/Regression-C-2003-05-21-BitfieldHandling.test
  test-suite :: SingleSource/UnitTests/2003-05-02-DependentPHI.test
  test-suite :: SingleSource/UnitTests/2004-11-28-GlobalBoolLayout.test
  test-suite :: SingleSource/UnitTests/2006-01-23-UnionInit.test
  test-suite :: SingleSource/UnitTests/2009-04-16-BitfieldInitialization.test
  test-suite :: SingleSource/UnitTests/2020-01-06-coverage-008.test
  test-suite :: SingleSource/UnitTests/SignlessTypes/Large/cast.test
  test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test
  test-suite :: SingleSource/UnitTests/Vectorizer/dependences.test
  test-suite :: SingleSource/UnitTests/Vectorizer/index-select.test
  test-suite :: SingleSource/UnitTests/Vectorizer/recurrences.test
  test-suite :: SingleSource/UnitTests/Vectorizer/runtime-checks.test
  test-suite :: SingleSource/UnitTests/byval-alignment.test
  test-suite :: SingleSource/UnitTests/initp1.test
4 Likes

I’ve tested TySan on Linux/X86 and the tests are passing. I also tested it on some historical internal issues that involved type aliasing and it caught all of them.
The two cases you say aren’t caught both involve unions. In the pull requests you mention that Clang’s TBAA representation currently has issues representing unions. Do we know if that’s still the case?

Great, thanks for checking!

That’s an interesting observation! Clang doesn’t generate TBAA metadata when accessing union fields, but at least in wrong code at -O2 and above on x86_64-linux-gnu (affecting clang-6.0 and above versions) · Issue #51837 · llvm/llvm-project · GitHub the problematic accesses are through short & long pointer variables, which get assigned a pointer to the union members.

Need to check if there’s anything that’s causing the missed tracking of the issue.

I’ve looked into this and Clang is generating metadata for the union access, just the wrong metadata. It defines accesses to union fields to be “omnipotent char”. All scalar types alias with omnipotent char.
While the problematic accesses are through short and long pointer variables, the first access is through a union field. This means the shadow memory says this memory space is of type omnipotent char. As short and long both alias with omnipotent char, the alias violation is not caught.

I went through the IR generated pre-TySan passes, changed the omnipotent char tags to be the correct types, and then ran the TySan passes. When running the resulting program, TySan correctly pick up the strict aliasing violations. Gist is linked below.
Old llvm issues which have strict aliasing violations, yet are not caught by the current TySan. They are put here with edited LLVM IR explaining why they are not caught. (github.com)

It would be nice if we could catch issues like this, but I think this is currently a limitation of Clang

From Clang’s side, accesses to a union fields visibly through the union type are indeed allowed to type-pun between any of the types in the union. That’s why it annotates such accesses as ‘omnipontent char’.

But, I’d expect the behavior from TySan to be:

  • d.c = 42;, stored through the union: OK, and sets the memory type to omnipotent char.
  • *e = 413;: OK, and modifies the type to unsigned long.
  • *i = 612;: report an error, as it’s attempting to store type short to memory typed with unsigned long.

Hi @fhahn, in light of the test investigation, do you feel more work on TySan is needed for its current state to be accepted into upstream?

Hi

Was trying to test this on a project called PCSX2 (https://github.com/PCSX2/pcsx2/tree/master)

When specifying -fsanitize=type, Clang gets stuck compiling MTGS.cpp (MTGS.cpp)
Clang also consumes up an excessive amount of ram attempting to compile it.

The clang command is as follows;

/home/air/llvm-project/build/bin/clang++ -DCPUINFO_SUPPORTED_PLATFORM=1 -DENABLE_OPENGL -DENABLE_VULKAN -DPCSX2_DEVBUILD -DSOUNDTOUCH_FLOAT_SAMPLES -DST_NO_EXCEPTION_HANDLING -DWAYLAND_API -DX11_API -DXBYAK_NO_EXCEPTION -DZYCORE_STATIC_DEFINE -DZYDIS_STATIC_DEFINE -D_DEVEL -D_M_X86=1 -D__POSIX__ -I/home/air/pcsx2/pcsx2 -I/home/air/pcsx2/build/pcsx2 -I/home/air/pcsx2/build/common/include -I/home/air/pcsx2/3rdparty/xbyak -I/home/air/pcsx2/3rdparty/glad/include -I/home/air/pcsx2/3rdparty/vulkan-headers/include -I/home/air/pcsx2/common/../3rdparty/include -I/home/air/pcsx2/common/.. -I/home/air/pcsx2/3rdparty/fmt/fmt/include -I/home/air/pcsx2/3rdparty/rapidyaml/rapidyaml/ext/c4core/src/c4/ext/fast_float/include -I/home/air/pcsx2/3rdparty/imgui/include -I/home/air/pcsx2/3rdparty/rapidyaml/rapidyaml/src -I/home/air/pcsx2/3rdparty/rapidyaml/rapidyaml/ext/c4core/src -I/home/air/pcsx2/3rdparty/libchdr/include -I/home/air/pcsx2/3rdparty/libzip/lib -I/home/air/pcsx2/build/3rdparty/libzip -I/home/air/pcsx2/3rdparty/cpuinfo/include -I/home/air/pcsx2/3rdparty/zydis/dependencies/zycore/include -I/home/air/pcsx2/3rdparty/zydis/include -I/home/air/pcsx2/3rdparty/cubeb/include -I/home/air/pcsx2/3rdparty/rcheevos/include -I/home/air/pcsx2/3rdparty/discord-rpc/include -I/home/air/pcsx2/3rdparty/rapidjson/include -I/home/air/pcsx2/3rdparty/demangler/include -I/home/air/pcsx2/3rdparty/simpleini/include -I/home/air/pcsx2/3rdparty/freesurround/include -I/home/air/pcsx2/3rdparty/soundtouch/soundtouch -I/home/air/pcsx2/3rdparty/lzma/include -isystem /home/air/pcsx2/deps/include -isystem /usr/include/freetype2 -isystem /home/air/pcsx2/deps/include/SDL2 -DFMT_EXCEPTIONS=0 -O2 -g -DNDEBUG -march=native -pipe -fvisibility=hidden -pthread -fno-rtti -fno-exceptions -Wall -Wextra -Wno-unused-function -Wno-unused-parameter -Wno-missing-field-initializers -ffp-contract=fast -fno-strict-aliasing -fsanitize=type -Wstrict-aliasing -Wno-parentheses -Wno-missing-braces -Wno-unknown-pragmas -std=gnu++20 -Winvalid-pch -Xclang -include-pch -Xclang /home/air/pcsx2/build/pcsx2/CMakeFiles/PCSX2.dir/cmake_pch.hxx.pch -Xclang -include -Xclang /home/air/pcsx2/build/pcsx2/CMakeFiles/PCSX2.dir/cmake_pch.hxx -MD -MT pcsx2/CMakeFiles/PCSX2.dir/MTGS.cpp.o -MF pcsx2/CMakeFiles/PCSX2.dir/MTGS.cpp.o.d -o pcsx2/CMakeFiles/PCSX2.dir/MTGS.cpp.o -c /home/air/pcsx2/pcsx2/MTGS.cpp

PCSX2 is configured for build type Devel
This project is a little unusual, as you need to build some of its dependences.
You can use the dependences build script as mentioned here (build-dependencies) to do so

For Clang, I had locally merged the LLVM, Clang & Runtime TySAN PRs (in that order) onto commit 5f243b3fffca42ed320529a54aefd86087aa85f8

Edit: Should mention I’m running on Ubuntu 22.04 LTS

Thanks for looking into this. As long as we understand why we don’t catch those cases I don’t think that should be a blocker.

1 Like

Thanks for trying this out! TySan generates a large amount of instrumentation code, so in some cases that can be a challenge w.r.t. to compile-time and memory usage. I don’t think we will be able to mitigate this in all circumstances, but I think there are a number of efficiency improvements we could iteration on.

1 Like

I had looked further into it

Modifying this structure in MTGS.cpp

struct BufferedData
{
	u128 m_Ring[RingBufferSize];
	u8 Regs[Ps2MemSize::GSregs];

	u128& operator[](uint idx)
	{
		pxAssert(idx < RingBufferSize);
		return m_Ring[idx];
	}
};

to this

struct BufferedData
{
	u128* m_Ring;
	u8 Regs[Ps2MemSize::GSregs];

	BufferedData()
	{
		m_Ring = new u128[RingBufferSize];
	}

	u128& operator[](uint idx)
	{
		pxAssert(idx < RingBufferSize);
		return m_Ring[idx];
	}

	~BufferedData()
	{
		delete m_Ring;
	}
};

Resolved the excessive memory utilisation during compile

RingBufferSize is 524288, and with the size u128, gives an array that is 8mb big.

It seems there is an issue dealing with (the use of) large static arrays, or large structures/classes.

Edit: Also building as Debug rather than Devel/ReleaseWithDebug Info seems to help