Building clang 15 with MemTagSanitizer enabled

Hello LLVM community, I find myself in need of your help. I’m not an experienced LLVM user; I just started getting my hands on this huge and impressive project as part of my studies. As you might have guessed, I have run into many issues, but I managed on my own up until now. Now I have encountered a problem for which I have run out of possible solutions and Google searches. Here it comes:

I’m interested in using the MemTagSanitizer feature of clang. This feature, which is only available when targeting a specific architecture (-target aarch64-linux -march=armv8+memtag), enables a few backend passes that instrument the code with a hardware-based version of the AddressSanitizer. These passes can be enabled by using the -fsanitize=memtag option when invoking clang.
Unfortunately, although this feature works when using the repository version of clang (e.g. clang 10 that comes with Ubuntu), it does not work when using a freshly compiled version of clang. I’ve tried building it in various ways, but I never managed to get a clang with that feature working.

Here’s my latest build configuration: ‘cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;lldb;openmp;polly;pstl;compiler-rt" -DLLVM_TARGETS_TO_BUILD="AArch64" …/llvm’
As you can see, I’ve enabled almost all of the projects in the hope that they influence the final clang, unfortunately with no luck.
I realise that this memtag feature is probably seldom used and I would not be surprised if only a few of you have ever heard of it, but you might still know if there exists some way to enable secondary or experimental features for clang.
I really appreciate any help, thank you!

Can you define “does not work”? Do you get an error during compilation, a runtime error when running the compiled program, etc.? Then we can figure that out specifically.

I went ahead and tried it myself anyway. I was able to use at least the stack tagging portion with a configuration of:

cmake ../llvm-project/llvm -DLLVM_ENABLE_PROJECTS="llvm;clang" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=AArch64 -G Ninja

I don’t think the compiler used to build clang needs to support MTE either. It only has to be able to compile the new clang, which will support it.

Compiling a basic program:

int main() {
  char foo[10];
  int i;
  return 0;
}

With:

./bin/clang -target aarch64-linux-gnueabi -march=armv8-a+memtag /tmp/test.c -S -o - -fsanitize=memtag

You’ll see the tag/untag instructions in the output:

        sub     sp, sp, #48
        .cfi_def_cfa_offset 48
        mov     x8, xzr
        irg     x9, sp, x8
        addg    x8, x9, #32, #0
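
To make the listing easier to read, here is a rough software model of what those two instructions do to a pointer. It is a sketch under stated assumptions, not the architectural definition: the function names are made up for illustration, the random choice that irg makes in hardware is passed in as a parameter so the model is deterministic, and tag exclusion (GCR_EL1 or a non-zero exclude register) is ignored, which matches the listing only because x8 is zero there.

```c
#include <assert.h>
#include <stdint.h>

/* The MTE logical address tag lives in bits [59:56] of a pointer. */
#define MTE_TAG_SHIFT 56
#define MTE_TAG_MASK  (UINT64_C(0xF) << MTE_TAG_SHIFT)

/* Extract the 4-bit logical tag from a pointer value. */
unsigned mte_tag_of(uint64_t ptr) {
    return (unsigned)((ptr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
}

/* Replace the logical tag, leaving every other bit untouched. */
uint64_t mte_with_tag(uint64_t ptr, unsigned tag) {
    return (ptr & ~MTE_TAG_MASK) | ((uint64_t)(tag & 0xF) << MTE_TAG_SHIFT);
}

/* Hypothetical model of `irg xd, xn, xzr`. Hardware picks the tag at
 * random; here the caller supplies it (assumption: nothing excluded). */
uint64_t model_irg(uint64_t xn, unsigned chosen_tag) {
    return mte_with_tag(xn, chosen_tag);
}

/* Hypothetical model of `addg xd, xn, #offset, #tag_offset`, again
 * assuming no excluded tags: add a granule-aligned byte offset to the
 * address and step the tag modulo 16. */
uint64_t model_addg(uint64_t xn, unsigned offset, unsigned tag_offset) {
    assert(offset % 16 == 0 && offset <= 1008); /* 16-byte granules */
    assert(tag_offset <= 15);
    unsigned new_tag = (mte_tag_of(xn) + tag_offset) & 0xF;
    return mte_with_tag(xn + offset, new_tag);
}
```

Read against the listing: irg puts a random tag on a copy of sp in x9, and addg then derives x8 = x9 + 32 with the same tag, because the tag offset is 0.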

Note: “MTE” is what AArch64 calls its Memory Tagging Extension. If you want to read more, look for “FEAT_MTE” in the documentation on Arm Developer.

You’ll need to be aware of a couple of things to actually run it:

  • You need memory tagging capable hardware, which means QEMU at this point. Some of LLDB’s scripts will help you build it (see “Testing LLDB using QEMU” in the LLDB documentation) if you haven’t already. This does need an MTE capable compiler; you could use the clang you just built.
  • You’ll need to have the stack allocated as tagged memory (PROT_MTE) for the tagging to work (the instructions may even fault if the memory is untagged, I’m not 100% sure there). To do that you’ll need to see if there’s a recent glibc that’ll do that for you, or hack in some allocation of your own (I haven’t used this feature myself so I’m guessing here).
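
For the “allocation of your own” route, a minimal sketch of the usual Linux steps might look like the following: check the MTE hwcap, opt the process in with prctl(), then map memory with PROT_MTE. The constants are the ones given in the Linux arm64 MTE documentation and are only defined here in case the installed headers are too old; the function names are made up for illustration. This only produces a tagged anonymous mapping. Getting the actual stack tagged is a separate step that I have not tried.

```c
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#endif

/* Fallback definitions for headers that predate MTE support. */
#ifndef HWCAP2_MTE
#define HWCAP2_MTE (1UL << 18)
#endif
#ifndef PROT_MTE
#define PROT_MTE 0x20
#endif
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif
#ifndef PR_MTE_TCF_SYNC
#define PR_MTE_TCF_SYNC (1UL << 1)
#endif
#ifndef PR_MTE_TAG_SHIFT
#define PR_MTE_TAG_SHIFT 3
#endif

/* prctl() flags: tagged addresses on, synchronous tag check faults,
 * and every tag except 0 allowed as a result of irg. */
unsigned long mte_prctl_flags(void) {
    return PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
           (0xfffeUL << PR_MTE_TAG_SHIFT);
}

/* Returns `size` bytes of PROT_MTE memory, or NULL when MTE is not
 * available (including when built for anything but AArch64 Linux). */
void *alloc_tagged_region(size_t size) {
#if defined(__linux__) && defined(__aarch64__)
    if (!(getauxval(AT_HWCAP2) & HWCAP2_MTE))
        return NULL; /* CPU or kernel without MTE */
    if (prctl(PR_SET_TAGGED_ADDR_CTRL, mte_prctl_flags(), 0, 0, 0) != 0)
        return NULL;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_MTE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)size;
    return NULL;
#endif
}
```

Under QEMU with MTE enabled, alloc_tagged_region(4096) should return a page whose granules can be tagged; on a host without MTE it returns NULL rather than failing later.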

So that’s stack tagging. For the heap, I think you need to use the Scudo allocator, which I don’t know a lot about. I believe that when you build compiler-rt in the way you’ve shown, it uses the clang you just built, so at least you won’t need a host compiler with MTE support.

Also, in case this confused you like it confused me: HWASAN (hardware address sanitizer) does not use the memory tagging extension like Scudo and the stack tagging do. It uses the top byte ignore (TBI) feature, where you can stash bits in the top byte of a pointer.

In some sense you have ASAN, which is software tagging; HWASAN, which is hardware assisted tagging; and memory tagging, where the hardware does all the work.
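
To make the TBI versus MTE distinction concrete, here is a small sketch of where each scheme keeps its bits in a 64-bit pointer. The helper names are made up for illustration. Under TBI the whole top byte, bits [63:56], is ignored by address translation, and HWASAN checks that byte in software. The MTE logical tag is only bits [59:56], the low nibble of that same byte, and the hardware compares it against the allocation tag of each 16-byte granule.

```c
#include <stdint.h>

/* Whole top byte, bits [63:56]: ignored by translation when TBI is on.
 * This is where HWASAN stashes its software-checked tag. */
uint8_t top_byte(uint64_t ptr) {
    return (uint8_t)(ptr >> 56);
}

/* MTE logical tag, bits [59:56]: the low nibble of the top byte. */
uint8_t mte_logical_tag(uint64_t ptr) {
    return (uint8_t)((ptr >> 56) & 0xF);
}

/* Pointer value with the top byte cleared, i.e. the plain user-space
 * address that remains once the tag bits are removed. */
uint64_t strip_top_byte(uint64_t ptr) {
    return ptr & ~(UINT64_C(0xFF) << 56);
}
```

For the pointer value 0xA5000000deadbee0, top_byte() gives 0xA5, mte_logical_tag() gives 0x5, and strip_top_byte() gives 0x00000000deadbee0.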

If you’re interested in tagging globals, that isn’t implemented yet, but it’s being worked on. If you search the mailing list archives there are a few threads about it.

Finally, if you want to debug all this you can use LLDB 13 or greater (see the Linaro blog post “Debugging Memory Tagging with LLDB 13”), or a recent GDB.

Thank you for the great answer. By “does not work” I mean that the code it generates is not instrumented.
But you helped me find the problem: since your version worked, I realized I must have changed the source files somehow. That was indeed the problem!

Thank you again for your help, I really appreciate it.

Great! Feel free to ask again if you have any more problems.