"RFC: Enforcing pointer type alignment in Clang" and UBSan

The behavior of UBSan and “RFC: Enforcing pointer type alignment in Clang” is somewhat mismatched. I discussed this issue at -fsanitize=alignment false positive with intended unaligned struct member access · Issue #83710 · llvm/llvm-project · GitHub a while ago but had no response so I’d like to discuss it here.

The concerned behavior can be demonstrated with the following function:

/* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/inode.c?h=v6.7 */

struct dir_entry {
	u64 ino;
	u64 offset;
	unsigned type;
	int name_len;
};

u64 f6(struct dir_entry *entry)
{
    u64 buffer;
    void *p = (void *)&entry->offset;
    __builtin_memcpy(&buffer, p, sizeof(buffer));
    return buffer;
}

-fsanitize=alignment warns if the entry argument of f6 is misaligned. But this behavior contradicts with " [llvm-dev] RFC: Enforcing pointer type alignment in Clang":
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html

This RFC says:

Note that this definition is syntactic, meaning that it is expressed in terms of the components of a single statement. This means that an access that might be undefined behavior if written as a single statement:
highlyAlignedStruct->charMember = 0;
may not be undefined behavior if split across two statements:
“char *member = &highlyAlignedStruct->charMember;
*member = 0;
In effect, the compiler promises to never propagate alignment assumptions between statements through its knowledge of how a pointer was constructed. This is necessary in order to allow local workarounds to be reliable.

So p in this example should not have the alignment assumption propagated, but -fsanitize=alignment warns according to the assumption.

I’d like to resolve this contradiction by having either the RFC or the sanitizer corrected.

I personally feel changing the sanitizer’s behavior is more sensible; it makes sense that the RFC says it is necessary to allow local workarounds, and I also found real examples in QEMU (which I’m mainly concerned with) but also Linux. You can find some examples in -fsanitize=alignment false positive with intended unaligned struct member access · Issue #83710 · llvm/llvm-project · GitHub, and it’s not hard to find more by grepping Linux source code.

Why do you think these two are contradictory? sanitizers are generally there to find bugs, which I think there is no disagreement that this technically is. Whether the compiler actually exploits these cases is entirely orthogonal.

It is because -fsanitize=alignment: check memcpy/memmove arguments by MaskRay · Pull Request #67766 · llvm/llvm-project · GitHub says -fsanitize=alignment follows the mentioned RFC. Your comment still shows that I missed the third option: intentionally (not by accident) letting the sanitizer behave differently from the compiler. It would be nice if we have another RFC that describes the divergence between the compiler and sanitizer in that case.

That being said, it seems unlikely that the current behavior of sanitizer helps catch a bug. To suppress the warning from the sanitizer, we will need to modify f6 as follows:

u64 f6(struct dir_entry *entry)
{
    u64 buffer;
    void *p = (char *)entry + offsetof(struct dir_entry, offset);
    __builtin_memcpy(&buffer, p, sizeof(buffer));
    return buffer;
}

Note that the initialization of p is modified. This construct is more error-prone than the original one because we need to write the type name again for offsetof; a tragedy happens if the type mismatches. The original one is free from such a type error. The modified version is also simply more verbose.

So I think it makes more sense to stick to the RFC as -fsanitize=alignment: check memcpy/memmove arguments by MaskRay · Pull Request #67766 · llvm/llvm-project · GitHub intends.

In your example, the sanitizer is already validating alignment as part of struct-member access (&entry->offset); the memcpy is irrelevant. This is a reasonable point at which to do so, and is not a mistake. It definitely helps catch bugs!

If you are concerned about code readability, my best advise is: don’t make misaligned pointers in the first place. There is never a real need to do so; it just serves to make the code more confusing, and makes your code rely subtly on invisible-to-the-reader compiler extensions.

Your example function, f6, doesn’t actually require a struct dir_entry* as input , so it is entirely misleading to declare it as taking such. It forces the caller to do an invalid cast – and in absence of compiler extensions, it is UB! It would be better to write as:

u64 f6(void *entry)
// or: u64 f6(char entry[static sizeof(struct dir_entry)])
{
    u64 buffer;
    void *p = (char *)entry + offsetof(struct dir_entry, offset);
    __builtin_memcpy(&buffer, p, sizeof(buffer));
    return buffer;
}

The sanitizer does not validate alignment at entry->offset. For example, the sanitizer will not complain about the following code when entry is not aligned:

void f7(struct dir_entry *entry)
{
    &entry->offset;
}

So the sanitizer is actually generating a warning at memcpy() and that does not help catch bugs. (I was wrong, f7 actually does generate a warning.)

But should it really generate a warning at entry->offset? If we want to avoid accidentally having an unaligned pointer, the compiler should generate a warning when such a pointer is constructed as early as possible. For example, think of the following function:

void f8()
{
    char a[sizeof(struct dir_entry) + 1];
    (struct dir_entry *)(a + 1);
}

This function creates an unaligned pointer, but it does not generate a warning as it does not dereference. Generating a warning with &entry->offset is inconsistent with this behavior and insufficient to catch bugs.

“Don’t make misaligned pointers at all” is a good guideline in principle, but existing codebases like QEMU and Linux don’t enforce it and have “local workarounds” described in the aforementioned RFC. Such codebases will have too many warnings and I think that’s the reason why the RFC talks about “local workarounds”.

f8’s cast is UB if a isn’t suitably aligned (which it may happen to be) so yes that should warn.

It seems the current behavior of sanitizer is not appropriate for either of

  1. to allow “local workarounds” (useful in codebases in like QEMU).
  2. to prevent any unaligned pointer construction (to keep all pointers free from misalignment and the codebase UB-free).

For 1, both f6 with &entry->offset and f8 should not give warnings.
For 2, both f6 with &entry->offset and f8 should give warnings. (Or perhaps f6 may not need to give warnings in the assumption that a warning has already been raised when constructing a misaligned entry).

In reality, f6 gives warnings, and f8 doesn’t. It is not ideal for either of the two use cases.