C++ strict aliasing and TBAA

C++’s strict aliasing rule says, roughly, that two pointers to different types cannot alias (with an exception for char). The standard spells out exactly what ‘different’ means - in all versions since C++11; quoting specifically from C++17, [basic.lval] paragraph 8:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
(8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
(8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(8.8) — a char, unsigned char, or std::byte type.
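
To make the rule concrete, here is a minimal example of my own (not part of the quoted standard text): reading a float through an int glvalue matches none of the bullets above and is undefined, while reading its bytes through unsigned char is explicitly allowed by (8.8).

#include <cstdio>

int main() {
    float f = 1.0f;
    int* ip = reinterpret_cast<int*>(&f);
    (void)ip;  // merely forming the pointer is fine...
    // *ip;    // ...but reading through it would be UB: int is not in the list
    unsigned char* cp = reinterpret_cast<unsigned char*>(&f);
    std::printf("first byte: %d\n", cp[0]);  // OK per (8.8)
}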

It seems this is not what happens in clang (godbolt link):

struct Wrapper1 { long int t; };
struct Wrapper2 { long int t; };
struct S1 { Wrapper1 a; Wrapper2 b; };
struct S2 { Wrapper1 a; Wrapper2 b; };

// Assignment vectorized properly:
void f1(S1& s1, const S2& s2) {
    s1.a.t = s2.a.t;
    s1.b.t = s2.b.t;
}
// Assignment not vectorized due to potential aliasing (see opt remarks):
void f2(S1& s1, const S2& s2) {
    s1.a = s2.a;
    s1.b = s2.b;
}
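
As a side note (my addition, reusing the S1/S2 definitions above): one can confirm that it really is the aliasing assumption blocking f2 by promising non-overlap explicitly with the non-standard (but clang/GCC-supported) __restrict extension, which should let the member-wise assignments vectorize regardless of TBAA:

// Sketch only: __restrict asserts that s1 and s2 do not overlap, so the
// optimizer no longer needs TBAA to prove no-alias here.
void f2r(S1& __restrict s1, const S2& __restrict s2) {
    s1.a = s2.a;
    s1.b = s2.b;
}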

Moreover, the TBAA source file seems to say this explicitly:

// If they [the type trees for the object pair checked for aliasing] have
// different roots, they're part of different potentially unrelated type
// systems, so we return Alias to be conservative.

This isn’t a stale comment, as the implementation indeed does that:

  // If the final access types have different roots, they're part of different
  // potentially unrelated type systems, so we must be conservative.
  if (!CommonType) {
    if (GenericTag)
      *GenericTag = nullptr;
    return true;    // <----
  }

I’m under the impression that changing this return true to return false unlocks many optimizations while conforming to the standard. In the code’s own terminology: if the final access types have different roots, they’re part of different, potentially unrelated type systems, so the standard allows assuming they do not alias. No need to be conservative.

WDYT?
@jdoerfert @alina

The existence of unrelated type systems implies that you have code that is being inlined across language boundaries (LTO would be involved). While the standards may be vague on how cross-language stuff is supposed to work, I’m sure users would not be happy if it were impossible to pass a pointer to an object between languages without violating strict aliasing.
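
A hedged sketch of the scenario meant here (the function and setup are hypothetical, not from the thread): after cross-language inlining under LTO, accesses tagged under different TBAA roots can end up in one function, and they must still be allowed to alias.

extern "C" void touch(long* p);  // hypothetically compiled by another frontend

long g;

void caller() {
    g = 1;      // this access path hangs off the "Simple C++ TBAA" root
    touch(&g);  // the inlined body's accesses carry a different root;
                // answering MayAlias for mixed roots keeps this correct
}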

The root node here is common:

!36 = !{!"omnipotent char", !37, i64 0}
!37 = !{!"Simple C++ TBAA"}

It looks to me like your example doesn’t get the intermediate TBAA nodes, so it just considers both sides to be storing a long to a location it has no additional information about.

So I misunderstood ‘different type systems’ (thanks @jcranmer) and also ‘root’ (thanks @vjtnash).
Still, this is a very common pattern. For example, every strong-typedef library wraps primitive types this way, thereby blocking optimizations.
Can this be mitigated? Where were the TBAA MDNodes supposed to be added?

This issue has a possibly simpler manifestation that occurs often, e.g. when using a strong-typedef library (or any wrapper around primitive types):

struct Wrapper1 { long int t; };
struct Wrapper2 { long int t; };
struct S { Wrapper1 a; Wrapper2 b; };

// Assignment vectorized properly:
void f1(S& s1, S& s2) {
    s1 = s2;
}

// Assignment not optimized due to potential aliasing between a and b
// (see opt remarks):
void f3(S& s1, S& s2) {
    s1.a = s2.a;
    s1.b = s2.b;
}

// Assignment vectorized properly:
void f2(S& s1, S& s2) {
    s1.a.t = s2.a.t;
    s1.b.t = s2.b.t;
}

cf. Compiler Explorer

Compiler Explorer now has an “LLVM opt pipeline viewer” (and should soon be able to display a couple of pipelines together). Inspecting with it, I see that the IR for

void fGood(S1& s1, const S2& s2) {
    s1.a.t = s2.a.t;
    s1.b.t = s2.b.t;
}

after PostOrderFunctionAttrsPass is:

define dso_local void @fGood(S1&, S2 const&)(%struct.S1* nocapture noundef nonnull writeonly align 8 dereferenceable(16) %s1, %struct.S2* nocapture noundef nonnull readonly align 8 dereferenceable(16) %s2) local_unnamed_addr {
entry:
  %t = getelementptr inbounds %struct.S2, %struct.S2* %s2, i64 0, i32 0, i32 0
  %0 = load i64, i64* %t, align 8
  %t2 = getelementptr inbounds %struct.S1, %struct.S1* %s1, i64 0, i32 0, i32 0
  store i64 %0, i64* %t2, align 8
  %t3 = getelementptr inbounds %struct.S2, %struct.S2* %s2, i64 0, i32 1, i32 0
  %1 = load i64, i64* %t3, align 8
  %t5 = getelementptr inbounds %struct.S1, %struct.S1* %s1, i64 0, i32 1, i32 0
  store i64 %1, i64* %t5, align 8
  ret void
}

While the IR for

void fBad(S1& s1, const S2& s2) {
    s1.a = s2.a;
    s1.b = s2.b;
}

after the same pass differs only in naming and instruction order:

define dso_local void @fBad(S1&, S2 const&)(%struct.S1* nocapture noundef nonnull writeonly align 8 dereferenceable(16) %s1, %struct.S2* nocapture noundef nonnull readonly align 8 dereferenceable(16) %s2) local_unnamed_addr {
entry:
  %0 = getelementptr inbounds %struct.S2, %struct.S2* %s2, i64 0, i32 0, i32 0
  %1 = getelementptr inbounds %struct.S1, %struct.S1* %s1, i64 0, i32 0, i32 0
  %2 = load i64, i64* %0, align 8
  store i64 %2, i64* %1, align 8
  %3 = getelementptr inbounds %struct.S2, %struct.S2* %s2, i64 0, i32 1, i32 0
  %4 = getelementptr inbounds %struct.S1, %struct.S1* %s1, i64 0, i32 1, i32 0
  %5 = load i64, i64* %3, align 8
  store i64 %5, i64* %4, align 8
  ret void
}

From this pass onward, the IR for these two functions diverges wildly. It seems CodeGenPrepare is somehow very sensitive to instruction order. Perhaps this can help focus on the cause?
@jdoerfert @alina

Your IR snippets unfortunately are missing the !tbaa metadata, which accounts for the difference in codegen. The issue is that the TBAA metadata generated for fBad doesn’t allow determining no-alias (as you said), so SLP vectorization is not happening.

Looking at the TBAA metadata generated by Clang for fBad, I am not yet sure why it is not using disjoint paths for both accesses. Looking…

@fhahn thanks, you’re right of course. I submitted a bug to Compiler Explorer; hopefully it will be solved soon.

@fhahn any luck figuring out why TBAA isn’t generated properly for fBad?

Fixing this could mean that strong typedefs (wrapping some type, say int, in a class to make the compiler treat it as a distinct type) become a viable way to communicate non-aliasing to the compiler. At least in our particular code base, I suspect the impact would be measurable.
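
For reference, a minimal sketch of the pattern in question (names are hypothetical, not from any particular library):

template <class T, class Tag>
struct Strong {        // a strong typedef: same representation, distinct type
    T value;
};

struct PriceTag {};
struct VolumeTag {};
using Price  = Strong<long, PriceTag>;
using Volume = Strong<long, VolumeTag>;

// If Clang emitted distinct TBAA access paths for Price and Volume, the
// optimizer could assume *p and *v never alias and vectorize accordingly:
void update(Price* p, const Volume* v);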