Aliasing rules difference between GCC and Clang

Hi,

It was a while now since I first asked about TBAA, and I am now getting back to this, trying to understand TBAA as it seems that the issue I described earlier originally in this thread is still one of many "TODO"s in CGExpr.cpp (http://lists.llvm.org/pipermail/llvm-dev/2018-September/126290.html).

I would like to help on this to the best of my abilities, but I'm not quite sure if there is a current plan of incremental changes, or if it would be possible to handle the array member of a struct case specifically at this point?

I am tempted to believe that the extension to handle my particular case is not too far-fetched, as Eli indicated earlier:

This program:

struct A { int i; };
struct B { int i; };

void h(struct A *a, struct B *b, int *prim_i) {
a->i = 0;
b->i = 0;
*prim_i = 0;
}

translates to

%i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0
store i32 0, i32* %i, align 4, !tbaa !2
%i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0
store i32 0, i32* %i1, align 4, !tbaa !7
store i32 0, i32* %prim_i, align 4, !tbaa !9

with this type DAG:

         \!2 = \!\{\!3, \!4, i64 0\}
      \!3 = \!\{\!"A", \!4, i64 0\}
   \!4 = \!\{\!"int", \!5, i64 0\}
\!5 = \!\{\!"omnipotent char", \!6, i64 0\}

!6 = !{!"Simple C/C++ TBAA"}
!7 = !{!8, !4, i64 0}
!8 = !{!"B", !4, i64 0}
!9 = !{!4, !4, i64 0}

Just as one would hope for, TBAA produces a NoAlias result between the stores to %A and %B:

NoAlias: store i32 0, i32* %i1, align 4, !tbaa !7 <-> store i32 0, i32* %i, align 4, !tbaa !2
MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 0, i32* %i, align 4, !tbaa !2
MayAlias: store i32 0, i32* %prim_i, align 4, !tbaa !9 <-> store i32 0, i32* %i1, align 4, !tbaa !7

The above indicates that TBAA is capable of handling structs with scalar members, but if I change the program to look more like the benchmark, like:

struct A { int i[2]; };
struct B { int i[2]; };

void h(struct A *a, struct B *b, int *prim_i) {
a->i[0] = 0;
b->i[0] = 0;
*prim_i = 0;
}

, which translates to

%i = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0
%arrayidx = getelementptr inbounds [2 x i32], [2 x i32]* %i, i64 0, i64 0
store i32 0, i32* %arrayidx, align 4, !tbaa !2
%i1 = getelementptr inbounds %struct.B, %struct.B* %b, i32 0, i32 0
%arrayidx2 = getelementptr inbounds [2 x i32], [2 x i32]* %i1, i64 0, i64 0
store i32 0, i32* %arrayidx2, align 4, !tbaa !2
store i32 0, i32* %prim_i, align 4, !tbaa !2

, the type DAG is generated as

      \!2 = \!\{\!3, \!3, i64 0\}
   \!3 = \!\{\!&quot;int&quot;, \!4, i64 0\}
\!4 = \!\{\!&quot;omnipotent char&quot;, \!5, i64 0\}

!5 = !{!"Simple C/C++ TBAA"}

This makes all the stores alias, which is overly conservative, as %struct.A and %struct.B do not alias.

It seems that when the struct member is an array, the base type becomes the element type. This is simply unhandled still in CodeGenTBAA ("TODO"). My question then is if it would be possible to instead create the DAG nodes !"A" and !"B" (as in the first type DAG above) and use them as the base types and then have the access type to be !"int", with the right offset into the struct node?

A struct like

struct S { int i[2]; long l[2]; };

would then be represented with a base node, something like

      \!3 = \!\{\!&quot;S&quot;, \!4, i64 0, \!6, 8\}
   \!4 = \!\{\!&quot;int&quot;, \!5, i64 0\}
\!5 = \!\{\!&quot;omnipotent char&quot;, \!6, i64 0\}
   \!6 = \!\{\!&quot;long&quot;, \!5, i64 0\}

, giving !"int" as base at offset 0 and !"long" as base at offset 8 (given that i[2] is 8 bytes).

An access like s->i[0], would then get a node like

!2 = !{!3, %4, i64 0}

, and this would give a NoAlias above, right? Did I understand this correctly, or did I miss anything relevant?

What method(s) would need to be modified to handle this? Is it feasible to do this now, or are there other planned steps that somebody is working on?

Would this improvement require the NewStructPathTBAA flag to be on, and if so, what is the outlook on this?

@Ivan: For what I can see, you had a list of planned improvements. What are your thoughts on this?

Thanks for any help,

Jonas

Hello, Jonas,

> It seems that when the struct member is an array, the base type
> becomes the element type. This is simply unhandled still in
> CodeGenTBAA ("TODO"). My question then is if it would be
> possible to instead create the DAG nodes !"A" and !"B" (as in
> the first type DAG above) and use them as the base types and
> then have the access type to be !"int", with the right offset
> into the struct node?

Correct, the current default TBAA info is not capable of representing accesses to (ranges of) array elements, hence the decay to element-type access.

Still, for elements indexed with constants. such as a->i[0] in your example, it is indeed possible to generate proper TBAA tags. Such tags would be no different from tags generated for accesses to non-array fields that reside at the same offset within the same base type. If there are real life cases where adding the support for constant-indexed accesses would make a difference, then I think that would be an improvement.

> struct S { int i[2]; long l[2]; };
> ...
> An access like s->i[0], would then get a node like
> and this would give a NoAlias above, right? Did I understand
> this correctly, or did I miss anything relevant?

Yes, TBAA assumes that accesses to int's and long's do not alias.

> Would this improvement require the NewStructPathTBAA flag to be
> on, and if so, what is the outlook on this?

It depends on whether handling of constant-indexed accesses would be enough for your use cases.

One of the goals behind the new TBAA is to be able to accurately represent various kinds accesses to subobjects. This includes non-constant-indexed accesses to array elements. More info on how such accesses are supposed to be represented can be found here:

https://reviews.llvm.org/D40975

If that's what you need, then the new TBAA info seems to be the most promising option. But please note that it's still a work in progress and is not mature enough for production use.

Hope this helps.

Regards,

Hi Ivan,

Hello, Jonas,

> It seems that when the struct member is an array, the base type
> becomes the element type. This is simply unhandled still in
> CodeGenTBAA ("TODO"). My question then is if it would be
> possible to instead create the DAG nodes !"A" and !"B" (as in
> the first type DAG above) and use them as the base types and
> then have the access type to be !"int", with the right offset
> into the struct node?

Correct, the current default TBAA info is not capable of representing accesses to (ranges of) array elements, hence the decay to element-type access.

Still, for elements indexed with constants. such as a->i[0] in your example, it is indeed possible to generate proper TBAA tags. Such tags would be no different from tags generated for accesses to non-array fields that reside at the same offset within the same base type. If there are real life cases where adding the support for constant-indexed accesses would make a difference, then I think that would be an improvement.

> struct S { int i[2]; long l[2]; };
> ...
> An access like s->i[0], would then get a node like
> and this would give a NoAlias above, right? Did I understand
> this correctly, or did I miss anything relevant?

Yes, TBAA assumes that accesses to int's and long's do not alias.

What I was actually hoping for was that by keeping the base type ('S') in the type DAG instead of just 'int', two accesses into two different access (struct) types both accessing an 'int' element would be 'NoAlias' by TBAA.

I don't think handling just constant indexes would help (me). I was hoping that perhaps we could keep the current TBAA algorithm and just extend it by also adding the struct base type in the type DAG. So instead of letting an array (member) access become just 'int', it could become 'A' -> 'int', which would not alias with 'B' -> 'int'.

Would this be a possible incremental improvement while working towards the new TBAA in your opinion? To me, it seems like a first good step to just handle different struct types, which does not require any size awareness. I am not sure however how simple this would be, and I would not want it to get in your way with the new TBAA.

/Jonas

Hi, Jonas,

Hi Ivan,

Hello, Jonas,

> It seems that when the struct member is an array, the base type
> becomes the element type. This is simply unhandled still in
> CodeGenTBAA ("TODO"). My question then is if it would be
> possible to instead create the DAG nodes !"A" and !"B" (as in
> the first type DAG above) and use them as the base types and
> then have the access type to be !"int", with the right offset
> into the struct node?

Correct, the current default TBAA info is not capable of representing accesses to (ranges of) array elements, hence the decay to element-type access.

Still, for elements indexed with constants. such as a->i[0] in your example, it is indeed possible to generate proper TBAA tags. Such tags would be no different from tags generated for accesses to non-array fields that reside at the same offset within the same base type. If there are real life cases where adding the support for constant-indexed accesses would make a difference, then I think that would be an improvement.

> struct S { int i[2]; long l[2]; };
> ...
> An access like s->i[0], would then get a node like
> and this would give a NoAlias above, right? Did I understand
> this correctly, or did I miss anything relevant?

Yes, TBAA assumes that accesses to int's and long's do not alias.

What I was actually hoping for was that by keeping the base type ('S') in the type DAG instead of just 'int', two accesses into two different access (struct) types both accessing an 'int' element would be 'NoAlias' by TBAA.

It should result in NoAlias, provided the access types or offsets are different, yes.

I don't think handling just constant indexes would help (me). I was hoping that perhaps we could keep the current TBAA algorithm and just extend it by also adding the struct base type in the type DAG. So instead of letting an array (member) access become just 'int', it could become 'A' -> 'int', which would not alias with 'B' -> 'int'.

IIRC, there were proposals/attempts to represent accesses to array elements as accesses to their first elements, which can technically be encoded with the current TBAA format and thus may work as an incremental improvement on top of the existing TBAA machinery you are looking for. But this may need making sure there will be no regressions for some tricky cases like those that involve GCC's type punning and changing effective types within unions and in dynamic memory.

Would this be a possible incremental improvement while working towards the new TBAA in your opinion? To me, it seems like a first good step to just handle different struct types, which does not require any size awareness. I am not sure however how simple this would be, and I would not want it to get in your way with the new TBAA.

/Jonas

Regards,

Hi Ivan,

Hi, Jonas,

Sorry for the late answer.

As far as I'm aware, those proposals didn't result in any patches published. Here's the original message discussing the approach:

[llvm-dev] RFC: Representing unions in TBAA
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110082.html

Despite its title, it also suggests changes supposed to address the issue with representation of array-element accesses.

As to my own patches pending publication, they are all for the new TBAA format, which you said would be of no help in your case.

Regards,

Hi Ivan,

As to my own patches pending publication, they are all for the new TBAA format, which you said would be of no help in your case.

I actually thought they would help, but merely suggested an intermediate step while waiting for those further improvements you are working on. I would be happy to try your patches and evaluate if it helps my test case...

thanks

/Jonas

Hi, Jonas,

I see. In that case I think you could just give the new TBAA a try by adding the cc1's -new-struct-path-tbaa option and seeing if it does what you need. The support for member arrays is already there, see /tools/clang/test/CodeGen/tbaa-array.cpp . That would also give us some more testing (which we need very much), and then resolve issues as we go. The previously mentioned local patches pending publishing are actually supposed to just fix some special cases. IIRC, the only major part we are still missing is the proper support for unions. An attempt to support them can be seen at <https://reviews.llvm.org/D39455&gt;, but turned out it doesn't implement exactly what we are after with TBAA and unions, so this needs some more work.

Regards,

Hi Ivan,

I see. In that case I think you could just give the new TBAA a try by adding the cc1's -new-struct-path-tbaa option and seeing if it does what you need.

Unfortunately, it seems that the type DAG does not get any new base-nodes for the struct types. The only difference I can see is that there seems to be a size-field in the TBAA nodes as well now. So it seems that the two struct types still are may-alias...

I supposed I need to pass -new-struct-path-tbaa to all the -cc1 invocations, so I first ran clang with -save-temps -v, and then reran all of them while adding the option as well. Did I do something wrong or miss anything?

/Jonas

Program:

typedef struct {
double c[3][3];
} MATRIX_TY;

typedef struct {
double c[3];
} VECTOR_TY;

double e = 0.0;
MATRIX_TY *s;
VECTOR_TY *v;
int g = 0;
void h() {
int i = e;
s->c[0][i] = g;
v->c[0] = g;
g = e;
}

Current/normal type DAG:

!2 = !{!3, !3, i64 0}
!3 = !{!"double", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}
!6 = !{!7, !7, i64 0}
!7 = !{!"int", !4, i64 0}
!8 = !{!9, !9, i64 0}
!9 = !{!"any pointer", !4, i64 0}

Extended/size aware type DAG (with -new-struct-path-tbaa):

!2 = !{!3, !3, i64 0, i64 8}
!3 = !{!4, i64 8, !"double"}
!4 = !{!5, i64 1, !"omnipotent char"}
!5 = !{!"Simple C/C++ TBAA"}
!6 = !{!7, !7, i64 0, i64 4}
!7 = !{!4, i64 4, !"int"}
!8 = !{!9, !9, i64 0, i64 8}
!9 = !{!4, i64 8, !"any pointer"}

Ah, you are right. Apologies, I was under impression that the patch enabling the support for member arrays is already there, but apparently it's not. OK, I'm going to re-visit things and see how much work it is, and then will eventually put the changes on review. Will let you know. Thanks and sorry for the confusion.