Hi Daniel,
I see your point about LLVM being agnostic to C/C++ types. I think TBAA
was invented to partially cover this gap and to give optimization
opportunities when LLVM types are not sufficient but C/C++ types have the
required information.
What do you think about the following example:
struct S {
  int a[10];
  int b;
};
int foo(struct S *ps, int i) {
  ps->a[i] = 1;
  ps->b = 2;
  return ps->a[0];
}
define i32 @foo(%struct.S* nocapture %ps, i32 %i) #0 {
entry:
%idxprom = sext i32 %i to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 0, i64 %idxprom
store i32 1, i32* %arrayidx, align 4, !tbaa !1
%b = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 1
store i32 2, i32* %b, align 4, !tbaa !5
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %ps, i64 0, i32 0, i64 0
%0 = load i32, i32* %arrayidx2, align 4, !tbaa !1
I'm not entirely sure why TBAA is necessary to disambiguate ps->a from
ps->b; it looks like basicaa should already be able to say they don't
overlap.
Does this not happen?
ret i32 %0
}
!1 = !{!2, !2, i64 0}
!2 = !{!"int", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}
!5 = !{!6, !2, i64 40}
!6 = !{!"S", !3, i64 0, !2, i64 40}
The missing information here is the range inside struct S that could be
accessed.
What do you mean by "could be accessed"? Do you mean "valid to access in
C"?
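To make the "range" question concrete, here is a minimal sketch in C. It
assumes there is no padding between a and b (true for the layout in the IR
above, where b sits at offset 40 with a 4-byte int). The helper names
a_range_end, index_stays_in_a and check_layout are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct S {
    int a[10];
    int b;
};

/* One-past-the-end byte offset of member a within struct S. */
size_t a_range_end(void) {
    return offsetof(struct S, a) + sizeof(((struct S *)0)->a);
}

/* True if index i names an element of a, i.e. a valid access in C. */
bool index_stays_in_a(int i) {
    size_t count = sizeof(((struct S *)0)->a) / sizeof(int);
    return i >= 0 && (size_t)i < count;
}

/* Layout assumption of this sketch: b starts exactly where a ends. */
void check_layout(void) {
    assert(a_range_end() == offsetof(struct S, b));
}
```

Under that assumption, any index accepted by index_stays_in_a touches
only bytes below the offset of b, which is the range information that the
"S" node in the TBAA metadata above does not carry.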
Also, as you can see, the array member of the struct is represented in
TBAA as omnipotent char, not as an array of int.
Agreed.
Arrays in a struct could be represented in TBAA something like this:
!6 = !{!"S", !7, i64 0, !2, i64 40}
!7 = !{!"<unique id of int[10]>", !2, i64 0}
And 'ps->a[i]' could have TBAA like this:
!8 = !{!6, !7, i64 0}
Yes. This should likely work. Note that size, while nice, is harder.
One thing that is sadly still common (at least in C) is to do this:
struct S {
  int b;
  int a[0]; // or 1
};
and malloc it at (sizeof(struct S) + 40 * sizeof(int)), then write into
a[1...39].
If we break that, a lot of stuff is likely to get broken (at one point,
when we did it in gcc, we broke 80% of all the packages in a given
Linux distro ....)
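For reference, a minimal sketch of that idiom. It uses the C99 flexible
array member spelling (int a[]) so that the example is well defined; the
older a[0] / a[1] spellings mentioned above are used the same way. The
helpers s_alloc and s_fill are hypothetical, and storing the element count
in b is only a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct S {
    int b;
    int a[];   /* C99 flexible array member; older code writes a[0] or a[1] */
};

/* Allocate a struct S with room for n trailing ints. The caller frees it. */
struct S *s_alloc(size_t n) {
    assert(n > 0);
    struct S *s = malloc(sizeof(struct S) + n * sizeof(int));
    if (s != NULL) {
        s->b = (int)n;   /* sketch-only choice: remember the count in b */
    }
    return s;
}

/* Write every trailing element. The declared type of a gives no bound. */
void s_fill(struct S *s, int value) {
    for (int i = 0; i < s->b; i++) {
        s->a[i] = value;
    }
}
```

The point is that the declared array type says nothing useful about how
far accesses through a may reach, so a size derived from the type alone
would be wrong for code like this.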
As far as I can see, if a struct is enclosed in another struct, the
information about the inner struct gets lost; only the offset is present.
But I think for arrays it is better to keep the array type in TBAA for
both the struct and the element accesses.
Don't get me wrong, I think that it would be nice to have offset and size,
and gcc does indeed track this info on its own.
I'm just trying to understand where you think it will provide better info.
Because once you get into cases like:
struct S {
  int a[10];
  int b;
};
int foo(struct S *ps, int *i) {
  ps->a[*i] = 1;
  *i = 3;
  return ps->b;
}
You have no guarantee, for example, that *i and ps->b are not the same
memory.
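A minimal sketch of that situation, assuming the index in foo is meant to
be *i (i is a pointer there). The callers foo_aliased and foo_distinct are
hypothetical, and the assert is only a guard added for this sketch.

```c
#include <assert.h>

struct S {
    int a[10];
    int b;
};

int foo(struct S *ps, int *i) {
    assert(*i >= 0 && *i < 10);   /* sketch-only guard on the index */
    ps->a[*i] = 1;
    *i = 3;                       /* may or may not write ps->b */
    return ps->b;                 /* must be loaded after the store above */
}

/* i points at ps->b itself; both accesses have type int, so this is valid. */
int foo_aliased(void) {
    struct S s = {{0}, 0};
    return foo(&s, &s.b);
}

/* i points at an unrelated int, so ps->b keeps its initial value. */
int foo_distinct(void) {
    struct S s = {{0}, 7};
    int idx = 2;
    return foo(&s, &idx);
}
```

Both callers are legal C, and since *i and ps->b are both int, neither the
access types nor an offset and size on the struct member can rule out the
aliased case.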