GVN removing loads that are affected by call

So given an intrinsic that has a pointer as in/out and IntrWriteMem property.

call intrinsic(address a, …);
loop over address a
load from address a + offset
call intrinsic (address a, …);
loop over address a
load from address a + offset

GVN is removing the second loads, despite the second call overwriting the memory starting at address a. AA has the intrinsics marked as unknown instructions but has all of these as mayAlias in a set. I’m not seeing this issue with -fno-unroll-loops.

Thanks.

It would be good to have an actual IR reproducer here.

This is right before GVN:

define i32 @foo(<4 x i16> %p, <4 x i16> %p1, i16* nocapture %res) local_unnamed_addr #0 !dbg !6 {
entry:
%temp = alloca i64, align 8
%tmpcast = bitcast i64* %temp to [4 x i16]*
%0 = bitcast i64* %temp to i8*, !dbg !8
call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0) #3, !dbg !8
store i64 0, i64* %temp, align 8, !dbg !9
%1 = bitcast i64* %temp to <4 x i16>, !dbg !10
%2 = call <4 x i16>
@llvm.XXX.intrinsic(<4 x i16>* nonnull %1, <4 x i16> %p, i32 0), !dbg !11, !tbaa !12
%arrayidx = bitcast i64* %temp to i16*, !dbg !16
%3 = load i16, i16* %arrayidx, align 8, !dbg !16, !tbaa !17
br label %for.body, !dbg !19

for.body: ; preds = %entry
%arrayidx1 = getelementptr inbounds [4 x i16], [4 x i16]* %tmpcast, i32 0, i32 1, !dbg !20
%4 = load i16, i16* %arrayidx1, align 2, !dbg !20, !tbaa !17
%cmp3 = icmp sgt i16 %3, %4, !dbg !21
%spec.select = select i1 %cmp3, i16 %4, i16 %3, !dbg !22
%arrayidx1.1 = getelementptr inbounds [4 x i16], [4 x i16]* %tmpcast, i32 0, i32 2, !dbg !20
%5 = load i16, i16* %arrayidx1.1, align 2, !dbg !20, !tbaa !17
%cmp3.1 = icmp sgt i16 %spec.select, %5, !dbg !21
%spec.select.1 = select i1 %cmp3.1, i16 %5, i16 %spec.select, !dbg !22
%arrayidx1.2 = getelementptr inbounds [4 x i16], [4 x i16]* %tmpcast, i32 0, i32 3, !dbg !20
%6 = load i16, i16* %arrayidx1.2, align 2, !dbg !20, !tbaa !17
%cmp3.2 = icmp sgt i16 %spec.select.1, %6, !dbg !21
%spec.select.2 = select i1 %cmp3.2, i16 %6, i16 %spec.select.1, !dbg !22
store i16 %spec.select.2, i16* %res, align 2, !dbg !23, !tbaa !17
%7 = tail call <4 x i16>* @llvm.XXX.intrinsic(<4 x i16>* %2, <4 x i16> %p1, i32 0), !dbg !24, !tbaa !12
%8 = load i16, i16* %arrayidx, align 8, !dbg !25, !tbaa !17
br label %for.body12, !dbg !26

for.body12: ; preds = %for.body
%arrayidx14 = getelementptr inbounds [4 x i16], [4 x i16]* %tmpcast, i32 0, i32 1, !dbg !27
%9 = load i16, i16* %arrayidx14, align 2, !dbg !27, !tbaa !17
%cmp16 = icmp sgt i16 %8, %9, !dbg !28
%spec.select39 = select i1 %cmp16, i16 %9, i16 %8, !dbg !29
%arrayidx14.1 = getelementptr inbounds [4 x i16], [4 x i16]* %tmpcast, i32 0, i32 2, !dbg !27
%10 = load i16, i16* %arrayidx14.1, align 2, !dbg !27, !tbaa !17
%cmp16.1 = icmp sgt i16 %spec.select39, %10, !dbg !28
%spec.select39.1 = select i1 %cmp16.1, i16 %10, i16 %spec.select39, !dbg !29
%arrayidx14.2 = getelementptr inbounds [4 x i16], [4 x i16]* %tmpcast, i32 0, i32 3, !dbg !27
%11 = load i16, i16* %arrayidx14.2, align 2, !dbg !27, !tbaa !17
%cmp16.2 = icmp sgt i16 %spec.select39.1, %11, !dbg !28
%spec.select39.2 = select i1 %cmp16.2, i16 %11, i16 %spec.select39.1, !dbg !29
%conv24 = sext i16 %spec.select39.2 to i32, !dbg !30
call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3, !dbg !31
ret i32 %conv24, !dbg !32

There is still not enough information here.

My first guess. The `!tbaa` annotation on the `XXX.intrinsic` and the `load`
basically encode there is no alias. Easy to verify, remove the ones on the
intrinsic.

~ Johannes

P.S. If this was a bug in GVN, and I assume it is not, a reproducer would help
a lot. So a small IR sample that shows the problem and which we can run.
This is a "redacted?" IR fragment in which I don't know what transformation
is problematic. I also can not run it through GVN, which makes it impossible
to reproduce.

Yes, this is for downstream/out of tree target so I’m not sure how you could reproduce it either but I thought the IR might help a bit.

My guess is it’s not a bug in GVN as much as an issue with the intrinsic properties, or lack thereof. I put this in the first post but the Alias Sets show:

AliasSet[0x75fbd0, 9] may alias, Mod/Ref Pointers: (i8* %1, 8), (i16* %arrayidx, 2), (i16* %arrayidx1, 2), (i16* %arrayidx5, 2), (i16* %19, 2), (i16* %arrayidx6, 2), (i16* %arrayidx14, 2), (i16* %arrayidx19, 2)
4 Unknown instructions: void , <4 x i16>* %6, <4 x i16>* %22, void

Can you share the attributes of the intrinsic declaration,
assuming removing `!tbaa` didn't help.

argmemonly, nounwind, writeonly

-fno-strict-aliasing did not help.

Nothing comes to mind given the information.

~ Johannes

Ok, thanks. I’ll try to come up with a more generic test case.

So this seems to be an issue with the intrinsic setting NoCapture<0> (it returns the pointer it takes). We have a similar intrinsic that does the exact same thing but doesn’t return the pointer. It also has NoCapture<1> and doesn’t cause this issue. Also, I managed to substitute memcpy and there were no issues, it also has nocapture.

Are there some papers/citations which talk more about Pointer Capture/Capture Tracking?

I saw this blog: https://jonasdevlieghere.com/escape-analysis-capture-tracking-in-llvm/
And I read the CaptureTracking.h short description but I’d like a clearer grasp.

Thanks,

Ryan

A function that returns a pointer argument "captures" it for now.

The Attributor uses "nocapute_maybe_returned" internally but it is not
an enum attribute (yet).

I’m still a bit fuzzy on the definition but that doesn’t seem to completely agree with pointer capture?

" A pointer value is captured if the function makes a copy of any part of the pointer that outlives the call. "

I can replace memcpy in our example and see no issues. memcpy is no capture.

#define SIZE 5
#include
int foo(short a[SIZE], short b[SIZE], short *res) {
short temp[SIZE] = {0, 0, 0, 0, 0};
short ptr;
ptr = temp;
memcpy(ptr, b, SIZE
2);
short t = temp[0];
for (int i = 1; i < 4; i++) {
if (t > temp[i]) t = temp[i];
}
res = t;
memcpy(ptr, b, SIZE
2);
t = temp[0];
for (int i = 1; i < 4; i++) {
if (t > temp[i]) t = temp[i];
}
return t;
}

So you’re saying that if the memcpy intrinsic, in this case, returned a pointer (even if not directly used, ie “ptr” in this case) and had NoCapture attribute, the second set of loads would be removed?

-Ryan

Your example/trouble starts with an "identified function local" (for short, IFL) pointer
(see llvm::isIdentifiedFunctionLocal), i.a., as it falls out of `malloc`, `alloca`, or
a `__restrict` qualified argument.

Since no access on a pointer not derived from the IFL one can cause the underlying memory to be altered,
we can ignore other accesses *iff* we can show they don't use a pointer that is "based-on" the IFL one.
The easiest way to prove it is not based on the IFL one is to show the IFL pointer does not escape since
then every based-on use can be identified by def-use chains. A call argument that is nocapture guarantees
no copy of the pointer is made that outlives the call. If you return the pointer, a copy is made and you
broke the contract. We consequently "ignore" accesses through that pointer when we argue about the IFL one
as they cannot alias.

Since we often return a pointer argument, the Attributor uses "no-capture-maybe-returned"
which we recently discussed to propose as a new enum attribute. You could then mark the argument as such
and capture + use traversals could continue with the intrinsic value to determine all accesses to the IFL.

I hope this makes it somewhat clearer.

WRT. memcpy, it does not copy the pointers you put it but the memory they point to. The address of the memory
did not escape/was not captured.

~ Johannes

Johannes,

Thanks, this makes it a bit more clear, yes. I was a bit confused if it was copying the pointer or the data being pointed to. It looks like you are proposing the new attribute to fill this hole that the compiler is not currently able to tell that this pointer, though “escaped” is not used again and is therefore functionally the same as not captured?

Do you have a phab for the recently proposed no-capture-maybe-returned attribute?

Thanks.

Johannes,

   Thanks, this makes it a bit more clear, yes. I was a bit confused if it
was copying the pointer or the data being pointed to. It looks like you are
proposing the new attribute to fill this hole that the compiler is not
currently able to tell that this pointer, though "escaped" is not used
again and is therefore functionally the same as not captured?

The pointer is not "captured" in the general sense but it is potentially
returned. That allows you to keep the "nocapture" property if you follow
the return value uses as if they might be the pointer itself.

   Do you have a phab for the recently proposed no-capture-maybe-returned
attribute?

No and I don't have code that makes use of it outside the Attributor.

@Juneyoung expressed interest in the attribute recently. He, or maybe
you, could write the RFC, add the enum attribute pluming, and then uses
in transformations. We could use the Attributor to do some preliminary
testing but it would be best to play around with the usage of this new
idea in other transformations as well.

~ Johannes

Yep, let me work on adding the new attribute.

Juneyoung

Juneyoung,

Are you currently working on this? If not, please let me know and I can try to work something up.

Thanks,

Ryan

Nope, I was occupied by something else recently.
Sure, feel free to work on it. :slight_smile:

Juneyoung