Intrinsics IntrReadMem memory properties

Hello,

According to include/llvm/IR/Intrinsics.td, the IntrReadMem property indicates that the intrinsic only reads from and does not write to memory.

Does this mean that it can read anywhere in memory? I ask because we already have ‘IntrArgMemOnly’ for intrinsics that only access memory that their arguments point to.

If ‘IntrReadMem’ really means the intrinsic may read from anywhere in memory, this should imply that, if an intrinsic with this property follows a dead store, the store should not be eliminated by optimizations?

This is not the current behavior of LLVM though, so it seems that my guesses are wrong… But at least, can someone show me the mistake here?

Thanks for your time,

Son Tuan Vu

Hi Son Tuan Vu,

If not restricted by *writeonly*, *readonly*, or *readnone* (basically), a call can access any object for which the
callee could potentially know the address. That means, if the address of an object cannot be known to the callee,
it cannot access that object. An example is given below. Thus, a dead store can be eliminated if the memory cannot
be read by any subsequent operation. If you think there is a bug, could you provide a reproducer?

Example:

#include <stdlib.h>

void unknown();
void foo() {
  int *A = malloc(100 * sizeof(A[0]));
  int B[100];
  for (int i = 0; i < 100; i++)
    A[i] = B[i] = i;

  // The addresses/objects A and B are not known to the unknown function and the stores above can be removed.
  unknown();

  free(A);
}

I hope this helps,
  Johannes

Hi Johannes,

Thanks for your reply. I now see more clearly how things work with these properties. However, what would be an object whose address is potentially known by a callee? I suppose the intrinsic arguments and global variables?

So IIUC, if not restricted by *only properties, an intrinsic could access:

  • only the memory its arguments point to, if IntrArgMemOnly is specified,
  • the memory its arguments point to and global variables as well, if Intr*Mem (other than IntrNoMem) is specified.

Please tell me if I’m correct or not!

Thanks again,

You are on the right track. Addresses could get exposed in various ways;
a probably non-exhaustive list is:
- passed as arguments
- communicated through a global
- via I/O or, more generally, system calls. This includes all forms of
   synchronization, e.g., inter-lane communication.
- transitively passed by any of the means above, e.g., the address of a
   pointer to the object could be exposed.

So if we take the foo() example above and add:
  bar(&A[50]);
just before the call to unknown, we have to assume A is known to unknown
now, at least if we do not have information about bar that would suggest
otherwise.
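
To make the escape rule concrete, here is a minimal sketch in C. All names (g_leak, unknown, bar, foo_private, foo_escaped) are hypothetical, and the callee bodies are visible here only so the example runs; in the scenario discussed above they would live in another translation unit and the compiler would have to assume the worst.

```c
#include <stdlib.h>

/* Hypothetical global through which an address can be communicated. */
static int *g_leak;

/* Hypothetical stand-in for the unknown callee: it can only read
 * objects whose address has been exposed to it, here via g_leak. */
int unknown(void) {
    return g_leak ? g_leak[0] : -1;
}

/* Hypothetical helper that publishes the address it receives. */
void bar(int *p) {
    g_leak = p;
}

/* A never escapes, so unknown() has no way to observe the stores to A. */
int foo_private(void) {
    int *A = malloc(100 * sizeof(A[0]));
    if (!A)
        return -2;
    for (int i = 0; i < 100; i++)
        A[i] = i;
    g_leak = NULL;
    int seen = unknown();   /* result cannot depend on A */
    free(A);
    return seen;
}

/* &A[50] escapes through bar(), so the stores to A must be preserved. */
int foo_escaped(void) {
    int *A = malloc(100 * sizeof(A[0]));
    if (!A)
        return -2;
    for (int i = 0; i < 100; i++)
        A[i] = i;
    bar(&A[50]);            /* address exposed via an argument, then a global */
    int seen = unknown();   /* reads A[50] through g_leak */
    g_leak = NULL;
    free(A);
    return seen;
}
```

In foo_private the stores are unobservable and could be dropped; in foo_escaped dropping them would change what unknown() returns.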

Ok, now I think I’ve found a bug:

Consider this C code:
#include <string.h>

void bar(int b) {
  int a[10];
  memset(a, b, 10);
}

which generates this IR code:
define dso_local void @bar(i32 %b) #0 {
entry:
  %b.addr = alloca i32, align 4
  %a = alloca [10 x i32], align 16
  store i32 %b, i32* %b.addr, align 4
  %arraydecay = getelementptr inbounds [10 x i32], [10 x i32]* %a, i64 0, i64 0
  %0 = bitcast i32* %arraydecay to i8*
  %1 = load i32, i32* %b.addr, align 4
  %2 = trunc i32 %1 to i8
  call void @llvm.memset.p0i8.i64(i8* align 16 %0, i8 %2, i64 10, i1 false)
  ret void
}

Now I have a pass that inserts an intrinsic with IntrReadMem into the IR:

define dso_local void @bar(i32 %b) #0 {
entry:
  %b.addr = alloca i32, align 4
  %a = alloca [10 x i32], align 16
  store i32 %b, i32* %b.addr, align 4
  %arraydecay = getelementptr inbounds [10 x i32], [10 x i32]* %a, i64 0, i64 0
  %0 = bitcast i32* %arraydecay to i8*
  %1 = load i32, i32* %b.addr, align 4
  %2 = trunc i32 %1 to i8
  call void @llvm.memset.p0i8.i64(i8* align 16 %0, i8 %2, i64 10, i1 false)

  tail call void @mem_read_test(i8* %0)

  ret void
}

; Function Attrs: nounwind readonly
declare void @mem_read_test(i8*) #2

However, the call to memset() still got optimized away by DSE. What am I missing here? Or is this indeed a bug in DSE?

Does the behavior change if you remove the tail from the call to your intrinsic?

I can later look in more detail.

So I removed the ‘tail’ from the call and tried out different properties:

  • IntrNoMem: memset() and the intrinsic are both optimized away as expected

  • IntrWriteMem: memset() optimized away by DSE but the intrinsic isn’t. I would expect both to be removed, since the intrinsic is now also a dead store.

  • IntrReadMem: memset() and the intrinsic are both optimized away unexpectedly (CSE removes the intrinsic, then InstCombine removes memset). The latter is understandable, but why does the intrinsic get optimized away in the first place?

  • IntrArgMemOnly: none gets optimized away as expected

  • ReadOnly<0>: none gets optimized away as expected

  • ReadNone<0> / WriteOnly<0>: none gets optimized away, unexpectedly

Am I missing something here, or are there indeed bugs here? Btw, can you tell me how and why ‘tail’ changes the optimizer behavior?

Thanks a lot for your explanation!

> - IntrWriteMem: memset() optimized away by DSE but the intrinsic isn't. I would expect both to be removed, since the intrinsic is now also a dead store.

IntrWriteMem means the intrinsic could write to memory *anywhere*, not
just based on its argument.
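
A rough C analogy of why such a call cannot be treated as a dead store, as a sketch only: write_anywhere, g_calls and demo_write_anywhere are hypothetical names, and the function merely stands in for an intrinsic that is allowed to write to arbitrary memory.

```c
/* Hypothetical global state that is unrelated to any argument. */
static int g_calls;

/* Hypothetical stand-in for an intrinsic that may write anywhere:
 * it ignores the memory its argument points to and updates a global. */
void write_anywhere(char *p) {
    (void)p;
    g_calls++;
}

/* buf is dead after the call, yet the call itself still has an
 * observable effect (the global changes), so it has to stay. */
int demo_write_anywhere(void) {
    char buf[10];
    write_anywhere(buf);
    return g_calls;
}
```

Only a property that also restricts the write to argument memory would let the optimizer reason that the call is dead once buf is dead.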

> - IntrReadMem: memset() and the intrinsic are both optimized away *unexpectedly* (CSE removes the intrinsic, then InstCombine removes memset). The latter is understandable, but why does the intrinsic get optimized away in the first place?

I haven't checked the code, but an intrinsic that only reads memory
(no other side effects) and returns void can't actually accomplish
anything observable.
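
The same reasoning can be sketched at the C level. This assumes the GCC/Clang `pure` function attribute, which is used here as a rough source-level counterpart of a read-only call; sum_bytes, bar_observed and bar_discarded are hypothetical names, and what a given compiler version actually removes is not guaranteed.

```c
#include <string.h>

/* Assumption: `pure` (a GCC/Clang extension) promises the function only
 * reads memory and has no side effects besides its return value. */
__attribute__((pure)) int sum_bytes(const unsigned char *p, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* The result is used, so the read is observable and the memset that
 * feeds it has to be kept. */
int bar_observed(int b) {
    unsigned char a[10];
    memset(a, b, sizeof a);
    return sum_bytes(a, sizeof a);
}

/* The result is discarded: a read-only call with no used result does
 * nothing observable, so it may be removed, after which the memset
 * becomes a dead store as well. */
void bar_discarded(int b) {
    unsigned char a[10];
    memset(a, b, sizeof a);
    (void)sum_bytes(a, sizeof a);
}
```

To keep a read-only call alive in an experiment, its return value has to feed into something the program can observe.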

> Am I missing something here, or are there indeed bugs here?

It all looks as expected to me.

> Btw, can you tell me how and why 'tail' changes the optimizer behavior?

From the LangRef about tail (and musttail): "Both markers imply that
the callee does not access allocas from the caller". That seems
directly applicable to your example.

The reason, of course, is that if a call is actually implemented as a
tail call then the current stack frame is reused for the new callee.
So the lifetime of objects on it has ended and accessing them is just
not possible in a well-defined program.
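
A small C sketch of the distinction, with hypothetical names (read_through, forward_ok, forward_local). Whether a compiler really emits a tail call for either function is its own decision; the comments only describe which call could legally carry the marker.

```c
/* Hypothetical callee that reads through its pointer argument. */
int read_through(const int *p) {
    return *p + 1;
}

/* p points to storage owned further up the call chain. Reusing this
 * frame for the callee is harmless, so the call may be a tail call. */
int forward_ok(const int *p) {
    return read_through(p);
}

/* &local points into this function's own frame. The callee needs that
 * frame to stay alive, so this call must not be marked as a tail call. */
int forward_local(int v) {
    int local = v;
    return read_through(&local);
}
```

Marking the second kind of call `tail` by hand, as in the IR above, tells the optimizer the callee never touches the caller's allocas, which is why the memset looked dead.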

Cheers.

Tim.

Thanks Tim for your reply.

What about the case where the intrinsic is ReadNone and doesn’t get optimized away? Also, when it is WriteOnly, why does memset() not get DSE’d?

Sorry, I didn't notice those on the end of your list. They're missing
optimizations, but not correctness issues.

I expect it's because all of the earlier cases have equivalent
attributes that apply to normal function calls, so there's a stronger
motivation to optimize them well.

Cheers.

Tim.