Is it valid to dereference a pointer that has undef bits in its offset?

Hello all,

Is it valid to dereference a pointer that has undef bits in its offset?

For example,

%p = alloca [8 x i8]
%off = and i64 undef, 8
%p2 = getelementptr [8 x i8], [8 x i8]* %p, i64 0, i64 %off
store i8 0, i8* %p2

undef & 8 is always less than 8, so technically it will store zero to one of the array’s elements.
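A small model of the set of possible offsets, using the mask `undef & 7` (the correction posted later in the thread). It assumes an i8-width `undef` so the enumeration stays small; because the mask only involves the low bits, a wider integer type yields the same set. `possible_and_results` is a hypothetical helper for illustration, not an LLVM API.

```python
from typing import Set


def possible_and_results(mask: int, bits: int = 8) -> Set[int]:
    # An iN `undef` may take any N-bit value, so `undef & mask`
    # may be any value x & mask for x in [0, 2**N).
    return {x & mask for x in range(1 << bits)}


ARRAY_LEN = 8  # %p = alloca [8 x i8]

offsets = possible_and_results(7)
print(sorted(offsets))                           # [0, 1, 2, 3, 4, 5, 6, 7]
print(all(0 <= o < ARRAY_LEN for o in offsets))  # True

# With the typo'd mask 8, x & 8 is either 0 or 8, and offset 8 is one past the end.
print(sorted(possible_and_results(8)))           # [0, 8]
```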

The motivation is to improve noundef analysis: if load/store raised UB when given a pointer with undef bits, we could conclude that any pointer passed to a load or store is well-defined.
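A minimal sketch of the inference the proposal would enable, written over a hypothetical toy instruction list of `(opcode, pointer)` pairs rather than real LLVM IR. Under the proposed rule, every pointer operand of an executed load/store could be treated as noundef afterwards; under the current semantics, nothing can be concluded.

```python
from typing import Iterable, Set, Tuple

# Hypothetical toy IR: each instruction is (opcode, pointer operand name).
Instr = Tuple[str, str]


def noundef_pointers(instrs: Iterable[Instr], ub_on_undef_address: bool) -> Set[str]:
    """Collect pointers that may be assumed free of undef bits."""
    known: Set[str] = set()
    for opcode, ptr in instrs:
        if ub_on_undef_address and opcode in ("load", "store"):
            # Under the proposed rule, executing past this access implies
            # `ptr` had no undef bits; otherwise behavior would be undefined.
            known.add(ptr)
    return known


program = [("store", "%p2"), ("load", "%q")]
print(sorted(noundef_pointers(program, ub_on_undef_address=True)))  # ['%p2', '%q']
print(noundef_pointers(program, ub_on_undef_address=False))         # set()
```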

A suggested patch is here: https://reviews.llvm.org/D87994

I wonder whether there are existing cases that rely on this behavior that I’m not aware of.

Thanks,
Juneyoung

%p2 = gep %p, (undef & 8)
A silly typo: undef & 8 → undef & 7

I think it’s reasonable to expect that IR generated by frontends doesn’t do this.

Not sure about transforms; I can imagine that we might speculate a load without proving all the bits are well-defined.

-Eli

My gut feeling is that we should allow this.
I have no proper justification handy, but your example doesn't strike me as UB.

~ Johannes

I think we need to allow this. Otherwise, we have to prove that addresses are non-undef before we can hoist or sink a memory instruction. Today, aliasing can use things like known bits, and if we imposed a no-undef in address requirement, we'd either need to replace such reasoning in AA, or have passes which wish to hoist/sink check the property afterwards.
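A minimal sketch of the known-bits reasoning mentioned here, loosely modeled on the idea behind LLVM's known-bits analysis; the `KnownBits` class and its methods below are hypothetical stand-ins, not LLVM's C++ API. It treats the undef operand conservatively as having no known bits and assumes 64-bit offsets. Even so, `and` with 7 bounds the offset to [0, 7], the kind of fact AA can use without ever proving the value is noundef.

```python
from dataclasses import dataclass

WIDTH = 64                   # assume i64 offsets
ALL_ONES = (1 << WIDTH) - 1


@dataclass(frozen=True)
class KnownBits:
    zero: int  # bits known to be 0
    one: int   # bits known to be 1

    @staticmethod
    def unknown() -> "KnownBits":
        # Nothing known about any bit; how the undef operand is treated here.
        return KnownBits(0, 0)

    @staticmethod
    def constant(c: int) -> "KnownBits":
        c &= ALL_ONES
        return KnownBits(~c & ALL_ONES, c)

    def and_(self, other: "KnownBits") -> "KnownBits":
        # A result bit is known 0 if either input bit is known 0,
        # and known 1 only if both input bits are known 1.
        return KnownBits(self.zero | other.zero, self.one & other.one)

    def unsigned_min(self) -> int:
        return self.one

    def unsigned_max(self) -> int:
        return ~self.zero & ALL_ONES


off = KnownBits.unknown().and_(KnownBits.constant(7))  # %off = and i64 undef, 7
print(off.unsigned_min(), off.unsigned_max())          # 0 7
```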

To put it differently, I think it's reasonable for %p2 and %p3 to be provably NoAlias and dereferenceable, and for %v and %v2 to be safe to speculate.

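%p = alloca [16 x i8]
%off = and i64 undef, 7
%p2 = getelementptr [16 x i8], [16 x i8]* %p, i64 0, i64 %off
%v = load i8, i8* %p2
%p3 = getelementptr [16 x i8], [16 x i8]* %p, i64 0, i64 8
%v2 = load i8, i8* %p3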
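A sketch of the NoAlias and dereferenceability argument for this example, assuming both loads read one byte (i8) and representing each access as a half-open byte interval within the 16-byte alloca. The offset range [0, 7] for %p2 is what `undef & 7` permits; the helper names are hypothetical.

```python
from typing import Tuple

Range = Tuple[int, int]  # half-open byte interval [lo, hi)


def access_range(min_off: int, max_off: int, size: int) -> Range:
    # Every byte an access of `size` bytes could touch, given its offset range.
    return (min_off, max_off + size)


def no_alias(a: Range, b: Range) -> bool:
    return a[1] <= b[0] or b[1] <= a[0]


def dereferenceable(r: Range, alloc_size: int) -> bool:
    return 0 <= r[0] and r[1] <= alloc_size


ALLOC_SIZE = 16                # %p = alloca [16 x i8]
p2 = access_range(0, 7, 1)     # offset from `undef & 7`: anywhere in [0, 7]
p3 = access_range(8, 8, 1)     # constant offset 8

print(no_alias(p2, p3))                 # True
print(dereferenceable(p2, ALLOC_SIZE))  # True
print(dereferenceable(p3, ALLOC_SIZE))  # True
```

Note that no step needs the offset to be noundef: only its range matters, which is why requiring noundef addresses would force passes to prove more than the alias and speculation reasoning actually uses.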

Keep in mind that the undef doesn't have to be literal and can be arbitrarily obscured (e.g. behind a function call). The alternative interpretation is extremely limiting.

Philip

To be fair, if the address had to be `noundef`, the example would simply be UB. That said, I still believe it should not be.

Thank you for the information; it seems that making this raise UB is problematic.

Would clarifying this in LangRef be good? I can update the patch to document it instead.

Another concern is then, how can we efficiently encode an assumption that a pointer variable in IR does not have undef bits?

Certainly, in the front-end language, most pointers won’t have undef bits, and it would be great if that information were still available in IR.
A pointer argument can be marked noundef, but for a pointer that is loaded from memory, for example, the information disappears.
I think this information would help reduce the cost of fixing existing undef/poison-related optimizations, because we could conclude that freeze is unnecessary in more cases.
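A common illustration of why undef forces a freeze, and why a noundef fact would avoid it: rewriting `mul %x, 2` into `add %x, %x` duplicates a use of `%x`, and each use of undef may pick a different value. The sketch enumerates i8 outcomes (an assumption chosen to keep the sets small) and checks refinement as subset inclusion. If `%x` were known noundef, both uses would see one well-defined value and the unfrozen rewrite would already be sound.

```python
from typing import Set

BITS = 8
MOD = 1 << BITS
VALUES = range(MOD)  # every possible i8 value


def mul2_of_undef() -> Set[int]:
    # %y = mul i8 %x, 2 with %x = undef: a single use, so one chosen value.
    return {(x * 2) % MOD for x in VALUES}


def add_self_of_undef() -> Set[int]:
    # %y = add i8 %x, %x: each use of undef may pick a different value.
    return {(a + b) % MOD for a in VALUES for b in VALUES}


def add_self_of_frozen() -> Set[int]:
    # %f = freeze i8 %x; %y = add i8 %f, %f: both uses see the same value.
    return {(f + f) % MOD for f in VALUES}


def refines(target: Set[int], source: Set[int]) -> bool:
    # A rewrite is sound only if every outcome of the target is allowed by the source.
    return target <= source


print(refines(add_self_of_undef(), mul2_of_undef()))   # False
print(refines(add_self_of_frozen(), mul2_of_undef()))  # True
```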

Juneyoung

Thank you for the information; it seems that making this raise UB is problematic.

Would clarifying this in LangRef be good? I can update the patch to document it
instead.

Yes, please.

Another concern is then, how can we efficiently encode an assumption that a
pointer variable in IR does not have undef bits?
Certainly, in the front-end language, most pointers won't have undef
bits, and it would be great if that information were still available in IR.
A pointer argument can be marked noundef, but for a pointer that is
loaded from memory, for example, the information disappears.
I think this information would help reduce the cost of fixing existing
undef/poison-related optimizations, because we could conclude that freeze
is unnecessary in more cases.

I thought we solved that already:

`call void @llvm.assume(i1 true) ["noundef"(type* %ptr), "noundef"(type2* %ptr2)]`
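To make the bundle form concrete, here is a toy consumer that scans textual IR lines for `llvm.assume` calls and collects the values tagged `"noundef"`. It is only an illustration over strings, assuming one call per line; LLVM's own analyses read operand bundles from the in-memory IR, not with regular expressions, and `noundef_values` is a hypothetical name.

```python
import re
from typing import Iterable, Set

# Matches one "noundef"(<type> <value>) operand-bundle entry and captures the value.
NOUNDEF_BUNDLE = re.compile(r'"noundef"\(\s*[^)]*?(%[\w.]+)\s*\)')


def noundef_values(ir_lines: Iterable[str]) -> Set[str]:
    known: Set[str] = set()
    for line in ir_lines:
        if "@llvm.assume" in line:
            known.update(NOUNDEF_BUNDLE.findall(line))
    return known


ir = [
    'call void @llvm.assume(i1 true) ["noundef"(type* %ptr), "noundef"(type2* %ptr2)]',
    '%v = load i8, i8* %q',
]
print(sorted(noundef_values(ir)))  # ['%ptr', '%ptr2']
```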

See http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Is that enough for your needs?

~ Johannes

`call void @llvm.assume(i1 true) ["noundef"(type* %ptr), "noundef"(type2* %ptr2)]`

Maybe I can try this first, thanks.