Code generated for pointer pair -> pointer + size -> pointer pair conversion

Hi,

clang emits the following x64 code for `vector.data() + vector.size()`
(where vector is a std::vector<int32> instance that contains two internal pointers that point to the beginning and the end of an array):

   movq (%rdi), %rcx // rdi is a pointer to the vector
   movq 8(%rdi), %rax
   subq %rcx, %rax
   andq $-4, %rax
   addq %rcx, %rax

Is there any way to tell clang in the vector implementation that the array is aligned, so that it could reduce this code to a simple load
`movq 8(%rdi), %rax`?

This kind of optimization would be helpful for inlined code that converts back and forth between a pointer-pair representation and a pointer + size representation of an array reference.
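For reference, the source boils down to something like this (a sketch; `end_of` is a made-up name, and I'm assuming int32 is int32_t):

  #include <cstdint>
  #include <vector>

  // Reconstructs the end pointer from the (pointer, size)
  // representation; ideally this compiles to a single load of the
  // vector's internal end pointer.
  const int32_t *end_of(const std::vector<int32_t> &v) {
      return v.data() + v.size();
  }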

- Stephan

Sorry, I should have been more precise here: Is there a way to tell the optimizer that it can safely assume that the difference between the address values of two given pointers is a multiple of the pointee type size? (I had no luck using __builtin_unreachable for this purpose.)
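The hint I tried looks roughly like this (a sketch; `round_trip` is a made-up name):

  #include <cstdint>

  // Attempted hint: promise the optimizer that the byte difference
  // between last and first is a multiple of sizeof(int32_t). The
  // andq $-4 is emitted anyway.
  int32_t *round_trip(int32_t *first, int32_t *last) {
      if (((std::uintptr_t)last - (std::uintptr_t)first)
              % sizeof(int32_t) != 0)
          __builtin_unreachable();
      return first + (last - first);
  }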

- Stephan

The optimizer already knows this. Here's the IR for a pattern like yours:

  %sub.ptr.lhs.cast = ptrtoint i32* %1 to i64
  %sub.ptr.rhs.cast = ptrtoint i32* %0 to i64
  %sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
  %sub.ptr.div = ashr exact i64 %sub.ptr.sub, 2
  %add.ptr = getelementptr inbounds i32* %0, i64 %sub.ptr.div

The "exact" bit on the ashr is a hint that the shift only shifts out zeros (because the pointers are aligned). The getelementptr gets lowered into adds later in SelectionDAG, but the "exact" bit is lost by then, so the backend has to assume the value may be unaligned and inserts the andq $-4.

I see two ways to fix this:

1. Teach SelectionDAG about "exact" bits. I don't think this is possible with our current infrastructure.
2. Teach InstCombine how to fuse ashrs and getelementptr instructions. Not sure how tricky this is; the rewrite would look roughly like the sketch below.
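At the source level, the fusion in option 2 amounts to something like this (a sketch of the equivalence, not actual InstCombine code; `fused` is a made-up name):

  #include <cstddef>
  #include <cstdint>

  // The divide-then-rescale round trip (ashr exact + gep) is
  // equivalent to adding the raw byte offset, because the "exact"
  // flag guarantees the byte difference is a multiple of 4.
  int32_t *fused(int32_t *first, int32_t *last) {
      std::ptrdiff_t bytes = (char *)last - (char *)first;
      // before: first + bytes / (std::ptrdiff_t)sizeof(int32_t)
      return (int32_t *)((char *)first + bytes);
  }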

- Ben

Thanks for the explanation!

The case where the size of the pointee type equals the type's alignment is the simplest and most common case. More generally, an optimizer could exploit that the difference of two pointers must be a multiple of the pointee type size (even if the array wasn't aligned), since in C/C++ only pointers to elements of the same array may be subtracted.
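For example (a sketch; `element_count` is a made-up name):

  #include <cstddef>

  // q - p is only defined when p and q point into the same array of
  // T, so the byte difference is always an exact multiple of
  // sizeof(T), regardless of the array's alignment.
  template <typename T>
  std::ptrdiff_t element_count(const T *p, const T *q) {
      return q - p;
  }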

Interestingly, GCC doesn't seem to be able to completely optimize away round-trip conversions between the pointer pair and pointer + size representations of an array either.

- Stephan

This is really interesting. It seems we run into a similar problem with:

#include "llvm/ADT/ArrayRef.h"
using llvm::ArrayRef;

ArrayRef<int> arrayRefOpt(ArrayRef<int> A) {
  ArrayRef<int> B(A.data(), A.size());
  ArrayRef<int> C(B.begin(), B.end());
  ArrayRef<int> D(C.data(), C.size());
  ArrayRef<int> E(D.begin(), D.end());
  ArrayRef<int> F(E.data(), E.size());
  return D;
}

Which generates:
__Z11arrayRefOptN4llvm8ArrayRefIiEE:
       0: 55                        pushq %rbp
       1: 48 89 e5                  movq  %rsp, %rbp
       4: 48 8d 14 b5 00 00 00 00   leaq  (,%rsi,4), %rdx
       c: 48 c1 fa 02               sarq  $2, %rdx
      10: 48 89 f8                  movq  %rdi, %rax
      13: 5d                        popq  %rbp
      14: c3                        ret

Maybe this is bug-report-worthy?

-- Sean Silva

Yeah, that's the reverse direction: here the optimizer doesn't seem to know that the upper bits of the array length must be zero, since an array length (or any integer added to a pointer) can't be larger than SIZE_MAX/sizeof(T).
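In other words, a scale/unscale round trip on a valid length should fold to the identity (a sketch; `scale_round_trip` is a made-up name):

  #include <cstddef>
  #include <cstdint>

  // Any valid int32_t array length n satisfies
  // n <= SIZE_MAX / sizeof(int32_t), so the top two bits of n are
  // zero and scaling then unscaling returns n unchanged; the
  // leaq/sarq pair above could fold away.
  std::size_t scale_round_trip(std::size_t n) {
      std::size_t bytes = n * sizeof(int32_t); // leaq (,%rsi,4), %rdx
      return bytes / sizeof(int32_t);          // shift right by 2
  }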

    Maybe this is bug-report-worthy?

I could submit one, though I'm not sure which product/component to file this under.

- Stephan

    I could submit one, though I'm not sure which product/component to
    file this under.

Libraries/Scalar optimizations should be about right.

Cheers,
Rafael

Thanks! I've submitted http://llvm.org/bugs/show_bug.cgi?id=16408

- Stephan