Hi,
Consider this code:
std::vector v;
v.resize(256);
for (i = 0; i < … ; ++i) {
a += v[b[i]];
}
This is a gather loop, and should be vectorizable. However…
I as a programmer can see that the size of v.data() is at least 256. I know because of the contract of std::vector that v.data() is a unique heap object that doesn’t alias anything.
However, LLVM knows none of this. Only if I force-inline std::vector::__append and friends does LLVM actually see the operator new(256) call - without that LLVM has no idea of the underlying storage of v, or of its size.
Now, the vectorizer can emit runtime pointer checks, but one thing it absolutely requires is knowledge of the maximum size of a pointed-to object so it can do range intersection analysis. Without this information, it can’t even emit a runtime pointer check.
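For concreteness, the missing facts can be written out by hand. This is only an illustrative sketch - it assumes Clang's __restrict and __builtin_assume extensions, an int element type and an explicit bound n, none of which appear in the snippet above:

  #include <cstddef>
  #include <vector>

  // Sketch only: hoisting data() into a __restrict-qualified local asserts
  // that the vector's storage doesn't alias b, and the assume records the
  // lower bound on the size of the underlying object.
  int sum_gather(const std::vector<int> &v, const int *b, std::size_t n) {
    const int *__restrict p = v.data();
    __builtin_assume(v.size() >= 256);
    int a = 0;
    for (std::size_t i = 0; i < n; ++i)
      a += p[b[i]];
    return a;
  }

Whether today's pipeline actually exploits the size assumption is a separate question; the sketch is just to spell out which facts never reach LLVM.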
So this RFC is about what exactly is going wrong here. I don’t understand quite how we intend LLVM to gain this information - are we missing some intrinsics or abstractions? Or is the inliner just doing a poor job?
I can’t imagine that in the common case inlining all the way to operator new() would be beneficial - it’s only beneficial in this case because the object is newly constructed so all of the branches in __append can be folded away when inlining.
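To make the branch-folding point concrete, here is a toy model - emphatically not libc++'s real __append, just an assumed shape for illustration:

  #include <cstddef>
  #include <new>

  // Toy model only: right after construction, size == 0 and capacity == 0 are
  // known constants, so once append-like code is inlined into that context the
  // "enough capacity" branch folds away and the operator new call - and hence
  // the identity and size of the storage - becomes visible to the optimizer.
  struct toy_vec {
    char *data = nullptr;
    std::size_t size = 0, capacity = 0;

    void append_default(std::size_t n) {
      if (capacity - size >= n) {
        size += n;                       // dead when capacity == 0 and n > 0
      } else {
        char *p = static_cast<char *>(::operator new(size + n));
        // ... copy old elements and default-construct the new ones ...
        ::operator delete(data);
        data = p;
        capacity = size + n;
        size = capacity;
      }
    }
  };

In any other context size and capacity are unknown, both paths survive, and inlining this deep mostly just adds code.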
Any information welcome 
Cheers,
James
> However, LLVM knows none of this. Only if I force-inline
> std::vector::__append and friends does LLVM actually see the operator
> new(256) call - without that LLVM has no idea of the underlying storage of
> v, or of its size.
This looks like an inlining issue...
> Now, the vectorizer can emit runtime pointer checks, but one thing it
> absolutely requires is knowledge of the maximum size of a pointed-to object
> so it can do range intersection analysis. Without this information, it can't
> even emit a runtime pointer check.
Yup.
> So this RFC is about what exactly is going wrong here. I don't understand
> quite how we intend LLVM to gain this information - are we missing some
> intrinsics or abstractions? Or is the inliner just doing a poor job?
I wouldn't say "poor job", as inlining heuristics are complicated, but
I think that's one of the corner cases, yes.
I'm guessing the levels of indirection in this case are high enough
that the inliner gives up? I can't imagine resize being big enough to
exceed the inliner's cost threshold, even after inlining everything into it.
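For reference, the force-inlining experiment quoted above can be roughly approximated from the caller's side with the GCC/Clang flatten attribute - a blunt instrument, and it may not reach as deep as annotating the libc++ internals directly:

  #include <cstddef>
  #include <vector>

  // Rough sketch only: flatten asks the compiler to inline the calls made in
  // this function. Whether that actually reaches std::vector's internal
  // __append depends on the implementation; the experiment described above
  // force-inlined the library internals instead. The element type is assumed.
  __attribute__((flatten))
  int sum_gather_flat(const int *b, std::size_t n) {
    std::vector<int> v;
    v.resize(256);
    int a = 0;
    for (std::size_t i = 0; i < n; ++i)
      a += v[b[i]];
    return a;
  }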
An alternative would be inter-procedural analysis, but that's a big
monster in itself, and it would still require everything below the target
function (resize) to have been inlined into it anyway, which is only one
step away from the final inlining into the caller.
> I can't imagine that in the common case inlining all the way to operator
> new() would be beneficial - it's only beneficial in this case because the
> object is newly constructed so all of the branches in __append can be folded
> away when inlining.
Isn't this a case for LTO inlining / specialization?
cheers,
--renato
I'm curious what you have in mind exactly. What extra information is available at LTO time in this case compared to a non-LTO compilation?
Right, that was more of a question really...
My reasoning is that, if LTO generated different versions of resize(),
some with constant folding done at link time (based on the values of the
arguments, e.g. resize(256)), you could end up with deeper inlining and
maybe get to a point where resize() is inlined into the caller for that
one particular case. This would expose the call to operator new(), and
therefore help alias analysis see that the pointers don't alias.
That would be a poor man's version of compile-time inter-procedural
analysis, but at link time, if that makes any sense...
cheers,
--renato
Oh, that reminds me of a heuristic I wanted to explore: add an inlining bonus when all the arguments to the function are constant at the call site (independently of whether LTO is used).
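A minimal, self-contained sketch of that heuristic - with invented names, not LLVM's actual inline-cost interfaces:

  // Hypothetical sketch: discount the inline cost of a call site when every
  // argument is a compile-time constant, on the theory that constant folding
  // will shrink the callee considerably after inlining. None of these names
  // correspond to LLVM's real cost model.
  struct CallSiteSummary {
    int estimatedCalleeCost;   // size estimate for the callee body
    int numArgs;               // arguments at this call site
    int numConstantArgs;       // of those, how many are literal constants
  };

  int adjustedInlineCost(const CallSiteSummary &cs, int allConstantBonus) {
    int cost = cs.estimatedCalleeCost;
    if (cs.numArgs > 0 && cs.numConstantArgs == cs.numArgs)
      cost -= allConstantBonus;   // e.g. v.resize(256): every argument is constant
    return cost;
  }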
> From: "James Molloy via llvm-dev" <llvm-dev@lists.llvm.org>
> To: "LLVM Dev" <llvm-dev@lists.llvm.org>, "Chandler Carruth"
> <chandlerc@google.com>
> Sent: Friday, April 1, 2016 2:56:25 AM
> Subject: [llvm-dev] RFC: std::vector and identified objects
> Hi,
> Consider this code:
> std::vector v;
> v.resize(256);
> for (i = 0; i < ... ; ++i) {
> a += v[b[i]];
> }
> This is a gather loop, and should be able to be vectorized. however ...
> I as a programmer can see that the size of v.data() is at least 256.
> I know because of the contract of std::vector that v.data() is a
> unique heap object that doesn't alias anything.
> However, LLVM knows none of this. Only if I force-inline
> std::vector::__append and friends does LLVM actually see the
> operator new(256) call - without that LLVM has no idea of the
> underlying storage of v, or of its size.
> Now, the vectorizer can emit runtime pointer checks, but one thing it
> absolutely requires is knowledge of the maximum size of a pointed-to
> object so it can do range intersection analysis. Without this
> information, it can't even emit a runtime pointer check.
> So this RFC is about what exactly is going wrong here. I don't
> understand quite how we intend LLVM to gain this information - are
> we missing some intrinsics or abstractions?
FWIW, this is one of the primary motivators for http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3988.pdf (which I need to get back to working on soon). The size is also important information (but likely not as important as the aliasing in this case - it is more important if you have conditional accesses and lack hardware predicated loads/stores).
In this case, where the object is local, I can imagine some smarter IPA (function attributes or otherwise) helping. I think that, in general, we'll probably need to provide some attributes that can be used to adorn the std::vector implementation, translated into some associated intrinsics and/or metadata, that will allow the backend to compute these various properties.
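As a strawman for what the library-side annotations might look like (the attribute names in the comment are invented for illustration and do not exist in Clang today):

  #include <cstddef>

  // Invented sketch only: the library asserts the invariants, and the compiler
  // would lower them to noalias / dereferenceable-style facts on the pointer
  // returned by data(). The attributes are shown in a comment because no such
  // spelling exists yet.
  template <class T>
  class annotated_vector {
    T *begin_ = nullptr;
    std::size_t size_ = 0;

  public:
    // Hypothetical: "the returned pointer is the start of a unique heap object
    // dereferenceable for at least size() * sizeof(T) bytes", e.g.
    //   [[vendor::unique_storage]]
    //   [[vendor::dereferenceable_bytes(size_ * sizeof(T))]]
    T *data() noexcept { return begin_; }

    std::size_t size() const noexcept { return size_; }
  };

The interesting part is the lowering: whatever the source-level spelling, it has to end up as something the vectorizer's runtime-check and dependence analysis can actually consume.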
> Or is the inliner just doing a poor job?
Perhaps this too 
-Hal
Hi Hal,
> FWIW, this is one of the primary motivators for http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3988.pdf (which I need to get back to working on soon). The size is also important information (but likely not as important as the aliasing in this case - it is more important if you have conditional accesses and lack hardware predicated loads/stores).
> In this case, where the object is local, I can imagine some smarter IPA (function attributes or otherwise) helping. I think that, in general, we’ll probably need to provide some attributes that can be used to adorn the std::vector implementation, translated into some associated intrinsics and/or metadata, that will allow the backend to compute these various properties.
I like that proposal. It’s dated late 2014 - is it still making progress?
Would it be appropriate to prototype these attributes in clang and teach libc++ about them? That way we could optimize such idioms better even before this extension gets accepted into C++.
I’m just trying to scope out the work required before I can get this workload to be optimized properly (do I “just” have to implement an extension in clang and the appropriate LLVM extensions/metadata, or do I have to wait until that proposal proceeds further?)
Cheers,
James