TBH, no hard data at all. Just my experience poking at passes that cared about this and the advice I received from others when hacking on them. Sorry if anything I wrote came off as some kind of firm or unwavering position; it was more that I’m not sure the proposed change is good. It might be, but I’m not sure yet. See below for more details though.
Not at all.
All-zero GEPs aren’t new, for better or worse. We’ve been canonicalizing toward them since before Chris refactored all of this code in 2007, so this is a very long-standing pattern even if it is a bad one. And several parts of LLVM of course handle them. They have to, given that we’re canonicalizing in that direction.
I learned of this years ago when Duncan Sands taught me about it to explain why my SROA rewrite hit so many all-zero GEPs. Since then, it hasn’t seemed to me personally like a significant issue that we canonicalize towards all-zero GEPs. And I’ve not heard many folks hit issues with it since then. So my advice is typically “yep, you need to handle all-zero GEPs”. That’s essentially the default for when a pass doesn’t handle canonical form.
It isn’t an endorsement though, and it doesn’t mean we can’t or shouldn’t change the canonical form! If we have new evidence that makes a change warranted, we absolutely should. Some kinds of evidence I can remember in the past motivating a change:
1. It’s really hard and/or awkward to teach passes about the canonical form.
2. The canonical form doesn’t express as much information, or is in some other way a less useful/expressive representation.
3. Empirical evidence shows that the current canonical form, despite being perfectly equivalent to an alternative on the prior two points, is less convenient.
The statement from Piotr that it is easy to teach GVN about this seems to indicate we’re not talking about the first two issues, but if we are, that’s a whole new discussion (and an interesting one).
It seems possible that #3 is true (I suspect you think #3 is true from your email at least). While we should in general fix these kinds of issues when we find them, it does seem somewhat less urgent. And since there are conflicting experiences, we probably want at least a comprehensive survey before we make a change.
In this particular case I suspect that we used to have issues related to #2: SROA before my rewrite relied on GEPs to understand the type structures being decomposed. While I think all of the semantic issues here are gone, a number of people in the last couple of years have still insisted that bitcasting pointers blocks optimizations, so we should at least investigate this prior to making a change.
But this led me to the last paragraph in my email: if all of this goes away with typeless pointers, it’s not clear that it’s worth pursuing a change for #3. Not saying we definitely shouldn’t, just that we should weigh that against removing types from pointers entirely so that these issues don’t come up at all, regardless of how we spell the instruction.
Sorry it seemed orthogonal. All I meant was that handling all-zero GEPs might be unusually low cost because passes already handle all-constant GEPs. That only really speaks to #3 above, and it really only lessens the cost. As you say, it can’t eliminate it: a GEP is different from a bitcast, so we may end up handling both.
Well, my intent here was not to say it would help existing passes, but only that several existing passes already handled all-constant GEPs and the all-zero case largely fell out as a consequence.
The example I’m most familiar with is SROA of course. In all but one place it uses code to handle all-constant GEPs and doesn’t need to special-case all-zero GEPs. BasicAA appears to be similar. The vectorizers also seem to have existing code handling constant GEPs that would handle zero GEPs for free.
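To make the "falls out for free" point a bit more concrete, here is a minimal toy sketch in C++. It is not SROA's or BasicAA's actual code: `ToyIndex` and `accumulateToyConstantOffset` are hypothetical names invented for illustration, and the byte strides are hard-coded where a real pass would get them from the data layout. The shape is only loosely inspired by the idea of accumulating a constant byte offset over a GEP's indices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for one GEP index (hypothetical, not an LLVM type).
struct ToyIndex {
  int64_t StrideBytes; // size in bytes of the element this index steps over
  bool IsConstant;     // false models a runtime-variable index
  int64_t Value;       // meaningful only when IsConstant is true
};

// Sums the byte offset of a GEP whose indices are all constant.
// Returns false (and leaves Offset untouched) if any index is variable.
bool accumulateToyConstantOffset(const std::vector<ToyIndex> &Indices,
                                 int64_t &Offset) {
  int64_t Total = 0;
  for (const ToyIndex &Idx : Indices) {
    assert(Idx.StrideBytes >= 0 && "a stride is a size, so it cannot be negative");
    if (!Idx.IsConstant)
      return false; // variable index: the constant-offset path does not apply
    Total += Idx.StrideBytes * Idx.Value;
  }
  Offset = Total;
  return true;
}
```

There is no branch for the all-zero case here: a GEP whose indices are all zero simply produces an offset of 0, which is the same answer a pass would compute for a bitcast of the same pointer.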
As for examples of where it would help in a fundamental way? No, I think all of those are gone. It used to help SROA before the rewrite. It also helped DependenceAnalysis before it was rewritten. The first was crippled by relying on this but remaining conservatively correct. The second was actually buggy because it relied on this without remaining conservatively correct. LLVM has been moving away from this being a useful thing to fundamentally rely on.
Examples of where it would help in a trivial way? Any pass that handles all-constant GEPs, but not bitcasts. We could easily teach such a pass to handle bitcasts though. We could also teach the passes that handle bitcasts to handle all-zero GEPs. In fact, we’ve automated this for most passes with stripPointerCasts and friends.
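For anyone unfamiliar with the idea, a rough toy model of "look through both spellings" might be sketched as follows. This is not LLVM's implementation of stripPointerCasts: `ToyValue`, `ToyKind`, `hasAllZeroToyIndices`, and `stripToyPointerCasts` are hypothetical names, and the model ignores everything else a real helper has to deal with.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy IR node kinds (hypothetical, for illustration only).
enum class ToyKind { Base, Bitcast, ConstantGEP };

struct ToyValue {
  ToyKind Kind;
  const ToyValue *Operand;      // pointer operand; null for a Base value
  std::vector<int64_t> Indices; // constant indices; used only by ConstantGEP
};

// True when every index of the toy GEP is zero.
bool hasAllZeroToyIndices(const ToyValue &V) {
  for (int64_t I : V.Indices)
    if (I != 0)
      return false;
  return true;
}

// Walks up the operand chain while the current value is a no-op on the
// address: either a bitcast or a GEP whose indices are all zero.
const ToyValue *stripToyPointerCasts(const ToyValue *V) {
  assert(V && "expected a non-null value");
  while (V->Operand) {
    bool IsNoOp = V->Kind == ToyKind::Bitcast ||
                  (V->Kind == ToyKind::ConstantGEP && hasAllZeroToyIndices(*V));
    if (!IsNoOp)
      break; // a GEP with a nonzero index really changes the address
    V = V->Operand;
  }
  return V;
}
```

A pass that calls a helper of this shape before inspecting a pointer sees the same underlying value whether the no-op was spelled as a bitcast or as an all-zero GEP, which is why the choice of canonical form matters less for such passes.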
I don’t see a fundamental advantage of one over the other, so we’re left with essentially an engineering tradeoff. If we weren’t planning to remove pointer types entirely, this would still be an important engineering tradeoff, but I’m not sure which way it would go. Given that we’re planning to remove pointer types entirely, I’d rather focus on that change than on changing canonicalization strategies, and patch passes to cope with today’s canonical form until we finish. But that is a fairly mild “rather”. New information could quite easily change my mind. And it is just my two cents of course.
Anyways, again, sorry if my previous email came off as a mandate. I just meant to indicate that the issue was not clear-cut to me, not that it was some kind of definite thing one way or the other.