Type-based GEP plans

Hi @nikic, @sebpop, @kasuga-fj,

I wanted to discuss this:

The goal is to move away from the type-based/structural getelementptr (GEP) instruction, to a ptradd instruction, which does no more and no less than adding an offset to a pointer.

as Nikita wrote in his blog post. The context in which I wanted to bring this up is LoopInterchange and DependenceAnalysis; we have already discussed this a bit on those merge requests (for example here), but I thought a separate thread might help to separate things out. More specifically, I am interested in whether we can come up with a short-term and a long-term plan for this, and whether something along those lines would be acceptable.
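To make the direction concrete, here is a rough sketch of my own (not from the blog post; %A, %i and %j are placeholders) of the same two-dimensional access in both worlds:

```llvm
; Today: a structural GEP, where the [8 x [8 x i32]] array shape is
; visible in the instruction itself.
%gep.typed = getelementptr inbounds [8 x [8 x i32]], ptr %A, i64 0, i64 %i, i64 %j

; ptradd world: a plain byte-offset GEP; the offset ((%i * 8 + %j) * 4)
; is computed explicitly and the array shape is gone.
%row = mul nsw i64 %i, 8
%idx = add nsw i64 %row, %j
%off = mul nsw i64 %idx, 4
%gep.ptradd = getelementptr inbounds i8, ptr %A, i64 %off
```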

Short-term

We would really like to get interchange enabled, but are now blocked on a corner case. In a nutshell, the problem is this: type punning in C/C++ results in valid IR that can confuse DependenceAnalysis in some cases. We could put a band-aid on it and fix it, but only if we are allowed to look at GEP->getSourceElementType(). There is already one occurrence of this in the delinearizer that you would like to see go away, given the move away from type-based GEPs, and now I am essentially asking if we can add one more. But again, I am really hoping that this is a stopgap while we work on the longer-term solution.
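As a concrete (if contrived) sketch of the kind of IR I mean, assuming a flat int buffer that the source code punned to a two-dimensional type:

```llvm
; C-ish original: the object is a flat int buffer, but the code punned it
; to int(*)[8] and accesses A[i][j], where j is not provably < 8.
; Delinearization that trusts the [8 x i32] row type will treat %i and %j
; as independent subscripts, which is wrong whenever %j can reach 8 or more.
%gep = getelementptr inbounds [8 x i32], ptr %A, i64 %i, i64 %j
%v = load i32, ptr %gep
```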

Longer-term

The type-based GEPs don’t provide any semantic information; the one consumer of their structure is delinearization. One way to look at this is as follows: the type information that is going to disappear from the GEPs will have to be reintroduced in another way. That’s what we are working on, but it is a longer-term effort. That is, we want type information to be emitted by the front-end and picked up by the optimiser. For example, related but not the same thing: bound information for statically declared arrays, and assumes for accesses. We hope to have an RFC soon and a talk about this at the upcoming LLVM dev conference. But the point is that this needs community review and discussion, and it is going to take months (quite a few, I think). This means we would be blocked for all that time.
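Purely as an illustration of the direction, and not necessarily what the RFC will propose, a bounds fact for an access could in principle be conveyed with something like llvm.assume on a subscript:

```llvm
; Hypothetical frontend-emitted bounds fact: the inner subscript of a
; statically declared int[8][8] array stays within its row extent.
%inb = icmp ult i64 %j, 8
call void @llvm.assume(i1 %inb)
```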

That’s why I am interested in exploring whether a short-term stopgap, combined with a longer-term commitment to address the issue properly, is a possibility. Please let me know what you think, if you have other ideas, or if I have missed something.

The stop-gap solution would be to remove type-based delinearization entirely, without replacement.

The less stop-gap solution is @kasuga-fj’s work to implement fixed-size delinearization in a more principled way (the work started at [Delinearization] Add function for fixed size array without relying on GEP by kasuga-fj · Pull Request #145050 · llvm/llvm-project · GitHub).

The long-term direction would be to combine that work with additional frontend-provided information about object bounds.

I started this with the intention of providing a short-term solution. I believe this function requires some improvements, but I don’t see any solution other than entirely eliminating the current GEP-dependent heuristics if we want to enable it by default. (I’d greatly appreciate it if you could point me to someone who has expertise in this area and can review my PR.)

Additionally, I’d like to add a few comments (partly to confirm my understanding). Just to note, my understanding is mostly limited to what’s documented in The Often Misunderstood GEP Instruction — LLVM 21.0.0git documentation.

I’ve pointed out several corner cases, but these represent only a subset of cases where DA yields incorrect results. There are likely more, so addressing only these specific cases isn’t sufficient (in fact, I believe the only viable solution is to remove the GEP-driven delinearization completely).

Also, as far as I understand, these cases are not solely caused by type punning in C/C++. GEP is merely an instruction for address computation, and discrepancies between the GEP source element type and the type of the underlying object can result from other causes as well (though I’m not sure whether such cases actually occur in practice).
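For instance, the following is valid IR even though no C-level type punning is involved; the GEP source element type is just an indexing scheme and does not have to match the allocation (a hand-written sketch):

```llvm
; The underlying object is a flat [64 x i32], but the GEP indexes it as if
; it were [8 x [8 x i32]]. GEP only performs address arithmetic, so this
; is perfectly well-formed.
%A = alloca [64 x i32]
%p = getelementptr inbounds [8 x [8 x i32]], ptr %A, i64 0, i64 %i, i64 %j
```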


Maybe I should check and reevaluate this, but I don’t think it was doing anything useful without this.
It is very attractive, though, just to get this running; but then it needs to trigger at least somewhere, so I will check that.

Yep, but it’s quite a bit of work, and it also instantly becomes technical debt: once we have the proper solution, this needs changing again. That’s why I started to wonder about this whole thing: is this really better than using getSourceElementType() a little while longer?

Agreed. That’s what we will be working on.

Hm, maybe I’m misunderstanding something, but I don’t think this will become technical debt. At least on a cursory look, this seems like the correct general approach to follow, and future additions would build on top of it.

I don’t disagree with this, but I would like to add a footnote. First, to be clear and to get this out of the way: for valid IR, analysers and optimisers should do the right thing, no question. But as a sort of impact analysis: who is producing this IR? Does this occur in practice? And can we somehow quantify it? Fortran won’t produce this and DA is correct for it, which is why we have enabled it in Flang. For C/C++, this is type punning, which isn’t portable code. We catch quite a few of these cases, but not all. So I think it’s fair to say that we are working on a few corner cases. I believe there are downstream users that have loop optimisations enabled by default, which supports this suspicion of corner cases and this “impact analysis”. I appreciate this last bit is not the strongest argument, but it has to say something.

I know downstream users shouldn’t dictate upstream development, but maybe one more (related) footnote: there might be quite a few downstream users that are impacted if we remove type-based GEPs before we have a proper alternative in place. From that point of view too, a deprecation plan might be good.

I’ll admit that was partly speculation, and that’s because we don’t yet have the solution for passing on type information. But with that general idea in mind, it’s probably not a stretch to see that we will still need delinearization, driven more by explicit type information, but also that we will see significant rewrites of this part of the code. That’s the goal, I think: simplify it and make it more robust.

Yes, I think your opinion does make sense. I doubt anyone would actually generate such weird IR, and even I feel that the cases I posted are somewhat contrived and not very practical. However, I think this is exactly the gap between the passes that are enabled by default and those that aren’t: we really should consider every possible input and user. It might be worth noting that LLVM is not a backend only for C/C++ and Fortran; there are many other languages whose compilers use LLVM as a backend. I agree with you that we cannot quantify the “impact”, but as for enablement, I believe that correctness issues affecting a few users should take precedence over performance benefits for many.

This might sound strange, but personally, I feel it could be unfortunate that the current GEP-driven heuristic “ends up succeeding anyway”. If we enable it as it is (i.e., keep the GEP-driven heuristics), it will likely add extra cost to the migration from GEP to ptradd; we would also need to account for the performance regressions caused by the replacement. In that case, your “long-term solution” might become a blocker for the migration. I think this could be a form of technical debt, which we should try to avoid.

I don’t intend to remove the current delinearization function immediately. It should be sufficient to replace the calls to it in the passes involved with LoopInterchange.

As I wrote in my previous comment, I think it’s fine to keep the existing delinearization functions (though it would be ideal to remove them as well). What we really need to do to enable LoopInterchange is to remove the calls to them that are triggered by running LoopInterchange.
I’m planning to continue working in my current direction, but I don’t think this conflicts with your long-term solution. Your approach is probably more robust and completely superior to mine, but even if that turns out to be the case, it would simply mean that my work becomes obsolete and can be removed, which is not technical debt, I think.

Whether we should call it change, churn, or technical debt, I don’t know, and it probably doesn’t really matter. But when I look at, for example, delinearizeFixedSizeArray in #145050, I think all of this could look very different once we start emitting type information, and #145050 and its follow-ups are quite a bit of work. So that’s why I started wondering whether this is all worth it: is it really better than relying on the GEP source element type?

My short term solution would be the following while we work on the proper one:

We filter out type punning by looking at GEP source element types while type-based GEPs are still around, i.e., during their deprecation. I think that could be a scan in interchange if we want to keep it separate, and it is just a bit of filtering. Or we move a bit more of the logic into DA, as there is already some logic in there.

Or do you think that finishing off the work in #145050 and the required follow-ups is low-hanging fruit?

This is just a quick response for now.

In my understanding, reasoning based on the GEP source element type in optimization heuristics is more than deprecated; it is disallowed. And this is why your short-term solution is not appropriate in the context of enabling LoopInterchange by default (@nikic, let me know if I am wrong).

Assuming my understanding is correct, we must choose one of the following:

  • Remove all calls to tryDelinearizeFixedSizeImpl, as @nikic suggested in the first reply of this thread.

  • Replace them with another function that does not rely on the source element type of GEP at all.

I started #145050 to pursue the second option, which will hopefully offer a better solution than the first. I don’t think #145050 currently outperforms the existing implementation in terms of analysis capabilities. I do have some ideas for improvements, but I haven’t tested them yet, and it’s unclear how effective they’ll be. However, even in its current form (without enhancements), I believe it’s still better than having no solution at all.

Could you please clarify your concerns? I agree it’s certainly less robust than analysis using emitted type information, but fundamentally, the two don’t conflict, I think. Are you referring to cases like the following?
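For instance, something like this (a contrived sketch, with %A standing for the pointer under analysis), where the shape used for indexing disagrees with the allocated object:

```llvm
; The GEP claims a [4 x [16 x i32]] shape, but the object itself is a
; plain [64 x i32] allocation, so the source element type and the
; underlying object type disagree.
%A = alloca [64 x i32]
%p = getelementptr inbounds [4 x [16 x i32]], ptr %A, i64 0, i64 %i, i64 %j
```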

Yes, this doesn’t match the actual underlying object type, but I don’t think that’s necessarily a problem. Wouldn’t it be sufficient to simply prioritize the emitted type information?


It seems to me that now you are not just relying on Fortran not producing such code; you are also relying on all the other LLVM passes that run before DA not producing such code. Obviously those passes will not take a Fortran-specific invariant like that into account, so it seems quite fragile to rely on this.