Opaque pointers and i8 GEPs

nikic · September 4, 2021, 9:17am

Hi everyone,

During the transition to opaque pointers (see https://llvm.org/docs/OpaquePointers.html for context) a somewhat recurring issue is the choice of source element types for getelementptr instructions. Consider these two GEPs:

getelementptr [[i32 x 10], x 10], ptr %p, i64 0, i64 1, i64 0

getelementptr i8, ptr %p, i64 40

If I counted correctly, these two GEPs perform the same address calculation. With typed pointers, they differed in their source and result pointer types. With opaque pointers, these GEPs are now strictly equivalent.

Various transforms in LLVM need to generate a GEP with a certain offset from a base pointer. In order to minimize bitcasts, they will usually try to use the source pointer element type as the GEP source element type and try to find a sequence of GEP indices that encodes the desired offset. If they can’t do that, they fall back to bitcasts to i8*, doing an i8 GEP and bitcasting to the result pointer type (or give up altogether).

The most sophisticated and complete implementation of this exists in SROA, which will determine the optimal GEP to generate in some three hundred lines of code: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/SROA.cpp#L1420-L1714

Now, with opaque pointers, doing this is entirely pointless. We can always simply generate a “getelementptr i8, ptr %Base, i64 Offset” without introducing bitcasts. With opaque pointers, the basic i8 GEP representation is already the optimal one.

In addition to that, generating something other than an i8 GEP requires us to pick an appropriate source element type. With typed pointers, that was simply the pointer element type. With opaque pointers, doing this requires some kind of heuristic guess. It would be possible to recurse through operations (selects, phis, etc) until we hit a value with an associated type, such as a GEP, alloca or global. However, doing this is not reliable, because there might not be such a value (e.g. if it is based on a function argument). There is no guarantee that this would arrive at the same GEP type as typed pointers did, and I think doing this goes against the spirit of opaque pointers.

My proposal here is that in opaque pointers mode, LLVM should consider i8 GEPs canonical for GEPs with constant offsets. We should not attempt to “guess” a good GEP type to use, and we should not try to generate complex GEP structures if a simple one will do. I don’t think there’s really any disadvantages to this, apart from the fact that it makes the discrepancy between typed and opaque pointer mode larger.

Context for this mail is https://reviews.llvm.org/D109259, where Arthur requested this to be discussed on llvm-dev.

Regards,
Nikita

nlopes · September 4, 2021, 11:34am

Sounds like a good idea. LLVM's memory model doesn't allow field-sensitive AA, so the GEP typing doesn't matter.
The exception might be TBAA (unclear, as I'm not familiar). That case may require multiple GEPs, one per field? I've no clue; just wondering what's the impact there.

Nuno

David_Chisnall3 · September 4, 2021, 1:08pm

I completely agree. Eventually, I’d like GEP to lose the ability to have multiple type arguments and I think this is a good first step in that direction:

1. Always generate GEPs with i8 offsets with all address calculation in the arithmetic leading up to the operation.
2. Canonicalise all other GEPs to this form.
3. Remove support for other kinds of GEP.

In the back end, a GEP is just a {PTR} + {expression that evaluates to an integer} add operation. That’s what it looks like in any target and it’s the most that we guarantee about memory in the IR, so everything that looks as if it supports more than this adds complexity for no benefit. It also means that we sometimes miss optimisation opportunities because GEPs can encode complex arithmetic expressions that are not CSE’d because they’re a separate and special kind of arithmetic.

David

aeubanks · September 4, 2021, 7:16pm

Does this break the inrange specifier for GEPs? I don’t see how we’d be able to use inrange to specify that a GEP can only access a certain element in an allocated object if we use i8 GEPs. CC’ing Peter for more thoughts.

Initial inrange proposal: https://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html

nikic · September 4, 2021, 7:50pm

Does this break the inrange specifier for GEPs? I don’t see how we’d be able to use inrange to specify that a GEP can only access a certain element in an allocated object if we use i8 GEPs. CC’ing Peter for more thoughts.

Initial inrange proposal: https://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html

Yeah, I don’t think inrange is compatible with this. I think we can safely keep using a typed GEP for the special case of a constexpr GEP based on a GlobalValue, for the time being. There’s a clear type to use in that case. I suspect that even apart from the inrange specifier some of our globals-related optimizations currently rely on typed GEPs. We’re very good at treating GEPs as pure offset arithmetic for GEP instructions, but not so good when it comes to constant expressions.

Regards,
Nikita

David_Chisnall3 · September 6, 2021, 9:58am

For those not wanting to read the proposal, the idea is to provide a stricter version of inbounds where the bounds are within some sub-object.

I don't really like the way that this proposal is using LLVM type information, because LLVM type information is not something that we're able to make any strong guarantees about for in-memory types and so anything using LLVM types to define semantics is quite fragile.

I'd much rather see this expressed as a range. You could express this with llvm.assume with a pair of integer comparisons, there's no need to tie it to the type info, though it may be desirable to provide a range as an attribute to avoid needing 5 instructions for the GEP.

David

Topic		Replies	Views
[InstSimplify] Remove zero-index opaque pointer GEP LLVM Project	4	344	November 21, 2022
Opaque pointers and GEP LLVM Dev List Archives	1	77	June 11, 2021
[RFC] Replacing getelementptr with ptradd IR & Optimizations	38	7316	November 18, 2024
Using "opaque pointers" right now? LLVM Dev List Archives	6	127	December 18, 2019
A question about getElementPtr LLVM Dev List Archives	2	93	November 8, 2006

Opaque pointers and i8 GEPs

Related topics