Background and Motivation
Address calculations in LLVM are currently represented using the getelementptr
(GEP) instruction. Semantically, GEPs represent a provenance-preserving addition of an offset to a base pointer.
However, the actual IR representation is type-based. As such, there are many different ways to represent semantically equivalent GEP instructions. All of the following are the same:
%gep = getelementptr { [2 x i32], i32 }, ptr %p, i64 0, i32 0, i64 4
%gep = getelementptr { [2 x i32], i32 }, ptr %p, i64 1, i32 1
%gep = getelementptr { [2 x i32], i32 }, ptr %p, i64 2, i32 0, i64 -2
%gep = getelementptr [0 x i32], ptr %p, i64 0, i64 4
%gep = getelementptr i32, ptr %p, i64 4
%gep = getelementptr i14, ptr %p, i64 8
%gep = getelementptr i8, ptr %p, i64 16
All of the following are also the same:
%gep = getelementptr [2 x i32], i64 0, i64 %idx
%gep = getelementptr i32, i64 %idx
%offset = shl %idx, 2
%gep = getelementptr i8, i64 %offset
There is currently no real canonical form for GEPs. Components like SCEVExpander which create GEPs from âthin airâ always use i8
GEPs, but to the most part we preserve whichever source element type the frontend happened to generate.
The core problems with the current representation could be summarized as follows:
- Converting GEPs into offset arithmetic requires additional code in many places.
- We often donât bother, leading to optimization failures.
- If we do bother, it has high compile-time overhead.
And now in some more detail, as well as some less important issues:
-
The lack of a canonical representation regularly leads to optimization failures. While ideally all optimizations would decompose GEPs into offset arithmetic and operate in that form, we usually only actually do this for constant offset GEPs. Decomposition of variable GEPs is only performed in a few particularly important places, such as BasicAA.
Instead, many optimizations will check for source element type consistency instead, i.e. they check that the type matches between two GEPs, or a GEP and a global etc. This means that optimizations fail to apply when GEP source element types are not in the expected form. Opaque pointers have exacerbated this problem, because the GEP source element type is no longer rooted in the pointer type and can be completely arbitrary.
Iâve encountered many optimization failures of this kind in the past, and have usually addressed them by switching optimizations to use various forms of offset decomposition. HoweverâŚ
-
Converting structural GEPs into offset representation has a high compile-time overhead. Among other things, it requires computing the alloc size (and as such ABI alignment) of all indexed subtypes.
This means that for hot code paths, it can be hard to justify performing a conversion into offset representation. For example, while I have changed GVN to treat GEPs as pure offset arithmetic, I abandoned the same effort in EarlyCSE because the compile-time impact would be too high to justify.
-
GEPs can contain non-trivial address arithmetic: A single GEP doesnât correspond to a single addition, it can encode a sequence of many multiplies and adds.
Because it is part of the GEP, redundant calculations between GEPs, as well GEPs and other instructions, are hidden at the IR level. There is no opportunity to CSE/LICM/etc them.
Address arithmetic is only exposed in the backend, which will be able to recover many simple cases, but fail in more complex ones (e.g. cross-BB redundancies).
-
Analysis and optimization of scalable GEPs is essentially completely unsupported, even in core components like BasicAA. Offset decomposition of scalable GEPs would require dealing with an additional optional vscale factor everywhere, which is too much complexity to be worthwhile.
Proposal
This RFC proposes to replace the getelementptr
instruction with the ptradd
instruction, which is a provenance-preserving addition of an offset to a pointer. Some examples follow:
; Constant offset (p + 16)
%res = ptradd ptr %p, i64 16
; Scaled variable offset (p + 4 * idx)
%offset = shl nuw nsw i64 %idx, 2
%res = ptradd inbounds ptr %p, i64 %offset
; Scaled variable plus constant offset (p + 12 * idx + 8)
%offset = mul i64 %idx, 12
%tmp = ptradd ptr %p, i64 %offset
%res = ptradd ptr %tmp, i64 8
; Vscale offset (p + vscale * 16 * idx)
; Corresponding to a <vscale x 4 x i32> type, for example.
%vscale = call i64 @llvm.vscale.i64()
%offset = shl i64 i64 %vscale, 4
%res = ptradd ptr %p, i64 %offset
The offset type must match the index type size of the pointer address space. (If enforcing this turns out to be technically infeasible, we can fall back to implicitly truncating or sign extending to the index type size, as GEPs currently do.)
The following features of GEPs are preserved:
- The
inbounds
attribute, with the same semantics. Offset calculation is no longer part of the instruction, so it needs to be separately annotated. The direct equivalent would bensw
attributes on anymul
,shl
oradd
contributing to the offset calculation. However, the new representation also offers the opportunity to encode addition information by usingnuw
attributes as well. - The pointer or offset or both may be vectors, with the same semantics.
- A constant expression variant is supported, with the syntax
ptradd (ptr @g, i64 C)
.
inrange
GEP constant expressions currently have an inrange
attribute, which can be placed on one of the indices, with the following semantics (quoting from LangRef):
If the inrange keyword is present before any index, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside of the bounds of the element selected by the index marked as inrange. The result of a pointer comparison or ptrtoint (including ptrtoint-like operations involving memory) involving a pointer derived from a getelementptr with the inrange keyword is undefined, with the exception of comparisons in the case where both operands are in the range of the element selected by the inrange keyword, inclusive of the address one past the end of that element.
With ptradd
, the inrange
attribute will instead accept an offset range, as illustrated by these examples:
getelementptr ({ [4 x i32], [4 x i32] }, ptr @g, i64 0, inrange i32 1, i64 0)
; =>
ptradd (ptr @g, i64 16, inrange [0, 16])
getelementptr ({ [4 x i32], [4 x i32] }, ptr @g, i64 0, inrange i32 1, i64 1)
; =>
ptradd (ptr @g, i64 20, inrange [-4, 12])
The offset range specifies which offsets of the ptradd result may be accessed (with otherwise the same semantics as current inrange
).
The design matches that of the proposed memory region intrinsic. However, as this needs to be a constant expression, we cannot actually use such an intrinsic for this purpose.
vscale constant expression
Currently, vscale can be presented either using the @llvm.vscale
intrinsic, or using the ptrtoint (getelementptr (<vscale x 1 x i8>, ptr null, i64 1) to i64)
constant expression. The latter will no longer exist with the ptradd
representation, because it has no intrinsic notion of vscale.
The current stance of this proposal is that only the @llvm.vscale
representation should remain, because there are plans to make vscale non-constant across functions anyway, in which case the constant expression is ill-defined.
If it turns out that this is not viable for some reason, then the fallback plan would be to introduce a first-class vscale
constant to replace the constant expression representation.
Benefits
The ptradd instruction addresses the issues raised in the motivation section:
- ptradd is an accurate representation of the underlying IR semantics, without redundant encodings. As such, redundancy elimination works without further effort, avoiding optimization failures.
- ptradd is always in offset representation, as such analyses/transforms do not have to go out of their way to convert GEPs into this representation. This avoids optimization failures and improves compile-times.
- Offset arithmetic is explicitly materialized in IR, and as such visible to all IR level transforms.
- Vscale is just like any other value as far as ptradd is concerned. As such, all analyses/transforms automatically have decent support for vscale ptradds.
Migration
This is a major IR change. We are just through the opaque pointer migration, which required a lot of effort not just inside LLVM, but also in all 3rd party consumers. Iâm sure people are wary of embarking on another major change.
The good news is that this change is expected to have much less impact on 3rd party consumers of LLVM than the opaque pointers migration, for reasons outlined in the following.
Mapping getelementptr â ptradd
As long as DataLayout is available, any getelementptr instruction can be expanded into a sequence of multiplies, adds, ptradds and vscale intrinsic calls.
Existing getelementptr IRBuilder APIs will continue working in a ptradd world: They will just emit the appropriate ptradd sequence instead. In fact, the structural getelementptr API is likely more convenient than ptradd when it comes to generating IR for many compiler frontends.
Conversely, any ptradd can be interpreted as an i8
getelementptr. During the migration process, ptradds will pretend to be i8 GEPs, to allow existing code to continue working. Specialized ptradd code may be needed for more optimal handling, but existing code should handle them correctly.
With that in mind, the migration process is expected to go through the following steps.
Step 1: Make DataLayout available
Expanding GEP into ptradd requires DataLayout (DL) availability. However, IRBuilder currently doesnât always have an available DataLayout.
The suggested course of action is to require DataLayout to be always available at IRBuilder construction time. Most IRBuilder uses already implicitly provide this by specifying an insertion point.
If IRBuilder is constructed without an insertion point (only with an LLVM context), a data layout will now also be required.
This will also require a change to the LLVM C API: LLVMCreateBuilder
and LLVMCreateBuilderInContext
will be removed in favor of LLVMCreateBuilder2
, which accepts both a context and a data layout.
The second step will be to remove the DL-unware IRBuilder ConstantFolder
in favor of the DL-aware TargetFolder
, which can now always be used, thanks to unconditional DL availability.
The third step will be to require a DataLayout argument in the ConstantExpr::getGetElementPtr()
methods.
Finally, uses of GetElementPtrInst::Create()
should be replaced with IRBuilder usage where possible. Once this is done, everything is ready for automatic conversion from getelementptr to ptradd.
Step 2: Inrange representation, canonicalize constexpr GEPs
(This can likely be done in parallel to step 1.)
The inrange representation on the getelementptr
constexpr is changed from an index modifier to accept a range, same as for ptradd
, with the old representation being auto-upgraded in bitcode. The GlobalSplit pass (the only user of this annotation) needs to be updated to support the new representation.
Once this is done, canonicalize all constant expression GEPs with constant offset to use i8
GEPs, approximating their final representation. Doing this early allows us to ensure that all optimizations deal with the new representation, on a controlled subset of GEPs.
Step 3: Introduce ptradd under a flag
Introduce the ptradd
instruction, which will only be used if the -enable-ptradd
flag is enabled. Treat ptradd
like an i8
GEP in existing code. Implement auto-upgrade support from getelementptr to ptradd in bitcode, IR parsing, IRBuilder and constexpr creation.
Evaluate clang and other frontend with ptradd enabled and address encountered optimization issues. (In theory there at least shouldnât be correctness issues, but you know what they say about theory and practiceâŚ)
Step 4: Enable ptradd by default, migrate tests
Similar to the opaque pointers migration, ptradd would be enabled by default, while existing tests would opt out via -enable-ptradd=0
.
Tests would then be gradually migrated to the new representation. This might easily be the most time-consuming part of the entire migration.
Once this is done, getelementptr support is removed.
Anticipated complications
The main complication I can predict at this point is cost modelling. While I have listed the fact that offset calculation is explicitly materialized and not an implicit part of a GEP instruction as a benefit above, it does come with a flip side:
Many targets have addressing modes that allow folding a limited-range multiply and add into load/store instructions, making these calculations essentially free.
GEP cost modelling will currently consider such GEPs frees. This is of course all kinds of wrong (not every GEP will be directly used in a load/store), but itâs a relatively simple approximation.
With the multiply/shift no longer being part of the GEP, itâs less clear whether it is actually free due to addressing mode folding. This can likely be mitigated by checking for ptradd-only users, but itâs worth noting as a weak point of the new representation.