[RFC] Replacing getelementptr with ptradd

nikic · February 23, 2023, 2:06pm

Background and Motivation

Address calculations in LLVM are currently represented using the getelementptr (GEP) instruction. Semantically, GEPs represent a provenance-preserving addition of an offset to a base pointer.

However, the actual IR representation is type-based. As such, there are many different ways to represent semantically equivalent GEP instructions. All of the following are the same:

%gep = getelementptr { [2 x i32], i32 }, ptr %p, i64 0, i32 0, i64 4
%gep = getelementptr { [2 x i32], i32 }, ptr %p, i64 1, i32 0, i64 1
%gep = getelementptr { [2 x i32], i32 }, ptr %p, i64 2, i32 0, i64 -2
%gep = getelementptr [0 x i32], ptr %p, i64 0, i64 4
%gep = getelementptr i32, ptr %p, i64 4
%gep = getelementptr i16, ptr %p, i64 8
%gep = getelementptr i8, ptr %p, i64 16
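A minimal Python sketch of why several of these forms agree. It assumes a simplified data layout in which every integer has a power-of-two width and is aligned to its own byte size; the classes and helper names (`Int`, `Array`, `Struct`, `gep_offset`) are made up for illustration and are not LLVM APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    bits: int


@dataclass(frozen=True)
class Array:
    count: int
    elem: object


@dataclass(frozen=True)
class Struct:
    fields: tuple


def align_of(ty):
    """Assumed ABI alignment: integers align to their byte size."""
    if isinstance(ty, Int):
        return ty.bits // 8
    if isinstance(ty, Array):
        return align_of(ty.elem)
    return max(align_of(f) for f in ty.fields)


def round_up(n, a):
    return (n + a - 1) // a * a


def field_offsets(ty):
    """Byte offset of each struct field, with padding for alignment."""
    offsets, pos = [], 0
    for f in ty.fields:
        pos = round_up(pos, align_of(f))
        offsets.append(pos)
        pos += size_of(f)
    return offsets


def size_of(ty):
    """Alloc size in bytes, including struct tail padding."""
    if isinstance(ty, Int):
        return ty.bits // 8
    if isinstance(ty, Array):
        return ty.count * size_of(ty.elem)
    end = field_offsets(ty)[-1] + size_of(ty.fields[-1])
    return round_up(end, align_of(ty))


def gep_offset(source_ty, indices):
    """Byte offset of `getelementptr source_ty, ptr %p, indices...`."""
    offset = indices[0] * size_of(source_ty)
    ty = source_ty
    for idx in indices[1:]:
        if isinstance(ty, Struct):
            offset += field_offsets(ty)[idx]
            ty = ty.fields[idx]
        else:
            offset += idx * size_of(ty.elem)
            ty = ty.elem
    return offset


S = Struct((Array(2, Int(32)), Int(32)))  # { [2 x i32], i32 }
FORMS = [
    (S, [0, 0, 4]),
    (S, [2, 0, -2]),
    (Array(0, Int(32)), [0, 4]),
    (Int(32), [4]),
    (Int(8), [16]),
]
OFFSETS = [gep_offset(ty, idx) for ty, idx in FORMS]
```

Under these assumptions every entry of `OFFSETS` is the same byte offset, which is the only thing the GEP semantically encodes.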

All of the following are also the same:

%gep = getelementptr [2 x i32], ptr %p, i64 0, i64 %idx
%gep = getelementptr i32, ptr %p, i64 %idx
%offset = shl i64 %idx, 2
%gep = getelementptr i8, ptr %p, i64 %offset

There is currently no real canonical form for GEPs. Components like SCEVExpander which create GEPs from “thin air” always use i8 GEPs, but for the most part we preserve whichever source element type the frontend happened to generate.

The core problems with the current representation could be summarized as follows:

Converting GEPs into offset arithmetic requires additional code in many places.
We often don’t bother, leading to optimization failures.
If we do bother, it has high compile-time overhead.

And now in some more detail, as well as some less important issues:

The lack of a canonical representation regularly leads to optimization failures. While ideally all optimizations would decompose GEPs into offset arithmetic and operate in that form, we usually only actually do this for constant offset GEPs. Decomposition of variable GEPs is only performed in a few particularly important places, such as BasicAA.

Many optimizations will instead check for source element type consistency, i.e. they check that the type matches between two GEPs, or a GEP and a global etc. This means that optimizations fail to apply when GEP source element types are not in the expected form. Opaque pointers have exacerbated this problem, because the GEP source element type is no longer rooted in the pointer type and can be completely arbitrary.

I’ve encountered many optimization failures of this kind in the past, and have usually addressed them by switching optimizations to use various forms of offset decomposition. However…

Converting structural GEPs into offset representation has a high compile-time overhead. Among other things, it requires computing the alloc size (and as such ABI alignment) of all indexed subtypes.

This means that for hot code paths, it can be hard to justify performing a conversion into offset representation. For example, while I have changed GVN to treat GEPs as pure offset arithmetic, I abandoned the same effort in EarlyCSE because the compile-time impact would be too high to justify.

GEPs can contain non-trivial address arithmetic: A single GEP doesn’t correspond to a single addition, it can encode a sequence of many multiplies and adds.

Because it is part of the GEP, redundant calculations between GEPs, as well as between GEPs and other instructions, are hidden at the IR level. There is no opportunity to CSE/LICM/etc them.

Address arithmetic is only exposed in the backend, which will be able to recover many simple cases, but fail in more complex ones (e.g. cross-BB redundancies).

Analysis and optimization of scalable GEPs is essentially completely unsupported, even in core components like BasicAA. Offset decomposition of scalable GEPs would require dealing with an additional optional vscale factor everywhere, which is too much complexity to be worthwhile.
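To make "decomposing a GEP into offset arithmetic" concrete, here is a small Python sketch that turns an array-or-scalar GEP into a constant plus a map of scaled variables. It is only a toy model under stated assumptions (no structs, no vscale, power-of-two integer widths); the tuple encoding and the names `decompose`, `var`, `scaled` are hypothetical and do not mirror the BasicAA implementation.

```python
def const(c):
    """Linear expression consisting of a constant only."""
    return (c, {})


def var(name):
    """Linear expression consisting of a single variable."""
    return (0, {name: 1})


def scaled(expr, factor):
    """expr * factor; a `shl` by k corresponds to factor = 2**k."""
    c, terms = expr
    return (c * factor, {v: k * factor for v, k in terms.items()})


def add(a, b):
    """Sum of two linear expressions, dropping zero coefficients."""
    terms = dict(a[1])
    for v, k in b[1].items():
        terms[v] = terms.get(v, 0) + k
    return (a[0] + b[0], {v: k for v, k in terms.items() if k != 0})


def size_of(ty):
    """Types are ("int", bits) or ("array", count, elem)."""
    if ty[0] == "int":
        return ty[1] // 8
    return ty[1] * size_of(ty[2])


def decompose(source_ty, indices):
    """Return (constant, {variable: scale}) for an array/scalar-only GEP."""
    total = const(0)
    stride_ty = source_ty
    for i, idx in enumerate(indices):
        if i > 0:
            stride_ty = stride_ty[2]  # step into the array element type
        total = add(total, scaled(idx, size_of(stride_ty)))
    return total


I8, I32 = ("int", 8), ("int", 32)

# getelementptr [2 x i32], ptr %p, i64 0, i64 %idx
form_a = decompose(("array", 2, I32), [const(0), var("idx")])
# getelementptr i32, ptr %p, i64 %idx
form_b = decompose(I32, [var("idx")])
# %offset = shl i64 %idx, 2 ; getelementptr i8, ptr %p, i64 %offset
form_c = decompose(I8, [scaled(var("idx"), 4)])
```

All three forms decompose to the same constant and the same scale on `idx`, which is the comparison an optimization has to make once it stops matching on source element types.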

Proposal

This RFC proposes to replace the getelementptr instruction with the ptradd instruction, which is a provenance-preserving addition of an offset to a pointer. Some examples follow:

; Constant offset (p + 16)
%res = ptradd ptr %p, i64 16

; Scaled variable offset (p + 4 * idx)
%offset = shl nuw nsw i64 %idx, 2
%res = ptradd inbounds ptr %p, i64 %offset

; Scaled variable plus constant offset (p + 12 * idx + 8)
%offset = mul i64 %idx, 12
%tmp = ptradd ptr %p, i64 %offset
%res = ptradd ptr %tmp, i64 8

; Vscale offset (p + vscale * 16)
; Corresponding to a <vscale x 4 x i32> type, for example.
%vscale = call i64 @llvm.vscale.i64()
%offset = shl i64 %vscale, 4
%res = ptradd ptr %p, i64 %offset

The offset type must match the index type size of the pointer address space. (If enforcing this turns out to be technically infeasible, we can fall back to implicitly truncating or sign extending to the index type size, as GEPs currently do.)
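For reference, the fallback behaviour mentioned here (implicit truncation or sign extension to the index width) can be sketched in a few lines of Python. Values are modelled as unsigned bit patterns; the function name `to_index_width` is an assumption made for this example.

```python
def to_index_width(value, from_bits, to_bits):
    """Truncate or sign-extend a two's-complement value to `to_bits` bits.

    `value` is given, and the result is returned, as an unsigned bit pattern.
    """
    value &= (1 << from_bits) - 1
    if to_bits <= from_bits:
        return value & ((1 << to_bits) - 1)  # truncate high bits
    if value >> (from_bits - 1):             # sign bit set
        value -= 1 << from_bits              # reinterpret as negative
    return value & ((1 << to_bits) - 1)      # re-encode at the new width
```

With a strict rule that the offset type matches the index type, none of this conversion would happen inside the instruction; the frontend or IRBuilder would have to emit an explicit `sext` or `trunc` instead.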

The following features of GEPs are preserved:

The inbounds attribute, with the same semantics. Offset calculation is no longer part of the instruction, so it needs to be separately annotated. The direct equivalent would be nsw attributes on any mul, shl or add contributing to the offset calculation. However, the new representation also offers the opportunity to encode additional information by using nuw attributes as well.
The pointer or offset or both may be vectors, with the same semantics.
A constant expression variant is supported, with the syntax ptradd (ptr @g, i64 C).

inrange

GEP constant expressions currently have an inrange attribute, which can be placed on one of the indices, with the following semantics (quoting from LangRef):

If the inrange keyword is present before any index, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside of the bounds of the element selected by the index marked as inrange. The result of a pointer comparison or ptrtoint (including ptrtoint-like operations involving memory) involving a pointer derived from a getelementptr with the inrange keyword is undefined, with the exception of comparisons in the case where both operands are in the range of the element selected by the inrange keyword, inclusive of the address one past the end of that element.

With ptradd, the inrange attribute will instead accept an offset range, as illustrated by these examples:

getelementptr ({ [4 x i32], [4 x i32] }, ptr @g, i64 0, inrange i32 1, i64 0)
; =>
ptradd (ptr @g, i64 16, inrange [0, 16])

getelementptr ({ [4 x i32], [4 x i32] }, ptr @g, i64 0, inrange i32 1, i64 1)
; =>
ptradd (ptr @g, i64 20, inrange [-4, 12])

The offset range specifies which offsets of the ptradd result may be accessed (with otherwise the same semantics as current inrange).
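The two ranges in the examples can be derived mechanically. The sketch below assumes the caller already knows the byte offset and size of the element selected by the old inrange index (here the second `[4 x i32]` field, which starts at byte 16 and is 16 bytes long); the function name is hypothetical.

```python
def inrange_for_ptradd(result_offset, elem_start, elem_size):
    """Offsets, relative to the ptradd result, that stay inside the element.

    result_offset: byte offset added by the ptradd
    elem_start:    byte offset of the element selected by the old inrange index
    elem_size:     size of that element in bytes
    """
    low = elem_start - result_offset
    return (low, low + elem_size)


# getelementptr (..., i64 0, inrange i32 1, i64 0) => ptradd (ptr @g, i64 16, ...)
first = inrange_for_ptradd(16, 16, 16)
# getelementptr (..., i64 0, inrange i32 1, i64 1) => ptradd (ptr @g, i64 20, ...)
second = inrange_for_ptradd(20, 16, 16)
```

The range simply shifts by however far the result pointer sits past the start of the selected element.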

The design matches that of the proposed memory region intrinsic. However, as this needs to be a constant expression, we cannot actually use such an intrinsic for this purpose.

vscale constant expression

Currently, vscale can be represented either using the @llvm.vscale intrinsic, or using the ptrtoint (getelementptr (<vscale x 1 x i8>, ptr null, i64 1) to i64) constant expression. The latter will no longer exist with the ptradd representation, because it has no intrinsic notion of vscale.

The current stance of this proposal is that only the @llvm.vscale representation should remain, because there are plans to make vscale non-constant across functions anyway, in which case the constant expression is ill-defined.

If it turns out that this is not viable for some reason, then the fallback plan would be to introduce a first-class vscale constant to replace the constant expression representation.

Benefits

The ptradd instruction addresses the issues raised in the motivation section:

ptradd is an accurate representation of the underlying IR semantics, without redundant encodings. As such, redundancy elimination works without further effort, avoiding optimization failures.
ptradd is always in offset representation, so analyses/transforms do not have to go out of their way to convert GEPs into this representation. This avoids optimization failures and improves compile-times.
Offset arithmetic is explicitly materialized in IR, and as such visible to all IR level transforms.
Vscale is just like any other value as far as ptradd is concerned. As such, all analyses/transforms automatically have decent support for vscale ptradds.

Migration

This is a major IR change. We are just through the opaque pointer migration, which required a lot of effort not just inside LLVM, but also in all 3rd party consumers. I’m sure people are wary of embarking on another major change.

The good news is that this change is expected to have much less impact on 3rd party consumers of LLVM than the opaque pointers migration, for reasons outlined in the following.

Mapping getelementptr ↔ ptradd

As long as DataLayout is available, any getelementptr instruction can be expanded into a sequence of multiplies, adds, ptradds and vscale intrinsic calls.
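A rough Python sketch of such an expansion, emitting textual IR in the ptradd syntax proposed above. It assumes the DataLayout-dependent work has already been done, so the input is a list of (variable, byte stride) pairs plus a folded constant offset; vscale, vectors and the nsw/inbounds flags are left out, and the function name and value names are invented for illustration.

```python
def expand_gep(var_terms, const_off, base="%p"):
    """Lower a decomposed GEP to textual mul/shl/ptradd instructions.

    var_terms: list of (value_name, byte_stride) for variable indices
    const_off: sum of all constant index contributions, in bytes
    Returns (instruction_lines, name_of_result_pointer).
    """
    lines, ptr = [], base
    for k, (name, stride) in enumerate(var_terms):
        off = name
        if stride != 1:
            off = f"%off{k}"
            if stride > 0 and stride & (stride - 1) == 0:
                # Power-of-two stride: use a shift.
                lines.append(f"{off} = shl i64 {name}, {stride.bit_length() - 1}")
            else:
                lines.append(f"{off} = mul i64 {name}, {stride}")
        lines.append(f"%tmp{k} = ptradd ptr {ptr}, i64 {off}")
        ptr = f"%tmp{k}"
    if const_off != 0:
        lines.append(f"%res = ptradd ptr {ptr}, i64 {const_off}")
        ptr = "%res"
    return lines, ptr


# p + 12 * idx + 8, e.g. field 2 of element %idx in an array of {i32, i32, i32}
lines, result = expand_gep([("%idx", 12)], 8)
```

For this input the sketch produces a `mul` by 12 followed by two ptradds, matching the "scaled variable plus constant offset" example earlier in the proposal.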

Existing getelementptr IRBuilder APIs will continue working in a ptradd world: They will just emit the appropriate ptradd sequence instead. In fact, the structural getelementptr API is likely more convenient than ptradd when it comes to generating IR for many compiler frontends.

Conversely, any ptradd can be interpreted as an i8 getelementptr. During the migration process, ptradds will pretend to be i8 GEPs, to allow existing code to continue working. Specialized ptradd code may be needed for more optimal handling, but existing code should handle them correctly.

With that in mind, the migration process is expected to go through the following steps.

Step 1: Make DataLayout available

Expanding GEP into ptradd requires DataLayout (DL) availability. However, IRBuilder currently doesn’t always have an available DataLayout.

The suggested course of action is to require DataLayout to be always available at IRBuilder construction time. Most IRBuilder uses already implicitly provide this by specifying an insertion point.

If IRBuilder is constructed without an insertion point (only with an LLVM context), a data layout will now also be required.

This will also require a change to the LLVM C API: LLVMCreateBuilder and LLVMCreateBuilderInContext will be removed in favor of LLVMCreateBuilder2, which accepts both a context and a data layout.

The second step will be to remove the DL-unaware IRBuilder ConstantFolder in favor of the DL-aware TargetFolder, which can now always be used, thanks to unconditional DL availability.

The third step will be to require a DataLayout argument in the ConstantExpr::getGetElementPtr() methods.

Finally, uses of GetElementPtrInst::Create() should be replaced with IRBuilder usage where possible. Once this is done, everything is ready for automatic conversion from getelementptr to ptradd.

Step 2: Inrange representation, canonicalize constexpr GEPs

(This can likely be done in parallel to step 1.)

The inrange representation on the getelementptr constexpr is changed from an index modifier to accept a range, same as for ptradd, with the old representation being auto-upgraded in bitcode. The GlobalSplit pass (the only user of this annotation) needs to be updated to support the new representation.

Once this is done, canonicalize all constant expression GEPs with constant offset to use i8 GEPs, approximating their final representation. Doing this early allows us to ensure that all optimizations deal with the new representation, on a controlled subset of GEPs.

Step 3: Introduce ptradd under a flag

Introduce the ptradd instruction, which will only be used if the -enable-ptradd flag is enabled. Treat ptradd like an i8 GEP in existing code. Implement auto-upgrade support from getelementptr to ptradd in bitcode, IR parsing, IRBuilder and constexpr creation.

Evaluate clang and other frontends with ptradd enabled and address encountered optimization issues. (In theory there at least shouldn’t be correctness issues, but you know what they say about theory and practice…)

Step 4: Enable ptradd by default, migrate tests

Similar to the opaque pointers migration, ptradd would be enabled by default, while existing tests would opt out via -enable-ptradd=0.

Tests would then be gradually migrated to the new representation. This might easily be the most time-consuming part of the entire migration.

Once this is done, getelementptr support is removed.

Anticipated complications

The main complication I can predict at this point is cost modelling. While I have listed the fact that offset calculation is explicitly materialized and not an implicit part of a GEP instruction as a benefit above, it does come with a flip side:

Many targets have addressing modes that allow folding a limited-range multiply and add into load/store instructions, making these calculations essentially free.

GEP cost modelling will currently consider such GEPs free. This is of course all kinds of wrong (not every GEP will be directly used in a load/store), but it’s a relatively simple approximation.

With the multiply/shift no longer being part of the GEP, it’s less clear whether it is actually free due to addressing mode folding. This can likely be mitigated by checking for ptradd-only users, but it’s worth noting as a weak point of the new representation.
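A hypothetical version of that mitigation, written as a Python predicate. The set of legal scales is an assumption (roughly what an x86-style scaled-index addressing mode offers), users are modelled as plain opcode strings, and nothing here corresponds to an existing TargetTransformInfo hook.

```python
LEGAL_SCALES = {1, 2, 4, 8}  # assumption: scales the target can fold for free


def scale_is_free(scale, user_opcodes):
    """Treat a multiply/shift as free if it can fold into addressing.

    scale:        constant factor applied to the index
    user_opcodes: opcode names of all users of the scaled value
    """
    if scale not in LEGAL_SCALES or not user_opcodes:
        return False
    return all(op == "ptradd" for op in user_opcodes)
```

Like the current GEP heuristic, this still over-approximates: it does not check that each ptradd in turn feeds only loads and stores.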

nlopes · February 23, 2023, 3:42pm

Let me give a historical perspective on getelementptr.
In the early days, a design decision of LLVM was that the IR of a program should be target independent. One could run target-independent optimizations (DL=null) as a preprocessing step, and then, on the target device, optimize for a specific DL.

getelementptr was an important piece of this vision: you pass in a type, so indices are multiplied by a type size that need not be fixed.

This vision quickly broke because when compiling from C and other languages, a lot of the ABI and target features are baked when generating the IR. So, the IR is not really target independent. And probably there’s no use case where the type sizes are determined later and can vary depending on the DL (?).

Given that, fixing the offset multipliers in gep makes sense. The only question is whether we go with your proposed solution of having an instruction that does p + idx, or whether we want an instruction like p + m * idx.
Having a multiplier is probably common, so the ptradd proposal will always replace 1 gep with 2 instructions. My question is only if there are differences in terms of canonicalization between these 2 alternatives?

jcdutton · February 23, 2023, 5:01pm

Hi,

Minor comment.

When documenting “ptradd”, please make it clear which parameters are signed and which are unsigned.
If adding “ptradd”, why not also add other provenance-preserving operations, such as “ptrand” and “ptror” ?

tschuett · February 23, 2023, 5:11pm

There is already @llvm.ptrmask. You could add as first small step @llvm.ptradd to ease the migration.

nikic · February 23, 2023, 5:33pm

Adding direct scaling support to ptradd is definitely a viable alternative, and would still satisfy the core goals of this proposal. I think there are advantages and disadvantages to both.

Advantages of supporting scaling in ptradd:

Slightly fewer instructions on average. (I think the effect of this will actually be quite small, because I expect most scaling to be CSEd.)
Easier cost modelling (see the issue outlined in “anticipated complications” – this would basically go away.)
Optimizations cannot obscure scaling, so that addressing mode matching is more reliable.

Disadvantages of supporting scaling are:

Makes the instruction more complicated, meaning that code has to deal with it. To give an example, say you have select (ptradd p, 2 * x), (ptradd p, 4 * y). Folding this to ptradd p, (select 2 * x, 4 * y) is not straightforward and requires materializing additional instructions (and reasoning about whether that’s really worthwhile or not). Without scaling support this optimization is trivial. Similar cases elsewhere.
Scaling is not exposed in IR, and as such not exposed to CSE, LICM, etc. This leads to issues like “Multiple evaluations of a GEP” (Issue #50528 in llvm/llvm-project), where redundant address calculations make it into the final assembly.
There are more ways to write the same thing and the usual issues that come with it. I think this is mostly mitigated by saying that having all multipliers in the ptradd is canonical, but there are externalities to that, especially when it comes to multi-use values and flag preservation.
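The select example from the first point can be modelled with plain integers to show why the fold is trivial when scaling lives outside the instruction. This sketch treats pointers as integers and ignores provenance and flags; the function names are illustrative only.

```python
def select(cond, a, b):
    return a if cond else b


def unfused(cond, p, x, y):
    # select (ptradd p, 2 * x), (ptradd p, 4 * y)
    return select(cond, p + 2 * x, p + 4 * y)


def folded(cond, p, x, y):
    # ptradd p, (select 2 * x, 4 * y)
    # The scaled offsets already exist as standalone values, so no new
    # multiplies have to be materialized to perform the fold.
    return p + select(cond, 2 * x, 4 * y)


CASES = [(c, p, x, y) for c in (True, False) for p in (0, 4096)
         for x in (-3, 0, 7) for y in (-1, 5)]
AGREE = all(unfused(*case) == folded(*case) for case in CASES)
```

With a scaling ptradd, the two arms carry different multipliers (2 and 4), so the fold would first have to pull both multiplies out of the instructions.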

Overall I lean towards making ptradd a pure add, because that makes IR and transformations simpler, but I could be convinced either way here.

This is going to be the same as for GEPs, i.e. the base pointer is unsigned and the offset is signed. (This could be, and probably should be, extended by a nuw flag, but that’s a separate proposal that would work just as well on GEPs.)

These operations are rare enough that they almost certainly do not merit a dedicated instruction. As pointed out, there already is a ptrmask intrinsic (though we never got around to actually using it).

ptradd is a replacement for the existing getelementptr instruction. Adding additional pointer operations on top of that is out of scope for this proposal.

kparzysz-quic · February 23, 2023, 5:57pm

Is this going to affect memory dependence analysis? IIRC, there were some issues with recovering the original array indexing from a GEP, and if we could represent address generation in a way that would make that type of analysis easier, it would be very helpful.

efriedma-quic · February 23, 2023, 8:12pm

I’m a little concerned about existing analysis passes depending on the types of GEP operations; not necessarily for correctness, but as a hint to drive internal datastructures. There are still a lot of in-tree users of GEP types (getSourceElementType()/getResultElementType()/gep_type_iterator/etc.). Do we have some idea how many of those are actually sensitive to the GEP types? How many are sensitive to whether we use integrated GEP arithmetic vs. a tree of add/mul ops?

This is likely going to be heavily dependent on “nsw” markings on arithmetic to reliably analyze “inbounds” GEPs. We might need to revisit optimizations that end up stripping off nsw markings for various reasons.

This is going to be painful for some out-of-tree code my team maintains, and I expect the same is true for others, but I don’t have any reason to expect it’s impossible to transition that code. It’ll just make the analysis more complicated.

nikic · February 23, 2023, 9:16pm

Depends on what exactly you mean by that. In terms of analyses that actually use GEP types as an optimization heuristic, I’m only aware of a single place that does this: https://github.com/llvm/llvm-project/blob/54e51074989333a9d512b962851eaabdc003e6be/llvm/lib/Analysis/Delinearization.cpp#L486 As far as I know, this is the only place where some actual loss of information might occur.

What is a bit more common is code that currently only works with specific GEP types. To give an example, the optimization at https://github.com/llvm/llvm-project/blob/54e51074989333a9d512b962851eaabdc003e6be/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp#L100 requires the global and the GEP to have the same type. Supporting ptradd in this transform would require making it work on an offset representation. (Even without ptradd, this would be required to make it work on Rust code, because in rustc IR the global type and the GEP type are always different.)

I think the number of transforms that would need additional handling for ptradd is fairly limited, in part thanks to the work already done during the opaque pointer migration.

I’d be interested in hearing more about this. What are you doing that benefits from GEP types?

efriedma-quic · February 23, 2023, 11:08pm

Delinearization, like you mentioned… we have some code doing something similar to the upstream getIndexExpressionsFromGEP.

It’s possible to derive the dimensions of a memory access based on the structure of the memory accesses and address arithmetic, but it’s harder than just verifying accesses follow a pattern derived from the type. You can’t do it access by access, though; you have to consider all the accesses together to compute the right dimensions. And there’s not really a “right” answer in some cases; if you’re not using information from the source code, the number of dimensions is a heuristic to some extent. And picking the wrong dimensions can lead to generating runtime checks which will never succeed.
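The ambiguity is easy to see with a toy delinearization in Python. This sketch splits a single constant byte offset into subscripts for a guessed set of inner dimensions; real delinearization works on symbolic expressions across all accesses, so this only illustrates why the dimensions are a heuristic. The function name is an assumption.

```python
def delinearize(offset, elem_size, dims):
    """Split a byte offset into array subscripts.

    dims lists the inner dimensions only (outermost is unbounded).
    Returns [outermost, ..., innermost].
    """
    index, rem = divmod(offset, elem_size)
    if rem != 0:
        raise ValueError("offset is not a multiple of the element size")
    subs = []
    for d in reversed(dims):
        index, s = divmod(index, d)
        subs.append(s)
    subs.append(index)
    return list(reversed(subs))


# The same access, 100 bytes into an i32 array, under three guesses.
as_n_by_10 = delinearize(100, 4, [10])
as_n_by_5 = delinearize(100, 4, [5])
as_flat = delinearize(100, 4, [])
```

The same byte offset reads as `A[2][5]`, `A[5][0]` or `A[25]` depending on the guessed shape, and nothing in a single access says which one the source program meant.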

We also have a specialized memory layout optimization pass which currently uses IR types as a hint describing the structure of memory, and which accesses are related to each other.

Granted, depending on GEPs for this sort of thing was never really completely reliable, due to bitcasts/overindexing/etc. Maybe it’s worth considering a dedicated “dimension” intrinsic to describe array dimensions.

I assume the intent here is AllocaInst::getAllocatedType() and GlobalValue::getValueType() are also eventually going away?

nhaehnle · March 2, 2023, 12:12pm

I like this proposal for LLVM’s role in code generation for a C-like machine model and I think it should be done.

However, LLVM is in practice used for more than that. We have DXIL and SPIR-V backends. I am less familiar with DXIL, but SPIR-V is very clear about the fact that many of its pointers are “logical” and we cannot in fact treat them as just using byte offsets. Of course, this is already a problem today, but explicitly moving to a ptradd is likely going to make it worse. So that part feels like it needs a bit more thought.

In step 3, what does adding ptradd actually mean? In terms of the C++ representation, it feels like ptradd can just be an i8 GEP; and a PtrAddInst class might be added that is literally a GEPInst with an additional isa-check. (I suppose there are some minor details around inrange.) The flag would then really be about making non-i8 GEPs illegal and using the appropriate auto-upgrades and builder behavior for that.

nikic · March 2, 2023, 2:27pm

Right, this is the same problem as with opaque pointers, and I expect that it will be possible to solve it in a similar way:

The DXIL/SPIRV backends already perform some kind of pointer type “guessing” (because the output IR has pointee types), and getelementptr instructions can be rewritten in terms of those guessed types. If we have a ptradd ptr %p, i64 4 and the type guessing determines that the ptr is a i32*, then the ptradd can be rewritten into getelementptr i32, i32* %p, i64 1.
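A sketch of that rewrite in Python, producing typed-pointer GEP text from a byte offset and a guessed pointee type. The function and its string output are purely illustrative assumptions; the fallback branch shows only the i8 GEP and omits the surrounding bitcasts.

```python
def ptradd_to_gep(offset, guessed_ty, guessed_size):
    """Rewrite `ptradd ptr %p, i64 offset` using a guessed pointee type.

    Falls back to an i8 GEP when the offset is not a multiple of the
    guessed element size.
    """
    if guessed_size > 0 and offset % guessed_size == 0:
        index = offset // guessed_size
        return f"getelementptr {guessed_ty}, {guessed_ty}* %p, i64 {index}"
    return f"getelementptr i8, i8* %p, i64 {offset}"


nice = ptradd_to_gep(4, "i32", 4)
fallback = ptradd_to_gep(6, "i32", 4)
```

The first call reproduces the example above; the second shows the case where the guessed type does not divide the offset and only the byte-wise form is available.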

A potential complication I can foresee here is that the absence of getelementptr types makes the type guessing itself less reliable. E.g. if you see three i32 loads at the appropriate offsets, you could “guess” a {i32, i32, i32}* type, or a [3 x i32]* type, or a i32* type from that and construct GEPs for any of those. I don’t know to what degree the precisely chosen type matters here, as long as it is reasonably “nice”.

I’d definitely appreciate feedback from SPIRV/DXIL maintainers on how feasible this is. It would also be interesting to know whether the trivial fallback (bitcast + i8 GEP + bitcast) works for these targets – that is, is producing a “nice” GEP just a matter of code quality, or of correctness?

It’s worth mentioning that we already do have some key transforms that will always produce i8 GEPs, such as SROA or SCEV, so I expect these backends already need some way to deal with this…

Not exactly what I had in mind, but what you suggest does sound reasonable to me.

jcranmer · March 9, 2023, 10:08pm

I’d definitely appreciate feedback from SPIRV/DXIL maintainers on how feasible this is. It would also be interesting to know whether the trivial fallback (bitcast + i8 GEP + bitcast) works for these targets – that is, is producing a “nice” GEP just a matter of code quality, or of correctness?

I can’t speak authoritatively for all use cases, but my understanding is that if the source is a compute kernel, then things should generally just work (especially given target extension types should suffice to solve most of the necessary value-tracking issues). There might be an issue if i8-addressable memory is not available, but this is so far out of my wheelhouse that I can’t do anything more than speculate (and LLVM today has issues with non-i8-addressable targets).

A potential complication I can foresee here is that the absence of getelementptr types makes the type guessing itself less reliable. E.g. if you see three i32 loads at the appropriate offsets, you could “guess” a {i32, i32, i32}* type, or a [3 x i32]* type, or a i32* type from that and construct GEPs for any of those. I don’t know to what degree the precisely chosen type matters here, as long as it is reasonably “nice”.

GEPs are a critical input for type scavenging. In the longer term, it does seem as if SPIR-V will be gaining an untyped pointer extension, with compute kernels eventually moving to it, obviating the need for the type scavenger. Given the ability to pun pointers with bitcasts, it’s probably not too damaging if all GEPs become i8. But I think GEP types are useful enough to keep around (especially for array accesses!) that it’s worth trying to spend some effort keeping them around.

I assume the intent here is AllocaInst::getAllocatedType() and GlobalValue::getValueType() are also eventually going away?

Honestly, this concerns me more than ptradd, since it drastically reduces the ability to correctly guess type information where it’s necessary.

preames · March 16, 2023, 3:10pm

I think the general goal here is correct, but I want to suggest a change of implementation order.

Unless I’m missing something in the proposal, the ptradd is equivalent to a getelementptr i8. Given this, I think adding the new instruction should be our absolutely last step, not our first. All of the questions around canonicalization and optimization effectiveness can be evaluated with the current representation.

I advocate for introducing a canonicalizing transform which converts all non-i8 GEPs to i8 geps. Doing so, initially under a flag, should rapidly expose issues where we are relying on the type of the gep for optimization hints. Identifying and fixing these seems like the largest technical risk for this proposal - e.g. maybe delinearization can’t be easily rewritten - and front loading that work seems very worthwhile.

Only once we have been canonicalizing to getelementptr i8 for a while should we bother to add/redefine/replace the existing instruction.

Part of the reason for advocating this work order is that I am not entirely sure the proposal is going to work out. I’ve given this some thought, and I had been personally leaning towards something which made legal addressing modes (on a per target basis) explicitly part of the addressing. (Yes, that’s ugly. That’s why I haven’t proposed it yet.) I think it’s worth a serious attempt, and I’d love to be wrong, but I also want to front load the technical risk as much as possible.

Suresh_M · March 17, 2023, 8:13am

PtrAdd would complicate Structure Analysis and data layout optimizations.

To add to what efriedma-quic was already discussing!!!

Opaque pointers have already complicated structure analysis and data layout optimizations. They removed the (luxurious) high-level constructs (with type info) from the IR and made it closer to low-level or machine code. Typed pointers, to a large extent, made pointer types explicit, and ‘bitcast’ instructions gave a clear picture of type casts, i.e. of objects being used as a different type (in a disciplined way).

‘Geps’ are abstract and disciplined ways of accessing structures, and they definitely help in structure analysis and data layout optimizations. Even in the opaque world, ‘Geps’ currently provide type information via ‘getSourceElementType()’; analysis would get further complicated without this minimal information.

Pointer types of structure fields, function prototypes with exact pointer types, global variables with exact pointer types, ‘bitcast’ instructions, alloca pointer types, etc. (a) were all assets earlier in the typed world. Now in the opaque world, figuring out (a) would require a lot of analysis effort and compile time. These issues complicate structure analysis.

Further removal of ‘Geps’ which accesses the first field of a structure as mentioned in

complicates the analysis.

Lastly this feature of PtrAdd would make the structure analysis further complicated with no type information and the discipline/abstraction of GEP.

jrtc27 · March 17, 2023, 9:34am

The types on pointers had no real meaning, so analyses that exploited that extra information were (subtly) broken; it was not a “high level construct”.

Similarly I don’t think the types in GEPs matter, they just aid in performing the arithmetic?

So I don’t think you lose any semantically meaningful information.

nhaehnle · March 17, 2023, 9:57am

Yes indeed.

However, the pragmatic truth is that LLVM is used very widely, including in situations where the structural information does matter, e.g. in certain places in GPU compilers. So, lots of people ended up using GEPs in ways where the types do matter. That was never entirely kosher but it happened to work, and so that’s where we are, and I do think we need a sort of project-wide guidance for how to deal with it.

For LLPC (the AMD shader compiler), I’ve been thinking recently that we would likely want to introduce an explicit sgep (structural GEP) operation that is used for situations where the structural information matters and you can’t just replace the sgep by a pointer addition. In our case, such situations exist because graphics APIs have opaque objects whose high-level representation doesn’t have a fixed size because the size may depend on the hardware generation or on the chosen wave width (number of vector lanes).

It’s very likely that we’ll move ahead with an sgep in LLPC some time this year, but it would be even better to work with upstream on this sort of problem.

krzysz00 · March 17, 2023, 9:10pm

I’ve been poking at weird GPU pointer types over in Representing buffer descriptors in the AMDGPU target - call for suggestions.

Because there’s no way to opt a pointer type out of the invariant that, for example

%q = gep {i32, i32}, ptr addrspace(A) %p, i32 2, i32 1
%q2 = gep i8, ptr addrspace(A) %p, i32 20
%q3 = gep i32, ptr addrspace(A) %p, i32 5
assert(memLoc(%q) == memLoc(%q2) == memLoc(%q3))

, that is, there’s no way to make a pointer where the structure of pointer arithmetic matters, I haven’t been able to represent AMDGPU’s structured buffers as pointers in a reasonable way … and the representation of raw buffers I’m planning is a hack that uses the overly strong “non-integral” semantics to prevent something from breaking my struct{buffer resource, offset} that’s hiding as a pointer type until late in the IR.

So, yeah, given that we have these architecture features, some notion of “this is a type of pointer where the indexing operations are not ‘move around an array of bytes’” is probably quite a good idea.

(especially for the structured buffers, where we’d ideally want the IR-level value to desugar to struct {ptr addrspace(8) buffer, i32 index, i32 offset}, where current GEP doesn’t let you express which level of that indexing hierarchy you’re targeting)

Suresh_M · March 20, 2023, 7:20am

The types on the pointers would mostly correspond to the types actually written in input source code (high level language). This would definitely help anyone visually inspecting the IR.

I agree that we cannot blindly rely on these types, but the analysis to prove the types of pointers or objects would be far simpler. Explicit bitcast instructions are the exact places where the pointers could change tracks.

For example, if we have a ‘struct node’ in a program, and if we prove that

No other type/object is bitcast to pointer to ‘struct node’
Note: Just exceptions like Malloc/Calloc/Realloc have to be treated specially
and an object of type ‘struct node’ is not cast to any other object
Similarly, we need to take care of bitcasts of pointers with multiple indirection as well.

Then we can prove that all instances of pointer to ‘struct node’ in the program are indeed pointers to ‘struct node’.

Suresh_M · March 20, 2023, 8:17am

The abstraction and discipline of GEP is helpful in structure analysis.
For example, say we have a ‘struct node’ in a program instantiated as an array of structures.
A pointer to struct node is guaranteed to point to the beginning of the structure as long as there are no bitcast violations. Generally there would be two types of GEP instructions operating on structures,

GEP on structure pointer (increment/decrement in multiple of structure size)
GEP on structure pointer to get individual fields (simple offset arithmetic)
(Care has to be taken that address of fields are not used as pointer to arrays).

With this above picture there would be no confusion that structure pointer always points to the beginning of a structure. A Structure pointer would always be a multiple of structure size. No optimizations usually disturb this picture.

But with PtrAdd this sort of simplicity of view and analysis could be lost.
It could take a lot of effort to understand which pointer is a structure pointer and which pointer is accessing which field. After performing complex optimizations on the IR, this could become a complicated task.

In some cases it could happen that after optimizations we may not be able to prove that a structure pointer would be a multiple of structure size due to variable bounds whose value is not known at compile time.

jdoerfert · March 20, 2023, 5:26pm

There is no such thing in LLVM-IR, as far as I can tell.
%struct.S* %p does not mean %p points to the beginning of a struct, it never did.
All it means is that if we do address computations with %p, we assume it points to a struct.
What is really at that location, if anything, is irrelevant and casting a pointer to/from a struct is fine. Making conclusions based on the type has often been conceptually wrong.

The absence of current passes to change some perceived invariant is not a guarantee. Since we won’t guarantee the conditions in the IR, depending on them is ill advised.

If you want to analyze “structs”, or any memory really, use byte-based reasoning and determine effective type size (not type) by the access sizes. To try that out, run AAPointerInfo in the Attributor.
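As a toy picture of byte-based reasoning, the Python sketch below groups accesses relative to one base pointer by (offset, size) and checks bins for overlap. This is only an illustration of the idea; the function names and the tuple encoding are assumptions and do not reflect AAPointerInfo's actual data structures or API.

```python
from collections import defaultdict


def access_bins(accesses):
    """Group accesses by (offset, size), ignoring any IR types.

    accesses: iterable of (byte_offset, size_in_bytes, kind) tuples,
    all relative to the same base pointer.
    """
    bins = defaultdict(list)
    for offset, size, kind in accesses:
        bins[(offset, size)].append(kind)
    return dict(bins)


def may_overlap(a, b):
    """Two (offset, size) bins overlap if their byte ranges intersect."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


BINS = access_bins([(0, 4, "load"), (4, 4, "store"), (0, 4, "store")])
```

Here the "fields" of the object fall out of the access pattern (two 4-byte bins at offsets 0 and 4) without consulting any struct type.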
