[RFC] Replacing getelementptr with ptradd

I like this proposal for LLVM’s role in code generation for a C-like machine model and I think it should be done.

However, LLVM is in practice used for more than that. We have DXIL and SPIR-V backends. I am less familiar with DXIL, but SPIR-V is very clear about the fact that many of its pointers are “logical” and we cannot in fact treat them as just using byte offsets. Of course, this is already a problem today, but explicitly moving to a ptradd is likely going to make it worse. So that part feels like it needs a bit more thought.

In step 3, what does adding ptradd actually mean? In terms of the C++ representation, it feels like ptradd can just be an i8 GEP; and a PtrAddInst class might be added that is literally a GEPInst with an additional isa-check. (I suppose there are some minor details around inrange.) The flag would then really be about making non-i8 GEPs illegal and using the appropriate auto-upgrades and builder behavior for that.

Right, this is the same problem as with opaque pointers, and I expect that it will be possible to solve it in a similar way:

The DXIL/SPIRV backends already perform some kind of pointer type “guessing” (because the output IR has pointee types), and getelementptr instructions can be rewritten in terms of those guessed types. If we have a ptradd ptr %p, i64 4 and the type guessing determines that the ptr is a i32*, then the ptradd can be rewritten into getelementptr i32, i32* %p, i64 1.

A potential complication I can foresee here is that the absence of getelementptr types makes the type guessing itself less reliable. E.g. if you see three i32 loads at the appropriate offsets, you could “guess” a {i32, i32, i32}* type, or a [3 x i32]* type, or a i32* type from that and construct GEPs for any of those. I don’t know to what degree the precisely chosen type matters here, as long as it is reasonably “nice”.

I’d definitely appreciate feedback from SPIRV/DXIL maintainers on how feasible this is. It would also be interesting to know whether the trivial fallback (bitcast + i8 GEP + bitcast) works for these targets – that is, is producing a “nice” GEP just a matter of code quality, or of correctness?

It’s worth mentioning that we already do have some key transforms that will always produce i8 GEPs, such as SROA or SCEV, so I expect these backends already need some way to deal with this…

Not exactly what I had in mind, but what you suggest does sound reasonable to me.

1 Like

I’d definitely appreciate feedback from SPIRV/DXIL maintainers on how feasible this is. It would also be interesting to know whether the trivial fallback (bitcast + i8 GEP + bitcast) works for these targets – that is, is producing a “nice” GEP just a matter of code quality, or of correctness?

I can’t speak authoritatively for all use cases, but my understanding is that if the source is a compute kernel, then things should generally just work (especially given target extension types should suffice to solve most of the necessary value-tracking issues). There might be an issue if i8-addressable memory is not available, but this is so far out of my wheelhouse that I can’t do anything more than speculate (and LLVM today has issues with non-i8-addressable targets).

A potential complication I can foresee here is that the absence of getelementptr types makes the type guessing itself less reliable. E.g. if you see three i32 loads at the appropriate offsets, you could “guess” a {i32, i32, i32}* type, or a [3 x i32]* type, or a i32* type from that and construct GEPs for any of those. I don’t know to what degree the precisely chosen type matters here, as long as it is reasonably “nice”.

GEPs are a critical input for type scavenging. In the longer term, it does seem as if SPIR-V will be gaining an untyped pointer extension, with compute kernels eventually moving to it, obviating the need for the type scavenger. Given the ability to pun pointers with bitcasts, it’s probably not too damaging if all GEPs become i8. But I think GEP types are useful enough to keep around (especially for array accesses!) that it’s worth trying to spend some effort keeping them around.

I assume the intent here is AllocaInst::getAllocatedType() and GlobalValue::getValueType() are also eventually going away?

Honestly, this concerns me more than ptradd, since it drastically reduces the ability to correctly guess type information where it’s necessary.

I think the general goal here is correct, but I want to suggest a change of implementation order.

Unless I’m missing something in the proposal, the ptradd is equivalent to a getelementptr i8. Given this, I think adding the new instruction should be our absolutely last step, not our first. All of the questions around canonicalization and optimization effectiveness can be evaluated with the current representation.

I advocate for introducing a canonicalizing transform which converts all non-i8 GEPs to i8 geps. Doing so, initially under a flag, should rapidly expose issues where we are relying on the type of the gep for optimization hints. Identifying and fixing these seems like the largest technical risk for this proposal - e.g. maybe delinearization can’t be easily rewritten - and front loading that work seems very worthwhile.

Once once we have been canonicalizing to getelementptr i8 for a while should we bother to add/redefine/replace the existing instruction.

Part of the reason for advocating this work order is that I am not entirely sure the proposal is going to work out. I’ve given this some thought, and I had been personally leaning towards something which made legal addressing modes (on a per target basis) explicitly part of the addressing. (Yes, that’s ugly. That’s why I haven’t proposed it yet.) I think it’s worth a serious attempt, and I’d love to be wrong, but I also want to front load the technical risk during as much as possible.

4 Likes

PtrAdd would complicate Structure Analysis and data layout optimizations.

To add to what efriedma-quic was already discussing!!!

The opaque pointers have already complicated Structure Analysis and data layout optimizations. Opaque pointers has removed the (Luxurious) high level constructs (with typeinfo) in IR and is making it closer to low level or machine code. Typed pointers to more extent was making pointer types explicit and ‘bitcast’ instructions were providing a ease picture of type cast or using objects in different (in disciplined) typed mode.

‘Geps’ are abstract and disciplined ways of accessing structures, these definitely helps in Structure Analysis and data layout optimizations. ‘Geps’ currently even in Opaque world are providing type information ‘getSourceElementType()’, analysis would get further complicated without this minimal information.

Pointer types of structure fields, function prototypes with exact pointer type, global variables with exact pointer types, ‘bitcast’ instructions, alloca pointer types, etc (a) —were all assets earlier in typed world. Now in the opaque world figuring out (a) would require lots of analysis effort and compile time. These issues complicate the Structure analysis.

Further removal of ‘Geps’ which accesses the first field of a structure as mentioned in

complicates the analysis.

Lastly this feature of PtrAdd would make the structure analysis further complicated with no type information and the discipline/abstraction of GEP.

The types on pointers had no real meaning, so analyses that exploited that extra information were (subtly) broken; it was not a “high level construct”.

Similarly I don’t think the types in GEPs matter, they just aid in performing the arithmetic?

So I don’t think you lose any semantically meaningful information.

4 Likes

Yes indeed.

However, the pragmatic truth is that LLVM is used very widely, including in situations where the structural information does matter, e.g. in certain places in GPU compilers. So, lots of people ended up using GEPs in ways where the types do matter. That was never entirely kosher but it happened to work, and so that’s where we are, and I do think we need a sort of project-wide guidance for how to deal with it.

For LLPC (the AMD shader compiler), I’ve been thinking recently that we would likely want to introduce an explicit sgep (structural GEP) operation that is used for situations where the structural information matters and you can’t just replace the sgep by a pointer addition. In our case, such situations exist because graphics APIs have opaque objects whose high-level representation doesn’t have a fixed size because the size may depend on the hardware generation or on the chosen wave width (number of vector lanes).

It’s very likely that we’ll move ahead with an sgep in LLPC some time this year, but it would be even better to work with upstream on this sort of problem.

I’ve been poking at weird GPU pointer types over in Representing buffer descriptors in the AMDGPU target - call for suggestions.

Because there’s no way to opt a pointer type out of the invariant that, for example

%q = gep {i32, i32}, ptr addrspace(A) %p, i32 2, i32 1
%q2 = gep i8, ptr addrspace(A) %p, i32 20
%q3 = gep i32, ptr addrspace(A) %p, i32 5
assert(memLoc(%q) == memLoc(%q2) == memLoc(%q3)

, that is, there’s no way to make a pointer where the structure of pointer arithmetic matters, I haven’t been able to represent AMDGPU’s structured buffers as pointers in a reasonable way … and the representation of raw buffers I’m planning is a hack that the overly strong “non-integral” semantics to prevent something from breaking my struct{buffer resource, offset} that’s hiding as a pointer type until late in the IR.

So, yeah, given that we have these architecture features, some notion of “this is a type of pointer where the indexing operations are not ‘move around an array of bytes’” is probably quite a good idea.

(especially for the structured buffers, where we’d ideally want the IR-level value to desugar to struct {ptr addrspace(8) buffer, i32 index, i32 offset}, where current GEP doesn’t let you express which level of that indexing hierarchy you’re targeting)

The types on the pointers would mostly correspond to the types actually written in input source code (high level language). This would definitely help anyone visually inspecting the IR.

I agree that we cannot blindly rely on these types, but the analysis to prove the types of pointers or objects would be far more simpler. Explicit Bit cast instructions are the exact places where the pointers could change tracks.

For example if we have a ‘struct node’ in a program , and if we prove that

  1. No other type/object is bit casted to pointer to ‘struct node’
    Note: Just exceptions like Malloc/Calloc/Realloc have to be treated specially

  2. and an object of type ‘struct node’ is not type casted to any other object

  3. Similarly we need to take care of bit casts of pointers with multiple indirection as will.

Then we can prove that all instances of pointer to 'struct node In the program are indeed pointers to ‘struct node’

The Abstraction and discipline of GEP is help full in structure analysis.
For example if we have a ‘struct node’ in a program instantiated as array of structures.
A pointer to struct node is guaranteed to point to beginning of the structure as long as there are no bit cast violations. Generally there would be two type of GEP instructions operating on structures,

  1. GEP on structure pointer (increment/decrement in multiple of structure size)
  2. GEP on structure pointer to get individual fields (simple offset arithmetic)
    (Care has to be taken that address of fields are not used as pointer to arrays).

With this above picture there would be no confusion that structure pointer always points to the beginning of a structure. A Structure pointer would always be a multiple of structure size. No optimizations usually disturb this picture.

But with PtrAdd this sort of simplicity of view and analysis could be lost.
It could take a lots of effort to understand which pointer is a structure pointer and which pointer is accessing which field. After may be performing complex optimizations on IR this could become a complicated task.

In some cases it could happen that after optimizations we may not be able to prove that a structure pointer would be a multiple of structure size due to variable bounds whose value is not known at compile time.

There is no such thing in LLVM-IR, as far as I can tell.
%struct.S %ptr does not mean %p points to the beginning of a struct, it never did.
All it means that if we do address space computations with %p, assume it does point to a struct.
What is really at that location, if anything, is irrelevant and casting a pointer to/from a struct is fine. Making conclusions based on the type has often been conceptually wrong.

The absence of current passes to change some perceived invariant is not a guarantee. Since we won’t guarantee the conditions in the IR, depending on them is ill advised.

If you want to analyze “structs”, or any memory really, use byte-based reasoning and determine effective type size (not type) by the access sizes. To try that out, run AAPointerInfo in the Attributor.

From the perspective of the DirectX backend (and I suspect probably SPIR-V too), this transformation will be much easier if the existing GEP instruction remains present and supported in the IR. DXIL is LLVM 3.7 IR, so we need to generate GEPs. Generating GEPs from ptradd instructions probably isn’t too terrible.

One thing I would like to point out is that while LLVM IR does get a lot of ABI-ness baked into it, there’s also a lot that still isn’t baked in. I’ve absolutely worked on production LLVM-based tools that have done evil, horrible things with DataLayouts which just kinda work with GEP, and will absolutely not work with ptradd without additional IR updates.

In GPU compilers it is more common that I’d like to admit to have LLVM IR-based intermediate representations that are target-agnostic-ish, and have GPU drivers lower that IR to target-specific IR. In that transformation data layouts can (and sometimes do) change.

None of that is insurmountable. It could be mitigated by having good utilities for converting pointer address math into GEPs and back out. That would allow data layout conversions to run by converting to GEPs and back.

Once a Structure node is allocated a through a Alloca / Global Var /Malloc_call that would be the start addresses of the structure. Now if these addresses are assigned to node structure pointers and there are no creepy bit casts on these addresses
(which needs to be proven through analysis), all the GEPs operating on node pointers would definitely maintain the consistency.

Please refer https://llvm.org/pubs/2003-09-30-LifelongOptimizationTR.pdf
section 2.2 Language-independent Type Information, Cast and GetElementPtr which throws some light on these aspects.

I do agree that llvm does not guarantee anything about structure type or other invariant information. But the only point is that the analysis to check these aspects would have been far more simpler with GEPs and the typed pointer world.

LLVM has no pointer types that are meaningful the way you hope/pretend they are.
No 20y old document will change that and the fact that we have opaque types should make it clear.

If you now argue that a pointer to an allocation is of the type of the allocation and this can be used for analysis, I can respond in two ways:

  1. There is no type of an allocation that is meaningful in the way you want it to be. Neither alloca, globals, nor malloc carry a meaningful type in the IR. The type of the former two is just interesting for size and alignment purposes, not to be conflated with what the memory is actually used for.
  2. If you already do analysis, what does the type buy you? At the end of the day you would follow and collect all offsets and check the access type. Intermittent bit casts are harmless, the effective type is interesting. That said, you cannot even use access types (load/store) to judge as we translate from float to i32, from anything to i8 accesses, and maybe the other way around. All you have is the fact there is an allocation of a certain size, there are offsets of certain bytes, and there are accesses of a given size. The type part of the access and GEP is really not helping with figuring this out, and, as has been mentioned multiple times, the type part can be misleading anyway, e.g., due to changes by the compiler.

Finally, you might want to provide an actual argument why it would be “far more simpler”, especially after we got rid of typed pointers in trunk. The beginning of an allocation (your first point) is the same in both worlds (w/ and w/o typed GEPs). gep i32, { i32, i32 }* %p, i32 0, i32 1 is the same as gep i8, i8* %p, i32 4, no need for the struct type. etc.

This seems like a good way to determine the effect in an easy way, e.g., in the IR reader.

1 Like

Generally speaking (not llvm in this context) structure types in C language would be preferred by a programmer because of its simplicity which holds multiple basic data types in it. It is not just size and alignment, but it is also language constructs which helps him access these structures hiding the layout as to which field at which offset. The programmer could rather allocate a chunk of memory (*i8) or array and try to keep multiple fields in it and track which field is at which byte offset himself, but that would be terribly complicated right!. Simplicity is the one which has driven the compiler world where the languages have grown like

   1) machine code to assembly, 
   2) assembly to IR, 
   3) IR to high-level 
   4) and so on.

And I believe that 20y old beautiful doc was highlighting these special features which no other IR was providing.

Answering 1 and 2

  1. Struct Type allocation is not just about size & alignment but its layout as well which field at which offset. And the abstract constructs (Geps) which help us to access them.
  2. Once Struct Type and its accesses is ascertained it gives us ways to change its layout may be
    a) reorder fields,
    b) remove dead fields,
    c) change fields size or type,
    etc…

Collecting offsets and painting the picture of structure access would be complicated in opaque pointer world or with PtrAdds. Intermittent bit casts would complicate the process (may be like how far you would want to go to understand the things). I do agree that it may or would be possible with PtrAdds tracking the offset and painting the picture but that would be complicated than what we can do in the typed world.

I have attached a sample test case SM1.c and the two IRs

  1. typed IR
  2. Opaque IR

In the typed IR

%14 = call noalias i8* @calloc(i64 noundef %13, i64 noundef 24) #13
%15 = bitcast i8* %14 to %struct.Node1*

Can be treated as a single instruction and this gives a clear clue that ‘calloc’ is allocating an object of type %struct.Node1.

Same applies to the following,

%48 = bitcast %struct.Node1* %15 to i8*
call void @free(i8* noundef %48) #11

Apart from the above there are no
a) bit cast instructions to %struct.Node1*
b) bit cast instructions from %struct.Node1*

Giving a clear clue that %struct.Node1* are accessed as %struct.Node1* itself.
Now any GEP instruction in the entire program operating on %struct.Node1* would be consistent,
Consistent meaning,

  1. %struct.Node1* would be incremented or decremented only in multiples of %struct.Node1 size, when operated through GEP

  2. Any access to fields would only be through GEP instruction only.
    (Note care has to be taken to ensure that pointers to fields are not used as arrays)
    And the GEP would readily provide the types of fields as well.

Here the analysis has not traversed the complete path of how %struct.Node1* flows but has still able to quickly analyze its discipled access in this case. Further here we are not doing lots of book keeping for,

a) which instruction or pointer is of which type, 
b) which function has a pointer argument of %struct.Node1*,
c) which global is of %struct.Node1*, 
d) which calloc is of %struct.Node1*,
c) which pointer field in structure is of %struct.Node1*      


Following picture readily is available in the IR,

{
    %struct.Node1 = type { i32, i64, %struct.Node1* }
    define internal i32 @func_check(%struct.Node1* nocapture noundef readonly %0, i32 noundef %1)
    define internal void @func(%struct.Node1* nocapture noundef writeonly %0, i32 noundef %1)
    define internal void @func_1(%struct.Node1* noundef %0, i32 noundef %1)
    define internal i32 @func_1_check(%struct.Node1* noundef readonly %0, i32 noundef %1) 
    define internal i32 @func_12_check(%struct.Node1* noundef readonly %0, i32 noundef %1)
}

All we would be doing is that just confirm or a certain that whatever information is present in IR is correct (no need of any book keeping). With this consistent picture of disciplined access it should be easier to change the layout of the %struct.Node1.

In the Opaque IR

Here it would not be readily evident that

%13 = call noalias ptr @calloc(i64 noundef %12, i64 noundef 24) #13

is allocating a %struct.Node1* .

There is not explicit bit cast instructions any were, we have to traverse all the path of flow of %13 to check if it is accessed in %struct.Node1* way. We need to ensure that no other object get mixed up with the usage path of %13. Thank god at least we still have GEPs which gives some indication that it is operating on %struct.Node1* 's Protocol.

Now we need to do book keeping on ,

a) which instruction or pointer is of which type, 
b) which function has a pointer argument of %struct.Node1*,
c) which global is of %struct.Node1*, 
d) which calloc is of %struct.Node1*,
c) which pointer field in structure is of %struct.Node1*  …..

Since all these information will not be readily available as in opaque world.

{
%struct.Node1 = type { i32, i64, ptr }
define internal i32 @func_check(ptr nocapture noundef readonly %0, i32 noundef %1)
define internal void @func(ptr nocapture noundef writeonly %0, i32 noundef %1)
define internal void @func_1(ptr noundef %0, i32 noundef %1)
define internal i32 @func_1_check(ptr noundef readonly %0, i32 noundef %1)
define internal i32 @func_12_check(ptr noundef readonly %0, i32 noundef %1)
}

Once PTRAdd comes in may be guessing which structure types are active also would be like moving in a dark room to examine something !
I felt that opaque pointers and PtrAdds could have been implemented as an optimization pass and could have pitched in at some final stages of compilation or LTO stage!. Rather than being a default from the start or front end.

SM1.c (2.8 KB)
temp_op_ll.c (11.2 KB)
temp_ty_ll.c (13.1 KB)

I’m not super up to speed on LLVM’s infrastructure, but aren’t “so, what type has been allocated here/is being pointed to/…?” questions what the type-based alias analysis hints are for?

Yes, but you can’t rely on those. And it’s only really for C’s notion of disallowing type aliasing

Regarding the type-based alias analysis hints.
Why not approach it from:

  1. I don’t know what type this value is.
  2. Take the alias analysis hint, and test whether it is true, and if so use it as its type.
  3. If the test turns out to the false. i.e. the hint looks wrong, then make inferences as needed, or skip optimizations in these cases.
    I think there was another thread in this area:
    [RFC] Better support for typed pointers in an opaque pointer world

(Disclaimer: I’m an LLVM n00b, so this might be wrong.)

My feeling here is that the middle-end would be much easier forcing the separation, and only merging again in the back-end.

I was looking at code that, roughly, needed a + (b-a) to keep AllocIds correct, and ended up opening Collapse `x/N*N` → `x` when the multiplication is inside a GEP and the division exact · Issue #62124 · llvm/llvm-project · GitHub

That’s the kind of missing optimization that can’t happen if there’s just ptradd without including the multiplication. The optimizer obviously handles removing udiv exact+mul for normal integers (and has done so for ages), so if the address calculation logic just does that, there’s no extra logic needed. But if the multiplication exists inside the addressing, then that’s a separate case that needs to be handled.

(AKA “The lack of a canonical representation regularly leads to optimization failures.”, as was said in the OP.)

I’m coming to this very late, sorry (I broke my shoulder around the time this was proposed and am still catching up on LLVM things), but I enthusiastically support this proposal (I wanted precisely this when opaque pointers were introduced). It gives a clearer canonical form (if you do two different GEPs that end up pointing to the address, how do you tell that they alias? You compute the thing that this gives you already), it maps clearly to a single-provenance model, and it removes the single hardest thing in the IR to explain to new people.

1 Like