[RFC] Replacing getelementptr with ptradd

From the perspective of the DirectX backend (and I suspect probably SPIR-V too), this transformation will be much easier if the existing GEP instruction remains present and supported in the IR. DXIL is LLVM 3.7 IR, so we need to generate GEPs. Generating GEPs from ptradd instructions probably isn’t too terrible.

One thing I would like to point out is that while LLVM IR does get a lot of ABI-ness baked into it, there’s also a lot that still isn’t baked in. I’ve absolutely worked on production LLVM-based tools that have done evil, horrible things with DataLayouts which just kinda work with GEP, and will absolutely not work with ptradd without additional IR updates.

In GPU compilers it is more common that I’d like to admit to have LLVM IR-based intermediate representations that are target-agnostic-ish, and have GPU drivers lower that IR to target-specific IR. In that transformation data layouts can (and sometimes do) change.

None of that is insurmountable. It could be mitigated by having good utilities for converting pointer address math into GEPs and back out. That would allow data layout conversions to run by converting to GEPs and back.

Once a Structure node is allocated a through a Alloca / Global Var /Malloc_call that would be the start addresses of the structure. Now if these addresses are assigned to node structure pointers and there are no creepy bit casts on these addresses
(which needs to be proven through analysis), all the GEPs operating on node pointers would definitely maintain the consistency.

Please refer https://llvm.org/pubs/2003-09-30-LifelongOptimizationTR.pdf
section 2.2 Language-independent Type Information, Cast and GetElementPtr which throws some light on these aspects.

I do agree that llvm does not guarantee anything about structure type or other invariant information. But the only point is that the analysis to check these aspects would have been far more simpler with GEPs and the typed pointer world.

LLVM has no pointer types that are meaningful the way you hope/pretend they are.
No 20y old document will change that and the fact that we have opaque types should make it clear.

If you now argue that a pointer to an allocation is of the type of the allocation and this can be used for analysis, I can respond in two ways:

  1. There is no type of an allocation that is meaningful in the way you want it to be. Neither alloca, globals, nor malloc carry a meaningful type in the IR. The type of the former two is just interesting for size and alignment purposes, not to be conflated with what the memory is actually used for.
  2. If you already do analysis, what does the type buy you? At the end of the day you would follow and collect all offsets and check the access type. Intermittent bit casts are harmless, the effective type is interesting. That said, you cannot even use access types (load/store) to judge as we translate from float to i32, from anything to i8 accesses, and maybe the other way around. All you have is the fact there is an allocation of a certain size, there are offsets of certain bytes, and there are accesses of a given size. The type part of the access and GEP is really not helping with figuring this out, and, as has been mentioned multiple times, the type part can be misleading anyway, e.g., due to changes by the compiler.

Finally, you might want to provide an actual argument why it would be “far more simpler”, especially after we got rid of typed pointers in trunk. The beginning of an allocation (your first point) is the same in both worlds (w/ and w/o typed GEPs). gep i32, { i32, i32 }* %p, i32 0, i32 1 is the same as gep i8, i8* %p, i32 4, no need for the struct type. etc.

This seems like a good way to determine the effect in an easy way, e.g., in the IR reader.

1 Like

Generally speaking (not llvm in this context) structure types in C language would be preferred by a programmer because of its simplicity which holds multiple basic data types in it. It is not just size and alignment, but it is also language constructs which helps him access these structures hiding the layout as to which field at which offset. The programmer could rather allocate a chunk of memory (*i8) or array and try to keep multiple fields in it and track which field is at which byte offset himself, but that would be terribly complicated right!. Simplicity is the one which has driven the compiler world where the languages have grown like

   1) machine code to assembly, 
   2) assembly to IR, 
   3) IR to high-level 
   4) and so on.

And I believe that 20y old beautiful doc was highlighting these special features which no other IR was providing.

Answering 1 and 2

  1. Struct Type allocation is not just about size & alignment but its layout as well which field at which offset. And the abstract constructs (Geps) which help us to access them.
  2. Once Struct Type and its accesses is ascertained it gives us ways to change its layout may be
    a) reorder fields,
    b) remove dead fields,
    c) change fields size or type,

Collecting offsets and painting the picture of structure access would be complicated in opaque pointer world or with PtrAdds. Intermittent bit casts would complicate the process (may be like how far you would want to go to understand the things). I do agree that it may or would be possible with PtrAdds tracking the offset and painting the picture but that would be complicated than what we can do in the typed world.

I have attached a sample test case SM1.c and the two IRs

  1. typed IR
  2. Opaque IR

In the typed IR

%14 = call noalias i8* @calloc(i64 noundef %13, i64 noundef 24) #13
%15 = bitcast i8* %14 to %struct.Node1*

Can be treated as a single instruction and this gives a clear clue that ‘calloc’ is allocating an object of type %struct.Node1.

Same applies to the following,

%48 = bitcast %struct.Node1* %15 to i8*
call void @free(i8* noundef %48) #11

Apart from the above there are no
a) bit cast instructions to %struct.Node1*
b) bit cast instructions from %struct.Node1*

Giving a clear clue that %struct.Node1* are accessed as %struct.Node1* itself.
Now any GEP instruction in the entire program operating on %struct.Node1* would be consistent,
Consistent meaning,

  1. %struct.Node1* would be incremented or decremented only in multiples of %struct.Node1 size, when operated through GEP

  2. Any access to fields would only be through GEP instruction only.
    (Note care has to be taken to ensure that pointers to fields are not used as arrays)
    And the GEP would readily provide the types of fields as well.

Here the analysis has not traversed the complete path of how %struct.Node1* flows but has still able to quickly analyze its discipled access in this case. Further here we are not doing lots of book keeping for,

a) which instruction or pointer is of which type, 
b) which function has a pointer argument of %struct.Node1*,
c) which global is of %struct.Node1*, 
d) which calloc is of %struct.Node1*,
c) which pointer field in structure is of %struct.Node1*      

Following picture readily is available in the IR,

    %struct.Node1 = type { i32, i64, %struct.Node1* }
    define internal i32 @func_check(%struct.Node1* nocapture noundef readonly %0, i32 noundef %1)
    define internal void @func(%struct.Node1* nocapture noundef writeonly %0, i32 noundef %1)
    define internal void @func_1(%struct.Node1* noundef %0, i32 noundef %1)
    define internal i32 @func_1_check(%struct.Node1* noundef readonly %0, i32 noundef %1) 
    define internal i32 @func_12_check(%struct.Node1* noundef readonly %0, i32 noundef %1)

All we would be doing is that just confirm or a certain that whatever information is present in IR is correct (no need of any book keeping). With this consistent picture of disciplined access it should be easier to change the layout of the %struct.Node1.

In the Opaque IR

Here it would not be readily evident that

%13 = call noalias ptr @calloc(i64 noundef %12, i64 noundef 24) #13

is allocating a %struct.Node1* .

There is not explicit bit cast instructions any were, we have to traverse all the path of flow of %13 to check if it is accessed in %struct.Node1* way. We need to ensure that no other object get mixed up with the usage path of %13. Thank god at least we still have GEPs which gives some indication that it is operating on %struct.Node1* 's Protocol.

Now we need to do book keeping on ,

a) which instruction or pointer is of which type, 
b) which function has a pointer argument of %struct.Node1*,
c) which global is of %struct.Node1*, 
d) which calloc is of %struct.Node1*,
c) which pointer field in structure is of %struct.Node1*  …..

Since all these information will not be readily available as in opaque world.

%struct.Node1 = type { i32, i64, ptr }
define internal i32 @func_check(ptr nocapture noundef readonly %0, i32 noundef %1)
define internal void @func(ptr nocapture noundef writeonly %0, i32 noundef %1)
define internal void @func_1(ptr noundef %0, i32 noundef %1)
define internal i32 @func_1_check(ptr noundef readonly %0, i32 noundef %1)
define internal i32 @func_12_check(ptr noundef readonly %0, i32 noundef %1)

Once PTRAdd comes in may be guessing which structure types are active also would be like moving in a dark room to examine something !
I felt that opaque pointers and PtrAdds could have been implemented as an optimization pass and could have pitched in at some final stages of compilation or LTO stage!. Rather than being a default from the start or front end.

SM1.c (2.8 KB)
temp_op_ll.c (11.2 KB)
temp_ty_ll.c (13.1 KB)

I’m not super up to speed on LLVM’s infrastructure, but aren’t “so, what type has been allocated here/is being pointed to/…?” questions what the type-based alias analysis hints are for?

Yes, but you can’t rely on those. And it’s only really for C’s notion of disallowing type aliasing

Regarding the type-based alias analysis hints.
Why not approach it from:

  1. I don’t know what type this value is.
  2. Take the alias analysis hint, and test whether it is true, and if so use it as its type.
  3. If the test turns out to the false. i.e. the hint looks wrong, then make inferences as needed, or skip optimizations in these cases.
    I think there was another thread in this area:
    [RFC] Better support for typed pointers in an opaque pointer world

(Disclaimer: I’m an LLVM n00b, so this might be wrong.)

My feeling here is that the middle-end would be much easier forcing the separation, and only merging again in the back-end.

I was looking at code that, roughly, needed a + (b-a) to keep AllocIds correct, and ended up opening Collapse `x/N*N` → `x` when the multiplication is inside a GEP and the division exact · Issue #62124 · llvm/llvm-project · GitHub

That’s the kind of missing optimization that can’t happen if there’s just ptradd without including the multiplication. The optimizer obviously handles removing udiv exact+mul for normal integers (and has done so for ages), so if the address calculation logic just does that, there’s no extra logic needed. But if the multiplication exists inside the addressing, then that’s a separate case that needs to be handled.

(AKA “The lack of a canonical representation regularly leads to optimization failures.”, as was said in the OP.)

I’m coming to this very late, sorry (I broke my shoulder around the time this was proposed and am still catching up on LLVM things), but I enthusiastically support this proposal (I wanted precisely this when opaque pointers were introduced). It gives a clearer canonical form (if you do two different GEPs that end up pointing to the address, how do you tell that they alias? You compute the thing that this gives you already), it maps clearly to a single-provenance model, and it removes the single hardest thing in the IR to explain to new people.

1 Like

Whilst we might need to check for users of ptradd to account for address folding, I think it’s worth bringing up that the current solution isn’t perfect either.

I’m currently looking into the cost modelling associated with GEPs, and in particular was poking around the getGEPCost TTI hook.

getGEPCost checks if the address can be folded into the load/store by calling the target’s isLegalAddressingMode with a type computed from the GEP’s pointee type and indices.
E.g. For %x = getelementptr <4 x i32>, ptr %base, i32 0, i32 4 getGEPCost will invoke isLegalAddressingMode with i32.

Targets that define isLegalAddressingMode use this type as the type of the load/store operation, which is used to estimate the instruction that will be used to a certain degree.

For example on RISC-V, vector loads and stores can’t have an offset folded into them, so it checks if this type is a vector or not to guess if what instruction this is for.

Before the advent of opaque pointers, the load/store type was probably the same as the pointee type.
This doesn’t always hold anymore though, as it’s very much possible to have:

%x = getelementptr f16, ptr %base
%y = load <4 x i32>, ptr %x

So the costs reported by getGEPCost may not even be accurate today.

In fact I think that by separating out the multiply/shift from GEPs, it might make costs more accurate in places.

On RISC-V, if the addressing mode requires both scaling and a base register offset, then 2 instructions are emitted (a shift and an add).
But currently such a GEP is costed as only one.
If we had the separate explicit multiply and ptradd instructions and they were each costed (priced?) individually, we would end up with a cost that’s closer to the generated code.

1 Like