[RFC] Adding instructions to to carry GEP type traversal information

Credit note: multiple persons form the SPIR-V backend meeting provided input to write this RFC such as @michalpaszkowski @MrSidims

Motivation

While ptradd migration simplifies pointer arithmetic and enables optimizations by explicitly materializing offset calculations, it discards the high-level, type-based path information inherent to the GEP instruction. For targets like SPIR-V, this causes 2 challenges:

  • SPIR-V requires a structured path to generate instructions like OpAccessChain; this information is critical and lost with ptradd.
  • The memory layout of some objects is unknown at compile time, meaning using ptradd/a byte offset is not viable.

Without the context of how an address is derived through nested structures and arrays, the backend can no longer reliably reconstruct the frontend’s original intent, posing a major correctness and functionality problem.

For a more detailed description of the issue, see this comment, but the root is that some linearized access expressions cannot be lifted back to a structured access as multiple, but incompatible, valid solutions exist.

The only robust way to solve this problem is to get additional information from the frontend. At a high-level, there are 2 main ways we could add this information:

  • replacing the ptradd/gep instruction with an instruction carrying structure information.
  • adding sideband information (metadata, interleaved instructions, …)

Proposed solution

We propose that Clang emits BPF’s llvm.preserve.*.access.index instructions instead of ptradd/GEP when targeting a backend requiring structured access (such as SPIR-V). As of today, only the HLSL frontend would require this change.

In order to represent array access with dynamic indices, we would require relaxing the llvm.preserve.array.access.index instruction to accept non-constant indices.

With those 2 changes, we should only get structured access into nested types, and would be able to generate valid OpAccessChain from LLVM-IR.

The main drawback is the lack of optimizations: since GEP are not used, we won’t benefit from existing GEP optimizations. This is not only an issue for performance, but also for legalization: for example, removing local copies of non-tangible types.

Most passes like inlining, DCE or sinking should not be impacted by those, but we would require adding support for those intrinsics to at least the SROA pass.

Alternatives

Add new target-specific intrinsics interleaved with GEP instructions

%ptr = getelementptr i8 %base_ptr, i32 %offset
%tmp = spv.type.access(%ptr, %base_ptr, %TheType poison, i32 index) 
%val = load i32, ptr %tmp

This would be similar to the proposed solution, except we’d use a target-specific intrinsic.
By not reusing the existing instructions, we’ll have to extensively modify Clang’s codegen to emit those intrinsics in place of GEP instructions when targeting SPIR-V. I see no benefit, except if the community wants us to stay away from the existing instructions.

Annotate GEP instructions with metadata.

  • This is a no-go.
  • passes are allowed to strip metadata they don’t know about.
  • passes rewriting GEP would not have to know about those to properly merge/update the information.

Add a new target-specific intrinsic, interleaved but without return value.

%ptr = getelementptr i8 %base_ptr, i32 %offset
spv.type.access(%ptr, %base_ptr, %TheType poison, i32 index) 
%val = load i32, ptr %ptr

I think this has the same issues as the metadata: the passes will have no knowledge of the semantic carried by this intrinsic, and could replace the GEP used by the load without updating the annotation.

This is very much related to @sebpop ‘s work for DependenceAnalysis and Delinearization. In this proof-of-concept patch, he emits array dimensions info from alloca and global variables in the front-end in the form of assume intrinsics. See the test cases in that patch, but to give a quick impression, it would emit just one assume:


call void @llvm.assume(i1 true) [ "array_info"(ptr %arr, i64 2, i64 10, i64 20, i64 4) ]

for a test case like this:

void test_2d_loop_access(int n, int m) {
  int arr[10][20];
  for (int i = 0; i < n && i < 10; i++) {
    for (int j = 0; j < m && j < 20; j++) {
        arr[i][j] = i * 20 + j;
        sink = arr[i][j];
    }
  }
 external_use(arr);
}

This is exactly the same problem Delinearization needs to solve:

but the root is that some linearized access expressions cannot be lifted back to a structured access as multiple, but incompatible, valid solutions exist.

I am broadly in favor of this proposal, in the sense that we need some way to represent structured GEPs across backends.

Importantly, this is addressing a correctness problem. There are use cases that either cannot use offsets (because they are unknown) or because the underlying “hardware” does not work on offsets. As such @sjoerdmeijer this is nearly entirely unrelated to the delinearization issue, which is about an optimization issue. The solutions for these are going to be very different.

Having said that, I’m not sure that directly reusing the BPF llvm.preserve.*.access.index intrinsics is a good idea. I’m concerned by the degree that these still depend on LLVM IR types, using elementtype to encode the struct type for example. This is not good if you have a struct-like type, which does not actually have the layout of an LLVM struct. The array index representation is also quite peculiar.

What I’d personally like to see is something generic along the lines of:

ptr @llvm.structural.gep.p0(metadata !TypeDescriptor, ptr %p, i64 %i1, i64 %i2, ...)

Where the type description is done in metadata, independent of LLVM’s type system.

Combined with a generic constraint that notional over-indexing (and memory access outside the index) is forbidden, this allows things like SROA and AA to work without any actual understanding of the !TypeDescriptor.

Also cc @nhaehnle, who I think mentioned wanting to have a structural GEP for something in the past.

1 Like

What I’d personally like to see is something generic along the lines of:
ptr @llvm.structural.gep.p0(metadata !TypeDescriptor, ptr %p, i64 %i1, i64 %i2, ...)

This is also fine for us, and I’d personally prefer a single opcode to re-introduce a GEP-like instruction vs one per access type like BPF’s
I suppose metadata used in an intrinsic calls cannot be removed right?

Yes. Metadata in intrinsic arguments cannot be dropped.

1 Like

+1 for an llvm.structural.gep. We built something like it for our downstream compiler, having it in upstream would be useful for us.

For context, in our case it’s something that exists in the frontend part of our compiler because we consume SPIR-V and need a way to represent OpAccessChain for early transforms before we can lower it into the appropriate HW-specific constructs (which are often pointer arithmetic / ptradd, but not always).

1 Like

Would you mind sharing some more details of how this looks like in your downstream compiler? (I’m particularly interested in how you encode type information.)