Representing buffer descriptors in the AMDGPU target - call for suggestions

@jayfoad Thanks for the info on what’s going on with that one test.

I’ve added the auto-upgrade code as planned and pushed it up to the branch.

As far as I’m concerned, this is most of the codegen-side phase of the project, and the next steps would be implementing address space 7 in terms of these new address space pointers.

The main things to do further down the stack would probably be to create a PseudoSourceValue that holds %rsrc, %index, %offset, ... bundles for machine-level alias analysis purposes and then to actually do more complete alias analysis on buffer operations.

I’m also (having also seen @nhaehnle’s comments over on [RFC] Replacing getelementptr with ptradd - #15 by jrtc27 as well) thinking that what may be useful - instead of marking these addrspace(8) pointers non-integral, we could mark them … “non-linear” or “non-flat”, but that’s a separate discussion.

Also, since I’m here, why was s.buffer.load a ReadNone intrinsic before?

Making assumptions about graphics workloads and fear of losing optimizations. Could always have done callsite attributes instead

callsite attributes?

(In this branch, I make s_buffer_load use a memory read)

If you wanted to hack and make it always readnone, you can apply it to the specific instance’s callsite without applying it to the definition / declaration

Makes sense, I just don’t know who’s relying on that behavior - doesn’t seem to be any of our tests, for one thing

All of the old weird things may be relied on by Mesa

Would you know how to test that?

Also, should I go ahead and reorganize the branch into a few commits and put them up on Phabricator?

Vaguely, run piglit? Don’t know if that actually covers performance

Yes

The revisions are

⚙ D145441 [AMDGPU] Define data layout entries for buffers - updating the data layout string
⚙ D146761 [AMDGPU] Make buffer intrinsics pointer-valued - making intrinsics pointer-valued
⚙ D146762 [Verifier] Ban GEP, load, store of addrspace(8) on AMDGPU - updating the verifier

I’ve got a vague recollection that it is tagged as readnone because alias analysis won’t work for buffer operations, and s_buffer_load is only really used with const buffers.

I agree it’s not great, and also that we should move to callsite attributes if that’s still required.

Presumably the work to properly represent buffer descriptors will mean that alias analysis will work correctly anyway?

We could run some performance analysis with Vulkan to see if there’s a regression as a result of not marking it readnone.

One unfortunate thing I found from CI runs on those revisions is that - at the very least - MLIR won’t send intrinsic calls it’s creating through auto-upgrade, since it just pulls the new function definition. So not breaking clients will be tricky and might involve use of defm to create <4 x i32> and ptr addrspace(8) versions of the intrinsics or something … not sure exactly what we want to do here.

Users have to be updated in some way or another. I thought we already had type mangling for the resource descriptor for <8 x i32> vs <4 x i32>, but it looks like the new intrinsics hardcode to 4x. If the resource descriptor were type mangled, you wouldn’t need to have a separately named intrinsic. If you want to do it in 2 steps it might help to introduce type mangling for the descriptor

I don’t think any of the buffer intrinsics have type mangling? My understanding is that the “might be 8xi32, might be 4xi32” thing is only for texture descriptors on image intrinsics, which I’m not touching (though I’d be wiling to argue for ptr addrspace(9) for images, especially since they’ve got fun alignment properties).

My theory of updating users is that they could be transparently updated, since, as far as the underlying value is concerned, <4 x i32> -> i128 -> ptr addrspace(8) is a no-op and just some type casts that the compiler could do … and it looks like the compiler can automatically insert those when presented with IR using the old forms.

… and, having checked, the image intrinsics are hardcoded to <8 x i32> as well, so any type mangling is gone now? (which leaves me low on examples on how to make this change backwards-compatibly)

In positive news, my branch makes alias analysis work on buffer resource functions, at least in a basic capacity llvm-project/buffer-rsrc-ptrs.ll at buffer-improvements · krzysz00/llvm-project · GitHub

We might end up wanting a ptr addrspace(7/8) @llvm.amdgcn.pointer.to.resource(ptr addrspace(*) base, iN metadata) to help the analysis along, but that’s future us, I think

@arsenm Since you mentioned “type mangling” above … I can’t seem to find a mechanism that’d work for having an intrinsic that takes either v4i32 or ptr addrspace(8) values … especially since you can’t have a nocapture <4 x i32>, for example.

Trying to combine unrelated types this way is a bit exotic. TableGen probably won’t help you at all with this as-is

It would be good to avoid flag-day changes that break downstream users. In the past, we’ve sometimes added parallel (“new”) intrinsic definitions for this reason.

How important is nocapture, though? At least in the graphics use cases, nocapture on buffer pointers seems relatively useless because buffer pointers are de facto always passed in from outside anyway. And readonly-ness / memory effects can just be put on the function, they don’t have to be on the argument. So it seems like an “any type”-style mangling should work.

Pointer annotations could always be introduced to the intrinsic definitions after all users are migrated

My intuition is that nocapture is pretty important for alias analysis, so that the load/store/… looks like loads/stores/… .

On top of that, it’s not really meaningful to migrate from memory(read) to memory(argmem: read) with no pointer argument.

So we may want a “new” intrinsic and to migrate everyone over with deprecation warnings