@jayfoad Thanks for the info on what’s going on with that one test.
I’ve added the auto-upgrade code as planned and pushed it up to the branch.
As far as I’m concerned, this is most of the codegen-side phase of the project, and the next steps would be implementing address space 7 in terms of these new address space pointers.
The main things to do further down the stack would probably be to create a PseudoSourceValue that holds %rsrc, %index, %offset, ... bundles for machine-level alias analysis purposes and then to actually do more complete alias analysis on buffer operations.
I’m also (having also seen @nhaehnle’s comments over on [RFC] Replacing getelementptr with ptradd - #15 by jrtc27 as well) thinking that what may be useful - instead of marking these addrspace(8) pointers non-integral, we could mark them … “non-linear” or “non-flat”, but that’s a separate discussion.
If you wanted to hack and make it always readnone, you can apply it to the specific instance’s callsite without applying it to the definition / declaration
I’ve got a vague recollection that it is tagged as readnone because alias analysis won’t work for buffer operations, and s_buffer_load is only really used with const buffers.
I agree it’s not great, and also that we should move to callsite attributes if that’s still required.
Presumably the work to properly represent buffer descriptors will mean that alias analysis will work correctly anyway?
We could run some performance analysis with Vulkan to see if there’s a regression as a result of not marking it readnone.
One unfortunate thing I found from CI runs on those revisions is that - at the very least - MLIR won’t send intrinsic calls it’s creating through auto-upgrade, since it just pulls the new function definition. So not breaking clients will be tricky and might involve use of defm to create <4 x i32> and ptr addrspace(8) versions of the intrinsics or something … not sure exactly what we want to do here.
Users have to be updated in some way or another. I thought we already had type mangling for the resource descriptor for <8 x i32> vs <4 x i32>, but it looks like the new intrinsics hardcode to 4x. If the resource descriptor were type mangled, you wouldn’t need to have a separately named intrinsic. If you want to do it in 2 steps it might help to introduce type mangling for the descriptor
I don’t think any of the buffer intrinsics have type mangling? My understanding is that the “might be 8xi32, might be 4xi32” thing is only for texture descriptors on image intrinsics, which I’m not touching (though I’d be wiling to argue for ptr addrspace(9) for images, especially since they’ve got fun alignment properties).
My theory of updating users is that they could be transparently updated, since, as far as the underlying value is concerned, <4 x i32> -> i128 -> ptr addrspace(8) is a no-op and just some type casts that the compiler could do … and it looks like the compiler can automatically insert those when presented with IR using the old forms.
… and, having checked, the image intrinsics are hardcoded to <8 x i32> as well, so any type mangling is gone now? (which leaves me low on examples on how to make this change backwards-compatibly)
We might end up wanting a ptr addrspace(7/8) @llvm.amdgcn.pointer.to.resource(ptr addrspace(*) base, iN metadata) to help the analysis along, but that’s future us, I think
@arsenm Since you mentioned “type mangling” above … I can’t seem to find a mechanism that’d work for having an intrinsic that takes either v4i32 or ptr addrspace(8) values … especially since you can’t have a nocapture <4 x i32>, for example.
It would be good to avoid flag-day changes that break downstream users. In the past, we’ve sometimes added parallel (“new”) intrinsic definitions for this reason.
How important is nocapture, though? At least in the graphics use cases, nocapture on buffer pointers seems relatively useless because buffer pointers are de facto always passed in from outside anyway. And readonly-ness / memory effects can just be put on the function, they don’t have to be on the argument. So it seems like an “any type”-style mangling should work.