TL;DR
This RFC proposes the `ptr` dialect to model pointer and low-level memory operations. It generalizes the pointer operations in the LLVM dialect, making them reusable and interoperable with high-level dialects.
This RFC is a counter-proposal to [RFC] `address` dialect. It arose after a discussion with @mehdi_amini, where we agreed that there shouldn't be type duplication (`!llvm.ptr` vs. `address`) and that some ops could be extracted from LLVM, modularizing the dialect.
Why?
- There's a need for a reusable pointer type and higher-level pointer and memory operations, as they express ubiquitous concepts; the lack thereof limits generic analysis opportunities and IR expressiveness, among other things.
- It would allow lowering `memref` to a more target-independent representation.
- It modularizes a subset of the LLVM dialect.
- It would allow modeling higher-level concepts like GPU constant memory, fat pointers, etc.
- The possibility of introducing a new optimization layer with low-level pointer alias analysis.
- The bare pointer convention could be applied as a pass in high-level dialects.
See this related discussion, which goes over some of the above points:
Proposal
Extract a subset of the LLVM pointer and memory operations into the Ptr dialect, and generalize them so that they are both directly translatable to LLVM IR and lowerable to other backend dialects like SPIR-V.
Concretely:
- Move the following LLVM operations to the Ptr dialect: `ptrtoint` and `inttoptr`, `addrspacecast`, `load` and `store`, `atomicrmw` and `cmpxchg`.
- Create an opaque pointer type (`!ptr.ptr`) with a generic address space attribute:
ptr ::= `ptr` (`<` memory-space^ `>`)?
memory-space ::= attribute-value
Additionally, add the following operations:
- `ptradd` to add a pointer and an integer and obtain a pointer (see possible implementation and rationale).
- `type_offset` to represent the offset of a type (see possible implementation).
- `constant` to model constant pointer addresses.
- `nullptr`, as the concrete value of a null pointer is not always 0.
- `from_ptr` and `to_ptr` in the `memref` dialect, allowing high-level interaction with the `memref` dialect. These ops would serve a similar purpose as `from_memref` and `to_memref` presented in [RFC] `address` dialect.
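As an illustration of the intended semantics of `ptradd` and `nullptr`, here is a minimal C++ model. It assumes pointers are plain integer addresses tagged with an address space, that `ptradd` takes a byte offset and wraps on overflow, and it uses a hypothetical address space 3 whose null value is all ones. None of these choices come from the proposal itself, and all names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a pointer is an integer address tagged with an address space.
struct Ptr {
  std::uint64_t addr;
  unsigned addrSpace;
};

// ptradd: pointer + byte offset -> pointer in the same address space.
// Unsigned arithmetic wraps modulo 2^64 (an assumption made for this sketch).
Ptr ptradd(Ptr p, std::int64_t offset) {
  return Ptr{p.addr + static_cast<std::uint64_t>(offset), p.addrSpace};
}

// The concrete null value depends on the address space. Address space 3 using
// all-ones as null is a hypothetical choice for illustration.
std::uint64_t nullValue(unsigned addrSpace) {
  return addrSpace == 3 ? ~std::uint64_t{0} : std::uint64_t{0};
}

// Model of the nullptr op: materializes the null value of an address space.
Ptr makeNull(unsigned addrSpace) { return Ptr{nullValue(addrSpace), addrSpace}; }

// A pointer is null only if it equals the null value of its own address space.
bool isNull(Ptr p) { return p.addr == nullValue(p.addrSpace); }
```

Because `isNull` consults the address space, address 0 in address space 3 is a valid non-null pointer in this model, which illustrates why a dedicated `nullptr` op is needed rather than a `constant` 0.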
One restriction and design consideration of the pointer dialect is that under the right circumstances (see #llvm.address_space in the LLVM semantics section), it must faithfully model concepts in LLVM IR.
What about the semantics?
Neither the operations nor the pointer type determine pointer and memory semantics. Instead, the semantics are determined by the memory model, which is encoded in the address space.
This RFC proposes introducing an address space attribute interface to encode the memory model of an address space, thus allowing the dialect and its operations to be reused across memory models.
For example, this interface would allow specifying higher-level concepts like GPU constant memory, by having the constant address space render any use of StoreOp on it invalid.
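A minimal C++ sketch of how such an interface could drive verification follows. It assumes a hypothetical `MemorySpace` base class whose default hooks accept everything, and a derived constant-memory space that rejects stores. `isValidStore` is an assumed counterpart to the `isValidLoad` hook mentioned in the implementation details, and types are represented as strings purely for brevity.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the proposed memory-space attribute interface.
class MemorySpace {
public:
  virtual ~MemorySpace() = default;
  // Default semantics: every load and store is accepted.
  virtual bool isValidLoad(const std::string & /*type*/) const { return true; }
  virtual bool isValidStore(const std::string & /*type*/) const { return true; }
};

// GPU constant memory: loads are allowed, stores never are.
class ConstantMemorySpace : public MemorySpace {
public:
  bool isValidStore(const std::string & /*type*/) const override { return false; }
};

// Hypothetical verifier hooks: the ops defer the decision to the address space.
bool verifyLoad(const MemorySpace &space, const std::string &resultType) {
  return space.isValidLoad(resultType);
}

bool verifyStore(const MemorySpace &space, const std::string &valueType) {
  return space.isValidStore(valueType);
}
```

The load and store ops stay generic; only the address space decides whether a particular use is legal.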
LLVM semantics
LLVM semantics would be specified using the `#llvm.address_space` attribute, which determines whether a particular type can be loaded or stored, which address space casts are valid, etc. For example:
%v = ptr.load %ptr : !ptr.ptr<#llvm.address_space> -> f32 // Is valid.
%v = ptr.load %ptr : !ptr.ptr<#llvm.address_space> -> memref<f32> // Is invalid as the type is not loadable.
Besides encoding semantics, this attribute would also close the gap between the generality of the ops in Ptr and the faithful modeling of the current LLVM ops. For example, it could help extract the TBAA metadata from the attribute dictionary of `LoadOp`.
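The load check for `#llvm.address_space` can be sketched in C++ as follows. The `TypeKind` and `MemorySpaceKind` enumerations are hypothetical simplifications (a real check would inspect `mlir::Type` and cover more cases, such as structs and arrays), and the `None` memory space models a pointer with no semantics attached, which accepts every load.

```cpp
#include <cassert>

// Hypothetical, simplified type kinds; a real check would inspect mlir::Type.
enum class TypeKind { Integer, Float, Vector, Pointer, MemRef };

// Hypothetical memory-space kinds: none (no semantics) or #llvm.address_space.
enum class MemorySpaceKind { None, LLVM };

// Under LLVM semantics only types with an LLVM IR counterpart are loadable
// (aggregates are omitted here for brevity).
bool llvmIsValidLoad(TypeKind type) {
  switch (type) {
  case TypeKind::Integer:
  case TypeKind::Float:
  case TypeKind::Vector:
  case TypeKind::Pointer:
    return true;
  case TypeKind::MemRef:
    return false;
  }
  return false;
}

bool isValidLoad(MemorySpaceKind space, TypeKind type) {
  // Without a memory space there are no semantics, so every load is accepted.
  if (space == MemorySpaceKind::None)
    return true;
  return llvmIsValidLoad(type);
}
```

This mirrors the two examples above: loading an `f32` through an LLVM pointer is valid, loading a `memref<f32>` is not.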
What are the default semantics?
Currently, everything is accepted, but this could be changed.
Why not include getelementptr?
GEP requires interaction with structs, which are not yet a concept outside LLVM. Also, see this related discussion on why `ptradd` might be better for optimizations: [RFC] Replacing getelementptr with ptradd.
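To illustrate why `ptradd` can subsume GEP once a layout is known, here is a small C++ sketch that folds a `getelementptr %p[index, field]` over an array of structs into a single byte offset for `ptradd`. The C-like layout rule (each field aligned to its own alignment, total size padded to the largest alignment) and all names are assumptions for illustration; a real lowering would take sizes and alignments from the target data layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Size and alignment of a struct field, in bytes.
struct Field {
  std::uint64_t size;
  std::uint64_t align;
};

// Field offsets and padded total size of a struct.
struct Layout {
  std::vector<std::uint64_t> offsets;
  std::uint64_t size;
};

// Round x up to a multiple of a (a > 0).
std::uint64_t alignTo(std::uint64_t x, std::uint64_t a) { return (x + a - 1) / a * a; }

// Assumed C-like layout: each field aligned to its own alignment,
// total size padded to the largest field alignment.
Layout layoutStruct(const std::vector<Field> &fields) {
  Layout layout{{}, 0};
  std::uint64_t maxAlign = 1;
  for (const Field &f : fields) {
    layout.size = alignTo(layout.size, f.align);
    layout.offsets.push_back(layout.size);
    layout.size += f.size;
    if (f.align > maxAlign)
      maxAlign = f.align;
  }
  layout.size = alignTo(layout.size, maxAlign);
  return layout;
}

// `getelementptr %p[index, field]` over an array of structs, folded into the
// single byte offset that `ptradd %p, offset` would take.
std::int64_t gepToPtraddOffset(const Layout &layout, std::int64_t index, unsigned field) {
  return index * static_cast<std::int64_t>(layout.size) +
         static_cast<std::int64_t>(layout.offsets.at(field));
}
```

For a struct of an `i32` and an `f64`, the fields sit at offsets 0 and 8 and the struct occupies 16 bytes, so field 1 of element 2 is reached with a single `ptradd` of 40 bytes.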
Why type_offset?
This operation is needed to represent type offsets in the absence of a data layout.
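The following C++ sketch models why `type_offset` is useful: the op stays symbolic in the IR and only folds to a constant once a data layout is available. The `DataLayout` map, the `TypeOffset` struct, and the `fold` function are hypothetical simplifications, not the API in the proof-of-concept PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical data layout: type name -> allocation size in bytes.
using DataLayout = std::map<std::string, std::uint64_t>;

// Hypothetical model of a `type_offset` op: it only names the type.
struct TypeOffset {
  std::string type;
};

// Folding is possible only once a data layout is known; otherwise the op
// must stay symbolic in the IR.
std::optional<std::uint64_t> fold(const TypeOffset &op, const std::optional<DataLayout> &layout) {
  if (!layout)
    return std::nullopt;  // no data layout yet: keep the op symbolic
  auto it = layout->find(op.type);
  if (it == layout->end())
    return std::nullopt;  // type unknown to this layout: cannot fold
  return it->second;
}
```

The same `type_offset` of a pointer folds to 8 under a layout with 64-bit pointers and to 4 under one with 32-bit pointers, which is why it cannot be materialized as a constant earlier.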
Implementation details
Since this is a big change likely to draw many comments, rather than creating a series of PRs implementing the full change, a minimal proof of concept with `LLVM::LoadOp` can be found in this PR. Its objective is to demonstrate that the approach is feasible; if the proposal is accepted, implementation details can be discussed during review.
Major changes in the PR:
- The `LLVMPointer` type is removed in favor of `ptr::PtrType`. To preserve the IR representation of the `!llvm.ptr` type, the `SharedDialectTypeInterface` interface was added and `AsmPrinter.cpp` was modified so that `!llvm.ptr` remains valid (thus no test changes are required):
!ptr.ptr<#llvm.address_space<0>> = !llvm.ptr
!ptr.ptr<#llvm.address_space<1>> = !llvm.ptr<1>
- The `MemorySpaceAttrInterface` interface was added to model address space semantics. See `LLVM::AddressSpaceAttr` for how the LLVM dialect implements the interface. The semantics of `load` are specified in `isValidLoad`. Thus:
%v = ptr.load %ptr : !llvm.ptr -> f32 // Is valid.
%v = ptr.load %ptr : !llvm.ptr -> memref<f32> // Is invalid as the type is not loadable.
%v = ptr.load %ptr : !ptr.ptr -> memref<f32> // Is valid, as no semantics are associated with the address space.
- If `ptr.load` is used to load from a `!llvm.ptr = !ptr.ptr<#llvm.address_space>`, then `ptr.load` can be directly translated to LLVM IR.
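The printing rule that keeps `!llvm.ptr` valid, together with the condition under which `ptr.load` is directly translatable, can be sketched in C++ as below. The `PtrType` struct, the `SpaceKind` enumeration, and both functions are hypothetical models of the behavior described above, not the actual `SharedDialectTypeInterface` or `AsmPrinter.cpp` code.

```cpp
#include <cassert>
#include <string>

// Hypothetical memory-space kinds of a !ptr.ptr type.
enum class SpaceKind { None, LLVM, Other };

// Hypothetical model of a !ptr.ptr type.
struct PtrType {
  SpaceKind space = SpaceKind::None;
  unsigned llvmAddrSpace = 0;  // meaningful only when space == SpaceKind::LLVM
  std::string otherAttr;       // meaningful only when space == SpaceKind::Other
};

// LLVM address spaces print with the !llvm.ptr alias (address space 0 is
// elided); every other memory space keeps the generic !ptr.ptr syntax.
std::string print(const PtrType &t) {
  switch (t.space) {
  case SpaceKind::None:
    return "!ptr.ptr";
  case SpaceKind::LLVM:
    return t.llvmAddrSpace == 0 ? "!llvm.ptr"
                                : "!llvm.ptr<" + std::to_string(t.llvmAddrSpace) + ">";
  case SpaceKind::Other:
    return "!ptr.ptr<" + t.otherAttr + ">";
  }
  return "";
}

// A load is directly translatable to LLVM IR only under LLVM semantics.
bool canTranslateLoadDirectly(const PtrType &t) { return t.space == SpaceKind::LLVM; }
```

Keeping the alias purely a printing concern is what allows existing tests that spell `!llvm.ptr` to remain unchanged while the underlying type becomes `!ptr.ptr`.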