[RFC] `ptr` dialect & modularizing ptr ops in the LLVM dialect

TL;DR

This RFC proposes the ptr dialect to model pointer and low-level memory operations, providing a generalization of the pointer operations in the LLVM dialect, thus making the operations in the dialect reusable and interoperable with high-level dialects.

This RFC is a counter-proposal to [RFC] `address` dialect. This proposal arose after chatting with @mehdi_amini, where we agreed that there shouldn’t be type duplication (!llvm.ptr vs address) and that some ops could be extracted from LLVM, modularizing the dialect.

Why?

  • There’s a need for a reusable pointer type and higher-level pointer and memory operations, as they express ubiquitous concepts, and the lack thereof limits generic analysis opportunities and IR expressiveness, among other things.
  • It would allow lowering memref to a more target-independent representation.
  • It modularizes a subset of the LLVM dialect.
  • To allow the modeling of higher-level concepts like GPU constant memory, fat pointers, etc.
  • The possibility of introducing a new optimization layer with low-level pointer alias analysis.
  • The bare pointer convention could be applied as a pass in high-level dialects.

See this related discussion, which goes over some of the above points:

Proposal

Extract a subset of LLVM pointer and memory operations into the Ptr dialect and generalize them by making them directly translatable to LLVM IR and lowerable to other backend dialects like SPIR-V.

Concretely, move the following LLVM operations to the Ptr dialect:

  • ptrtoint and inttoptr
  • addrspacecast
  • load and store
  • atomicrmw and cmpxchg

Additionally, create an opaque pointer type (!ptr.ptr) with a generic address space attribute:
ptr ::= `ptr`  (`<` memory-space^ `>`)?
memory-space ::= attribute-value
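For illustration, here are some types this grammar admits. The memory space can be any attribute; the #gpu.address_space example below is just one possibility mentioned later in this thread:

```mlir
!ptr.ptr                                  // default (unspecified) memory space
!ptr.ptr<3 : i32>                         // integer memory space, as in LLVM IR
!ptr.ptr<#gpu.address_space<workgroup>>   // dialect attribute as memory space
```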

Additionally, the proposal adds the following operations, discussed in the sections and replies below:

  • ptradd
  • type_offset
  • constant

One restriction and design consideration of the pointer dialect is that under the right circumstances (see #llvm.address_space in the LLVM semantics section), it must faithfully model concepts in LLVM IR.

What about the semantics?

Pointer and memory semantics are determined neither by the operations nor by the pointer type. Instead, they are determined by the memory model, which is encoded in the address space.

This RFC proposes introducing an address space attribute interface to encode the memory model of an address space, thus allowing the dialect and its operations to be reused.

This interface would allow specifying higher-level concepts like GPU constant memory by rendering the usage of the StoreOp invalid.
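As a sketch, with a hypothetical #gpu_const attribute standing for any attribute that implements the interface and reports stores as invalid:

```mlir
// Loading from the read-only space is fine:
%v = ptr.load %ptr : !ptr.ptr<#gpu_const> -> f32
// But the verifier would reject the store, because the
// memory-space attribute reports stores as invalid:
ptr.store %v, %ptr : f32
```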

LLVM semantics

LLVM semantics would be specified using the #llvm.address_space attribute, thus determining if a particular type can be loaded or stored, what address space casts are valid, etc. For example:

%v = ptr.load %ptr : !ptr.ptr<#llvm.address_space> -> f32 // Is valid.
%v = ptr.load %ptr : !ptr.ptr<#llvm.address_space> -> memref<f32> // Is invalid as the type is not loadable.

This attribute would not only encode semantics but would also be used to close the gap between the generality of the ops in Ptr and the correct modeling of the current LLVM ops, for example, by helping with the extraction of tbaa metadata from the attribute dictionary in LoadOp.

What are the default semantics?

Currently, the default is to accept everything, but this could be changed.

Why not include getelementptr?

GEP requires interaction with structs, which are not yet a concept outside LLVM. Also, see this related discussion on why ptradd might be better for optimizations: [RFC] Replacing getelementptr with ptradd.

Why type_offset?

This operation is needed to represent type offsets in the absence of a data layout.
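A sketch of how this composes with the ptradd operation proposed above; the exact assembly syntax would be as in the proof-of-concept PR:

```mlir
// Compute &ptr[i] for an array of f32, with sizeof(f32) kept
// symbolic until a data layout becomes available:
%sz     = ptr.type_offset f32 : index
%off    = index.mul %sz, %i
%newPtr = ptr.ptradd %ptr, %off : !ptr.ptr
```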

Implementation details

Since this is a big change subject to many comments, instead of creating a series of PRs implementing the full change, a minimal proof of concept with LLVM::LoadOp can be found in this PR. The objective of this PR is to demonstrate that the approach is feasible; if the proposal is accepted, implementation details can be discussed during review.

Major changes in the PR:

  • The LLVMPointer type is removed in favor of ptr::PtrType. In order to preserve the IR representation of the type !llvm.ptr, the SharedDialectTypeInterface was added and AsmPrinter.cpp modified so that !llvm.ptr remains valid (thus no changes to tests are required):
!ptr.ptr<#llvm.address_space<0>> = !llvm.ptr
!ptr.ptr<#llvm.address_space<1>> = !llvm.ptr<1>
%v = ptr.load %ptr : !llvm.ptr -> f32 // Is valid.
%v = ptr.load %ptr : !llvm.ptr -> memref<f32> // Is invalid as the type is not loadable.
%v = ptr.load %ptr : !ptr.ptr -> memref<f32> // Is valid as there are no semantics associated.
  • If ptr.load is used to load from a !llvm.ptr = !ptr.ptr<#llvm.address_space>, then ptr.load can be directly translated to LLVM IR.
6 Likes

Thanks a lot! This matches pretty well what I had in mind :slight_smile:

Here are some details:

  • The MemorySpaceAttrInterface is a bit weird to me right now; for example, I don’t quite see why we should always be able to convert a memory space to an integer here. Also, I assume it’s just incomplete right now? There is isValidLoad but nothing for store, for example. Are there memory spaces that a pointer couldn’t index?
  • The SharedDialectTypeInterface seems like a hack to me right now: I understand how it can help the transition, but it’s nothing more than that, I don’t see this as an interface that would persist beyond a transition, and as such I’m not convinced it should be exposed that way.
  • Is !ptr.ptr<#llvm.address_space> and !ptr.ptr<#llvm.address_space<0>> the same thing? You’re using both in the RFC.
  • !ptr.ptr<#llvm.address_space> makes it a bit long. We could abbreviate it to !ptr.ptr<#llvm.as<0>>, since this attribute would exclusively be used in this context. I also wonder if `!ptr` could somehow default to LLVM semantics? Maybe not, because of structs?
  • type_offset: if we have a ptr.add, then would we just need a type_size to get the data-layout-independent size of a type and then pass the result to ptr.add? That would allow keeping ptr.add as the only pointer-manipulation operation.
  • Your ptradd right now does not address inbounds, and that opens up something you haven’t well defined, around “what is memory?”, “what is a memory space?”, and “what is a memory object?” :slight_smile: See also: LLVM Language Reference Manual — LLVM 18.0.0git documentation. Also, what about aliasing? Can two pointers in different memory spaces alias?
    These are the kinds of things that I’d like to see very well documented in the definition of the “ptr” dialect.
1 Like

Thanks for the proposal! I should start by saying I’m highly supportive of having a pointer dialect and type. It will help with my pet peeve of telling people that memref is not a pointer.

A high-level non-blocking philosophical question: what’s your take on the following? This is the first time we are moving operations out of the LLVM dialect. Historically, we have been maintaining a duplication between, e.g., the LLVM and Arith dialects in case we want to make different design decisions than LLVM IR does, so divergence is possible. There were precedents of folks strongly insisting that we don’t repeat LLVM IR design mistakes in “MLIR IR”, in particular related to atomics IIRC, which led us to actually add divergence between std and llvm operations back in the day. What are the plans to handle such eventual divergence? Are we subscribing to propagate whatever decision LLVM IR makes to the pointer dialect?

(I know that there is desire to also deduplicate Arith and LLVM dialect operations, so the answer here will inform that discussion as well.)

High-level question:

  • How do you envision progressive lowering with such dynamically valid type? Some specific examples:
    • Today, we can configure the patterns by saying “all ops from the memref dialect are illegal and should be converted for the conversion to succeed”; with this, it looks like we’d have to say “ops from the pointer dialect that use types with address spaces other than the llvm-compatible ones are illegal <…>”. This sounds rather involved and may have performance repercussions.
    • How do we make sure lowerings don’t create invalid IR by, e.g., loading a memref<f32> from !llvm.ptr? We do allow memref-of-memref now… Does this create some sort of ordering requirement on lowerings?

Lower-level questions:

  • Should MemorySpaceAttrInterface be shared with memref?
  • Is SharedDialectTypeInterface supposed to persist beyond the transition phase? Having a smoother transition plan is much appreciated! If this is a feature we’d want in the long term, I’d propose thinking about making type alias declarations parameterizable somehow.
  • What does ptr.constant bring compared to arith.constant + inttoptr?
  • Why is type_offset specifically intended to be used w/o a data layout? I think it should always be aware of the data layout; there is just the default layout that is assumed when no layout spec is provided.
  • Do we also need type_sizeof to complement type_offset w/o alignment, e.g., for a packed-struct equivalent?

And an FYI: a small group of folks led by @wsmoses and myself is building alias and points-to analyses based on the dataflow framework as part of MLIR port of Enzyme. We will be happy to eventually upstream that.

3 Likes

Yes, it’s incomplete, I purposely left out all other methods to keep the proof of concept small. MemorySpaceAttrInterface should answer all of these questions, for example I initially had hasIntToPtr to determine whether ptr.inttoptr is semantically valid or not.

It’s 99% a hack. Now, the reason I introduced it is that I consider !ptr.ptr<#llvm.as> way too verbose, and having the shortcut seems useful even beyond a transition, especially for testing.

Yes, !ptr.ptr<#llvm.address_space> = !ptr.ptr<#llvm.address_space<0>> = !llvm.ptr = !llvm.ptr<0>

It could; however, it’d mean %ptr = memref.to_ptr %mem : memref<f32, 1> -> ptr<1> would have LLVM semantics, defeating the purpose of making ptr generic. I very much prefer !llvm.ptr.

My bad, type_offset is type_size, as type_offset produces an int; I just avoided the use of the word size as the offset should also consider alignment.

// memref<f32> *ptr; newPtr = ptr[i]
%tyOff = ptr.type_offset memref<f32> : i32
%off = arith.muli %i, %tyOff : i32
%newPtr = ptr.ptradd %ptr : !ptr.ptr, %off : i32

We can add definitions for those; however, I’d prefer vague definitions that only acquire concrete meaning when coupled with MemorySpaceAttrInterface. For example, w.r.t. the aliasing question, MemorySpaceAttrInterface could handle that with a mayAlias method: llvm.as<0> and llvm.as<3> can alias on NVPTX, but llvm.as<3> and llvm.as<5> cannot.

1 Like

:sweating:, a two-fold response:

  1. ptr should be designed and restricted so that if we are interacting with pointers in #llvm.address_space, then ptr ops are always translatable to LLVM IR. However, I don’t think there’s a reason to make the translation 1-to-1 in all cases (in most cases it will be). Also, in many cases we can close the gap by adding attributes to the attribute dictionary of the op and making llvm.address_space make sense of those attributes. Thus, divergence of ptr w.r.t. LLVM IR is well defined.
  2. Now, what about divergence of LLVM IR wrt ptr? In those cases, we can update ptr or create LLVM-specific ops for those divergent cases.

Also, I’m already proposing to diverge by not including gep but ptradd.

  1. To keep the dialect generic, all ops would have to be dynamicallyLegal in a conversion, and I’m 99% sure there would be a performance hit.
  2. I envision an n >= 2 stage approach. For example, for MemRef → LLVM:
    i. We first perform MemRef → Ptr with the greedy rewriter, as there are no type conversions, and optimizations could be performed at this stage.
    ii. Then, Ptr → LLVM uses the LLVM type converter to change all address spaces in !ptr to the LLVM address space and to convert operand & result types; it is at this stage where we properly verify LLVM validity using #llvm.address_space.

Yes, there’s nothing preventing it from being moved to Interfaces/.

The specific implementation is up for debate; however, I do think we need an alias, as !ptr.ptr<#llvm.as<3>> is too verbose compared to !llvm.ptr<3>. I’d definitely like to see parameterizable alias types; however, the benefit of SharedDialectTypeInterface is that no updates to the tests are required.

Convenience, but also inttoptr might not be available in all address spaces, and in those cases ptr.constant 0 : !ptr.ptr<#my_weird_address_space> would fit the bill.

Yes, it should be possible to determine the type offset from the data layout. It’s not mentioned in the proposal, but I already have in mind a pass that substitutes type_offset with arith constants using the data layout.

Now, type_offset exists because there’s no guarantee that the correct data layout will be available from the start (e.g., in cases where the target has not been determined), and in those cases a different mechanism should exist to express the type offset (ptr.type_offset).

We could add it. I think type_offset could be reused; we only need to add an attribute.

1 Like

What is the proposed lowering strategy for backends with typed pointers (SPIR-V)?
I think it should be spelled out somewhere.

1 Like

That’s a good question. I’m still not convinced about having typed pointers; LLVM’s rationale for moving away from typed pointers is pretty good in general. However, it’s up for discussion.

How to lower Ptr to SPIR-V? !ptr could be converted to !spirv.ptr<i8>, and we could use pointer casts for the remaining operations.

1 Like

Bitcasting to/from i8* (with same address space) is valid per SPIR-V spec but we cannot guarantee it won’t hurt optimizations in backends.

Also, ping @kuhar @antiagainst on SPIR-V

Actually, I slept on it, and I can see how it can be spun as some sort of “type alias” (thanks for @ftynse’s comment!) between dialects. Independently of this proposal, it may be worth extracting this into a PR of its own right now, if you’re up for it!
(While I’m here: your implementation should be updated for the “default dialect” feature, by the way.)

You can offload a lot of things to the interface, but you still need to define them! That is, the interface is making something “parametric”, but we still need to describe the concepts and how they can be “parameterized” by the interface. You may see it as “this is the documentation of the interface”, and that’s OK; but I see this interface and its documentation as part of the ptr type itself, actually.

Does this mean that the ops in the ptr dialect become somehow aware of llvm.address_space and LLVM specifics, for example, by specializing behavior based on the presence of that attribute? We usually avoid such inter-dialect dependencies. Especially in this case, SPIR-V folks may not even want to load the LLVM dialect.

Where would those ops live?

I suppose llvm.gep still exists, at least for as long as LLVM IR continues to have it, just not lifted to the ptr dialect?

It would be nice to quantify that somehow and be aware of this since we may be increasing compilation time on the critical path for many flows.

memref → ptr is a type conversion, so it cannot be handled by the greedy rewriter with regular patterns… It just isn’t a conversion driven by LLVMTypeConverter. A multi-stage approach is fine here; I’m trying to slowly move us in the direction of such conversions being discoverable and configurable with more automation than the current mess of convert-x-to-y half-baked passes, and it feels like having better separation of concerns thanks to types and better-defined legality rules would help there. There’s a question of cost, though.

Let me rephrase a bit. I was likely confused by the wording that suggested type_offset somehow computes the size/offset in the absence of a data layout. Yet our current understanding is that the data layout is never absent. It may have some assumed defaults and therefore not be expressed as attributes, but it’s there. This is inherited from LLVM IR. Maybe we should revise that assumption, as it doesn’t necessarily make sense in high-level IRs before they are target-specialized, for example, by supporting explicit “not-yet-defined” values for the layout queries. The pass you have in mind is constant folding :slight_smile: In the current model, type_offset is always constant-foldable, but we don’t want it to always be that. This can be achieved by having the “not-yet-defined” value in the layout, or by adding literally a nofold attribute to the op, similarly to what we have on tensor.pad. Such an attribute can then be removed when desired, and the usual constant folding will kick in as part of most rewriters.

Wild idea: let’s encode the type in address space as !ptr.ptr<#spirv.type<f32>>.

I was thinking specifically of something like

template <T> alias !llvm.ptr<T> = !ptr.ptr<#llvm.address_space<T>>

if you excuse the C++ syntax in IR.

I see the appeal of handling it implicitly in the asm printer/parser, especially having done the previous type transition for the LLVM dialect myself for dozens of dependent projects, but there is value in the IR being explicit too.

Specifically, I wonder how this would interact with the current work on giving types explicit names similar to operations: RFC: Exposing type and attribute names in C++.

No, but passes and translation might be aware. For example:

%a = ptr.load %ptr: !ptr -> f32 {access_group = #llvm.access_group ... }

non-LLVM passes would ignore access_group, but in translation to LLVM we can require the usage of llvm.as and then use LLVM::AddressSpaceAttr::getAccessGroup(loadOp) to retrieve the information.

If the divergence makes sense in ptr, then ptr; otherwise, LLVM.

Exactly.

Sorry, I was not fully explicit: in MemRef → Ptr I’m talking about a partial rewrite. Some memref ops could outlive the transformation (memref producers), and no type conversion would be performed, so something like this:

%alloca = memref.alloca() : memref<2xf32>
memref.store %val, %alloca[%i] : memref<2xf32>
// Transformed into
%alloca = memref.alloca() : memref<2xf32>
%allocaPtr = memref.to_ptr %alloca : memref<2xf32>
%f32Off = ptr.type_offset f32 : index 
%off = index.mul %f32Off, %i
%ptr = ptr.ptradd %allocaPtr, %off : ptr
ptr.store %val, %ptr : f32

However, a one-shot MemRef → LLVM should also be possible.

This would be great.

I was also thinking a no-op %ptr = ptr.assume_element f32 %ptr : ptr should help close the information gap.

This would also be cool to have.

I think it should also be possible to adapt it to the other proposal: getName would check for the interface and return the name accordingly.

1 Like

Having read the thread (but not the proposal), I’ll note that #gpu.address_space<> has platform-specific lowering to integers.

Is the long-term plan here something like GPUToPtr with different passes for AMDGPU/NVPTX/… to get the correct LLVM pointer types? (Or SPIR-V ones)

Also, re typed pointers, one could abuse the address space attribute to include a type?

1 Like

The immediate plan would be to keep the conversion #gpu.address_space<> → int in convert-gpu-to-[nvvm|rocdl], with those passes pulling in the relevant ptr patterns.
However, I think this might be the right moment to design a better approach. For example, one option is preserving #gpu.address_space<workgroup> and using target information to retrieve the numeric value on translation. Another option is introducing [nvvm|rocdl].address_space and having an interface that returns the numeric value.

Technically yes, and unfortunately it’s not hard. However, the only address space translatable to LLVM IR would be llvm.address_space, which is opaque.

I don’t think we want to go down this road. There are other solutions we might want to explore. A similar solution, without the need to hack the element type into the address space, would be making opacity (or “typedness”) optional and making the element type an attribute.

Following on:

  • Operations defining ptr types will have some kind of type information attached:
    • alloca, following LLVM IR, will receive the type as an attribute
    • Type information from operations originating ptr values can be retrieved recursively, following the def-use chain and knowing, e.g., that the ptr generated by ptradd will have the same type as the original ptr
  • Operations using ptr types will also provide type information:
    • load, store, etc., implicitly give the element type
    • Metaoperations like ptr.assume_element_type <element_type> %ptr : <ptr_type> would also provide this kind of information (no need to return a value here, this would already provide type information for the argument %ptr).
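A minimal instance of the metaoperation described above, using the hypothetical syntax from the bullet:

```mlir
// Provides element-type information for %ptr without producing a result:
ptr.assume_element_type f32 %ptr : !ptr.ptr
```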

For reference, Khronos’ SPIRV-LLVM-Translator uses a SPIRVTypeScavenger to gather this information, thus being able to produce typed SPIR-V pointers from opaque LLVM pointers, introducing SPIR-V OpBitcast operations when needed to follow SPIR-V typing rules.

It’s not clear to me where this is coming from. Even in LLVM, I would think that the transition to opaque pointers should lead to only load/store having types. I believe function arguments already lost their types? What about a call to a function returning a ptr?

Actually maybe you meant “allocating” instead of “defining”? Even then I’m not sure malloc() is typed beyond i8*.

Some key operations (load/store) need it to model the destination “register” and how much to load, but not all right?

My wording was far from the best, my bad. In both cases, it should be “some operations”: e.g., ptr.constant (defining a ptr value) would provide no type information, and neither would llvm.intr.memcpy (receiving two ptr values); but ptr.from_memref (defining a ptr value whose element type is given by a memref argument) and alloca (typed in LLVM IR, defining a ptr value whose element type is given by an attribute) would. In case no type information can be inferred, a default type (Khronos’ SPIR-V translator goes with i8) might be used instead. Note that some operations don’t provide “direct” type-inference information, i.e., “this pointer is a pointer to i32”, but might instead provide constraints, e.g., “these two pointers have the same element type”. This kind of type inference would be provided on operations using an interface.

However, this is just what the SPIR-V translator does now, as it has to handle opaque pointers coming from LLVM (also what I understand the WIP LLVM SPIR-V backend will do). The ptr -> spirv conversion might work differently, as ptr would not need to follow LLVM IR anymore. Alternatively, (optionally) attaching type information to ptr (via attributes or operations such as assume_element_type) might be another viable solution.

TLDR, I envision three possible solutions (there might be more and better, ofc):

  • assume_element_type to force the pointer type, or use a default type;
  • Generalizing the previous option with the interface I describe above;
  • (Optionally) Attaching type information to ptr.

TBH, I went on mentioning these three alternatives because I just didn’t like the “typed address space” solution too much. It sounds hacky to me and is pretty much equivalent to the non-hacky “optionally typed pointers” solution.

1 Like

Apologies for the delay; right after I posted this, I got swamped by the Fall semester.

I implemented a significant portion of the proposal in GitHub PR #73057 to test for performance downgrades. However, that PR only covers moving ptr ops from LLVM to the Ptr dialect. The PR includes:

  • Introduction of the Ptr dialect and the !ptr.ptr Type.
  • Implementing the LoadOp, StoreOp, AtomicRMWOp, AddrSpaceCastOp, IntToPtrOp and PtrToIntOp operations. This change includes operation verification, all the respective interfaces, and translations to LLVM IR, so NFC.
  • Removing the above operations from the LLVM dialect and substituting them with Ptr dialect operations.
  • With the exception of import to LLVM, all tests in MLIR and Flang are passing. Import to LLVM fails because that translation has not yet been implemented.

To test for performance downgrades, I used ninja check-mlir and ninja check-flang, running each 10 times before and after the change. Everything was compiled in Release+Assertions mode, with the X86, NVPTX and AMDGPU targets. The results are:

ninja check-mlir

Pre-patch average: 43.525s
Pre-patch standard deviation: 0.335s
Post-patch average: 43.793s
Post-patch standard deviation: 0.125s
Performance degradation from the switch: 0.62%

ninja check-flang

Pre-patch average: 55.961s
Pre-patch standard deviation: 0.11s
Post-patch average: 56.688s
Post-patch standard deviation: 0.273s
Performance degradation from the switch: 1.3%

Note: Those timings correspond only to running the test suite, MLIR and Flang building times are not included.

In both cases there is a less than 1.5% performance degradation from making the change.

Relevant files in the PR:

  • MemorySpaceInterfaces.td: declares the interface governing the operations, indicating, for example, whether a type is loadable, whether a given IntToPtr cast is valid, and so on.
  • PtrOps.td: the operations in the dialect.

Edit:

I created a synthetic test to further measure the performance downgrade:

for i in range(0, 100):
    print("func.func @func{0}(%A: memref<?xi32>, %i: index) {{".format(i))
    for j in range(0, 2000):
        print("  %{} = memref.load %A[%i] : memref<?xi32>".format(j))
        print("  memref.store %{}, %A[%i] : memref<?xi32>".format(j))
    print("  return")
    print("}")

Then I ran:

  • time mlir-opt test.mlir --finalize-memref-to-llvm --convert-func-to-llvm --canonicalize -o llvm.mlir
  • time mlir-translate --mlir-to-llvmir llvm.mlir -o llvm.ll

In both cases, after running the tests multiple times both pre- and post-patch, there’s a performance degradation of at least 16%.

3 Likes

Late to this thread. In general, +1 to modularizing and sharing types that we need across multiple targets.

The SPIR-V working group is aware of the opaque pointer issue and is exploring directions. Though, as usual, we need to support lots of different target environments and historical versions, so typed pointers will certainly stay for a long time. Designing the common pointer type to allow carrying an optional type (and address space) would be nice.

Thanks for all this work!

I don’t have access to the full data, but guessing from aggregate statistics, this difference is likely not significant statistically.

Do we have a better way of attributing this? Is it the conversion or the translation? (If you included the patterns for memref-to-ptr conversion in finalize-memref-to-llvm, it looks unlikely that the conversion is slowing things down here.) If it’s the translation, is it due to more dispatch? Maybe the slowdown is due to the infrastructure being insufficiently optimized, and we can fix it.

In general, I’m also supportive of modularization here.