Hi, I’m currently trying to improve the mapping between MLIR types and Swift types in the Swift bindings. Essentially, I would like to have an operation like “get Swift type” on MLIR types which would consult a mapping from TypeID to the correct Swift type. My hope was that I could store this mapping inside of MLIRContext so for any type I can just grab the context and grab the mapping (which might be null if the context didn’t come from Swift).
My initial attempt at this was to create a SwiftMLIRContext subclass which would own the mapping, but it seems I can’t dynamic_cast MLIRContext to SwiftMLIRContext because MLIRContext is not a polymorphic class type (and dynamic_cast requires RTTI which is not enabled in my build).
I’m wondering if anyone has thoughts on a good path forward here. Am I missing something obvious in my limited C++ understanding? Would folks be open to just adding a “platformData” pointer to MLIRContext (though this may be fragile if more than one thing in a process tries to use this pointer)? Is there a better way to accomplish this?
One thing that could work is putting this information directly in a dialect (maybe in your case you have a SwiftDialect that you already use).
This is kind of equivalent to putting it in the MLIRContext, since you then can access it with ctx->getOrLoadDialect<SwiftDialect>().
Just to be explicit, getOrLoadDialect will still incur a runtime cost and this is an operation that will happen frequently, so I’m hoping something more straightforward (like a pair of pointers on MLIRContext) is possible.
Can you just store the “SwiftType *” in your MLIR type’s Storage struct?
Taking a step back: I’m a little curious why this backtracking to the swift type happens so pervasively in your code. My intuition from previous systems is that this type of thing tends to be localized to a small set of passes, so a local helper specific to those passes would be enough. I’m having a hard time imagining the scenarios where you need ad-hoc access to the original swift type from, say, canonicalizer hook or some such. Can you explain more why you need this?
The problem is that it isn’t my MLIR Type. To be more specific, I’m looking at implementing something like IntegerType in Swift, which would be backed by mlir::IntegerType, and if you had an unknown type in Swift (let foo: TypeProtocol) you could get at the underlying type with a Swift cast (if let foo = foo as? IntegerType).
Overall, the goal is to eventually have a robust set of Swift bindings which can both produce MLIR (i.e. be used to create a DSL embedded in Swift) as well as implement things like passes. The “recover the Swift Type” step is more interesting for the latter category.
I don’t see a good way of doing this - type objects are not extensible by design (and we care a lot about their size, since they are immortal). If you need a parallel data structure for MLIRContext, you can keep track of it next to your MLIRContext instead of inside of it.
This is one of the challenges of MLIR being extensible - we need to be able to allow library based extension without a centralized numbering authority. The current solution for this is mlir/Support/TypeID.h, which provides a unique pointer for each C++ subclass. Its internals boil down to having the linker unique C++ global variables.
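To make the "unique pointer per C++ class" idea concrete, here is a minimal sketch of the general technique. It is not the actual contents of mlir/Support/TypeID.h; ExampleTypeID, IntegerTypeTag, MemRefTypeTag and checkExampleTypeIDs are hypothetical names made up for illustration. Each template instantiation gets its own static object, and the address of that object is used as the ID.

```cpp
#include <cassert>

// Hypothetical stand-in for a TypeID-like class (not the real mlir::TypeID).
class ExampleTypeID {
public:
  // Returns the ID for class T. Every instantiation of get<T>() owns a
  // distinct static object, so its address is unique per T. Because get<T>()
  // is an inline template function, the linker is expected to merge the
  // copies emitted by different translation units into one object.
  template <typename T>
  static ExampleTypeID get() {
    static char marker;
    return ExampleTypeID(&marker);
  }

  bool operator==(const ExampleTypeID &other) const { return ptr == other.ptr; }
  bool operator!=(const ExampleTypeID &other) const { return ptr != other.ptr; }

  // Exposes the raw pointer, e.g. for use as a hash table key.
  const void *getAsOpaquePointer() const { return ptr; }

private:
  explicit ExampleTypeID(const void *p) : ptr(p) {}
  const void *ptr;
};

// Hypothetical classes standing in for two unrelated MLIR type classes.
struct IntegerTypeTag {};
struct MemRefTypeTag {};

// Small usage check: same class gives the same ID, different classes differ.
inline void checkExampleTypeIDs() {
  assert(ExampleTypeID::get<IntegerTypeTag>() ==
         ExampleTypeID::get<IntegerTypeTag>());
  assert(ExampleTypeID::get<IntegerTypeTag>() !=
         ExampleTypeID::get<MemRefTypeTag>());
}
```

No central registry hands out numbers here; any library can define a new class and get a fresh ID simply by instantiating get<T>().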
This allows you to implement dyn_cast on types in terms of TypeID, but isn’t super efficient, particularly for range based subclasses.
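As a rough illustration of a dyn_cast built on such IDs, the sketch below compares a stored ID against the ID of the requested class. All names (exampleTypeID, ExampleTypeStorage, ExampleIntegerStorage, ExampleFloatStorage, exampleDynCast) are hypothetical and only mimic the shape of the idea; MLIR's real casting machinery is more involved.

```cpp
// Hypothetical ID: the address of a per-class static object.
using ExampleTypeID = const void *;

template <typename T>
ExampleTypeID exampleTypeID() {
  static char marker;  // one object per instantiation, so one address per T
  return &marker;
}

// Hypothetical base storage: every instance records the ID of its concrete class.
struct ExampleTypeStorage {
  explicit ExampleTypeStorage(ExampleTypeID id) : id(id) {}
  ExampleTypeID id;
};

struct ExampleIntegerStorage : ExampleTypeStorage {
  explicit ExampleIntegerStorage(unsigned width)
      : ExampleTypeStorage(exampleTypeID<ExampleIntegerStorage>()),
        width(width) {}
  unsigned width;
};

struct ExampleFloatStorage : ExampleTypeStorage {
  ExampleFloatStorage()
      : ExampleTypeStorage(exampleTypeID<ExampleFloatStorage>()) {}
};

// Returns the storage cast to To if its recorded ID matches To exactly,
// otherwise nullptr. No RTTI or virtual functions are involved.
template <typename To>
To *exampleDynCast(ExampleTypeStorage *storage) {
  if (storage && storage->id == exampleTypeID<To>())
    return static_cast<To *>(storage);
  return nullptr;
}
```

Note that this only answers "is it exactly this class?". Asking "is it any one of a family of classes?" would, in this sketch, mean comparing against each member's ID in turn, which I take to be the inefficiency mentioned above.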
In the case of MLIRSwift, I think that a properly typed protocol based approach will require us to have a “pointer to pointer map” next to MLIRContext that is lazily populated based on TypeID. This is not going to be incredibly efficient, but is the best we can do.
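A sketch of what that side table might look like, written in C++ for consistency with the rest of the thread. Everything here is an assumption for illustration: ExampleContext stands in for MLIRContext, BindingHandle for whatever pointer the Swift side wants to get back (e.g. type metadata), and the resolver callback for the Swift-side logic that knows how to map an ID the first time it is seen.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

using ExampleTypeID = const void *;  // hypothetical: opaque per-class ID
using BindingHandle = const void *;  // hypothetical: opaque binding-side pointer

struct ExampleContext {};  // stand-in for mlir::MLIRContext

// Pointer-to-pointer map that is filled in on first use of each ID.
class ExampleBindingTable {
public:
  using Resolver = std::function<BindingHandle(ExampleTypeID)>;

  explicit ExampleBindingTable(Resolver resolver)
      : resolver(std::move(resolver)) {}

  // Returns the cached handle for id, asking the resolver only on a miss.
  BindingHandle lookup(ExampleTypeID id) {
    auto it = cache.find(id);
    if (it != cache.end())
      return it->second;
    BindingHandle handle = resolver(id);
    cache.emplace(id, handle);
    return handle;
  }

  std::size_t cachedCount() const { return cache.size(); }

private:
  Resolver resolver;
  std::unordered_map<ExampleTypeID, BindingHandle> cache;
};

// The binding owns both pieces side by side; the context itself is unchanged.
struct ExampleSwiftContext {
  ExampleContext context;
  ExampleBindingTable bindings;
};
```

After the first lookup for a given ID, the cost is one hash table probe per conversion, and nothing about it requires changes to MLIRContext or to the type classes.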
The other alternative is to directly ape the C++ API and have our own dyn_cast etc. functions. This won’t be as nice as using as? but would have the same performance characteristics.
By “type object” do you mean the instance of the type (a pointer to something in a context), the context storage (whatever data needs to be stored per type) or the C++ class? Neither the context solution (storing a pointer in MLIRContext to a typeID to Swift function mapping), nor a type C++ class based solution (adding one pointer to the C++ class representing Type) would increase the size or affect the performance of the first two definitions of type object. Either of these solutions can be gated by a compile time flag and only enabled when the platform would benefit.
The simplest solution (a context-to-mapping map, or a dummy dialect) incurs an extra hash table lookup over just having a single extra pointer per context. Either approach allows types to be mapped differently based on context. Alternatively, adding a pointer to the Type base class (one extra pointer per type definition) would eliminate a second hash table lookup (type ID to mapping) at the cost of the mapping being global, and marginal extra complexity around initializing this mapping (which we may even be able to eliminate using weak symbols in the linker).
There is a tangential issue of whether or not we want to represent an “unconverted type” in Swift which doesn’t incur the cost of this conversion at all (i.e. for passing directly to another API that just accepts “some type”), but I figured we can add that if we feel it’s necessary (just adding an OpaqueType which has a method to convert itself to a Swift type would be sufficient). I don’t have particularly strong feelings about this.
Update: Actually, having extra API on the type C++ class might be a bit trickier than I thought, since MLIR doesn’t seem to do dynamic dispatch, and Type IDs are slightly more complicated than simple pointers.
Yeah, I meant something like mlir::IntegerType. I didn’t mean mlir::Type itself - we want that to stay a single word. In general, the core MLIR objects cannot have binding-specific state, because there may be multiple bindings that are all referencing the same “i42 type”, and they need to coexist cleanly.
The simplest solution… incurs an extra hash table lookup over just having a single extra pointer per context.
Right, but even ignoring the multiple binding problem, MLIR really can’t know everything that some binding might want. One pointer may not be enough.
Right, there are three different cases to cover: “the direct equivalent of mlir::Type” which is an unmapped type, a “well known type that is mapped to a Swift Struct” (e.g. like MemRefType or IntegerType in Swift), and “an MLIR type that has no native Swift binding” (e.g. something like UnknownType).
The latter two would conform to various nice Swift type protocols, but the first wouldn’t.