Swift Dynamic Type ID Generation

I’ve been working on adding the ability to write MLIR passes in Swift, and I’m wondering whether people think my approach to generating TypeIDs for those passes is reasonable.

In Swift, you declare a type that conforms to a DynamicPass protocol. When the pass is instantiated, some machinery creates an instance of a C++ Pass subclass through a C interface that takes type-erased data and function pointers. Since TypeID is a wrapper around a unique pointer, I initialize an MlirTypeID for each pass type from the ObjectIdentifier of the concrete Swift DynamicPass type. That pointer is stable for the lifetime of the program.
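A rough C++ sketch of the shape of such a bridge may help. Every name here (SketchTypeID, SketchPassCallbacks, ExternalPassSketch, makeCountingPass) is hypothetical and is not MLIR's C API. The address of a static object stands in for the Swift ObjectIdentifier, so every instance of the same pass type reports the same ID while carrying its own user data.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-compatible structs: an opaque ID pointer plus function
// pointers that operate on opaque per-instance user data.
struct SketchTypeID {
  const void *ptr; // unique address identifying the pass *type*
};

struct SketchPassCallbacks {
  void (*run)(void *userData, int *counter); // stand-in for runOnOperation
  void (*destruct)(void *userData);          // frees the user data
};

// C++ side: a pass-like class that only forwards to the callbacks.
class ExternalPassSketch {
public:
  ExternalPassSketch(SketchTypeID id, SketchPassCallbacks callbacks, void *userData)
      : id(id), callbacks(callbacks), userData(userData) {}
  ~ExternalPassSketch() {
    if (callbacks.destruct)
      callbacks.destruct(userData);
  }
  ExternalPassSketch(const ExternalPassSketch &) = delete;
  ExternalPassSketch &operator=(const ExternalPassSketch &) = delete;

  const void *typeID() const { return id.ptr; }
  void run(int &counter) {
    assert(callbacks.run && "a run callback is required");
    callbacks.run(userData, &counter);
  }

private:
  SketchTypeID id;
  SketchPassCallbacks callbacks;
  void *userData;
};

// Hypothetical "foreign" pass. The address of countingPassTag plays the role
// of ObjectIdentifier(CountingPass.self): one address per pass type.
static char countingPassTag;

static void countingRun(void *userData, int *counter) {
  *counter += *static_cast<int *>(userData);
}
static void countingDestruct(void *userData) { delete static_cast<int *>(userData); }

std::unique_ptr<ExternalPassSketch> makeCountingPass(int step) {
  SketchTypeID id{&countingPassTag};
  SketchPassCallbacks callbacks{&countingRun, &countingDestruct};
  return std::make_unique<ExternalPassSketch>(id, callbacks, new int(step));
}
```

The design point is that the ID comes from the type (one static address), while per-instance state travels separately through the opaque userData pointer.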

I noticed that this patch, which adds methods for dynamic TypeID generation, offers inheriting from SelfOwningTypeID as one approach; SelfOwningTypeID generates a TypeID from the this pointer of an instance. So it seems that generating TypeIDs from arbitrary pointers is valid, as long as you can guarantee that the pointer is unique and outlives every use of the TypeID.
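For contrast, here is a minimal sketch of an instance-owned ID in the spirit of SelfOwningTypeID. It uses a hypothetical OpaqueID wrapper and SelfOwningID class rather than MLIR's real classes. Because the ID is the object's own address, two instances always get different IDs, and an ID is only meaningful while its owner is alive.

```cpp
#include <cassert>

// Hypothetical TypeID-like wrapper around an opaque unique pointer.
class OpaqueID {
public:
  explicit OpaqueID(const void *p) : ptr(p) {}
  bool operator==(const OpaqueID &o) const { return ptr == o.ptr; }
  bool operator!=(const OpaqueID &o) const { return ptr != o.ptr; }

private:
  const void *ptr;
};

// Instance-owned ID: the object's own address is the ID, so it is unique
// only while the object exists. Copying is disabled because a copy would
// live at a different address and therefore have a different ID.
class SelfOwningID {
public:
  SelfOwningID() = default;
  SelfOwningID(const SelfOwningID &) = delete;
  SelfOwningID &operator=(const SelfOwningID &) = delete;

  OpaqueID getID() const { return OpaqueID(this); }
};
```

Two SelfOwningID objects of the same class yield different IDs, which is exactly why this scheme identifies instances rather than types.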

Yes, but the concept of TypeID is that it matches a given “Type” (or “Class”). You’d have to be careful about where it leaks.
For example, in the context of passes, we register the pass name (for example “loop-tiling”) in a map and associate it with the TypeID. This map is global to the entire process.
So you can’t set up a Context and a pass pipeline with a dynamic “loop-tiling” pass and then do it again later with another dynamic TypeID. The framework would complain that there is already a “loop-tiling” pass with a different ID (even though the first one is dead).
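The registration conflict can be modeled as a process-wide map from pass argument to ID pointer. This is a hypothetical sketch of the behavior described (passRegistry and registerPass are illustrative names), not MLIR's registration code. Re-registering the same name with the same ID succeeds, while a second dynamic ID for the same name is rejected whether or not the first owner still exists.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical process-wide registry: pass argument -> ID pointer.
// Entries are never removed, mirroring "eternal" pass registration.
std::map<std::string, const void *> &passRegistry() {
  static std::map<std::string, const void *> registry;
  return registry;
}

// Returns false when `name` is already registered with a different ID.
bool registerPass(const std::string &name, const void *id) {
  auto [it, inserted] = passRegistry().emplace(name, id);
  return inserted || it->second == id;
}
```

With a per-type ID, the second pipeline registers the same pointer again and succeeds; with a per-instance ID, it presents a new pointer and fails.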

In that case, it sounds like passing the Swift type’s ObjectIdentifier should be safe, as opposed to having the C++ side of DynamicPass inherit from SelfOwningTypeID, since SelfOwningTypeID is instance-specific rather than type-specific.

@River707, @math-fehr, any additional thoughts on this?

Basically what Mehdi says above. TypeID is essentially intended to serve as a unique identifier for a specific entity class (not necessarily just specific instances). The TypeID itself doesn’t need to be static, but (as Mehdi mentions) it needs to exist beyond the lifetime of whatever is using it, and that lifetime depends on how it’s used. For attributes, operations, types, etc., the lifetime is sometimes that of the context, but for things like pass registration it is eternal (at least right now).

More generally in:

class Foo {};

Foo f;

We generally use the TypeID to refer to Foo, not f.

– River
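In code, the distinction is between an ID that depends only on the static type and one tied to an object. Here is a minimal sketch using a hypothetical TypeKey class (not MLIR's TypeID). Each instantiation of TypeKey::get<T>() owns one function-local static whose address identifies T program-wide, so f and any other Foo share a key.

```cpp
#include <cassert>

// Hypothetical per-type key: the address of a function-local static that
// exists once per template instantiation, i.e. once per type T.
class TypeKey {
public:
  template <typename T> static TypeKey get() {
    static char tag; // one object per T; only its address matters
    return TypeKey(&tag);
  }
  bool operator==(const TypeKey &o) const { return ptr == o.ptr; }
  bool operator!=(const TypeKey &o) const { return ptr != o.ptr; }

private:
  explicit TypeKey(const void *p) : ptr(p) {}
  const void *ptr;
};

// The key depends only on the static type of the argument, never on which
// object is passed in.
template <typename T> TypeKey keyOf(const T &) { return TypeKey::get<T>(); }

class Foo {};
class Bar {};
```

One caveat with this style is that function-local statics in templates may not be unique across shared-library boundaries unless symbol visibility is handled carefully.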

I remember @clattner mentioning at some point that it could be interesting for TypeIDs to become successive integers, so that we could use arrays instead of hash maps for things keyed by TypeID. Has any more thought been put into this? Would we be coding ourselves into a corner by assuming that TypeIDs are unique pointers?

It largely depends on how the integers are allocated (and when). There are many uses that would not work well as an array, though I suppose this would be intended for things like attributes/operations/types, which would be forced to be assigned incrementally? (I’m not sure how often maps of those are built in practice, though; I can’t think of any offhand.) TypeIDs are used for a lot of things pulled in from many different places, so keeping them sequential enough to use an array (one that isn’t huge) feels difficult IMO.

– River

Sequential integers look interesting, but I’m not sure how they would work: would we need a global registry? How do you initialize the TypeID for the user? Resolve it on first use? There are tricks that could be used, but I can’t see how to avoid extra indirection(s) every time we need to resolve a TypeID.

So far the current approach has been working pretty well: it is hard to find an ID for an entity that is unique across the entire program, and using the address space to “reserve” an ID is pretty nice…
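To make the first-use question concrete, here is one way sequential IDs could be handed out lazily. It is a hypothetical sketch (denseID, DenseTable, Add, and Mul are illustrative names) and not how MLIR or TFRT assign IDs. A function-local static is initialized from a global counter on the first call, and every later call still goes through the compiler's initialization guard, which is the kind of extra check mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Global counter: hands out 0, 1, 2, ... in first-use order.
inline std::size_t nextDenseID() {
  static std::atomic<std::size_t> counter{0};
  return counter.fetch_add(1);
}

// Per-type ID, assigned on first call. The static is initialized exactly once
// (thread-safe since C++11); later calls check a guard and return it.
template <typename T> std::size_t denseID() {
  static const std::size_t id = nextDenseID();
  return id;
}

// With dense IDs, per-type data can live in a vector instead of a hash map.
class DenseTable {
public:
  template <typename T> void set(std::string value) {
    std::size_t id = denseID<T>();
    if (slots.size() <= id)
      slots.resize(id + 1); // grows to the largest ID seen so far
    slots[id] = std::move(value);
  }
  // Returns nullptr if the table never grew far enough to hold T's slot;
  // slots created by resize but never set hold empty strings.
  template <typename T> const std::string *get() const {
    std::size_t id = denseID<T>();
    return id < slots.size() ? &slots[id] : nullptr;
  }

private:
  std::vector<std::string> slots;
};

struct Add {};
struct Mul {};
```

Two properties follow from this sketch: IDs depend on first-use order, so they are not stable across runs, and the vector grows to the largest ID ever assigned, which is the size concern raised above.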

As an example, TFRT does it this way.

Thanks! I need to look into what the static size_t get() accessors will compile down to, but that looks interesting.

What kinds of things does TFRT take advantage of by having sequential IDs?

This is probably a better example, though somewhat indirect. ConcreteAsyncValues contain their runtime type (constructed here), and having a uint16_t type ID (and not a void *) saves space.
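As a rough illustration of the space argument, compare two hypothetical value headers that differ only in how the runtime type is stored. The field layout is an assumption made for illustration and is not TFRT's actual AsyncValue. On a typical 64-bit target, the pointer-based header needs 16 bytes against 8 for the 16-bit version, because the pointer is both wider and forces 8-byte alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header storing the runtime type as a dense 16-bit ID.
struct CompactHeader {
  std::uint16_t typeId; // sequential type ID (caps the count at 65536 types)
  std::uint8_t state;   // e.g. an availability state
  std::uint8_t flags;
  std::uint32_t refCount;
};

// Same fields, but the runtime type is a unique-address (pointer) ID.
struct PointerHeader {
  const void *typeId; // address-based type ID
  std::uint8_t state;
  std::uint8_t flags;
  std::uint32_t refCount;
};
```

The saving matters most when many small values are allocated, since the header is paid once per value.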