[RFC] Better support for typed pointers in an opaque pointer world

Thanks for the further information on SPIR-V representation. I want to take a step back and characterise what I think we would need to achieve with opaque types for the SPIR-V/DXIL/Wasm (and more?) use-cases, to check we see the problem in the same way.

With LLVM’s current type system, you take your arbitrarily rich and complex frontend types, and convert them to LLVM types. This inherently loses information including the identity of the frontend types. A key thing the LLVM type system is trying to do is to provide enough facilities to encode the memory layout of any lowered types. For most cases this is sufficient, but we encounter problems when the compilation target is itself a typed IR which requires the identity of types specified in the frontend to be maintained throughout compilation to produce correctly typed output.

Trying to extend LLVM’s type system to support arbitrarily complex external type systems is a non-starter, so what is the minimum we can add? I think the key thing we need to support is maintaining type identity, which the opaque type proposal provides. A type is defined and LLVM may know very little about it (in the Wasm case at least, the memory layout is completely opaque), but LLVM does maintain its identity and guarantees that type information won’t be lost, that values of the type won’t be cast to a different type, and that other values won’t be cast to it. In some cases, additional information about these external types may enable more optimisations (you mention it would be useful in your use cases to access information such as the size of types - is there other information you’d need to access in target-independent passes that might be essential to correctness or reasonable performance?).
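To make those guarantees concrete, here is a toy model in Python of how I picture them. It is only a sketch, not proposed LLVM API: `OpaqueType`, `Value`, `can_cast` and the `size_bytes` field are all names I've made up for illustration, and `externref`/`funcref` just stand in for any two external types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpaqueType:
    """Hypothetical stand-in for an externally defined type."""
    type_id: int                      # identity: the one thing generic code can rely on
    size_bytes: Optional[int] = None  # optional extra fact; None means layout is unknown


@dataclass(frozen=True)
class Value:
    name: str
    type: object  # an OpaqueType, or a plain string such as "i32" for ordinary types


def can_cast(value: Value, to_type: object) -> bool:
    """Return True if a cast keeps every opaque type's identity intact."""
    if value.type == to_type:
        return True  # a no-op cast is always fine
    # Any cast into or out of an opaque type would lose or forge its identity.
    if isinstance(value.type, OpaqueType) or isinstance(to_type, OpaqueType):
        return False
    # Casts between ordinary types follow the usual rules (not modelled here).
    return True


externref = OpaqueType(type_id=0)
funcref = OpaqueType(type_id=1)
x = Value("x", externref)

print(can_cast(x, externref))  # same opaque type
print(can_cast(x, funcref))    # opaque to a different opaque type
print(can_cast(x, "i32"))      # opaque to an ordinary type
```

The optional `size_bytes` field is where I'd imagine extra target-provided facts living; nothing in the identity check depends on it.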

In terms of encoding that type identity, I’d mentioned the integer ID as I’d found it a useful starting point for prototyping, using non-integral address space IDs. I think regardless of whether you’re using integer IDs or strings, you’ll still need the ability to have target-specific logic when linking different LLVM modules in order to ensure LLVM can correctly maintain type identity for external type systems. This logic would need to rewrite either type IDs or type strings as appropriate: consider nominal types in different modules, for instance. Possibly the opaque type proposal could be extended to support this (identifying imported/exported types etc.) in the general case, but providing the minimal primitive and letting target-specific logic handle target-specific details for combining types between modules feels like it might be a better starting point.
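As a rough illustration of the link-time rewriting I have in mind, here is a small Python model. Everything in it is an assumption for the sake of the sketch: `Module`, `link`, the `same_type` hook and the `("exported" | "private", name)` descriptors are invented names, not anything in LLVM today. The generic linker only renumbers IDs; the target-supplied hook decides when two descriptors denote the same type.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Descriptor = Tuple[str, str]  # hypothetical: (linkage, nominal type name)


@dataclass
class Module:
    # local type ID -> target-defined descriptor, opaque to the generic linker
    types: Dict[int, Descriptor] = field(default_factory=dict)
    # value name -> type ID it was declared with
    values: Dict[str, int] = field(default_factory=dict)


def link(dst: Module, src: Module,
         same_type: Callable[[Descriptor, Descriptor], bool]) -> Dict[int, int]:
    """Merge src into dst, returning the src-ID -> dst-ID rewrite map."""
    remap: Dict[int, int] = {}
    next_id = max(dst.types, default=-1) + 1
    for src_id, desc in src.types.items():
        # Ask the target whether an existing dst type is the same type.
        match = next((i for i, d in dst.types.items() if same_type(d, desc)), None)
        if match is None:
            match = next_id          # distinct type: allocate a fresh ID
            dst.types[match] = desc
            next_id += 1
        remap[src_id] = match
    for name, type_id in src.values.items():
        dst.values[name] = remap[type_id]  # value-name clashes are ignored here
    return remap


def nominal_same_type(a: Descriptor, b: Descriptor) -> bool:
    # Example target rule: nominal types unify only if both are exported
    # under the same name; private types are never merged.
    return a[0] == "exported" and b[0] == "exported" and a[1] == b[1]


a = Module({0: ("exported", "Point"), 1: ("private", "Node")}, {"p": 0, "n": 1})
b = Module({0: ("private", "Node"), 1: ("exported", "Point")}, {"q": 1, "m": 0})
remap = link(a, b, nominal_same_type)
print(remap, a.values)
```

In this example the two exported `Point` types end up sharing an ID, while the two private `Node` types stay distinct even though their names collide, which is the nominal-type case I was getting at. The same shape would work with strings instead of integers; only the renaming step changes.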

Does anything above differ drastically from your own thinking?