Background
Building high-level compilers on top of LLVM sometimes requires the representation of high-level types that don’t fit neatly into LLVM’s type system. @jcranmer recently added “target extension types” which provide an initial extension point to LLVM’s type system.
Already today, high-level frontends can define their own “target” extension types (the name is misleading) as long as these types are lowered away somehow before the IR reaches the backend.
However, some uses of types require additional “type info” that is not captured by the TargetExtType type itself, such as the type’s data layout and whether zeroinitializer is allowed.
Proposal
I’m proposing to introduce the notion of “target (extension) type classes” that can be registered with an LLVMContext.
Target types are still identified by name (and type / int arguments), but may fall into a type class based on their name. If they do, the type class is queried for the additional “type info”. The currently hard-coded type info is replaced by setting up some built-in type classes as part of the LLVMContext constructor.
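To make the lookup concrete, here is a minimal, self-contained C++ model of a per-context registry of type classes. It does not use LLVM headers. Every name in it (TypeInfo, TypeClass, Context::registerTypeClass, getTypeInfo, and the builtin.handle. and mydialect. prefixes) is hypothetical rather than the API from D147697. It also assumes that a type falls into a class when its name starts with the class’s prefix, which is only one possible matching rule.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the "type info" a TargetExtType does not carry itself.
struct TypeInfo {
  std::string layoutType;  // type used for size/alignment queries
  bool allowsZeroInit;     // may the type appear in a zeroinitializer?
};

// Hypothetical type class: derives type info from the type's int parameters.
struct TypeClass {
  std::string prefix;  // assumption: types whose name starts with this fall into the class
  std::function<TypeInfo(const std::vector<unsigned> &)> getInfo;
};

// Hypothetical model of an LLVMContext that owns the registered type classes.
class Context {
public:
  Context() {
    // Built-in classes are registered in the constructor instead of being hard-coded.
    // The prefix and the info it returns are illustrative only.
    registerTypeClass("builtin.handle.", [](const std::vector<unsigned> &) {
      return TypeInfo{"ptr", true};
    });
  }

  // Frontends can register their own classes the same way as the built-ins.
  void registerTypeClass(std::string prefix,
                         std::function<TypeInfo(const std::vector<unsigned> &)> fn) {
    classes_.push_back(
        std::make_unique<TypeClass>(TypeClass{std::move(prefix), std::move(fn)}));
  }

  // Returns the first registered class whose prefix matches the type name, or nullptr.
  const TypeClass *lookup(const std::string &name) const {
    for (const auto &tc : classes_)
      if (name.compare(0, tc->prefix.size(), tc->prefix) == 0)
        return tc.get();
    return nullptr;
  }

  // Types without a class get a conservative default: no layout, no zeroinitializer.
  TypeInfo getTypeInfo(const std::string &name,
                       const std::vector<unsigned> &ints) const {
    if (const TypeClass *tc = lookup(name))
      return tc->getInfo(ints);
    return TypeInfo{"void", false};
  }

private:
  std::vector<std::unique_ptr<TypeClass>> classes_;
};
```

With this model, a frontend would call registerTypeClass once at start-up, for example with a callback that maps the hypothetical mydialect.buffer type with int parameter 16 to the layout "[16 x i8]". A later query for that type is then answered by the frontend’s callback rather than by code inside the core library, which is the flexibility the proposal is after.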
This is a relatively small change at just over 100 lines in D147697, but it is enough to allow users of LLVM to define their own custom types and enjoy the full flexibility that is available to built-in target types.
In addition to the LLVM change, you can also see how we intend to use them in llvm-dialects in this pull request.
Alternatives
While the type info could theoretically be stored in the TargetExtType object itself, that route was not taken when target types were added for good reasons, and I’m not looking to change that.
In the current proposal, type classes are optional. One alternative would be to make them non-optional. That’s technically a breaking change, though almost certainly palatable at this early stage. I didn’t go down this route because it seemed unnecessary, but I am curious to hear feedback.
Another alternative would be to try to encode the type info somehow in the DataLayout. However, that quickly becomes intractable: target types are parameterized, and how would we encode complex dependencies of, e.g., the layout on those parameters?
Finally, I made the choice to cache the type class pointer in TargetExtType for faster access, avoiding repeated string comparisons for a small compile-time benefit. This makes the type object slightly larger. I did consider an ID-based scheme that would save some space, but types are uniqued per context, so their total memory footprint is small, and the extra field only affects target types anyway.
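The caching trade-off can be illustrated with a similar toy model. Again, all names here (TypeClass, ExtType, Context::getType, addClass, classLookups) are hypothetical and not LLVM’s API. Because types are uniqued per context, the name-based class lookup runs once when a distinct type is first created, and every later use reads the cached pointer. The sketch assumes all type classes are registered before any type that should fall into them is created. A class registered later would not be picked up by types that were already cached.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model; none of these names are LLVM's API.
struct TypeClass {
  std::string prefix;    // assumption: prefix-based matching on the type name
  unsigned sizeInBytes;  // stand-in for the class's type info
};

struct ExtType {
  std::string name;
  std::vector<unsigned> ints;
  const TypeClass *cls;  // resolved once at creation; nullptr if no class matches
};

class Context {
public:
  // Classes are heap-allocated so that the pointers cached in types stay valid.
  const TypeClass *addClass(std::string prefix, unsigned size) {
    classes_.push_back(std::make_unique<TypeClass>(TypeClass{std::move(prefix), size}));
    return classes_.back().get();
  }

  // Uniquing: the same (name, ints) always yields the same ExtType object.
  const ExtType *getType(const std::string &name, const std::vector<unsigned> &ints) {
    auto key = std::make_pair(name, ints);
    auto it = types_.find(key);
    if (it != types_.end())
      return it->second.get();
    // The string comparisons happen only here, once per distinct type.
    auto ty = std::make_unique<ExtType>(ExtType{name, ints, findClass(name)});
    const ExtType *result = ty.get();
    types_.emplace(std::move(key), std::move(ty));
    return result;
  }

  // Number of name-based class lookups performed so far.
  unsigned classLookups() const { return lookups_; }

private:
  const TypeClass *findClass(const std::string &name) {
    ++lookups_;
    for (const auto &c : classes_)
      if (name.compare(0, c->prefix.size(), c->prefix) == 0)
        return c.get();
    return nullptr;
  }

  std::vector<std::unique_ptr<TypeClass>> classes_;
  std::map<std::pair<std::string, std::vector<unsigned>>, std::unique_ptr<ExtType>> types_;
  unsigned lookups_ = 0;
};
```

Requesting the same type twice returns the same object and performs only one class lookup. The cost is one pointer per distinct target type, which is the size trade-off described above.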
Future work
This current proposal brings us closer to a healthy representation of extension types in high-level compilers built on LLVM. It allows us to remove a bunch of ugly hacks in LLPC, for example.
That said, even with this proposal, extension types are still in a bit of an awkward place, since some of them really are genuinely opaque and don’t even have a known byte size, yet we would like them to appear in alloca & friends for function-local variables. This is becoming more urgent with the proposal to replace GEPs with ptradd.
We currently paper over the related issues, but I think longer term we’ll want explicit “structured/opaque alloca” and “structured/opaque getelementptr” instructions to move to an entirely sound representation of what the high-level languages require from us.