As it happens, I was looking at this again recently… I have a bunch of patches meant to make this extension more useful by allowing frontends to make better use of their own opaque types, and I’ve been meaning to get back to them.
Those patches are mostly needed because custom extension types are not usable in e.g. alloca by default.
Apart from that, the extension types are working quite well from what I could see in my experiments, though I’ve only used them in a frontend context so far (no instruction selection).
The types are currently being used for the svcount type in AArch64 and for the SPIR-V builtin types. In both cases, they are mapped from custom Clang builtin types on the frontend side and to custom MVTs in LLVM code generation.
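As a minimal sketch (my own illustration, not the actual AArch64 or SPIR-V code, and the function name is made up), such a type simply appears as an ordinary first-class value in IR:

; The value is opaque to generic IR passes; only the target knows its
; layout and register class.
define target("aarch64.svcount") @pass_through(target("aarch64.svcount") %pn) {
  ret target("aarch64.svcount") %pn
}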
• Instruction selection. DAG or GISel.
The LLVM backend is not my forte, but this seems to be working currently, although it does rely on adding the necessary type plumbing.
• Middle-end optimizations. I think we should disable a lot of optimizations on opaque types.
Because opaque types can appear in so few contexts, there aren’t many optimizations that could even be attempted on them in the first place.
• Front-end (builtins/intrinsics) presentations.
You would need to construct your own custom Clang types to map to the opaque types, and I’m rusty on the way Clang handles builtins. But again, this has been successfully used for some existing types already, so if you do the plumbing correctly, there don’t seem to be any issues.
@jcranmer, thank you for your excellent work on Target extensions. I’ve successfully utilized them in Clang, MLIR, and LLVM for fp8, and they seem ideally suited for use with intrinsics.
While the benefits of using target extensions for this purpose are clear, I’m curious about alternative approaches, such as using integer types (i.e., i8) to represent fp8, as demonstrated in int_amdgcn_wmma_f32_16x16x16_fp8_fp8. Could you or others (@tschuett / @nikic) share insights on the potential advantages or pitfalls of this method? Specifically, I wonder if using target extensions to treat specialized data types like fp8 as opaque types might simplify their management and reduce the risk of errors. Is my understanding correct, or are there other considerations I should be aware of?
There are, I think, two main factors that control the decision to use target extension types over existing types to work with intrinsics.

The first factor is register allocation. Types are used to indicate which register bank a value will be stored in, so if you have to allocate fp8 values to a different register bank than i8, that would be a good reason not to use i8 for that type. From my understanding, fp8 is largely going to be relevant only in vectorized form, and vectors of floating-point and vectors of integers generally live in the same register bank anyway.
The other factor is whether serendipitous optimizations on the underlying type are going to affect the ability of the code to work properly. You might get things like load widening that coalesces several adjacent loads into a single i64 load, so if you’re dependent on being able to do dataflow analysis to remap all the relevant i8 operations back into fp8 operations, you might want to use target extension types to prevent these optimizations from kicking in in the first place.
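As a tiny sketch of the contrast (my own example; the type name "mytarget.fp8" below is made up, not a real registered type):

; With an i8 model, "fp8" values look like ordinary integers: passes may
; widen or merge the adjacent loads, and the add below is indistinguishable
; from genuine integer arithmetic.
define i8 @fp8_as_i8(ptr %p) {
  %lo = load i8, ptr %p, align 1
  %q  = getelementptr i8, ptr %p, i64 1
  %hi = load i8, ptr %q, align 1
  %r  = add i8 %lo, %hi
  ret i8 %r
}

; With an opaque target extension type, the values can only flow through
; calls and intrinsics, so those rewrites never apply in the first place.
declare target("mytarget.fp8") @fp8_add(target("mytarget.fp8"), target("mytarget.fp8"))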
I have a question about the intended use of the HasZeroInit and CanBeGlobal properties of target extension types. As implemented, they seem to apply only to the type itself:
@t = global target("spirv.Image") zeroinitializer
error: invalid type for null constant
but they do not apply to aggregates containing a target extension type:
@s = global {target("spirv.Image")} zeroinitializer
@a = global [1 x target("spirv.Image")] zeroinitializer
(no error)
Is this just an oversight? I think it would make more sense if HasZeroInit and CanBeGlobal applied to any aggregate containing the target extension type. I’d also like to add a new CanBeLocal (or CanBeAlloca) property controlling whether the type can be used in an alloca instruction.
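For concreteness, a small IR sketch (my own, not from any existing patch) of how the proposed semantics would behave:

; Under the proposal, the aggregate forms above would also be rejected,
; since the element type target("spirv.Image") lacks HasZeroInit.
; A CanBeLocal/CanBeAlloca property would then gate uses like this:
define void @f() {
  %img = alloca target("spirv.Image")   ; valid only if the type opts in
  ret void
}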