Background
In a previous thread, I discussed the possibility of adding an opaque type to LLVM for the purposes of representing certain SPIR-V types better in a world of opaque pointers. Given that there was interest in such a type, I would like to discuss a more specific, fleshed-out proposal for such types.
In SPIR-V, we have a family of types that represent various opaque hardware types (such as an OpTypeImage) that need to be preserved through all optimizations and be emitted as specific types in the output SPIR-V file. These types are represented as pointers-to-opaque-structs in the current versions of LLVM IR, but optimizations will sometimes introduce illegal operations on these types, such as ptrtoint/inttoptr.
Furthermore, it is not always possible to correctly identify that a given ptr value in LLVM IR refers to one of these types. Presently, types are inferred in opaque pointer mode by demangling function names, but return types in particular are usually not encoded in Itanium name mangling. Given that these types have very restricted usability in SPIR-V IR (it is not even possible to bitcast to/from these types), being unable to identify these types proves catastrophic for compilation.
Proposed Semantics
Opaque types are a new kind of fundamental LLVM type. The C++ API for this type, at least as far as type creation looks, would be as follows:
class OpaqueType : public Type {
public:
  // Both overloads return the uniqued type for the given key.
  static OpaqueType *get(LLVMContext &Ctx, StringRef Name);
  static OpaqueType *get(LLVMContext &Ctx, StringRef Name,
                         ArrayRef<Type *> Types, ArrayRef<unsigned> Parameters);
};
In LLVM IR terms, this would be written as:
opaque("spirv.Image", void, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)
opaque("spirv.Queue")
Two opaque types are considered the same type if their Name, Types, and Parameters all compare equal.
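To make the equality rule concrete, here is a minimal standalone sketch of uniquing on the (Name, Types, Parameters) key. It does not use LLVM's headers: Type, OpaqueType, Context, and getOpaqueType are hypothetical stand-ins, and a real implementation would presumably hash the key inside LLVMContext rather than scan a list.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for llvm::Type; only object identity matters in this sketch.
struct Type {
  virtual ~Type() = default;
};

// Hypothetical model of the proposed opaque type.
struct OpaqueType : Type {
  std::string Name;
  std::vector<const Type *> Types;
  std::vector<unsigned> Parameters;

  OpaqueType(std::string N, std::vector<const Type *> T,
             std::vector<unsigned> P)
      : Name(std::move(N)), Types(std::move(T)), Parameters(std::move(P)) {}
};

// Stand-in for LLVMContext: owns the opaque types and uniques them.
class Context {
  std::vector<std::unique_ptr<OpaqueType>> OpaqueTypes;

public:
  // Returns the existing type when Name, Types, and Parameters all compare
  // equal; otherwise creates and remembers a new one.
  const OpaqueType *getOpaqueType(const std::string &Name,
                                  const std::vector<const Type *> &Types = {},
                                  const std::vector<unsigned> &Parameters = {}) {
    assert(!Name.empty() && "this sketch assumes opaque types are named");
    for (const auto &T : OpaqueTypes)
      if (T->Name == Name && T->Types == Types && T->Parameters == Parameters)
        return T.get();
    OpaqueTypes.push_back(
        std::make_unique<OpaqueType>(Name, Types, Parameters));
    return OpaqueTypes.back().get();
  }
};
```

With this model, two requests for "spirv.Image" with identical arguments yield the same pointer, while changing any single parameter yields a distinct type.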
Opaque types would be first-class types. It is legal to use them as function parameters or return values, in selects and PHIs. It would also be a sized type (for data layout purposes, the size would be the same as ptr addrspace(0)), which means it could be embedded in structs, used in allocas, loads, and stores, and even used as global values. The constant values zeroinitializer, poison, and undef are all legal values for opaque types. What cannot be done is to convert between different opaque types: it is not legal to bitcast to or from an opaque type, and the following code would have target-defined behavior (which could be undefined behavior):
define ptr @evil(opaque("atype") %val) {
  %memory = alloca opaque("atype")
  store opaque("atype") %val, ptr %memory
  %res = load ptr, ptr %memory
  ret ptr %res
}
Values of opaque types cannot be generally introspected by target-independent passes. These values are expected to mostly arise from target-specific intrinsics, and optimizations on those intrinsics are possible based on existing LLVM attributes (e.g., readonly, speculatable). Furthermore, it is possible (indeed, desirable) for optimization passes like SROA to convert allocas of opaque type to SSA values, or for DCE to eliminate unused, side-effect-free instructions of opaque type.
The meaning and semantics of each opaque type are determined by the target, and I would hope that targets document the expected opaque types and their semantics (LLVM target documentation tends to be woefully lacking in this regard, and I would like to set higher standards here). For SPIR-V, which is the main motivating case for me, the expected opaque types would have the string name be "spirv.TYPE" (where TYPE is one of the OpType* in SPIR-V that doesn't readily correspond to any other LLVM IR type), with extra parameters being necessary iff the SPIR-V OpType* requires them.
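As a small illustration of the naming convention, a SPIR-V backend could recover the TYPE part from the opaque type's name with a helper along these lines. The function spirvTypeName is hypothetical and not part of any existing API; the sketch assumes the only convention is the literal "spirv." prefix.

```cpp
#include <string>

// Returns the TYPE part of a "spirv.TYPE" opaque type name, or an empty
// string when the name does not follow that convention.
inline std::string spirvTypeName(const std::string &OpaqueName) {
  static const std::string Prefix = "spirv.";
  // Require the prefix plus at least one character after it.
  if (OpaqueName.size() <= Prefix.size() ||
      OpaqueName.compare(0, Prefix.size(), Prefix) != 0)
    return std::string();
  return OpaqueName.substr(Prefix.size());
}
```

A backend would then map the returned string (for instance "Image" or "Queue") to the matching OpType* instruction and reject names it does not recognize.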
Impact on optimizations
While I haven't implemented the proposal in its entirety, I have implemented enough of it to be confident that the proposal would serve my needs. I predict that the impact of these changes on existing optimizations is likely to be relatively small: the only optimizations I needed to fix were SROA and GVN, which both have independent methods that amount to "can type T1 be bitcast to T2" and tend to default to "yes". It's possible there are more such checks in the codebase that haven't triggered on my test suites yet, but I haven't found any other issues in existing target-independent optimizations.
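The kind of fix involved can be sketched with a toy model. This is not the actual SROA or GVN code: Kind, SimpleType, and canBitCast are hypothetical names, and the sketch assumes the caller has already handled the case where both types are identical. The point is only that the opaque-type check has to come before the permissive size-based default.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified type model; not LLVM's Type class.
enum class Kind { Integer, Float, Pointer, Opaque };

struct SimpleType {
  Kind K;
  std::uint64_t SizeInBits;
};

// "Can a value of type From be bitcast to the distinct type To?"
inline bool canBitCast(const SimpleType &From, const SimpleType &To) {
  assert(From.SizeInBits > 0 && To.SizeInBits > 0 &&
         "this sketch only models sized types");
  // Opaque types never take part in bitcasts, even when the sizes match.
  if (From.K == Kind::Opaque || To.K == Kind::Opaque)
    return false;
  // Pointer <-> non-pointer conversion needs ptrtoint/inttoptr, not bitcast.
  if ((From.K == Kind::Pointer) != (To.K == Kind::Pointer))
    return false;
  // The permissive default: same-sized types are interchangeable.
  return From.SizeInBits == To.SizeInBits;
}
```

Without the first check, a pointer-sized opaque type would fall through to the size comparison and be treated like any other 64-bit value.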
Unanswered Questions
Naming of opaque types
In the previous discussion, the general design has been to give a name to these types that's akin to the struct name, such as:
%imgf2d = opaque("image", float, i32 2, i32 42)
There are some advantages to having syntax like this (see below). In my test implementation, I've avoided this because I never tested opaque types beyond a single string parameter, and having two names for the same thing seemed a bit much. Given that I expect the extra parameters on opaque types are likely to be relatively rare, I'm not sure it's worth the extra confusion of having an extra name and having to work out the degree to which that name matters.
Moving existing types to opaque types
When preparing this RFC, I noticed that x86_mmx and x86_amx already have semantics very similar to opaque types (more so for x86_mmx than x86_amx). It may be possible to move these types to being x86-specific opaque types, but doing so would necessitate adding target-specific hooks to work correctly, as the defaults for opaque types suggested in this RFC would be woefully incorrect. Even if these hooks were present, though, it may be too much code churn to move existing LLVM types to an opaque type model, and I don't actually see any benefits to removing these existing types other than purity of design.
Target-specific hooks
In the previous discussion, it was noted that different targets may have different requirements for opaque types. For example, as I understand it, WebAssembly would like to have its opaque types renamed when linking in different modules (i.e., opaque("wasm.gc", i32 0) in one module isn't necessarily the same as opaque("wasm.gc", i32 0) in another module). If x86_mmx were opaque("x86.mmx") instead, it would need to have a different size than the nominal sizeof(ptr) that I've given it here.
At present, the only real facility we have for indicating target-specific details of IR is the datalayout string. However, the datalayout string essentially requires listing out the properties of every possible type on the target for completeness, and there can be a very large number of rarely-used types. Indeed, one previous idea was to represent these types as non-integral address spaces, and the number of address spaces needed was a major factor in rejecting that idea. As a result, I don't think it makes sense to shoehorn them into the datalayout string.
One possibility for specifying this information would be to embed it in the IR file when the opaque type is first used. For example (assuming you've got something like a named opaque type):
%imgf2d = opaque("image", float, i32 2, i32 42)
%x86_mmx = bitcastable size(8) opaque("image")
%pipe = canbeglobal opaque("pipe")
%wasmgc = nominal opaque("wasm.gc.foobar")
This approach does pose some more challenging questions for how it interacts with linking LLVM modules together, which is again why I haven't attempted to implement anything along these lines. (Also, working out what the set of properties would need to be would require more use cases and examples than I alone could provide.)
Another approach is to extend target information beyond the existing TargetTransformInfo class to include things like TargetIRVerifier or TargetLlvmLinker that would better allow fuller target-specific modifications to the IR. Adding such classes could have ancillary benefits, for example, being able to identify types that cannot be supported on particular architectures (e.g., x86_fp80 or ppc_fp128), or opening up other paths to fixing calling convention issues at the LLVM IR level.
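One possible shape for such a hook, purely as a sketch: a target-overridable query that returns the properties of an opaque type, with the defaults matching the semantics proposed in this RFC (pointer-sized, not bitcastable, structurally uniqued). Every name here (OpaqueTypeInfo, TargetOpaqueTypeHooks, X86LikeHooks, getOpaqueTypeInfo) is hypothetical; none of these classes exist in LLVM, and the set of properties is an assumption taken from the examples above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical per-type properties a target might want to override.
struct OpaqueTypeInfo {
  std::uint64_t SizeInBits; // size used for data layout purposes
  bool Bitcastable;         // may the type appear in bitcasts?
  bool Nominal;             // are same-named types distinct across modules?
};

// Hypothetical hook interface; the base class supplies the RFC's defaults.
class TargetOpaqueTypeHooks {
public:
  virtual ~TargetOpaqueTypeHooks() = default;

  virtual OpaqueTypeInfo
  getOpaqueTypeInfo(const std::string &Name,
                    std::uint64_t PointerSizeInBits) const {
    assert(PointerSizeInBits > 0 && "pointer size must be known");
    (void)Name; // the defaults do not depend on the type name
    return OpaqueTypeInfo{PointerSizeInBits, false, false};
  }
};

// Illustrative x86-like target: an MMX register is 64 bits wide regardless
// of pointer size, and the type would need to remain bitcastable.
class X86LikeHooks : public TargetOpaqueTypeHooks {
public:
  OpaqueTypeInfo
  getOpaqueTypeInfo(const std::string &Name,
                    std::uint64_t PointerSizeInBits) const override {
    if (Name == "x86.mmx")
      return OpaqueTypeInfo{64, true, false};
    return TargetOpaqueTypeHooks::getOpaqueTypeInfo(Name, PointerSizeInBits);
  }
};
```

Keeping the defaults in the base class means targets that are happy with the proposed semantics (SPIR-V, in the motivating case) would not need to override anything.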
Instruction selection support
I don't have much familiarity with the codegen section of LLVM, and my immediate use cases rely on direct lowering of LLVM IR to target-specific details without going through SelectionDAG or GlobalISel. As a result, I don't know what needs to be changed in MVT or EVT to support opaque types.