Hello,
I would like to add TBAA metadata support in LLVMIR dialect, so that the frontends (e.g. Flang) may pass TBAA annotations through LLVMIR dialect to LLVM IR.
Related differential: D140768 [mlir] Support TBAA metadata in LLVMIR dialect.
The changes follow the approach of earlier differentials that added support for metadata annotations (e.g. D97944 "[mlir] Add an AccessGroup attribute to load/store LLVM dialect ops and generate the access_group LLVM metadata" and D107870 "Support alias.scope and noalias metadata"). I propose adding three new operations (following the current LLVM IR TBAA definition):
- llvm.tbaa_root: represents a TBAA metadata root node with a single operand, the identity. I decided to make the identity operand mandatory (LLVM IR makes it optional). Otherwise, if different frontends generated unnamed root nodes and their modules were later linked together, the TBAA graphs of different languages could end up sharing the same root, which may result in incorrect behavior.
- llvm.tbaa_type_desc: represents scalar and struct type descriptor nodes. The operation consists of FlatSymbolRefArrayAttr and DenseI64ArrayAttr arrays of the same size. The former is a list of references to symbols defined by other llvm.tbaa_type_desc operations or by an llvm.tbaa_root operation, and represents the types of the struct members or a parent scalar type. The latter is a list of non-negative integer offsets describing the byte offsets of the members in the struct types, or 0 for scalar types.
- llvm.tbaa_tag: represents a TBAA access tag, which references the symbols of the operations defining the base and access types. The operation also specifies the offset of the access and an optional constant attribute.
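To make the mapping to LLVM IR more concrete, here is a small Python model of the three node kinds and of the metadata text they would correspond to. This is only a sketch and not code from the patch: the class and function names are hypothetical, and the identity field on the type descriptor is an assumption taken from the LLVM IR TBAA format, where a type descriptor starts with a name string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TBAARoot:
    """Toy model of llvm.tbaa_root: the identity is mandatory."""
    sym_name: str
    identity: str


@dataclass
class TBAATypeDesc:
    """Toy model of llvm.tbaa_type_desc: members and offsets are parallel."""
    sym_name: str
    identity: str        # assumption: name string, as in LLVM IR descriptors
    members: List[str]   # symbols of other type descriptors or of the root
    offsets: List[int]   # byte offsets of struct members; 0 for a scalar

    def __post_init__(self):
        if len(self.members) != len(self.offsets):
            raise ValueError("members and offsets must have the same size")
        if any(offset < 0 for offset in self.offsets):
            raise ValueError("offsets must be non-negative")


@dataclass
class TBAATag:
    """Toy model of llvm.tbaa_tag."""
    sym_name: str
    base_type: str
    access_type: str
    offset: int
    constant: bool = False


def to_llvm_metadata(nodes):
    """Render nodes as LLVM IR metadata lines, numbering them in list order."""
    ids = {node.sym_name: index for index, node in enumerate(nodes)}
    lines = []
    for node in nodes:
        if isinstance(node, TBAARoot):
            fields = [f'!"{node.identity}"']
        elif isinstance(node, TBAATypeDesc):
            fields = [f'!"{node.identity}"']
            for member, offset in zip(node.members, node.offsets):
                fields += [f"!{ids[member]}", f"i64 {offset}"]
        else:
            fields = [f"!{ids[node.base_type]}",
                      f"!{ids[node.access_type]}",
                      f"i64 {node.offset}"]
            if node.constant:
                fields.append("i64 1")  # optional "constant" flag
        body = ", ".join(fields)
        lines.append(f"!{ids[node.sym_name]} = !{{{body}}}")
    return lines


example = [
    TBAARoot("root", "Simple C/C++ TBAA"),
    TBAATypeDesc("char", "omnipotent char", ["root"], [0]),
    TBAATypeDesc("int", "int", ["char"], [0]),
    TBAATag("int_tag", "int", "int", 0),
]
print("\n".join(to_llvm_metadata(example)))
```

The scalar descriptors each have exactly one member (the parent type) at offset 0, which is how the "same size" requirement on the two arrays covers both the scalar and the struct case.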
Symbols defined by llvm.tbaa_tag may be referenced by LLVM::LoadOp and LLVM::StoreOp operations via the optional tbaa attribute. The tbaa attribute is defined as an array of SymbolRefAttr’s, so that a single memory accessing operation may have multiple access tags attached to it. The references must be fully qualified, with the root reference being the symbol defined by an LLVM::MetadataOp operation.
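As a rough illustration of what a fully qualified reference buys, the toy model below resolves a two-level reference with one lookup per level. It uses plain dictionaries rather than the MLIR symbol table API, and every name in it (the module layout, __tbaa, tag_int, resolve_tag) is hypothetical.

```python
from typing import Dict, Tuple

# Toy module: each top-level symbol stands for an LLVM::MetadataOp and maps
# to the symbols nested inside it (here, only TBAA tags).
ToyModule = Dict[str, Dict[str, dict]]


def resolve_tag(module: ToyModule, ref: Tuple[str, ...]) -> dict:
    """Resolve a fully qualified reference (metadata symbol, tag symbol)."""
    if len(ref) != 2:
        raise ValueError("expected a reference of the form @metadata::@tag")
    metadata_sym, tag_sym = ref
    # First lookup: the root reference names the enclosing metadata op.
    nested_symbols = module[metadata_sym]
    # Second lookup: the leaf reference names the tag inside that op.
    return nested_symbols[tag_sym]


toy_module: ToyModule = {
    "__tbaa": {
        "tag_int": {"base_type": "int", "access_type": "int", "offset": 0},
    },
}

# A load or store carries an array of such references in its tbaa attribute.
load_tbaa_attr = [("__tbaa", "tag_int")]
resolved = [resolve_tag(toy_module, ref) for ref in load_tbaa_attr]
```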
Design/discussion points and TBDs:
The requirement for fully qualified symbol references in the tbaa attribute aligns with the corresponding access_group and alias_scope/noalias implementations. If I am not missing something, it is intended to speed up the lookup of the defining metadata operation from the symbol reference (in a memory accessing operation).
Though LLVM IR currently allows only a single access tag, it may be worth experimenting with multiple access tags in the future (e.g. as one of the options for representing aliasing in Flang-generated IR). This is why the tbaa attribute is defined as an array of SymbolRefAttr’s. Since LLVM IR does not support it, the module translation will fail for an MLIR operation that has more than one symbol reference in its tbaa attribute.
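The intended translation behavior can be sketched as a small check. This is a Python model of the rule only, not the actual module translation code; TranslationError and select_tbaa_tag are hypothetical names.

```python
from typing import Optional, Sequence


class TranslationError(Exception):
    """Raised when an operation cannot be lowered to LLVM IR (hypothetical)."""


def select_tbaa_tag(tbaa_refs: Optional[Sequence[str]]) -> Optional[str]:
    """Return the single tag reference to attach as !tbaa, or None if absent.

    The attribute is an array, but LLVM IR accepts one access tag per
    instruction, so more than one reference is rejected at translation time.
    """
    if not tbaa_refs:
        return None  # the tbaa attribute is optional
    if len(tbaa_refs) > 1:
        raise TranslationError(
            f"expected at most one TBAA access tag, got {len(tbaa_refs)}")
    return tbaa_refs[0]
```

Keeping the restriction in the translation rather than in the operation verifier leaves the array form available for experiments that stay within MLIR.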
If I understand the TBAA graph properties correctly, the graph may not have cycles, and the root node must be reachable from every graph node. In the current change-set the cycle detection is done only during LLVM IR import, but I think it may be worth doing the same as part of the verification of an LLVM::MetadataOp operation that contains any of the new TBAA operations.
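For reference, both properties can be checked with a single depth-first traversal. The sketch below is a Python model under my own assumptions, not the verifier from the patch: the graph is given as a dictionary from each symbol to the symbols it references, and verify_tbaa_graph is a hypothetical name.

```python
from typing import Dict, List, Set


def verify_tbaa_graph(edges: Dict[str, List[str]], roots: Set[str]) -> List[str]:
    """Return a list of error messages; an empty list means the graph is valid.

    edges maps every node (root, type descriptor, or tag) to the nodes it
    references; roots is the set of root node symbols.
    """
    errors: List[str] = []
    state: Dict[str, str] = {}          # node -> "visiting" or "done"
    reaches_root: Dict[str, bool] = {}  # memoized result per finished node

    def visit(node: str) -> bool:
        if state.get(node) == "done":
            return reaches_root[node]
        if state.get(node) == "visiting":
            # The node is already on the DFS stack: this is a back edge.
            errors.append(f"cycle detected through {node}")
            return False
        state[node] = "visiting"
        if node in roots:
            ok = True
        else:
            # Visit every successor (no short-circuit) so that all
            # back edges are reported.
            results = [visit(succ) for succ in edges.get(node, [])]
            ok = any(results)
        state[node] = "done"
        reaches_root[node] = ok
        return ok

    for node in edges:
        visit(node)
    # Reachability results are only reliable once the graph is known to be
    # acyclic, so report unreachable nodes only in that case.
    if not errors:
        errors += [f"{node} cannot reach a root"
                   for node in edges if not reaches_root[node]]
    return errors


valid = {"root": [], "char": ["root"], "int": ["char"], "tag": ["int", "int"]}
cyclic = {"root": [], "a": ["b"], "b": ["a"]}
orphan = {"root": [], "x": []}
```

Calling verify_tbaa_graph(valid, {"root"}) yields no errors, while the cyclic and orphan graphs each yield one message.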
There is currently no support for tbaa.struct metadata, nor for the tbaa attribute on LLVM::CallOp and intrinsic operations.
Please share your suggestions/concerns either here or in Phabricator.
Many thanks to @ftynse for the initial review! And thank you all in advance.
Slava