[RFC] Support TBAA metadata in LLVMIR dialect


I would like to add TBAA metadata support to the LLVMIR dialect, so that frontends (e.g. Flang) can pass TBAA annotations through the LLVMIR dialect to LLVM IR.

Related differential: D140768 [mlir] Support TBAA metadata in LLVMIR dialect.

The changes follow the approach of other differentials that added support for metadata annotations (e.g. D97944 [mlir] Add an AccessGroup attribute to load/store LLVM dialect ops and generate the access_group LLVM metadata; D107870 Support alias.scope and noalias metadata). I propose adding three new operations (following the current LLVM IR TBAA definition):

  • llvm.tbaa_root: represents a 1-operand TBAA metadata root node. I decided to require the identity operand (LLVM IR makes it optional). Otherwise, different frontends could each generate unnamed root nodes, and linking their modules together could make the TBAA graphs of different languages share the same root, which may result in incorrect behavior.
  • llvm.tbaa_type_desc: represents scalar and struct type descriptor nodes. The operation holds a FlatSymbolRefArrayAttr and a DenseI64ArrayAttr of the same size. The former is a list of references to symbols defined by other llvm.tbaa_type_desc operations or by an llvm.tbaa_root operation; it represents the types of the struct members, or the parent of a scalar type. The latter is a list of non-negative integers giving the byte offsets of the members in a struct type, or 0 for a scalar type.
  • llvm.tbaa_tag: represents a TBAA access tag, which references the symbols of the operations defining the base and access types. The operation also specifies the offset of the access and an optional constant attribute.
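
To make the shape of the three operations concrete, here is a minimal Python model of them. It is only an illustration, not the MLIR implementation: the class names, the field names (sym_name, identity, members, offsets, and so on) and the example symbols are assumptions made for this sketch. The checks mirror the constraints described above: a mandatory root identity, members and offsets of the same size, and non-negative offsets.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TbaaRoot:
    """Stand-in for llvm.tbaa_root (field names are hypothetical)."""
    sym_name: str
    identity: str

    def __post_init__(self):
        # The proposal makes the identity mandatory, unlike LLVM IR.
        if not self.identity:
            raise ValueError("tbaa_root requires a non-empty identity")


@dataclass(frozen=True)
class TbaaTypeDesc:
    """Stand-in for llvm.tbaa_type_desc (field names are hypothetical)."""
    sym_name: str
    identity: str              # type name string, as in LLVM IR descriptors
    members: Tuple[str, ...]   # symbols of member types, or the scalar parent
    offsets: Tuple[int, ...]   # byte offsets, one per entry in `members`

    def __post_init__(self):
        if len(self.members) != len(self.offsets):
            raise ValueError("members and offsets must have the same size")
        if any(offset < 0 for offset in self.offsets):
            raise ValueError("offsets must be non-negative")


@dataclass(frozen=True)
class TbaaTag:
    """Stand-in for llvm.tbaa_tag (field names are hypothetical)."""
    sym_name: str
    base_type: str
    access_type: str
    offset: int
    constant: bool = False     # the optional constant attribute


# Example: a struct of two ints, accessed through its second member.
root = TbaaRoot("root", "Simple C/C++ TBAA")
char_desc = TbaaTypeDesc("char", "omnipotent char", ("root",), (0,))
int_desc = TbaaTypeDesc("int", "int", ("char",), (0,))
pair_desc = TbaaTypeDesc("pair", "pair", ("int", "int"), (0, 4))
tag = TbaaTag("tag", base_type="pair", access_type="int", offset=4)
```

A scalar descriptor is simply the one-member case with offset 0, which is why a single operation can cover both scalar and struct types.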

Symbols defined by llvm.tbaa_tag may be referenced by LLVM::LoadOp and LLVM::StoreOp operations via the optional tbaa attribute. The tbaa attribute is defined as an array of SymbolRefAttrs, so a single memory-accessing operation may have multiple access tags attached to it. The references must be fully qualified, with the root reference being the symbol defined by an LLVM::MetadataOp operation.

Design/discussion points and TBDs:
The requirement for fully qualified symbol references in the tbaa attribute matches the corresponding access_group and alias_scope/noalias implementations. If I am not missing something, it is intended to speed up looking up the defining metadata operation from the symbol reference (in a memory-accessing operation).
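
The lookup argument can be illustrated with a small Python sketch. This is not MLIR's symbol table API: the nested-dictionary module layout, the resolve_tbaa_ref helper and the symbol names __tbaa and tag_0 are all assumptions made for the example. The point is only that a fully qualified reference names the enclosing metadata operation first, so resolution is two direct lookups rather than a search through every metadata operation in the module.

```python
from typing import Dict, Optional, Sequence

# Hypothetical module layout:
#   metadata op symbol -> {nested symbol -> description of the defining op}
Module = Dict[str, Dict[str, str]]


def resolve_tbaa_ref(module: Module, ref: Sequence[str]) -> Optional[str]:
    """Resolve a fully qualified reference given as (metadata symbol, nested symbol).

    Returns the defining op description, or None if either symbol is unknown.
    """
    if len(ref) != 2:
        raise ValueError("expected a fully qualified reference: (metadata, symbol)")
    container, leaf = ref
    nested = module.get(container)   # first lookup: the metadata operation
    if nested is None:
        return None
    return nested.get(leaf)          # second lookup: the symbol inside it


module: Module = {
    "__tbaa": {"root_0": "tbaa_root", "int": "tbaa_type_desc", "tag_0": "tbaa_tag"},
}
resolved = resolve_tbaa_ref(module, ("__tbaa", "tag_0"))
```

An unqualified reference such as ("tag_0",) is rejected outright in this sketch, which corresponds to the requirement that references be fully qualified.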

Though LLVM IR currently allows only a single access tag, it may be worth experimenting with multiple access tags in the future (e.g. as one of the options for representing aliasing in Flang-generated IR). This is why the tbaa attribute is defined as an array of SymbolRefAttrs. Since LLVM IR does not support multiple tags, the module translation will fail for any MLIR operation that has more than one symbol reference in its tbaa attribute.

If I understand the TBAA graph properties correctly, the graph must not have cycles, and the root node must be reachable from every graph node. In the current change-set cycle detection is done only during LLVM IR import, but I think it may be worth doing the same as part of verifying an LLVM::MetadataOp operation that contains any of the new TBAA operations.
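
A verifier along these lines could be sketched as follows, in Python rather than as an MLIR verifier. The check_tbaa_graph function and its edge-map input are hypothetical and not taken from the change-set. It relies on one observation: if the graph has no cycles, every referenced symbol is defined, and every node without outgoing references is a root, then every node reaches a root.

```python
from typing import Dict, List, Set


def check_tbaa_graph(edges: Dict[str, List[str]], roots: Set[str]) -> List[str]:
    """Return a list of error messages; an empty list means the graph is well-formed.

    `edges` maps each node symbol to the symbols it references (struct members
    or a scalar parent). Root nodes map to an empty list.
    """
    errors: List[str] = []

    # Every reference must be defined, and every sink must be a root.
    for node, refs in edges.items():
        for ref in refs:
            if ref not in edges:
                errors.append(f"{node}: undefined reference {ref}")
        if not refs and node not in roots:
            errors.append(f"{node}: no path to a root node")

    # Cycle detection with a three-colour depth-first search.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(edges, WHITE)

    def visit(node: str) -> None:
        color[node] = GRAY
        for ref in edges[node]:
            if ref not in color:
                continue  # already reported as undefined above
            if color[ref] == GRAY:
                errors.append(f"cycle through {ref}")
            elif color[ref] == WHITE:
                visit(ref)
        color[node] = BLACK

    for node in edges:
        if color[node] == WHITE:
            visit(node)
    return errors


well_formed = {"root": [], "char": ["root"], "int": ["char"], "pair": ["int", "int"]}
cyclic = {"root": [], "a": ["b"], "b": ["a"]}
ok_errors = check_tbaa_graph(well_formed, {"root"})
cycle_errors = check_tbaa_graph(cyclic, {"root"})
```

The recursive search is fine for the shallow graphs frontends typically produce; a real verifier would probably use an explicit stack to avoid deep recursion.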

There is currently no support for tbaa.struct metadata, or for the tbaa attribute on LLVM::CallOp and intrinsic operations.

Please share your suggestions/concerns either here or on Phabricator.

Many thanks to @ftynse for the initial review! And thank you all in advance.


As @gysit pointed out on Phabricator, there is currently no need to support the tbaa attribute for LLVM::CallOp. Clang only produces tbaa.struct for memcpy intrinsic calls, which are represented by the LLVMIR dialect's intrinsic operations. Posting this here just for the record.

On a different note:
Hi @kosarev, can you please give some history on D41501 [Analysis] Support aggregate access types in TBAA? Is the new-format TBAA currently produced by any frontend? Thanks!
