LLVM metadata support in MLIR

Hi all,

What is the current status of metadata support in the LLVM dialect? I see some support for specific cases, but what about something more general? More specifically, one of my use cases is calling the read_register intrinsic:

declare i32 @llvm.read_register.i32(metadata)
!0 = !{!"my_register"}

%res = call i32 @llvm.read_register.i32(metadata !0)

Is anybody working on something related to this?

MLIR generally takes a much more structured approach to metadata-like information than LLVM does, in particular through its rich, structured attribute system. Therefore, we add support for LLVM metadata equivalents on a case-by-case, need-driven basis. Opaque string metadata is exactly the kind of thing we want to avoid because of the associated development and runtime costs.
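To illustrate the contrast, here is a sketch using a made-up `mydialect` attribute (the names are illustrative, not an existing API):

```mlir
// LLVM IR would carry this as an opaque metadata node nothing can verify:
//   !0 = !{!"readonly", i64 8}
// In MLIR, the same intent can be a structured attribute whose fields a
// dialect verifier can check individually (hypothetical dialect/attribute):
llvm.func @f(!llvm.ptr<i8>) attributes {
  mydialect.access = #mydialect.access<kind = "readonly", align = 8>
}
```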

If you need metadata that is part of the LLVM IR specification, consider adding it to the relevant operations and/or the LLVM dialect itself using appropriate attribute kinds and providing verifiers. If you need custom metadata, introduce it in a dialect of your own, again using appropriate attribute kinds and providing verifiers. The translation infrastructure can now process foreign-dialect attributes on LLVM operations in order to modify the LLVM IR emitted when translating the operation - llvm-project/LLVMTranslationInterface.h at main · llvm/llvm-project · GitHub - that can be used to forward this metadata to LLVM IR.
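As a rough sketch of that interface (the class name and the attribute it matches are made up; `amendOperation` is the real hook, but check the header on your revision for the exact signature):

```cpp
// Sketch: a dialect interface that forwards a foreign-dialect attribute
// to LLVM IR metadata during translation.
class MyDialectLLVMIRTranslationInterface
    : public mlir::LLVMTranslationDialectInterface {
public:
  using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;

  // Called for every attribute from this dialect found on an operation
  // being translated; we can attach metadata to the LLVM instructions
  // that were produced for it.
  mlir::LogicalResult
  amendOperation(mlir::Operation *op, mlir::NamedAttribute attribute,
                 mlir::LLVM::ModuleTranslation &moduleTranslation) const final {
    if (attribute.getName() == "mydialect.reg_name") {
      // ... look up the translated llvm::Instruction(s) for `op` through
      // `moduleTranslation` and call setMetadata(...) on them here.
    }
    return mlir::success();
  }
};
```

The interface is registered on the dialect like any other dialect interface, and the translation driver invokes it whenever it encounters an attribute from that dialect.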

Thanks! Yeah, in this case I’ll probably need to add my own operation with a custom llvmBuilder or put the translation logic directly inside the overridden convertOperation.

I don’t think you need a custom operation in general. MLIR allows you to put arbitrary attributes on an operation, so you can have llvm.call @function { custom_dialect.attribute = #other_dialect.value } just fine.

If you need read_register specifically, just add it as llvm.intr.read_register and make the register name a mandatory inherent string attribute, i.e. an attribute whose presence is checked by the op’s verifier. This intrinsic is listed in LLVM’s LangRef, and we already have plenty of those following a similar scheme.

Not sure I understand what you mean. If I add the read_register intrinsic like this:

def LLVM_ReadRegister : LLVM_OneResultIntrOp<"read.register", [0], []> {
  let arguments = (ins LLVM_Type:$a);
}

I will still need to pass a fake metadata value to this operation (otherwise I get “Calling a function with a bad signature”), and I want to avoid this:

%m = llvm.mlir.undef : !llvm.metadata
%v = "llvm.intr.read.register"(%m) {mydialect.reg_name = "my_register"} : (!llvm.metadata) -> i32

You need something like

def LLVM_ReadRegister : LLVM_OneResultIntrOp<"read.register", [0], []> {
  let arguments = (ins StrAttr:$reg_name);
  let assemblyFormat = "$reg_name attr-dict `:` type($res)";
}

%v = "llvm.intr.read.register"() { reg_name = "my_register" } : () -> i32
%vv = llvm.intr.read.register "my_register" : i32

This uses an inherent attribute, i.e. an attribute whose semantics are defined by the operation to which it is attached. Attributes are not values, and neither is metadata; you shouldn’t create them with operations.

This will result in “Calling a function with a bad signature” unless I add a custom llvmBuilder:

def LLVM_ReadRegister : LLVM_OneResultIntrOp<"read.register", [0], []> {
  let arguments = (ins StrAttr:$reg_name);
  let assemblyFormat = "$reg_name attr-dict `:` type($res)";

  let llvmBuilder = [{
    llvm::Module *module = builder.GetInsertBlock()->getModule();
    llvm::Function *fn = llvm::Intrinsic::getDeclaration(
        module, llvm::Intrinsic::read_register,
        { }] # !interleave(!listconcat(
            ListIntSubst<resultPattern, [0]>.lst,
            ListIntSubst<LLVM_IntrPatterns.operand, []>.lst), ", ") # [{ });
    llvm::LLVMContext &llvmContext = module->getContext();
    llvm::MDNode *llvmMetadataNode = llvm::MDNode::get(
        llvmContext, llvm::MDString::get(llvmContext, $reg_name));
    llvm::Value *operand = llvm::MetadataAsValue::get(llvmContext, llvmMetadataNode);
    auto *inst = builder.CreateCall(fn, operand);
    $res = inst;
  }];
}

Having a custom llvmBuilder is totally fine, there’s no magic that can figure out the particularities of specific intrinsics. Consider putting it in a C++ file though, it’s too long for the inline blob.
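One way to do that is to move the body into a free function and call it from the TableGen blob (the helper name and its exact parameters are made up for illustration; the LLVM API calls are the same ones used above):

```cpp
// In a .cpp file next to the dialect: a helper the generated builder
// string can call, keeping the TableGen blob to a one-liner.
static llvm::Value *createReadRegister(llvm::IRBuilderBase &builder,
                                       llvm::StringRef regName,
                                       llvm::Type *resultType) {
  llvm::Module *module = builder.GetInsertBlock()->getModule();
  llvm::LLVMContext &ctx = module->getContext();
  // Declare @llvm.read_register, overloaded on the result type.
  llvm::Function *fn = llvm::Intrinsic::getDeclaration(
      module, llvm::Intrinsic::read_register, {resultType});
  // Wrap the register name as an MDString inside an MDNode, then as a
  // metadata-as-value operand the call instruction can take.
  llvm::Metadata *md = llvm::MDString::get(ctx, regName);
  llvm::Value *operand =
      llvm::MetadataAsValue::get(ctx, llvm::MDNode::get(ctx, md));
  return builder.CreateCall(fn, operand);
}
```

With this in place, `llvmBuilder` shrinks to a single `$res = createReadRegister(...)` call.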

Actually, LLVM also allows turning metadata into SSA values (llvm-project/Metadata.h at main · llvm/llvm-project · GitHub), and intrinsics can be declared as taking one as an operand.
(I don’t think anything other than intrinsics can consume these.)

I wonder if we need to have a specific op and a new type in the LLVM dialect to model this?

Not to hijack the discussion here, but I think there is the opportunity to extend things here to capture the natural dependencies between attributes. In particular, we have many scenarios like:

  1. Two operations are parameterized with an attribute which must be the same.
  2. A higher level operation (like a function) has an attribute that indirectly determines the value of another attribute

Previous systems I’ve worked with (https://ptolemy.berkeley.edu/) allow parameter attributes to depend on each other in a style similar to an Attribute Grammar, with a set of scoping rules that is independent of the ‘Value’ scope. Parameter attributes can be evaluated ‘at compile time’ to determine their actual value, enabling validation of a design. Ptolemy also unifies the notion of types with the notion of parameter attributes, enabling type-parameterized modules and some kinds of type dependencies to be expressed (in particular, things like “the number of rows of this matrix matches the number of columns of that matrix”). This kind of design intent is unfortunately lost in MLIR today.

I think you can actually implement something like this in MLIR today, but this won’t be a first-class concept. Basically nothing prevents you from having your own “parameterized_func” with a dictionary attribute that tells you all the “dependent attributes” that may exist on the operations in the function body.

parameterized_func @foo(%arg0 = !my.parameterized_tensor<*x"T1">,
                        %arg1 = !my.parameterized_tensor<*x"T2">)
    { dependent_attrs = ["T1", "T2", "B"] } {
  "op.matmul"(%arg0, %arg1) { transpose = "B" : !my.param_attr }
}

On instantiation of the function you can substitute the parameter attributes with actual value and validate the IR at this point. This may not be super ergonomic in MLIR though.
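A sketch of what that instantiation could look like, with entirely hypothetical ops and the parameters bound to concrete attributes at the use site (types are valid attributes in MLIR, so `T1 = f32` is well-formed):

```mlir
// Hypothetical instantiation op: "T1"/"T2"/"B" are replaced by concrete
// values, after which ordinary verifiers can check the specialized body.
func @caller(%a: tensor<4x8xf32>, %b: tensor<8x4xf32>) {
  "my.instantiate"(%a, %b)
      { callee = @foo, params = { T1 = f32, T2 = f32, B = false } }
      : (tensor<4x8xf32>, tensor<8x4xf32>) -> ()
  return
}
```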