[RFC] Extensible LLVM IR Import

The mlir-translate tool implements an experimental import from LLVM IR into MLIR’s LLVM dialect. The import iterates over all globals and functions and translates them instruction by instruction using TableGen-generated conversion functions. A helper class manages the state of the translation, such as the mappings between LLVM IR and MLIR values. It also implements various helper methods, for example for type or debug information conversion.

One limitation of the current import is that it only supports a fixed number of LLVM IR instructions/intrinsics as well as a limited set of metadata. On the other hand, the export from MLIR to LLVM IR is much more mature and extensible. We propose to implement similar extensibility for the LLVM IR import.

Existing Solution for the Export to LLVM IR

The translation of MLIR to LLVM IR is extensible via a dialect interface. Downstream projects can provide an interface implementation to translate their own operations and attributes to LLVM IR. The Export LLVMTranslationDialectInterface interface has two methods (LLVM IR Target - MLIR):

LogicalResult convertOperation(
  Operation *, IRBuilderBase &, ModuleTranslation &);
LogicalResult amendOperation(
  Operation *, NamedAttribute, ModuleTranslation &);

The convertOperation method converts an operation to an LLVM IR instruction, while the amendOperation method translates attributes to LLVM IR metadata (or other constructs). A dialect attribute may thereby be translated independently of the dialect of the operation it is attached to. This flexibility allows external projects to export custom attributes attached to an LLVM dialect operation to custom metadata. The export mechanism is scoped to target-specific intrinsics and metadata, the parts of LLVM IR that commonly change.

Proposed Solution for the Import from LLVM IR

We propose to implement an LLVMImportDialectInterface that provides similar extensibility for the translation from LLVM IR to MLIR. A dialect that wants to support importing attributes or operations from LLVM IR could then implement the interface using hooks similar to the export hooks:

LogicalResult convertInstruction(
  OpBuilder& builder, 
  llvm::Instruction* inst,
  ModuleImport &moduleImport);

LogicalResult convertIntrinsic( 
  OpBuilder& builder, 
  llvm::CallInst* inst, 
  ModuleImport &moduleImport);

These methods insert the imported operation at the current builder insertion point or return failure if the conversion fails. Their implementation may use moduleImport to perform support tasks such as type and value conversion.

An additional hook converts metadata attached to an instruction and amends it to the imported operation importedOp:

LogicalResult amendInstructionMetadata(
  OpBuilder& builder,
  llvm::Instruction* instruction,
  llvm::MDNode *metadata,
  Operation* importedOp,
  ModuleImport &moduleImport);

The implementation of the method can again rely on moduleImport to perform support tasks.

Optionally, we may add a similar method to import module-level metadata:

LogicalResult amendModuleMetadata(
  OpBuilder& builder,
  llvm::NamedMDNode *namedMetadata,
  ModuleImport &moduleImport);

Examples of module-level metadata are the GPU-specific annotations produced by the NVVMToLLVMIRTranslation and the ROCDLToLLVMIRTranslation interfaces.

Selecting the Matching Dialect Interface

Unlike the export, the import cannot query the matching dialect interface using the getInterfaceFor(Operation*) method of the DialectInterfaceCollection, since the imported LLVM IR does not carry any dialect information. It thus needs a different mechanism to select a dialect interface. We propose that the LLVMImportDialectInterface interface provides methods to query the supported LLVM IR constructs:

SmallVector<unsigned> getSupportedOpcodes();
SmallVector<unsigned> getSupportedIntrinsicIds();
SmallVector<unsigned> getSupportedMetadataKinds();

A newly introduced LLVMImportInterface - a collection of all registered dialect interface instances derived from DialectInterfaceCollection - could then query the supported opcodes, intrinsic ids, and metadata kinds of every dialect interface instance and build maps from opcode, intrinsic id, and metadata kind to the dialect implementing an import interface. These maps could then be used to select the matching dialect interface. LLVMImportInterface has to ensure that only one dialect interface implements a conversion for a given instruction or intrinsic. For metadata, multiple dialect interfaces may provide conversions for the same metadata to dialect-specific attributes.
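The selection logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: ImportDialectInterface and ImportInterfaceCollection are hypothetical stand-ins for LLVMImportDialectInterface and LLVMImportInterface, std::vector and std::map stand in for the LLVM containers, and opcodes are left out since they would be handled like intrinsic ids.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed LLVMImportDialectInterface,
// reduced to the query methods needed for interface selection.
struct ImportDialectInterface {
  std::string dialectName;
  std::vector<unsigned> intrinsicIds;
  std::vector<unsigned> metadataKinds;

  std::vector<unsigned> getSupportedIntrinsicIds() const {
    return intrinsicIds;
  }
  std::vector<unsigned> getSupportedMetadataKinds() const {
    return metadataKinds;
  }
};

// Hypothetical stand-in for the proposed LLVMImportInterface collection.
class ImportInterfaceCollection {
public:
  // Registers an interface. Returns false and leaves all maps unchanged if
  // a previously registered dialect already claims one of the intrinsic ids.
  bool addInterface(const ImportDialectInterface *iface) {
    assert(iface && "expected a valid interface");
    std::vector<unsigned> ids = iface->getSupportedIntrinsicIds();
    // Check all ids first so that a failed registration has no side effects.
    for (unsigned id : ids)
      if (intrinsicToInterface.count(id))
        return false;
    for (unsigned id : ids)
      intrinsicToInterface[id] = iface;
    // Metadata kinds may be shared, so every interested dialect is recorded.
    for (unsigned kind : iface->getSupportedMetadataKinds())
      metadataToInterfaces[kind].push_back(iface);
    return true;
  }

  // Returns the unique interface for an intrinsic id, or nullptr if none.
  const ImportDialectInterface *lookupIntrinsic(unsigned id) const {
    auto it = intrinsicToInterface.find(id);
    return it == intrinsicToInterface.end() ? nullptr : it->second;
  }

  // Returns all interfaces that registered a conversion for a metadata kind.
  std::vector<const ImportDialectInterface *>
  lookupMetadata(unsigned kind) const {
    auto it = metadataToInterfaces.find(kind);
    if (it == metadataToInterfaces.end())
      return {};
    return it->second;
  }

private:
  std::map<unsigned, const ImportDialectInterface *> intrinsicToInterface;
  std::map<unsigned, std::vector<const ImportDialectInterface *>>
      metadataToInterfaces;
};
```

In this sketch, the import would call lookupIntrinsic for every intrinsic call it encounters and fall back to its default handling when the result is null. For metadata, it would invoke the amend hook of every interface returned by lookupMetadata.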


I don’t see in this proposal the rationale for why improving the existing import (improving the LLVM dialect and its import) is not an option, and why it instead proposes a hook that lets downstream dialects import from LLVM IR and skip the LLVM dialect. Unless I’m misunderstanding the proposal. Is the need to be able to import all of LLVM IR even without the granularity of modeling that ODS provides? Or is this aimed at imports where the ops cannot be upstreamed? Or …?

Our goal is definitely that the upstream import works for all LLVM IR instructions and also the relevant metadata. We have been working quite a bit over the last weeks to improve the coverage there. This proposal is not meant as a side-channel to prevent upstreaming the import of unsupported instructions!

There are use cases, though, where an extensible import is necessary. A vendor may, for example, implement custom LLVM IR intrinsics for their instruction set architecture that are not available upstream. Another example is custom metadata nodes that are not available in upstream LLVM. In these cases, an extensible import would be great and would prevent us from copying the entire import code.

Another use case may be that dialects such as the AMX dialect are directly imported from LLVM IR into the AMX dialect (circumventing the LLVM dialect similar to the export). I do not know if there is interest in importing such intrinsics though.

To sum up, the goal is the import of “non-standard”/“vendor-specific” metadata and intrinsics that are not available in upstream LLVM/MLIR. Anything present in upstream MLIR/LLVM should be imported using the upstream import.

It does not seem like a symmetrical situation to me: when there is a source->target conversion, it makes sense for the source to register interfaces/hooks, but here you’re proposing that targets could register themselves. This looks like backends that can be swapped when multiple targets are possible for a given source. However, this isn’t such a situation: we can’t “swap targets”, and there is a 1:1 mapping from the source (LLVM IR) to the set of targets (dialects).

It may make sense to remove the convertInstruction hook entirely from the proposed LLVMImportDialectInterface so that it is not accessible externally. Would that be an improvement from your point of view?

Our goal is definitely a 1:1 mapping between LLVM intrinsics and MLIR operations for intrinsics and operations that are implemented in downstream projects. Similarly, we want a 1:1 mapping between LLVM metadata and MLIR dialect attributes that are both defined in a downstream project. However, metadata may be nested in other metadata. For example, we may have a custom loop annotation nested in LLVM’s loop metadata node. In this case, the upstream LLVM IR import should import the standard loop metadata, while the downstream project could hook into the loop metadata node and convert the custom annotation to a downstream dialect attribute.

The asymmetry stems from the fact that LLVM IR intrinsics and metadata do not have a dialect attached. We thus need a way for the dialect interfaces to register which LLVM IR intrinsics and metadata they support. The LLVMImportInterface then makes sure there are only 1:1 mappings between intrinsics and a registered dialect interface.
