Exposing LLVM ops to the MLIR type system

I think we can answer this question, at least partially, if we clarify the goal of the LLVM IR dialect itself. If it intends to model LLVM IR inside MLIR as faithfully as possible, then it should not accept anything but LLVM IR dialect types and should have strictly the same semantics. If it is more of an abstraction that is similar to LLVM IR but intended only as a convenient lowering vehicle, we can allow for more flexibility.

If we go with the second option, it may be considered at odds with MLIR’s design philosophy, which encourages all non-trivial processing to happen within MLIR so that translations can be trivial. This can be solved by having the translation accept only a trivially translatable subset of the LLVM IR dialect, e.g. it could be fine for the LLVM IR dialect to work on a subset of standard types as long as the translator doesn’t need to care about the type conversion. In practice, we already have a “legalize for translation” pass, which inserts extra blocks if a predecessor of some block appears more than once (this cannot be expressed with PHI nodes otherwise), and it could accommodate such preparatory conversions.

I have been consistently opposed to having the LLVM IR dialect diverge from LLVM IR. My main reason is being able to keep the same semantics as LLVM IR as much as possible rather than redefining them in MLIR. Otherwise, we would have to keep track of semantic changes both in LLVM IR (including the intrinsics that are included in the dialect) and in the built-in types, and check/update the dialect every time either of the two changes. Imagine if MLIR decided to remove signless integer semantics. Or, more realistically, how should we treat add nsw nuw %0, %1 : index if we (a) don’t know the size of the index type and (b) don’t have poison semantics for it in MLIR? That being said, I am not fundamentally opposed to such relaxation, as I expect both built-in types and LLVM IR to be stable enough, as long as there is some way of checking that the semantics are still okay that doesn’t involve the dialect maintainer checking it over and over again.

In the meantime, since you mention development velocity, I would consider Jacques’ idea of using the casts we already have (llvm.mlir.cast) and/or a type-cast-region operation, and see what additional challenges this poses. Today, the cast operation lives in the LLVM IR dialect. Arguably, it’s okay for it to depend on built-in types (“standard” types are not part of the standard dialect). It would not be okay for it to depend on the standard-to-llvm conversion, but that can be solved by factoring out the type conversion (which has nothing to do with the standard dialect) into a separate library. I can expect dependency issues in, e.g., canonicalization if we need to handle the llvm.vscale wrapped by llvm.mlir.casts in the SVE dialect canonicalization pattern, which would create a dependency from the SVE dialect to the LLVM IR dialect. This sounds like a problem that is not specific to the LLVM IR dialect but to the infrastructure in general. Are there other potential problems?
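
To make the cast-based mixing concrete, here is a rough sketch of what llvm.vscale wrapped by llvm.mlir.cast could look like (op names and syntax follow this thread and are approximate, not a definitive spelling):

// LLVM-typed intrinsic wrapper, created wherever we emit this IR.
%0 = "llvm.vscale.i64"() : () -> !llvm.i64
// Bridge back into the built-in type system; a conversion would later fold
// this cast away, or reject it if the casts are marked illegal.
%1 = llvm.mlir.cast %0 : !llvm.i64 to index

A pattern that wants to match the vscale through such casts is exactly where the dependency question above shows up.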

@jpienaar’s suggestion to use type casts appears to be the “thinnest” solution to me while maximizing op reuse and allowing mixing of dialects. A successful dialect/type conversion would turn the higher-to-lower type casts into identity ops (making them disappear), and the lower-to-higher type casts would always get erased. If it’s a full rather than partial conversion that’s being implemented, the type casts could be added to the illegal target set, I think. The islands approach (used for the MLIR TF dialects) is more complex in comparison and may hurt the debugging experience whenever the lowering fails.

Why can’t we just create llvm.vscale.i64 / llvm.inline_asm at the point where we are creating this IR? Why do we need to go through an intermediate dialect?

Because I think at that point you just have MLIR standard types and not LLVM dialect types. The LLVM dialect ops are restricted to the latter types.

It’s a little ugly, but I think that functionally we only need a handful of cast ops to bridge the two. Maybe with a pseudo op that provides some sugar around this it wouldn’t be too bad. Or we can formalize something like Alex’s suggestion of a “trivially translatable subset” to make this easier. If we can soothe the ergonomics around basic integer, floating point, and vector types (which seem “trivially translatable”), I think the pain will be greatly reduced.

Even with that, one thorny issue is sve.vscale : index -> llvm.vscale.i64() : i64, which potentially requires committing to index == i64 and seems to make the translation not “trivial”. Not sure how to resolve that.

Thanks for your replies.

It seems that for ops that are meant to remain low-level, such as llvm.inline.asm, just using the LLVM form with casts is fine, so let’s isolate that class of ops that do not need special MLIR behavior (i.e. an OpInterface or special verification, e.g. on types).

There remain at least 2 classes of ops I can see:

  1. ops that need to be understood by other parts of the codegen stack (e.g. vscale)
  2. ops that benefit from better semantics at the MLIR-level than just MLIR type or attribute -> LLVMType

Generally, I am skeptical that we can define and use the op “at a distance” via only defining the LLVM instruction operating on LLVM types + extra casts. It seems to invite immediate issues with at least OpInterfaces: what do you attach them to, and do they need to “traverse the type system”?

One thing I am wondering is whether we can have a single ODS op in the LLVM dialect that can support both MLIR-y ODS and LLVM-y ODS. To make it concrete, does it seem reasonable to turn these 2 ops:

def LLVM_vector_scale :
  LLVMArmSVE_NonSVEIntrUnaryOverloadedOp<"vscale">;

def VScaleOp : SVE_Op<"vscale", [NoSideEffect, VectorParametricTileOpInterface]> {
  ...
  let arguments = (ins);
  let results = (outs Index:$res);
  let assemblyFormat = ...
}

into a single op that would have both the MLIR and LLVM specification by literally merging the 2 together.
This would only be for types/attributes that convert trivially (I would put index in that box: given a target, the index width is determined and the conversion can do the right thing) so that conversion from MLIR form to LLVM form can be autogenerated. Such a mechanism could already carry ops that are almost 1-1 and that we don’t want to leak into other dialects. The price/generalization effort does not seem too steep.

I’ll assume the above is reasonable, now how about this case: can we just merge these 2 ops into a single op?

// pasted for context
//class LLVMNeon_IntrBinaryOverloadedOp<string mnemonic, list<OpTrait> traits = []> :
//  LLVM_IntrOpBase</*Dialect dialect=*/LLVMNeon_Dialect,
//                  /*string opName=*/mnemonic,
//                  /*string enumName=*/"aarch64_neon_" # !subst(".", "_", mnemonic),
//                  /*list<int> overloadedResults=*/[0],
//                  /*list<int> overloadedOperands=*/[], // defined by result overload
//                  /*list<OpTrait> traits=*/traits,
//                  /*int numResults=*/1>;

def LLVM_aarch64_neon_smull :
  LLVMNeon_IntrBinaryOverloadedOp<"smull">, Arguments<(ins LLVM_Type, LLVM_Type)>;

def SMullOp : Neon_Op<"smull", [NoSideEffect,
  AllTypesMatch<["a", "b"]>,
  TypesMatchWith<
    "res has same vector shape and element bitwidth scaled by 2 as a",
    "a", "res", "$_self.cast<VectorType>().scaleElementBitwidth(2)">]> {
  let summary = "smull roundscale op";
  let description = [{...}];
  // Supports either:
  //   (vector<8xi8>, vector<8xi8>) -> (vector<8xi16>)
  //   (vector<4xi16>, vector<4xi16>) -> (vector<4xi32>)
  //   (vector<2xi32>, vector<2xi32>) -> (vector<2xi64>)
  let arguments = (ins VectorOfLengthAndType<[8, 4, 2], [I8, I16, I32]>:$a,
                       VectorOfLengthAndType<[8, 4, 2], [I8, I16, I32]>:$b);
  let results = (outs VectorOfLengthAndType<[8, 4, 2], [I16, I32, I64]>:$res);
  let assemblyFormat =
    "$a `,` $b attr-dict `:` type($a) `to` type($res)";
}

If the answer is no and we really need 2 ops then we might as well also have them in 2 dialects.

If it seems reasonable to have a single op with all these attributes, then the problem of auto-generation kicks in: the LLVM_aarch64_neon_smull part can be pretty easily generated, but not the SMullOp part.
We can think of an extra level of Tablegen indirection so that we can specify these things better, but it seems we would still have to specify quite a few things manually. At that point, is it still worth it or not?

To sum up, in the 3 examples above, I’d lean towards:

  1. inline_asm, just use casts
  2. vscale-like 1-1 ops that need OpInterfaces: extend the tablegen infra
  3. more advanced use cases: I don’t see a clear way to achieve it, and I’m really not willing to trade the nice MLIR type error messages for runtime assertion errors :stuck_out_tongue:

I would consider vscale as kind of special, since that is only 1 or 2 ops.

I think for ops like smull, where there are hundreds/thousands of such ops, we want to establish some conventions / a single source of truth for having LLVM_aarch64_neon_smull and a nicer wrapper SMullOp op. For npcomp, when we generate the ATen dialect (analogous to your SMullOp), we also add various op interfaces to it.

For the smull case, I think what we want is a single source of truth that generates:

  1. A raw LLVM wrapper op.
  2. An “mlir native” version of the op (possibly with certain domain-specific interfaces added as part of the generation process).

So it would still be 2 ops and 2 dialects instead of a single llvm.smull that can take both MLIR and LLVM types and adapt accordingly?

In that case, the “single source of truth” has to specify everything I wrote in def SMullOp and only saves typing this:

def LLVM_aarch64_neon_smull :
  LLVMNeon_IntrBinaryOverloadedOp<"smull">, Arguments<(ins LLVM_Type, LLVM_Type)>;

Additionally, note that all these ops can be autogenerated from LLVM’s definition (the actual source of truth), which seems preferable to me.

We can add a little more tablegen sugar to reduce the number of characters one has to type but the tradeoff seems to just be more indirection via tablegen.

Is it worth it?

A more involved (but more automatic) approach could be to just embrace and reverse-engineer AArch64InstrInfo.td and AArch64InstrFormats.td, and fully generate the 2 types of MLIR ops?

If we are fine with very incremental progress (e.g. extended on a per-need basis), then this could be a workable solution?

I cannot follow what you mean here. There seems to be a use case you are referring to that I have no knowledge of. Where do the interfaces come in? Could you elaborate?

Nothing precludes LLVM dialect ops from implementing interfaces. FWIW, we can actually have an interface that can let you “see through” the llvm.mlir.casts feeding the op to get the actual values if necessary. This interface can be attached to LLVM dialect ops for which this behavior is desirable.
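
To sketch what such an interface could look like (purely hypothetical names and ODS, nothing that exists in the LLVM dialect today):

// Hypothetical ODS sketch (assumes the usual include of mlir/IR/OpBase.td):
// an interface LLVM dialect ops could opt into so that patterns/analyses can
// reach the value feeding an operand through an intervening llvm.mlir.cast.
def SeeThroughCastInterface : OpInterface<"SeeThroughCastInterface"> {
  let description = [{
    Provides access to operands with bridging llvm.mlir.cast ops stripped.
  }];
  let methods = [
    InterfaceMethod<
      /*desc=*/"Returns the operand at `index`, looking through llvm.mlir.cast.",
      /*retTy=*/"::mlir::Value",
      /*methodName=*/"getUncastedOperand",
      /*args=*/(ins "unsigned":$index)>,
  ];
}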

Fundamentally, I don’t see a problem of having an LLVM op surrounded by casts. There are practical problems with dependencies and potential tripling of the IR size because of forward/backward casts.

I don’t understand what you mean here either. The LLVM dialect definition uses normal MLIR ODS, just makes it a bit less verbose by adding several helper classes. The only difference is the llvmBuilder field (the logic handling overloaded intrinsics is necessary to set this field up), which the regular -gen-op-definitions safely ignores. Do you want to generate two C++ op classes per definition in the LLVM dialect ODS? This requires a different backend, and small changes to the ODS itself. Or do you want to generate a different ODS from the LLVM dialect ODS? This requires a completely new backend, but no changes to ODS.

Do you want separate ops definable with less pain or do you want one op that takes different types, after all?

Why aren’t you doing this to start with? The intrinsic generator is available and we should just plug it into the build system. It can be improved to, e.g., generate more specific type constraints and/or ops that operate on standard types.

I wouldn’t go that far. The relation between target instructions and intrinsics is not clear to me and, what’s worse, out of our control. So unless MLIR targets instructions directly, I would much rather build ops on intrinsics and let LLVM deal with the correspondence between the two.

I don’t have a concrete use case yet but I expect that vector ops exposed to the MLIR type system will grow interfaces to talk to vector transforms or lowerings. In the case of vscale, I can see it becoming an op interface that we would use to vectorize to scalable vectors and that would have additional properties (e.g. non-negative index).

I know we can do it, but should we?
Even without going to OpInterfaces, just thinking about traits, this seems to break locality properties, e.g. of type checking and location info / error messages.
Do you foresee this to be generally usable and desirable?

I mean what I wrote: “To make it concrete, does it seem reasonable to turn these 2 ops into a single op that would have both the MLIR and LLVM specification by literally merging the 2 together.”

To spell it out more, for the vscale case, and assuming we were to bring it to the LLVM dialect (pardon the non-actual Tablegen compliance, add pounds of sugar as you see fit):

def VScaleOp : 
  // Generates relevant LLVM-related properties (e.g. llvmBuilder), only 
  // "used" if any of the types is an LLVMType.
  LLVM_Spec<[SomeLLVMTableGen<"vscale">]>,
  // Generates relevant MLIR-related properties, only "used" if any of the
  // types is a non-LLVMType.
  MLIR_Spec<[
    SomeMLIRTablegen<"vscale", [NoSideEffect, VectorParametricTileOpInterface]>]> {
    ...
    let arguments = (ins);
    let results = (outs Index:$res);
    let assemblyFormat = ...
  }

The op could then be used either as:

%1 = llvm.vscale() : index
// or 
%42 = llvm.vscale() : // llvm.i32 or llvm.i64 depending on target

with the proper parser, printer, verifier and traits/operands being “activated” depending on the type of some operand/result (of course, all types have to be consistently in LLVM or non-LLVM form).

Unclear, depends on tradeoffs and usage.

For ops one would traditionally mirror from LLVM to std, it could be nice to have a single op with different types. This would allow using new LLVM ops without having to argue whether they really need to exist in std or its upcoming spawns. Tentatively, I would say we could put the “trig” or “shift” ops in that category. I am unclear whether it is worth building the infra for this now; if vscale is compelling enough as a first use case, that’s good, otherwise it will slip again to the next “justify this op in std” instance.

For backend-specific ops that match 1-1 with intrinsics, I am leaning towards the simple route first, build usage and then think of refactoring when we have compelling evidence that it improves the system.

The part it currently automates is 2 lines; I didn’t feel like adding extra complexity to a PR that is already large enough.

That could work, the tradeoff is still about hiding more stuff behind more tablegen indirection. I’m not looking to drive innovation on this topic, if you have ideas how to capture the SMullOp definition more elegantly, I’ll happily use it.

LLVM dialect ops are also ops, they can implement interfaces and be processed by passes. We are not doing much with them right now because, historically, LLVM dialect is a lowering sink and we expect LLVM to do the job, but there is no fundamental reason why we should not.

Now, if you want additional semantics restrictions on an op that is not present in the LLVM instruction/intrinsic semantics, I would argue you most likely want a new op. The cost of having a different op with similar-but-different semantics + conversion that may reject some ops is arguably lower than having LLVM-like ops with semantics “same as LLVM, but also supports X, in which case the difference is Y” and having to handle them.

I still don’t see a problem. Each op checks its types – the “payload” op checks, e.g., that types match, and the casts check that the source and target types are connected by a conversion. They already do.

I don’t have a use case for the see-through interface, but I do expect more casts to show up as we make the conversion more progressive.

vscale is a special case to me because it’s tied to the vector type. As implemented in LLVM IR, the op makes sense. We don’t have scalable vectors as a built-in type in MLIR. If we add scalable vectors as a type, it totally justifies also adding operations to work with them, in std, vector or whatever other dialect. The equivalent operation would need to support and evolve with the rest of MLIR. Maybe we’ll want multiple scaling factors because we have nD vectors. Maybe we’ll want vector.vscale to work on tensor<* x vector<...>> and return tensor<* x index>. We should not push additional MLIR-specific stuff onto an LLVM dialect op. It’s the same reason as why we have addf instead of reusing LLVM’s fadd.

Do you mean add, addf/fadd, br and other “core” ops?

I wouldn’t expect many new LLVM IR instructions in the foreseeable future.

Here’s my counter-proposal. The built-in type compatibility feature seems necessary for ops that mirror target-specific intrinsics in LLVM IR. The goal, as I understand it, is to be able to emit those intrinsics early in the pipeline, before the rest is converted to the LLVM dialect. In this case, define a new dialect for such target-specific intrinsics that uses LLVM dialect types as well as built-in types. Such ops should go to a new dialect anyway (like NVVM, ROCDL and LLVM_AVX512). This removes most of the issue for me: instead of overhauling half of standard and almost all of the existing LLVM dialect, as has been proposed, we restrict the change to a new dialect that the rest of standard and LLVM need not care about.

Well, if you have 2 ops, it’s probably not worth investing into infra.

From another angle: could we unify some of the builtin types with the LLVM dialect ones?

While LLVM has some specific constructs (struct/union/array/…), floats, integers, and likely vectors aren’t different.
We used to have a hermetic LLVM dialect type system because we were re-using the actual LLVM types and wrapping an LLVMContext. This is no longer the case, and I wonder if we could just use the builtin float/integer/vector types in the LLVM dialect?
Would that solve most of the use cases @nicolasvasilache is looking for?

If we s/vector/1-d vector and unified the types then I think it would simplify a bunch of things indeed!

Good point on the historical design restriction! I don’t see a reason for maintaining llvm.i32 vs i32. Using the builtin MLIR vector type for all fixed-size vectors in LLVM also seems palatable, and could assuage a large part of the concerns here.

Well, I actually discussed this exact point in the RFC on reimplementing LLVM dialect types. I am still of the opinion that trying to use builtin types in the LLVM dialect creates significantly more complexity than it removes. Here are some examples:

  • MLIR integers can be signed, unsigned and signless; LLVM IR integers are signless only. This is a strict superset and can be handled by the verifier. Well, op verifier, because we don’t have type verifiers.
  • LLVM IR supports floating point types that MLIR does not - x86_fp80, ppc_fp128, etc. So we will end up having llvm.fadd %0, %1 : f32 and llvm.fadd %0, %1 : !llvm.x86_fp80 and needing to remember if we should use a builtin type or a dialect type for a specific bitwidth. This starts getting awkward.
  • MLIR has nD vectors but no scaling, LLVM IR has 1D vectors and scaling. We will have to mix LLVM dialect vectors and MLIR vectors + additional restrictions on the latter, unless somebody is willing to propose scalable vectors as a built-in type. This is getting worse.
  • LLVM types can contain other types. Now, we have !llvm.struct<(i32, float, x86_fp80)> and all the inner types are LLVM dialect types. If, instead, i32 were a builtin type, we would need extra logic in many places where container types are used, and in the types themselves, to differentiate whether a builtin or a dialect type is used. Types also don’t have verifiers, so almost any construction of a container type would need to be guarded against a nullptr result (even if we could create a wrapper LLVMOrBuiltInIntegerType, signedness semantics is not encoded in the C++ type and LLVM container types would reject non-signless types by producing nulls).
  • The syntax for containers will explode; it currently relies on the fact that nested types are implicitly !llvm.
  • The translation to LLVM IR will have to handle both kinds of types, and we explicitly wanted the translation to be simple. (We could have a big “legalization before export” pass that changes types to LLVM, but that would mean we also keep all LLVM types). This was actually the entire point of creating the LLVM dialect rather than translating Standard directly, if you want a historical perspective.

I am open to considering builtin type values in a dialect based on intrinsics, which mostly use simple types, as suggested above, but not to overhauling the type system.

FWIW, I think we should just extend the MLIR builtin types to cover these; it is inevitable that someone will care about them some day. Alternatively, just say that the LLVM dialect doesn’t support them at all until someone cares, and use that as the forcing function.

-Chris

I’m not entirely convinced; it seems to me that most of the elements in your list can be alleviated, even if it means some infrastructure changes to make verifier implementations friendlier.
Ultimately, the LLVM dialect can have a generic type verifier attached to all of the operations it defines. This does not seem like code that needs to be written over and over, but something that would be quite centralized.

In particular:

That’s a good point, I forgot that LLVM evolved its vector type to include the scalable aspect. That said, evolving the MLIR vector type may be something @nicolasvasilache and others would be interested in.

I don’t quite get the complexity here? I would leave the struct type buildable with any type, and let the verifier handle it.

Why would it explode? If you remove the implicit !llvm. then it becomes explicit, but since we would mostly use builtin types instead of LLVM ones, this wouldn’t be very visible.

What would be harder about converting a builtin integer rather than an LLVMInteger? Isn’t this just replacing this: https://github.com/llvm/llvm-project/blob/master/mlir/lib/Target/LLVMIR/TypeTranslation.cpp#L103-L106 with the builtin integer?

Thanks @ftynse for that detailed list! My overall takeaway is that it seems doable to rebase on builtin types. There are clearly a number of nuances/tradeoffs, but none seems like a hard blocker. I’ll try to go point-by-point and give what I think are mitigations.

This seems similar to a situation where any random MLIR op only works for i64. Our builtin integer types support si7 as well, but it’s not a problem for an op to only accept a subset of what IntegerType can represent.

We could go Chris’ route and just drop support for those. Or we can keep around the !llvm types that don’t map to builtin types. Either way, these types are very rarely used, so the number of situations where one encounters the awkwardness is highly constrained, so I think both solutions are palatable.

I think the number of situations where one needs to think about this is highly constrained. For example, it would be an op that takes a scalable and non-scalable vector as an operand, which seems very rare. And no LLVM ops take nD vectors (n>1) so that isn’t a concern. We would have !llvm.vector only support scalable vectors then to avoid redundancy.

I don’t think that returning nullptr is needed. We have verifyConstructionInvariants which bottoms out in an assert.

I’m interested to hear more about “extra logic in many places where container types are used, and in the types themselves, to differentiate whether a builtin or a dialect type is used” though. It sounds like it would be mostly mitigated by avoiding redundancy between LLVM dialect types and builtin types that are supported for translation to LLVM IR.

I think this argues that if we allow i32 for LLVM ops, then we should remove !llvm.i32 to remove the need to disambiguate. Seems doable.

Can you clarify what you mean by “handle both kinds of types”? It seems like if we just remove !llvm.i32 and replace it with i32 then only localized modifications of most code is needed.


It sounds like we would need to give up on LLVMType being a base class of all types that one might use in an LLVM dialect op, which gives us less static checking (and more reliance on dynamic checks like verifyConstructionInvariants), but I think that’s ok.

It also seems like deleting the !llvm.X type when migrating to !builtin.X (to avoid redundancy) is a recurring theme in the above mitigations.

I would be very careful here so that we don’t paint ourselves into a representational corner, where we can’t evolve the builtin types in the ways we want because we have to support LLVM use cases. I share the same sense of concern as @ftynse here, I believe, i.e. we shouldn’t just paper over edge cases that we may very well have to support at some point anyway. It may make sense to move over the more trivial types, e.g. ints and floats and such, but I would prefer to see more comprehensive thought put into the more complex types. I see a certain value in having the freedom to evolve in a way that best suits MLIR, even if that breaks conventions established in LLVM.

What happens with index, though? The vscale example returns an index in the higher-level MLIR form, which makes sense, so replacing the trivial LLVM types with builtin types doesn’t address this problem. You would still end up in the same situation as now if you didn’t want to hardcode a width, unless I’m missing something.