Mandatory data layout in the LLVM dialect

It has been brought to my attention that the data layout description is mandatory in LLVM IR but we don’t reflect that in the LLVM dialect. Arguably, since the LLVM dialect is supposed to reflect LLVM IR, it should reflect this aspect as well.

We can keep the string attribute with the LLVM layout descriptor or opt for using DLTI and translating that. Thoughts / ideas / objections?
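
For concreteness, the two options look roughly like this (the data layout string is an abbreviated x86-64 example, and the DLTI entries are truncated):

// Option 1: keep the LLVM data layout descriptor as a string attribute.
module attributes {
  llvm.data_layout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
} {
}

// Option 2: encode the same information via DLTI and translate it.
module attributes {
  dlti.dl_spec = #dlti.dl_spec<
    "dlti.endianness" = "little",
    i64 = dense<64> : vector<2xi64>,
    !llvm.ptr = dense<64> : vector<4xi64>>
} {
}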

cc: @gysit @rolfmorel @nikic

2 Likes

I think a lot of the aspects raised here have previously been discussed in [mlir] Do not generate sizeof expressions using GEP · Issue #96047 · llvm/llvm-project · GitHub

I think we should mandate DLTI in any IR that lowers to LLVM IR, as the data layout information is almost certainly required for a lot of lowerings.

Data layout and target triple. And if an MLIR transform makes assumptions about offset calculations (and later lowers to LLVM), it should follow the same DLTI information that will eventually be handed to LLVM.

And we should stop guessing the data layout string in MLIR and just query the target machine for the correct one; otherwise the LLVM back-end will get very confused if we get it wrong.

The DLTI representation is already used by the inliner and possibly in other places.

It would make sense to enforce it being present.

Can you clarify where it would be mandatory?

Partial lowering and dialect composition don’t make it clear when and under which conditions this would show up.

1 Like

That’s why I’m asking for thoughts… A super-straightforward suggestion is to introduce an llvm.module op and make the layout and triple inherent attributes of it. Whatever lowering introduces that operation will have to specify them somehow.
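
Something like this, as a rough sketch (neither the op nor its assembly format exists today):

// Hypothetical llvm.module op with the layout and triple as inherent
// attributes; its verifier could then require both to be present.
llvm.module @payload
    data_layout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
    target_triple = "x86_64-unknown-linux-gnu" {
  llvm.func @foo() {
    llvm.return
  }
}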

1 Like

Having an llvm.module op may also allow us to replace some helper ops that carry module-level information, such as llvm.mlir.module_flag and llvm.linker_options. So there are potentially more advantages.

I’m not sure what this achieves though: since you may or may not have an llvm.module, the data layout isn’t any more mandatory than before.
Instead of checking “does my module have a data layout?” you now have to check “do I have a parent llvm.module?”.

I think the critical bit is getting the data layout and target information right without duplicating code in MLIR.

For final lowering to LLVM, we need that information to calculate offsets. We can’t just guess; we need to ask LLVM what the data layout is so that we can use it when lowering. That requires target information (at least a triple, but also target features) to build a TargetMachine and get the data layout from there.

For intermediate lowerings that will eventually reach LLVM (e.g. vectorization, SROA, inlining), we need the layout information to make correct decisions about offsets, sizes, and padding. Since we’ll need a TargetMachine to lower to LLVM anyway, it makes sense to have one in MLIR from the start (or at least from the first point where it’s needed).
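
As a small illustration of why (op syntax per the current LLVM dialect, the struct is made up): the byte offset produced by the GEP below depends on how the data layout aligns i64, so folding it before reaching LLVM is only correct if MLIR uses the same layout LLVM will use.

llvm.func @field_addr(%arg0: !llvm.ptr) -> !llvm.ptr {
  // Element 1 sits at offset 1, 4, or 8 bytes depending on the alignment
  // the data layout assigns to i64.
  %0 = llvm.getelementptr %arg0[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(i8, i64)>
  llvm.return %0 : !llvm.ptr
}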

But how do you get the target information? Clang queries the system if you don’t provide one (or uses the binary/link name) and uses LLVM to build all the target information. We don’t want to duplicate that code in MLIR. Users can also pass it explicitly (-march etc., or a JSON file), which is what we were discussing in the DLTI threads before.

In the end, IF you use MLIR with LLVM, THEN you have access to the LLVM libraries, target information, parsers, etc.; ELSE, you don’t need LLVM’s DL/TI information, but you can (not must) still use the DLTI dialect to encode your own properties.

2 Likes

I’ve faced a similar issue before, and the solution that made the most sense to me was to make the dataLayout member in LowerToLLVMOptions always hold a valid layout, and to update the *ToLLVM passes to either take it as an option or extract it from the module’s DLTI spec if one exists.

This way, we let the pass pipeline decide where the DataLayout should be introduced, and in cases where the layout is only needed for the final LLVM IR generation, it could just be passed to the last ConvertToLLVM pass and translated appropriately.

I’m relatively new to MLIR, so if this doesn’t make a lot of sense, I apologize in advance.

1 Like

So, on the question of “how do we enforce this”, how does this sound:

  • Any (upstream) passes or patterns that convert to LLVM must have a parent with an llvm.data_layout, which will be used during the conversion.
  • An llvm.func has to have a parent with an llvm.data_layout.
  • There’s a pass - or just a function - for converting DLTI info to LLVM data layouts and back, so that you can construct your data layout from DLTI. (This may require extending DLTI to be able to losslessly represent LLVM data layouts.)
  • Alternatively - or perhaps alongside this - we grow an llvm.target = #llvm.target_machine<triple = "...", cpu = "...", ...>, which can be queried for its data layout, either automatically or otherwise (see the sketch after this list).
  • Double-alternatively, that target-specification nature could be an interface that returns the arguments to llvm::TargetMachine, allowing downstream compilers to keep their own target spec mechanisms and tying in nicely to how GPU offloading works with its target attributes.
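
To make the llvm.target idea concrete, a rough sketch (the attribute and its fields are hypothetical, not something that exists upstream):

module attributes {
  // Hypothetical target descriptor; the data layout would be derived from it
  // by building an llvm::TargetMachine rather than being spelled out by hand.
  llvm.target = #llvm.target_machine<triple = "x86_64-unknown-linux-gnu",
                                     cpu = "skylake",
                                     features = "+avx2">
} {
  llvm.func @foo() {
    llvm.return
  }
}
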
1 Like

The idea of DLTI is to be flexible and let you represent your target/system in the best way for you. So, I’d say “all of the above”?

However, the target_machine approach is what we proposed last year to extract LLVM targets and it works with downstream projects, too, as they’ll have their own target names and features in LLVM.

For the purpose of this thread, having such a mechanism would allow us to use LLVM’s knowledge of the target (upstream or downstream) to fill in the gaps on the MLIR side without duplicating code or guessing. And given we’re lowering to LLVM IR, you’d expect some version of LLVM to consume that, one with the target you’re requesting.

So, I’d start there, but not discount the other ones, including JSON/YAML files, etc.

IMO if any LLVM op is present in the IR, then we should require a data layout and emit a verification error if not. This information can be injected when calling convert-to-llvm or, as we do with GPU, in a pass that simply attaches the information to a module.

I’ve been wanting to introduce a generalization of the GPU target attributes to query chip, triple, features, and target machine. That should be quite straightforward; however, I didn’t do it because I don’t yet know how to make it fit with the current DLTI push happening upstream.

In an ideal world I would like to see a Target subproject to handle all of that information independently of llvm.

4 Likes

Isn’t the data layout only about “memory”?
That seems a bit overkill to me: it would prevent me from using a limited set of LLVM ops as “intrinsics” in my higher-level representation before I inject a data layout. It limits composability greatly and I’m not sure we need to be that rigid here.

Yes. But as @gysit showed, there are many things that are intertwined, e.g. the inliner interface. Therefore, I’d argue that since we still have a monolithic LLVM dialect design, and it’s supposed to model LLVM “faithfully”, we should require it.

If we ever break up LLVM, then we could make more fine-grained decisions.

Now, for GPU intrinsics I think the burden might be smaller, as we could implement the target interface on nvvm.target and rocdl.target and make them fulfill that role.

+1 on keeping things composable. I think there is value in being able to mix and match LLVM ops with other dialects outside of a possible LLVM dialect module, possibly in the absence of a data layout. Certain ops, such as GEP, may be an exception.

1 Like

Agree on composability. The bar is when the code “will be lowered to LLVM”, and it has more to do with “getting it right by asking LLVM what it wants” than with using the LLVM dialect itself.

Even GEP can have its uses without a data layout, if you “know what you want” and lower through your own egress path. But trying to match LLVM’s expectations by hand is folly, especially with downstream targets, etc.

I think there’s a point in using dialect ops in different contexts. The main problem is that we don’t currently have a mechanism to say “if you run the LLVM inliner, you need DLTI”, other than just returning failure from that pass.

You really don’t want to guess the target half-way through the compilation pipeline because some pass needs it. So, I would propose a soft-fail on passes that need it and a hard fail on lowering to LLVM.

1 Like

It is, but the main problem is getting it right. That’s what I meant by “guessing” above. Currently, MLIR has some hard-coded strings in places that may or may not match LLVM’s expectations, because sub-targets can have slight variations, and environments (ILP64, etc.) also change and come from the target triple.

The only way to get it right is to ask LLVM what the correct DL string is, and for that you need to create a TargetMachine with the right triple, arch, and target features, by which time you already have the target info in hand.

If you guess and don’t pass it to LLVM, it will use the host info in its most basic config, which is probably not what you want or, worse, not what you asked for in your own DL string. Silent code generation bugs abound there.

The LLVM dialect would still be composable; we would just add the additional requirement:

To use LLVM, a target or DL must be there.

That agrees with how LLVM operates nowadays.

More importantly, if an op is needed in a non-LLVM context, then that op should exist outside LLVM. That’s why we are getting ptr, and I would argue we should invest time in doing that for all relevant groups of ops (e.g. managing structs, arrays…). Now, from the ptr experience, I’m not saying we should enforce those requirements right away, as it’s a lengthy process. But we should agree as a community whether to work towards that.


Here’s a draft PR for comments on adding some basic target support:

It takes things like:

module attributes {gpu.container_module} {
  gpu.module @kernel_module1 [#nvvm.target<chip = "sm_70">] {
    llvm.func @kernel(%arg0: i32, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64) attributes {gpu.kernel} {
      llvm.return
    }
  }
}

and adds target info like the DL:

// mlir-opt --pass-pipeline="builtin.module(gpu.module(dlti-set-target-specs))" 
module attributes {gpu.container_module} {
  gpu.module @kernel_module1 [#nvvm.target<chip = "sm_70">] attributes {dlti.dl_spec = #dlti.dl_spec<!llvm.ptr<6> = dense<32> : vector<4xi64>, i64 = dense<64> : vector<2xi64>, i128 = dense<128> : vector<2xi64>, !llvm.ptr = dense<64> : vector<4xi64>, i1 = dense<8> : vector<2xi64>, i8 = dense<8> : vector<2xi64>, i16 = dense<16> : vector<2xi64>, i32 = dense<32> : vector<2xi64>, f16 = dense<16> : vector<2xi64>, f64 = dense<64> : vector<2xi64>, f128 = dense<128> : vector<2xi64>, "dlti.endianness" = "little">} {
    llvm.func @kernel(%arg0: i32, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64) attributes {gpu.kernel} {
      llvm.return
    }
  }
}

Right now, I only implemented the functionality for the gpu.module op and the nvvm target.

I just don’t see why this requirement is needed for IR validity.

But we’re not trying to build LLVM here; I think that is a key difference. LLVM is a very coupled system where the IR and the transformations are closed and meant to interact with each other. MLIR is more decoupled, which calls for some differences here.

Isn’t this a bit ironic though? The main point of the DL is to handle the memory layout, which matters especially for pointer manipulation, and that is the one thing you’re trying to move out of the LLVM dialect right now. That means that managing pointers would not be covered by the proposed DL requirement for the LLVM dialect!