MLIR omp.target for GPU offloading

omp.target is an MLIR operation used to model the OpenMP target construct. Please remember that we model the OpenMP operation, not the GPU kernel: we need to follow the OpenMP standard, which defines the execution model of the target construct. If you are interested only in GPU kernels, I think you should consider the 'gpu' dialect (see 'gpu' Dialect - MLIR).

We use omp attributes to pass information that is needed for lowering to LLVM IR, such as the target device and the host file. That's why I encouraged you to generate the MLIR file with flang-new: it emits the set of attributes required for lowering to LLVM IR.

Once you have a valid MLIR file, you can use the mlir-translate tool to generate LLVM IR, and then use the LLVM tools to convert the LLVM IR to binary code.
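A minimal sketch of that two-step flow follows. It is written in Python only to spell out the command lines. The helper name build_pipeline is hypothetical, and the sketch assumes mlir-translate and llc are on your PATH. It builds the commands but does not run them, and it stops at an object file. Packaging the device image together with the host binary for actual offloading is not covered.

```python
from __future__ import annotations

from pathlib import Path


def build_pipeline(mlir_file: str, triple: str, cpu: str) -> list[list[str]]:
    """Return argv lists for MLIR -> LLVM IR -> object file (commands are not executed)."""
    src = Path(mlir_file)
    ll_file = str(src.with_suffix(".ll"))
    obj_file = str(src.with_suffix(".o"))
    return [
        # Step 1: translate the MLIR module (LLVM + OpenMP dialects) to textual LLVM IR.
        ["mlir-translate", "--mlir-to-llvmir", mlir_file, "-o", ll_file],
        # Step 2: compile the LLVM IR to an object file for the given triple and CPU.
        ["llc", f"-mtriple={triple}", f"-mcpu={cpu}", "-filetype=obj", ll_file, "-o", obj_file],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("test-openmp-amdgcn-amd-amdhsa-llvmir.mlir",
                              "amdgcn-amd-amdhsa", "gfx1010"):
        print(" ".join(cmd))
```

To actually run a step, pass each list to subprocess.run(cmd, check=True).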

Ok, thank you. I will try to generate MLIR files using flang-new. I am particularly interested in the OpenMP construct because I want to develop a lowering for another hardware target, and I wanted to first see what the flow for the GPU looks like. We do not have CUDA-like kernels and we keep the programming construct generic.

If I run the following, I get an error related to -save-temps:

./flang-new -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target -march=gfx1010 -save-temps -v test.f90
error: unknown integrated tool '-cc1as'. Valid tools include '-fc1'

The last command executed is:

flang-new -cc1as -triple x86_64-unknown-linux-gnu -filetype obj -main-file-name test.f90 -target-cpu x86-64 -fdebug-compilation-dir=/home/darby/llvm-project/flang/build/bin -dwarf-version=5 -mrelocation-model pic -mrelax-all -o test-host-x86_64-unknown-linux-gnu.o test-host-x86_64-unknown-linux-gnu.s

You can skip that issue, because two MLIR files were generated before the error occurred: test-openmp-amdgcn-amd-amdhsa-llvmir.mlir for the GPU and test-host-x86_64-unknown-linux-gnu-llvmir.mlir for the host. You can compare them to see the differences, and then you are ready for your own experiments.

Hi, I see the attributes in the MLIR files.
I use the following module attributes in my MLIR file:
module attributes {llvm.data_layout = "", llvm.target_triple = "x86_64-unknown-linux-gnu", omp.is_device = false, omp.target = #omp.target<target_cpu = "x86_64", target_features = "">}

But when I use mlir-translate with --mlir-to-llvmir, the offloading entries are not generated. Could you please tell me how to create the offloading entries structure in MLIR? Thank you.

The attributes module attributes {llvm.data_layout = "", llvm.target_triple = "x86_64-unknown-linux-gnu", omp.is_device = false, omp.target = #omp.target<target_cpu = "x86_64", target_features = "">} are for host (x86) side generation. The host side is responsible for the kernel call. If you want to see the GPU-related LLVM IR code, please use the command:
mlir-translate --mlir-to-llvmir test-openmp-amdgcn-amd-amdhsa-llvmir.mlir

Yes, they are for the host. But aren't the offloading entries generated for the host rather than for the device? Yet they are not being generated for the host.

Could you share your MLIR file?
For the host *.ll file (generated from test-host-x86_64-unknown-linux-gnu-llvmir.mlir) you should see:

  1. The definition of the function which contains the omp target pragma.
  2. The definition of the target function which is executed if the given offload device is not present.
  3. A module target triple that points to the x86 target triple.

For the device *.ll file (generated from test-openmp-amdgcn-amd-amdhsa-llvmir.mlir) you should see:

  1. The definition of the kernel function (something like: define weak_odr protected amdgpu_kernel void).
  2. A module target triple set to the GPU triple, for example: target triple = "amdgcn-amd-amdhsa".
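To check those expectations quickly on the generated files, here is a small heuristic sketch in Python. The helper missing_markers and its regular expressions are assumptions for illustration. They look for the target triple lines and the kernel signature quoted above. On the host side, they look for a function whose name starts with __omp_offloading_, which is assumed to be the naming of the host fallback.

```python
import re

# Heuristic markers derived from the expectations listed above (assumption: textual .ll input).
HOST_CHECKS = {
    "x86 target triple": re.compile(r'^target triple = "x86_64', re.M),
    # Assumed naming of the host fallback function for the target region.
    "host fallback for the target region": re.compile(r"^define .*@__omp_offloading_", re.M),
}
DEVICE_CHECKS = {
    "amdgcn target triple": re.compile(r'^target triple = "amdgcn-amd-amdhsa"', re.M),
    "kernel definition": re.compile(r"^define weak_odr protected amdgpu_kernel void", re.M),
}


def missing_markers(ll_text: str, is_device: bool) -> list:
    """Return the names of the expected markers that were not found in ll_text."""
    checks = DEVICE_CHECKS if is_device else HOST_CHECKS
    return [name for name, pattern in checks.items() if not pattern.search(ll_text)]
```

For example, missing_markers(open("target.ll").read(), is_device=True) returning an empty list means both device markers were found.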

OK, I attach two MLIR files here, one for the host and one for the device. In neither case do I see the offload entries in the *.ll file generated by mlir-translate. Ideally, the .ll file should contain the offloading entries, such as:

%struct.__tgt_offload_entry = type { ptr, ptr, i64, i32, i32 }
!omp_offload.info = !{!2}

omphostllvm.txt (2.7 KB)
omptargetllvm.txt (2.8 KB)
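For reference, the %struct.__tgt_offload_entry type quoted above has five fields. Below is a minimal Python ctypes mirror. It assumes the field names used in libomptarget's C definition at the time (addr, name, size, flags, reserved). It is only meant to show what each member of { ptr, ptr, i64, i32, i32 } holds, not to be passed to the runtime.

```python
import ctypes


class TgtOffloadEntry(ctypes.Structure):
    """Mirror of { ptr, ptr, i64, i32, i32 }; field names assumed from libomptarget."""

    _fields_ = [
        ("addr", ctypes.c_void_p),     # ptr: address associated with the entry (function or global)
        ("name", ctypes.c_char_p),     # ptr: symbol name of the entry
        ("size", ctypes.c_uint64),     # i64: size of a global, 0 for functions
        ("flags", ctypes.c_int32),     # i32: flags associated with the entry
        ("reserved", ctypes.c_int32),  # i32: reserved for the runtime library
    ]


if __name__ == "__main__":
    # On a 64-bit host: 8 + 8 + 8 + 4 + 4 = 32 bytes, no padding needed.
    print(ctypes.sizeof(TgtOffloadEntry))
```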

Could you try swapping omp.is_device for omp.is_target_device in the test files you presented in your last post? I believe we changed it to this recently (there is also an omp.is_gpu, but I think that's not necessary in this example), and that might give you the correct results. Unfortunately, I've only tested this on a downstream version so far. There it yields the correct results by emitting the offload info, so hopefully enough has been upstreamed for it to work.
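If you have several hand-written MLIR files using the old name, a hypothetical Python helper like the one below can apply the rename. It only rewrites the exact identifier omp.is_device and leaves omp.is_target_device untouched. It does not check whether other attributes also changed between revisions.

```python
import re

# Match the old attribute name only as a whole identifier.
_OLD_ATTR = re.compile(r"\bomp\.is_device\b")


def rename_is_device(mlir_text: str) -> str:
    """Rewrite omp.is_device to omp.is_target_device in MLIR source text."""
    return _OLD_ATTR.sub("omp.is_target_device", mlir_text)
```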

You may also currently encounter an error when lowering the device side of your code, related to this thread: Request for input on a fix for a bug utilizing omp.TargetOp in conjunction with mlir-translate's -split-input-file - #5 by agozillon. If you use the split-input-file option, I believe you can work around it by adding location information to the target op via an MLIR loc attribute. If you don't use it, you should be fine.

I tried swapping to omp.is_target_device. Unfortunately, it doesn't work for me. I use upstream LLVM at commit 7cc57c07e36fc6b4d176cebb28a9bbe637772175 on the master branch.

Thank you for actively trying to help me out.

I think the commit may be too far behind (May-ish, if I read it correctly) to have the changes. A lot of the work on OpenMP offloading takes a while to filter into upstream, and actually generating a kernel that offloads from a TargetOp is a fairly recent addition and still a WIP. For example, I am working on map support, and I think the initial device-side generation support landed in the last month or so, although others can correct me on that.

That's no problem at all. I am building upstream llvm-project to see whether the offload info is emitted at the most recent commit, and then I'll go back to the commit you referenced to see if it does the same.

Yes, unfortunately that commit is too far back to get the desired output. The upstream top-of-tree version does seem to emit most of the offload info, but it is missing at least the __tgt_kernel_arguments that my downstream version contains. Again, it's a WIP!

Hopefully the original flang command mentioned in this thread also works on top-of-tree. I haven't tested it, but it's very likely.

OK, so what would I need to get offloading to work without the flang driver? I only want to start from MLIR files, not from Fortran source code.
If I update to the latest commit, would things work? I am not sure what __tgt_kernel_arguments is about.
Thank you!

I believe updating to top-of-tree will get you the desired offload information without going through the flang driver, just via the mlir-translate command specified above.

I believe actual offloading, however, will require this patch to land: D155633 [OpenMP][OpenMPIRBuilder] Add kernel launch codegen to emitTargetCall. You can always apply it to your branch yourself. Your mileage will still vary, though. I think the branch would then be able to launch the kernel and let you write to/from a scalar, but not much more. That's the test case we've focused on for the first offloading step; other things could possibly work, but anything complex is still a while away. @jansjodi is very likely the best person to say what the upstream kernel launch can currently do, and can correct me if I've said anything wrong.

Once that patch (D155633 [OpenMP][OpenMPIRBuilder] Add kernel launch codegen to emitTargetCall) lands, it would be good to write up an example in the docs for both Flang (using the flang driver) and MLIR (using the MLIR tools), and also to add an integration test if possible.


@darbyShaw I noticed that your patch had no DLTI attributes. DLTI attributes describe data layout and target information; see 'dlti' Dialect - MLIR for more details.
I manually added some DLTI attributes and was able to generate LLVM IR (mlir-translate --mlir-to-llvmir target.mlir). I used an upstream build with no additional patches. Please see the attached file.
target.mlir.txt (3.4 KB)

When I set omp.host_ir_filepath to the host bitcode file, I get the following error:

"Error of kind: 0 when emitting offload entries and metadata during OMPIRBuilder finalization"

I believe that's because the device code is not supposed to still contain the original host function holding the target region by the time it reaches the finalization of the offload info. That's why the error is triggered.

When compiling for the device from Flang, it currently removes the original function holding the target region and outlines the device code into a new, separate function. For the moment, this is done to avoid optimisations breaking things, via a pass called OMPEarlyOutlining that is currently part of Flang's optimisation pipeline.

There are a couple of passes that currently help the device with outlining. Unfortunately, we're focused on the Flang → LLVM IR code path right now rather than on hand-written MLIR converted to LLVM IR, so there are likely to be a few rough edges.

You could perhaps hook into the optimisation passes we use to get the same results for the device code. For now this goes via fir-opt, but I believe the passes only care about OMP dialect constructs, so in theory they should work fine without any FIR dialect. This is probably the best way to keep the device code consistent. Alternatively, add the following MLIR attribute to the originating function in your device MLIR: omp.declare_target = #omp.declaretarget<device_type = (host), capture_clause = (to)>. It marks the function as a host function, which should give the compiler the right to remove it from the device module: on the device pass (indicated by the module attribute omp.is_target_device) it will likely be discarded. @skatrak worked on the function-discarding part, so he can correct me if I am wrong.

Posting the solution here in case someone else is having the same issue.

The error:

"Error of kind: 0 when emitting offload entries and metadata during OMPIRBuilder finalization"

happens because, when building the device module, the OpenMP IR builder can't find the matching target entry in the host IR.

Why? The target entry encodes the source location of the target region, including the unique ID of the source file as specified by the front end. A mismatch in the location info therefore makes the OpenMP IRBuilder fail.

When Flang creates both modules, it uses the same source location, i.e. myfile.f90:line. The file IDs match, so the OpenMP IR builder finds the entry and succeeds.

When using mlir-translate, there are two file IDs, one for the host module (host.mlir) and one for the device module (dev.mlir), so the entries disagree.
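Below is a simplified Python model of why the two file IDs disagree. It rests on two assumptions. First, as in the MLIR OpenMP translation of that period, the target entry is keyed on the file system's unique ID of the file named in the target op's location (device and inode numbers on POSIX), plus the parent function name and line. Second, the entry name has the form __omp_offloading_<device>_<file>_<parent>_l<line>. The function offload_entry_name and the parent name my_func are illustrative, not LLVM API. In this model the path must refer to an existing file, since os.stat raises otherwise.

```python
import os


def offload_entry_name(source_path: str, parent_name: str, line: int, count: int = 0) -> str:
    """Simplified model of how a target-region entry name is derived.

    Assumption: the device ID and file ID come from the file system's unique ID
    of source_path, modelled here with st_dev and st_ino.
    """
    st = os.stat(source_path)  # raises FileNotFoundError if the location's file does not exist
    name = f"__omp_offloading_{st.st_dev:x}_{st.st_ino:x}_{parent_name}_l{line}"
    if count:
        name += f"_{count}"
    return name
```

With host.mlir and dev.mlir as two distinct files, offload_entry_name("host.mlir", "my_func", 5) and offload_entry_name("dev.mlir", "my_func", 5) differ even though the function and line are identical. This is the mismatch described above.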

How do you fix the issue and make mlir-translate work? Add explicit location attributes to the target region in both modules, using the same location, i.e.:

// host.mlir
module ... {
  omp.target ... {
    ...
  } loc(#target_loc)
}
#target_loc = loc("path-to-host.mlir":line:col)

// dev.mlir (reuses the host file location so that the entries match)
module ... {
  omp.target ... {
    ...
  } loc(#target_loc)
}
#target_loc = loc("path-to-host.mlir":line:col)
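Below is a hypothetical Python check that extracts the #target_loc alias from the host and device files and confirms they agree before you run mlir-translate. The names target_loc and locations_match are illustrative. The check compares file, line, and column, which is stricter than needed if only the file and line feed into the entry. It only recognises the exact alias name used in the example above.

```python
import re

# Matches:  #target_loc = loc("some/file.mlir":LINE:COL)
_TARGET_LOC = re.compile(r'#target_loc\s*=\s*loc\("([^"]+)":(\d+):(\d+)\)')


def target_loc(mlir_text: str):
    """Return (file, line, col) of the #target_loc alias, or None if it is absent."""
    m = _TARGET_LOC.search(mlir_text)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def locations_match(host_text: str, dev_text: str) -> bool:
    """True only if both modules define #target_loc with the same file, line, and column."""
    host_loc = target_loc(host_text)
    return host_loc is not None and host_loc == target_loc(dev_text)
```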