Here’s a brief recap of the conversation for discussion.

End goal:

A single robust pipeline for `gpu` code generation without the shortcomings of the current pipeline, where the heavy lifting of device code compilation is performed mostly by LLVM infra. This pipeline will be built gradually across many patches and discussions, as it involves moving certain bits from Clang to LLVM, as well as creating some new components.
Concrete proposed changes:
- The introduction of target attributes on `gpu.module`. This attribute will hold device target information about the module, such as whether it’s `nvvm` or `rocdl`, as well as the target triple, features, and arch. This could eventually lead to the removal of `--convert-gpu-to-(nvvm|rocdl)` in favor of a single `gpu-to-llvm`. The format for such an attribute might look like:

```mlir
gpu.module @foo [nvvm.target<chip = "sm_70">] {
  ...
}
```
- `gpu.launch_op` will no longer be lowered by `gpu-to-llvm`, but by a different pass, allowing more flexible handling of this op, as there are many ways to launch a kernel (`cudaLaunchKernel`, `cudaLaunchCooperativeKernel`, etc.) and there is no 1-to-1 mapping between this op and LLVM.
- The introduction of `--gpu-embed-kernel`. This pass will have to be executed after `gpu-to-llvm` and will serialize the `gpu.module` to an LLVM module. Why a separate pass? To allow running passes over the full LLVM MLIR IR, i.e.:
```mlir
builtin.module {
  gpu.module ... {
    llvm.func @device_foo ...
  }
  llvm.func @host_foo ...
}
```
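With serialization split into its own pass, host lowering and kernel embedding compose as separate steps. A hypothetical invocation sketch (the pass names are the ones proposed above; `input.mlir` is a placeholder file name, and the exact flags would depend on how the passes land):

```
mlir-opt input.mlir --gpu-to-llvm --gpu-embed-kernel
```

The ordering is the point: `--gpu-embed-kernel` runs only after `gpu-to-llvm`, so other passes can still see the full LLVM MLIR IR, with both host and device functions present, in between.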
- Migrate the current serialization pipelines into this `gpu` code gen structure, while addressing some shortcomings of the current serialization passes, such as the lack of general device bitcode linking in trunk. This would allow downstream users to use `libdevice` without having to patch the tree to obtain this functionality.
- Once the work on the LLVM infra side is ready, migrate all `gpu` MLIR code compilation into this pipeline.
No JIT or AOT functionality will be lost at any point; we’ll only gain features. Upon agreement, the first 4 items could be rolled out in the coming weeks.
Things outside this proposal that are also open for discussion:
- Migrating from the CUDA driver API to the CUDA runtime API.
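To make the driver-vs-runtime distinction concrete, here is a minimal host-side launch sketch for both APIs. The function handles, stream, and argument array are assumed to already exist, and the grid/block sizes are arbitrary; this is an illustration of the two launch surfaces, not a proposed implementation:

```cpp
#include <cuda.h>         // CUDA driver API (what the current lowering targets)
#include <cuda_runtime.h> // CUDA runtime API (the alternative under discussion)

// Driver API: the kernel is a CUfunction obtained from an explicitly
// loaded CUmodule, and the launch geometry is passed as scalars.
void launchWithDriverApi(CUfunction f, CUstream stream, void **args) {
  cuLaunchKernel(f,
                 /*gridDimX=*/1, /*gridDimY=*/1, /*gridDimZ=*/1,
                 /*blockDimX=*/256, /*blockDimY=*/1, /*blockDimZ=*/1,
                 /*sharedMemBytes=*/0, stream, args, /*extra=*/nullptr);
}

// Runtime API: the kernel is identified by a host-side function pointer;
// a cooperative launch uses cudaLaunchCooperativeKernel with the same
// signature, which illustrates why there is no single 1-to-1 lowering
// for gpu.launch_op.
void launchWithRuntimeApi(const void *func, cudaStream_t stream, void **args) {
  cudaLaunchKernel(func, dim3(1), dim3(256), args,
                   /*sharedMem=*/0, stream);
}
```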