ROCm module from LLVM AMDGPU backend


I'm trying to launch a GPU kernel that was compiled by the LLVM
AMDGPU backend. So far I've had no success, and I was hoping
someone here might have an idea.

TensorFlow seems to do something similar, so I read the TensorFlow
code on GitHub. I believe the following setup is pretty close in the
vital parts:

1) Compile an LLVM IR module (see below) with the AMDGPU backend to a
'module.o' file, using this triple/CPU:

llvm::Triple TheTriple;
TheTriple.setArch (llvm::Triple::ArchType::amdgcn);
TheTriple.setVendor (llvm::Triple::VendorType::AMD);
TheTriple.setOS (llvm::Triple::OSType::AMDHSA);

std::string CPUStr("gfx906");

LLVM IR passes that I use:

TargetMachine->addPassesToEmitFile with CGFT_ObjectFile

2) Have the LLVM linker generate a shared library via a 'system()' call:

ld.lld -shared module.o -o module.so

3) Reading this shared module back into a 'vector<uint8> shared'

4) Using HIP to load this module:

 hipModule_t module;
 ret = hipModuleLoadData(&module, shared.data());

(this returns hipSuccess)

5) Trying to get a HIP function:

 hipFunction_t kernel;
 ret = hipModuleGetFunction(&kernel, module, "kernel");

.. and this fails with HIP error code 500 !?
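
For reference, step 3 can be sketched as a small self-contained helper. This is only a minimal sketch under assumptions: the name readBinaryFile is made up for illustration, and it assumes 'module.so' fits comfortably in memory. The data() pointer of the returned vector is what step 4 hands to hipModuleLoadData.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative helper (hypothetical name): reads a whole file into a byte vector.
std::vector<std::uint8_t> readBinaryFile(const std::string &path) {
  // Open in binary mode so no byte is translated on platforms with a text mode.
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  // Pull every byte from the stream buffer, then convert to unsigned bytes.
  std::vector<char> raw((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
  return std::vector<std::uint8_t>(raw.begin(), raw.end());
}
```

Checking that the vector is non-empty before calling hipModuleLoadData would rule out a failed link step as the cause.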

I believe the vital steps here concerning ROCm are similar
(identical?) to what's in tensorflow but I don't get it to work.

I have to admit that I did not build tensorflow to see if the AMD GPU
bits actually work. I read the comments and some are saying that it
comes with some performance overhead. Performance isn't the point at
the moment - I'm working on a proof-of-concept.

My test machine has an 'AMD gfx906' card installed.

Digging deeper, the hipModule_t is a pointer to ihipModule_t and
printing out the values after loading the module gives

ihip->fileName =
ihip->hash = 3943538976062281088
ihip->kernargs.size() = 0
ihip->executable.handle = 42041072

It's not telling me much. 'Not sure what to do with the handle for the

Any ideas what could be tried next?



Your “@kernel” function isn’t a kernel; it uses the default C calling convention. You need to use the amdgpu_kernel calling convention.
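
As a rough illustration: in textual IR the definition should read 'define amdgpu_kernel void @kernel(...)' rather than 'define void @kernel(...)'. When the function is built through the C++ API, calling setCallingConv(llvm::CallingConv::AMDGPU_KERNEL) on the llvm::Function is the usual way to get there. A quick sanity check on a dumped module could look like the sketch below. The helper name is hypothetical, and it is a plain string scan, not a real IR parser, so it assumes each 'define' sits on one unindented line and the function name is unquoted.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: scans textual LLVM IR line by line and reports whether
// the definition of @<name> carries the amdgpu_kernel calling convention.
bool definedAsAmdgpuKernel(const std::string &ir, const std::string &name) {
  std::istringstream lines(ir);
  std::string line;
  const std::string symbol = "@" + name + "(";
  while (std::getline(lines, line)) {
    if (line.rfind("define", 0) != 0) continue;            // only function definitions
    if (line.find(symbol) == std::string::npos) continue;  // only the requested function
    return line.find(" amdgpu_kernel ") != std::string::npos;
  }
  return false;  // no definition of @<name> found
}
```

Running this on the module's printed IR with the name "kernel" before code generation shows whether the function will be emitted as a kernel entry point or as an ordinary device function.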