I am trying to understand the OpenCL compilation flow on GPUs in order to develop an OpenCL runtime for new hardware.
I understand that the OpenCL compiler is part of a vendor’s runtime library, which is the heart of OpenCL. Since an OpenCL kernel is compiled at runtime, at a high level its compilation takes place in two steps:
i. source code is first converted to intermediate code.
ii. intermediate code is then translated to targeted binary code.
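To make these two steps concrete for myself, here is a minimal toy sketch in Python. It is not real OpenCL tooling: the names `front_end`, `back_end`, `toy_gpu_a` and `toy_gpu_b`, and all the mnemonics, are hypothetical and only illustrate how one target-independent intermediate form could feed several target-specific translators.

```python
import ast

def front_end(source):
    """Step i: lower a source expression to target-independent stack code."""
    ops = {ast.Add: "add", ast.Mult: "mul"}

    def lower(node):
        if isinstance(node, ast.Name):
            return ["load " + node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return lower(node.left) + lower(node.right) + [ops[type(node.op)]]
        raise ValueError("unsupported construct in toy front end")

    return lower(ast.parse(source, mode="eval").body)

def back_end(ir, target):
    """Step ii: map intermediate code to mnemonics of a hypothetical target."""
    tables = {
        "toy_gpu_a": {"load": "LD", "add": "FADD", "mul": "FMUL"},
        "toy_gpu_b": {"load": "v_load", "add": "v_add_f32", "mul": "v_mul_f32"},
    }
    table = tables[target]
    out = []
    for inst in ir:
        op, _, arg = inst.partition(" ")
        out.append((table[op] + " " + arg).strip())
    return out

ir = front_end("a + b")            # produced once, independent of the device
code_a = back_end(ir, "toy_gpu_a")  # translated per target
code_b = back_end(ir, "toy_gpu_b")
```

The point of the sketch is only that step i runs once per kernel, while step ii is repeated for each device the runtime has to support.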
For example, suppose we have an OpenCL kernel source file, vectorAdd_kernel.cl:
- OpenCL compilation flow on Nvidia GPUs
a. vectorAdd_kernel.cl is first translated to LLVM IR using clang.
b. LLVM IR is converted into optimized LLVM IR using the LLVM optimizer.
c. Optimized LLVM IR is then translated to vectorAdd_kernel.ptx using the back-end.
d. vectorAdd_kernel.ptx is then translated to vectorAdd_kernel.bin using the JIT compiler. Nvidia uses a JIT here so that the code still benefits when next-generation GPUs are encountered.
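As I picture it, keeping the last step as a JIT means the runtime can hold on to the portable intermediate text and finalize it for whatever device it meets at load time. The following is a rough sketch of that idea under my own assumptions: `ToyRuntime`, the architecture names `gen1`/`gen2`, and the placeholder IR string are all hypothetical, and the "code generation" is just a string tag standing in for the real work.

```python
import hashlib

class ToyRuntime:
    """Hypothetical runtime that finalizes portable IR per device on demand."""

    def __init__(self):
        self.binary_cache = {}  # (hash of IR, architecture) -> device binary
        self.jit_runs = 0       # how many times the JIT actually ran

    def jit(self, portable_ir, arch):
        key = (hashlib.sha256(portable_ir.encode()).hexdigest(), arch)
        if key not in self.binary_cache:
            self.jit_runs += 1
            # Stand-in for real code generation for this architecture.
            self.binary_cache[key] = "[" + arch + "] " + portable_ir
        return self.binary_cache[key]

runtime = ToyRuntime()
portable_ir = "toy-ir: c = a + b"             # placeholder, not real PTX
bin_gen1 = runtime.jit(portable_ir, "gen1")   # device known today
bin_gen2 = runtime.jit(portable_ir, "gen2")   # newer device, same IR reused
bin_again = runtime.jit(portable_ir, "gen1")  # served from the cache
```

In this model the front-end and optimizer never have to rerun for a new GPU generation; only the final translation is repeated, and its result is cached per architecture.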
- OpenCL compilation on AMD GPUs
a. vectorAdd_kernel.cl is first translated to LLVM IR using gcc/clang.
b. LLVM IR is then converted into optimized LLVM IR using the LLVM optimizer.
c. optimized LLVM IR is then converted into AMD IL.
d. AMD IL is then converted into AMD ISA using shader compiler (GPU JIT).
As I understand it, AMD performs back-end compilation as part of the JIT, whereas Nvidia uses a back-end that is separate from the JIT.
Is that correct? If so, what are the advantages of keeping the JIT separate from the back-end?
Thanks for your comments/opinions.