Issue with -fembed-bitcode in OpenMP target offload with CUDA

I'm having trouble using -fembed-bitcode with OpenMP offloading to an
NVIDIA GPU. I'm on LLVM 13.0, at commit
ca721042f1c9876eb350da22d1fda44626d2783b. Given this program:

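#include <cstdio>
#include <omp.h>  // declares omp_is_initial_device()

int main() {
  #pragma omp target
  {
    // omp_is_initial_device() returns nonzero when running on the host.
    if (omp_is_initial_device()) {
      printf("Hello World from Host.\n");
    } else {
      printf("Hello World from Accelerator(s).\n");
    }
  }
  return 0;
}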

Compiling with

  clang++ -fopenmp -fopenmp-targets=nvptx64 -fembed-bitcode main.cpp

fails with

  fatal error: error in backend: Cannot select: intrinsic %llvm.nvvm.bar.warp.sync

The program builds and runs normally without the -fembed-bitcode flag,
though.

If I grep for this intrinsic in the install folder, I can find it in
IR/IntrinsicImpl.inc, but for some reason it's not getting picked up
by clang. It could be that I haven't built LLVM and Clang properly.
Currently I build clang once, and then build it a second time with
OpenMP enabled, using the first clang as the compiler. Is that the
proper way to build the target offloading plugin?
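
To help separate a build problem from a -fembed-bitcode problem, here is a
small sketch of a check I could run with the build that works (i.e. without
-fembed-bitcode). It only uses the standard OpenMP runtime calls
omp_get_num_devices() and omp_is_initial_device(). The helper names
visible_devices() and target_ran_on_host() are made up for illustration, and
the _OPENMP guards are only there so the file still compiles without
-fopenmp.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of offload devices the OpenMP runtime sees.
// Returns 0 when the file is compiled without OpenMP support.
int visible_devices() {
#ifdef _OPENMP
  return omp_get_num_devices();
#else
  return 0;
#endif
}

// Hypothetical helper: true if the target region executed on the host
// (fallback), false if it executed on an offload device.
bool target_ran_on_host() {
  int on_host = 1;  // stays 1 when there is no OpenMP or no offloading
#ifdef _OPENMP
  #pragma omp target map(tofrom: on_host)
  {
    on_host = omp_is_initial_device();
  }
#endif
  return on_host != 0;
}
```

If visible_devices() reports 0, or target_ran_on_host() returns true on a
machine with a GPU, that would suggest the offloading plugin was not built or
found, independent of the -fembed-bitcode error above.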

Thanks,
Nader Al Awar