Thanks for your reply. I will try implementing atan using libm. One more question: does libm work on GPUs, or should I use a different implementation of std.atan there?

GPUs typically have special function units that handle trigonometric functions, so you can usually lower the op to the corresponding target intrinsic. For Vulkan, for example, that means converting it to spv.GLSL.atan.
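
To make the per-target idea concrete, here is a rough sketch of the mapping as a plain lookup table. It is only an illustration, not how any compiler pass is actually written: the table, the target keys, and the `lower_atan_f32` helper are all made up for this example. The Vulkan entry is the op named above; the CUDA and ROCm entries assume the vendor device libraries (libdevice and OCML) are what gets linked in, and the CPU entry is the libm function.

```python
# Hypothetical table: the symbol a scalar f32 atan could lower to on each target.
# The keys and this table are illustrative only, not part of any real API.
ATAN_F32_LOWERING = {
    "cpu": "atanf",              # libm
    "vulkan": "spv.GLSL.atan",   # SPIR-V GLSL extended instruction set op
    "cuda": "__nv_atanf",        # NVIDIA libdevice function (assumed linked in)
    "rocm": "__ocml_atan_f32",   # AMD OCML device library function (assumed linked in)
}


def lower_atan_f32(target: str) -> str:
    """Return the symbol an f32 atan would lower to for the given target."""
    try:
        return ATAN_F32_LOWERING[target]
    except KeyError:
        # No entry means no lowering is known; fail loudly rather than guess.
        raise ValueError(f"no atan lowering registered for target {target!r}") from None


if __name__ == "__main__":
    for name in ATAN_F32_LOWERING:
        print(f"{name}: atan -> {lower_atan_f32(name)}")
```

The point is just that libm is the CPU row of this table; each GPU target gets its own row instead of reusing it.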