LLVM is optimizing hand-written intrinsic code as well

We have some code that is written manually with intrinsics, but LLVM optimizes it further because of the --fast-math flag. The hand-written intrinsic version performs better than the LLVM-optimized one. Example source code:

template <>
inline __m256 simd_evaluate_polynomial<__m256, APPROX_DEFAULT>(__m256 x, const std::array<__m256, APPROX_DEFAULT + 1>& coeff)
{
  __m256 power = _mm256_set1_ps(1.0f);  // running power of x, starts at x^0
  __m256 res = _mm256_set1_ps(0.0f);    // accumulated polynomial value
  for (unsigned int i = 0; i <= APPROX_DEFAULT; i++) {
    __m256 term = _mm256_mul_ps(coeff[i], power);  // coeff[i] * x^i
    power = _mm256_mul_ps(power, x);               // advance to x^(i+1)
    res = _mm256_add_ps(res, term);
  }
  return res;
}

For the above function, LLVM generates the following assembly:

Address       Source Line   Assembly                                              CPU Time: Total   CPU Time: Self
0x1402bbf7d   0             Block 1:
0x1402bbf7d   19            vmovaps ymm5, ymmword ptr [rip+0x50e4b5b]             0.1%              15.584ms
0x1402bbf85   19            vfmadd213ps ymm5, ymm3, ymmword ptr [rip+0x50e4b32]   0.1%              15.595ms
0x1402bbf8e   19            vfmadd213ps ymm5, ymm3, ymmword ptr [rip+0x50e4b09]   0.6%              93.654ms
0x1402bbf97   19            vfmadd213ps ymm5, ymm3, ymmword ptr [rip+0x50e4ae0]   0.2%              31.178ms
0x1402bbfa0   21            vfmadd213ps ymm5, ymm3, ymmword ptr [rip+0x50e4ab7]   0.3%              46.992ms
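Reading the listing, the generated code looks like Horner's scheme: one load of the highest coefficient followed by a chain of fused multiply-adds, each depending on the previous result, with no separate multiply for the running power. Below is a minimal scalar sketch of the two forms for comparison. It is an illustration only, not our real code: the names kDegree, Coeffs, eval_power_sum and eval_horner are hypothetical, and the degree of 4 is only inferred from the one load plus four FMAs visible above.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical degree, inferred from the one load plus four FMAs in the listing.
constexpr std::size_t kDegree = 4;
using Coeffs = std::array<float, kDegree + 1>;

// Same structure as the intrinsic loop: keep a running power of x and
// add coeff[i] * x^i at each step (two multiplies and one add per term).
float eval_power_sum(float x, const Coeffs& coeff) {
  float power = 1.0f;
  float res = 0.0f;
  for (std::size_t i = 0; i <= kDegree; ++i) {
    res += coeff[i] * power;
    power *= x;
  }
  return res;
}

// Horner's scheme: start from the highest coefficient and fold in one
// fused multiply-add per remaining coefficient, highest to lowest.
// Every step depends on the previous one, like the vfmadd213ps chain.
float eval_horner(float x, const Coeffs& coeff) {
  float acc = coeff[kDegree];
  for (std::size_t i = kDegree; i-- > 0;) {
    acc = std::fma(acc, x, coeff[i]);
  }
  return acc;
}
```

The two functions compute the same polynomial mathematically, but they round differently in general, so a compiler can only turn one into the other when floating-point reassociation is allowed, which is what fast-math permits. For example, with coefficients {1, 2, 3, 4, 5} and x = 2 both return 129.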

Can anyone please explain why this is happening?