How to identify an instruction that has been lowered by an intrinsic?

Hi I want to know given an instruction whether it has been lowered by an intrinsic? I am trying to prevent the backend from generating FMA’s for consecutive fadd and fmul, when they are being lowered by an intrinsic and when ffast-math is enabled.

Before lowering Intrinsic:

define dso_local noundef <8 x float> @_Z24simd_evaluate_polynomialDv8_fRKSt5arrayIS_Lm10001EE(<8 x float> noundef %x, %"struct.std::array"* noundef nonnull align 32 dereferenceable(320032) %coeff) local_unnamed_addr #0 {
entry:
  %call = call fast fastcc noundef <8 x float> @_ZL14_mm256_set1_psf(float noundef 1.000000e+00)
  %call1 = call fast fastcc noundef <8 x float> @_ZL14_mm256_set1_psf(float noundef 0.000000e+00)
  br label %for.cond

for.cond:                                         ; preds = %for.body, %entry
  %res.0 = phi <8 x float> [ %call1, %entry ], [ %call5, %for.body ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %power.0 = phi <8 x float> [ %call, %entry ], [ %call4, %for.body ]
  %cmp = icmp ult i32 %i.0, 10001
  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond
  ret <8 x float> %res.0

for.body:                                         ; preds = %for.cond
  %conv = zext i32 %i.0 to i64
  %call2 = call noundef nonnull align 32 dereferenceable(32) <8 x float>* @_ZNKSt5arrayIDv8_fLm10001EEixEm(%"struct.std::array"* noundef nonnull align 32 dereferenceable(320032) %coeff, i64 noundef %conv) #4
  %0 = load <8 x float>, <8 x float>* %call2, align 32, !tbaa !5
  %call3 = call fast fastcc noundef <8 x float> @_ZL13_mm256_mul_psDv8_fS_(<8 x float> noundef %0, <8 x float> noundef %power.0)
  %call4 = call fast fastcc noundef <8 x float> @_ZL13_mm256_mul_psDv8_fS_(<8 x float> noundef %power.0, <8 x float> noundef %x)
  %call5 = call fast fastcc noundef <8 x float> @_ZL13_mm256_add_psDv8_fS_(<8 x float> noundef %res.0, <8 x float> noundef %call3)
  %inc = add i32 %i.0, 1
  br label %for.cond, !llvm.loop !8
}

After lowering and Inlined.

IR Dump After InlinerPass on (_Z24simd_evaluate_polynomialDv8_fRKSt5arrayIS_Lm10001EE) ***
; Function Attrs: mustprogress uwtable
define dso_local noundef <8 x float> @_Z24simd_evaluate_polynomialDv8_fRKSt5arrayIS_Lm10001EE(<8 x float> noundef %x, %"struct.std::array"* noundef nonnull align 32 dereferenceable(320032) %coeff) local_unnamed_addr #0 {
entry:
  br label %for.cond

for.cond:                                         ; preds = %for.body, %entry
  %res.0 = phi <8 x float> [ zeroinitializer, %entry ], [ %add.i, %for.body ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %power.0 = phi <8 x float> [ <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %entry ], [ %mul.i14, %for.body ]
  %cmp = icmp ult i32 %i.0, 10001
  br i1 %cmp, label %for.body, label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond
  ret <8 x float> %res.0

for.body:                                         ; preds = %for.cond
  %conv = zext i32 %i.0 to i64
  %call2 = call noundef nonnull align 32 dereferenceable(32) <8 x float>* @_ZNKSt5arrayIDv8_fLm10001EEixEm(%"struct.std::array"* noundef nonnull align 32 dereferenceable(320032) %coeff, i64 noundef %conv) #2
  %0 = load <8 x float>, <8 x float>* %call2, align 32, !tbaa !5
  %mul.i = fmul fast <8 x float> %power.0, %0
  %mul.i14 = fmul fast <8 x float> %x, %power.0
  %add.i = fadd fast <8 x float> %mul.i, %res.0
  %inc = add i32 %i.0, 1
  br label %for.cond, !llvm.loop !8
}

I don’t think it is possible to identify the origin of inlined instruction in a reliable way.
The “intrinsic” is just a one-line function defined in clang/lib/Headers/avxintrin.h.
I.e. it is no different from other functions, except for its attributes, which are:
__always_inline__, __nodebug__, __target__("avx"), __min_vector_width__(256)))

You can’t go backwards from IR to the exact reason it was inserted. You’re also going about this backwards; the fast flags here specifically allow the fusion into FMA.

@arsenm Thanks for the reply, That is what I would like to prevent, I would like it to not generate FMAs for fmuls and fadds lowered by an intrinsic. Is there any other way I could achieve that?

I think -ffp-contract=off will turn off all FMAs (not just those inspired by intrinsics).

Yeah, I don’t to turn off all of them, only the intrinsic ones.

Well, you can’t. You could disable contraction per-function, if you only concerned with clang (gcc doesn’t support this properly, AFAIA), but that’s it. If you okay with doing at per-function level and you don’t really need them inlined for performance reasons you can also separate them in a separate TU and compile that particular file with -ffp-contract=off.

1 Like