FMA canonicalization in IR

If I have my FMA intrinsics story straight now (thanks for the explanation, Hal!), I think it raises another question about IR canonicalization (and may affect the proposed revision to IR FMF):

define float @foo(float %a, float %b, float %c) {
  %mul = fmul fast float %a, %b ; using 'fast' because there is no 'fma' flag
  %add = fadd fast float %mul, %c
  ret float %add
}

Should this be canonicalized to:

define float @goo(float %a, float %b, float %c) {
  %maybe.fma = call fast float @llvm.fmuladd.f32(float %a, float %b, float %c)
  ret float %maybe.fma
}

declare float @llvm.fmuladd.f32(float %a, float %b, float %c)

Doing this would raise another issue. What about:

   %t = fmul fast float %x, %y
   %u = fmul fast float %t, %x
   %v = fadd fast float %u, %z

If you _first_ canonicalize this to an fmuladd, then you might miss the associative transform to

   %t1 = fmul fast float %x, %x
   %u = fmul fast float %t1, %y
   %v = fadd fast float %u, %z
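
To make the concern concrete, here is roughly what I'd expect the fmuladd-first form of the original sequence to look like (my own sketch, not the output of any existing pass):

   %t = fmul fast float %x, %y
   %v = call fast float @llvm.fmuladd.f32(float %t, float %x, float %z)

The multiply of %t by %x is now hidden inside the intrinsic call, so anything that only pattern-matches fmul instructions never sees it and can't reassociate it into (%x * %x) * %y.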

So _if_ you add that canonicalization, you somehow have to teach all the visitFMul logic to visitFMuladd as well. Maybe that's actually desirable if the front-end emits fmuladd for C even when fast-math is enabled.