Aggressive FMA fusion for NVPTX


I propose to override the TLI callback enableAggressiveFMAFusion for the NVPTX backend and return true instead of false. The reason is the same as for PPC: fmul, fmadd and fadd nodes cost the same number of cycles, so we can enable more combining heuristics to produce more FMAs. For instance, this pattern would be considered:

// fold (fadd (fma x, y, (fmul u, v)), z) → (fma x, y, (fma u, v, z))
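
To make the fold concrete, here is a small standalone C++ sketch of the two forms written with std::fma. It is not backend code: the function names unfused, fused and example are made up for illustration, and it models only the arithmetic, not the DAG combiner.

```cpp
#include <cassert>
#include <cmath>

// Before the fold: (fadd (fma x, y, (fmul u, v)), z)
// Three operations (fmul, fma, fadd), each with its own rounding step.
double unfused(double x, double y, double u, double v, double z) {
    double uv = u * v;              // fmul u, v
    double t = std::fma(x, y, uv);  // fma x, y, uv
    return t + z;                   // fadd t, z
}

// After the fold: (fma x, y, (fma u, v, z))
// Two operations, both FMAs, so one fewer rounding step.
double fused(double x, double y, double u, double v, double z) {
    double inner = std::fma(u, v, z);  // fma u, v, z
    return std::fma(x, y, inner);      // fma x, y, inner
}

// Hypothetical usage: with small integer inputs every intermediate value
// is exactly representable, so both forms give 2*3 + 4*5 + 6 = 32.
void example() {
    assert(unfused(2.0, 3.0, 4.0, 5.0, 6.0) == 32.0);
    assert(fused(2.0, 3.0, 4.0, 5.0, 6.0) == 32.0);
}
```

Both forms compute x*y + u*v + z, but the fused one needs two instructions instead of three. Because it also rounds fewer times, the two results may differ in the last bits for inputs whose intermediates are not exactly representable.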

cf. commits:

Please tell me what you think.


Looks good to me! Thanks!