I’m attempting to teach LLVM that the x87 does not actually have 32-bit or 64-bit floating-point registers, only 80-bit ones. My plan is to remove the fake 32-bit/64-bit floating-point registers and set floating-point operations to Promote when only the x87 is available. The goal is to fix issue #44218, “Wrong optimization: instability of x87 floating-point results leads to nonsense” (llvm/llvm-project on GitHub).
However, it appears that Promote is not actually implemented for anything other than f16, which seems to use some hacky FP16_TO_FP and FP_TO_FP16 nodes.
Do I have to do the same thing that was done for f16 and implement FP32_TO_FP and FP_TO_FP32 (and similarly for f64), or is there a better way? I’m feeling quite lost.
Are you looking at Promote in LegalizeFloatTypes or LegalizeDAG? Passing Promote to setOperationAction goes through LegalizeDAG.cpp, not LegalizeFloatTypes.cpp.
Also be aware that Windows configures the x87 for 64-bit (double-precision) mode instead of 80-bit mode.
I was fiddling with TargetLoweringBase, calling setTypeAction(MVT::f32, TypePromoteFloat) and setting the register types to f80. This causes errors in GetPromotionOpcode() in LegalizeFloatTypes.cpp.
I am aware that Windows sets the x87 unit to 53-bit precision by default. (It still retains the full exponent range of the 80-bit format; only the significand precision is reduced.) That is irrelevant for my purposes: I’m just trying to make the backend avoid rounding when spilling, by explicitly transforming 32-bit/64-bit operations into widening loads plus 80-bit operations.
I think it’s great that you’re trying to tackle the problem with x87 math! (Though it would have been even better if we had solved this a decade or two ago, before FP with SSE2 was ubiquitous.)
I don’t know the answer to your question, sorry. But beyond that, I’m worried that addressing this at the SelectionDAG level cannot fully solve the problem. For example, at the LLVM IR level we will optimize away a store/load round-trip of a double value, under the assumption that it is a no-op. But since a ‘double’ in IR can also secretly hold an x86_fp80 value, that IR-level optimization changes program semantics, and could cause the same sorts of misoptimization issues.
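A minimal IR sketch of that hazard (the function and value names here are made up for illustration):

```llvm
define double @roundtrip(double %a, double %b) {
  ; Under x87 codegen, %t may be held in an 80-bit register
  ; with excess precision.
  %t = fadd double %a, %b
  %slot = alloca double
  ; On real x87 hardware this store/load pair would round %t
  ; to 64 bits (FSTP m64 / FLD m64)...
  store double %t, ptr %slot
  %r = load double, ptr %slot
  ; ...but IR passes (e.g. GVN/InstCombine) forward the stored
  ; value to the load and return %t directly, assuming the
  ; memory round-trip is a no-op.
  ret double %r
}
```

So even a fully correct backend fix still leaves IR-level transforms free to change the observable rounding behavior.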
Maybe it would be better to solve this issue in the frontend: have the frontend code generator (Clang, rustc, etc.) never emit IR that performs FP operations on float/double. Instead, always emit x86_fp80 operations explicitly, with fptrunc/fpext where required.
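For instance, a double addition in the source language might be lowered like this (a hedged sketch; the exact IR a frontend would choose is up to it):

```llvm
; double c = a + b;  with the x87's 80-bit arithmetic made explicit:
define double @add(double %a, double %b) {
  %a80 = fpext double %a to x86_fp80
  %b80 = fpext double %b to x86_fp80
  ; The actual computation happens at full 80-bit precision.
  %sum = fadd x86_fp80 %a80, %b80
  ; The rounding back to 64 bits is now visible in the IR, so
  ; optimizers cannot silently change where it happens.
  %c = fptrunc x86_fp80 %sum to double
  ret double %c
}
```

With this scheme, fptrunc/fpext carry the rounding semantics explicitly, so IR passes no longer need to guess whether a double value secretly has excess precision.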