Questions about llvm.canonicalize

Yes. There is no reading of the standard under which flushing denormals is permitted.

I mean that a non-IEEE denormal-fp-math mode implies no requirement that denormals will be flushed. It indicates the target hardware may be running in a hostile operating environment that may, but is not guaranteed to, flush them.

And you ran that through IR compiled with the default ieee mode, so you shouldn’t expect consistent results. The IR’s assertion that the FP environment would support denormals was violated when you enabled FTZ/DAZ. Your IR should have used preserve-sign or dynamic instead.
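
For reference, a minimal sketch of what that looks like at the IR level (function name and constant are illustrative):

  ; Tell the optimizer this code may run with FTZ/DAZ enabled: denormal
  ; inputs/outputs may be flushed to zero with the sign preserved.
  define float @scale(float %x) #0 {
    %r = fmul float %x, 2.000000e+00
    ret float %r
  }

  ; "denormal-fp-math"="dynamic,dynamic" would instead say the mode is unknown.
  attributes #0 = { "denormal-fp-math"="preserve-sign,preserve-sign" }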

+1

Yes

We do in fact do this in AMDGPU. We have no native division support, and we have several implementations at different ULP accuracies that we swap between depending on the denormal mode and !fpmath. The whole reason I need to care about what the denormal mode is going to be is that we need to emit different code based on it. There are instructions that ignore the denormal mode, instructions that conditionally respect it, and cases that require emitting a different sequence of instructions entirely.
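
To make that concrete, here is a rough sketch of the two inputs the lowering looks at (names and the 2.5-ULP value are purely illustrative):

  define float @f(float %a, float %b) #0 {
    ; Relaxed accuracy requested via !fpmath: a cheap expansion is legal here.
    %fast = fdiv float %a, %b, !fpmath !0
    ; No !fpmath: this one needs the correctly rounded (refinement) sequence.
    %slow = fdiv float %a, %b
    %r = fadd float %fast, %slow
    ret float %r
  }

  ; The f32 denormal mode the expansion also has to account for.
  attributes #0 = { "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
  !0 = !{float 2.500000e+00}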

That’s already there. The LangRef states (as IEEE-754 does) that fabs/fneg/copysign are defined as bitwise-only operations:

The returned value is completely identical to the input except for the sign bit; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved.
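
So, per that wording, llvm.fabs (for example) has to behave as a pure sign-bit operation; a sketch of the required equivalence (function name illustrative):

  declare float @llvm.fabs.f32(float)

  define float @abs_bits(float %x) {
    ; Must be equivalent to clearing the sign bit: no canonicalization,
    ; no quieting of signaling NaNs, and the payload is untouched, i.e.
    ;   %i = bitcast float %x to i32
    ;   %m = and i32 %i, 2147483647   ; 0x7FFFFFFF
    ;   %r = bitcast i32 %m to float
    %r = call float @llvm.fabs.f32(float %x)
    ret float %r
  }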

I thought a little bit about this case when I added the dynamic mode. In principle the attribute is only supposed to tell you about the “default floating-point environment”, which under strictfp you are not assumed to be operating in. If we wanted to do anything with the denormal mode in a strictfp function, we would need mode tracking similar to what we have for rounding modes. (This is one possible sticking point for AMDGPU fully implementing strictfp fdiv: if a user can flip the denormal mode in the middle of the function, we would need to figure out what the active mode is and restore it after the fdiv expansion.)
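
For comparison, this is the sort of explicit mode tracking the constrained intrinsics already provide for rounding modes (a sketch; nothing analogous exists for the denormal mode today):

  declare float @llvm.experimental.constrained.fdiv.f32(float, float, metadata, metadata)

  define float @strict_div(float %a, float %b) strictfp {
    ; The rounding mode and exception behavior are carried on the operation itself,
    ; so the optimizer doesn't have to assume the default FP environment.
    %d = call float @llvm.experimental.constrained.fdiv.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
    ret float %d
  }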

Then what does the standard mean when it talks about “Applying the identity laws (0 + x and 1 × x) only when they preserve numerical results and flags raised”? Apart from denormal flushing, when does applying the identity law for 1 × x not preserve numerical results?

I acknowledge that. However, using "denormal-fp-math"="dynamic,dynamic" would not have prevented the problem I ran into because we still eliminate x * 1.0 in that case.
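
Concretely, the fold in question is (function name illustrative; this is the simplification that still fires today):

  define float @id(float %x) #0 {
    ; Currently simplified to just "ret float %x", even though the dynamic
    ; denormal mode below says the environment might flush %x to zero.
    %r = fmul float %x, 1.000000e+00
    ret float %r
  }
  attributes #0 = { "denormal-fp-math"="dynamic,dynamic" }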

What I meant by saying that targets wouldn’t generate a refinement sequence by default is that the typical programming model for such devices would likely want to favor a fast implementation (as is the case with SYCL and OpenCL). I may be wrong in how broadly I’m applying this, but I believe it is at least common. I suppose when you’re executing in a mode that allows some error in the result, constant folding to the exact result is OK.

I am still concerned about using metadata to describe this mode. If the user wants the faster, approximate division implementation but the metadata gets dropped and you end up using the sequence to get the correctly rounded result, that can be a serious performance problem.

I believe this still shouldn’t be the target default; instead it is up to the OpenCL compiler to set up “fast-math flags” appropriately (possibly using “feature flags” on the (sub)target machine) to opt in to the approximation.
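
That is, something along the lines of the frontend itself emitting flags like this (a sketch; exactly which flags is up to the frontend):

  ; The OpenCL compiler opts in to an approximate expansion explicitly,
  ; instead of the target assuming it by default.
  %d = fdiv arcp float %x, %y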

I’m not sure what you mean by “this still shouldn’t be the target default”. What I am saying is that if you call clBuildProgram or clCompileProgram without the -cl-fp32-correctly-rounded-divide-sqrt option, the OpenCL specification says that single-precision division and square root aren’t required to be correctly rounded, and that correct rounding isn’t even something devices are required to support. The way the OpenCL specification is written, you opt in to correctly rounded division rather than opting in to the approximation.

Semantics should dictate the structure, not fears about missed optimizations. If the metadata is significant and dropped, that’s a regular optimization bug that doesn’t warrant a different design. It’s not really different than any of the other droppable optimization hints we have. I’ve fixed a number of places losing fpmath in the past; there are doubtless more instances but these aren’t difficult to deal with.

It’s not clear to me how the semantics dictate that the required/permitted accuracy should be represented as metadata. I guess I have a different perspective since a lot of the work I do is with a downstream implementation, but my understanding of metadata is that any optimization is allowed to ignore it (with a few exceptions), and that a transformation that doesn’t understand an instruction’s metadata should discard it. It isn’t clear to me on what basis you can say that dropping the metadata is an optimization bug.

For example, suppose you have this:

  %div = fdiv reassoc nsz float %x, %z, !fpmath !0
  %div1 = fdiv reassoc nsz float %y, %z, !fpmath !0
  %add = fadd reassoc nsz float %div, %div1
  ; where, say, !0 = !{float 2.500000e+00} (allowed error in ULPs)

That’s going to get optimized into this:

  %0 = fadd reassoc nsz float %y, %x
  %add = fdiv reassoc nsz float %0, %z

The metadata gets dropped, and it’s not clear to me that we have any way of saying that it shouldn’t be dropped. I mean, this entire transformation could be disastrous, but as a user what I probably wanted in this case was to say that the reassociation is safe enough for my purposes and any fdiv can be approximated.