In order to handle subnormals more correctly, changes are necessary to the “denormal-fp-math” string attribute. There are at least 3 different problems I have with the current implementation.
There’s currently no documentation on the meaning of the attribute itself, only the corresponding clang flag (documented at https://clang.llvm.org/docs/UsersManual.html#cmdoption-fdenormal-fp-math). The current description is not particularly clear to me: “Select which denormal numbers the code is permitted to require.” Require in what sense? Does this mean it’s assumed a floating point instruction in a function will never see a denormal input? What are the restrictions on what happens if a denormal is used? Does it mean denormals are required to be flushed by any floating point instruction?
The claim that this “defaults to ieee” is simply untrue. If the flag is not specified, clang does not emit the corresponding attribute in the IR. The only codegen user of this attribute (https://github.com/llvm/llvm-project/blob/4531aee2ac1609e8ddf4f3deec200c5f793faa7b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L20456) assumes some type of flushing if the attribute isn’t specified, which is incorrect.
In the context of this use, I think what this attribute is intended to mean is the behavior of subnormal outputs from the regular floating point instructions in the default floating point environment. It does not necessarily mean denormal inputs are not allowed or interpreted as zero. Is this a correct interpretation?
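To make the "output only" reading concrete, here is a minimal software model of the three values the clang flag accepts (ieee, preserve-sign, positive-zero), applied to an instruction's result. The names DenormalOutputMode and applyOutputMode are made up for illustration and are not LLVM APIs; the sketch also assumes the host itself runs in the default IEEE mode, so that std::fpclassify sees subnormals.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical names, for illustration only. The three enumerators mirror the
// values accepted by -fdenormal-fp-math.
enum class DenormalOutputMode { IEEE, PreserveSign, PositiveZero };

// Software model of what a floating point instruction would write back for a
// given result under each mode. Inputs are not touched at all in this reading.
inline float applyOutputMode(float Result, DenormalOutputMode Mode) {
  if (std::fpclassify(Result) != FP_SUBNORMAL)
    return Result; // Only subnormal results are affected.
  switch (Mode) {
  case DenormalOutputMode::IEEE:
    return Result; // Keep the subnormal value.
  case DenormalOutputMode::PreserveSign:
    return std::copysign(0.0f, Result); // Flush to a zero of the same sign.
  case DenormalOutputMode::PositiveZero:
    return 0.0f; // Flush to +0 regardless of sign.
  }
  return Result;
}

// Small self-check showing the intended behavior on a negative subnormal.
inline void outputModeExample() {
  const float Tiny = -std::numeric_limits<float>::denorm_min();
  assert(applyOutputMode(Tiny, DenormalOutputMode::IEEE) == Tiny);
  assert(std::signbit(applyOutputMode(Tiny, DenormalOutputMode::PreserveSign)));
  assert(!std::signbit(applyOutputMode(Tiny, DenormalOutputMode::PositiveZero)));
}
```

Under this interpretation a subnormal operand is still read with its real value; only the written result changes.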
The first problem here is assuming a non-IEEE target by default, which doesn’t even match the documented behavior of the flag. In order to fix this without introducing performance regressions, clang needs to start emitting the attribute for platforms where the default floating point mode is known to flush subnormal outputs. What platforms are these? It’s overly difficult to find documentation on what the default mode is on different platforms, and even more difficult to find out the finer points, such as whether the flush to zero preserves the sign. It would be helpful if interested developers could prepare to handle this switch in the default behavior.
The second problem is that this attribute is insufficient to describe the variety of subnormal behaviors. For example, on X86 and AMDGPU the floating point control registers provide separate controls for instructions flushing their outputs and for treating input denormals as 0. ICC, for instance, provides a separate flag for each of these cases: https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-setting-the-ftz-and-daz-flags. If we’re bothering to model this correctly, we might as well try to fully model all of these.
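The difference between the two controls is observable, which is why a single attribute can't describe both. As a rough sketch, here is a software model of a multiply under independent "flush inputs" (DAZ-like) and "flush outputs" (FTZ-like) bits. FPModeModel and modelMul are hypothetical names, the model always uses a sign-preserving flush, and it assumes the host evaluates A * B in the default IEEE mode.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of two independent control bits; not a real register
// layout for any target.
struct FPModeModel {
  bool FlushInputs;  // DAZ-like: subnormal operands are read as zero.
  bool FlushOutputs; // FTZ-like: subnormal results are written as zero.
};

// Sign-preserving flush, chosen arbitrarily for this sketch.
inline float flushIfSubnormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

inline float modelMul(float A, float B, FPModeModel Mode) {
  if (Mode.FlushInputs) {
    A = flushIfSubnormal(A);
    B = flushIfSubnormal(B);
  }
  float R = A * B;
  if (Mode.FlushOutputs)
    R = flushIfSubnormal(R);
  return R;
}

// Self-check: the two bits disagree on both of these products.
inline void inputVsOutputExample() {
  const float Sub = std::numeric_limits<float>::denorm_min(); // 2^-149
  const float Big = std::ldexp(1.0f, 30);                     // 2^30
  const float MinNormal = std::numeric_limits<float>::min();  // 2^-126

  // Subnormal input, normal result: only input flushing changes the answer.
  assert(modelMul(Sub, Big, {false, true}) == std::ldexp(1.0f, -119));
  assert(modelMul(Sub, Big, {true, false}) == 0.0f);

  // Normal inputs, subnormal result: only output flushing changes the answer.
  assert(modelMul(MinNormal, 0.5f, {true, false}) == std::ldexp(1.0f, -127));
  assert(modelMul(MinNormal, 0.5f, {false, true}) == 0.0f);
}
```

A combine that is only valid when inputs are treated as zero is therefore not justified by knowing that outputs are flushed, and vice versa.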
The third problem is that AMDGPU has separate flushing controls: one for f32 instructions and another for f64 and f16 instructions. On AMDGPU there’s no performance advantage to turning off f64 denormals, and only a fairly limited benefit to turning off f16 denormals (it enables selecting a handful of instructions). However, there is a big advantage to turning them off for f32 on most subtargets. The -cl-denorms-are-zero flag gives the freedom to only flush on the desired types, so ideally these attributes would be broken down per type.
My end goal is to be able to specify denormal flushing per type in a way that can map into the default initialization fields for the AMDGPU FP mode register, and that will also be usable for selection and DAG combines. Currently we use subtarget features for this, which is a bit hacky; I’m trying to replace them with some form of attribute (with defaults determined by the calling convention).
With something that looks like the current attribute, we would potentially need (flush input, flush output) * (f16, f32, f64) = 6 attributes to cover the basic types. If you include the more exotic FP types, this would come to 12+ attributes, which is a bit ridiculous.
What would be the preferred form of replacement attributes? I think one attribute per-type that looks something like a bitfield that describes both the input and output denormal behavior would be most preferable. As for bikeshedding issues, what are these attributes called? Should these use the IR names for the FP types, or the MVT names (i.e. -float vs. -f32)? Are these still string attributes, or should this be promoted to a real attribute? I do think the naming scheme should move towards IEEE’s current preferred terminology of subnormal over the commonly used denormal.
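As a strawman for the per-type bitfield idea, something along these lines is what I have in mind. Everything here is hypothetical: the flag names, the struct, and the printed spellings are placeholders for discussion, not existing LLVM attributes or a proposed final syntax.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical encoding: one small bitfield per FP type, covering both the
// input and the output behavior. 0 means full IEEE subnormal support.
enum SubnormalFlags : std::uint8_t {
  SubnormalIEEE = 0,
  FlushInput = 1u << 0,  // Subnormal operands are treated as zero.
  FlushOutput = 1u << 1, // Subnormal results are flushed to zero.
};

// One field per basic type: 3 attributes instead of 2 * 3 = 6.
struct SubnormalMode {
  std::uint8_t F16;
  std::uint8_t F32;
  std::uint8_t F64;
};

// Placeholder textual form, just to show the bitfield is easy to print.
inline std::string describe(std::uint8_t Flags) {
  if (Flags == SubnormalIEEE)
    return "ieee";
  std::string S;
  if (Flags & FlushInput)
    S += "flush-input";
  if (Flags & FlushOutput) {
    if (!S.empty())
      S += "+";
    S += "flush-output";
  }
  return S;
}

// Self-check using the configuration that matters most on AMDGPU: flush f32
// only, keep f16 and f64 subnormals.
inline void perTypeExample() {
  const SubnormalMode M{SubnormalIEEE, FlushInput | FlushOutput, SubnormalIEEE};
  assert(describe(M.F16) == "ieee");
  assert(describe(M.F32) == "flush-input+flush-output");
  assert(describe(M.F64) == "ieee");
}
```

Whether the sign-of-zero choice (preserve-sign vs. positive-zero) needs extra bits per direction is left open in this sketch.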
I started on some of the work towards some of these fixes in https://reviews.llvm.org/D69598