Hi Andrew,
sorry for the delay, I only now got some time to look into this a bit more. But I still have a number of questions about how to actually implement this in the back end. Looking at this bottom-up, starting with the behavior of the actual machine instructions, we have (at least on SystemZ) the following things to consider:
A) Rounding mode
Most FP arithmetic instructions use the “current rounding mode” as indicated in the floating-point control register. This is currently assumed to never change. To fix this, we need to avoid scheduling FP arithmetic instructions across instructions that modify the rounding mode. This may also imply avoiding scheduling instructions across function calls, since those may also modify the rounding mode. This can probably be done by modeling the floating-point control register as an LLVM register (or maybe modeling just the rounding mode bits as its own “register”), having all FP arithmetic instructions in question take this new register as implicit input, and having the register be clobbered by the instructions that change the rounding mode (and also by function calls).
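To make the scheduling hazard concrete, here is a minimal source-level sketch in standard C++ (using <cfenv>). The helper name divide_rounded is made up for illustration, and the volatile qualifiers are only a portable stand-in for the ordering that the back end would have to guarantee itself; this is not how the back end would implement it.

```cpp
#include <cfenv>

// Hypothetical helper: divide a by b under the given rounding mode, then
// restore the previous mode. The division must stay between the two
// fesetround calls. The volatile accesses force that here; modeling the
// rounding mode as an implicit register input would express the same
// constraint inside the compiler.
double divide_rounded(double a, double b, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double x = a;
    volatile double y = b;
    volatile double q = x / y;  // uses the mode that was just installed
    std::fesetround(saved);
    return q;
}
```

Calling divide_rounded(1.0, 3.0, FE_UPWARD) and divide_rounded(1.0, 3.0, FE_DOWNWARD) gives two different results. If the division were moved across either fesetround call, both calls would return the same value.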
B) Floating-point status flags
FP instructions set a flag bit in the floating-point status register whenever an IEEE exception condition is recognized. If these flag bits are later tested by application code, we should ensure their value is unchanged by compiler optimization. Naively modeling the status register is probably overkill here: since every FP instruction would need to be considered to modify (i.e. use and def) that register, this simply has the effect of creating a dependency chain across all FP instructions and makes any kind of instruction scheduling impossible. But this isn’t really necessary since the flag bits actually simply accumulate. So it would suffice to have special dependencies from each FP instruction separately directly to the next instruction (or routine) that reads the status flags. However, I don’t really see any easy way to model this type of dependency in the back-end (in particular on the MI level).
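The accumulating nature of the flags can be seen from user code. The following is only a sketch in standard C++; flags_after is a hypothetical helper name, and volatile is used just to keep the compiler from folding the operations away.

```cpp
#include <cfenv>

// Hypothetical helper: run two independent FP operations in either order
// and return the exception flags that have accumulated afterwards.
int flags_after(bool divide_by_zero_first) {
    volatile double one = 1.0, three = 3.0, zero = 0.0;
    volatile double sink;
    std::feclearexcept(FE_ALL_EXCEPT);
    if (divide_by_zero_first) {
        sink = one / zero;   // raises FE_DIVBYZERO
        sink = one / three;  // raises FE_INEXACT
    } else {
        sink = one / three;  // raises FE_INEXACT
        sink = one / zero;   // raises FE_DIVBYZERO
    }
    (void)sink;
    // The flags are sticky, so the reader sees the union of both.
    return std::fetestexcept(FE_DIVBYZERO | FE_INEXACT);
}
```

Both orders yield the same flag set. So the two divisions need no ordering relative to each other, only relative to the feclearexcept and fetestexcept calls around them.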
C) Floating-point exceptions
If a mask bit in the floating-point status register is set, then all FP instructions will trap whenever an IEEE exception condition is recognized. This means that we need to treat those instructions as having unmodeled side effects, so that they cannot be speculatively executed. Also, we cannot schedule FP instructions across instructions that set (those bits in) the FP status register, but the latter is probably automatically done as long as those latter instructions are described as having unmodeled side effects. Note that this will in effect again create a dependency chain across all FP instructions, so that B) should be implicitly covered as well here.
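As an illustration of why speculation is a problem, the sketch below contrasts a guarded division with the form an if-conversion might produce. All function names are made up for this example. It observes the divide-by-zero flag rather than an actual trap, so it stays within standard C++; with the corresponding mask bit set, the speculated form would trap instead.

```cpp
#include <cfenv>

// Guarded form: the division only executes when the divisor is non-zero.
double guarded_div(double a, double b) {
    volatile double x = a, y = b;
    if (y != 0.0)
        return x / y;
    return 0.0;
}

// Speculated form: divide unconditionally, then select the result.
// The returned value is identical, but the division always executes.
double speculated_div(double a, double b) {
    volatile double x = a, y = b;
    volatile double q = x / y;  // executes even when y == 0
    return (y != 0.0) ? q : 0.0;
}

// Hypothetical probe: does calling f(1.0, 0.0) raise FE_DIVBYZERO?
bool raises_divbyzero(double (*f)(double, double)) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double r = f(1.0, 0.0);
    (void)r;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Both functions return 0.0 for a zero divisor, but only the speculated one raises the exception condition.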
Did I miss anything here? I’m assuming that the behavior of FP instructions on Intel (and other architectures) will be roughly similar, given that this behavior is mostly defined by the IEEE standard.
Now the question in my mind is, how does this all map onto the experimental constrained intrinsics? They do have “rounding mode” and “exception behavior” metadata, but I don’t really see how that maps onto the behavior of instructions as described above. Also, right now the back-end doesn’t even get at that data in the first place, since it is just thrown away when lowering the intrinsics to STRICT_… nodes. In fact, I’m also not sure how the front-end is even supposed to be setting those metadata flags – is the compiler supposed to track calls to fesetround and the like, and thereby determine which rounding and exception modes apply to any given block of code? In fact, was the original intention even that the back-end actually implements different behavior based on this level of detail, or was the back-end supposed to support only two modes, the default behavior of today and a fully strict implementation always satisfying all three of A), B), and C) above?
Looking again at a possible implementation in the back-end, I’m now wondering if it wouldn’t after all be better to just treat the STRICT_ opcodes like all other DAG nodes. That is, have them be associated with an action (Legal, Expand, or Custom); set the default action to Expand, with a default expander that just replaces them by the “normal” FP nodes; and allow a back-end to set the action to Legal and/or Custom and then just handle them in the back-end as it sees fit. This might indeed require multiple patterns to match them, but it should be possible to generate those via multiclass instantiations so it might not be all that big a deal. The benefit would be that it allows the back-end the greatest freedom in how to handle things (e.g. interactions with target-specific control registers).
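To sketch what I mean by the action scheme, here is a small self-contained toy model in C++. It does not use any LLVM classes: Opcode, Action, ToyTargetLowering, relaxed and legalize are all made-up names for illustration, and only two strict opcodes are modeled.

```cpp
#include <map>

// Toy stand-ins for DAG opcodes and legalization actions (not LLVM types).
enum class Opcode { FADD, FSUB, STRICT_FADD, STRICT_FSUB };
enum class Action { Legal, Expand, Custom };

struct ToyTargetLowering {
    std::map<Opcode, Action> actions;

    static bool isStrict(Opcode op) {
        return op == Opcode::STRICT_FADD || op == Opcode::STRICT_FSUB;
    }

    // A back end opts in per opcode.
    void setAction(Opcode op, Action a) { actions[op] = a; }

    // Default: strict opcodes are Expand, everything else is Legal.
    Action getAction(Opcode op) const {
        auto it = actions.find(op);
        if (it != actions.end())
            return it->second;
        return isStrict(op) ? Action::Expand : Action::Legal;
    }
};

// Default expander: replace a strict node by the "normal" FP node.
Opcode relaxed(Opcode op) {
    switch (op) {
    case Opcode::STRICT_FADD: return Opcode::FADD;
    case Opcode::STRICT_FSUB: return Opcode::FSUB;
    default:                  return op;
    }
}

// Legal and Custom nodes are left alone for the target to handle.
Opcode legalize(const ToyTargetLowering &tli, Opcode op) {
    if (tli.getAction(op) == Action::Expand)
        return relaxed(op);
    return op;
}
```

A target that never calls setAction gets today's behavior (STRICT_FADD becomes FADD), while a target that marks STRICT_FADD as Legal keeps the strict node and can match it with its own patterns.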
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
