Adding FP environment register modeling for constrained FP nodes

Hi Hal,

Thanks for the guidance. I hope you don’t mind that I’m adding LLVMDev to this e-mail thread, as it seems as though it may be of general interest.

I agree that avoiding duplication of the FP opcodes should be our goal. I just wasn’t sure that was entirely possible. I’ll try adding implicit defs in the way you’ve suggested, but I’m concerned that there may be code that relies on the TII for that kind of thing. For instance, InstrEmitter::EmitMachineNode() does this:

bool HasPhysRegOuts = NumResults > NumDefs && II.getImplicitDefs()!=nullptr;

where “NumDefs” comes from TII and “NumResults” comes from the node. Obviously we can fix that up as needed, but it seems like a weak point in the design. Perhaps it is still better than trying to maintain a duplicate set of opcodes though.
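To make the concern concrete, here is a toy model of that check in plain C++. It does not use LLVM's classes: `ToyInstrDesc`, `ToyNode`, and both functions are hypothetical stand-ins, and the relaxed variant is only one possible direction for a fix, not something InstrEmitter does today.

```cpp
#include <cassert>

// Hypothetical stand-in for the static, per-opcode description from TII.
struct ToyInstrDesc {
  unsigned NumDefs;            // explicit defs declared for the opcode
  bool HasStaticImplicitDefs;  // plays the role of getImplicitDefs() != nullptr
};

// Hypothetical stand-in for a selected node.
struct ToyNode {
  unsigned NumResults;          // results the node actually produces
  unsigned NumDynamicImplDefs;  // implicit defs attached to this node only
};

// Mirrors the shape of the quoted check: only the static table is consulted.
bool hasPhysRegOutsStatic(const ToyInstrDesc &II, const ToyNode &N) {
  return N.NumResults > II.NumDefs && II.HasStaticImplicitDefs;
}

// One possible relaxation (an assumption, not existing LLVM behavior):
// also accept implicit defs that were added to the node dynamically.
bool hasPhysRegOutsRelaxed(const ToyInstrDesc &II, const ToyNode &N) {
  assert(N.NumResults >= N.NumDynamicImplDefs &&
         "more dynamic implicit defs than results");
  return N.NumResults > II.NumDefs &&
         (II.HasStaticImplicitDefs || N.NumDynamicImplDefs > 0);
}
```

With an opcode that declares one explicit def and no static implicit defs, a node carrying one extra, dynamically added result is missed by the static form and caught by the relaxed one.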

I’m still trying to piece together how to get the set of nodes to be updated from the SelectionDAG to the InstrEmitter. I’m still learning my way around this code.

In any event, I can confirm that for X86 targets the control register uses are not currently modeled. I just committed a patch yesterday adding the MXCSR register and updating the instructions that directly read and write it (but still implicitly so). I suppose you are correct that there is no reason not to add uses of that register to the instructions that derive their rounding behavior from it; the constrained FP intrinsics will then just need to add implicit defs where needed. I’ll also need to add the x87 control register, as that isn’t modeled at all right now.

Thanks,

Andy

SGTM.

Good point. I think it is better to update (fix) code that does not handle dynamically-added implicit operands than trying to handle duplicated opcodes all over the place. FWIW, having code in InstrEmitter with this kind of assumption does not surprise me particularly (at that point in the pipeline, nothing else would have added any dynamic implicit defs yet).

I’m also happy to think about other ways to do this. We could have the instructions, by default, carry full dependencies and then relax them as desired (instead of the other way around).

We should probably also enumerate what we’re trying to do here. For example, I can’t CSE (or hoist, etc.) a FP-operation across a call boundary that might change the rounding mode (if the call might change the rounding mode and the FP instructions read it) or if the call might query the FP environment (and the FP operations are tagged as writing it).

Makes sense to me.

We should probably also enumerate what we’re trying to do here. For example, I can’t CSE (or hoist, etc.) a FP-operation across a call boundary that might change the rounding mode (if the call might change the rounding mode and the FP instructions read it) or if the call might query the FP environment (and the FP operations are tagged as writing it).

This is a very good point!

Having added implicit defs to the instructions that read and write the SSE FP control register (MXCSR), if I also add the implicit uses of that register by the FP operations (as you said PowerPC already has), then I won’t need to do anything to prevent code motion across function calls or instructions that change the rounding mode.
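As a reminder of why that ordering matters, here is a small standalone C++ sketch (not LLVM code) in which the same division gives different results depending on the rounding mode in effect. The function names are made up for illustration, and it assumes the target supports `FE_UPWARD` and `FE_DOWNWARD`. The `volatile` qualifiers are there precisely because the compiler may otherwise fold the division or move it across the `fesetround` calls.

```cpp
#include <cfenv>

// Computes 1.0 / 3.0 under the given rounding mode, then restores the
// previous mode. The volatile operands and result keep the compiler from
// constant-folding the division or moving it across the fesetround calls.
double divideUnderMode(int Mode) {
  volatile double One = 1.0, Three = 3.0;
  volatile double Result = 0.0;
  const int Saved = std::fegetround();
  std::fesetround(Mode);
  Result = One / Three;  // reads the rounding-control bits
  std::fesetround(Saved);
  return Result;
}

// True if the same division yields different values under the two modes.
// 1/3 is not exactly representable, so rounding up and rounding down
// should differ by one unit in the last place.
bool roundingModeChangesResult() {
  return divideUnderMode(FE_UPWARD) > divideUnderMode(FE_DOWNWARD);
}
```

If the division were hoisted above the first `fesetround` or sunk below the second, both calls would return the same value, which is the miscompile the implicit use of the control register is meant to rule out.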

However, I also need to prevent code motion of FP operations relative to instructions that read the FP exception status. I could accomplish this by modeling the fact that FP operations def the FP exception status, but I don’t want to inhibit code motion of FP operations relative to one another (at least not in the non-constrained case, and it’s probably not even necessary in the constrained case). I think if I model the FP instructions as having a def but not a use of the status register that would do what I need it to.

I believe we agreed at the dev meeting that we don’t intend to guarantee the order in which FP operations are executed relative to when FP exceptions occur.

The SSE and later instructions introduce a slight wrinkle in that the same register is used (implicitly) for control and status, but I don’t see a reason why we couldn’t model it as two different registers since it isn’t referenced directly anyway. Technically the FP instructions do read the exception status bits, but I think we can ignore that since they never clear bits, only set them.
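The "only set, never clear" behavior can be observed from standard C++ with `<cfenv>`. This is a minimal sketch, assuming a target that supports the `FE_DIVBYZERO` flag; the function name is hypothetical, and `volatile` is used to keep the compiler from folding or reordering the operations.

```cpp
#include <cfenv>

// Returns true if FE_DIVBYZERO, once raised by a division, is still set
// after a later exact FP operation has executed.
bool divByZeroFlagIsSticky() {
  volatile double One = 1.0, Zero = 0.0;
  volatile double Sink = 0.0;
  std::feclearexcept(FE_ALL_EXCEPT);
  Sink = One / Zero;  // raises FE_DIVBYZERO
  const bool RaisedByDivide = std::fetestexcept(FE_DIVBYZERO) != 0;
  Sink = One + One;   // exact: raises nothing and clears nothing
  const bool StillSet = std::fetestexcept(FE_DIVBYZERO) != 0;
  std::feclearexcept(FE_ALL_EXCEPT);  // leave the environment clean
  return RaisedByDivide && StillSet;
}
```

The two `fetestexcept` calls play the role of the instructions that read the status bits: they are the points relative to which the FP operations must not move.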

So maybe I don’t need the constrained FP handling to do anything at all if the default handling can do all of the following without loss of performance:

- Model defs for instructions that write the FP control bits

- Model uses of the FP control bits for all FP operations

- Model defs of the FP status bits for all FP operations

- Model uses of the FP status bits for instructions that read them

Does that sound correct?
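The four rules above can be written down as a toy dependency check. This is a sketch in plain C++, not LLVM's scheduler or MachineInstr API: every name in it is hypothetical, and it simplifies real instructions (for example, an instruction that writes MXCSR writes both halves). It also bakes in the assumption made earlier in this message that two defs of the status register with no use do not, by themselves, order FP operations; a conventional dependency analysis would treat def/def as an output dependency.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical split of the single MXCSR into two modeled registers.
enum Reg { FPControl, FPStatus };

struct ToyInstr {
  std::string Name;
  std::vector<Reg> Uses;  // implicit uses
  std::vector<Reg> Defs;  // implicit defs
};

static bool contains(const std::vector<Reg> &V, Reg R) {
  return std::find(V.begin(), V.end(), R) != V.end();
}

// True if A and B must keep their relative order.
// ASSUMPTION: only def/use and use/def pairs on the same register constrain
// order; def/def pairs are deliberately ignored (see the lead-in).
bool mustStayOrdered(const ToyInstr &A, const ToyInstr &B) {
  for (Reg R : {FPControl, FPStatus}) {
    const bool ADef = contains(A.Defs, R), AUse = contains(A.Uses, R);
    const bool BDef = contains(B.Defs, R), BUse = contains(B.Uses, R);
    if ((ADef && BUse) || (AUse && BDef))
      return true;
  }
  return false;
}

// Any FP operation: uses the control bits, defs the status bits.
ToyInstr makeFPOp(const std::string &Name) {
  return ToyInstr{Name, {FPControl}, {FPStatus}};
}

// An instruction that writes the control bits (e.g. changes rounding mode).
ToyInstr makeControlWrite(const std::string &Name) {
  return ToyInstr{Name, {}, {FPControl}};
}

// An instruction that reads the status bits (e.g. queries exception flags).
ToyInstr makeStatusRead(const std::string &Name) {
  return ToyInstr{Name, {FPStatus}, {}};
}
```

Under this model two FP operations are free to move past each other, while each is pinned relative to a rounding-mode change and relative to a read of the exception flags.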

I think the only concern that would leave for constrained FP handling is that we need to make sure that no FP instructions are speculatively executed, but I don’t think the implicit def/use modeling would help with that anyway. I’ve been operating on the assumption that it just doesn’t happen right now (which appears to be true), but I’m not sure there is anything that prevents it.

-Andy