We observe that in some situations LLVM generates a zero-extending subword load followed by a sign-extension, where a sign-extending load would be the obvious choice.
Here is a short Godbolt demo on x86, and the same on RISC-V: function test1() uses a zero-extending byte load followed by a sign extension (a shift-left/shift-right pair on RISC-V), whereas we would expect it to be equivalent to function test2(), which directly uses a sign-extending byte load.
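To make the claimed equivalence concrete, here is a minimal Python model of the two instruction sequences. It is only a sketch: the 32-bit register width (`XLEN`) and all function names are assumptions chosen for illustration, not taken from the Godbolt demo; on RV64 the shift amount would be 56 rather than 24.

```python
XLEN = 32  # assumed register width for illustration; RV64 would use 64
MASK = (1 << XLEN) - 1


def zext_load_byte(b):
    """Model a zero-extending byte load: the upper XLEN-8 bits are 0."""
    return b & 0xFF


def sext_via_shifts(reg):
    """Model shift-left followed by arithmetic shift-right, both by XLEN-8."""
    shifted = (reg << (XLEN - 8)) & MASK  # logical shift left, truncated to XLEN
    # Reinterpret as a signed value so that Python's >> acts as an arithmetic shift.
    if shifted & (1 << (XLEN - 1)):
        shifted -= 1 << XLEN
    return (shifted >> (XLEN - 8)) & MASK


def sext_load_byte(b):
    """Model a sign-extending byte load: bit 7 is copied into the upper bits."""
    b &= 0xFF
    value = b - 0x100 if b & 0x80 else b
    return value & MASK
```

For every byte value, `sext_via_shifts(zext_load_byte(b))` yields the same register contents as `sext_load_byte(b)`, which is why the single sign-extending load is the expected output.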
The LLVM IR of both functions is equivalent, except that test1() has a freeze on the result of the load i8 operation (the freeze originates from InstCombine in this fragment; function test2() follows a different optimization path where it does not receive a freeze).
The freeze operation seems to prevent the selection of a sign-extending load, but is this expected? The freeze results in a “COPY”, but this is killed (I assume because it is folded together with the load).
Freeze followed by sign-extend guarantees all the extended bits are equal. A sign-extending load followed by a freeze does not guarantee that. We currently don’t have any target-independent way to represent a “freeze and sign-extend” load, so we don’t optimize. If it’s a common pattern, it might be worth adding something.
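The difference between the two orderings can be illustrated with a small Python model. This is a sketch under stated assumptions, not LLVM's implementation: `POISON` is a hypothetical stand-in for an IR poison value, and the `arbitrary` parameter models the unspecified bit pattern that freeze may pick.

```python
POISON = None  # hypothetical stand-in for an IR poison value


def freeze(value, width, arbitrary):
    """Return value unchanged, or an arbitrary width-bit pattern if it is poison."""
    if value is POISON:
        return arbitrary & ((1 << width) - 1)
    return value


def sext(value, src, dst):
    """Sign-extend from src bits to dst bits; poison propagates through."""
    if value is POISON:
        return POISON
    value &= (1 << src) - 1
    if value & (1 << (src - 1)):
        value -= 1 << src
    return value & ((1 << dst) - 1)


def is_sign_extended(value, src, dst):
    """True if bits src-1 through dst-1 of value are all equal."""
    top = value >> (src - 1)
    return top == 0 or top == (1 << (dst - src + 1)) - 1


# freeze first, then extend: the extended bits always match bit 7
a = sext(freeze(POISON, 8, 0x12345678), 8, 32)
# extend first, then freeze: all 32 bits are arbitrary
b = freeze(sext(POISON, 8, 32), 32, 0x12345678)
```

In this model `a` always satisfies `is_sign_extended(a, 8, 32)` whatever pattern freeze picks, while `b` does not for a pattern such as `0x12345678`. For non-poison inputs the two orderings agree.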
Yes. But note that the InstCombine transformation of the select does not require that all the extended bits are equal. The freeze starts out on the condition of the select, and is then propagated upward. When the freeze is propagated from the sign-extend operation to its argument, this additional requirement is imposed, so another option would be to not always propagate the freeze over an extension operation.
This could indeed help to make more freeze operations zero-cost. I assume on almost any architecture the “freeze and sign-extend” load will just map onto the regular sign-extending load?
“A sign-extending load followed by a freeze does not guarantee that.”
That might be true in IR, but we’re talking about instruction selection here. By the time you get to a MachineInstr representing a RISC-V sign-extending load, surely that should have the same behavior as the hardware instruction, which will always have all the extended bits equal?
I have seen similar problems in the AMDGPU backend. It seems to me that freeze in IR is well defined, but ISD::FREEZE in SelectionDAG is not (after all, SelectionDAGBuilder maps both UndefValue and PoisonValue to ISD::UNDEF), and it just tends to get in the way.
Yes, it should be fine to lower the sequence to a sign-extending load instruction. We’d only need “freeze and sign-extend” in SelectionDAG.
The most recent discussion of this I can find is “funnel shift, select, and poison - #26 by nlopes” … not sure we reached a formal conclusion, but basically, poison exists in SelectionDAG, but not later.