[Target] Custom Lowering expansion of 32-bit ISD::SHL, ISD::SHR without barrel shifter

Dear All,

I am trying to custom lower 32-bit ISD::SHL and SHR in a backend for 6502 family CPUs. The particular subtarget has 16-bit registers at most, so a 32-bit result is not legal. Normally, if you mark this as “Legal” or “Expand”, then it will expand the node into a more nodes as follows in an example:

shl i32 %a , 2

=> high_sdvalue = (or (shr %b, 14), (shl %c, 2) )

=> low_sdvalue = (shl %b, 2)

where %b would be the lower 16 bits of %a, and %c is the upper 16 bits of %a.

Since this target doesn’t have a barrel shifter, it would be a huge performance hit to construct the DAG in this way, due to the large number of right shifts). Thus, I wish to mark ISD::SHL/SHR as “Custom” in setOperationAction(), but I’m unsure of how to manually split the 32-bit shiftee SDValue into two 16-bit values. The standard logic in /lib/CodeGen/SelectionDAG/LegalizeTypes.cpp and LegalizeIntegerTypes.cpp has a member function called GetExpandedInteger(). Unfortunately, this is a private method, so I can’t gain access to this without hacking on code outside of my target’s (/lib/Target/x65 in this case). To be 100% clear, I actually already did attempt this route, and when compiling received an error that GetExpandedInteger was a private method.

What I would ultimately like to lower this to is probably a 16-bit ISD::SHL_PARTS, which I’d then further lower to a sequence of pairs of { shift left, rotate left through carry} instructions. Example of how that would look in relation to the aforementioned example:

asl %b

rol %c

asl %b

rol %c

There doesn’t seem to be a set of standard node types analogous to ADD / ADDC / ADDE for shift operations, which is really what I’m after here.

I would very much appreciate any advice or insight you could provide on this matter.

Thanks,

~Dave Waggoner / MathOnNapkins

I had a similar problem with a backend for the 68HC12 family which also has no barrel shifter. Some 68HC12 CPUs support shift for just one of the 16-bit registers and only support rotation on the 2 8-bit subregs of that 16-bit register. That means the only practical solution for 32-bit shifts is to lower to a libcall but my situation for 16-bit shifts sounds similar to yours for 32-bit shifts.

I defined pseudo-instructions in my InstrInfo.td that implemented 8 and 16-bit SHL, SRL and SRA with one variant of each for the shift count in a register and another for an immediate shift count. Some shifts by immediate values and all shifts by non-immediate values need to be implemented using a loop so I used a custom inserter. I added a pass to expand the remaining shift pseudos into shift and rotate sequences with masking where required (e.g. replacing SRL x,7 with ROL x,2)

I've no idea if that's the best way but it seems to work for me.

Steve

I forgot to mention that I used EXTRACT_ELEMENT in my backend to get the high and low parts of an SDValue.

Hi Steve,

Thanks for confirming that EXTRACT_ELEMENT is something I can use. I had seen it in the generated DAGs but was unsure whether I was “allowed” to use it, if that’s the right word. I checked up on it more and indeed the mainstream targets like ARM use that node type in custom lowering code, so that should solve that. Perhaps in the future I might submit a patch for TargetLowering to set whether the target has a barrel shifter, because I believe that should alter the default emitted DAGs considerably. For example, I tried lowering SHL %a, 5 to a series of SHL %a, 1’s at one point and the framework somewhat hilariously (to me) recombined them. If CodeGen knew not to do that, it could save some effort for people trying to target instruction sets with this limitation. The only current mainstream target without a barrel shifter, though, is the MSP430, and it’s still a bit experimental (though a very helpful example for me at times).

Could I also ask whether you had to design your shift nodes with Glue or Chain operands? I haven’t taken extra that step yet, but I imagine it will be necessary.

Sincerely,

~Dave Waggoner / MathOnNapkins

Hi Dave,

I wasn’t sure whether I was “allowed” to use EXTRACT_ELEMENT either but, as you say, it’s used in other targets and it seems to work so I’ll stick with it.

No, I didn’t use Glue or Chain operands because I’m lowering each shift operation to a single node corresponding to a pseudo-instruction which is expanded into Machineinstrs either by the custom inserter or by a later pass. I think the MSP430 does it in a similar way, the difference being that it always expands shifts using a loop whereas I chose to leave some to be expanded later without use of a loop.

Steve