I don't think our problem is in the way that we define our instructions,
nor will it be resolved by removing the Mem2Reg optimization. As Dan
says, Mem2Reg is the prerequisite for so many other optimizations that
we can't afford to lose it; in fact, removing Mem2Reg helps in some
cases, but in a few cases it even increases the code size.
I think the answer is in the scheduler. Currently the LLVM scheduler
tries to reduce the register pressure on the aggregate of operations in
one basic block and leaves the rest to the register allocator to do its
magic (at least that is how I understand it); however, for an 8-bit
device with only one register, there isn't much that the register
allocator can do, hence the increased number of spills.
What I think we should do is to add a new scheduling mode where the
scheduler tries to keep all operations on one dataflow path together;
kind of like what one would do for a stack based machine.
Now this stack-based scheduler mode is what I've been thinking of
adding, but I need more pointers on how to go about it and on which
other pieces of LLVM it would affect. Any input in this regard is
appreciated.
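To make the idea concrete, here is a minimal sketch of the ordering I have in mind, written as a toy outside of LLVM. It assumes a single accumulator W, expression trees rather than a full DAG, and made-up mnemonics (load, store, add, sub); the names Node, leaf, op and emit are hypothetical and are not LLVM types or APIs. Each operand subtree is finished completely before the next one is started, so a temporary is needed only when the right operand is itself a subexpression.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree (hypothetical, not an LLVM type): a leaf names a
// memory variable, an inner node is a binary operation.
struct Node {
  std::string name;               // variable name or operation mnemonic
  std::unique_ptr<Node> lhs, rhs; // both null for a leaf
  bool isLeaf() const { return !lhs && !rhs; }
};

std::unique_ptr<Node> leaf(std::string name) {
  auto n = std::make_unique<Node>();
  n->name = std::move(name);
  return n;
}

std::unique_ptr<Node> op(std::string name, std::unique_ptr<Node> l,
                         std::unique_ptr<Node> r) {
  assert(l && r && "a binary operation needs two operands");
  auto n = std::make_unique<Node>();
  n->name = std::move(name);
  n->lhs = std::move(l);
  n->rhs = std::move(r);
  return n;
}

// Emits code that leaves the value of `n` in the single accumulator W.
// One dataflow path is completed before another is started.
void emit(const Node &n, std::vector<std::string> &out, int &nextTemp) {
  if (n.isLeaf()) {
    out.push_back("load " + n.name);
    return;
  }
  if (n.rhs->isLeaf()) {
    // The right operand already lives in memory: compute the left side
    // into W and use the register+memory form directly, with no spill.
    emit(*n.lhs, out, nextTemp);
    out.push_back(n.name + " " + n.rhs->name);
    return;
  }
  // The right operand is a subexpression: finish it first, park the
  // result in a temporary, then compute the left side into W.
  emit(*n.rhs, out, nextTemp);
  std::string tmp = "t" + std::to_string(nextTemp++);
  out.push_back("store " + tmp);
  emit(*n.lhs, out, nextTemp);
  out.push_back(n.name + " " + tmp);
}
```

For (a + b) - (c + d), built as op("sub", op("add", leaf("a"), leaf("b")), op("add", leaf("c"), leaf("d"))), this emits load c, add d, store t0, load a, add b, sub t0, which is one temporary and no reloads. A real scheduler mode would have to handle shared nodes in the DAG, chains and non-commutative operand order, none of which this toy attempts.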
Dan Gohman wrote:
Have you ever investigated the following approach? Define fake
register+register forms of common instructions, in addition to the
register+memory forms. Let the instruction selector work as if
everything were in registers. Then, since there's only one physical
register, the register allocator will have to spill, and the spills
and reloads can be folded in, eliminating the fake register+register
forms. You might need special handling for the case where both
operands are the same.
If this works well enough, it would allow your target to be less
strange from LLVM's perspective. Fewer things would need to be
Custom-expanded (e.g. ADD), and it may even allow you to actually
run more of the optimizer (since without mem2reg, much of the
optimizer is effectively disabled).
I remember that you had suggested this in one of your earlier emails as
well, which I lost, and I was desperately searching for that email. Glad
that you brought it up again.
The approach actually sounds better as it will drastically simplify
the back-end code. But I was clueless as to how to make the register
allocator fold the spills and reloads into the actual target
instructions. The only interfaces that it exposes are saveRegToStackSlot
and loadRegFromStackSlot, and we didn't even know for which instructions
these spills and reloads are happening. All these APIs get is a
Now that you have decided to get us to explore a better path, it would
be good if you could shed more light on these issues.
The main API hook here is TargetInstrInfo::foldMemoryOperandImpl, which
comes in two forms: a FrameIndex form and a generic load form.
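As a rough illustration of what the FrameIndex form is asked to do, the sketch below models the fold outside of LLVM. It is only a toy under stated assumptions: the Opcode values, the Instr struct and the foldReload function are hypothetical and do not reproduce the real foldMemoryOperandImpl signature, which works on MachineInstr operands. The idea shown is just the mapping from a fake register+register opcode to its register+memory counterpart when the register operand has been spilled to a stack slot.

```cpp
#include <cassert>

// Hypothetical opcodes: the "rr" forms are the fake register+register
// instructions, the "rm" forms are the real register+memory ones.
enum class Opcode { ADDrr, ADDrm, SUBrr, SUBrm };

// Hypothetical, heavily simplified instruction record.
struct Instr {
  Opcode opc;
  int srcVReg;    // virtual register operand, or -1 when unused
  int frameIndex; // stack slot operand, or -1 when unused
};

// Maps a fake register+register opcode to its memory form.
// Returns false when the opcode has no such form.
bool memoryForm(Opcode opc, Opcode &result) {
  switch (opc) {
  case Opcode::ADDrr: result = Opcode::ADDrm; return true;
  case Opcode::SUBrr: result = Opcode::SUBrm; return true;
  default: return false;
  }
}

// Tries to fold a reload of `spilledVReg` from `frameIndex` into `mi`.
// On success `folded` holds the register+memory instruction and the
// separate reload is no longer needed.
bool foldReload(const Instr &mi, int spilledVReg, int frameIndex,
                Instr &folded) {
  assert(frameIndex >= 0 && "need a valid stack slot");
  if (mi.srcVReg != spilledVReg)
    return false; // this instruction does not read the spilled register
  Opcode memOpc;
  if (!memoryForm(mi.opc, memOpc))
    return false; // no register+memory counterpart to fold into
  folded = Instr{memOpc, -1, frameIndex};
  return true;
}
```

In this model, an ADDrr that reads a spilled virtual register becomes an ADDrm on the corresponding stack slot, so the fake form disappears once every such operand has been folded. The case where both operands are the same register is not handled here and would need the special treatment mentioned earlier.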
To be sure, I don't know if this kind of approach will work well. But if
it does, it could help make PIC16 less different from other targets in
One more thing that I feel would greatly simplify things is to make i16
legal (as it would make the pointer legal) and from there onwards lower
the types/operations ourselves to 8-bit (as the type legalizer wouldn't
do that). By doing that we would pretty much need to duplicate the
legalizer code in our back-end, as the TypeLegalizer interfaces
currently are not exposed to TargetLowering. Or can a back-end just
create an instance of the type legalizer and use it?
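For reference, the kind of lowering the back-end would have to do itself for a legal i16 looks roughly like the sketch below for addition: add the low bytes, then add the high bytes plus the carry. This is plain C++ arithmetic meant only to show the expansion; Pair8, split16, join16 and add16 are hypothetical names and nothing here is LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// A 16-bit value held as two 8-bit halves, as an 8-bit target sees it.
struct Pair8 {
  std::uint8_t lo, hi;
};

Pair8 split16(std::uint16_t v) {
  return {static_cast<std::uint8_t>(v & 0xFF),
          static_cast<std::uint8_t>(v >> 8)};
}

std::uint16_t join16(Pair8 p) {
  return static_cast<std::uint16_t>(p.lo | (p.hi << 8));
}

// 16-bit add expressed with 8-bit pieces: an add on the low bytes
// followed by an add-with-carry on the high bytes.
Pair8 add16(Pair8 a, Pair8 b) {
  unsigned lo = unsigned(a.lo) + unsigned(b.lo);
  unsigned carry = lo >> 8; // carry out of the low byte
  assert(carry <= 1);       // two 8-bit values carry at most one bit
  unsigned hi = unsigned(a.hi) + unsigned(b.hi) + carry;
  return {static_cast<std::uint8_t>(lo), static_cast<std::uint8_t>(hi)};
}
```

Every other i16 operation (subtract, compare, shifts, loads and stores of pointers) would need a similar hand-written expansion, which is where the duplication of legalizer logic comes from.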
I don't have anything to suggest here.