This is a question about optimizing the code generation in a (new) Z80 backend:
The CPU has several 8-bit physical registers, e.g. H, L, D and E, which are overlaid by the 16-bit register pairs HL and DE.
It also has a native instruction to load a 16-bit immediate value into a 16-bit register pair (HL or DE), e.g.:
LD HL,<imm16>
When two 16-bit register pairs are loaded with the *same* immediate value in sequence, the simple approach is:
LD HL,<imm16>
LD DE,<imm16>
However, the second line can be replaced by two 8-bit register copies from the halves of HL (H and L) into the halves of DE (D and E), which is shorter in both opcode bytes and cycles. The desired result is:
; optimized version: saves 1 byte and 2 cycles
LD HL,<imm16>
LD D,H    ; sets the high 8 bits of DE from the high 8 bits of HL
LD E,L    ; same for the lower 8 bits
Another example: if register pair DE needs to be loaded with imm16 = 0, and another physical(!) register is known to be 0 (from a previous immediate load, directly or indirectly), say L = 0 (H might hold something else), the following code:
LD DE,0
could be replaced by:
LD D,L
LD E,L
I would expect that this needs to be done in a peephole optimizer pass, as during the lowering process, the physical registers are not yet assigned.
Now my questions:
1. Is that correct (peephole instead of lowering)? Should the lowering always emit the generic, not always optimal "LD DE,<imm16>"? Or should it always split the 16-bit immediate load into two 8-bit immediate loads (via two new virtual 8-bit registers), which would then be eliminated automatically later?
2. If peephole is the better choice, which of these is recommended: the SSA-based Machine Code Optimizations or the Late Machine Code Optimizations? Both sections in the LLVM code generator docs say "To be written", so I don't really know which one to choose. Or should I even write a custom pass?
...and more importantly, how would I check whether any physical register is known to contain a specific fixed value at a certain point (in which case the optimization can be done)?
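
To make the last point more concrete, here is a rough sketch of the kind of bookkeeping I have in mind. It is a standalone toy model in plain C++, not code against the real LLVM API: the names Reg8, Instr, findHolder, peephole and sizeInBytes are all made up for illustration, only a single basic block is scanned, and only three instruction kinds are modeled. The sizes used (3 bytes for LD rr,nn and 1 byte for LD r,r') are the ones behind the "saves 1 byte" figure above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; none of these names are LLVM APIs.
enum Reg8 { H, L, D, E, NumRegs };

struct Instr {
  enum Kind { LoadPairImm, CopyReg, Unknown } kind;
  Reg8 a;             // LoadPairImm: high half of the pair; CopyReg: destination
  Reg8 b;             // LoadPairImm: low half of the pair;  CopyReg: source
  std::uint16_t imm;  // only used by LoadPairImm
};

const int NotKnown = -1;  // marks a register whose content is not a known constant

// Returns a register outside {skipA, skipB} known to hold `value`, or NumRegs.
// The destination halves are skipped so the two emitted copies cannot
// read a register that the first copy has already overwritten.
static Reg8 findHolder(const int (&known)[NumRegs], int value,
                       Reg8 skipA, Reg8 skipB) {
  for (int r = 0; r < NumRegs; ++r)
    if (r != skipA && r != skipB && known[r] == value)
      return static_cast<Reg8>(r);
  return NumRegs;
}

// Forward scan over one basic block, tracking known 8-bit constants.
std::vector<Instr> peephole(const std::vector<Instr> &block) {
  int known[NumRegs] = {NotKnown, NotKnown, NotKnown, NotKnown};
  std::vector<Instr> out;
  for (const Instr &I : block) {
    switch (I.kind) {
    case Instr::LoadPairImm: {
      assert(I.a != I.b && "a pair consists of two distinct halves");
      const int hi = I.imm >> 8, lo = I.imm & 0xFF;
      const Reg8 hiSrc = findHolder(known, hi, I.a, I.b);
      const Reg8 loSrc = findHolder(known, lo, I.a, I.b);
      if (hiSrc != NumRegs && loSrc != NumRegs) {
        out.push_back({Instr::CopyReg, I.a, hiSrc, 0});  // e.g. LD D,H
        out.push_back({Instr::CopyReg, I.b, loSrc, 0});  // e.g. LD E,L
      } else {
        out.push_back(I);  // keep the generic LD rr,<imm16>
      }
      known[I.a] = hi;
      known[I.b] = lo;
      break;
    }
    case Instr::CopyReg:
      known[I.a] = known[I.b];  // the copy propagates knowledge (or the lack of it)
      out.push_back(I);
      break;
    case Instr::Unknown:
      for (int &k : known) k = NotKnown;  // conservative: forget everything
      out.push_back(I);
      break;
    }
  }
  return out;
}

// Code size of the modeled instructions; Unknown instructions are not counted.
std::size_t sizeInBytes(const std::vector<Instr> &block) {
  std::size_t n = 0;
  for (const Instr &I : block) {
    if (I.kind == Instr::LoadPairImm) n += 3;   // LD rr,nn
    else if (I.kind == Instr::CopyReg) n += 1;  // LD r,r'
  }
  return n;
}
```

Feeding it "LD HL,0x1234 / LD DE,0x1234" yields "LD HL,0x1234 / LD D,H / LD E,L", and "LD HL,0x5500 / LD DE,0" yields "LD HL,0x5500 / LD D,L / LD E,L". I assume a real pass would need the same conservative behaviour at block entry and after every instruction that may clobber a register, which is exactly the part I don't know how to express in LLVM.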