Handling register allocation on Propeller 2

Thank you for the advice. As for the last point, the P2 is neither superscalar nor out-of-order, so using the flags as a register for bitwise operations is efficient when preparing to branch on them or to conditionally execute some instruction(s). Thanks for the pointers to other backends. I’m likely going to have to take tips from AMDGPU in several places, as it and the P2 both have very large flat register files, and the P2 has GPU-like instruction skipping available as a code size optimization (unlike AMDGPU, it doesn’t have every core on one instruction stream, so it’s purely a speed/space tool).
As you mentioned, some ISA details would probably help people out here, so I’ll try and summarize:
The P2 is a 32-bit in-order-execution microcontroller architecture, with no caches (SRAM main memory) and 1-16 cores. It does not have atomics, but does have HW locks. It is a RISC/CISC hybrid (read: I dunno how to classify it, but instructions are fixed width.)
All instructions have conditional execution via a 4-bit predicate (used as a lookup table, with the C and Z flags as the index).
It is a load/store architecture: only load/store-type instructions access memory, and no other instruction takes a memory operand.
It has no FPU, and uses a CORDIC for integer multiply/divide/sqrt/etc.
Further details can be found at https://parallax.com/propeller-2/ if needed, but I’m absolutely not expecting anyone to skim through that.
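To make the predicate scheme concrete, here is a minimal sketch in C of how a 4-bit predicate can act as a lookup table over the two flags. It assumes the table index is (C << 1) | Z, which should be checked against the Parallax documentation. The enum and function names are hypothetical, made up for illustration, and any special-case encodings are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit index = (C << 1) | Z, so bit 0 covers C=0,Z=0
   and bit 3 covers C=1,Z=1. Names are illustrative, not official. */
enum {
    P2_COND_NC_AND_NZ = 0x1, /* 0b0001: true only when C=0 and Z=0 */
    P2_COND_Z         = 0xA, /* 0b1010: true whenever Z=1 */
    P2_COND_C         = 0xC, /* 0b1100: true whenever C=1 */
    P2_COND_ALWAYS    = 0xF  /* 0b1111: true for every flag state */
};

/* Returns true if an instruction carrying predicate `pred` would execute
   under the given flag values. */
static bool p2_predicate_passes(uint8_t pred, bool c, bool z) {
    assert(pred <= 0xF);
    unsigned index = ((unsigned)c << 1) | (unsigned)z;
    return ((pred >> index) & 1u) != 0;
}

/* Because a predicate is just a 4-entry truth table, combining two
   conditions is a bitwise operation on the tables themselves. */
static uint8_t p2_cond_and(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
static uint8_t p2_cond_or(uint8_t a, uint8_t b)  { return (uint8_t)(a | b); }
static uint8_t p2_cond_not(uint8_t a)            { return (uint8_t)(~a & 0xF); }
```

Under these assumptions, p2_cond_and(P2_COND_C, P2_COND_Z) gives 0x8, a table that is true only when both flags are set. A backend could fold compound conditions into a single predicate this way instead of emitting extra flag-manipulating instructions.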