Handling labels, symbols, and fixups with MIPS16

Hi all!

For some time now, I’ve been working on improving LLVM’s support for MIPS16 assembly. If you aren’t familiar, MIPS16 is a separate execution mode in MIPS32 and MIPS64 CPUs that provides a 16-bit subset of instructions, similar to Arm’s original Thumb mode. Many instructions also have an “extended” 32-bit form with the same mnemonic; the main difference is the size of the immediate value that can be supplied. For example, the unconditional branch instruction “b” can be 16- or 32-bit depending on the size of the offset.

b 162 ← this is a 16-bit instruction
b 32000 ← this is a 32-bit instruction
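As a rough sketch of the “try the small one first” rule for the two encodings above: the field widths here are an assumption (an 11-bit signed offset for the 16-bit form, counted in halfwords), and `pickBranchSize` and `fitsSigned` are hypothetical helper names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// The two encodings of "b"; the enumerator values are the sizes in bytes.
enum class BranchSize { Short16 = 2, Extended32 = 4 };

// True if `value` fits in a signed field that is `bits` wide.
bool fitsSigned(int64_t value, unsigned bits) {
  assert(bits > 0 && bits < 64);
  const int64_t limit = int64_t{1} << (bits - 1);
  return value >= -limit && value < limit;
}

// Pick the smallest encoding whose offset field can hold `byteOffset`.
// Assumption: the short form stores an 11-bit signed offset in halfwords.
BranchSize pickBranchSize(int64_t byteOffset) {
  assert(byteOffset % 2 == 0 && "MIPS16 branch targets are halfword aligned");
  return fitsSigned(byteOffset / 2, 11) ? BranchSize::Short16
                                        : BranchSize::Extended32;
}
```

Under that assumption, an offset of 162 selects the 16-bit form and 32000 selects the extended one.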

For most instructions, this isn’t a problem because LLVM (or TableGen, I guess) can handle this more or less by trying the smaller one first and using the bigger one if the immediate won’t fit. I actually now have many instructions recognized and encoded as I’d expect (even the weird “save/restore” instructions), which is great! However, I’ve now come across an issue I don’t know how to handle. It occurred to me that I don’t really know what I need to do to handle labels or symbols as branch (or load/store) targets. That is, what do I do with:

b some_label ← is this 16-bit or 32-bit?

It’s not clear to me how I can deal with these, since the choice of instruction is ambiguous until the value of the label is known. From what I can tell, LLVM does try to evaluate these when possible (MCExpr::evaluateAsAbsolute()), so I guess in that case I don’t have anything extra to do. If LLVM can’t evaluate the label right there, it looks like I would add a “fixup” and handle that later with other code. I’m using the microMIPS code as a reference and this is what appears to happen in MipsMCCodeEmitter::getBranchTargetOpValue().
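To make the “evaluate now, otherwise leave a fixup” pattern concrete, here is a toy model of it. Everything in it is hypothetical: `PendingFixup`, `KnownSymbols`, and `operandValueOrFixup` are illustrative stand-ins and not LLVM’s MCFixup or MCExpr APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical "patch this spot later" record; a stand-in, not LLVM's MCFixup.
struct PendingFixup {
  uint32_t offset;     // byte offset of the instruction to patch
  std::string symbol;  // symbol whose value is still unknown
};

// Symbols whose values are already known at encode time (toy model).
typedef std::map<std::string, int64_t> KnownSymbols;

// Returns the value to encode now. If the symbol is not known yet, records a
// fixup and returns 0 as a placeholder to be patched once it resolves.
int64_t operandValueOrFixup(const KnownSymbols &known,
                            const std::string &symbol, uint32_t offset,
                            std::vector<PendingFixup> &fixups) {
  assert(!symbol.empty());
  KnownSymbols::const_iterator it = known.find(symbol);
  if (it != known.end())
    return it->second;
  PendingFixup f = {offset, symbol};
  fixups.push_back(f);
  return 0;
}
```

The point of the sketch is only that the encoder never blocks on an unknown label: it emits placeholder bits and records enough information for a later pass to fill them in.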

What could I do about the 16 vs 32-bit ambiguity? Would I use the 32-bit version by default and then somehow shrink it to 16-bit later? Can I add a “minify” step to do that? I suppose I could just decide that if a fixup is needed, then you’re getting stuck with the bigger instruction. This actually wouldn’t be a bad option if LLVM can evaluate most labels right then and there, but I don’t know when that happens.

Also, is there a good place to read up to better understand what all of the fixup types mean? The function “MipsMCCodeEmitter::getExprOpValue()” appears to handle a lot of them already, but there are MIPS16-specific fixup values that are not supported, so I’ll have to add those there and to other functions that manually add them.

Oh! If you’re curious why I’m trying to add support for a dead extension of a dead architecture (MIPS the company has even given up on MIPS the architecture!), it’s because I want to be able to use LLVM to build for the Microchip PIC32 series of MIPS-based microcontrollers.

Thanks for any help! Of course, let me know if I’m missing some documentation that clearly explains all of this! :grin:

A few different backends have branches like that. Generally, you start with the small instruction, and relax to the larger instruction.

MC has built-in support for this sort of late relaxation. The functions you want to override are MCAsmBackend::mayNeedRelaxation and MCAsmBackend::relaxInstruction. ARM and x86 have straightforward implementations.
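For intuition, the “start small, relax to large” idea can be modeled outside LLVM as a fixed-point loop. This is a toy sketch and not how MC is structured internally: `Item`, `shortFormReaches`, and `relaxAll` are hypothetical names, and the ±2 KB reach of the short branch is an assumption about the 16-bit encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: each item is a fixed-size instruction or a branch to another
// item, identified by index.
struct Item {
  unsigned size;  // current size in bytes (branches start at 2)
  int target;     // index of the branch target, or -1 if not a branch
};

// Assumed reach of the short form, relative to the following instruction.
bool shortFormReaches(int64_t byteOffset) {
  return byteOffset >= -2048 && byteOffset <= 2046;
}

// Grow out-of-range short branches to 4 bytes until nothing changes.
// Returns the number of layout passes that were needed.
unsigned relaxAll(std::vector<Item> &items) {
  unsigned passes = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    ++passes;
    // Recompute every item's address from the current sizes.
    std::vector<int64_t> addr(items.size() + 1, 0);
    for (std::size_t i = 0; i < items.size(); ++i)
      addr[i + 1] = addr[i] + items[i].size;
    for (std::size_t i = 0; i < items.size(); ++i) {
      Item &it = items[i];
      if (it.target < 0 || it.size != 2)
        continue;  // not a branch, or already relaxed
      assert(static_cast<std::size_t>(it.target) < items.size());
      const int64_t offset = addr[it.target] - addr[i + 1];
      if (!shortFormReaches(offset)) {
        it.size = 4;  // relax to the extended form
        changed = true;
      }
    }
  }
  return passes;
}
```

Because sizes only ever grow, the loop terminates. It also shows why more than one pass can be needed: relaxing one branch moves everything after it, which can push a different branch just out of range.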

Excellent, thanks for the help! I’ll have a look at those backends.