Backend development for an unusual target

Hi all (and apologies if this is in the wrong group),

I’m working with a new and somewhat unusual target: it has very few general-purpose registers (three), nearly unlimited memory (but each memory address can only be written once), and instead of operating on bits it operates on a large prime field. Without going into details, it can still do things such as work on bits, with help from an “advisor,” but that is less efficient. I’d like to be able to compile a higher-level language to this ISA so that I and others can more easily write software targeting it. As I don’t have much experience with LLVM, I am not sure how difficult writing a backend for this ISA would be. As an alternative, I have also considered having the higher-level language compile to a RISC architecture (such as ARMv7) and then writing a compiler of my own (along with a runtime) to translate the ARM code into this ISA. Does this sound like a more straightforward task than writing a new LLVM backend?

Thanks, and let me know if any additional information would help.

3 GPRs is on the small side, but probably enough that you wouldn’t need any exotic register allocation strategy.

“each memory address can only be written once” sounds problematic; I’m not sure how you translate regular code to such a target, unless I’m misunderstanding how you expect it to work. LLVM code generation can’t work without a mutable stack.

LLVM backends are pretty complicated. If translating ELF binaries is sufficient for your work, I’d recommend that route; it’s probably easier, and the amount of work required is a lot more predictable. If you’re not sure which target to translate, I’d probably recommend RISC-V. (ARMv6M is probably not that much harder, but handling the flags register might be a bit tricky. ARMv7 has a much larger instruction set.) It depends on your goals, of course; if you want to learn more about LLVM, writing a backend is a good way to do that.
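Roughly speaking, the translation route boils down to a decode-and-emit loop over the guest instruction stream. A minimal sketch for RV32I (only the major opcode groups are shown, and the `emit_*` functions are placeholders for whatever your write-once target needs, not a real API):

```rust
/// Minimal sketch of an RV32I decode-and-emit loop for a standalone binary
/// translator. Only a few opcode groups are shown; the emit_* functions are
/// placeholders, not a real code-generation API.
fn translate_word(insn: u32) {
    let opcode = insn & 0x7f;      // bits [6:0]: major opcode
    let rd = (insn >> 7) & 0x1f;   // bits [11:7]: destination register
    let rs1 = (insn >> 15) & 0x1f; // bits [19:15]: first source register
    match opcode {
        0x13 => emit_op_imm(rd, rs1, insn), // OP-IMM: addi, slti, andi, ...
        0x03 => emit_load(rd, rs1, insn),   // LOAD: lb, lh, lw, ...
        0x23 => emit_store(rs1, insn),      // STORE: sb, sh, sw, ...
        _ => emit_other(insn),              // remaining RV32I opcode groups
    }
}

// Placeholders: a real translator would build target instructions here.
fn emit_op_imm(_rd: u32, _rs1: u32, _insn: u32) {}
fn emit_load(_rd: u32, _rs1: u32, _insn: u32) {}
fn emit_store(_rs1: u32, _insn: u32) {}
fn emit_other(_insn: u32) {}
```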

Thanks so much for the reply! This is very helpful.

Regarding the stack, we do have a workaround for this. The architecture does have a stack pointer register, and the stack itself is implemented as a linked list: when we need to push an item, we take the next unwritten memory cells, write (stack pointer, new item) into them, and then update the stack pointer to point at the new item. Is this (effectively) mutable stack sufficient for LLVM, or will it have problems with only being able to write to a memory address once? We also have a workaround that simulates full read-write memory, but it is relatively inefficient.
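To make the push operation concrete, here is a rough sketch of the scheme (the cell layout, names, and word size are illustrative only, not the actual ISA):

```rust
/// Rough sketch of the write-once "stack as linked list" described above.
/// Each stack node occupies two fresh cells: (previous stack pointer, item).
struct WriteOnceMemory {
    cells: Vec<Option<u64>>, // each cell may be written at most once
    next_free: usize,        // first address that has never been written
}

impl WriteOnceMemory {
    fn write(&mut self, addr: usize, value: u64) {
        assert!(self.cells[addr].is_none(), "write-once violation");
        self.cells[addr] = Some(value);
    }
}

struct Machine {
    mem: WriteOnceMemory,
    sp: u64, // stack pointer register: address of the current top node
}

impl Machine {
    /// push(x): claim two fresh cells, write (old stack pointer, x) into
    /// them, then point the stack pointer at the new node. No cell is ever
    /// overwritten, yet the stack behaves as if it were mutable.
    fn push(&mut self, value: u64) {
        let node = self.mem.next_free;
        self.mem.next_free += 2;
        self.mem.write(node, self.sp);   // link back to the previous top
        self.mem.write(node + 1, value); // the pushed item
        self.sp = node as u64;
    }
}
```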

> If you’re not sure which target to translate, I’d probably recommend RISC-V. (ARMv6M is probably not that much harder, but handling the flags register might be a bit tricky. ARMv7 has a much larger instruction set.) It depends on your goals, of course; if you want to learn more about LLVM, writing a backend is a good way to do that.

Thank you for this suggestion–again, very helpful.

Have a great weekend

The two biggest issues I can think of related to the stack:

  1. Local variables (“alloca” in LLVM IR). Code in most languages that target LLVM IR will naturally have these. SROA will eliminate some of them, but if a variable’s address is taken, it can’t be eliminated in general; optimizations and SelectionDAG lowering can also generate allocas in some obscure cases. (See the sketch after this list.)

  2. Register allocation spill slots. The allocator normally assumes spill slots can be written multiple times and from multiple places, and lifting that assumption probably requires big changes to the way register allocation works. You might need to write your own register allocator.
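To make issue 1 concrete, here is a tiny illustrative Rust example (just a sketch; `std::hint::black_box` stands in for any use the optimizer has to treat as opaque):

```rust
// A local whose address escapes. Because the pointer is handed to something
// the optimizer must treat as opaque, SROA cannot scalarize the variable away,
// so the backend has to materialize a real, writable stack slot for it.
pub fn address_taken_local() -> u32 {
    let mut x = 0u32;             // lowered to an `alloca` in LLVM IR
    std::hint::black_box(&mut x); // the address is taken and escapes
    x                             // x may have been rewritten through the pointer
}
```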

Thanks for the help again. I’ve been moving forward with the idea of RISC-V binary translation and have another question: due to the peculiarities of the new target, it would be really helpful if memory accesses were always aligned on 4 bytes. I noticed that LLVM supports a data layout string, which I can easily pass to the Rust compiler (the frontend that I am mostly using). My question is: is it really that simple? Can I just tell LLVM to, for instance, align i8 on 4 bytes? Or is that going to interfere with the backend expecting something else?

Thanks again!

It’s pretty strongly baked into LLVM IR optimizations that it’s possible to load and store i8 values (or, at the C level, that CHAR_BIT == 8). Messing with the datalayout won’t give you the result you want.

That said, some small targeted compiler changes might let you reduce the number of i8 operations generated in specific cases.