I am attempting to create a custom backend for the MOS Technology 6502 architecture. I have already created my own simulator for it. Now, I am attempting to create the backend in LLVM.
The cpu0 tutorial I am following expects some modifications to be made to ELFObjectFile.h:
A previous question already asked what needs to be done to include a custom binary format. My addition to that question is: if I want to add a custom binary format, what are the steps, and are they documented somewhere?
This isn’t the kind of everyday task that would have a step-by-step guide readily available. You’d have to follow along with what the existing 3 formats do and know how and why this new format differs.
What I hope to understand is this: the ELF format is defined by the headers at llvm-project/llvm/include/llvm/BinaryFormat/, where the various formats are encoded (ELF.h, COFF.h, GOFF.h, MachO.h, Wasm.h, etc.). So if I simply add a header file for my custom binary format, how do I integrate it?
The approach of emitting a known binary format (say ELF32) and converting that to the required format makes sense. However, if the architecture I intend to target has a smaller (say 8-bit) or bigger (say 128-bit) word size, will this approach still work?
Unfortunately, I personally cannot help you much with this. If you decide to go this way, you’ll have to implement your own symbol/relocation resolving in lld, and you should prepare for adding a lot of files and cases for your object type to random switch statements all over the place (I tried, then decided not to do this).
AFAIK ELF32/ELF64 just influence the headers, the symbol table and such, so you’d be limited to an address space of 64 bits, but who needs more RAM than that anyway. 128-bit calculations will work fine, as the code for those will be output by your MCCodeEmitter and ELF will not care about the actual code. However, you will not be able to add 128-bit relocations if you need those, unless you split them into two 64-bit relocations. Using 8-bit addresses should be fine, as those fit inside 32 bits. It just means that you’ll write only 8 bits in XXX::relocate in lld when you encounter your 8-bit relocation type. If you need 64-bit addresses, you should use ELF64, which you do by setting Is64Bit to true in the MCELFObjectTargetWriter constructor. (The endianness of your ELF headers is determined by the value for Endian in the MCAsmBackend constructor.)
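To make the relocation point concrete, here is a minimal standalone sketch of what "write only 8 bits" and "split into two 64-bit relocations" could look like. It does not use lld's actual interfaces: `RelocKind`, `relocWidth`, `applyReloc` and `apply128AsTwo64` are hypothetical names invented for illustration, and little-endian byte order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical relocation kinds; these are NOT real ELF or lld relocation types.
enum class RelocKind { Abs8, Abs16, Abs64 };

// Number of bytes patched for each hypothetical relocation kind.
inline std::size_t relocWidth(RelocKind kind) {
  switch (kind) {
  case RelocKind::Abs8:
    return 1;
  case RelocKind::Abs16:
    return 2;
  case RelocKind::Abs64:
    return 8;
  }
  return 0;
}

// Write the low `width` bytes of `value` into the section at `offset`,
// little-endian (assumed). Bytes outside the patched range are untouched.
inline void applyReloc(std::vector<std::uint8_t> &section, std::size_t offset,
                       RelocKind kind, std::uint64_t value) {
  const std::size_t width = relocWidth(kind);
  assert(offset + width <= section.size() && "relocation out of bounds");
  for (std::size_t i = 0; i < width; ++i)
    section[offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// A 128-bit value expressed as two adjacent 64-bit relocations:
// the low half first, then the high half 8 bytes later.
inline void apply128AsTwo64(std::vector<std::uint8_t> &section,
                            std::size_t offset, std::uint64_t lo,
                            std::uint64_t hi) {
  applyReloc(section, offset, RelocKind::Abs64, lo);
  applyReloc(section, offset + 8, RelocKind::Abs64, hi);
}
```

For example, `applyReloc(sec, 1, RelocKind::Abs8, 0x1234)` stores only `0x34` at offset 1 and leaves the neighbouring bytes alone, which is the behaviour described above for an 8-bit relocation inside a 32-bit container format.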
The answer to this is definitely yes; I maintain just such a backend out of tree.
It’s fairly feature-complete, such as it is.
It’s not necessary, but it is largely beneficial. Having a “usual” object file format buys a lot of functionality in command line tools like llvm-nm, objdump, readelf, DWARF, the linker, etc.
Although it predates me, llvm-mos has a pretty fully specified ELF variant for the 6502. It more-or-less treats the 6502 as a 32-bit target, which allows using the high bits of the address for things like bank numbers.
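As a rough illustration of the "high bits for bank numbers" idea, the sketch below packs a bank number and a 16-bit CPU address into one 32-bit value. The exact layout is an assumption made for this example (bits 0-15 for the CPU address, bits 16-23 for the bank), and `makeAddr`, `cpuAddrOf` and `bankOf` are hypothetical helpers; consult the llvm-mos ELF specification for the layout it actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for illustration only:
//   bits 0-15  : address as seen by the 6502
//   bits 16-23 : bank number
//   bits 24-31 : unused here
inline std::uint32_t makeAddr(std::uint32_t bank, std::uint32_t cpuAddr) {
  assert(bank <= 0xFF && "bank must fit in 8 bits in this sketch");
  assert(cpuAddr <= 0xFFFF && "6502 addresses are 16 bits wide");
  return (bank << 16) | cpuAddr;
}

// Recover the 16-bit address the CPU would actually place on the bus.
inline std::uint16_t cpuAddrOf(std::uint32_t addr) {
  return static_cast<std::uint16_t>(addr & 0xFFFF);
}

// Recover the bank number stored in the high bits.
inline std::uint8_t bankOf(std::uint32_t addr) {
  return static_cast<std::uint8_t>((addr >> 16) & 0xFF);
}
```

With this layout, `makeAddr(3, 0x8000)` yields `0x00038000`: tools that only understand 32-bit ELF addresses see one flat value, while target-specific code can split it back into bank 3 and CPU address `0x8000`.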
@mysterymath, just a day before you posted your answer, I accidentally stumbled upon your repo. I want to thank you and all the collaborators on the llvm-mos project for having created it. I will still try to create my own version, since I want to learn how to; I will use llvm-mos as the reference. I will mark this as the answer, as it directly answers my question.