Hi, I’m creating a backend for my own custom virtual processor. Now that plain-text assembly output with llc works, I’d like to get clang to output my custom executable format. How would I go about doing this? Do I need a custom MCObjectWriter, MCObjectStreamer, etc., or changes to lld? Can I reuse the existing linker instead of writing my own? Ideally, I’d like to use the custom format only for the final executable, not for intermediate object files. I don’t care about the format of those, as long as I can compile a program consisting of multiple source files and produce the custom executable in the end. I don’t need support for dynamic libraries (or static libraries, for that matter).
I really hope someone can give me some hints, thanks!
LLVM supports several different object formats: ELF, COFF, MachO, and probably some variants. If you are happy with one of those formats for your relocatable object files, then the only compiler changes would be to have your target hook up the correct object file format.
If you want a custom executable format, I can see two options. One, try to leverage an existing linker to emit your custom format. Two, use an existing linker to output a standard format, then write a conversion tool. This means an extra post-link processing step, but might be simpler in the short term.
So this first method would need changes in ./lld/, I guess? In this case, will (could) the intermediate object files still be ELF or something? (Or is there some more neutral format?) Not sure if LLD has some intermediate representation after it links the object files, such that this could be easily transformed into any executable.
Does LLD also support this? I can’t find it in ./lld/.
I see, because it wants to output something like LLVM IR I guess. In this case, would it be impossible to compile a project with multiple source (e.g. .c) files, because the linker is skipped, if I understand correctly? Also, I’d like to have absolute memory addresses (e.g. for globals) in the final executable, so I guess I’d need the linker for that, right?
If you want a custom object-file format, you will need to modify something. I’m not really familiar with the internals of LLD, so I couldn’t say whether it would be better to have the compiler emit a new format and add a whole new “flavor” to LLD, or have the compiler emit an existing format and persuade LLD to turn that into a different output format. But if you want the final executable to be in a custom format, you will need to do one of those things.
In the end, it turns out that adding support for a new binary format requires a lot of code changes. Notably, lld shares almost no code between the different binary formats, so adding a new one means re-implementing everything, including symbol resolution and relocation handling.
Hence, I opted to just use ELF (adding support for the relocations for my architecture), and then create a separate tool to convert that to the output I want (currently I integrated it with llvm-objcopy but I think I might separate it). This way, I can just use the existing ELF lld code for creating the object files and resolving symbols and (static) relocations.
This required doing the following (XXX = your target):
Set your default binary format to ELF in getDefaultFormat in llvm/lib/TargetParser/Triple.cpp
llvm/include/llvm/BinaryFormat:
Add your architecture (EM_XXX) to the Machine architectures enum in ELF.h
Add your relocation types to ELFRelocs/XXX.def (for my simple architecture I only needed R_XXX_NONE and R_XXX_32)
Add the relocation types to ELF.h using enum { #include "ELFRelocs/XXX.def" };
Add the relocation types to llvm::object::getELFRelocationTypeName in llvm/lib/Object/ELF.cpp
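The .def file is an X-macro list, so the same ELF_RELOC(name, value) lines get expanded once into enumerators and once into the name lookup. Here is a minimal single-file sketch of that pattern. The macro list XXX_ELF_RELOCS stands in for the contents of ELFRelocs/XXX.def, the numeric values 0 and 1 are assumptions, and getXXXRelocationTypeName is a hypothetical helper, not the actual LLVM function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for the contents of ELFRelocs/XXX.def. In LLVM the ELF_RELOC(...)
// lines live in the .def file and are pulled in with #include; a macro list
// is used here only so that the sketch fits in one file.
// The numeric values are assumptions for illustration.
#define XXX_ELF_RELOCS                                                         \
  ELF_RELOC(R_XXX_NONE, 0)                                                     \
  ELF_RELOC(R_XXX_32, 1)

// First expansion: each entry becomes an enumerator (the ELF.h side).
#define ELF_RELOC(name, value) name = value,
enum : uint32_t { XXX_ELF_RELOCS };
#undef ELF_RELOC

// Second expansion: each entry becomes a switch case returning its own name
// (the role getELFRelocationTypeName plays for a real target).
inline const char *getXXXRelocationTypeName(uint32_t Type) {
  switch (Type) {
#define ELF_RELOC(name, value)                                                 \
  case name:                                                                   \
    return #name;
    XXX_ELF_RELOCS
#undef ELF_RELOC
  default:
    return "Unknown";
  }
}
```

Adding a relocation later then only means adding one line to the list; both the enum and the name lookup pick it up.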
llvm/lib/Target/XXX/MCTargetDesc:
Add XXXELFObjectWriter.cpp, inheriting from MCELFObjectTargetWriter
This needs to implement getRelocType. For me it seems to be enough to just translate the generic FK_Data_4 into my ELF::R_XXX_32 relocation, as I only have one constant size and no PC-relative relocations
Make XXXAsmBackend::createObjectTargetWriter return your XXXELFObjectWriter
Also, you need to implement fixupNeedsRelaxation and applyFixup. For me, fixupNeedsRelaxation just returns false and applyFixup shouldn’t need to do anything (Value is always 0). That is because I set HasRelocationAddend to true in the constructor of my XXXELFObjectWriter, so the addend is stored in the relocation entry and doesn’t need to be written inline in the code, which is what applyFixup would otherwise do
In LLVMInitializeXXXTargetMC, call TargetRegistry::RegisterELFStreamer with a function that calls createELFStreamer
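To make the getRelocType / HasRelocationAddend part a bit more concrete, here is a rough model of the logic outside of LLVM. All types here (FixupKind, XXXReloc, RelaEntry) and the helper makeRela are hypothetical stand-ins, not the real MC classes. The sketch only shows the mapping from a generic 4-byte data fixup to R_XXX_32 and that, with explicit addends, the addend travels in the relocation record while the instruction bytes stay untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins for LLVM's generic fixup kinds and for the target's
// ELF relocation enum; the numeric values are assumptions.
enum class FixupKind { FK_Data_1, FK_Data_2, FK_Data_4 };
enum XXXReloc : uint32_t { R_XXX_NONE = 0, R_XXX_32 = 1 };

// Plays the role of XXXELFObjectWriter::getRelocType: translate a fixup kind
// into a relocation type. This toy target only knows one absolute 32-bit
// relocation and rejects everything else.
inline uint32_t getRelocType(FixupKind Kind, bool IsPCRel) {
  if (IsPCRel)
    throw std::invalid_argument("PC-relative fixups are not supported");
  switch (Kind) {
  case FixupKind::FK_Data_4:
    return R_XXX_32;
  default:
    throw std::invalid_argument("unsupported fixup kind");
  }
}

// Shape of a relocation record with an explicit addend (RELA style). Because
// the addend is stored here, nothing has to be patched into the section data
// at assembly time.
struct RelaEntry {
  uint32_t Offset; // where in the section the 32-bit field lives
  uint32_t Type;   // R_XXX_32 for this target
  int32_t Addend;  // constant added to the symbol address at link time
};

inline RelaEntry makeRela(uint32_t Offset, FixupKind Kind, int32_t Addend) {
  return RelaEntry{Offset, getRelocType(Kind, /*IsPCRel=*/false), Addend};
}
```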
lld/ELF:
Add Arch/XXX.cpp, where getRelExpr for me just returns R_ABS for my R_XXX_32, because I do not support PC-relative addressing, and relocate applies the relocation, for me checkUInt + write32le. You may want to support getImplicitAddend as well (e.g. read32le) to read an implicit addend written by XXXAsmBackend::applyFixup
Add a case for your target in elf::getTarget in Target.cpp
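For reference, this is roughly what the relocate step boils down to for an absolute 32-bit relocation, written as standalone code. The helpers write32le, read32le and checkUInt below are local stand-ins named after the LLVM/lld ones. In particular this checkUInt throws, whereas lld reports a link error, and relocateXXX32 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Local stand-in: store a 32-bit value in little-endian byte order.
inline void write32le(uint8_t *Loc, uint32_t V) {
  Loc[0] = static_cast<uint8_t>(V & 0xff);
  Loc[1] = static_cast<uint8_t>((V >> 8) & 0xff);
  Loc[2] = static_cast<uint8_t>((V >> 16) & 0xff);
  Loc[3] = static_cast<uint8_t>((V >> 24) & 0xff);
}

// Local stand-in: load a 32-bit little-endian value, e.g. an implicit addend.
inline uint32_t read32le(const uint8_t *Loc) {
  return static_cast<uint32_t>(Loc[0]) | (static_cast<uint32_t>(Loc[1]) << 8) |
         (static_cast<uint32_t>(Loc[2]) << 16) |
         (static_cast<uint32_t>(Loc[3]) << 24);
}

// Local stand-in for the range check: the value must fit in Bits unsigned
// bits. Throwing is an assumption of this sketch.
inline void checkUInt(uint64_t V, unsigned Bits) {
  if (Bits < 64 && (V >> Bits) != 0)
    throw std::out_of_range("relocation value out of range");
}

// An absolute relocation resolves to S + A: symbol address plus addend.
inline void relocateXXX32(uint8_t *Loc, uint64_t SymAddr, int64_t Addend) {
  const uint64_t Val = SymAddr + static_cast<uint64_t>(Addend);
  checkUInt(Val, 32);
  write32le(Loc, static_cast<uint32_t>(Val));
}
```

With a symbol at 0x1000 and an addend of 8, the four bytes at Loc end up as 08 10 00 00.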
And then we have the conversion tool, which I currently implemented in llvm-objcopy
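What the conversion step has to do depends entirely on the target format, but the core is usually the same: walk the PT_LOAD program headers of the linked ELF and lay the segments out at their final addresses. Below is a minimal standalone sketch for 32-bit little-endian ELF that produces a flat memory image plus base and entry address. FlatImage and flattenElf32 are hypothetical names, the "custom format" is assumed to be just that flat image, and this is not how my llvm-objcopy integration is structured. For a plain flat binary, llvm-objcopy -O binary may already be enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical output: memory contents starting at Base, plus entry point.
struct FlatImage {
  uint32_t Base = 0;
  uint32_t Entry = 0;
  std::vector<uint8_t> Bytes;
};

// Bounds-checked little-endian read of Size (<= 4) bytes at offset Off.
inline uint32_t readLE(const std::vector<uint8_t> &Buf, size_t Off,
                       size_t Size) {
  if (Off > Buf.size() || Size > Buf.size() - Off)
    throw std::runtime_error("truncated ELF file");
  uint32_t V = 0;
  for (size_t I = 0; I < Size; ++I)
    V |= static_cast<uint32_t>(Buf[Off + I]) << (8 * I);
  return V;
}

inline FlatImage flattenElf32(const std::vector<uint8_t> &Elf) {
  static const uint8_t Magic[4] = {0x7f, 'E', 'L', 'F'};
  if (Elf.size() < 52 || !std::equal(Magic, Magic + 4, Elf.begin()))
    throw std::runtime_error("not an ELF file");
  if (Elf[4] != 1 || Elf[5] != 1) // ELFCLASS32, ELFDATA2LSB
    throw std::runtime_error("expected 32-bit little-endian ELF");

  // Field offsets follow the ELF32 file header layout.
  const uint32_t Entry = readLE(Elf, 24, 4);     // e_entry
  const uint32_t PhOff = readLE(Elf, 28, 4);     // e_phoff
  const uint32_t PhEntSize = readLE(Elf, 42, 2); // e_phentsize
  const uint32_t PhNum = readLE(Elf, 44, 2);     // e_phnum

  struct Segment {
    uint32_t Offset, VAddr, FileSize, MemSize;
  };
  std::vector<Segment> Loads;
  uint64_t Lo = UINT64_MAX, Hi = 0;
  for (uint32_t I = 0; I < PhNum; ++I) {
    const size_t P =
        static_cast<size_t>(PhOff) + static_cast<size_t>(I) * PhEntSize;
    if (readLE(Elf, P, 4) != 1) // p_type != PT_LOAD
      continue;
    const Segment S{readLE(Elf, P + 4, 4), readLE(Elf, P + 8, 4),
                    readLE(Elf, P + 16, 4), readLE(Elf, P + 20, 4)};
    if (S.MemSize == 0)
      continue;
    if (S.FileSize > S.MemSize)
      throw std::runtime_error("segment file size exceeds memory size");
    Lo = std::min<uint64_t>(Lo, S.VAddr);
    Hi = std::max<uint64_t>(Hi, static_cast<uint64_t>(S.VAddr) + S.MemSize);
    Loads.push_back(S);
  }
  if (Loads.empty())
    throw std::runtime_error("no loadable segments");

  FlatImage Img;
  Img.Base = static_cast<uint32_t>(Lo);
  Img.Entry = Entry;
  // Zero-fill first so that .bss-like tails (MemSize > FileSize) and gaps
  // between segments read as zero.
  Img.Bytes.assign(static_cast<size_t>(Hi - Lo), 0);
  for (const Segment &S : Loads) {
    if (S.Offset > Elf.size() || S.FileSize > Elf.size() - S.Offset)
      throw std::runtime_error("segment data outside of file");
    const auto Src = Elf.begin() + static_cast<std::ptrdiff_t>(S.Offset);
    std::copy(Src, Src + static_cast<std::ptrdiff_t>(S.FileSize),
              Img.Bytes.begin() + static_cast<std::ptrdiff_t>(S.VAddr - Lo));
  }
  return Img;
}
```

The sketch does not handle overlapping segments or very sparse address ranges (the image is allocated for the whole span from the lowest to the highest address), so a real tool would want to emit segments individually.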
Regarding relocations, the generic fixup kind (e.g. FK_Data_4) usually originates from your XXXMCCodeEmitter, which can add fixups for MCExprs during encodeInstruction. These MCExprs originate from XXXAsmPrinter::emitInstruction.
Currently I am working on refactoring in these modules to do EXACTLY the same task. The aim is to make implementation of new targets easier and maintainable in general.