(Note that LLVM has an “integrated assembler” for some targets, so in some cases no assembly code is generated (in the textual sense, at least); it goes straight to object code. You can look at the MC*Streamer classes (MCAsmStreamer is for textual assembly, MCObjectStreamer is for the integrated straight-to-object-code assembler) to see how the current COFF/MachO/ELF formats are handled, and you could add your own format there.)
If you’re interested in ideas: I’d certainly love to see a format that does a better job with symbol names (symbol names for large, template-heavy code can make up 10-20% of object size). Keeping the symbols compressed, and maybe using uniform-sized symbol hashes for linker symbol resolution, could save some time in the linker and reduce object size (the linker would then only decompress the mangled names for error messages, or for linker-exported symbol names).
You could also build the format with something more like MachO’s “subsections via symbols” to save space compared to ELF’s per-function section overhead (if you’re using -ffunction-sections/-gc-sections), though that probably also needs “alt entry” for cases where you do want a symbol not to break up a contiguous region.
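To make the hashed-symbol idea a bit more concrete, here is a rough Python sketch. It is not based on any existing format: the function names (`build_symbol_table`, `resolve`, `name_at`), the 8-byte BLAKE2b digest, and the zlib-compressed NUL-separated name blob are all assumptions chosen for illustration.

```python
import hashlib
import zlib


def _digest(name, digest_size):
    # Fixed-size hash of the mangled name; the size is an arbitrary choice here.
    return hashlib.blake2b(name.encode(), digest_size=digest_size).digest()


def build_symbol_table(mangled_names, digest_size=8):
    """Return (hash_index, compressed_names) for a list of mangled names.

    hash_index maps a uniform-sized digest to the symbol's position.
    compressed_names is the zlib-compressed, NUL-separated name blob.
    """
    hash_index = {}
    for position, name in enumerate(mangled_names):
        digest = _digest(name, digest_size)
        if digest in hash_index:
            raise ValueError(f"hash collision or duplicate symbol: {name}")
        hash_index[digest] = position
    blob = b"\0".join(name.encode() for name in mangled_names)
    return hash_index, zlib.compress(blob)


def resolve(hash_index, name, digest_size=8):
    """Symbol resolution touches only the fixed-size hashes, never the names."""
    return hash_index.get(_digest(name, digest_size))


def name_at(compressed_names, position):
    """Decompress the names only when one is needed, e.g. for an error message."""
    return zlib.decompress(compressed_names).split(b"\0")[position].decode()
```

A real format would need a story for hash collisions and would probably want to decompress individual names rather than the whole blob, but the split between "hashes for resolution" and "names for diagnostics" is the point.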
I’d suggest that there could be a stronger distinction between relocatable and executable formats. What’s good for the former (the kinds of things @dblaikie mentioned) isn’t necessarily of any use at all to the latter. An executable format would really want to be optimized for fast loading into a process, IMO. Back in the day I worked on a system where process pages could be mapped directly from the executable, and therefore did not require a separate backing store (they did not take up space in the page/swap file). I can’t say I’ve studied this aspect of ELF closely, but my impression is that while you could set it up that way, it’s not at all required, and so loaders don’t take advantage. (Someone who knows more about ELF in its executable persona might well correct me on this.)
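As a small illustration of what “mapped directly from the executable” implies for the file layout, here is a Python sketch that maps a read-only region straight from a file. The helper name `map_segment` is made up, and this says nothing about how any particular ELF loader behaves; it only shows the alignment constraint a format would have to respect for its segments to be mappable in place.

```python
import mmap


def map_segment(path, file_offset, length):
    """Map `length` bytes at `file_offset` read-only, backed by the file itself.

    The offset must be a multiple of the platform's mapping granularity,
    which is why a load-optimized format would align its segments in the file.
    """
    granularity = mmap.ALLOCATIONGRANULARITY
    if file_offset % granularity != 0:
        raise ValueError(
            f"segment offset {file_offset} is not a multiple of {granularity}"
        )
    with open(path, "rb") as f:
        # The mapping keeps its own duplicate of the descriptor, so the file
        # object can be closed once the mapping exists.
        return mmap.mmap(
            f.fileno(), length, access=mmap.ACCESS_READ, offset=file_offset
        )
```

Because the mapping is read-only and file-backed, the OS can drop and re-read those pages from the file instead of writing them to swap.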
I am interested in something that is actually not space efficient but chooses to optimize access, UX and exploration.
I am most interested in relocatable objects because there is more metadata to expose and work with, such as the relocations themselves. As @pogo59 mentioned, by the time it gets to the executable, the final product is closer to something that can be directly mmap’d.
If there is a particular file that would help me explore trying new object file formats please let me know.
My plan of attack so far was:
look at llvm-mc
add a new “filetype” to its supported output types and work there
I can then use clang or whatever to generate the assembly with “-S” and pipe it to llvm-mc.
(Bonus points if I can figure out how to directly wire Clang through to the changes I introduce to llvm-mc itself, which would save me from juggling multiple individual commands.)
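Until Clang is wired through, the plan above can be wrapped in a small driver script. This is only a sketch: `build_commands` and `compile_via_llvm_mc` are hypothetical helper names, it assumes `clang` and `llvm-mc` are on `PATH`, and any `filetype` value other than the ones llvm-mc already accepts (such as `obj` or `asm`) would only work after adding that new filetype to llvm-mc.

```python
import shutil
import subprocess


def build_commands(source, output, filetype="obj", triple=None):
    """Build the clang and llvm-mc command lines for the two-step pipeline."""
    clang_cmd = ["clang", "-S", "-o", "-", source]  # textual assembly to stdout
    mc_cmd = ["llvm-mc", f"-filetype={filetype}", "-o", output]  # reads stdin
    if triple:
        clang_cmd.insert(1, f"--target={triple}")
        mc_cmd.insert(1, f"-triple={triple}")
    return clang_cmd, mc_cmd


def compile_via_llvm_mc(source, output, filetype="obj", triple=None):
    """Run `clang -S` and pipe its assembly into llvm-mc."""
    clang_cmd, mc_cmd = build_commands(source, output, filetype, triple)
    for tool in (clang_cmd[0], mc_cmd[0]):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    clang = subprocess.Popen(clang_cmd, stdout=subprocess.PIPE)
    mc = subprocess.Popen(mc_cmd, stdin=clang.stdout)
    clang.stdout.close()  # let clang see a broken pipe if llvm-mc exits early
    mc_status = mc.wait()
    clang_status = clang.wait()
    if clang_status != 0:
        raise subprocess.CalledProcessError(clang_status, clang_cmd)
    if mc_status != 0:
        raise subprocess.CalledProcessError(mc_status, mc_cmd)
```

With a stock toolchain, `compile_via_llvm_mc("foo.c", "foo.o")` should behave like `clang -S -o - foo.c | llvm-mc -filetype=obj -o foo.o`; swapping in the new filetype name is then a one-argument change.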
Yes it is in main, but it is definitely unsuitable for use with targets other than DirectX. It is just simple enough that it might give you a roadmap, but DirectX doesn’t encode instructions, so it isn’t full featured enough to be useful for CPU architectures.