Plan 9 a.out executables with lld

Hi all,

I wish to cross-compile executables for the Plan 9 a.out format using LLVM so that I may compile GNU C for the OS.

So far I have tried to avoid changes to LLVM itself, by using a linker script to produce the a.out format. The format is documented in the a.out page from Section 6 of the Plan 9 manual. It is a few big-endian u32s followed by the aligned sections themselves.
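To make the header layout concrete, here is a minimal Python sketch that packs the leading words as big-endian u32s. The field names and order are my reading of the Exec header in a.out(6), and the magic value 0x1EB for 386 is an assumption; check both against the man page for your target, since some architectures may differ.

```python
import struct

# Field order assumed from the Exec header in a.out(6); every field is a big-endian u32.
FIELDS = ("magic", "text", "data", "bss", "syms", "entry", "spsz", "pcsz")

def pack_header(**values):
    """Pack the eight header words big-endian; fields not given default to 0."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown header fields: {sorted(unknown)}")
    return struct.pack(">8I", *(values.get(name, 0) for name in FIELDS))

# Illustrative sizes only. 0x1EB is assumed to be the 386 magic number.
header = pack_header(magic=0x1EB, text=0x1000, data=0x200, bss=0x40, entry=0x1020)
```

The `>` in the format string is what forces big-endian output, independent of the host byte order.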

However, to my knowledge it is not possible to build this format with a linker script on little-endian architectures, because all of the sizes calculated in the script are emitted as little-endian.

As far as I can tell, my options are:

  • Add a new relocation like R_ABS_BYTESWAP, which takes the calculated value from the linker script and performs a byteswap before inserting it into the final executable.
    I think this is awful.
  • Add a new output format to lld for the Plan 9 a.out format

For the latter, this would require:

  • Adding a new driver which has a Plan 9 linker interface (documented in man 8l(1))

  • Refactor the ELF backend so that the code can be re-used by the Plan 9 a.out backend

    This is because Plan 9 has its own specific object format which differs per architecture, so the linker should instead be invoked like the ELF one: ELF in, Plan 9 out.

    Fundamentally, the only differences from the ELF backend are:

    • The writer: the Plan 9 driver could invoke the ELF driver and swap in a writer for the a.out format.
    • Static-only linking: Plan 9 executables are static-only, so the linker needs to bail out on any undefined symbols.

I would like to know if my understanding is correct, and if the LLVM lld developers welcome the aforementioned changes, or if there is an alternative approach.

As an aside, it seems this feature is needed by the Zig LLVM backend as well for Tier 1 support: Tier 1 Support for x86_64-plan9 · Issue #7153 · ziglang/zig · GitHub

Thanks,
K

Refactor the ELF backend so that the code can be re-used by the Plan 9 a.out backend

This is unlikely to be a good idea for either ELF or the Plan 9 a.out format.
When the new COFF/ELF/WebAssembly/Mach-O formats were added to lld/, deliberate choices were made to make them share very little code.
Different object file formats have very different semantics, requirements, and prevailing existing implementations.
There is very little code to share.
Refactoring ELF code for the Plan 9 a.out purpose will likely be detrimental to the readability of the ELF code.

If a new object file format is to be added to lld, it seems that the natural requirement is that it has llvm/lib/BinaryFormat and llvm/lib/Object support. If there is no plan to contribute to that portion (“So far I have tried to avoid changes to LLVM itself,”), I think it is pointless to add the port to lld. A standalone linker project will just be more meaningful.

If a new object file format is to be added to LLVM, we use Clang’s criteria for extensions (see Clang - Get Involved).
I am unsure whether the Unix a.out format meets the “Evidence of a significant user community” requirement…

If there is no plan to contribute to that portion (“So far I have tried to avoid changes to LLVM itself,”), I think it is pointless to add the port to lld

I should rephrase: it is not that I do not plan to contribute to LLVM, but rather that I was trying to solve the problem with existing features instead of making unnecessary additions. The context surrounding adding support for object formats is useful nonetheless, which leads to the next point:

Refactoring ELF code for the Plan 9 a.out purpose will likely be detrimental to the readability of the ELF code

This makes sense. However I’ll expand a bit more: Plan 9 contains a set of architecture-specific compilers which each have their own undocumented object format. These are then passed to the architecture-specific linker which emits the uniform a.out executable. This is unlike ELF, where ELF encapsulates both the object format and executable format, and is consistent across architectures.

For this reason, in my mind, it does not make sense to add the Plan 9 object format to LLVM. Instead, it is fine to have ELF be the intermediate object format (which LLVM already supports), as long as lld itself can output the a.out for a given architecture.

A standalone linker project will just be more meaningful.

In all, I agree that a standalone linker project seems to be the solution here given the lld situation. However, I am a bit disappointed, since linker scripts almost got me to where I wanted to be. That being said, I cannot be the first person to run into an issue of this kind.

An alternative option to altering or making a new linker is an ELF to a.out conversion tool. Assuming that you can get a linker script that is close to the components you’d need for an a.out file then it could be possible to write a tool that extracts the ELF components and writes the equivalent a.out file.

I’d expect something like llvm-objcopy might be a good starting point for such a conversion. The main difficulties are:

  • Dealing with ELF files that aren’t representable in a.out
  • Adding an extra step to build files, although I suppose a wrapper script around the linker could be written.

I found elftoaout (an ELF to a.out converter, per its Linux man page), but this seems to be SPARC only. It does show that it could be possible.

I agree that llvm-objcopy would be a good fit for an ELF to a.out conversion tool. llvm-objcopy already has support for other conversions e.g. “raw” binary and ihex, so it wouldn’t be a big stretch to add support for another thing.

Barring a custom tool, I agree llvm-objcopy seems more appropriate than lld.

However, after re-reading the linker manual, I have figured out that it is possible to achieve the a.out format with just a linker script, using the raw emit directives with a custom byteswap rather than an assembler program with relocations:

LONG(
	(__text_size   & 0x000000ff) << 24
	| (__text_size & 0x0000ff00) << 8
	| (__text_size & 0x00ff0000) >> 8
	| (__text_size & 0xff000000) >> 24
)

I’d argue it’s a hack, but it works nonetheless. (Un)fortunately, there seem to be no macros, nor is the script run through a preprocessor beforehand, so the linker script ends up repetitive.
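As a sanity check on the expression above, the following Python sketch mirrors the same masks and shifts and confirms that writing the swapped value little-endian yields the big-endian bytes of the original. It assumes LONG() writes its 32-bit operand in the target's little-endian byte order, as described earlier in the thread; the function names are mine.

```python
import struct

def script_byteswap(value):
    """Mirror the linker-script expression on a 32-bit value."""
    return (
        (value & 0x000000FF) << 24
        | (value & 0x0000FF00) << 8
        | (value & 0x00FF0000) >> 8
        | (value & 0xFF000000) >> 24
    )

def emitted_bytes(value):
    """Bytes a little-endian LONG() would write for the swapped value."""
    return struct.pack("<I", script_byteswap(value))
```

For any 32-bit size, `emitted_bytes(size)` should equal `struct.pack(">I", size)`, which is what the a.out header needs.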

You could always run a pre-build step that runs the script through a pre-processor of course.
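One way to sketch such a pre-build step, using a small Python generator rather than a C preprocessor, is to emit one byteswapping LONG() per header symbol. Only `__text_size` appears in the script above; `__data_size` and `__bss_size` are hypothetical names used for illustration.

```python
def be_long(symbol):
    """Return a LONG() directive that emits `symbol` as a big-endian u32."""
    terms = [
        f"({symbol} & 0x000000ff) << 24",
        f"({symbol} & 0x0000ff00) << 8",
        f"({symbol} & 0x00ff0000) >> 8",
        f"({symbol} & 0xff000000) >> 24",
    ]
    return "LONG(\n\t" + "\n\t| ".join(terms) + "\n)"

def header_directives(symbols):
    """Join one byteswapping LONG() directive per symbol, in header order."""
    return "\n".join(be_long(s) for s in symbols)

# Hypothetical symbol names apart from __text_size.
print(header_directives(["__text_size", "__data_size", "__bss_size"]))
```

The printed text could be pasted or spliced into the linker script's header section by the build system, which keeps the hand-written script free of the repeated masks.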