Hi all,
We have been working on implementing ELF’s .addrsig format, originally outlined by @pcc here, for the MC Mach-O backend. However, we’ve run into some issues getting it to work properly. The Mach-O streamer seems to be structured quite differently from the ELF one, and our experience working with MC is limited, so I was hoping we could get some tips on how to handle this. I also have some suggestions for alternative ways of encoding the data, which might be simpler to implement.
The Problem
First, a quick recap of the format: symbol table indices are stored as ULEB128-encoded values in an __llvm_addrsig section. These indices tell the linker that those symbols should never be merged during code folding. Because ULEB128 is a variable-length encoding, the size of the addrsig section depends on the actual index values, so we can’t determine its final size until those indices have been assigned.
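To make the size dependence concrete, here is a minimal standalone sketch of ULEB128 encoding. It is not the MC code itself (LLVM has its own encodeULEB128 helper in llvm/Support/LEB128.h); the function addrsigPayloadSize and the index values are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the ULEB128 encoding of Value to Out; return the number of bytes written.
// Each output byte carries 7 payload bits; the high bit marks "more bytes follow".
static unsigned encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  unsigned Count = 0;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // continuation bit
    Out.push_back(Byte);
    ++Count;
  } while (Value != 0);
  return Count;
}

// HYPOTHETICAL helper: size in bytes of an addrsig payload that lists the
// given symbol table indices, one ULEB128 value per symbol.
static size_t addrsigPayloadSize(const std::vector<uint32_t> &SymbolIndices) {
  std::vector<uint8_t> Buf;
  for (uint32_t Index : SymbolIndices)
    encodeULEB128(Index, Buf);
  return Buf.size();
}
```

With three symbols at indices {1, 2, 3} the payload is 3 bytes, but the same three symbols at indices {127, 128, 20000} need 1 + 2 + 3 = 6 bytes, so the section size is unknown until the indices are final.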
The problem with Mach-O addrsig generation is that MCAssembler::layout() runs before symbol table indices are assigned; the assignment happens later, in MachObjectWriter::computeSymbolTable(). But layout needs to know the size of every section to work properly, leaving us in a circular bind.
I think MC’s ELF backend sidesteps this problem because it computes section offsets separately from MCAsmLayout, whereas the Mach-O backend doesn’t seem to. I might be missing something obvious, though, given my unfamiliarity with the code. Pointers would be highly appreciated!
Possible Alternative Encodings
I’ve been wondering if we could mark address-significant symbols within the symbol table itself, instead of using an auxiliary section. In particular, we could set a bit in the nlist::n_desc field for this. Alternatively, we could maybe reuse the REFERENCED_DYNAMICALLY bit and combine it with N_PEXT to indicate symbols whose addresses are significant, but which can still be removed by llvm-strip. Either way, this will probably need Apple’s stamp of approval. @cachemeifyoucan, do you have any thoughts on this?
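For illustration, here is a rough sketch of one reading of the second option. The struct mirrors nlist_64 from <mach-o/nlist.h>, and the two constants carry the values of the existing N_PEXT and REFERENCED_DYNAMICALLY flags. The helpers markAddressSignificant and isAddressSignificant are hypothetical, and setting N_PEXT unconditionally is an assumption that a real implementation might well not make. The first option would look the same with a newly allocated n_desc bit, whose value I am deliberately not picking here.

```cpp
#include <cstdint>

// Mirrors the layout of struct nlist_64 from <mach-o/nlist.h>.
struct Nlist64 {
  uint32_t n_strx;  // index into the string table
  uint8_t n_type;   // type flags (N_STAB, N_PEXT, N_TYPE, N_EXT)
  uint8_t n_sect;   // section number or NO_SECT
  uint16_t n_desc;  // descriptor flags
  uint64_t n_value; // symbol value
};

// Values of the existing Mach-O flags, renamed to avoid clashing with the
// macros in the system header.
constexpr uint8_t kNPext = 0x10;                   // N_PEXT, lives in n_type
constexpr uint16_t kReferencedDynamically = 0x0010; // REFERENCED_DYNAMICALLY, lives in n_desc

// HYPOTHETICAL: mark a symbol as address-significant by setting both bits.
// Whether N_PEXT should be forced on like this is an open question.
inline void markAddressSignificant(Nlist64 &Sym) {
  Sym.n_type |= kNPext;
  Sym.n_desc |= kReferencedDynamically;
}

// HYPOTHETICAL: under this encoding, a symbol is address-significant only
// when both bits are set; either bit alone keeps its existing meaning.
inline bool isAddressSignificant(const Nlist64 &Sym) {
  return (Sym.n_type & kNPext) != 0 &&
         (Sym.n_desc & kReferencedDynamically) != 0;
}
```

The appeal of this shape is that no extra section has to be sized during layout: the flag travels with each symbol table entry, which has a fixed size regardless of the symbol's index.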