eh_frame sections emitted by Clang use an indirect encoding for the personality pointer and typeinfo references (to avoid text relocations). I noticed a few potential issues with these references; see Compiler Explorer for an example:
- The typeinfo pointers (`.L_ZTIi.DW.stub` in my example) are emitted directly in the `.data` section. As far as I understand, this means they won’t be deduplicated across compile units. I don’t think they need to be writable either.
- The personality pointer (`DW.ref.__gxx_personality_v0`) is emitted in its own COMDAT section, so it’ll be deduplicated, but it’s still emitted in the writable `.data` section.
- On aarch64, the indirect references to these pointers are encoded as `sdata8`, because code and data can be up to 4 GB apart even in the small code model.
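For reference, the pointer encodings involved can be spelled out with the standard `DW_EH_PE_*` values from the DWARF exception-handling pointer encoding. This is a small C sketch; `encoded_size` is a hypothetical helper for illustration, and only the two value formats relevant here are handled. It shows that moving a field from `sdata8` to `sdata4` halves its size.

```c
#include <assert.h>
#include <stdint.h>

/* Standard DWARF EH pointer-encoding values (subset). */
enum {
  DW_EH_PE_sdata4   = 0x0b, /* signed 4-byte value */
  DW_EH_PE_sdata8   = 0x0c, /* signed 8-byte value */
  DW_EH_PE_pcrel    = 0x10, /* value is relative to the field's own address */
  DW_EH_PE_indirect = 0x80, /* result is the address of the real pointer */
};

/* Hypothetical helper: bytes occupied by an encoded pointer field.
   The low nibble selects the value format; the higher bits select how the
   value is applied (pcrel) and whether it must be dereferenced (indirect). */
static unsigned encoded_size(uint8_t enc) {
  switch (enc & 0x0f) {
  case DW_EH_PE_sdata4:
    return 4;
  case DW_EH_PE_sdata8:
    return 8;
  default:
    assert(0 && "format not handled in this sketch");
    return 0;
  }
}
```

With these values, `indirect | pcrel | sdata4` is `0x9b` and `indirect | pcrel | sdata8` is `0x9c`.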
Locally, I’ve experimented with using GOTPCREL relocations for the personality and typeinfo references inside eh_frame, instead of creating our own stubs. For AArch64, the GOTPCREL relocation type was defined in ARM-software/abi-aa Pull Request #223 (“[aaelf64] Define GOT-Relative data relocation”) to support the relative vtables work; x86-64 and RISC-V also support an equivalent relocation type, and I believe `R_ARM_GOT_PREL` serves the same purpose for 32-bit ARM. In all cases, the relocation causes a GOT entry to be created for the referenced symbol, and the relocated place is filled in with the 32-bit offset from that place to the GOT entry, which is compatible with the `DW_EH_PE_indirect` encoding being used. I believe this solves all my issues:
- We remove all duplication by having a single GOT entry.
- The GOT is read-only after relocation, which removes a writable indirect pointer.
- We can switch aarch64 to `sdata4`, since the offset to the GOT entry is 32-bit. I’m not actually 100% sure about this; the AArch64 SysV ABI says that the “definition of the text segment” (which is limited to 2 GiB in all code models) “includes the shareable PLT, code and read-only data sections”. The GOT is only read-only after relocations are applied, so I’m not completely sure it counts, but the GOTPCREL relocation relies on this, so I assume it’s okay.
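To make the encoding argument concrete, here is a toy C simulation of both halves: a linker resolving a 32-bit GOT-relative reference (with a zero addend, the field receives the GOT entry address minus the field address), and an unwinder decoding that field as `DW_EH_PE_indirect | DW_EH_PE_pcrel | DW_EH_PE_sdata4`. The byte-array "address space", the helper names, and the zero addend are assumptions for illustration; this does not model any real linker's relocation code. The range check corresponds to the `sdata4` concern above: the GOT entry must lie within ±2 GiB of the referencing field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy address space: an "address" is just an offset into this buffer. */
static uint8_t mem[256];

/* Hypothetical linker-side helper: write got_entry - place as a signed
   32-bit field at `place`. Returns false if the offset does not fit,
   i.e. the reference could not be encoded as sdata4. */
static bool apply_got_pcrel32(uint64_t place, uint64_t got_entry) {
  int64_t delta = (int64_t)(got_entry - place);
  if (delta < INT32_MIN || delta > INT32_MAX)
    return false;
  int32_t field = (int32_t)delta;
  assert(place + sizeof field <= sizeof mem); /* stay inside the toy memory */
  memcpy(&mem[place], &field, sizeof field);
  return true;
}

/* Hypothetical unwinder-side decode of indirect|pcrel|sdata4:
   sign-extend the 32-bit field, add the field's own address (pcrel),
   then load the pointer stored at that address (indirect). */
static uint64_t read_indirect_pcrel_sdata4(uint64_t place) {
  int32_t field;
  assert(place + sizeof field <= sizeof mem);
  memcpy(&field, &mem[place], sizeof field);
  uint64_t slot = place + (uint64_t)(int64_t)field; /* address of the GOT entry */
  uint64_t target;
  assert(slot + sizeof target <= sizeof mem);
  memcpy(&target, &mem[slot], sizeof target);
  return target;
}
```

In this model the GOT entry holds the symbol's address, so a round trip through `apply_got_pcrel32` and `read_indirect_pcrel_sdata4` yields that address for both positive and negative offsets, while a GOT entry 4 GiB away is rejected.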
I’ve successfully prototyped this for our Android arm64 applications. I’ve observed no runtime issues, and we reduce both binary size and the number of dynamic relocations (which is important for startup time). My prototype-quality patch can be seen here; it’s limited to aarch64 under an option to ease testing, but in theory it should apply to any architecture which supports a GOTPCREL-like relocation.
Are there any problems with this approach that I’m not considering? If not, are there any objections to adding an option to change Clang’s eh_frame emission to use GOTPCREL relocations? I do believe we’d want to make this optional, because e.g. I don’t believe the bfd linker supports GOTPCREL relocations for aarch64, but we could possibly default to it under certain circumstances (e.g. when targeting Android, where LLD is the only supported linker, or when using relative vtables, which also rely on GOTPCREL relocations being supported).