Randomize offset between program segments?


I'd like to improve address space layout randomization (ASLR) by randomizing the offsets between the .text, .data and .bss segments (or, more generally, between any program segments). With the large code model (-mcmodel=large) on AMD64 the offsets could be very large, but even with the default model the segments could be randomized within the range of RIP-relative accesses (+/-2GB). Currently the dynamic loader can't randomize the segments (and nothing tells it whether doing so would be OK), so it maps them next to each other, which is predictable and boring.

For this to happen, I think the compiler would have to emit relocations for all cross-segment accesses and probably flag the shared object somehow. Then, on detecting the flag, the dynamic loader could load the segments at random offsets within 2GB or, if the large model was used in compilation (another flag), anywhere in the available virtual address space (letting the OS map each segment anywhere by using mmap(NULL,...)).
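
To make the loader side of this concrete, here is a rough Python model of the placement decision only. It is a sketch under assumptions, not anything an existing loader does: `place_segments`, `fits_rel32`, the segment sizes and the window base are all made up for illustration. Each segment gets a random, page-aligned, non-overlapping base inside one 2GB window, so any byte stays reachable from any other with a signed 32-bit displacement.

```python
import random

PAGE = 4096              # assumed page size
REL32_MIN = -2**31       # most negative signed 32-bit displacement
REL32_MAX = 2**31 - 1    # most positive signed 32-bit displacement

def fits_rel32(src, dst):
    """True if dst is reachable from src with a signed 32-bit displacement."""
    return REL32_MIN <= dst - src <= REL32_MAX

def place_segments(sizes, window_base, rng, window=2**31):
    """Pick a random page-aligned base for each segment (hypothetical helper).

    sizes maps a segment name to its size in bytes. All segments land inside
    [window_base, window_base + window) and do not overlap. window_base is
    assumed to be page-aligned already.
    """
    placed = {}  # name -> (base, page-rounded size)
    for name, size in sizes.items():
        span = -(-size // PAGE) * PAGE          # round up to whole pages
        slots = (window - span) // PAGE + 1     # candidate page-aligned bases
        while True:                             # retry until no overlap
            base = window_base + rng.randrange(0, slots) * PAGE
            if all(base + span <= b or b + s <= base
                   for b, s in placed.values()):
                placed[name] = (base, span)
                break
    return {name: base for name, (base, _) in placed.items()}
```

For the large-model case the same function could be called with a much bigger `window` (for example the whole user address range), since no access would depend on a 32-bit displacement any more. A real loader would of course ask the kernel for the mappings instead of rolling addresses itself.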

Perhaps if the GOT were kept within the 2GB range, other data segments could still be placed anywhere.

There would be some slowdown because of additional relocations (and the OS would not be happy due to increased VM fragmentation) but I think otherwise nothing should change (the code should be identical). This would be of course an opt-in feature mainly for hardened systems.

So, I wonder how to implement the compiler part. Is this something that could be done easily with LLVM/Clang?



As I understand it, this would require dynamic relocations in what would be the read-only segment of a position-independent shared library, which means that segment would at least need to be RELRO instead of read-only. It would also prevent the read-only segment from being shared.

Leaving aside whether this is a good trade-off or not, doing this in practice would probably need something more complicated than just exposing relocations. At the moment the compiler exposes inter-section references with relocations, many of them GOT-generating. The static linker resolves these, creating a GOT with a much simpler set of dynamic relocations. In theory you could expose the full set of static relocations to the dynamic linker, but at least on a RISC machine these can be quite fiddly to resolve and there can be a lot of them. I think that you would probably want another code model that made the dynamic linker's job easier. One possible approach would be to do something like FDPIC (shared libraries on microcontrollers without an MMU). This uses a reserved register to hold the position of the RW segment rather than assuming a fixed PC-relative offset. The value of the reserved register is fixed per-process.
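
As a toy illustration of the reserved-register idea (not how any FDPIC ABI is actually implemented), the Python sketch below models memory as a dict and passes the RW segment's load address around as `rw_base`, standing in for the reserved register. `COUNTER_OFFSET`, `load_rw_segment` and `read_counter` are hypothetical names. The point is only that the "code" stores an offset into the RW segment, so the segment can be loaded anywhere without touching the code.

```python
# Hypothetical link-time offset of one variable inside the RW segment.
COUNTER_OFFSET = 0x10

def load_rw_segment(memory, rw_base, initial):
    """Copy the initial RW image ({offset: value}) to the chosen base address.

    Returns rw_base, the value that would be kept in the reserved register.
    """
    for off, value in initial.items():
        memory[rw_base + off] = value
    return rw_base

def read_counter(memory, rw_base):
    """Access data as reserved register + offset, never as PC + distance."""
    return memory[rw_base + COUNTER_OFFSET]

# The same read_counter "code" works for two different load addresses.
mem = {}
low = load_rw_segment(mem, 0x10000, {COUNTER_OFFSET: 7})
high = load_rw_segment(mem, 0x7F0000000000, {COUNTER_OFFSET: 9})
print(read_counter(mem, low), read_counter(mem, high))
```

Since nothing in `read_counter` depends on the distance between code and data, the read-only part would need no dynamic relocations for this kind of access.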

Good luck, I think you'll need input from compiler, static and dynamic linker, and possibly OS people.