Old: R RX RW(RELRO) RW
New: R(R+RELRO) RX RW; R includes the traditional R part and the
RELRO part
Runtime (before relocation resolving): RW RX RW
Runtime (after relocation resolving): R RX RW
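As an aside, the runtime layouts above can be inspected with glibc's dl_iterate_phdr; a minimal sketch (not specific to this proposal, just a way to dump what the loader actually mapped):

// Minimal sketch: dump PT_LOAD / PT_GNU_RELRO program headers of the
// running process, enough to see layouts like the ones discussed above.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <elf.h>
#include <link.h>
#include <cstdio>

static int printPhdrs(struct dl_phdr_info *info, size_t, void *) {
  for (int i = 0; i < info->dlpi_phnum; ++i) {
    const ElfW(Phdr) &p = info->dlpi_phdr[i];
    if (p.p_type != PT_LOAD && p.p_type != PT_GNU_RELRO)
      continue;
    std::printf("%-24s %-5s %c%c%c vaddr=%#lx off=%#lx filesz=%#lx\n",
                info->dlpi_name[0] ? info->dlpi_name : "(main executable)",
                p.p_type == PT_LOAD ? "LOAD" : "RELRO",
                (p.p_flags & PF_R) ? 'R' : '-',
                (p.p_flags & PF_W) ? 'W' : '-',
                (p.p_flags & PF_X) ? 'X' : '-',
                (unsigned long)(info->dlpi_addr + p.p_vaddr),
                (unsigned long)p.p_offset, (unsigned long)p.p_filesz);
  }
  return 0; // 0 = keep iterating over the remaining loaded objects
}

int main() { dl_iterate_phdr(printPhdrs, nullptr); }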
I actually see two ways of implementing this, and yes, what you mentioned
here is one of them:
- Move RELRO to before RX, and merge it with R segment. This is what you
said above.
- Move RELRO to before RX, but keep it as a separate segment. This is
what I implemented in my test.
As I mentioned in my reply to Peter, option 1 would allow existing
implementations to take advantage of this without any change. While I think
that optimization is well worth it, going with option 1 means dynamic
linkers lose the option of keeping RO separate if they want to, for
whatever reason (e.g. less VM commit, finer granularity in VM maps, or not
wanting RO to be writable even for a short while). So there’s a trade-off to
be made here (or an option to be added, even though we all want to avoid
that if we can).
Then you probably meant:
Old: R RX RW(RELRO) RW
New: R | RW(RELRO) RX RW
Runtime (before relocation resolving): R RW RX RW
Runtime (after relocation resolving): R R RX RW ; the two R cannot be merged
Here | means a maxpagesize alignment. I am not sure whether you are going to
add it, because I still do not understand where the saving comes from.
If the alignment is added, the R and RW maps can get contiguous
(non-overlapping) p_offset ranges. However, the RW map is private dirty;
it cannot be merged with adjacent maps, so I am not clear how this can save kernel memory.
If the alignment is not added, the two maps will get overlapping p_offset ranges.
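For background on when the kernel will coalesce two adjacent maps at all: roughly, the flags, backing object, and file offsets must all line up. A much-simplified sketch of that criterion (not the actual mm/mmap.c logic, which also considers anon_vma and mempolicy; the flag values below are placeholders):

#include <cstdio>

// Simplified model of a kernel VMA, just enough to show the merge check.
struct Vma {
  unsigned long start, end; // [start, end) virtual address range
  unsigned long flags;      // stand-in for vm_flags (VM_READ, VM_ACCOUNT, ...)
  const void *file;         // backing file, nullptr if anonymous
  unsigned long pgoff;      // file offset in pages (4 KiB pages assumed here)
};

static bool canMerge(const Vma &a, const Vma &b) {
  if (a.end != b.start)
    return false; // must be adjacent
  if (a.flags != b.flags)
    return false; // e.g. a VM_ACCOUNT mismatch alone blocks the merge
  if (a.file != b.file)
    return false; // must map the same file (or both be anonymous)
  if (a.file && a.pgoff + ((a.end - a.start) >> 12) != b.pgoff)
    return false; // file offsets must line up with the address gap
  return true;
}

int main() {
  const char *elf = "libfoo.so"; // placeholder backing object
  Vma ro    = {0x1000, 0x3000, /*flags=*/0x1, elf, /*pgoff=*/0};
  Vma relro = {0x3000, 0x4000, /*flags=*/0x9, elf, /*pgoff=*/2};
  std::printf("mergeable: %d\n", canMerge(ro, relro)); // 0: flags differ
  relro.flags = ro.flags;
  std::printf("mergeable: %d\n", canMerge(ro, relro)); // 1: now they match
}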
My test showed an overall ~1MB decrease in kernel slab memory usage on
vm_area_struct, with about 150 processes running. For this to work, I had
to modify the dynamic linker:
Can you elaborate on how this decreases the kernel slab memory usage on
vm_area_struct? References to source code are very welcome. This is
contrary to my intuition, because the second R is private dirty and the
number of VMAs does not decrease.
- The dynamic linker needs to make the read-only VMA briefly writable in
order for it to have the same VM flags as the RELRO VMA so that the two can
be merged. Specifically, VM_ACCOUNT is set when a VMA is made writable.
Same question. I hope you can give a bit more details.
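To make the sequence concrete, here is a rough sketch of the dynamic-linker change as I understand it. This is illustrative only, not glibc's actual relocation code; the function and region names are made up, and the regions are assumed to be page-aligned and contiguous:

#include <sys/mman.h>
#include <cstddef>

// Briefly making the RO region writable gives it the same VM flags
// (VM_ACCOUNT in particular) as the RELRO region, so that when both are
// re-protected read-only the kernel can merge the two adjacent VMAs.
// Error handling omitted for brevity.
static void relocateAndSeal(void *roStart, size_t roSize,
                            void *relroStart, size_t relroSize) {
  // RELRO has to be writable while relocations are resolved anyway.
  mprotect(relroStart, relroSize, PROT_READ | PROT_WRITE);
  // Briefly make the adjacent RO region writable too, so both VMAs pick
  // up the same flags.
  mprotect(roStart, roSize, PROT_READ | PROT_WRITE);

  // ... resolve dynamic relocations here ...

  // Seal everything as read-only again. Assuming RELRO immediately follows
  // RO, one call covers both; with identical flags the kernel leaves a
  // single merged VMA behind.
  mprotect(roStart, roSize + relroSize, PROT_READ);
}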
How should the segments be laid out if --no-rosegment is specified?
Runtime (before relocation resolving): RX RW ; some people may be
concerned about writable data (the relocated part) being made executable
Indeed, I think the weakening on the security side may be a problem if we
are to merge RELRO into RX. Keeping the old layout would be preferable IMHO.
This means the new layout conflicts with --no-rosegment.
In Driver.cpp, there should be a “… cannot be used together” error.
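For the Driver.cpp check, something along these lines would do; the flag names and the diagnostic text below are placeholders rather than lld's actual ones:

#include <cstdio>
#include <cstdlib>

// Stand-ins for lld's option flags and error() diagnostic; the real names
// in lld/ELF/Config.h and lld/Common/ErrorHandler.h may differ.
struct Config {
  bool newRelroLayout = false; // hypothetical flag for the proposed layout
  bool noRoSegment = false;    // set by --no-rosegment
};

static void error(const char *msg) {
  std::fprintf(stderr, "ld.lld: error: %s\n", msg);
  std::exit(1);
}

// The kind of option-compatibility check being suggested above.
static void checkOptions(const Config &config) {
  if (config.newRelroLayout && config.noRoSegment)
    error("--no-rosegment cannot be used together with the new RELRO layout");
}

int main() {
  Config config;
  config.newRelroLayout = true;
  config.noRoSegment = true;
  checkOptions(config); // prints the diagnostic and exits
}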
Another problem is that in the default -z relro -z lazy (-z now not
specified) layout, .got and .got.plt will be separated by potentially huge
code sections (e.g. .text). I’m still thinking about what problems this
layout change may bring.
Not sure if this is the same issue as what you mentioned here, but I also
see a comment in lld/ELF/Writer.cpp about how .rodata and .eh_frame should
be kept as close to .text as possible to reduce the risk of relocation
overflow. If we go with option 2 above, that distance would have to become
larger. With option 1, we may still have some leeway in how to order
sections within the merged RELRO segment.
For huge executables (>2G or 3G), it may cause relocation overflows
between .text and .rodata if other large sections like .dynsym and .dynstr are
placed in between.
I do not worry too much about overflows potentially caused by moving
PT_GNU_RELRO around. PT_GNU_RELRO is usually less than 10% of the size of the
RX PT_LOAD.
This would be a somewhat tedious change (especially the part about having
to update all the unit tests), but the benefit is pretty good, especially
considering that kernel slab memory is not swappable/evictable. Please let
me know your thoughts!
Definitely! I have prototyped this and found that ~260 tests will need address changes…