ICF: Identical Code Folding
The linker deduplicates functions by collapsing identical functions
together. With --icf=safe, the linker looks at the address-significance
table (the .llvm_addrsig section) in the object file, and any function
listed in that section is not treated as collapsible (e.g. because it needs
to meet C++'s "distinct functions have distinct addresses" guarantee).
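As an illustration of that guarantee (the function names below are made up for this example): the two functions have identical bodies, so they are natural folding candidates, yet C++ requires their addresses to compare unequal. Folding them is only safe when nothing observes those addresses.

```cpp
// Two functions with identical bodies; natural candidates for ICF.
int answer_a() { return 42; }
int answer_b() { return 42; }

using Fn = int (*)();

// Comparing function pointers observes the addresses. A conforming
// implementation must report distinct functions as unequal here.
bool same_function(Fn x, Fn y) { return x == y; }
```

If a linker folded answer_a and answer_b regardless of address significance, same_function(answer_a, answer_b) could start returning true, which is the kind of breakage the safe mode is meant to avoid.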
The name originated from MSVC link.exe where icf stands for "identical COMDAT folding".
gold named it "identical code folding" - which makes some sense because gold does not fold readonly data.
In LLD, the name is not accurate for two reasons: (1) the feature can
apply to readonly data as well; (2) the folding is by section, not by function.
We define two sections as identical if they have identical content and their
outgoing relocation sets cannot be distinguished: they need to have the
same number of relocations, at the same relative locations, with the
referenced symbols indistinguishable.
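A minimal sketch of that definition, assuming a toy model in which symbols are compared by name. The Reloc and Section types and the identical() helper are hypothetical, not LLD's data structures. Comparing the relocation type and addend is also an assumption of this sketch. A real linker has to decide whether two symbols are "indistinguishable" by looking at what they resolve to, which may itself be a section in the same candidate class.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model, not LLD's real data structures.
struct Reloc {
  std::size_t offset;    // location of the relocation within the section
  std::uint32_t type;    // relocation type (assumed to matter as well)
  std::string symbol;    // referenced symbol, compared by name in this sketch
  std::int64_t addend;   // assumed to matter as well
};

struct Section {
  std::vector<std::uint8_t> content;
  std::vector<Reloc> relocs;  // assumed to be sorted by offset
};

// Two sections are identical if their bytes match and their relocations
// match pairwise: same count, same offsets, same referenced symbols.
bool identical(const Section &a, const Section &b) {
  if (a.content != b.content) return false;
  if (a.relocs.size() != b.relocs.size()) return false;
  for (std::size_t i = 0; i < a.relocs.size(); ++i) {
    const Reloc &x = a.relocs[i];
    const Reloc &y = b.relocs[i];
    if (x.offset != y.offset || x.type != y.type ||
        x.addend != y.addend || x.symbol != y.symbol)
      return false;
  }
  return true;
}
```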
Then, ld.lld --icf={safe,all} works like this:
For a set of identical sections, the linker picks one representative and
drops the rest, then redirects references to the representative.
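The pick-and-redirect step can be sketched as follows, under the assumption that an earlier comparison pass has already assigned every section an equivalence-class id. The names fold, redirect, and class_of, and the choice of the lowest-indexed section as the representative, are illustrative only and not taken from LLD.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// class_of[i] is the equivalence class of section i (hypothetical input).
// Returns rep, where rep[i] is the index of the section kept in place of i.
std::vector<std::size_t> fold(const std::vector<int> &class_of) {
  std::unordered_map<int, std::size_t> first_seen;
  std::vector<std::size_t> rep(class_of.size());
  for (std::size_t i = 0; i < class_of.size(); ++i) {
    // emplace does not overwrite, so the first section seen for a class
    // stays the representative of that class.
    rep[i] = first_seen.emplace(class_of[i], i).first->second;
  }
  return rep;
}

// References are stored as section indices; point each at its representative.
void redirect(std::vector<std::size_t> &refs,
              const std::vector<std::size_t> &rep) {
  for (std::size_t &r : refs) r = rep[r];
}
```

Any section i with rep[i] != i is dropped from the output. Symbols defined in a dropped section end up at the representative's address, which is why tools that map addresses back to names can report a surprising function.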
Note: this can confuse debuggers/symbolizers/profilers easily.
lld-link /opt:icf is different from ld.lld --icf but I haven't looked
into it closely.
I find that the feature's saving is small given its downsides
(there is also increased link time: LLD's current implementation is inferior
in that it performs a quadratic number of comparisons within an equivalence class).
These are the size differences for the 'lld' executable:
% size lld.{none,safe,all}
    text    data    bss       dec     hex filename
96821040 7210504 550810 104582354 63bccd2 lld.none
95217624 7167656 550810 102936090 622ae1a lld.safe
94038808 7167144 550810 101756762 610af5a lld.all
% size gold.{none,safe,all}
    text    data    bss       dec     hex filename
96857302 7174792 550825 104582919 63bcf07 gold.none
94469390 7174792 550825 102195007 6175f3f gold.safe
94184430 7174792 550825 101910047 613061f gold.all
Note that the --icf=all result caps the potential saving of the proposed annotation.
Actually with some large internal targets I get even smaller savings.
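To put the text-size numbers above in relative terms, here is a small helper. The function names are made up for this sketch, and the inputs are copied from the two size listings.

```cpp
#include <cstdio>

// Relative reduction of `after` versus `before`, in percent.
double saving_percent(double before, double after) {
  return (before - after) / before * 100.0;
}

// Prints the text-size savings for each linker and ICF mode.
void print_text_savings() {
  std::printf("lld  safe: %.2f%%\n", saving_percent(96821040, 95217624));
  std::printf("lld  all:  %.2f%%\n", saving_percent(96821040, 94038808));
  std::printf("gold safe: %.2f%%\n", saving_percent(96857302, 94469390));
  std::printf("gold all:  %.2f%%\n", saving_percent(96857302, 94184430));
}
```

This works out to roughly 1.7% (safe) and 2.9% (all) of text for lld, and roughly 2.5% (safe) and 2.8% (all) for gold.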
ld.lld --icf=safe is safer than gold --icf=safe but probably misses some opportunities.
It may be that clang's codegen/optimizer fails to mark some cases as {,local_}unnamed_addr.
I know Chromium and the Windows world can be different:) But I'd still want to
get some numbers first.
Last, I have seen that Chromium has some code like
https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory_new_handler.cpp
void sk_abort_no_print() {
  // Linker's ICF feature may merge this function with other functions with
  // the same definition (e.g. any function whose sole job is to call abort())
  // and it may confuse the crash report processing system.
  // Chromium issue 860850 (Monorail)
  static int static_variable_to_make_this_function_unique = 0x736b;  // "sk"
  base::debug::Alias(&static_variable_to_make_this_function_unique);
  abort();
}
If we want an approach to work with link.exe, I don't know what we can do...
If there is no desire for link.exe compatibility, I can see that having a proper way of marking the function
can be useful... but in any case, if an attribute is used, it should probably affect
unnamed_addr directly instead of being called *icf*.