I've been digging into COMDAT with regular LTO, specifically in the
context of the LLVM gold plugin. The GCC WHOPR documentation specifies
that the linker will resolve all COMDAT groups to the IR-provided
definitions, if available. Additionally, it specifies that "When the
WPA phase produces the definition of the COMDAT symbol in a new object
file, that definition should not be in a COMDAT group."
(whopr/driver - GCC Wiki)
The gold LTO plugin does not currently remove COMDAT groups from
symbols emitted to the new object file. This doesn't actually seem to
matter, as the linker ignores the COMDAT groups. However, it might be
nice to strip the COMDAT groups early in module linking, since we know
we won't need them.
Do other linkers that use the LTO API also resolve COMDAT symbols to
the LTO definition? Would it be possible to drop COMDAT groups for
prevailing symbols, at least when linking with the gold plugin?
+rafael for comments on lld, which is the other linker using the LTO API, and +pcc for LTO handling.
Any change we make wouldn’t be gold-plugin specific - that’s all in the underlying LLVM IR linking and LTO handling. Currently, the LTO API, which is guided by linker symbol resolution, does strip non-prevailing symbols from COMDATs. However, as you note, prevailing COMDATs are left in place. Is your concern just for GCC LTO compatibility?
There are some other places in LLVM where we check for COMDAT membership, but they are mostly about ensuring that DCE and similar optimizations don’t yield an incomplete COMDAT. For LTO, since the linker has already decided to drop the other COMDAT copies, we could presumably remove these prevailing symbols from their COMDAT group and optimize the symbols normally?
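To make the distinction concrete, here is a toy Python model of the two behaviours discussed above. It is only a sketch and not LLVM code: `Symbol`, `resolve_comdats`, and the `drop_prevailing` flag are hypothetical names invented for illustration, and the model assumes the linker has already marked each symbol as prevailing or not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Symbol:
    """Hypothetical stand-in for a global in an LTO module (not an LLVM type)."""
    name: str
    comdat: Optional[str]  # name of the COMDAT group, or None if not in one
    prevailing: bool       # True if the linker chose this copy of the symbol


def resolve_comdats(symbols: List[Symbol], drop_prevailing: bool = False) -> List[Symbol]:
    """Apply a simplified version of the COMDAT handling described in the thread.

    Non-prevailing symbols are always removed from their COMDAT group.
    Prevailing symbols keep their group unless drop_prevailing is True,
    which models the proposed change (an assumption, not existing behaviour).
    """
    for sym in symbols:
        if sym.comdat is None:
            continue  # nothing to do for symbols outside any COMDAT
        if not sym.prevailing or drop_prevailing:
            sym.comdat = None
    return symbols


# Usage: one prevailing and one non-prevailing COMDAT member.
syms = [
    Symbol("foo", comdat="foo", prevailing=True),
    Symbol("bar", comdat="bar", prevailing=False),
]
resolve_comdats(syms)                        # behaviour as described today
resolve_comdats(syms, drop_prevailing=True)  # the proposed stripping
```

With the default, only the non-prevailing `bar` loses its group; with `drop_prevailing=True`, the prevailing `foo` is also left outside any COMDAT, which is the state in which it could be optimized and laid out like an ordinary symbol.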
I’m not really concerned about GCC LTO compatibility, myself. It seems that gold doesn’t actually care that LTO output still has COMDAT groups. I was more thinking about any optimizations that might not be possible due to COMDAT, but it seems like that might be a non-issue. While it might be nice to drop COMDAT just for simplicity, that could be more trouble than it’s worth.
If we could drop COMDAT, it would simplify the way I’m trying to control function layout for pagerando (https://reviews.llvm.org/D37581#inline-344896). However, if other linkers don’t resolve COMDAT in the same way and this isn’t an option, I’ll try to figure out another way to deal with laying out COMDAT functions.