PGO GC inefficiency due to link.exe's IMAGE_COMDAT_SELECT_ASSOCIATIVE limitation

I have studied the GC state of Windows (IR/frontend) PGO a bit.
Below is my understanding. Hope folks can clarify it I make a mistake:)

For PGO, we need cnts/data/vals/names sections. cnts/data are main
metadata sections. vals is for value profiling.
For Windows, the cnts section is named .lprfc$M and the data section
is named .lprfd$M.
The garbage collection story is unfortunate.

If an IMAGE_COMDAT_SELECT_ASSOCIATIVE section defines an external
symbol, MSVC link.exe may report a spurious duplicate symbol error
(error LNK2005), even if the associative section would be discarded
after handling the leader symbol.
lld-link doesn't have this limitation.
However, a portable implementation needs to work around MSVC link.exe.

For a COMDAT .lprfd$M, its symbol must be external (linkonce_odr),
otherwise references to a non-prevailing symbol would cause an error.
Due to the limitation, .lprfd$M has to reside in its own COMDAT, no
sharing with .lprfc$M.

Different COMDAT groups mean that the liveness of one .lprfc$M does
not make its associative .lprfd$M live.
Since a .lprfd$M may be unreferenced, we have to conservatively assume
all COMDAT .lprfd$M live.
Since .lprfc$M input sections parallel .lprfd$M input sections, we
have to conservatively assume all COMDAT .lprfc$M live.
For an external symbol, we use a /INCLUDE: directive in .drectve to
mark it as a GC root.
As a result, .drectve may have many /INCLUDE: directives, just to work
around the link.exe limitation.
IIRC Chrome folks have found that the /INCLUDE: directives can cause
huge memory usage.

Note: for ELF we can use R_*_NONE to establish an artificial
dependency edge between two sections.
I don't think PE-COFF provides a similar feature.

So does link.exe accept feature requests? The external symbol
limitation in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section looks quite
unfortunate...

It seems that /OPT:REF has another inefficiency - it cannot discard
non-COMDAT sections.
IIUC Fuchsia folks found that GC on non-COMDAT functions can save a
lot of space (40%? of PGO metadata sections). It is unfortunate that
link.exe doesn't discard non-COMDAT sections...

I have studied the GC state of Windows (IR/frontend) PGO a bit.
Below is my understanding. Hope folks can clarify it I make a mistake:)

For PGO, we need cnts/data/vals/names sections. cnts/data are main
metadata sections. vals is for value profiling.
For Windows, the cnts section is named .lprfc$M and the data section
is named .lprfd$M.
The garbage collection story is unfortunate.

If an IMAGE_COMDAT_SELECT_ASSOCIATIVE section defines an external
symbol, MSVC link.exe may report a spurious duplicate symbol error
(error LNK2005), even if the associative section would be discarded
after handling the leader symbol.
lld-link doesn’t have this limitation.
However, a portable implementation needs to work around MSVC link.exe.

Yes, that’s pretty much how I remember this working.

For a COMDAT .lprfd$M, its symbol must be external (linkonce_odr),
otherwise references to a non-prevailing symbol would cause an error.
Due to the limitation, .lprfd$M has to reside in its own COMDAT, no
sharing with .lprfc$M.

Yes. I’m sure the instrumented code references the counters, so the counters must be external. However, if the _profd symbol is not referenced directly from anywhere, perhaps we could make it internal and have it join the _profc* comdat to reverse the edge. If there are explicit _profd references from code, this won’t work.

Different COMDAT groups mean that the liveness of one .lprfc$M does
not make its associative .lprfd$M live.
Since a .lprfd$M may be unreferenced, we have to conservatively assume
all COMDAT .lprfd$M live.
Since .lprfc$M input sections parallel .lprfd$M input sections, we
have to conservatively assume all COMDAT .lprfc$M live.
For an external symbol, we use a /INCLUDE: directive in .drectve to
mark it as a GC root.
As a result, .drectve may have many /INCLUDE: directives, just to work
around the link.exe limitation.
IIRC Chrome folks have found that the /INCLUDE: directives can cause
huge memory usage.

Yes, I have wanted to implement an LLD-specific extension for this feature. The general problem is that symbol names are long, and we should avoid repeating them in the object file if at all possible. I think llvm.used and attribute((used)) should be compatible with link.exe, but it would be reasonable to make PGO an LLD-only feature to get these object file size savings. Specifically, we can copy the design of .llvm_addrsig to mark symbols as GC roots.

Note: for ELF we can use R_*_NONE to establish an artificial
dependency edge between two sections.
I don’t think PE-COFF provides a similar feature.

Yep, not so far as I am aware.

So does link.exe accept feature requests? The external symbol
limitation in an IMAGE_COMDAT_SELECT_ASSOCIATIVE section looks quite
unfortunate…

I have a few points of contact at Microsoft, but there isn’t a good general way to submit these kinds of low-level ABI feature requests.

The way the feature is designed, it only really works well when the associative data is static, unreferenced, and discovered only by section concatenation and iteration. This supports very many use cases: debug info, unwind info, dynamic initializers, asan global metadata, control flow guard metadata, others. In the PGO use case, I think we would be violating a hidden invariant that all the associative data should be static and unreferenced.

It seems that /OPT:REF has another inefficiency - it cannot discard
non-COMDAT sections.
IIUC Fuchsia folks found that GC on non-COMDAT functions can save a
lot of space (40%? of PGO metadata sections). It is unfortunate that
link.exe doesn’t discard non-COMDAT sections…

This is not a major issue in practice because with -ffunction-sections / -fdata-sections, everything is in a comdat section, they just all use IMAGE_COMDAT_SELECT_NODUPLICATES.

I guess I really only have one idea here: find a way to avoid _profd* references, and make _profc the comdat leader.

Circling back on this. Two patches have landed, decreasing the object
file/linked image sizes for ELF and Windows a bit.

* ⚙ D103355 [InstrProfiling] Delete linkage/visibility toggling for Windows [InstrProfiling] Delete
linkage/visibility toggling for Windows
* ⚙ D103372 [InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat [InstrProfiling] If no value
profiling, make data variable private and (for Windows) use one comdat

Windows still suffers from /INCLUDE:__covrec_DB956436E78DD5FAu strings
in the .drectve section.
Since .lcovfun$M is in a comdat for deduplication, it is subject to
default -opt:ref.
We have to use /INCLUDE because there is no cheaper ELF-style R_*_NONE.