RFC: ODR checker for Clang and LLD

Hi all,

I’d like to propose an ODR checker feature for Clang and LLD. The feature would be similar to gold’s --detect-odr-violations feature, but better: by integrating with Clang we can avoid relying on debug info and perform more precise matching.

The basic idea is that we use Clang’s ability to create ODR hashes for declarations. ODR hashes are computed using all information about a declaration that is ODR-relevant. If the flag -fdetect-odr-violations is passed, Clang will store the ODR hashes in a so-called ODR table in each object file. Each ODR table will contain a mapping from mangled declaration names to ODR hashes. At link time, the linker will read the ODR tables and report any mismatches.
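As a toy sketch of that link-time step (the table format here is purely illustrative, not the prototype's actual .odrtab format):

```python
# Toy model of the link-time check: each object file contributes a list of
# (mangled name, ODR hash) pairs, and the linker reports any name that is
# seen with two different hashes. The table format is illustrative only.

def check_odr_tables(odr_tables):
    """odr_tables: list of (object_name, [(mangled_name, odr_hash), ...])."""
    seen = {}        # mangled name -> (first hash seen, object it came from)
    violations = []
    for obj, table in odr_tables:
        for name, odr_hash in table:
            if name in seen and seen[name][0] != odr_hash:
                violations.append((name, seen[name][1], obj))
            else:
                seen.setdefault(name, (odr_hash, obj))
    return violations
```

For example, two objects whose tables give `_ZTS1S` different hashes would be reported as a violation between those two objects.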

To make this work:

  • LLVM will be extended with the ability to represent ODR tables in the IR and emit them to object files

  • Clang will be extended with the ability to emit ODR tables using ODR hashes

  • LLD will be extended to read ODR tables from object files

I have implemented a prototype of this feature. It is available here: https://github.com/pcc/llvm-project/tree/odr-checker, and some results from applying it to Chromium are here: crbug.com/726071
As you can see, it did indeed find a number of real ODR violations in Chromium, including some that wouldn’t be detectable using debug info.

If you’re interested in what the format of the ODR table would look like, that prototype shows pretty much what I had in mind, but I expect many other aspects of the implementation to change as it is upstreamed.

Thanks,

Very nice and simple implementation!

Do you have any statistics on how large these odr tables are compared to other object file data? I assume that if these tables contain full mangled symbol names, they could end up being very large, and we may want to share the symbol name strings with the overall string table in the .o.

Also, do you have any numbers on the performance of your initial implementation?

W.r.t. LLD and having it always on by default (and hence making it as fast as possible), it seems like right now you are implementing the checking process with a hash table. That’s simple and fine for a first implementation, but it’s probably worth mentioning in a comment that the problem of checking the tables, at least from the linker’s perspective, fits into a map-reduce pattern and could easily be parallelized if needed. E.g. a parallel sort to coalesce all entries for symbols of the same name, followed by a parallel forEach to check each bucket with the same symbol name (roughly speaking).
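That sort-then-bucket formulation can be sketched as follows (a sequential toy model, not LLD code; the point is that each bucket is independent):

```python
# Toy sketch of the sort-then-check formulation: coalesce all (name, hash)
# entries with a sort, then each same-name bucket can be verified
# independently - and hence in parallel, though this sketch is sequential.
from itertools import groupby

def check_by_buckets(entries):
    """entries: flat list of (mangled_name, odr_hash) from all objects."""
    entries = sorted(entries, key=lambda e: e[0])   # the parallelizable sort
    violations = []
    for name, bucket in groupby(entries, key=lambda e: e[0]):
        if len({h for _, h in bucket}) > 1:         # conflicting definitions
            violations.append(name)
    return violations
```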

Even better than doing it faster is just doing less work. There’s a lot of work that the linker is already doing that may be reusable for the ODR checking.
E.g.

  • maybe we could get the coalescing step as a byproduct of our existing string deduping, which we are generally doing anyway.
  • we are already coalescing symbol names for the symbol table. If the ODR table is keyed off of symbols in the binary that we are inserting into the symbol table, then I think we could do the entire ODR check with no extra “string” work on LLD’s part.

I see Rui already mentioned some of this in https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
You mentioned that not everything is necessarily directly keyed on a symbol (such as types), but I think that it would really simplify things if the check was done that way. Do you have any idea exactly how much of what we want to check is not keyed on symbols? If most things are keyed on symbols, then for the things that are not we can just emit extra symbols prefixed by __clang_odr_check_ or whatever.

The issue of retaining the ODR check for functions even if they get inlined may inherently pose an extra cost that can’t be folded into existing work the linker is doing, so there might be a reason for Clang to have a default mode that has practically no linking overhead and one that does more thorough checking but imposes extra linking overhead. Think something like a crazy Boost library with thousands of functions that get inlined away, but have gigantic mangled names and so are precisely the ones that are going to impose extra cost on the linker. Simply due to the extra volume of strings that the linker would need to look at, I don’t think there’s a way to include checking of all inlined functions “for free” at the linker level using the symbol approach.

I guess those inlined functions would still have those symbol names in debug info (I think?), so piggybacking on the string deduplication we’re already doing might make it possible to fold away the work in that case (but then again, it would still impose extra cost with split DWARF…).

Anyway, let’s wait to see what the actual performance numbers are.

– Sean Silva

Does this need LLVM support - or is there some generic representation that could be used instead? (I guess LLVM would want to be aware of it when merging modules though, so maybe it’s worth having a first-class representation - though LLVM module linking could special case a section the same way the linker could/would - not sure what’s the better choice there)

I was thinking (hand-wavingly vague, since I don’t know that much about object files, etc.) of one of those auto-appending sections and an array of const char* + hash attributed to that section. (Then even without an ODR-checking-aware linker (which would compare and discard these sections), the data could be merged, and a post-processing pass on the binary could still point out ODR violations without anything in the toolchain (except Clang) needing to support this extra info.)

As to the performance on the linker side, one thing I’d like to note is that we might be able to use a probabilistic approach (e.g. a bloom-filter-ish data structure). In that sense, .odrtab doesn’t have to contain complete information to detect ODR violations. Instead, it can contain hints. If the table allows us to quickly verify that there’s no ODR violation with 99.99% probability for each identifier, for example, then we can fall back to the debug info to see if the remaining 0.01% are real, and the cost of false positives is probably negligible.

I can imagine for example that we can store a 32-bit hash for each mangled name and compare hashes instead of large strings.
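A minimal sketch of that idea (the 32-bit CRC here is just a stand-in for whatever name hash would actually be used, and the slow confirmation path is left abstract):

```python
# Sketch: key the check on a small hash of the mangled name instead of the
# name itself. A mismatch only tells us a violation is *probable*; the rare
# suspects would then be confirmed by a slower fallback (e.g. debug info).
import zlib

def name_key(mangled_name):
    return zlib.crc32(mangled_name.encode())   # stand-in 32-bit name hash

def probable_violations(entries):
    """entries: list of (mangled_name, odr_hash) -> names needing the slow path."""
    seen = {}
    suspects = []
    for name, odr_hash in entries:
        key = name_key(name)
        if key in seen and seen[key] != odr_hash:
            suspects.append(name)              # might also be a key collision
        else:
            seen.setdefault(key, odr_hash)
    return suspects
```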

As to the performance on the linker side, one thing I’d like to note is that we might be able to use a probabilistic approach (e.g. a bloom-filter-ish data structure). In that sense, .odrtab doesn’t have to contain complete information to detect ODR violations. Instead, it can contain hints. If the table allows us to quickly verify that there’s no ODR violation with 99.99% probability for each identifier, for example, then we can fall back to the debug info to see if the remaining 0.01% are real, and the cost of false positives is probably negligible.

The main (only?) place where false positives arise is in mixed binaries containing both Clang and GCC object files, which make different choices about exactly where the starting line of a function is, for example.

So perhaps there wouldn’t be false positives if the only things considered were Clang-built binaries (those containing this special odrtab section). Though the gold ODR checker may’ve been more restrictive in what it caught - with the ODR hashes Clang’s creating, this can be far more aggressive ODR checking - and if it always fell back to debug info, there might be many false negatives? (The two hashes are different for the same mangled name, but the debug info doesn’t encode that difference - so the diagnostic would not be emitted.)

Falling back to debug info also means a lot more code in LLD to support doing that analysis on the debug info.

I can imagine for example that we can store a 32-bit hash for each mangled name and compare hashes instead of large strings.

Many of the mangled names (those of functions) may already be present in strtab? Would it be valid/reasonable for this section to refer to those strings to avoid some duplication?

Very nice and simple implementation!

Do you have any statistics on how large these odr tables are compared to
other object file data? I assume that if these tables contain full mangled
symbol names, they could end up being very large and may want to share the
symbol name strings with the overall string table in the .o

Looking at Chromium's object files it looks like the total size of the
odrtabs is about 50% of the total size of the object files, which isn't
great. The current implementation only looks at records, so I imagine that
it would be hard to share any of the strings that I'm currently creating.
(I guess it's possible that some types will have a mangled vtable name in
the string table, so we may be able to share a little that way.) Note
however that this was without debug info.

One option for reducing size would be to
1) store hashes of ODR names in ODR tables, per Rui's suggestion (alongside
a reference to the name itself in the string table)
2) compress the string table for the ODR names with a standard compression
algorithm like gzip.
This wouldn't seem to affect link time performance much because I think we
should only need to look at the strings if we see an ODR name hash match
together with an ODR hash mismatch, which would mean an ODR violation with
a high probability (i.e. unless there was an ODR name hash collision, we
have found an ODR violation). If we don't expect a lot of sharing with
regular string tables (see below), it seems even more reasonable.
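Roughly, the scheme could look like this (the record layout, hash choice, and compression algorithm are all illustrative assumptions, not a proposed format):

```python
# Sketch of the size reduction: entries hold (name hash, ODR hash, string
# table offset); the ODR-name strings live in a separately compressed blob
# that only needs decompressing when a name-hash match coincides with an
# ODR-hash mismatch.
import hashlib
import zlib

def build_odr_table(decls):
    """decls: list of (mangled_name, odr_hash) -> (entries, compressed strtab)."""
    entries, strtab, offset = [], [], 0
    for name, odr_hash in decls:
        name_hash = int.from_bytes(
            hashlib.blake2b(name.encode(), digest_size=8).digest(), "little")
        entries.append((name_hash, odr_hash, offset))
        strtab.append(name.encode() + b"\0")   # NUL-terminated, like .strtab
        offset += len(name) + 1
    return entries, zlib.compress(b"".join(strtab))
```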

Also, do you have any numbers on the performance of your initial

implementation?

I measured the link time for chromium's unit_tests (the largest single
binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks.
So about 30% overhead, but in absolute terms it doesn't seem too bad. So I
think this may be acceptable for an initial implementation, but it
certainly seems worth trying to do better.

W.r.t. LLD and having it always on by default (and hence making it as fast

as possible), it seems like right now you are implementing the checking
process with a hash table. That's simple and fine for a first
implementation, but it's probably worth mentioning in a comment the problem
of checking the tables, at least from the linker's perspective, does fit
into a map-reduce pattern and could be easily parallelized if needed. E.g.
a parallel sort to coalesce all entries for symbols of the same name
followed by a parallel forEach to check each bucket with the same symbol
name (roughly speaking).

Right, that's one approach. I was thinking of a simpler approach where at
compile time we sort ODR names by hash and partition them using (say) the
upper bits of the hash, so that at link time we can have N threads each
building a hash table for a specific partition.

And of course this work can be started right after symbol resolution
finishes and parallelised with the rest of the work done by the linker.
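The partitioning step could be sketched like this (64-bit name hashes and two partition bits are assumptions; the per-thread hash-table checking itself is elided):

```python
# Sketch of hash-partitioned checking: route each (name hash, ODR hash)
# entry to a partition by the upper bits of the name hash, so that N
# threads can each build and probe a private hash table for one partition.

def partition_entries(entries, bits=2, hash_width=64):
    """entries: list of (name_hash, odr_hash) -> list of per-thread buckets."""
    parts = [[] for _ in range(1 << bits)]
    for name_hash, odr_hash in entries:
        parts[name_hash >> (hash_width - bits)].append((name_hash, odr_hash))
    return parts   # each bucket can be checked by an independent thread
```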

Even better than doing it faster is just doing less work. There's a lot of

work that the linker is already doing that may be reusable for the ODR
checking.
E.g.
- maybe we could get the coalescing step as a byproduct of our existing
string deduping, which we are generally doing anyway.
- we are already coalescing symbol names for the symbol table. If the ODR
table is keyed off of symbols in the binary that we are inserting into the
symbol table, then I think we could do the entire ODR check with no extra
"string" work on LLD's part.

I see Rui already mentioned some of this in
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
You mentioned that not everything is necessarily directly keyed on a
symbol (such as types), but I think that it would really simplify things if
the check was done as such. Do you have any idea exactly how much of the
things that we want to check are not keyed on symbols? If most things are
keyed on symbols, for the things we are not we can just emit extra symbols
prefixed by __clang_odr_check_ or whatever.

Since the current implementation only works with records there is basically
zero overlap right now between ODR names and symbols. I suppose that I
could estimate the amount of function overlap in a hypothetical
implementation that computes ODR hashes of functions by comparing the
number of *_odr functions after clang has finished IRgen with the number
after optimization finishes. This of course would be strictly more than
functions + types.

The issue of retaining the ODR check for functions even if they get
inlined may inherently pose an extra cost that can't be folded into
existing work the linker is doing, so there might be a reason for clang to
have a default mode that has practically no linking overhead and one that
does more thorough checking but imposes extra linking overhead. Think
something like a crazy boost library with thousands of functions that get
inlined away, but have gigantic mangled names and so precisely are the ones
that are going to impose extra cost on the linker. Simply due to the extra
volume of strings that the linker would need to look at, I don't think
there's a way to include checking of all inlined function "for free" at the
linker level using the symbol approach.

I guess those inlined functions would still have those symbol names in

Does this need LLVM support - or is there some generic representation that
could be used instead? (I guess LLVM would want to be aware of it when
merging modules though, so maybe it's worth having a first-class
representation - though LLVM module linking could special case a section
the same way the linker could/would - not sure what's the better choice
there)

The only thing that LLVM needs to do is to have some way to store a blob
that will be emitted to the object file. In my prototype I just create a
GlobalVariable with private linkage; I think in the final version I will
use an MDString referenced by a named MD node. I don't think we would want
a higher level representation -- I'd imagine that the blob would be
entirely a property of the source code, so I can't see anything that an
IR-level pass would want to do with it. It's similar to some parts of debug
info in that there's no real benefit to representing it as anything other
than a blob.

I was thinking (hand-wavingly vague since I don't know that much about

object files, etc) one of those auto-appending sections and an array of
constchar*+hash attributed to that section. (then even without an
odr-checking aware linker (which would compare and discard these sections)
the data could be merged & a post-processing pass on the binary could still
point out ODR violations without anything in the toolchain (except clang)
needing to support this extra info)

Linkers merge section contents by section name, so you wouldn't need
anything other than for the object files to agree on a section name. The
odrtab header in my prototype has a size field, so we could use that to
split an .odrtab section into multiple odrtabs.

Peter

Peter Collingbourne via llvm-dev <llvm-dev@lists.llvm.org> writes:

Hi all,

I'd like to propose an ODR checker feature for Clang and LLD. The feature
would be similar to gold's --detect-odr-violations feature, but better: we
can rely on integration with clang to avoid relying on debug info and to
perform more precise matching.

It seems a really nice idea. Do you have performance numbers?

I agree with others who suggested putting the strings in .strtab. For
-O0 at least it should save many duplicates.

Cheers,
Rafael

Very nice and simple implementation!

Do you have any statistics on how large these odr tables are compared to
other object file data? I assume that if these tables contain full mangled
symbol names, they could end up being very large and may want to share the
symbol name strings with the overall string table in the .o

Looking at Chromium's object files it looks like the total size of the
odrtabs is about 50% of the total size of the object files, which isn't
great. The current implementation only looks at records, so I imagine that
it would be hard to share any of the strings that I'm currently creating.
(I guess it's possible that some types will have a mangled vtable name in
the string table, so we may be able to share a little that way.) Note
however that this was without debug info.

One option for reducing size would be to
1) store hashes of ODR names in ODR tables, per Rui's suggestion
(alongside a reference to the name itself in the string table)
2) compress the string table for the ODR names with a standard compression
algorithm like gzip.
This wouldn't seem to affect link time performance much because I think we
should only need to look at the strings if we see an ODR name hash match
together with an ODR hash mismatch, which would mean an ODR violation with
a high probability (i.e. unless there was an ODR name hash collision, we
have found an ODR violation). If we don't expect a lot of sharing with
regular string tables (see below), it seems even more reasonable.

Neat observation!

FWIW, it is a birthday-problem type situation though, so with a 32-bit hash
we would expect a collision once there are around 2^16 distinct hashes (and
2^16 seems pretty easy to hit in a large project). So 64-bit hashes might be
preferable.
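The standard birthday-bound approximation makes this concrete; a small sketch:

```python
# Birthday-bound estimate: the probability of at least one collision among
# n uniformly distributed b-bit hashes is roughly 1 - exp(-n*(n-1)/2^(b+1)),
# which becomes likely once n approaches 2^(b/2).
import math

def collision_probability(n, bits):
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2.0 ** bits))
```

With 2^16 distinct names, a 32-bit hash already gives a collision probability of roughly 0.4, while a 64-bit hash keeps it around 10^-10.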

Also, do you have any numbers on the performance of your initial

implementation?

I measured the link time for chromium's unit_tests (the largest single
binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks.
So about 30% overhead, but in absolute terms it doesn't seem too bad. So I
think this may be acceptable for an initial implementation, but it
certainly seems worth trying to do better.

I know that things aren't currently apples-to-apples, but how does that
compare to gold?

W.r.t. LLD and having it always on by default (and hence making it as fast

as possible), it seems like right now you are implementing the checking
process with a hash table. That's simple and fine for a first
implementation, but it's probably worth mentioning in a comment the problem
of checking the tables, at least from the linker's perspective, does fit
into a map-reduce pattern and could be easily parallelized if needed. E.g.
a parallel sort to coalesce all entries for symbols of the same name
followed by a parallel forEach to check each bucket with the same symbol
name (roughly speaking).

Right, that's one approach. I was thinking of a simpler approach where at
compile time we sort ODR names by hash and partition them using (say) the
upper bits of the hash, so that at link time we can have N threads each
building a hash table for a specific partition.

And of course this work can be started right after symbol resolution
finishes and parallelised with the rest of the work done by the linker.

Even better than doing it faster is just doing less work. There's a lot of

work that the linker is already doing that may be reusable for the ODR
checking.
E.g.
- maybe we could get the coalescing step as a byproduct of our existing
string deduping, which we are generally doing anyway.
- we are already coalescing symbol names for the symbol table. If the ODR
table is keyed off of symbols in the binary that we are inserting into the
symbol table, then I think we could do the entire ODR check with no extra
"string" work on LLD's part.

I see Rui already mentioned some of this in
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
You mentioned that not everything is necessarily directly keyed on a
symbol (such as types), but I think that it would really simplify things if
the check was done as such. Do you have any idea exactly how much of the
things that we want to check are not keyed on symbols? If most things are
keyed on symbols, for the things we are not we can just emit extra symbols
prefixed by __clang_odr_check_ or whatever.

Since the current implementation only works with records there is
basically zero overlap right now between ODR names and symbols. I suppose
that I could estimate the amount of function overlap in a hypothetical
implementation that computes ODR hashes of functions by comparing the
number of *_odr functions after clang has finished IRgen with the number
after optimization finishes. This of course would be strictly more than
functions + types.

Wouldn't any function or symbol using the record type have the type name
somewhere in it? If we used an offset+length encoding (instead of offset +
NUL termination) we might be able to reuse it then (at some cost in finding
the reference). With debug info surely there is some sort of string
representing the record name or something like that, no?

I guess we may have to have our "low-overhead" user-facing behavior be a
bit more nuanced. E.g.:
1. does this feature bloat object files significantly
2. does this feature slow down link times significantly

Intuitively, it seems like we should be able to get 1. when debug info
happens to be enabled (not sure about split dwarf?) and possibly in all
cases at the cost of complexity. We may be able to get 2. in all cases with
proper design.

-- Sean Silva

Peter Collingbourne via llvm-dev <llvm-dev@lists.llvm.org> writes:

> Hi all,
>
> I'd like to propose an ODR checker feature for Clang and LLD. The feature
> would be similar to gold's --detect-odr-violations feature, but better:
we
> can rely on integration with clang to avoid relying on debug info and to
> perform more precise matching.

It seems a really nice idea. Do you have performance numbers?

I gave a few perf (and binary size) numbers in my response to Sean.
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113865.html

I agree with others who suggested putting the strings in .strtab. For

-O0 at least it should save many duplicates.

Maybe. As I wrote in my message to Sean it isn't clear exactly how many
duplicates we will have in practice, so it may be better to use a separate
string table and compress it. In any case it does seem to require closer
investigation and maybe more prototyping.

Thanks,

Very nice and simple implementation!

Do you have any statistics on how large these odr tables are compared to
other object file data? I assume that if these tables contain full mangled
symbol names, they could end up being very large and may want to share the
symbol name strings with the overall string table in the .o

Looking at Chromium's object files it looks like the total size of the
odrtabs is about 50% of the total size of the object files, which isn't
great. The current implementation only looks at records, so I imagine that
it would be hard to share any of the strings that I'm currently creating.
(I guess it's possible that some types will have a mangled vtable name in
the string table, so we may be able to share a little that way.) Note
however that this was without debug info.

One option for reducing size would be to
1) store hashes of ODR names in ODR tables, per Rui's suggestion
(alongside a reference to the name itself in the string table)
2) compress the string table for the ODR names with a standard
compression algorithm like gzip.
This wouldn't seem to affect link time performance much because I think
we should only need to look at the strings if we see an ODR name hash match
together with an ODR hash mismatch, which would mean an ODR violation with
a high probability (i.e. unless there was an ODR name hash collision, we
have found an ODR violation). If we don't expect a lot of sharing with
regular string tables (see below), it seems even more reasonable.

Neat observation!

FWIW, it is a birthday-problem type situation though, so with a 32-bit
hash we would expect a collision once there are around 2^16 distinct hashes
(and 2^16 seems pretty easy to hit in a large project). So 64-bit hashes
might be preferable.

Oh right, good point, using a 64-bit hash does seem like a good idea here.

Also, do you have any numbers on the performance of your initial

implementation?

I measured the link time for chromium's unit_tests (the largest single
binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks.
So about 30% overhead, but in absolute terms it doesn't seem too bad. So I
think this may be acceptable for an initial implementation, but it
certainly seems worth trying to do better.

I know that things aren't currently apples-to-apples, but how does that
compare to gold?

I will measure it.

W.r.t. LLD and having it always on by default (and hence making it as

fast as possible), it seems like right now you are implementing the
checking process with a hash table. That's simple and fine for a first
implementation, but it's probably worth mentioning in a comment the problem
of checking the tables, at least from the linker's perspective, does fit
into a map-reduce pattern and could be easily parallelized if needed. E.g.
a parallel sort to coalesce all entries for symbols of the same name
followed by a parallel forEach to check each bucket with the same symbol
name (roughly speaking).

Right, that's one approach. I was thinking of a simpler approach where at
compile time we sort ODR names by hash and partition them using (say) the
upper bits of the hash, so that at link time we can have N threads each
building a hash table for a specific partition.

And of course this work can be started right after symbol resolution
finishes and parallelised with the rest of the work done by the linker.

Even better than doing it faster is just doing less work. There's a lot

of work that the linker is already doing that may be reusable for the ODR
checking.
E.g.
- maybe we could get the coalescing step as a byproduct of our existing
string deduping, which we are generally doing anyway.
- we are already coalescing symbol names for the symbol table. If the
ODR table is keyed off of symbols in the binary that we are inserting into
the symbol table, then I think we could do the entire ODR check with no
extra "string" work on LLD's part.

I see Rui already mentioned some of this in
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
You mentioned that not everything is necessarily directly keyed on a
symbol (such as types), but I think that it would really simplify things if
the check was done as such. Do you have any idea exactly how much of the
things that we want to check are not keyed on symbols? If most things are
keyed on symbols, for the things we are not we can just emit extra symbols
prefixed by __clang_odr_check_ or whatever.

Since the current implementation only works with records there is
basically zero overlap right now between ODR names and symbols. I suppose
that I could estimate the amount of function overlap in a hypothetical
implementation that computes ODR hashes of functions by comparing the
number of *_odr functions after clang has finished IRgen with the number
after optimization finishes. This of course would be strictly more than
functions + types.

Wouldn't any function or symbol using the record type have the type name
somewhere in it? If we used an offset+length encoding (instead of offset +
NUL termination) we might be able to reuse it then (at some cost in finding
the reference).

That may be possible with some work in the string table builder. But at
that point of course we're not dealing with regular symbols any more. I
guess we could have two ODR tables per object file: an array of (ODR hash,
location) tuples for ODR names that correspond to symbol table symbols
(i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR
hash, location) tuples for all other ODR names. I guess if we wanted a "low
overhead" mode we could just omit the second table or put fewer symbols in
it.

With debug info surely there is some sort of string representing the record

name or something like that, no?

Not the record name on its own (they do appear but a bit awkwardly -- each
namespace component is stored in a separate string), but if the record has
at least one member function the mangled type name will appear somewhere in
.debug_str, so we could in principle reuse that with the offset/length
trick.

I guess we may have to have our "low-overhead" user-facing behavior be a

bit more nuanced. E.g.:
1. does this feature bloat object files significantly
2. does this feature slow down link times significantly

Intuitively, it seems like we should be able to get 1. when debug info
happens to be enabled (not sure about split dwarf?) and possibly in all
cases at the cost of complexity. We may be able to get 2. in all cases with
proper design.

I think that would be my rough assessment as well. I think we have a good
shot at 1 for all cases with some of the ideas that have been mentioned
already. If we can avoid creating dependencies on DWARF I think that would
be ideal -- I'd ideally like this to work for COFF as well, where you'd
typically expect to find CodeView in object files. If I were to try this I
think the first thing that I would try is hash/compression combined with
the two ODR tables (no reuse for non-symbol ODR names to start with, as
compression may be enough on its own).

Peter

If you IR-link two modules, you'd want to check that the hash matches and
diagnose immediately before dropping linkonce_odr functions, right?

2017-06-07 16:32 GMT-07:00 Peter Collingbourne via llvm-dev <
llvm-dev@lists.llvm.org>:

Does this need LLVM support - or is there some generic representation
that could be used instead? (I guess LLVM would want to be aware of it when
merging modules though, so maybe it's worth having a first-class
representation - though LLVM module linking could special case a section
the same way the linker could/would - not sure what's the better choice
there)

The only thing that LLVM needs to do is to have some way to store a blob
that will be emitted to the object file. In my prototype I just create a
GlobalVariable with private linkage, I think in the final version I will
use an MDString referenced by a named MD node. I don't think we would want
a higher level representation -- I'd imagine that the blob would be
entirely a property of the source code, so I can't see anything that an
IR-level pass would want to do with it. It's similar to some parts of debug
info in that there's no real benefit to representing it as anything other
than a blob.

If you IR-link two modules, you'd want to check that the hash matches and
diagnose immediately before dropping linkonce_odr functions, right?

The ODR table (including its string table) is created by the compiler and
is unaffected by dropped functions. Functions can be dropped by the
optimizer in the non-LTO case as well, of course, and we'd still want to be
able to diagnose ODR mismatches in those functions.

In any case, I think we'd want the ODR checker to always be driven by the
linker, not the IRMover. This is so that we can properly diagnose ODR
mismatches between bitcode files and regular object files, and avoid
needing to deal with duplicate diagnostics (say if we import the same pair
of ODR-mismatching modules into two different ThinLTO backends), and avoid
bad interactions with the ThinLTO cache (we don't want to miss diagnostics
because of a ThinLTO cache hit). The way I imagine that this would work is
that we'd add something to lto::InputFile that gives the client access to
the ODR table.

That said, we may want to reconsider the blob representation if we want to
try and share strings between the object file's string table and the ODR
table, as discussed elsewhere.

Peter

Does it mean that when we merge two bitcode files, the two ODR tables for
these files are merged as well? In which case, if both files contain
function `foo`, we will have two hashes for `foo` in the table of the
resulting file?

2017-06-07 16:32 GMT-07:00 Peter Collingbourne via llvm-dev <
llvm-dev@lists.llvm.org>:


The resulting file would have two ODR tables, both of which would have an
entry for the function 'foo'.

Peter

Very nice and simple implementation!

Do you have any statistics on how large these odr tables are compared
to other object file data? I assume that if these tables contain full
mangled symbol names, they could end up being very large and may want to
share the symbol name strings with the overall string table in the .o

Looking at Chromium's object files, the total size of the ODR tables is
about 50% of the total size of the object files, which isn't great. The
current implementation only looks at records, so I imagine that
it would be hard to share any of the strings that I'm currently creating.
(I guess it's possible that some types will have a mangled vtable name in
the string table, so we may be able to share a little that way.) Note
however that this was without debug info.

One option for reducing size would be to
1) store hashes of ODR names in ODR tables, per Rui's suggestion
(alongside a reference to the name itself in the string table)
2) compress the string table for the ODR names with a standard
compression algorithm like gzip.
This wouldn't seem to affect link time performance much because I think
we should only need to look at the strings if we see an ODR name hash match
together with an ODR hash mismatch, which would mean an ODR violation with
high probability (i.e. unless there was an ODR name hash collision, we
have found an ODR violation). If we don't expect a lot of sharing with
regular string tables (see below), it seems even more reasonable.
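As a rough illustration of that scheme, here is a minimal sketch of the link-time check (the `OdrEntry` layout and the `findSuspectedViolations` helper are hypothetical names for illustration, not the prototype's actual format):

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical ODR table entry: the full mangled name lives in the
// (compressed) ODR string table; the entry itself carries only a 64-bit
// hash of that name plus clang's ODR hash of the declaration.
struct OdrEntry {
  uint64_t NameHash;   // hash of the mangled ODR name
  uint64_t OdrHash;    // ODR hash of the declaration
  uint32_t NameOffset; // offset into the ODR string table
};

// The check keys on the name hash alone. The string table only needs to
// be consulted (and hence decompressed) for entries whose name hashes
// match but whose ODR hashes differ -- almost certainly a real ODR
// violation unless the name hashes themselves collided.
std::vector<uint64_t>
findSuspectedViolations(const std::vector<OdrEntry> &Entries) {
  std::unordered_map<uint64_t, uint64_t> Seen; // NameHash -> OdrHash
  std::vector<uint64_t> SuspectNameHashes;
  for (const OdrEntry &E : Entries) {
    auto [It, Inserted] = Seen.try_emplace(E.NameHash, E.OdrHash);
    if (!Inserted && It->second != E.OdrHash)
      SuspectNameHashes.push_back(E.NameHash);
  }
  return SuspectNameHashes;
}
```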

Neat observation!

FWIW, it is a birthday-problem type situation though, so for a 32-bit
hash we would expect a collision once we reach about 2^16 distinct hashes
(and 2^16 seems pretty easy to hit in a large project). So 64-bit hashes
might be preferable.

Oh right, good point, using a 64-bit hash does seem like a good idea here.
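The back-of-the-envelope numbers can be sanity-checked with the standard birthday-problem approximation (a sketch; `collisionProbability` is just an illustrative helper):

```cpp
#include <cmath>

// Birthday-bound approximation: the probability of at least one
// collision among n distinct values drawn uniformly from a b-bit hash
// space is roughly 1 - exp(-n^2 / 2^(b+1)).
double collisionProbability(double n, int bits) {
  return 1.0 - std::exp(-(n * n) / (2.0 * std::pow(2.0, bits)));
}
```

For 2^16 distinct names this gives about a 39% chance of at least one collision with 32-bit hashes, versus roughly 10^-10 with 64-bit hashes.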

Also, do you have any numbers on the performance of your initial
implementation?

I measured the link time for chromium's unit_tests (the largest single
binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks.
So about 30% overhead, but in absolute terms it doesn't seem too bad. So I
think this may be acceptable for an initial implementation, but it
certainly seems worth trying to do better.

I know that things aren't currently apples-to-apples, but how does that
compare to gold?

I will measure it.

For that unit_tests binary I measured the overhead at about 5 seconds
(average of 10 runs). That is with debug info, of course.

W.r.t. LLD and having it always on by default (and hence making it as fast
as possible), it seems like right now you are implementing the checking
process with a hash table. That's simple and fine for a first
implementation, but it's probably worth mentioning in a comment the problem
of checking the tables, at least from the linker's perspective, does fit
into a map-reduce pattern and could be easily parallelized if needed. E.g.
a parallel sort to coalesce all entries for symbols of the same name
followed by a parallel forEach to check each bucket with the same symbol
name (roughly speaking).

Right, that's one approach. I was thinking of a simpler approach where
at compile time we sort ODR names by hash and partition them using (say)
the upper bits of the hash, so that at link time we can have N threads each
building a hash table for a specific partition.

And of course this work can be started right after symbol resolution
finishes and parallelised with the rest of the work done by the linker.

Even better than doing it faster is just doing less work. There's a lot
of work that the linker is already doing that may be reusable for the ODR
checking.
E.g.
- maybe we could get the coalescing step as a byproduct of our existing
string deduping, which we are generally doing anyway.
- we are already coalescing symbol names for the symbol table. If the
ODR table is keyed off of symbols in the binary that we are inserting into
the symbol table, then I think we could do the entire ODR check with no
extra "string" work on LLD's part.

I see Rui already mentioned some of this in
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
You mentioned that not everything is necessarily directly keyed on a
symbol (such as types), but I think that it would really simplify things if
the check was done as such. Do you have any idea exactly how much of the
things that we want to check are not keyed on symbols? If most things are
keyed on symbols, for the things we are not we can just emit extra symbols
prefixed by __clang_odr_check_ or whatever.

Since the current implementation only works with records there is
basically zero overlap right now between ODR names and symbols. I suppose
that I could estimate the amount of function overlap in a hypothetical
implementation that computes ODR hashes of functions by comparing the
number of *_odr functions after clang has finished IRgen with the number
after optimization finishes. This of course would be strictly more than
functions + types.

Wouldn't any function or symbol using the record type have the type name
somewhere in it? If we used an offset+length encoding (instead of offset +
NUL termination) we might be able to reuse it then (at some cost in finding
the reference).

That may be possible with some work in the string table builder. But at
that point of course we're not dealing with regular symbols any more. I
guess we could have two ODR tables per object file: an array of (ODR hash,
location) tuples for ODR names that correspond to symbol table symbols
(i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR
hash, location) tuples for all other ODR names. I guess if we wanted a "low
overhead" mode we could just omit the second table or put fewer symbols in
it.
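The two-table layout could look something like this (the struct names and field sizes here are illustrative assumptions, not the prototype's actual encoding):

```cpp
#include <cstdint>

// Table 1 covers ODR names that correspond to symbol table symbols, so
// it needs no name bytes at all -- entries are keyed by symbol index.
struct SymbolKeyedEntry {
  uint32_t SymbolIndex; // index into the object file's symbol table
  uint64_t OdrHash;
  uint32_t Location;    // diagnostic location reference
};

// Table 2 covers everything else (e.g. types) and pays for a string
// table reference.
struct NamedEntry {
  uint32_t NameOffset;  // offset into the ODR string table
  uint64_t OdrHash;
  uint32_t Location;
};

// Packed on-disk size: 4 + 8 + 4 = 16 bytes per entry in either table,
// plus the string table that only table 2 references. A "low overhead"
// mode would simply emit zero NamedEntry records.
uint64_t odrTablesSize(uint64_t NumSymbolKeyed, uint64_t NumNamed,
                       uint64_t StringTableBytes) {
  return NumSymbolKeyed * 16 + NumNamed * 16 + StringTableBytes;
}
```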

With debug info surely there is some sort of string representing the
record name or something like that, no?

Not the record name on its own (they do appear but a bit awkwardly -- each
namespace component is stored in a separate string), but if the record has
at least one member function the mangled type name will appear somewhere in
.debug_str, so we could in principle reuse that with the offset/length
trick.

I guess we may have to have our "low-overhead" user-facing behavior be a
bit more nuanced. E.g.:
1. does this feature bloat object files significantly
2. does this feature slow down link times significantly

Intuitively, it seems like we should be able to get 1. when debug info
happens to be enabled (not sure about split dwarf?) and possibly in all
cases at the cost of complexity. We may be able to get 2. in all cases with
proper design.

I think that would be my rough assessment as well. I think we have a good
shot at 1 for all cases with some of the ideas that have been mentioned
already. If we can avoid creating dependencies on DWARF I think that would
be ideal -- I'd ideally like this to work for COFF as well, where you'd
typically expect to find CodeView in object files. If I were to try this I
think the first thing that I would try is hash/compression combined with
the two ODR tables (no reuse for non-symbol ODR names to start with, as
compression may be enough on its own).

I developed a second prototype which uses hash/compression with no attempt
to reuse. It is available here:
https://github.com/pcc/llvm-project/tree/odr-checker2

For Chromium the object file size overhead was 536566007 bytes, or in
relative terms about 25%, or about 4% with debug info. I measured perf
overhead for unit_tests at about 6%, but after I moved the checker onto
another thread, the overhead disappeared into the noise.

I'm reasonably happy with these figures, at least for a first
implementation. We may be able to do even better for file size with reuse,
but I'd leave that for version 2.

Peter

Still seems like quite a big increase.

Any chance of compression? Could this go into .dwo files with Fission and be checked by dwp instead, perhaps?

What’s the story with compatibility between versions, then? Is there a version header? Will old formats be supported by lld indefinitely? Not at all?

  • Dave


Still seems like quite a big increase.

Any chance of compression?

That was with compression -- the implementation compresses the parts of the
ODR table that aren't hashes (aside from the header and the Clang version,
which is a small fixed cost), as well as the string table. The hashes were
left uncompressed because they are in the critical path of the linker and
because I imagine that they wouldn't really be that compressible.

So I think the remaining gains would either be through limiting the number
of ODR table entries, or through reuse of data.

Limiting might be something to explore -- one possibility is that we could
limit the ODR table entries to the declarations that are "used" by a
particular translation unit (it appears that Clang tracks something like
that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly
what we need -- I think we would basically need to test for reference
reachability from the functions/globals that are IRgen'd).
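The reference-reachability test could be sketched as a simple worklist walk over a declaration reference graph (a toy model with string-keyed maps; clang's real representation is of course the AST):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Sketch of the "emit only what's used" idea: starting from the
// declarations that are actually IRgen'd (the roots), walk the
// reference graph and keep only the declarations reachable from them.
// Refs maps each declaration to the declarations it references.
std::set<std::string>
reachableDecls(const std::map<std::string, std::vector<std::string>> &Refs,
               const std::vector<std::string> &Roots) {
  std::set<std::string> Seen(Roots.begin(), Roots.end());
  std::vector<std::string> Work(Roots.begin(), Roots.end());
  while (!Work.empty()) {
    std::string D = Work.back();
    Work.pop_back();
    auto It = Refs.find(D);
    if (It == Refs.end())
      continue;
    for (const std::string &Ref : It->second)
      if (Seen.insert(Ref).second) // not seen before
        Work.push_back(Ref);
  }
  return Seen;
}
```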

In terms of reuse, it seems that of the 536566007 bytes of
overhead, 319309579 were the compressed part of the ODR tables. So even if
we achieved 100% sharing, with the current scheme I think that our minimum
achievable overhead would be ~15% (no debug info) or ~2% (with debug info).

Could this go into .dwo files with Fission and be checked by dwp instead,
perhaps?

I think it could also work that way, yes.

I'm reasonably happy with these figures, at least for a first
implementation. We may be able to do even better for file size with reuse,
but I'd leave that for version 2.

What's the story with compatibility between versions, then? Is there a
version header?

Yes, the header contains a version number.

Will old formats be supported by lld indefinitely? Not at all?

I think we should drop support for old formats when we introduce a new
format. My understanding is that the ODR hash can change whenever Clang
changes (the implementation only performs ODR checking if all ODR tables
were produced by the same revision of Clang), so there wouldn't seem to be
a huge benefit in keeping support for old formats around.
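The gating this implies could be sketched as follows (hypothetical header fields and helper name; the prototype's real header layout may differ):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical ODR table header: a format version plus the exact clang
// revision that produced the hashes.
struct OdrTableHeader {
  uint32_t Version;
  std::string ClangRevision;
};

// Since ODR hashes may change with any clang revision, only run the
// check when every input's table has the supported format version and
// all tables come from the same clang revision; otherwise skip checking
// (possibly with a warning) rather than report false mismatches.
bool shouldRunOdrCheck(const std::vector<OdrTableHeader> &Headers,
                       uint32_t SupportedVersion) {
  if (Headers.empty())
    return false;
  for (const OdrTableHeader &H : Headers)
    if (H.Version != SupportedVersion ||
        H.ClangRevision != Headers.front().ClangRevision)
      return false;
  return true;
}
```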

Peter

I’d be a bit surprised if they weren’t especially compressible - and how much of the size increase is the compressed data vs. the uncompressed data?

Is it still in the hot path when parallelized?

Currently it has every type and function that is in the AST? Yeah, that’s a lot - perhaps it should be more like the things that go in the DWARF? (though would need to add some cases there - since the DWARF logic already relies on the ODR to not emit duplicates in some cases)

100% sharing? You mean if all the data were compressed, and assuming the hashes were compressible at the same ratio as the other data?

I imagine it’s possible people aren’t necessarily going to rev lld in exact lock-step with clang, but I could be wrong. (certainly binutils ld or gold aren’t released/kept in lock-step with GCC, for example)


I'd be a bit surprised if they weren't especially compressible -

Maybe I'm wrong, but my intuition about compression is that it works best
when the data contains repeated patterns. If we use a hash function with
good dispersion then I'd expect each hash to have little in common with
other hashes.
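One way to sanity-check that intuition without running a compressor is to measure the Shannon entropy of the bytes: near 8 bits/byte means a generic (repetition-exploiting) compressor has almost nothing to work with. This is a sketch, with `std::mt19937_64` output standing in for well-dispersed ODR hashes:

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Shannon entropy of a byte stream, in bits per byte (0 to 8).
double bitsPerByte(const std::vector<uint8_t> &Data) {
  if (Data.empty())
    return 0.0;
  std::array<uint64_t, 256> Counts{};
  for (uint8_t B : Data)
    ++Counts[B];
  double H = 0.0;
  for (uint64_t C : Counts) {
    if (!C)
      continue;
    double P = double(C) / double(Data.size());
    H -= P * std::log2(P);
  }
  return H;
}

// Deterministic stand-in for a table of well-dispersed 64-bit hashes.
std::vector<uint8_t> simulatedHashBytes(size_t NumHashes) {
  std::mt19937_64 Rng(0);
  std::vector<uint8_t> Out;
  for (size_t I = 0; I != NumHashes; ++I) {
    uint64_t H = Rng();
    for (int B = 0; B != 8; ++B)
      Out.push_back(uint8_t(H >> (8 * B)));
  }
  return Out;
}
```

Well-dispersed hash bytes measure close to the 8 bits/byte maximum, while repetitive data (like mangled names sharing long prefixes) measures far lower, which is why the string table compresses well and the hash arrays would not.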

and how much of the size increase is the compressed data vs. the
uncompressed data?

The ratio was roughly 60% compressed data to 40% uncompressed data.

Is it still in the hot path when parallelized?

Not right now according to my benchmarking, but decompression could push it
into the critical path if it ends up taking longer than the rest of the
work done by the linker after symbol resolution. On the same machine that I
used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is roughly
what I'd expect the hashes to look like) takes around 1.1s, i.e. a not
insignificant fraction of lld's runtime.

So I think the remaining gains would either be through limiting the number
of ODR table entries, or through reuse of data.

Limiting might be something to explore -- one possibility is that we
could limit the ODR table entries to the declarations that are "used" by a
particular translation unit (it appears that Clang tracks something like
that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly
what we need -- I think we would basically need to test for reference
reachability from the functions/globals that are IRgen'd).

Currently it has every type and function that is in the AST? Yeah, that's
a lot - perhaps it should be more like the things that go in the DWARF?
(though would need to add some cases there - since the DWARF logic already
relies on the ODR to not emit duplicates in some cases)

Just every record declaration -- Clang only supports ODR hashes for record
declarations right now. I understand that ODR hashes for function
declarations (including function bodies) are still a work in progress.

I think it should indeed be roughly just the things that go in the DWARF. I
think that at one point I observed that every record declaration, even
unused ones, was going into the DWARF, but I might have been mistaken
because I can no longer reproduce that. I'll take a closer look to see if I
can reuse the logic that presumably already exists for DWARF.

In terms of reuse, it seems that of the 536566007 bytes of overhead,
319309579 were the compressed part of the ODR tables. So even if we
achieved 100% sharing,

100% sharing? You mean if all the data were compressed, and assuming the
hashes were compressible at the same ratio as the other data?

Sorry, I mean if 100% of the data in the compressed part of the ODR table
could be eliminated by reusing data stored elsewhere (e.g. in the object
file string table or in the DWARF).


I think we should drop support for old formats when we introduce a new
format. My understanding is that the ODR hash can change whenever Clang
changes (the implementation only performs ODR checking if all ODR tables
were produced by the same revision of Clang), so there wouldn't seem to be
a huge benefit in keeping support for old formats around.

I imagine it's possible people aren't necessarily going to rev lld in
exact lock-step with clang, but I could be wrong. (certainly binutils ld or
gold aren't released/kept in lock-step with GCC, for example)

That's certainly possible, but I'd say that the bar for dropping backwards
compatibility is lower because ODR tables are not required for correctness.
We could keep compatibility with the last version or so if it isn't too
burdensome, or otherwise print a warning.

Peter