Very nice and simple implementation!
Do you have any statistics on how large these ODR tables are compared to
other object file data? I assume that if these tables contain full mangled
symbol names, they could end up being very large, and we may want to share
the symbol name strings with the overall string table in the .o
Looking at Chromium's object files, it looks like the total size of the
odrtabs is about 50% of the total size of the object files, which isn't
great. The current implementation only looks at records, so I imagine it
would be hard to share any of the strings that I'm currently creating.
(I guess it's possible that some types will have a mangled vtable name in
the string table, so we may be able to share a little that way.) Note,
however, that this was without debug info.
One option for reducing size would be to:
1) store hashes of ODR names in ODR tables, per Rui's suggestion (alongside
a reference to the name itself in the string table), and
2) compress the string table for the ODR names with a standard compression
algorithm like gzip.
This wouldn't seem to affect link time performance much, because I think we
should only need to look at the strings if we see an ODR name hash match
together with an ODR hash mismatch, which would mean an ODR violation with
high probability (i.e. unless there was an ODR name hash collision, we
have found an ODR violation). If we don't expect a lot of sharing with
regular string tables (see below), this seems even more reasonable.
Also, do you have any numbers on the performance of your initial
implementation?
I measured the link time for Chromium's unit_tests (the largest single
binary in Chromium) at 5.05s without ODR checks and 6.61s with ODR checks,
so about 30% overhead, but in absolute terms it doesn't seem too bad. I
think this may be acceptable for an initial implementation, but it
certainly seems worth trying to do better.
W.r.t. LLD and having it always on by default (and hence making it as fast
as possible): it seems like right now you are implementing the checking
process with a hash table. That's simple and fine for a first
implementation, but it's probably worth mentioning in a comment that the
problem of checking the tables, at least from the linker's perspective, fits
into a map-reduce pattern and could easily be parallelized if needed, e.g.
a parallel sort to coalesce all entries for symbols of the same name,
followed by a parallel forEach to check each bucket with the same symbol
name (roughly speaking).
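Roughly the following shape, as a sketch only: it assumes a flat list of
(name, ODR hash) entries gathered from all inputs, and uses the standard
C++17 parallel algorithms as a stand-in for whatever parallel primitives the
linker already has.

#include <algorithm>
#include <cstdint>
#include <execution>
#include <string_view>
#include <vector>

struct Entry {
  std::string_view Name; // mangled ODR name
  uint64_t OdrHash;      // ODR hash of the definition
};

void checkOdrTables(std::vector<Entry> &Entries) {
  // "Map": coalesce entries for the same name with a parallel sort.
  std::sort(std::execution::par, Entries.begin(), Entries.end(),
            [](const Entry &A, const Entry &B) { return A.Name < B.Name; });

  // Find the start of each bucket of identical names.
  std::vector<size_t> Buckets;
  for (size_t I = 0; I < Entries.size(); ++I)
    if (I == 0 || Entries[I].Name != Entries[I - 1].Name)
      Buckets.push_back(I);

  // "Reduce": check each bucket independently and in parallel.
  std::for_each(std::execution::par, Buckets.begin(), Buckets.end(),
                [&](size_t Start) {
                  for (size_t I = Start + 1;
                       I < Entries.size() &&
                       Entries[I].Name == Entries[Start].Name;
                       ++I)
                    if (Entries[I].OdrHash != Entries[Start].OdrHash) {
                      // report an ODR violation for Entries[Start].Name
                    }
                });
}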
Right, that's one approach. I was thinking of a simpler approach where at
compile time we sort ODR names by hash and partition them using (say) the
upper bits of the hash, so that at link time we can have N threads each
building a hash table for a specific partition.
And of course this work can be started right after symbol resolution
finishes and parallelized with the rest of the work done by the linker.
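Something along these lines, purely as a sketch: it assumes the compiler has
already emitted entries bucketed by the upper bits of the name hash, so the
linker just hands each partition to its own thread.

#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>

struct Entry {
  uint64_t NameHash; // hash of the mangled ODR name
  uint64_t OdrHash;  // ODR hash of the definition
};

// Partitions[i] holds every entry whose name hash has top bits == i; the
// compiler produced this layout, so the linker never needs to re-sort.
void checkPartitioned(const std::vector<std::vector<Entry>> &Partitions) {
  std::vector<std::thread> Threads;
  for (const std::vector<Entry> &Part : Partitions)
    Threads.emplace_back([&Part] {
      std::unordered_map<uint64_t, uint64_t> Seen; // name hash -> ODR hash
      for (const Entry &E : Part) {
        auto [It, Inserted] = Seen.emplace(E.NameHash, E.OdrHash);
        if (!Inserted && It->second != E.OdrHash) {
          // almost certainly an ODR violation; fall back to comparing the
          // full names to rule out a name-hash collision before reporting
        }
      }
    });
  for (std::thread &Th : Threads)
    Th.join();
}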
Even better than doing it faster is just doing less work. There's a lot of
work that the linker is already doing that may be reusable for the ODR
checking.
E.g.
- maybe we could get the coalescing step as a byproduct of our existing
string deduping, which we are generally doing anyway.
- we are already coalescing symbol names for the symbol table. If the ODR
table is keyed off of symbols in the binary that we are inserting into the
symbol table, then I think we could do the entire ODR check with no extra
"string" work on LLD's part (roughly as sketched below).
I see Rui already mentioned some of this in
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
You mentioned that not everything is necessarily directly keyed on a
symbol (such as types), but I think it would really simplify things if
the check were keyed on symbols. Do you have any idea how much of what we
want to check is not keyed on symbols? If most things are keyed on symbols,
then for the things that are not we can just emit extra symbols prefixed by
__clang_odr_check_ or whatever.
Since the current implementation only works with records, there is basically
zero overlap right now between ODR names and symbols. I suppose that I
could estimate the amount of function overlap in a hypothetical
implementation that computes ODR hashes of functions by comparing the
number of *_odr functions after clang has finished IRgen with the number
after optimization finishes. This of course would be strictly more than
functions + types.
The issue of retaining the ODR check for functions even if they get
inlined may inherently pose an extra cost that can't be folded into
existing work the linker is doing, so there might be a reason for clang to
have a default mode that has practically no linking overhead and one that
does more thorough checking but imposes extra linking overhead. Think of
something like a crazy Boost library with thousands of functions that get
inlined away but have gigantic mangled names, and so are precisely the ones
that are going to impose extra cost on the linker. Simply due to the extra
volume of strings that the linker would need to look at, I don't think
there's a way to include checking of all inlined functions "for free" at the
linker level using the symbol approach.
I guess those inlined functions would still have those symbol names in