Hi Alexey,
Hi James,
(Resending with history trimmed to avoid it getting stuck in
moderator queue).
Hi Alexey,
Just an update - I identified the cause of the "Generated debug
info is broken" error message when I tried to build things
locally: the `outStreamer` instance is initialised with the host
Triple, instead of whatever the target's triple is. For example,
I build and run LLD on Windows, which means that a Windows triple
will be generated, and consequently a COFF-emitting streamer will
be created, rather than the ELF-emitting one I'd expect were the
triple information to somehow be derived from the linker
flavor/input objects etc. Hard-coding in my target triple
resolved the issue (although I still got the other warnings
mentioned from my game link).
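As a rough illustration of the problem described above, the sketch below picks an object format from a triple string and prefers the target's triple over the host's. Everything in it (`ObjectFormat`, `formatForTriple`, `pickStreamerTriple`, and the substring matching) is a hypothetical simplification for illustration, not LLD or LLVM code.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical, simplified stand-in for a real triple parser.
enum class ObjectFormat { ELF, COFF, MachO };

// Guesses the object format from a triple such as
// "x86_64-pc-windows-msvc" or "x86_64-unknown-linux-gnu".
inline ObjectFormat formatForTriple(std::string_view Triple) {
  assert(!Triple.empty() && "expected a non-empty triple");
  if (Triple.find("windows") != std::string_view::npos)
    return ObjectFormat::COFF;
  if (Triple.find("darwin") != std::string_view::npos ||
      Triple.find("macos") != std::string_view::npos)
    return ObjectFormat::MachO;
  return ObjectFormat::ELF; // assumption: everything else emits ELF
}

// Prefers the triple derived from the link (flavor, input objects, ...)
// and only falls back to the host triple when none is known.
inline std::string_view pickStreamerTriple(std::string_view TargetTriple,
                                           std::string_view HostTriple) {
  return TargetTriple.empty() ? HostTriple : TargetTriple;
}
```

With a Windows host and a Linux target, `formatForTriple(pickStreamerTriple(Target, Host))` yields ELF, whereas using the host triple alone would yield COFF.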
Thank you for the details. Actually, I did not test this on
Windows, but I will do so and update the patch.
I measured the performance figures using LLD patched as
described, and using the same methodology as my earlier results,
and got the following:
Link-time speed (s):
+-----------------------------+---------------+
| Package variant             | GC 1 (normal) |
+-----------------------------+---------------+
| Game (DWARF linker)         | 53.6          |
| Game (DWARF linker, no ODR) | 63.6          |
| Clang (DWARF linker)        | 200.6         |
+-----------------------------+---------------+
Output size - Game package (MB):
+-----------------------------+------+
| Category                    | GC 1 |
+-----------------------------+------+
| DWARFLinker (total)         | 696  |
| DWARFLinker (DWARF*)        | 429  |
| DWARFLinker (other)         | 267  |
| DWARFLinker no ODR (total)  | 753  |
| DWARFLinker no ODR (DWARF*) | 485  |
| DWARFLinker no ODR (other)  | 268  |
+-----------------------------+------+
Output size - Clang (MB):
+-----------------------------+------+
| Category                    | GC 1 |
+-----------------------------+------+
| DWARFLinker (total)         | 1294 |
| DWARFLinker (DWARF*)        | 743  |
| DWARFLinker (other)         | 551  |
| DWARFLinker no ODR (total)  | 1294 |
| DWARFLinker no ODR (DWARF*) | 743  |
| DWARFLinker no ODR (other)  | 551  |
+-----------------------------+------+
*DWARF = just .debug_info, .debug_line, .debug_loc,
.debug_aranges, .debug_ranges.
Peak Working Set Memory usage (GB):
+-----------------------------+------+
| Package variant             | GC 1 |
+-----------------------------+------+
| Game (DWARFLinker)          | 5.7  |
| Game (DWARFLinker, no ODR)  | 5.8  |
| Clang (DWARFLinker)         | 22.4 |
| Clang (DWARFLinker, no ODR) | 22.5 |
+-----------------------------+------+
My opinion is that, in the current state of affairs, the time
cost of the DWARF linker approach is not really practical for
larger packages, except on build servers: clang takes 8.8x as
long as the fragmented approach and 11.2x as long as the plain
approach (without the no ODR option). The size saving is
certainly good: for my version of clang, the DWARF linker output
is 51% of the total output size of the plain approach and 55% of
that of the fragmented approach (though further size savings are
likely possible for the latter). The game produced reasonable
size savings too (62% and 74%), but I'd be surprised if these
gains were enough for people to want to use the approach in
day-to-day situations, which is presumably the main use-case for
smaller DWARF, due to improved debugger load times.
It is interesting to note that the GCC 7.5 build of clang I used
for these figures showed no difference in size between the two
variants, unlike the other packages. Consequently, a significant
amount of time is saved for no penalty.
I'll be interested to see what the time results of the DWARF
linker are once further improvements to it have been made.
Yep, the current time costs of the DWARFLinker are too high. One
of the reasons is that lld handles sections in parallel, while
DWARFLinker handles data sequentially. The DWARFLinker numbers
could probably be improved if it were possible to teach it to
handle data in parallel. Thank you for the comparison!
No problem! It was useful for me to gather the numbers for internal investigations too. Parallelisation would hopefully help, but at this point it's hard to say by how much. There are likely going to be additional time costs for fragmented DWARF too, once I fix the remaining deficiencies, as they'll require more relocations.
Speaking of the "Fragmented DWARF" solution, how do you estimate
the memory requirements for supporting fragmented object files?
I'm not sure if you're referring to the memory usage at link time or the disk space required for the inputs, but I posted both those figures in my original post in this thread.
I mean the run-time memory usage of the DebugInfoDWARF library.
Currently, when an object file is loaded and a DWARFContext is created,
the DWARFContext references section data from the object::ObjectFile:
DWARFContext(std::unique_ptr<const DWARFObject> DObj,..)
DWARFObjInMemory(const object::ObjectFile &Obj, ...)
class DWARFObjInMemory {
  const DWARFSection &getLocSection() const;
  const DWARFSection &getLoclistsSection() const;
  StringRef getArangesSection() const;
  const DWARFSection &getFrameSection() const;
  const DWARFSection &getEHFrameSection() const;
  const DWARFSection &getLineSection() const;
  StringRef getLineStrSection() const;
};
class DWARFUnit {
  DWARFContext &Context;
  /// Section containing this DWARFUnit.
  const DWARFSection &InfoSection;
};
struct DWARFSection {
  StringRef Data;
};
DWARFSection references data that is loaded by the object file.
DWARFSection is assumed to be a monolithic piece of data.
There is code that uses this data assuming random access:
StringRef LineData = OrigDwarf.getDWARFObj().getLineSection().Data;
LineData.slice(*StmtList + 4, PrologueEnd)
...
StringRef FrameData = OrigDwarf.getDWARFObj().getFrameSection().Data;
FrameData.substr(EntryOffset, InitialLength + 4)
...
InputSec = Dwarf.getDWARFObj().getLocSection();
InputSec.Data.substr(Offset, Length);
...
DWARFDataExtractor RangesData(Context.getDWARFObj(), *RangeSection,
isLittleEndian, getAddressByteSize());
uint64_t ActualRangeListOffset = RangeSectionBase + RangeListOffset;
RangeList.extract(RangesData, &ActualRangeListOffset);
i.e. it is possible to access an arbitrary piece of a DWARFSection.
If object::ObjectFile were to contain fragmented sections, then
we would need a solution for how that could work.
One possibility is to create a glued copy of the fragmented data and pass it to the DWARFObj.
But that would require loading all of the original debug info sections twice
(fragmented sections inside the ObjectFile and glued sections inside the DWARFObj).
Another possibility is to rewrite DebugInfoDWARF/DWARFSection to avoid random access to the data (if that is possible).
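To make the memory cost of the first possibility concrete, here is a minimal sketch of gluing fragments into one contiguous buffer. It uses plain `std::string_view` fragments in place of LLVM's `StringRef`, and `glueFragments` is a hypothetical helper, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Concatenates the fragment contents, in order, into one owned buffer.
// The fragments keep referencing the object file's data, so the debug
// info is held in memory twice for as long as the glued copy lives.
std::string glueFragments(const std::vector<std::string_view> &Fragments) {
  std::size_t Total = 0;
  for (std::string_view F : Fragments)
    Total += F.size();

  std::string Glued;
  Glued.reserve(Total); // the second copy of the section data
  for (std::string_view F : Fragments)
    Glued.append(F.data(), F.size());
  assert(Glued.size() == Total);
  return Glued;
}
```

Once glued, existing random-access code keeps working unchanged, e.g. `std::string_view(Glued).substr(Offset, Length)`, which is the attraction of this option despite the extra memory.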
If it's something else, please let me know. Based on those figures, it's clear the cost depends on the input code base, but object files were roughly 25% to 75% bigger and memory usage was roughly 50% to 100% higher. Again, both are likely to go up when I get around to fixing the remaining issues.
In the comments on your Lightning Talk you mentioned that it
would be necessary to "update DebugInfo library to treat the
fragmented sections as one continuous section". Do you think it
would be cheap to implement?
I think so. I'd hope it would be possible to replace the data buffer underlying the DWARF section parsing with one that can "jump" to the next fragment (section) when it gets to the end of the previous one. I haven't experimented with this, but I wouldn't expect it to be costly in terms of code quality or performance, at least in comparison to parsing the DWARF itself.
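A minimal sketch of that "jump" idea, assuming purely sequential reads: the cursor walks an ordered list of fragments and moves to the next one when the current one is exhausted. `FragmentCursor` and its methods are hypothetical names for illustration and are not part of DebugInfoDWARF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sequential reader over an ordered list of fragments.
class FragmentCursor {
  std::vector<std::string_view> Fragments;
  std::size_t Index = 0;  // current fragment
  std::size_t Offset = 0; // offset within the current fragment

  // Moves past exhausted (or empty) fragments.
  void skipExhausted() {
    while (Index < Fragments.size() && Offset == Fragments[Index].size()) {
      ++Index;
      Offset = 0;
    }
  }

public:
  explicit FragmentCursor(std::vector<std::string_view> Frags)
      : Fragments(std::move(Frags)) {
    skipExhausted();
  }

  bool atEnd() const { return Index == Fragments.size(); }

  // Reads one byte, jumping to the next fragment when needed.
  std::uint8_t readU8() {
    assert(!atEnd() && "read past the end of the last fragment");
    auto Byte = static_cast<std::uint8_t>(Fragments[Index][Offset++]);
    skipExhausted();
    return Byte;
  }

  // Reads a little-endian 32-bit value that may straddle a boundary.
  std::uint32_t readU32LE() {
    std::uint32_t Value = 0;
    for (int I = 0; I < 4; ++I)
      Value |= static_cast<std::uint32_t>(readU8()) << (8 * I);
    return Value;
  }
};
```

Nothing is copied here, but the cursor only moves forward; code that slices a section at an arbitrary offset would need either an offset-to-fragment lookup or a rewrite to consume the data in order.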
So it looks like you assume the second case: avoiding random access to the section data.