Hello
I am working on adding support to BOLT (https://github.com/facebookincubator/BOLT/tree/rebased) for writing out a DWP directly, and I want to reuse as much llvm-dwp functionality as possible.
The plan is to move most of the functionality that is now in llvm-dwp into llvm/lib/DWP, with a corresponding header file in llvm/include/llvm/DWP.
The header files would declare:
getContributionIndex
handleSection
parseCompileUnitHeader
writeStringsAndOffsets
getCUIdentifiers
buildDuplicateError
writeIndex
Structs that are passed around would also be defined in the headers:
UnitIndexEntry
CompileUnitHeader
CompileUnitIdentifiers
Thought I would solicit opinions before I dive too deep into this.
Thank You
Alex
I’m OK-ish with this, though I will warn you, llvm-dwp is not, in my opinion (as the author of it), the best foundation for this sort of thing. In particular it uses significantly more memory (& probably CPU) than gold’s dwp tool - building more things on top of this before addressing some of those scalability issues will build more technical debt that’ll need to be paid down at some point. If you plan to use this in a production use case, I’d strongly encourage you to invest some time in improvements to llvm-dwp’s scalability first, to create a better foundation to build on top of. I’m happy to help with/advise on that work.
Hello David
I haven’t dug into llvm-dwp performance. What are some of the performance pain points that you know about?
Thank You
Alex
> Hello David
> I haven’t dug into llvm-dwp performance. What are some of the performance pain points that you know about?
Yeah - using LLVM’s higher-level abstractions for writing object files (MCStreamer et al.) means that, as far as I recall, all the output ends up buffered in memory before being written out - whereas, ideally, it’d be streamed (memcpy to/from memory-mapped files) from input file to output file, potentially through streamed compression/decompression where possible too - that’s another layer of the MCStreamer abstractions that can add cost. (I don’t think I implemented support for compressing output in llvm-dwp, though it’d be trivial to add because it’s already supported in MCStreamer - but that support does buffer both the whole uncompressed and the whole compressed data…) Maybe some other things, but that’s certainly the top of my list.
Hello David,
Thank you for elaborating.
When you are talking about compression, is this related to debug info coming in compressed already, or something else?
Regarding MCStreamer, what would be the alternative? In BOLT it provides a nice level of abstraction for us as we output the new updated binary and write out .dwo files in the debug-fission case.
In general, the usage model for BOLT is in some ways similar to llvm-dwp, except we don’t really deal with compressed debug information. Some sections are passed through, but others get either modified (.debug_info) or completely rewritten (.debug_loc), for example. For llvm-dwp, the .debug_str_offsets and .debug_str sections get rewritten, although much more data is modified/replaced before being written out in BOLT’s case. So I am not sure pure in/out performance is as critical for us at the moment.
I took the initial step of factoring the llvm-dwp code out into its own library, to see what it would look like. What I ended up with is a few APIs that take an MCStreamer, with all the code for setting it up staying in llvm-dwp’s main function.
With all of this said, and given BOLT’s usage model, I think dealing with the MCStreamer issue can be deferred until after the refactoring into a library and adding the functionality to BOLT.
Alex
> Hello David,
> Thank you for elaborating.
> When you are talking about compression, is this related to debug info coming in compressed already, or something else?
Both/either: compressed input (which, if I remember correctly, gets fully decompressed into a buffer, and that buffer may be kept alive for most/all of the llvm-dwp run - I don’t remember exactly how that goes) and compressed output (first llvm-dwp passes bytes to MCStreamer, which buffers every section itself; then it has to write that section’s contents into a compressed buffer, which also stays alive until it’s written out to the output file, etc.).
> Regarding MCStreamer, what would be the alternative? In BOLT it provides a nice level of abstraction for us as we output the new updated binary and write out .dwo files in the debug-fission case.
Some kind of lighter-weight abstraction (or a refactoring of MCStreamer to make or allow it to be lighter weight in some/many cases). For example: if a user knows the important facts (size, etc.) needed to emit headers, lay out the object, and so on, they could get a callback to emit the section’s bytes when needed. That way, if the bytes are parametrically computed (e.g. a function of some input file, bytes from a StringMap, etc.) or simply mapped from some input file, they can be emitted without MCStreamer ever having to take ownership of the byte buffer or make its own copy.
Personally, I’d love it if the abstraction were lightweight enough to be shared with lld and ORC (well, I guess ORC probably already uses MCStreamer, but perhaps it could benefit from these reductions in overhead).
> In general, the usage model for BOLT is in some ways similar to llvm-dwp, except we don’t really deal with compressed debug information. Some sections are passed through, but others get either modified (.debug_info) or completely rewritten (.debug_loc), for example. For llvm-dwp, the .debug_str_offsets and .debug_str sections get rewritten, although much more data is modified/replaced before being written out in BOLT’s case. So I am not sure pure in/out performance is as critical for us at the moment.
Fair enough.