RFC: Adding "minidump" support to obj2yaml

Hello all,

yesterday I sent an email
<http://lists.llvm.org/pipermail/lldb-dev/2019-March/014811.html> to
lldb-dev proposing a new tool in lldb for yamlization of minidump files.
It's been suggested to me that instead of a new tool it may be better to
add support for that format to obj2yaml instead. Hence, this email. :)

As I expect most people are unfamiliar with this format, I'm going to
start off with a brief introduction.

Minidump is the native "core file" format for Windows systems. However,
it is widely used on other systems too. Probably the most popular tools
producing this format are the Google "breakpad" and "crashpad" crash
reporting systems. LLDB has had support for this format since 2016, when
it was added as a GSoC project by Dimitar Vlahovski. It is currently in
active use and development by several lldb contributors.

The format itself is fairly simple and extensible. The file starts off
with a header containing some basic info and a collection of "streams".
Each stream contains various types of information about the state of the
process at the time when the snapshot (minidump) was taken. This
includes information such as:
- list of loaded modules
- list of threads
- chunks of process memory
- etc.
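
As a rough illustration of the layout described above (this is not code
from lldb or from any patch), here is a minimal Python sketch that reads
the header and the stream directory. The field layout follows my reading
of the public MINIDUMP_HEADER and MINIDUMP_DIRECTORY definitions; the
function name read_stream_directory is made up for this example, and the
sketch assumes a little-endian file.

```python
import struct

# "MDMP" read as a little-endian 32-bit integer.
MINIDUMP_SIGNATURE = 0x504D444D

# Header: signature, version, stream count, directory offset (RVA),
# checksum, timestamp (all 32-bit), then 64-bit flags. 32 bytes total.
HEADER = struct.Struct("<IIIIIIQ")

# Directory entry: stream type, data size, data offset (RVA). 12 bytes.
DIRECTORY_ENTRY = struct.Struct("<III")


def read_stream_directory(data):
    """Return a list of (stream_type, raw_bytes) pairs for a minidump blob."""
    if len(data) < HEADER.size:
        raise ValueError("file too small to hold a minidump header")
    (signature, _version, num_streams, directory_rva,
     _checksum, _timestamp, _flags) = HEADER.unpack_from(data, 0)
    if signature != MINIDUMP_SIGNATURE:
        raise ValueError("not a minidump file")

    streams = []
    for i in range(num_streams):
        entry_offset = directory_rva + i * DIRECTORY_ENTRY.size
        stream_type, size, rva = DIRECTORY_ENTRY.unpack_from(data, entry_offset)
        # Each stream is an opaque byte range; its meaning depends on the type
        # (for example, 3 is the thread list and 4 is the module list).
        streams.append((stream_type, data[rva:rva + size]))
    return streams
```

Calling read_stream_directory(open(path, "rb").read()) on a real dump
would list its streams; interpreting each stream's contents is a separate
step that this sketch does not attempt.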

The problem I'm trying to solve right now is how to write tests for this
functionality. We currently don't have any tool which could create
minidump files from human-readable descriptions of them, so our tests
are relying on checking in opaque binary blobs. This makes reviewing the
changes hard and also complicates creating test cases (real-world
minidumps tend to be large). In other words, we are missing a tool like
yaml2minidump.
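
To make the idea concrete, here is a purely hypothetical sketch of the
"description to binary" direction, written in Python with a plain list of
dictionaries standing in for the YAML input. The function name
build_minidump and the description schema are inventions for this
example, not a proposed format. It emits the header, the stream
directory, and then the stream contents back to back, without any of the
alignment or padding that real producers may add.

```python
import struct

MINIDUMP_SIGNATURE = 0x504D444D  # "MDMP" as a little-endian integer
MINIDUMP_VERSION = 0xA793        # assumed value of the version field's low 16 bits
HEADER_SIZE = 32                 # 6 x uint32 + 1 x uint64
ENTRY_SIZE = 12                  # 3 x uint32 per directory entry


def build_minidump(streams):
    """Serialize a list of {"type": int, "content": bytes} into a minidump blob."""
    directory = b""
    payload = b""
    # Stream contents start right after the header and the directory.
    data_start = HEADER_SIZE + ENTRY_SIZE * len(streams)
    for stream in streams:
        content = stream["content"]
        rva = data_start + len(payload)
        directory += struct.pack("<III", stream["type"], len(content), rva)
        payload += content

    header = struct.pack(
        "<IIIIIIQ",
        MINIDUMP_SIGNATURE,
        MINIDUMP_VERSION,
        len(streams),
        HEADER_SIZE,  # the directory immediately follows the header
        0,            # checksum (left as zero in this sketch)
        0,            # timestamp
        0,            # flags
    )
    return header + directory + payload


# Hypothetical description of a two-stream file with placeholder contents.
description = [
    {"type": 3, "content": b"abcd"},  # 3 = thread list stream
    {"type": 4, "content": b"xy"},    # 4 = module list stream
]
blob = build_minidump(description)
```

A test could then check in the short description rather than the
resulting binary, and a reviewer could see at a glance which streams the
test input contains.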

=== end of introduction ===

While we could create an lldb tool for converting between minidump and
yaml files, there is some appeal in making everything available from a
single tool (i.e., yaml2obj). The main obstacle to that is that there is
currently no support for parsing these files in llvm, and apart from
yaml2obj, it's not clear to me whether any other llvm tool/project would
benefit from this functionality being available in the main llvm
project. For example, tools like llvm-readelf have support for elf core
files, but this is mostly a byproduct of the fact that elf core files
are similar to elf executables. However, there is no "executable" form
of minidumps.

So I am asking this question: Do you think having minidump parsing code
in llvm is a good idea?

To give you an idea of what this involves, the current minidump parser
in lldb is about 2000 LOC. It's already fairly independent of the rest
of lldb, though it would need to be cleaned up a bit to be up to llvm
standards. My expectation is that the yaml conversion code would add
another 1-2 kLOC.

The natural place for this in llvm would seem to be the Object library,
so I'd propose for this code to be placed there. The thing I'm not sure
about is whether it makes sense to integrate this into the existing
ObjectFile hierarchy. While the minidump "streams" could be represented
as sections, I'm not sure we'd be doing anyone a favour by doing that.
The ObjectFile sections assume they are referring to sections in regular
object files, which have things like relocations, symbol lists, etc., and
minidump streams have none of those. Therefore I'm leaning towards the
option of just implementing this as a standalone MinidumpFile class.
This would be kind of similar to the existing ELFFile class, only there wouldn't
be an ELFObjectFile sitting on top of that.

Please let me know what you think,
pavel

I’m all for anything that allows people to test without having to use pre-canned binaries. I’m not particularly familiar with the minidump format, so I’m not sure what the best place for code relating to it would be, but I do agree that extending yaml2obj sounds like a good idea. From what you say, minidumps don’t sound like they’d fit the ObjectFile class well, so I don’t see an issue with a new MinidumpFile class, if it will work well with how yaml2obj is currently written.

James

I have no problem with extending yaml2obj. As for the minidump parsing code, do you think it would be possible to lay it out in a way that compiling it can be optional? I would imagine that this feature is less interesting for people who want to build, e.g., non-crosscompiling Linux toolchains, and since the code size of LLVM is growing very quickly, people are becoming more sensitive to it.

-- adrian

Thanks for the support, James.

Adrian, I do share the concerns about code size. I suppose I could put
the minidump parsing code into a subfolder of lib/Object, such that it
is a separate library and can be disabled by excluding it from
LLVM_DYLIB_COMPONENTS by people trying to minimize size footprint (I
don't expect this to have an impact on anything other than the llvm
shared library, as the tools which don't use this code simply will not
have it linked in). If that's the consensus, then I'm happy to
implement that, but I'm not sure if this doesn't give more prominence
to the minidump code than it deserves (i.e., why should it get a
special subfolder, and elf/macho/coff/wasm code be stuffed into the
same folder).

Or we could just say that the niceness of having a single tool for
yaml<->binary conversions (and to me that really seems like the main
advantage of putting this code in llvm) isn't worth the size increase,
and just have a separate tool for that in the lldb repo, at least
until we have another reason to have minidump parsing code live in
llvm.

regards,
pavel

Hello again,

I've posted a patch <https://reviews.llvm.org/D59291> which adds basic
minidump support to obj2yaml. I've tried to keep the size down as much
as possible, but the patch is still relatively large (~1kLOC) because
I needed to set up the infrastructure, as we have no minidump
support at all at the moment.

I've made the MinidumpFile class inherit from llvm::Binary, which I
didn't know existed before, but it seems that minidump files fit in
there nicely. I did not try to put the code into a separate module,
since that is not really supported by the Binary hierarchy right now
(we'd need some kind of plugin mechanism to register minidump files at
runtime).

let me know what you think,
pavel