RFC: Move parts of llvm-symbolizer tool implementation to LLVMSymbolize library

Hi,

We have a lot of non-trivial logic accumulated in the implementation of the
llvm-symbolizer tool (tools/llvm-symbolizer/LLVMSymbolize.{h,cpp}), for instance:

  • dynamic dispatch between DWARF and PDB debug info;
  • building an address->symbol_name mapping from an object file (with special cases for the PowerPC function descriptor section and COFF export tables);
  • finding debug info stored in separate files (.dSYM files on Darwin, ELF .gnu_debuglink section, etc.);
  • demangling (with platform-specific implementations for Windows and Unix).
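To make the address->symbol_name item concrete, here is a minimal sketch of such a lookup table. It is not the code in LLVMSymbolize.cpp: `SymbolEntry` and `AddressToSymbolMap` are hypothetical names, and the sketch assumes symbols are given as (start address, size, name) triples, with size 0 meaning "unknown".

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a symbol table entry; not an LLVM type.
struct SymbolEntry {
  std::string Name;
  uint64_t Size; // 0 means the size is unknown
};

// Hypothetical address -> symbol name table keyed by symbol start address.
class AddressToSymbolMap {
public:
  void addSymbol(uint64_t Start, uint64_t Size, std::string Name) {
    Symbols[Start] = SymbolEntry{std::move(Name), Size};
  }

  // Stores the name of the symbol covering Addr in Name and returns true,
  // or returns false if no symbol covers Addr.
  bool lookup(uint64_t Addr, std::string &Name) const {
    auto It = Symbols.upper_bound(Addr); // first symbol starting after Addr
    if (It == Symbols.begin())
      return false; // Addr is below every known symbol
    --It; // last symbol starting at or before Addr
    const SymbolEntry &S = It->second;
    if (S.Size != 0 && Addr - It->first >= S.Size)
      return false; // Addr falls into the gap after this symbol
    Name = S.Name;
    return true;
  }

private:
  std::map<uint64_t, SymbolEntry> Symbols;
};
```

The special cases mentioned above (PowerPC function descriptors, COFF export tables) would only change how the table is filled, not how it is queried.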

I propose to move this code into a separate library LLVMSymbolize (stored under lib/DebugInfo/Symbolize), and make llvm-symbolizer a short and simple tool using it. This would allow us to:

  • implement in-process symbolized stack trace printers (for the cases when it’s possible to link a bunch of LLVM libraries into the executable).
  • easily write more tools that can make use of symbolized code locations, such as coverage data visualizers.
  • (at least sometimes) write unit tests instead of testing functionality by running the “llvm-symbolizer” executable on pre-built executables checked into the repository.
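A rough sketch of what the split could look like, to illustrate the testing point. Everything here is an assumption made for illustration: `Symbolizer`, `SourceLocation`, `runTool` and `FakeSymbolizer` are hypothetical names, not the existing LLVMSymbolize interface, and the input format is simplified to one "<module> <hex offset>" pair per line.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical result type; not the existing LLVM one.
struct SourceLocation {
  std::string Function;
  std::string File;
  unsigned Line;
};

// Hypothetical library-side interface: all the real logic lives behind it.
class Symbolizer {
public:
  virtual ~Symbolizer() {}
  virtual SourceLocation symbolize(const std::string &Module,
                                   uint64_t Offset) = 0;
};

// Tool-side wrapper: parses "<module> <hex offset>" lines and prints results.
void runTool(Symbolizer &S, std::istream &In, std::ostream &Out) {
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream Fields(Line);
    std::string Module;
    uint64_t Offset = 0;
    if (!(Fields >> Module >> std::hex >> Offset))
      continue; // skip malformed lines
    SourceLocation Loc = S.symbolize(Module, Offset);
    Out << Loc.Function << "\n" << Loc.File << ":" << Loc.Line << "\n";
  }
}

// A fake implementation lets a unit test exercise the wrapper without any
// pre-built binaries.
class FakeSymbolizer : public Symbolizer {
public:
  SourceLocation symbolize(const std::string &, uint64_t Offset) override {
    SourceLocation Loc;
    Loc.Function = "main";
    Loc.File = "a.c";
    Loc.Line = static_cast<unsigned>(Offset); // offset reused as line number
    return Loc;
  }
};
```

With this shape, the tool's `main` would only construct a real symbolizer and call `runTool` on stdin and stdout, while unit tests could target either side separately.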

Any comments/objections?

I would say it is worth it if someone is actually planning on using
the library in something else.

Moving the code "just in case" or to create unit tests is not a good
reason IMHO.

To create unit tests is a pretty good reason IMO :)

That said, I’d be a fan of trying to encapsulate all of this behind an interface. I like that most of the tools are exceptionally lightweight, which makes it much more obvious what’s “wrapper” versus “functionality” in something like llvm-symbolizer. I’ll be interested to see the library design :)

-eric

We have an out-of-tree implementation of llvm-symbolizer-as-a-library, and I
still hope to upstream it one day (unfortunately, its build process is
really complicated).

Mike can probably comment on his plans for using symbolization in
coverage-related tools.

Do you suggest designing it upfront, or are you fine with moving the
existing code first, and gradually updating the interface afterwards?

Don’t know, what does the interface look like now? :) Were you just going to copy the LLVMSymbolize.[cpp,h] into the directory? That should be fine I guess. I’d like to see the general ownership of objects separated out fairly explicitly from the rest of the code.

-eric

Don't know, what does the interface look like now? :) Were you just going to
copy the LLVMSymbolize.[cpp,h] into the directory?

For a start, yes.

That should be fine I guess. I'd like to see the general ownership of
objects separated out fairly explicitly from the rest of the code.

I'm not sure what you mean by this.

OK.

Is LLVMSymbolizer owning all of the files the right choice, or what was convenient at the time?

-eric

Yeah, I think it's correct to have LLVMSymbolizer own parsed object files
(although we might factor out a separate "cache" object
that would be responsible for it).
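As a rough illustration of the separate "cache" idea (only a sketch, not existing code): `ObjectCache` and `ParsedObject` below are hypothetical names, and the actual parsing is replaced by a placeholder. The cache owns the parsed objects through `std::unique_ptr` and hands out non-owning pointers, so the symbolizer itself would not need to manage their lifetime.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed object file plus its debug info.
struct ParsedObject {
  std::string Path;
};

// Hypothetical cache that owns every parsed object, keyed by file path.
class ObjectCache {
public:
  // Returns a non-owning pointer; the cache keeps ownership, so the pointer
  // is valid for as long as the cache is alive.
  const ParsedObject *getOrLoad(const std::string &Path) {
    auto It = Objects.find(Path);
    if (It != Objects.end())
      return It->second.get(); // already parsed, reuse it
    // Placeholder for the real (expensive) parsing step.
    std::unique_ptr<ParsedObject> Obj(new ParsedObject{Path});
    ++LoadCount;
    const ParsedObject *Raw = Obj.get();
    Objects.emplace(Path, std::move(Obj));
    return Raw;
  }

  // Number of times a file was actually parsed (useful in tests).
  unsigned loads() const { return LoadCount; }

private:
  std::map<std::string, std::unique_ptr<ParsedObject>> Objects;
  unsigned LoadCount = 0;
};
```

Keeping ownership in one place like this would also make it easy to change the policy later (for example, evicting entries) without touching the symbolization code.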

OK. Sounds good to me. It can also be changed in the future if we decide to go a different way.

-eric

See D13998, "Move parts of llvm-symbolizer tool into LLVMSymbolize library".