[RFC] Displaying source variable locations in llvm-objdump

Hi llvm-dev,

I’ve uploaded a prototype patch at https://reviews.llvm.org/D70720 which adds a new feature to llvm-objdump: displaying the location (in registers/memory/etc) of source-level variables alongside the disassembly display. I’ve put a demo of the output at https://reviews.llvm.org/M2.

I have two use-cases in mind for this:

  • Users reading the disassembly of compiled code. It will be quicker/easier to do this if the disassembly shows which value is in each register and stack slot, rather than the user having to reverse-engineer this by hand.
  • Compiler developers, who can use it to understand the debug info emitted by the compiler, and spot missing or incorrect debug info. In fact, I’ve already spotted one LLVM bug while writing this patch: in the function baz in M2, the debug info claims that variable a is in r0 between PC addresses 0x14 and 0x8, which isn’t true.

My questions for the LLVM community are:

  • Is this an acceptable change for llvm-objdump, or is this adding too much complexity to be worth it?
  • The patch currently uses Unicode box-drawing characters. Is this OK? If not, what would people rather see? A plain ASCII version of this, or some completely different format?
  • The patch displays DWARF expressions in an ad-hoc syntax, which is a mix of C and ARM assembly (square brackets for memory access). Is there an existing syntax which would be better for this? I think it’s important that the common cases like “load 4 bytes from memory at SP+4” are displayed concisely.
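
To make the "concise common cases" point concrete, here is a minimal sketch in Python (not the patch's actual code). It assumes a hypothetical, already-decoded representation of a location expression as a list of (opcode, operand) pairs, and a hypothetical register-name table using the ARM DWARF numbering (13 = SP). A lone register op prints as the register name, a lone register-plus-offset op prints as a bracketed memory access, and anything else falls back to the raw opcode names.

```python
import re

# Hypothetical register-name table, following the ARM DWARF register
# numbering (0-12 general purpose, 13 = SP, 14 = LR, 15 = PC).
ARM_REG_NAMES = {n: f"R{n}" for n in range(13)}
ARM_REG_NAMES.update({13: "SP", 14: "LR", 15: "PC"})


def render_location(expr):
    """Render a decoded location expression in a concise, C/asm-like form.

    `expr` is a list of (opcode_name, operand) pairs; this shape is an
    assumption made for the sketch, not a real LLVM data structure.
    """
    if len(expr) == 1:
        op, operand = expr[0]
        # DW_OP_regN: the variable lives in register N.
        m = re.fullmatch(r"DW_OP_reg(\d+)", op)
        if m:
            return ARM_REG_NAMES[int(m.group(1))]
        # DW_OP_bregN <offset>: the variable lives in memory at N + offset.
        m = re.fullmatch(r"DW_OP_breg(\d+)", op)
        if m:
            reg = ARM_REG_NAMES[int(m.group(1))]
            return f"[{reg}{operand:+d}]" if operand else f"[{reg}]"
    # Anything else: fall back to printing the raw opcodes.
    return " ".join(
        op if operand is None else f"{op} {operand}" for op, operand in expr
    )
```

With this sketch, a single DW_OP_breg13 with offset 4 is shown as [SP+4], which is the kind of output I would like to keep short.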

Oliver

Hi Oliver,

This is really cool. I absolutely support this for llvm-objdump. As
far as output I don't have any strong opinions other than it might be
good to separate out the "drawing" code as much as possible from the
variable collection and range code to make it a little easier, but
that's about it from here.
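
As a rough illustration of that split, here is a small Python sketch. It is
not based on the patch; the LiveRange type and both function names are
hypothetical. The collection side only answers "which variables have a
location at this address?", and the drawing side turns that into text
columns, so either half could be swapped out independently.

```python
from dataclasses import dataclass


@dataclass
class LiveRange:
    """Hypothetical record: one variable location over [low_pc, high_pc)."""
    name: str
    location: str
    low_pc: int   # first address covered (inclusive)
    high_pc: int  # end of the range (exclusive)


def live_at(ranges, pc):
    """Collection side: the ranges that cover address `pc`."""
    return [r for r in ranges if r.low_pc <= pc < r.high_pc]


def draw_columns(ranges, pcs):
    """Drawing side: one text column per range, one row per address."""
    rows = []
    for pc in pcs:
        live = live_at(ranges, pc)
        cells = ["|" if r in live else " " for r in ranges]
        rows.append((f"{pc:#06x}  " + " ".join(cells)).rstrip())
    return rows
```

For example, with a in R0 over [0x0, 0x8) and b in [SP+4] over [0x4, 0xc),
draw_columns over the addresses 0x0, 0x4 and 0x8 gives one bar for a, then
two bars, then one bar in the second column for b.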

Thanks for the work, can't wait to use it.

-eric

Hi,

I like this a lot! I think the dwarf expression syntax is spot-on.

I quite like the look of this in principle too, and I see no reason it couldn’t be added as an option in llvm-objdump. Have you experimented with this on a Windows command prompt (PowerShell and cmd)? In my experience, Unicode characters don’t always render well in those.

I don’t think that should be a blocker on this feature mind you, but if it is problematic, we might just want to fall back to standard ASCII characters (‘|’, ‘+’, ‘-’ etc).
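
One possible shape for that fallback, as a minimal Python sketch rather than anything taken from the patch: try to encode the box-drawing characters in the output stream's encoding and switch to ASCII if that fails. The table names and the pick_charset function are hypothetical, and the check is only a heuristic, since a terminal can accept an encoding and still lack a font that renders the glyphs.

```python
# Hypothetical character tables: heavy/light box-drawing characters and
# their plain ASCII stand-ins.
UNICODE_CHARS = {"vertical": "\u2503", "branch": "\u2520", "horizontal": "\u2500"}
ASCII_CHARS = {"vertical": "|", "branch": "+", "horizontal": "-"}


def pick_charset(encoding):
    """Return the Unicode table if `encoding` can represent it, else ASCII."""
    if not encoding:
        # No known encoding (for example, output is not a text stream).
        return ASCII_CHARS
    try:
        "".join(UNICODE_CHARS.values()).encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Either the characters do not exist in this encoding, or the
        # encoding name itself is unknown.
        return ASCII_CHARS
    return UNICODE_CHARS
```

A caller would pass something like sys.stdout.encoding; an explicit command-line switch to force ASCII would still be useful on top of this.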

I’ll take a look at the patch in due course.

James

Hi Oliver,

I've uploaded a prototype patch at https://reviews.llvm.org/D70720 which adds a new feature to llvm-objdump: displaying the location (in registers/memory/etc) of source-level variables alongside the disassembly display. I've put a demo of the output at https://reviews.llvm.org/M2.

I haven't read the code yet, but the demo looks incredibly good, and
I'd certainly find this feature useful on a daily basis. Many thanks
for writing it!

Oliver wrote:

* The patch currently uses unicode box-drawing characters, is this OK? If not, what would people rather see? A plain ASCII version of this, or some completely different format?

I enjoy a plain ASCII aesthetic myself, but I feel the extra detail is
really contributing a lot, for example distinguishing the location
range from the variable name connection (the former thick, the latter
thin). IMHO, well worth keeping the unicode.

* The patch displays DWARF expressions in an ad-hoc syntax, which is a mix of C and ARM assembly (square brackets for memory access). Is there an existing syntax which would be better for this? I think it's important that the common cases like "load 4 bytes from memory at SP+4" are displayed concisely.

I'm not aware of an existing syntax. When printing assembly, LLVM adds
comments where variable ranges start, such as [0]. AFAIUI that only
ever prints the base register and an initial memory deref (like [SP+4]
as your demo shows); the rest of the expression is printed as
text/opcodes, as here [1].

I reckon that outside of the two common cases you describe, it would
be enough to flag that there's extra unshown expression to consider,
by appending a star for example. The rest of the expression is easily
accessible to a developer, and displaying expressions isn't the
primary aim of the patch.
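
To sketch what I mean (Python, purely illustrative, and assuming a
hypothetical decoded form where an expression is a non-empty list of
(opcode, operand) pairs): show the leading register or register+offset
op, and append a star whenever there is more expression that is not
being displayed.

```python
import re


def summarize(expr):
    """Show only the leading register/memory op; star anything unshown.

    `expr` is a non-empty list of (opcode_name, operand) pairs, a shape
    assumed for this sketch only.
    """
    op, operand = expr[0]
    m = re.fullmatch(r"DW_OP_(b?)reg(\d+)", op)
    if m is None:
        # No concise form for the first op: everything is unshown.
        return "*"
    text = f"r{m.group(2)}"
    if m.group(1):
        # DW_OP_bregN <offset>: memory at register N plus offset.
        text = f"[{text}{operand:+d}]"
    # Flag any trailing ops that are not displayed.
    return text + ("*" if len(expr) > 1 else "")
```

So a lone DW_OP_reg0 prints as r0, a DW_OP_breg13 with offset 4 followed
by further ops prints as [r13+4]*, and an expression that starts with
something else prints as just a star.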

I'll get round to looking at the patch in a bit.

[0] https://github.com/llvm/llvm-project/blob/1433b1b6ec7e1c2b2a91d2070dcd88adf1aa9774/llvm/test/tools/llvm-symbolizer/frame-types.s#L99
[1] https://github.com/llvm/llvm-project/blob/abf25745b339700639a5d319551ed120a52fd753/llvm/test/tools/llvm-dwarfdump/X86/Inputs/statistics-fib.split-dwarf.s#L115

This looks fantastic. It will be a big time saver for folks staring at assembly.

— Sean Silva

I agree with the others that this seems great! I think this information can be super helpful for users, both for learning and for skimming assembly code.

As a maintainer of an LLVM frontend (JuliaLang), I’m additionally interested in whether some bits of this make sense to end up in libLLVM itself, probably the collection code in particular. For context, I’ve previously written some code to pretty-print the line-table information as code comments (sample https://gist.github.com/vtjnash/2f2b642663655d5fc63ec7321c5bd0bd, implementation https://github.com/JuliaLang/julia/blob/master/src/disasm.cpp#L167), and ever since I’ve been meaning to figure out whether any portion of that makes sense to upstream, and also how to parse and show the variable info alongside it. So even if none of this PR ends up in libLLVM, I’d still plan to someday figure out which bits of this PR to copy into our AssemblyAnnotationWriter to show the variable info in our front-end as well.

But if it does get put in libLLVM, this capability seems like it could be useful for the other instruction printers too (e.g. IR and MIR). So I’d be interested to hear if you have any thoughts on what might make sense in a library, and any other opportunities where I could help collaborate. This shouldn’t need to delay review and merging of your current PR though.

-jameson

My general perspective here is that it’ll be useful to put into a library at some point. In general, I’d like to see a lot of the functionality from readelf and objdump make its way out of the tools and into a library.

-eric

+1 to this comment. It has always felt weird to me that we have a non-trivial amount of duplicate functionality with independent implementations in the two tools, when we could pull much of it into libObject or similar. Printing of the data would need to continue to be independent of llvm-readelf/llvm-objdump etc since they do that differently, but the parsing should be identical.

James

This is a great addition to llvm-objdump. My only concern is that llvm-objdump.cpp is already pretty complicated and in need of refactoring as it’s had lots of small features added over the years. I’d really like to see the disassembly formatting stuff moved out to another file, but I’m not sure that should be a blocker.

While I really like the unicode, it won’t work on Windows by default. It would be nice if we could detect if the terminal supported unicode, but I’m not sure there’s actually a good way to do that.

- Michael Spencer