Making llvm-objdump more like GNU objdump

Hello LLVM,

Previously, some folks wanted llvm-objdump to behave more like GNU
objdump. This could encompass both command line options and output
format. Such a change helps developers already familiar with GNU
tools and allows re-use of Perl scripts or other automation expecting
to see GNU style dumps.

Is moving llvm-objdump toward GNU objdump the general preference? And
what about otool-style output?

Regards,
-steve

Hi Steve,

I’ve been trying to get the functionality of llvm-objdump to match that of darwin’s otool(1). While adding support for symbolic disassembly, and to allow testing it on very large files where the two disassemblies need to diff cleanly, I added a few options to both llvm-objdump and otool(1). For example, these are the two command lines I use for testing:

llvm-objdump -d -m -no-show-raw-insn -full-leading-addr -print-imm-hex …
otool -tV -U -no-show-raw-insn …

Longer term, I hope to see llvm-objdump take over all of darwin’s otool(1) functionality. I’m not sure of the best way to handle the command line options, as the trick of interpreting them differently based on argv[0] may not work. There may need to be some wrapper to do that. There may also need to be an option like llvm-nm’s "-format XXX" to make the output match, so that scripts can use it.

I’ve Cc’ed Jim Grosbach as he may have some guidance on this.

My thoughts,
Kev

Hey folks,

It’s great to see more interest in supporting tools like objdump. I very much agree that bringing llvm-objdump up to feature parity (to start with) with both otool(1) and objdump(1) is a great goal. The default output formatting is easy enough to get right by having it be controlled by the container format (otool style for Mach-O, objdump style for ELF). Kevin’s right that where this gets a bit interesting is command line option handling. The prevailing wisdom from clang and lld so far seems to be the alternatives Kevin mentions: sniffing argv[0] and/or having a --flavor or --format option. IMO, for now we can just do the latter, which is the simpler thing, while we get the real functionality in place. Then, when we’re ready to use llvm-objdump to replace the system version (optionally, as packagers decide to opt in), we can figure out the right way to make that transition nice and clean.
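A minimal C++ sketch of the default-by-container idea, with an explicit option taking precedence. The Container and Flavor enums and the parseFlavor/chooseFlavor helpers are hypothetical names for illustration, not existing llvm-objdump code, and "otool"/"gnu" are assumed spellings for a --flavor value:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical enums for illustration; llvm-objdump has no such types.
enum class Container { MachO, ELF };
enum class Flavor { OTool, GNU };

// Default output style follows the container format:
// otool style for Mach-O, objdump (GNU) style for ELF.
Flavor defaultFlavorFor(Container c) {
  return c == Container::MachO ? Flavor::OTool : Flavor::GNU;
}

// Parse an assumed --flavor value; an unknown spelling yields no value
// so the caller can report an error.
std::optional<Flavor> parseFlavor(const std::string &s) {
  if (s == "otool") return Flavor::OTool;
  if (s == "gnu") return Flavor::GNU;
  return std::nullopt;
}

// An explicit user choice always wins over the container default.
Flavor chooseFlavor(Container c, const std::optional<Flavor> &userChoice) {
  return userChoice ? *userChoice : defaultFlavorFor(c);
}
```

Keeping the user's choice as an optional keeps "not specified" distinct from any real flavor, so the container default only applies when nothing was given.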

-jim

Hi guys, thanks for responding. Will mimicking both otool and objdump
in one binary become unwieldy? Maybe a disassembler library would be
a better way to factor out common code? For example, will Kevin's
symbolizing work be relevant for ELF files?
Regards,
-steve

At least for now, I don’t expect it to become all that unwieldy. Any behavioral differences should be easily separable into different classes and source files. If, as things progress, it becomes obvious that there’s really not much in common other than the general nature of the tools, it’s easy to split them apart.

-Jim

llvm-objdump currently has a -macho option ("Use MachO specific object file parser"), and I’m hiding the Mach-O-specific disassembly code behind it. Currently it is only used with the -disassemble option, but one could see it being used for other things. As Jim points out, though, today the output for some things is controlled by the container, which is what is done for options like -private-headers. There are also flags like -exports-trie, -rebase, -bind, etc. that are really Mach-O options.

As for the symbolizing work, it can be relevant for ELF files, and the code I did can be used as a model for hooking it up for ELF files. But the real work of the callbacks is very specific to each type of object file.

Kev

OK. Let's try a specific example: At least for ELF files, GNU
objdump prints operand values in hex. AFAIK, hex is not just the
default, but the only choice. On the other hand, llvm-objdump prints
operand values in decimal and ignores the --print-imm-hex option for
ELF.

How about a patch to print operands in hex for ELF? Good place to start?

Seems like a good place to start if you want to create a patch that honors the --print-imm-hex option for ELF files.

At one point I hooked up the existing -no-show-raw-insn option to the Mach-O parser code in llvm-objdump so I could test its output against darwin’s otool(1). Later I even had to add the -no-show-raw-insn option to darwin’s otool(1) so that arm64 code could also be diff’ed.

In talking to Jim Grosbach today, the idea is to first get all the functionality implemented, and only later worry about packaging details, such as making the defaults for all the options match the native tool we are trying to replace.

What about making llvm-otool a symlink to llvm-objdump and using the
binary name as the --flavour default? That way we could have both on
all architectures...
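The symlink idea could be sketched as a hypothetical helper that derives a default from argv[0]. The names Flavor, toolBasename and flavorFromArgv0 are assumptions, not llvm-objdump code. As Kevin notes, relying on argv[0] may not work in every setup, so an explicit option would still be needed:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical flavor enum for illustration; not an llvm-objdump type.
enum class Flavor { OTool, GNU };

// Strip any directory prefix ("/usr/bin/llvm-otool" -> "llvm-otool").
std::string toolBasename(const std::string &argv0) {
  std::string::size_type slash = argv0.find_last_of("/\\");
  return slash == std::string::npos ? argv0 : argv0.substr(slash + 1);
}

// A name containing "otool" (e.g. an llvm-otool symlink) selects the otool
// flavor. Any other name gives no opinion, so the caller falls back to its
// usual default (for example, one chosen from the container format).
std::optional<Flavor> flavorFromArgv0(const std::string &argv0) {
  if (toolBasename(argv0).find("otool") != std::string::npos)
    return Flavor::OTool;
  return std::nullopt;
}
```

Returning "no opinion" for the plain llvm-objdump name keeps the binary-name default from overriding a container-based one.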

cheers,
--renato

I agree, using lld-style command-line option flavors (and getting a
real option parser into llvm-objdump) looks like the right way to me.

Dmitri

Let's skip that. Piecemeal format controls like --print-imm-hex are
too problematic, since they pile up quickly and require nitpicky checks
in the target's InstPrinter code. I see you've already got at least
three on your command line:

-no-show-raw-insn -full-leading-addr -print-imm-hex ...

Since each target controls its own operand format, each target
decides how closely to conform to whatever style matters most to it.
This will involve finer-grained formatting issues than we'll want to
control on the command line.

In keeping with the main idea bouncing around, how about a global
style enum with "GNU", "OTOOL", etc. available in MCInstPrinter?
llvm-objdump can set the global style automatically based on the
binary's container. The user can override the default on the command
line. Targets can check the hint and then do what they know to be
best.
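A rough sketch of how such a style hint could flow from the driver into a target printer. AsmStyle, StyledPrinter and ToyX86Printer are hypothetical stand-ins, not MCInstPrinter's real interface. The only style difference modeled is the operand separator, taken from the output lines quoted in this thread (", " in LLVM's default output, "," in GNU objdump and classic otool output):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical style hint modeled on the proposal above; MCInstPrinter has
// no such enum or setter.
enum class AsmStyle { Default, GNU, OTool };

// Stand-in for the printer base class: it only carries the hint.
class StyledPrinter {
public:
  virtual ~StyledPrinter() = default;
  void setStyle(AsmStyle s) { style = s; }  // set once by the driver
  virtual std::string printInst(const std::string &mnemonic,
                                const std::vector<std::string> &ops) const = 0;

protected:
  AsmStyle style = AsmStyle::Default;
};

// A toy target printer. The target alone decides what each style means;
// here only the separator between operands changes.
class ToyX86Printer : public StyledPrinter {
public:
  std::string printInst(const std::string &mnemonic,
                        const std::vector<std::string> &ops) const override {
    const std::string sep = (style == AsmStyle::Default) ? ", " : ",";
    std::string out = mnemonic;
    for (std::size_t i = 0; i < ops.size(); ++i)
      out += (i == 0 ? std::string("\t") : sep) + ops[i];
    return out;
  }
};
```

The driver only sets the hint; each target keeps full control over what the hint means for its own syntax, which avoids one command line flag per formatting detail.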

Regards,
-steve

Sure, that sounds good. I just personally don’t see it as very high priority right now since there’s significant functionality still missing. Others, of course, may have different priorities, and I’d be happy to see patches along those lines if anyone is so inclined.

-Jim

Another wrinkle is that the assembler parses hex immediates as
unsigned values. For example, take this instruction on x86:

83 c0 9c addl $-100, %eax

As an 8-bit immediate (imm8), -100 is encoded as 0x9C (256 - 100 = 156).
Suppose llvm-objdump disassembled the instruction this way:

addl $0x9C,%eax

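Reassembling that results in a different, wrong instruction: 0x9C
(156) does not fit in a sign-extended imm8, so the assembler picks
the form with a 32-bit immediate: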

05 9c 00 00 00 addl $0x9C, %eax

I assume we have a golden rule that reassembling our disassembly
should get back to the same binary.
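To make the reassembly problem concrete, here is a simplified C++ sketch of the encoding choice for just this one instruction form. The function names are hypothetical, and a real assembler handles many more forms. The point is that 0x9C (156) fails the signed imm8 range check while -100 passes it:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if v survives a round trip through a sign-extended 8-bit immediate.
bool fitsSignedImm8(int64_t v) { return v >= -128 && v <= 127; }

// Simplified sketch of the encoding choice for "addl $imm, %eax":
//   83 c0 ib   ADD r/m32, imm8 (imm8 is sign-extended to 32 bits)
//   05 id      ADD EAX, imm32 (little-endian)
std::vector<uint8_t> encodeAddlEax(int64_t imm) {
  assert(imm >= INT32_MIN && imm <= UINT32_MAX);  // must fit a 32-bit field
  if (fitsSignedImm8(imm))
    return {0x83, 0xC0, static_cast<uint8_t>(imm)};
  uint32_t u = static_cast<uint32_t>(imm);
  return {0x05, static_cast<uint8_t>(u), static_cast<uint8_t>(u >> 8),
          static_cast<uint8_t>(u >> 16), static_cast<uint8_t>(u >> 24)};
}
```

With -100 this yields 83 c0 9c, and with 0x9C it yields 05 9c 00 00 00, the two byte sequences quoted above.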

For reference, GNU objdump prints hex but knows the width of the
operand and sign-extends appropriately:

83 c0 9c add $0xffffff9c,%eax
05 9c 00 00 00 add $0x9c,%eax

Unfortunately, the logical operand width doesn't seem to be readily
available in the target InstPrinter code. Any ideas on how best to
find it?
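For illustration, a sketch of the GNU-style rendering that assumes the widths are known. The raw encoded field width and the logical operand width are passed in explicitly, which is exactly the information noted above as not readily available in the InstPrinter. gnuStyleHex is a hypothetical helper, not binutils or LLVM code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Sign-extend the raw encoded field to the logical operand width, then
// print that width's bit pattern as unsigned hex.
std::string gnuStyleHex(uint64_t rawField, unsigned fieldBits,
                        unsigned operandBits) {
  assert(fieldBits >= 1 && fieldBits <= operandBits && operandBits <= 64);
  uint64_t fieldMask = fieldBits == 64 ? ~0ULL : ((1ULL << fieldBits) - 1);
  uint64_t v = rawField & fieldMask;
  if (v & (1ULL << (fieldBits - 1)))  // top bit of the field set: negative
    v |= ~fieldMask;                  // sign-extend to 64 bits
  uint64_t opMask = operandBits == 64 ? ~0ULL : ((1ULL << operandBits) - 1);
  v &= opMask;                        // truncate to the operand width
  char buf[24];
  std::snprintf(buf, sizeof buf, "$0x%llx", static_cast<unsigned long long>(v));
  return buf;
}
```

Called with an 8-bit field inside a 32-bit operand, 0x9c becomes $0xffffff9c. With a full 32-bit field, it stays $0x9c, matching the two GNU objdump lines above.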

Regards,
-steve

Hi Steve,

For this issue it appears llvm’s disassembler uses a negative hex value:

% cat x.s
.byte 0x83, 0xc0, 0x9c
% clang -c x.s
% otool -tvj x.o
x.o:
(__TEXT,__text) section
0000000000000000 83c09c addl $-0x64, %eax

and that assembles to the same:

% cat y.s
addl $-0x64, %eax
% clang -c y.s
% otool -tvj y.o
y.o:
(__TEXT,__text) section
0000000000000000 83c09c addl $-0x64, %eax

The old built in disassembler in otool(1) does this:

% otool -Qtv y.o
y.o:
(__TEXT,__text) section
0000000000000000 addl $0x9c,%eax

So while I’m not a fan of negative hex values, they seem to make sense for the LLVM disassembler, as they allow disassembly and reassembly to follow the golden rule.
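The negative-hex approach needs no operand width at all, since the sign survives in the text. Here is a minimal sketch that parses the text back to check the round trip. signedHex and parseHexImm are hypothetical helpers, not LLVM's implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// Print a signed immediate as hex, keeping the sign: -100 -> "-0x64".
std::string signedHex(int64_t v) {
  char buf[24];
  if (v < 0) {
    // Negate in unsigned arithmetic so INT64_MIN does not overflow.
    uint64_t mag = 0 - static_cast<uint64_t>(v);
    std::snprintf(buf, sizeof buf, "-0x%llx",
                  static_cast<unsigned long long>(mag));
  } else {
    std::snprintf(buf, sizeof buf, "0x%llx",
                  static_cast<unsigned long long>(v));
  }
  return buf;
}

// Parse it back; strtoll with base 16 accepts a sign and a "0x" prefix.
int64_t parseHexImm(const std::string &s) {
  return std::strtoll(s.c_str(), nullptr, 16);
}
```

Parsing the zero-extended spelling 0x9c yields 156 rather than -100, which is why that form breaks the round trip while -0x64 does not.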

Kev