relocation visitor

Currently llvm-dwarfdump isn’t very useful on ELF .o files because it doesn’t apply relocations.

nlewycky@ducttape:~$ llvm-dwarfdump helloworld.o | grep debug_str\[
0x0000000c: DW_AT_producer [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000012: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000022: DW_AT_comp_dir [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000027: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000044: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000052: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000061: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000068: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)

The attached patch fixes the problem for ELF X86-64 files:

nlewycky@ducttape:~$ llvm-dwarfdump helloworld.o | grep debug_str\[
0x0000000c: DW_AT_producer [DW_FORM_strp] ( .debug_str[0x00000000] = “clang version 3.2 (trunk 163034)”)
0x00000012: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000021] = “helloworld.c”)
0x00000022: DW_AT_comp_dir [DW_FORM_strp] ( .debug_str[0x0000002e] = “/home/nlewycky”)
0x00000027: DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000003d] = “main”)
0x00000044: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000046] = “argc”)
0x00000052: DW_AT_name [DW_FORM_strp] ( .debug_str[0x0000004b] = “argv”)
0x00000061: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000042] = “int”)
0x00000068: DW_AT_name [DW_FORM_strp] ( .debug_str[0x00000050] = “char”)

I’m not asking you to review the patch itself, though; that’s why I’m posting to llvm-dev. I’m asking for a design review.

This patch introduces “RelocationVisitor”, a new templated class that tries to save the user of relocations some work by breaking relocations down into their basic mathematical operations. If you look at the ELF x86-64 spec, you’ll note that it defines things like:
A = the addend field of the relocation
B = base address which the shared object has been loaded into
P = address of the storage being relocated
S = value of the symbol whose index resides in the relocation entry
and then the relocations are
R_X86_64_64: S + A
R_X86_64_PC32: S + A - P
so my thinking was that a client class could define functions that provide values for A, B, P, S, etc., and then we break visiting a relocation down into calls to those functions. Relocations that really can’t be implemented generically (IRELATIVE?) should be pure virtual in the RelocationVisitor, forcing subclasses to implement their own.
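
To make the idea concrete, here is a minimal sketch of such a visitor, assuming a CRTP-style base class. RelocInfo, ObjectFileVisitor and the getS/getA/getP accessors are hypothetical names made up for illustration; they are not taken from the attached patch or from LLVM's RelocationRef interface.

```cpp
#include <cstdint>

// Hypothetical stand-in for a relocation record (not LLVM's RelocationRef).
struct RelocInfo {
  uint32_t Type;        // relocation type, e.g. R_X86_64_64
  uint64_t Offset;      // P: address of the storage being relocated
  int64_t Addend;       // A: explicit addend from the relocation entry
  uint64_t SymbolValue; // S: value of the referenced symbol
};

// Relocation type numbers as defined by the x86-64 ELF psABI.
enum { R_X86_64_64 = 1, R_X86_64_PC32 = 2 };

// The base class knows the formula for each relocation type; the derived
// class only supplies the operands S, A and P.
template <typename Derived> class RelocationVisitor {
public:
  // Returns false for relocation types this sketch does not handle.
  bool visit(const RelocInfo &R, uint64_t &Result) {
    Derived &D = static_cast<Derived &>(*this);
    switch (R.Type) {
    case R_X86_64_64: // S + A
      Result = D.getS(R) + D.getA(R);
      return true;
    case R_X86_64_PC32: // S + A - P, truncated to the 32-bit field
      Result = (D.getS(R) + D.getA(R) - D.getP(R)) & 0xffffffffULL;
      return true;
    default:
      return false;
    }
  }
};

// A client that has concrete integers for every operand.
struct ObjectFileVisitor : RelocationVisitor<ObjectFileVisitor> {
  uint64_t getS(const RelocInfo &R) const { return R.SymbolValue; }
  uint64_t getA(const RelocInfo &R) const {
    return static_cast<uint64_t>(R.Addend); // two's-complement wraparound
  }
  uint64_t getP(const RelocInfo &R) const { return R.Offset; }
};
```

With these made-up inputs, a symbol value of 0 and an addend of 0x21 under R_X86_64_64 yields 0x21, and an unhandled type makes visit() return false so the caller can fall back to its own handling.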

The visitor could then build up an exact integer if it knows all the relevant facts (has a concrete integer for the address of symbols, etc.), or it could build up an MCExpr, or it could build up a ConstantExpr, etc.
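
One way this could look, as a rough sketch: the base class is parameterized on the value type, and the client supplies the add/sub operations along with the operands. The symbolic client below builds plain strings as a stand-in for something like an MCExpr; all names here (ExprVisitor, ConcreteVisitor, SymbolicVisitor, RelocKind) are assumptions for illustration only.

```cpp
#include <cstdint>
#include <string>

// Hypothetical relocation kinds: S + A and S + A - P.
enum RelocKind { Abs64, PCRel32 };

// The formula is written once; Value decides what "S + A - P" produces.
template <typename Derived, typename Value> struct ExprVisitor {
  Value visit(RelocKind K) {
    Derived &D = static_cast<Derived &>(*this);
    Value SA = D.add(D.S(), D.A());
    return K == PCRel32 ? D.sub(SA, D.P()) : SA;
  }
};

// Knows all the facts, so it folds everything to an exact integer.
struct ConcreteVisitor : ExprVisitor<ConcreteVisitor, uint64_t> {
  uint64_t SymVal, Addend, Place;
  ConcreteVisitor(uint64_t SV, uint64_t AV, uint64_t PV)
      : SymVal(SV), Addend(AV), Place(PV) {}
  uint64_t S() const { return SymVal; }
  uint64_t A() const { return Addend; }
  uint64_t P() const { return Place; }
  uint64_t add(uint64_t L, uint64_t R) const { return L + R; }
  uint64_t sub(uint64_t L, uint64_t R) const { return L - R; }
};

// Does not know addresses, so it builds a textual expression instead.
struct SymbolicVisitor : ExprVisitor<SymbolicVisitor, std::string> {
  std::string Name;
  int64_t Addend;
  SymbolicVisitor(const std::string &N, int64_t AV) : Name(N), Addend(AV) {}
  std::string S() const { return Name; }
  std::string A() const {
    return std::to_string(static_cast<long long>(Addend));
  }
  std::string P() const { return "."; } // "." denotes the current location
  std::string add(const std::string &L, const std::string &R) const {
    return "(" + L + " + " + R + ")";
  }
  std::string sub(const std::string &L, const std::string &R) const {
    return "(" + L + " - " + R + ")";
  }
};
```

Both clients share the same visit() body; only the operand and operator definitions differ, which is the property the design is after.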

The attached patch shows the concept with just 5 simple relocations from ELF X86-64, but I’ve been looking at MachO and other ELF architectures, and I think the concept will pan out. I’m a bit out of my depth here, so I’d appreciate any other file format experts taking a look and letting me know what you think.

Nick

reloc-visitor-design.patch (7.6 KB)

I like the main concept of a visitor using CRTP and how generic it is.
It does seem to solve a lot of my concerns about the generality of
handling relocations in all the cases we have. One thing to improve on
this would be to make access to RelocInfo be via type traits instead
of directly. There are cases where we can use the raw relocation info
instead of the generic RelocationRef interface.

I do have some concerns about how to handle cases that need more
information than RelocInfo can provide:

* Rel relocations where the addend is read from where the relocation
is to be applied.
* Symbol relative relocations.
* Mach-O scattered relocations.
* TLS/PLT/GOT rel.
* Mach-O paired relocations. In this case the visitor decides how many
relocation entries to consume.

I'm currently thinking that these cases should be handled as callbacks.
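
A rough sketch of what two of those callbacks might look like, under assumed names: the client supplies readImplicitAddend() for Rel-style relocations, and visit() reports how many entries it consumed so that paired relocations can take two. The relocation kinds below are invented for the example and do not correspond to any real ELF or Mach-O type numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical raw entry with no explicit addend (Rel-style).
struct RawReloc {
  uint32_t Type;
  uint64_t Offset;
  uint64_t SymbolValue;
};

// Invented kinds: a plain absolute reloc, and a two-entry "S1 - S2" pair.
enum { kAbs32 = 0, kPairedSub = 1, kPairSecond = 2 };

template <typename Derived> struct RelVisitor {
  // Returns the number of entries consumed, or 0 if unsupported/malformed.
  size_t visit(const std::vector<RawReloc> &Rels, size_t I, uint64_t &Result) {
    assert(I < Rels.size() && "relocation index out of range");
    Derived &D = static_cast<Derived &>(*this);
    const RawReloc &R = Rels[I];
    switch (R.Type) {
    case kAbs32: // S + A, with A read from the relocated location
      Result = R.SymbolValue + D.readImplicitAddend(R.Offset);
      return 1;
    case kPairedSub: // S1 - S2 + A, where S2 comes from the next entry
      if (I + 1 >= Rels.size() || Rels[I + 1].Type != kPairSecond)
        return 0;
      Result = R.SymbolValue - Rels[I + 1].SymbolValue +
               D.readImplicitAddend(R.Offset);
      return 2;
    default:
      return 0;
    }
  }
};

// Client callback: decode a little-endian 32-bit addend from section bytes.
struct SectionVisitor : RelVisitor<SectionVisitor> {
  std::vector<uint8_t> Bytes;
  explicit SectionVisitor(const std::vector<uint8_t> &B) : Bytes(B) {}
  uint64_t readImplicitAddend(uint64_t Offset) const {
    if (Offset + 4 > Bytes.size())
      return 0; // out of range: treat as a zero addend in this sketch
    uint64_t V = 0;
    for (unsigned I = 0; I < 4; ++I)
      V |= static_cast<uint64_t>(Bytes[Offset + I]) << (8 * I);
    return V;
  }
};
```

A caller would advance its index by the returned count, so the visitor rather than the caller decides how many entries belong together.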

It would be nice if this interface could be used to augment MCInst
printing with relocation data. This would allow llvm-objdump -d to
print:

call printf

Instead of:

call 0
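
As a very rough illustration of that use case (not how MCInst printing actually works), a printer could look up the operand's offset in a table of relocations and substitute the symbol name when one is found. RelocSymbolMap and printCall are hypothetical names invented for this sketch.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical table: offset patched by a relocation -> target symbol name.
typedef std::map<uint64_t, std::string> RelocSymbolMap;

// Prints the symbol if a relocation covers the operand, else the raw value.
std::string printCall(uint64_t OperandOffset, uint64_t RawTarget,
                      const RelocSymbolMap &Relocs) {
  std::ostringstream OS;
  OS << "call ";
  RelocSymbolMap::const_iterator It = Relocs.find(OperandOffset);
  if (It != Relocs.end())
    OS << It->second;
  else
    OS << RawTarget;
  return OS.str();
}
```

The offset-to-symbol table is the simplest thing that could work here; a real implementation would presumably go through the visitor so that addends and PC-relative forms are handled too.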

I've CC'd Jim and Owen to get some input on this use case.

- Michael Spencer