How can I get the opcode length of an IR instruction in LLVM?

I need to get the offset and the exact length of the opcode corresponding to a particular LLVM IR instruction on the x86 architecture. I believe I would have to hack the backend for this.

I assume there is a way to dump the offsets and sizes of the opcodes as the x86 backend generates them. However, given optimizations and the translation of one IR instruction into multiple operations, I’m not sure whether the relation between a single IR instruction and its corresponding opcodes can be maintained.

My questions are:

  • Is this in general possible?
  • How can I hack the backend to dump the required information? Is there a generic way to do so, or do I need to hack all backends?

Thanks a lot.


Not possible, even in theory, because the size of some instructions – such as relative branches – is not known until link time, and that affects the offsets of the instructions following them.

Which target's that for? You'd need function-internal relocations for
each BB to make that work; x86 branches only get relaxed at
compile time, as far as I'm aware.



On 27 February 2017 at 07:24, Mohsen Ahmadvand via llvm-dev <llvm->

Is this in general possible?

Definitely not. Just about every pathology you can imagine could happen:

  * Multiple IR instructions can be combined into a single target
instruction (without any information tracking which instructions it
came from).
  * A single IR instruction can produce multiple target instructions.
  * Some target instructions don't correspond to any IR instruction
(ABI handling, register spills to the stack).
  * Some IR instructions produce no target instructions (unreachable
for example). This might be the easiest to handle.

How can I hack the backend to dump the required information? Is there a
generic way to do so, or do I need to hack all backends?

The size is only really known at the very end of the compilation
pipeline (low-level optimizations like compressing branches can affect
the size and happen last). The functions where it happens are
MCObjectStreamer::EmitInstruction and friends.

So bearing in mind that you'll only ever get an approximation, you
could attach debug-info to the IR pointing back at itself (i.e. debug
info for LLVM IR instead of a higher-level language). You could hack a
check for that during emission and count the bytes that came from any
particular line/inst.
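
As a rough illustration of the counting step only, here is a minimal Python sketch. It assumes you have already hacked the emitter to log one (offset, size, line) record per emitted instruction, where "line" is the line of the IR file taken from the attached debug info, or None when the instruction has no location (prologue, spills and so on). The record format, the function name and the sample values are all hypothetical; nothing here is an LLVM API.

```python
from collections import defaultdict


def bytes_per_line(records):
    """Sum encoded instruction sizes per IR line.

    `records` is an iterable of (offset, size, line) tuples, a hypothetical
    log format. `line` is None for instructions without a debug location.
    """
    totals = defaultdict(int)
    for _offset, size, line in records:
        totals[line] += size
    return dict(totals)


# Hypothetical log for one small x86-64 function; the values are made up.
emitted = [
    (0, 1, None),   # prologue, no IR instruction behind it
    (1, 3, None),   # prologue
    (4, 3, 7),      # first of two instructions attributed to IR line 7
    (7, 4, 7),      # second instruction attributed to IR line 7
    (11, 2, 9),     # single instruction attributed to IR line 9
    (13, 1, None),  # epilogue
    (14, 1, 10),    # return, attributed to IR line 10
]

for line, size in bytes_per_line(emitted).items():
    label = "no location" if line is None else f"IR line {line}"
    print(f"{label}: {size} bytes")
```

The None bucket is where the bytes with no IR counterpart end up, which is one reason the result can only be an approximation.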

There used to be a pass to add this kind of debug info to IR, but it
bit-rotted and got removed a while back. Should still be in the git
history somewhere though.



Definitely RISC-V with the GNU linker, which also relaxes function calls that turn out to be within +/-1 MB and, I think, accesses to global/thread-local variables that turn out to be in the first 4 KB. Both of those change from sequences of two 32-bit instructions to a single instruction (possibly even a 16-bit instruction).

Most targets could probably benefit from things like that, but perhaps don’t bother.
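
To make the effect on offsets concrete, here is a small Python sketch of a toy layout model. It is not how any linker is implemented. The instruction list is hypothetical; the only thing taken from the above is that a two-instruction call sequence (8 bytes) can shrink to a single 32-bit instruction (4 bytes), which moves everything after it.

```python
def layout(sizes):
    """Return the start offset of each instruction, given sizes in bytes."""
    offsets = []
    pos = 0
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets


# Hypothetical function body: (description, size before, size after relaxation).
body = [
    ("addi", 4, 4),
    ("call: two instructions relaxed to one", 8, 4),
    ("lw", 4, 4),
    ("ret", 4, 4),
]

before = layout([pre for _, pre, _ in body])
after = layout([post for _, _, post in body])

for (name, _, _), old, new in zip(body, before, after):
    print(f"{name}: offset {old} -> {new}")
```

Any offsets dumped at compile time for the instructions after the call would be stale once the linker has relaxed it.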