How can I get the opcode length of an IR instruction in LLVM?

Mohsen_Ahmadvand · February 27, 2017, 3:24pm

I need to get the offset and the exact length of the opcode corresponding to a particular LLVM IR instruction on the x86 architecture. I believe I will have to modify the backend for this.

I assume there is a way to dump the offsets and sizes of the opcodes as the x86 backend generates them. However, given optimizations and the translation of one IR instruction into multiple operations, I’m not sure whether the relation between a single IR instruction and its corresponding opcode can be maintained.

My questions are:

Is this in general possible?
How can I hack the backend to dump the required information? Is there a generic way to do so, or do I need to hack all backends?

Thanks a lot.

Mohsen

Bruce_Hoult · February 27, 2017, 3:50pm

Not possible, even in theory, because the size of some instructions, such as relative branches, is not known until link time, and that affects the offsets of the instructions following them.
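To illustrate why a branch whose size is decided late shifts everything after it, here is a minimal sketch of branch relaxation in Python. It is a toy model, not LLVM or linker code: the `layout` function and the `("jmp", target_index)` / `("op", size)` program format are made up for the example. The only real-world numbers are the 2-byte and 5-byte sizes of the x86 `JMP rel8` and `JMP rel32` encodings.

```python
# Toy model, not LLVM code. A program is a list of (kind, payload):
#   ("op", n)   -> a fixed-size blob of n bytes
#   ("jmp", i)  -> a jump to the instruction at index i
SHORT, LONG = 2, 5  # sizes of x86 JMP rel8 and JMP rel32


def layout(program):
    """Return (offsets, sizes) after growing branches until all of them fit."""
    # Start optimistic: every jump uses the short encoding.
    sizes = [SHORT if kind == "jmp" else payload for kind, payload in program]
    while True:
        # Recompute the offset of every instruction from the current sizes.
        offsets = [0]
        for size in sizes:
            offsets.append(offsets[-1] + size)
        changed = False
        for i, (kind, target) in enumerate(program):
            if kind != "jmp" or sizes[i] == LONG:
                continue
            # The displacement is measured from the end of the jump.
            disp = offsets[target] - offsets[i + 1]
            if not -128 <= disp <= 127:
                sizes[i] = LONG  # does not fit in 8 bits: grow the jump
                changed = True
        if not changed:
            return offsets[:-1], sizes


far = [("jmp", 3), ("op", 100), ("op", 100), ("op", 1)]
near = [("jmp", 2), ("op", 100), ("op", 1)]
print(layout(far))
print(layout(near))
```

In the `far` program the jump has to grow from 2 to 5 bytes, so every later instruction moves by 3 bytes. Whether this happens in the assembler or in the linker depends on the target, but either way the offsets are only final once the last such decision has been made.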

TNorthover · February 27, 2017, 3:52pm

Which target's that for? You'd need function-internal relocations for
each BB to make that work; x86 branches only get relaxed at
compile time, as far as I'm aware.

Cheers.

Tim.

TNorthover · February 27, 2017, 3:55pm

On 27 February 2017 at 07:24, Mohsen Ahmadvand via llvm-dev wrote:

> Is this in general possible?

Definitely not. Just about every pathology you can imagine could happen:

  * Multiple IR instructions can be combined into a single target
instruction (without any information tracking which instructions it
came from).
  * A single IR instruction can produce multiple target instructions.
  * Some target instructions don't correspond to any IR instruction
(ABI handling, register spills to the stack).
  * Some IR instructions produce no target instructions (unreachable
for example). This might be the easiest to handle.

> How can I hack the backend to dump the required information? Is there a
> generic way to do so, or do I need to hack all backends?

The size is only really known at the very end of the compilation
pipeline (low-level optimizations like compressing branches can affect
the size and happen last). The functions where it happens are
MCObjectStreamer::EmitInstruction and friends.

So bearing in mind that you'll only ever get an approximation, you
could attach debug-info to the IR pointing back at itself (i.e. debug
info for LLVM IR instead of a higher-level language). You could hack a
check for that during emission and count the bytes that came from any
particular line/inst.
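The counting step could look roughly like the following Python sketch. It is only an illustration of the bookkeeping, not LLVM code: the `emitted` list and its `(ir_line, size)` record format are hypothetical stand-ins for whatever a hacked emission hook would log, and `bytes_per_ir_line` is a made-up helper name.

```python
from collections import defaultdict


def bytes_per_ir_line(emitted):
    """Sum encoded bytes per IR line; the None key collects unattributed code."""
    totals = defaultdict(int)
    for ir_line, size in emitted:
        totals[ir_line] += size
    return dict(totals)


# Hypothetical emission log: (IR line taken from the debug location, size in bytes).
emitted = [
    (None, 1),  # prologue code with no IR instruction behind it
    (12, 3),    # one IR instruction ...
    (12, 4),    # ... that turned into two machine instructions
    (14, 2),    # several IR instructions folded into one; only one line survives
    (None, 1),  # epilogue code, again without a debug location
]

print(bytes_per_ir_line(emitted))
```

Note that the result has no entry at all for IR lines that produced no code, and that bytes from merged instructions are credited to whichever line the debug location happened to keep. That is why the numbers are an approximation.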

There used to be a pass to add this kind of debug info to IR, but it
bit-rotted and got removed a while back. Should still be in the git
history somewhere though.

Cheers.

Tim.

Bruce_Hoult · February 27, 2017, 4:58pm

Definitely for RISC-V in the GNU linker, which also relaxes function calls that turn out to be within +/-1 MB and, I think, global/thread-local variables that turn out to be in the first 4 KB. Both of those change from sequences of two 32-bit instructions to a single instruction (possibly even a 16-bit instruction).

Most targets could probably benefit from things like this, but perhaps don’t bother.
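As a rough illustration of the RISC-V call case, the following Python sketch picks the size of a call from its final displacement. It is a simplified model and not the linker's actual algorithm: `call_size` and its `compressed_ok` flag are hypothetical, and the ranges are the ones I understand the ISA to give (`jal` reaching +/-1 MiB, the compressed `c.jal` on RV32C reaching +/-2 KiB).

```python
def call_size(displacement, compressed_ok=False):
    """Bytes needed for a call at the given byte displacement (toy model)."""
    if compressed_ok and -2048 <= displacement < 2048:
        return 2  # single 16-bit c.jal
    if -(1 << 20) <= displacement < (1 << 20):
        return 4  # single 32-bit jal
    return 8      # auipc + jalr pair, the unrelaxed form


print(call_size(1 << 20))      # target too far away for jal
print(call_size(0x1000))       # within jal range
print(call_size(100, True))    # within c.jal range
```

The displacement is only known once the linker has placed every section, and shrinking one call can bring other targets into range, so the size of a call site cannot be fixed at compile time on such a target.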
