Symbolic information in disassembler output

Tatyana_Krasnukha · August 31, 2017, 5:42pm

Hello,

As I understand it, the old disassembler (based on libedis) could print symbolic information instead of, or beside, the address operand of an instruction. It looks like the disassembler has no such ability now. Has this responsibility shifted to some other component of lldb? Or was it considered useless and removed altogether?
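To make the question concrete, here is a minimal sketch of the kind of annotation meant: an address operand is looked up in a symbol table and printed as symbol+offset beside the instruction text. The symbol table, the `symbolize` and `annotate` helpers, and the output format are all hypothetical and are not taken from libedis or lldb.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1040, "helper"), (0x1100, "printf")]


def symbolize(address, symbols=SYMBOLS):
    """Return 'name' or 'name+offset' for the nearest preceding symbol.

    This toy table has no symbol sizes, so any address past the last
    symbol is attributed to that last symbol.
    """
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies before the first known symbol
    start, name = symbols[index]
    offset = address - start
    return name if offset == 0 else f"{name}+{offset}"


def annotate(instruction_text, address):
    """Append the symbolic form of an address operand to the text."""
    symbol = symbolize(address)
    if symbol is None:
        return instruction_text
    return f"{instruction_text} ; <{symbol}>"
```

For example, `annotate("callq 0x1040", 0x1040)` puts the name `helper` beside the raw address.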

Thanks,

Tatyana

jingham · August 31, 2017, 6:08pm

I don't think anybody thought of it as useless. It's one of the things Jason has been trying to find time to do for a while now.

InstructionLLVMC is the main concrete Instruction subclass in lldb now, and relies on MCInst for most of the heavy lifting. So we should work with MCInst to figure out how to do this.

Jim

clayborg · August 31, 2017, 6:09pm

I believe libedis was deprecated many years ago and hasn't returned. We use the standard LLVM disassembler, so any features need to be built into llvm::MCInst.

Sean_Callanan · August 31, 2017, 6:35pm

Greg is right that this was a libedis feature and has no equivalent in LLDB today.

MCInst, however, doesn’t have enough information by itself to do this. The reason is that many things that are considered “operands” are backed by several underlying MCInst operands. For example, an operand that was expressed as a register + an offset would be represented in MCInst as a register operand and an immediate operand, and only correlating the opcode with the LLVM instruction tables (and possibly some special knowledge) would tell you that the two belong together.
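The grouping problem can be illustrated with a toy model. This is a sketch under stated assumptions, not the MCInst API: `Operand`, `GROUPS`, the opcode name `MOV_load`, and `group_operands` are all hypothetical, and a real target may use more underlying operands per memory reference than the two shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    kind: str      # "reg" or "imm"
    value: object  # register name or integer immediate


# Hypothetical side table keyed by opcode. Each entry names a logical
# operand and the half-open index range it covers in the flat list.
GROUPS = {
    "MOV_load": [("dst", (0, 1)), ("mem", (1, 3))],
}


def group_operands(opcode, flat_operands):
    """Fold a flat operand list into named logical operands."""
    return {
        name: flat_operands[lo:hi]
        for name, (lo, hi) in GROUPS[opcode]
    }


# Flat view of something like "load eax from rbp - 4": nothing in the
# list itself says that rbp and -4 form one memory reference.
flat = [Operand("reg", "eax"), Operand("reg", "rbp"), Operand("imm", -4)]
logical = group_operands("MOV_load", flat)
```

Without the side table, the flat list alone cannot tell a register-plus-offset memory reference apart from two unrelated operands, which is the missing knowledge described above.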

Additionally, libedis could express the semantics of the instruction operands (e.g., “this is a source operand and represents the result of dereferencing rbp - 4”), and inform the client which ranges of characters in the generated string represented each high-level operand. Neither of these features is exposed anywhere at the moment, and in fact the underlying knowledge was lost when the edis TableGen backend was deprecated.
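As a rough illustration of the character-range idea, the sketch below builds an instruction string while recording where each high-level operand lands in it, together with a role label. The `render` function, the role names, and the tuple layout are hypothetical; this is not how libedis represented the information.

```python
def render(mnemonic, operands):
    """Build the instruction text and record each operand's span.

    `operands` is a list of (operand_text, role) pairs. The result is
    (text, spans) where each span is (start, end, role) and
    text[start:end] is that operand's text.
    """
    text = mnemonic + " "
    spans = []
    for index, (operand_text, role) in enumerate(operands):
        if index > 0:
            text += ", "
        start = len(text)
        text += operand_text
        spans.append((start, len(text), role))
    return text, spans


# "dereference rbp - 4" as a source operand, eax as the target.
text, spans = render("movl", [("-4(%rbp)", "source"), ("%eax", "target")])
```

A client holding `spans` can highlight or replace one operand in the printed string without re-parsing it.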

There are a few LLDB features that read instructions and attempt to interpret them:

The fast unwinder looks for specific bit patterns (see UnwindAssembly_x86::GetFastUnwindPlan in UnwindAssembly-x86.cpp);
The ARM instruction emulator has its own home-grown instruction table (see EmulateInstructionARM64.cpp); and
The crash diagnosis functionality actually parses the output strings from the disassembler (see DoGuessValueAt in StackFrame.cpp).
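The last approach, reading operands back out of the disassembler's text, can be sketched as follows. This is not the code in StackFrame.cpp: the regular expression and `parse_mem_operand` are hypothetical, and the sketch assumes AT&T-style operands of the simple form `disp(%reg)` only.

```python
import re

# Matches an optional signed displacement followed by "(%reg)".
_MEM_OPERAND = re.compile(
    r"^(?P<disp>-?(?:0x[0-9a-fA-F]+|\d+))?\((?P<base>%\w+)\)$"
)


def parse_mem_operand(text):
    """Return (base_register, displacement), or None if not recognized."""
    match = _MEM_OPERAND.match(text.strip())
    if match is None:
        return None
    disp_text = match.group("disp")
    if disp_text is None:
        displacement = 0
    elif "0x" in disp_text:
        displacement = int(disp_text, 16)
    else:
        displacement = int(disp_text, 10)
    return match.group("base"), displacement
```

Parsing printed text is fragile because it depends on the exact syntax the disassembler emits, but it avoids maintaining a separate instruction table.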

Sean

Tatyana_Krasnukha · August 31, 2017, 7:12pm

I got it. I had hoped that this work had just been moved into another library, even though I couldn’t find anything like that anywhere in lldb. I also supposed there was some specific reason for removing it. But since it is just not implemented yet, I have no more questions) Thank you all for the explanation!

Because the instruction formats are complicated, I would very much prefer not to implement instruction tables again. DoGuessValueAt in StackFrame looks like what I need, thanks for this hint!
