[RFC] Diagnostics support in Disassembler

Hi everyone,

Currently, there is no convenient way to emit diagnostics in a disassembler. For instance, a disassembler for a specific backend has no way of reporting information in-sync with the instruction it correponds to because an output stream is created, formatted and flushed before every instruction is printed; all of which is handled by the MCDisassembler.

I propose something very similar to clang Diagnostics for the Disassembler.

  1. Define class MCDiagnostic in MCDisassembler.cpp, to package and emit diagnostics.
  2. Modify getInstruction(..) to return std::pair<DecodeStatus, MCDiagnostic>. This will require a change in
DecodeStatus MCDisassembler::getInstruction(MCInst &Instr, uint64_t &Size,
                                      ArrayRef<uint8_t> Bytes, uint64_t Address,
                                      raw_ostream &CStream) const;

which all disassemblers implement.

  1. Modify printInst(..) to accept an MCDiagnostic. This will require a change in
void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
                         const MCSubtargetInfo &STI, MCDiagnostic *Diag,
                         raw_ostream &OS)

Any feedback will be appreciated!

Cheers,
Anshil

I like the idea, the Arm disassembler currently uses SoftFail for a lot of cases, which can only print a very generic error message.

What do you propose the MCDiagnostic class will contain? Will this be a string containing the actual diagnostic, or an identifier (and possibly instance-specific data) which can be rendered into a string later?

I’m not sure it makes sense to make printInst also responsible for printing the diagnostic. Users of MCDisassembler will probably want to treat the diagnostics differently to the instructions, for example sending them to stderr instead of stdout, or formatting the differently. Instead, I’d have a function of MCDiagnostic to get the diagnostic as a string, and let the calling code decide what to do with it.

What do you think about allowing multiple diagnostics for one instruction? This is currently done by the assembly parser using a callback function, though that uses llvm::SourceMgr, which might not be appropriate for binary input to the disassembler. Multiple diagnostics would be useful for Arm, where some instructions have register fields that are not allowed to be SP or PC, and there can be multiple such fields in an instruction.

I didn’t think of multiple diagnostics per instruction. Originally, MCDiagnostic contains the kind of diagnostic (error or warning) and a string storing the message itself. I imagine this pair of information will be required for every diagnostic emitted for an instruction so we would need to process SmallVector<MCDiagnostic> for each instruction.

If not printInst, I think size_t LLVMDisasmInstruction(..) might be a convenient place to emit diagnostics. It may be called immediately prior to printing the instruction. Alternatively, a virtual function may be defined in MCDisassembler which can be defined by the targets the way they want to. That function may then be called in size_t LLVMDisasmInstruction(..) which can take care of the streaming.

Currently, I am able to emit the warning as follows as accepted by FileCheck:

# CHECK: Warning: Non-exec destination operand

# CHECK: v_cmpx_f_f64_e64 exec, v[1:2]
0x7e,0x00,0x30,0xd4,0x7e,0x02,0x02,0x00

# CHECK: v_cmpx_f_f64_e64 s[0:1], v[1:2]
0x7e,0x00,0x30,0xd4,0x00,0x02,0x02,0x00

The challenge here is that the warning corresponds to the second instruction, and not the first. Emitting the warning anywhere besides printInst generates the output as above. Emitting from printInst generates the following input accepted by FileCheck:

# CHECK: v_cmpx_f_f64_e64 exec, v[1:2]
0x7e,0x00,0x30,0xd4,0x7e,0x02,0x02,0x00

# CHECK: Warning: Non-exec destination operand
# CHECK: v_cmpx_f_f64_e64 s[0:1], v[1:2]
0x7e,0x00,0x30,0xd4,0x00,0x02,0x02,0x00