[Machine IR] Analyzing Assembly Source Code in MIR passes

Lele_Ma · November 21, 2019, 2:36am

Dear LLVM developers,

My goal is to write LLVM Machine IR (MIR) passes to analyze the assembly source code. But it seems I need to find a way to translate the handwritten assembly code into MIR format first.

Are there any materials, or any modules in the LLVM source code, that can help translate assembly code into LLVM MIR for analysis?

Or is there an easier way to analyze assembly code in MIR passes without translating it?

Best Regards,
Lele Ma

nhaehnle · November 25, 2019, 1:19pm

MachineIR is designed for code generation, not for general assembly representation. MIR is not even necessarily able to represent all assembly instructions that a target's hardware supports. The disassembler produces MCInsts, and if you wanted to go from there back to MachineIR, you'd have to write your own target-specific code to do so.
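
As a rough illustration of what such target-specific code might look like, here is a minimal sketch. It deliberately does not use real LLVM classes: McLevelInst, MirLevelInst, RaiseTable, raiseAll, and the TOY_* opcode names are all hypothetical stand-ins, and the choice of which mnemonics are missing from the table is made up for the example. The point it tries to show is only that the mapping is a per-target table and that some instructions may have no entry in it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for an MC-level instruction (NOT llvm::MCInst).
struct McLevelInst {
  std::string Mnemonic;
};

// Stand-in for a MachineIR-level instruction (NOT llvm::MachineInstr).
struct MirLevelInst {
  std::string Opcode;
};

// Hypothetical per-target table: assembly mnemonic -> MIR-like opcode name.
using RaiseTable = std::map<std::string, std::string>;

struct RaiseResult {
  std::vector<MirLevelInst> Raised;          // instructions that had a mapping
  std::vector<std::string> Unrepresentable;  // mnemonics with no mapping
};

// Translate each instruction through the table; anything the table does not
// know about is reported instead of being silently dropped.
RaiseResult raiseAll(const RaiseTable &Table,
                     const std::vector<McLevelInst> &Insts) {
  RaiseResult Out;
  for (const McLevelInst &I : Insts) {
    auto It = Table.find(I.Mnemonic);
    if (It == Table.end())
      Out.Unrepresentable.push_back(I.Mnemonic);
    else
      Out.Raised.push_back(MirLevelInst{It->second});
  }
  // Every input instruction ends up in exactly one of the two lists.
  assert(Out.Raised.size() + Out.Unrepresentable.size() == Insts.size());
  return Out;
}

// Small driver with a toy table (assumption: only "mov" and "add" are mapped).
RaiseResult raiseExample() {
  const RaiseTable ToyTable = {{"mov", "TOY_MOV"}, {"add", "TOY_ADD"}};
  const std::vector<McLevelInst> Insts = {
      McLevelInst{"mov"}, McLevelInst{"add"}, McLevelInst{"syscall"},
      McLevelInst{"iret"}};
  return raiseAll(ToyTable, Insts);
}
```

In a real target the table would have to cover operands, register classes, and implicit definitions as well, which is where most of the work would likely be.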

Cheers,
Nicolai

aaronsm · November 25, 2019, 2:15pm

llvm-mctoll will raise a binary back to LLVM IR.
It is not exactly what you want, but it might be something you can leverage.

https://github.com/microsoft/llvm-mctoll

Lele_Ma · November 25, 2019, 10:24pm

Thank you for the instructions, Aaron and Nicolai!

Raising a binary to LLVM IR or to MIR is a reasonable solution for me. However, given Nicolai’s point that not all target-specific instructions are representable in MIR, I have two questions that I need your help with:

1. Why does MIR not necessarily represent all target-specific instructions for a given piece of hardware? If someone added that support, would it violate some design principle of MIR?
2. Instead of raising to IR/MIR, I am wondering whether a third path is possible for analyzing assembly code:
   - write a simple LLVM pass in the MC layer to process information not available in MIR/IR, and
   - pass the analysis results from the IR/MIR passes to the MC layer pass, where the results can be enhanced with the missing representations.
   So the second question is whether it is possible to write passes directly in the MC layer. If so, is there any documentation or an example for that?

Thank you in advance!

Best Regards,
Lele

Lele_Ma · November 27, 2019, 5:50am

Hi All,

A follow-up to my own post, rephrasing my previous question with an updated subject:

What I want to do is analyze hand-written assembly code in ‘full detail’, where the semantics of each instruction are known to LLVM passes. Many such instructions have no counterparts in IR/MIR form, such as ‘syscall’, ‘iret’, etc. At the MC level, such assembly code can be translated to MCInst easily, since this level is closest to the assembly code. Therefore, I am thinking of writing a pass at the MC level instead of at the IR/MIR level.

However, while looking into MC-level passes, I could not find any related classes in the LLVM infrastructure (such as FunctionPass at the IR level or MachineFunctionPass at the MIR level). Could anyone point me to where I should start in order to write an MC-level pass?

Best Regards,
Lele

aaronsm · November 27, 2019, 7:00am

The MC layer doesn’t have passes. There is a method called emitInstruction() which is called for each MCInst, one at a time, as the instructions are emitted.

In the past I have accomplished what you’d like by overriding the methods in ObjectStreamer to buffer all the MCInsts for a function, and then doing the analysis on the buffered instructions.
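
The buffering idea above can be sketched roughly as follows. This is not real LLVM code: Inst, Streamer, BufferingStreamer, FunctionReport, and the beginFunction()/endFunction() hooks are hypothetical stand-ins that only mimic the shape of a streamer with a per-instruction emit callback, and the "analysis" is just a count of ‘syscall’ and ‘iret’ instructions. In an actual streamer subclass the function boundaries would have to be detected from whatever callbacks the streamer really provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for llvm::MCInst: only the field this sketch needs.
struct Inst {
  std::string Mnemonic;
};

// Stand-in for a streamer interface with a per-instruction emit hook.
// The function-boundary hooks are an assumption made for this sketch.
class Streamer {
public:
  virtual ~Streamer() = default;
  virtual void beginFunction(const std::string &Name) = 0;
  virtual void emitInstruction(const Inst &I) = 0;
  virtual void endFunction() = 0;
};

// Result of the toy per-function analysis.
struct FunctionReport {
  std::string Name;
  std::size_t NumInsts = 0;
  std::size_t NumFlagged = 0;  // count of 'syscall' / 'iret' instructions
};

// Overrides the emit hook to buffer instructions instead of analyzing them
// one at a time, then analyzes the whole buffer when the function ends.
class BufferingStreamer : public Streamer {
  std::string CurrentName;
  std::vector<Inst> Buffer;
  std::vector<FunctionReport> Reports;
  bool InFunction = false;

  static bool isFlagged(const Inst &I) {
    return I.Mnemonic == "syscall" || I.Mnemonic == "iret";
  }

public:
  void beginFunction(const std::string &Name) override {
    assert(!InFunction && "nested functions are not modeled");
    InFunction = true;
    CurrentName = Name;
    Buffer.clear();
  }

  void emitInstruction(const Inst &I) override {
    assert(InFunction && "instruction emitted outside of a function");
    Buffer.push_back(I);
  }

  void endFunction() override {
    assert(InFunction && "endFunction without beginFunction");
    FunctionReport R;
    R.Name = CurrentName;
    R.NumInsts = Buffer.size();
    for (const Inst &I : Buffer)
      if (isFlagged(I))
        ++R.NumFlagged;
    Reports.push_back(R);
    InFunction = false;
  }

  const std::vector<FunctionReport> &reports() const { return Reports; }
};

// Small driver: feeds two made-up functions through the buffering streamer.
std::vector<FunctionReport> analyzeExample() {
  BufferingStreamer S;
  S.beginFunction("handler");
  S.emitInstruction(Inst{"mov"});
  S.emitInstruction(Inst{"syscall"});
  S.emitInstruction(Inst{"iret"});
  S.endFunction();
  S.beginFunction("leaf");
  S.emitInstruction(Inst{"ret"});
  S.endFunction();
  return S.reports();
}
```

Buffering per function is what makes whole-function analyses possible here, since the emit hook on its own only ever sees one instruction at a time.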

Here’s a link about how instructions are lowered which might shed some light on how all this works.

https://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm

Lele_Ma · November 28, 2019, 1:50am

Thank you so much! That is very helpful.

Best,
Lele
