Identifying instructions that definitely access memory

Hello,

I am looking for a way to identify loads, stores, and any other kind of instruction that definitely performs a memory access, and to extract the address operand(s). However, I was not able to find a cross-architecture API. The closest I stumbled upon are “MCInstrDesc::mayLoad()” and “MCInstrDesc::mayStore()”, but I understand that their results are just a hint. I would then need to examine the instruction name or opcode to find out whether it is actually a load or store and which operand(s) are memory addresses, and do so for each architecture separately, which I would really like to avoid.

Is there a way to identify such instructions either by examining them through the disassembler (e.g. “DoesLoad()” | “DoesStore()”) before they execute or right after they perform any kind of memory access?

Thank you very much, in advance! :)

― Vangelis

Hello,

I decided to try once more with a follow-up email, since my previous one got no responses (I hope it’s not considered rude to send more than one message in a row for a particular question).

To sum up and clarify my previous question, what I need is a way to track memory stores and save both the old and the new value of the memory location being modified.

My thinking so far:

  1. Recognize the instructions that definitely access memory before they execute, based on their opcode.
  2. Tell whether each operand is a register or a memory location.
  3. If it’s a memory location, check whether it is a load or store destination.
  4. In case it is a store destination, fetch and save current value from memory.
  5. Execute instruction.
  6. Fetch and save new value from memory.

However, I was not able to find a cross-architecture API that covers all of the steps above. More specifically, I am missing the equivalent of Instruction::DoesStore() and Operand::IsStoreDestination().

Last but not least, I should note that the target is executed in single-step mode, so I do have control right before and after the execution of every instruction.
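
[ Editorial sketch of steps 4 to 6 above, in self-contained C++. It assumes the store address and size are already known, which is exactly the part the thread is asking about, and it uses a plain byte vector in place of the debuggee's memory. The names StoreRecord, Memory, readBytes and traceStore are hypothetical and are not LLDB or LLVM API. ]

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Simulated target memory (assumption: stands in for the debuggee's memory).
using Memory = std::vector<std::uint8_t>;

// Hypothetical record of one observed store.
struct StoreRecord {
    std::uint64_t address;
    std::vector<std::uint8_t> oldBytes;  // contents before the instruction ran
    std::vector<std::uint8_t> newBytes;  // contents after the instruction ran
};

// Copies `size` bytes starting at `address`. The caller must ensure the
// range lies inside `mem`; no bounds checking is done in this sketch.
std::vector<std::uint8_t> readBytes(const Memory &mem, std::uint64_t address,
                                    std::size_t size) {
    auto first = mem.begin() + static_cast<std::ptrdiff_t>(address);
    return std::vector<std::uint8_t>(first,
                                     first + static_cast<std::ptrdiff_t>(size));
}

// Step 4: save the old value. Step 5: execute one instruction (modelled by
// the `step` callback). Step 6: save the new value.
StoreRecord traceStore(Memory &mem, std::uint64_t address, std::size_t size,
                       const std::function<void(Memory &)> &step) {
    StoreRecord rec;
    rec.address = address;
    rec.oldBytes = readBytes(mem, address, size);
    step(mem);
    rec.newBytes = readBytes(mem, address, size);
    return rec;
}
```

[ In a real debugger, `step` would be a single-step of the target and readBytes would read from the process being debugged. Comparing oldBytes with newBytes afterwards also shows whether the instruction changed the location at all. ]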

Thanks, again, in advance! :)

― Vangelis

Hi Vangelis,

Not sure this will help you, but you can try to compare llvm::MachineInstr::getOpcode() with TargetOpcode::G_LOAD and TargetOpcode::G_STORE if you can obtain a MachineInstr instance.

It may also make sense to ask llvm-dev for a proper solution.

Hi Tatyana,

Thank you for your reply! :)

If I understand correctly, TargetOpcode::G_{LOAD, STORE} do not cover x86’s mov instructions (or the other relevant instructions of x86 and the rest of the supported architectures), which also access memory. However, I will look into it more.

Additionally, thank you for the suggestion regarding llvm-dev; I will forward my email to that list, too.

― Vangelis

[ This question has already been asked in lldb-dev (see attached emails), however it was suggested that I should forward the question to llvm-dev, since it is more relevant to MC Disassembler than LLDB. ]

Hello,

I am looking for a way to track memory stores and save both the old and the new value of the memory location being modified using LLDB, as described below:

1. Recognize the instructions that definitely access memory before they execute, based on their opcode.
2. Tell whether each operand is a register or a memory location.
3. If it’s a memory location, check whether it is a load or store destination.
4. In case it is a store destination, fetch and save current value from memory.
5. Execute instruction.
6. Fetch and save new value from memory.

However, I was not able to find a cross-architecture API that covers all of the steps above. More specifically, I am missing the equivalent of Instruction::DoesStore() and Operand::IsStoreDestination().

Last but not least, I should note that the target is executed in single-step mode, so I do have control right before and after the execution of every instruction.

Thank you very much, in advance! :)

― Vangelis

[ This question has already been asked in lldb-dev (see attached emails), however it was suggested that I should forward the question to llvm-dev, since it is more relevant to MC Disassembler than LLDB. ]

Hello,

I am looking for a way to track memory stores and save both the old and the new value of the memory location being modified using LLDB, as described below:

1. Recognize the instructions that definitely access memory before they execute, based on their opcode.

I’m only aware of APIs that report the possibility of storing, for example MCInstrDesc::mayStore(). Whether an instruction with mayStore() actually does store is target-specific and can depend on the exact inputs or on the state of the processor or memory at the time. For example, an atomic store might depend on the value of memory or a physical register at the time it executes.
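
[ Editorial sketch: a small, self-contained C++ illustration (not LLVM code) of why a "may store" flag can only be a hint. The helper name conditionalStore is hypothetical; std::atomic::compare_exchange_strong is standard C++. Whether memory is written depends on the value the cell holds at run time, which a static opcode query cannot know. ]

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper modelling an instruction that "may store":
// the write happens only if the cell currently holds `expected`.
// Returns true if the store was actually performed.
bool conditionalStore(std::atomic<std::uint32_t> &cell,
                      std::uint32_t expected,
                      std::uint32_t desired) {
    // On a match, `desired` is written. On a mismatch, the cell is left
    // untouched and only the local copy of `expected` is updated.
    return cell.compare_exchange_strong(expected, desired);
}
```

[ The same call with the same arguments can therefore store on one execution and not on the next, depending only on the current contents of memory. ]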

2. Tell whether each operand is a register or a memory location.

MCOperandInfo::OperandType can sometimes tell you this, but not all targets use it accurately (many get away with OPERAND_UNKNOWN most of the time), so I don’t know how useful that will be.

3. If it’s a memory location, check whether it is a load or store destination.

This is target-specific and depends on the opcode. As far as I know, the MC layer doesn’t have APIs to determine this. MIR has some of this information in the MachineMemOperand, but that information doesn’t record which operand(s) are involved, and it is discarded when lowering to the MC layer.

4. In case it is a store destination, fetch and save current value from memory.

The MC layer doesn’t know how the address is calculated, so it can’t tell LLDB which location to fetch. You’d need to implement something that knows how each instruction calculates the address.
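
[ Editorial sketch of what "knowing how an instruction calculates the address" could look like for one common case, the x86-style base + index * scale + displacement form. This is self-contained C++ and not LLVM or LLDB API: MemOperand, RegisterFile and effectiveAddress are hypothetical names, registers are looked up by name in a plain map, and segment bases and RIP-relative addressing are deliberately ignored. ]

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical description of an x86-style memory operand.
struct MemOperand {
    std::string base;    // base register name, empty if absent
    std::string index;   // index register name, empty if absent
    std::uint8_t scale;  // 1, 2, 4 or 8
    std::int64_t disp;   // signed displacement
};

// Assumption: register values captured right before the instruction executes.
using RegisterFile = std::map<std::string, std::uint64_t>;

// Computes base + index * scale + disp with wrap-around 64-bit arithmetic.
// Throws std::out_of_range if a named register is missing from `regs`.
std::uint64_t effectiveAddress(const MemOperand &op, const RegisterFile &regs) {
    assert(op.scale == 1 || op.scale == 2 || op.scale == 4 || op.scale == 8);
    std::uint64_t addr = static_cast<std::uint64_t>(op.disp);
    if (!op.base.empty())
        addr += regs.at(op.base);
    if (!op.index.empty())
        addr += regs.at(op.index) * op.scale;
    return addr;
}
```

[ The hard part described above remains: something still has to decide, per target and per opcode, which operands of a decoded instruction form such a memory reference and whether it is read or written. ]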

5. Execute instruction.

LLDB would presumably handle this bit by single-stepping.

6. Fetch and save new value from memory.

This is the same as for 4.

Hello Daniel,

Thank you very much for your insightful answer!

Unfortunately, the situation regarding points 1 to 4 seems worse for my purposes than I expected.

It would be really nice if the target backends reliably provided at least the information that already has a place in the API (e.g. MCOperandInfo::OperandType). That would give more insight into machine code and so allow for new kinds of features and plugins, e.g. ones that track and analyze memory accesses.

Is there any chance you (or anyone on the list) are aware of a disassembly library that provides such information for at least x86 or ARM (64-bit in both cases)? :)

― Vangelis