I’m writing a system that does analysis on x86 machine code in the LLVM backend (i.e., MachineFunctionPasses). Part of this involves data-flow analysis (reaching definitions, to be exact) on machine instructions, handling data flow through both registers and memory locations. This analysis provides an interface whereby I can query it to determine the set of definitions that reach a particular register use-operand (MachineOperand) of a machine instruction, or a memory load operand (MachineMemOperand).
Given this, my goal is to determine whether each variable input to an instruction - whether a register use or a memory load - is reached by some definition in a particular (known) set. However, the relationship between MachineOperands and MachineMemOperands complicates this.
Whenever a machine instruction does a memory load/store, it has both:
- A MachineMemOperand, which specifies the details of the load/store at a high level; and
- A sequence of register and immediate MachineOperands, which represent the low-level encoding of the memory address in the instruction. For x86, this sequence consists of five operands, specifying the base register, scale constant, index register, offset constant, and segment register respectively. (In cases where the full 5-part addressing mode is not needed, some of the registers can be set to %noreg and the constants to identity values, e.g. scale=1 and offset=0. This convention is detailed in the code generator documentation.)
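To make the five-operand layout concrete, here is a minimal standalone sketch. It does not use LLVM's classes: `Operand`, `MemAddress`, `decodeAddress`, `isAddressOperand`, `exampleLoad`, and the register numbers are all hypothetical names invented for illustration. It also assumes the caller already knows the index of the first address operand, which is exactly the piece of information I'm missing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a machine operand; NOT LLVM's MachineOperand.
struct Operand {
    enum Kind { Register, Immediate } kind;
    unsigned reg;      // register number; 0 models %noreg
    std::int64_t imm;  // immediate value (meaningful when kind == Immediate)
};

constexpr unsigned NoReg = 0;

// Slot positions inside the five-operand address sequence, in the order
// described above: base, scale, index, offset, segment.
enum AddrSlot : std::size_t {
    SlotBase = 0, SlotScale = 1, SlotIndex = 2, SlotOffset = 3, SlotSegment = 4,
    NumAddrSlots = 5
};

struct MemAddress {
    unsigned base;
    std::int64_t scale;
    unsigned index;
    std::int64_t offset;
    unsigned segment;
};

// Decode the five consecutive operands starting at `first`.
// Assumption: the caller knows where the address sequence begins.
MemAddress decodeAddress(const std::vector<Operand>& ops, std::size_t first) {
    assert(first + NumAddrSlots <= ops.size() && "address sequence out of range");
    return MemAddress{ops[first + SlotBase].reg,
                      ops[first + SlotScale].imm,
                      ops[first + SlotIndex].reg,
                      ops[first + SlotOffset].imm,
                      ops[first + SlotSegment].reg};
}

// True if operand `i` lies inside the address sequence beginning at `first`.
bool isAddressOperand(std::size_t i, std::size_t first) {
    return i >= first && i < first + NumAddrSlots;
}

// Models a load such as `mov -8(%rbp), %rax`: one destination register
// followed by the five address operands. Register numbers are arbitrary.
std::vector<Operand> exampleLoad() {
    const unsigned RAX = 1, RBP = 2;
    return {
        {Operand::Register, RAX, 0},     // destination (not part of the address)
        {Operand::Register, RBP, 0},     // base
        {Operand::Immediate, NoReg, 1},  // scale (identity value)
        {Operand::Register, NoReg, 0},   // index (%noreg)
        {Operand::Immediate, NoReg, -8}, // offset
        {Operand::Register, NoReg, 0},   // segment (%noreg)
    };
}
```

With `first = 1`, operands 1 through 5 of the example are classified as address operands and operand 0 is not. The hard-coded `first` is the part that has no obvious source in the real operand list.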
The problem I’m having is that there’s no way to tell from a MachineOperand itself whether it was generated as part of a memory-address sequence or is a “real” register use that supplies a value the instruction computes on. Thus, when I query my reaching-definitions interface, I don’t know which register operands to query as registers and which to skip so that I can query the corresponding memory access instead (i.e., via its MachineMemOperand). Although it’s certainly valid to ask “which definitions reach this register operand?” when the operand is part of an address, the answer isn’t particularly useful: I’m interested in the flow of data in the logical computation, not in facts like “the value of RBP used in this stack-frame-relative load was defined by the ‘mov %rsp, %rbp’ instruction at the beginning of the function”. That is why I want to skip these operands and look at the respective MachineMemOperands instead.
So, my question is: is there any good way to identify whether a MachineOperand was generated as part of a memory-addressing sequence?
I looked through the Doxygen for MachineOperand and MachineMemOperand trying to find some link between the two, but to no avail. I also read through a lot of the CodeGen and X86 backend code to learn how these operand sequences are generated, but I didn’t find a single place where this consistently happens that I could (for instance) modify to record which operands are generated this way. As a last resort I could try to guess which operands form the memory-addressing sequence by position (e.g., for stores, the memory-addressing operands seem to always come first), but I would really prefer not to do that: x86 has so many memory instructions that accounting for all of them comprehensively would be a lot of work.