Adding prefixes to certain x86 instructions -- where to start?

Hello,

I’ve been using LLVM IR passes for my research for about a year now, but for my next step I think I might have to dig into a backend. I'm hoping someone could give me a pointer on how to get started.

What I would like to do is add an address-size override prefix [1] to a given x86-64 instruction. I’m hoping I can do something like:

1) Mark some IR instructions with metadata in my pass
2) Hack the backend to look for my metadata, and if found add the prefix when the machine instruction is emitted

Does this seem feasible? Does the LLVM x86 backend currently have the capability of adding instruction prefixes and could someone please point out where I should look in the code for it?
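For concreteness, the address-size override prefix is the single byte 0x67, which goes ahead of any REX prefix and the opcode. Below is a minimal Python sketch of what step 2 would amount to at the byte level. The helper name `add_addr_size_prefix` is made up for illustration and is not an LLVM API; the example bytes `48 8B 03` are the usual encoding of `mov rax, [rbx]`.

```python
# Legacy (non-REX) prefix bytes on x86-64.
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3,              # LOCK, REPNE, REP
                   0x2E, 0x36, 0x3E, 0x26,        # CS, SS, DS, ES segment overrides
                   0x64, 0x65,                    # FS, GS segment overrides
                   0x66, 0x67}                    # operand-size, address-size
ADDR_SIZE_OVERRIDE = 0x67


def add_addr_size_prefix(encoded):
    """Return the instruction bytes with 0x67 prepended (hypothetical helper).

    Legacy prefixes may appear in any order but must precede a REX prefix,
    so inserting at the very front is always a valid position. If the
    prefix is already present among the leading prefixes, nothing changes.
    """
    encoded = bytes(encoded)
    if not encoded:
        raise ValueError("empty instruction")
    for byte in encoded:
        if byte not in LEGACY_PREFIXES:
            break                      # reached REX or the opcode
        if byte == ADDR_SIZE_OVERRIDE:
            return encoded             # already prefixed
    return bytes([ADDR_SIZE_OVERRIDE]) + encoded


# mov rax, [rbx]  ->  mov rax, [ebx]
prefixed = add_addr_size_prefix(bytes.fromhex("488b03"))
```

In a real backend the prefix would have to be emitted by the code emitter itself rather than patched in afterwards, since instruction lengths feed into branch offsets and alignment.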

Thanks,
Scott A. Carr
PhD Student
Purdue University CS

[1] X86-64 Instruction Encoding - OSDev Wiki

What is it you are ACTUALLY trying to do?

In other words, why would you want a different address size? Understanding that would probably help provide a better answer. (I have absolutely no idea how to solve the actual question, but I suspect understanding the overall goal will help a whole lot.)

I’m trying to make a security sandbox. For example, let’s say my program has a LoadInst in the LLVM IR and I know I want to confine the address range this LoadInst can access. Maybe that LoadInst gets emitted as a MOV machine instruction by the backend. During execution an attacker could potentially control the operands of the MOV instruction through some exploit, but usually he cannot modify the instruction or its prefixes because the code is not writable. So the prefix can potentially let me confine the attacker to an address range even if he controls the instruction operands.
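To make the confinement argument concrete: in 64-bit mode the 0x67 prefix makes the processor compute the effective address with 32-bit arithmetic and zero-extend the result, so the access lands below 4 GiB whatever value the base register holds. The sketch below models only base-plus-displacement addressing; the function name is illustrative, and segment bases (FS/GS), which are added after this step, are ignored.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def effective_address(base, disp=0, addr32=False):
    """Model the effective address of [base + disp] in 64-bit mode.

    With addr32=True (address-size override present) the sum is truncated
    to 32 bits and zero-extended; otherwise it wraps at 64 bits.
    """
    ea = (base + disp) & MASK64
    if addr32:
        ea &= MASK32
    return ea


# An attacker-controlled register value far outside the low 4 GiB...
attacker_base = 0x0000_7FFD_1234_5678
# ...still resolves to an address below 4 GiB when the prefix is present.
confined = effective_address(attacker_base, addr32=True)
```

Note that the truncation silently aliases the address rather than faulting, so the sandboxed data itself has to live below 4 GiB for legitimate accesses to keep working.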

I hope that makes some sense. If someone knows of a different approach, such as a very lightweight sandbox implemented in LLVM, I’d be interested in looking into it.

Thanks,
Scott

So, you already have a method to guarantee that the address is in the low 4GB of the 64-bit address space [in other words, that a 32-bit address, which the prefix zero-extends to 64 bits, is guaranteed to work]?

Generally, I’d try to avoid such a solution, as it’s very hard to control, and the MMU is a much better tool for this. If you want further control than that, run your code in a virtual machine and use the VMM to restrict access to the rest of the world and to sensitive memory. If the user code is able to alter memory mappings in the MMU, then it can also map any sensitive information into the range that you can access with a 32-bit address, so I don’t really see how this helps security in any way, shape or form.

[And you certainly must not use this for any stack addresses on Linux, since the stack is located at the high end of the positive range of the 48 bits of address space that current x86-64 processors support. That is, unless you plan on hacking the kernel to change that too, which then sounds like you really just want -m32, but I doubt that makes anything safer as such.]
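The stack caveat can be checked with a little arithmetic. The sketch below assumes 48-bit virtual addresses and uses a made-up address near the top of the lower (user-space) half; real stack addresses vary with ASLR and kernel configuration, and the function names are illustrative only.

```python
ADDR32_LIMIT = 1 << 32          # first address NOT reachable with a 32-bit effective address
USER_TOP_48 = (1 << 47) - 1     # highest address in the lower half with 48-bit addressing


def reachable_with_addr32(addr):
    """True if a 32-bit effective address can name this location."""
    return 0 <= addr < ADDR32_LIMIT


def aliased_address(addr):
    """Address actually accessed if the pointer is used with the 0x67 prefix."""
    return addr & (ADDR32_LIMIT - 1)


# Hypothetical stack-like address one page below the top of user space.
stack_like = USER_TOP_48 - 0xFFF
```

A prefixed load through `stack_like` would not reach the stack at all; it would touch whatever is (or is not) mapped at `aliased_address(stack_like)`.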

And if you still really think this is what you want to do, perhaps using a special address space is the easiest way to “annotate the instruction”, with the benefit that you get some help from within LLVM if you try to mix restricted and unrestricted pointers without the appropriate address space cast.