How to get the return address at the LLVM IR level?

Hi

I want to write a FunctionPass to insert some code before each return.

Function:



mov eax,[esp]
cmp eax,0x12345678
je 0x12345678
ret
(the stack may not stay balanced)

I wonder whether I can get the return address at the LLVM IR level.

I use IRBuilder’s CreateICmpEQ and CreateCondBr.

But I don’t know how to get the value of the return address.

I have found that there is an Intrinsic::returnaddress.

Can Intrinsic::returnaddress help me?

I don’t know how to use Intrinsic::returnaddress, because few files use this intrinsic.

Thanks

Judging from the documentation, I’d say yes.
You can use intrinsics by first importing their declarations, I think with Intrinsic::getType and Intrinsic::lookupLLVMIntrinsicByName (Intrinsic::getDeclaration gets the declaration straight from an Intrinsic::ID). Then just call them like any other function. You might need to take special care with inline attributes and whatnot, though.

Zhang

Hi

I want to write a FunctionPass to insert some code before each return.

Function:



mov eax,[esp]
cmp eax,0x12345678
je 0x12345678
ret
(the stack may not stay balanced)

I wonder whether I can get the return address at the LLVM IR level.

Short answer: you can’t.

Long answer: compared with assembly code, LLVM IR is a high-level description of a program.
It “tries” to describe programs in a platform-independent fashion. However, the return address, and even how a function passes back its return data, is very platform- and architecture-specific.

Nevertheless, your goal can still be achieved easily in LLVM. In your case, just iterate over all the instructions in a Function, find the return instructions (i.e. the instances of the ReturnInst class), and insert the desired code before them.
Of course, take care of data and/or control dependencies when you insert instructions.
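The approach described here is a plain walk over the instruction stream that puts new code in front of every return. The sketch below models it on a toy, hypothetical opcode array rather than on the LLVM API: each `OP_RET` gets an `OP_CHECK_RA` placed directly before it. In a real pass the same shape would presumably be a loop over the Function's instructions that picks out `ReturnInst`s and constructs an `IRBuilder<>` from each one, since an `IRBuilder<>` built from an instruction inserts new code before that instruction. All names in the block are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes standing in for IR instructions (hypothetical, not LLVM's). */
typedef enum { OP_ADD, OP_CALL, OP_RET, OP_CHECK_RA } Opcode;

/* Copies `in` to `out`, emitting OP_CHECK_RA directly before every OP_RET.
   Returns the number of opcodes written, or SIZE_MAX if `out` is too small. */
size_t insert_before_returns(const Opcode *in, size_t n, Opcode *out, size_t cap) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == OP_RET) {          /* the "ReturnInst" case */
            if (len == cap) return SIZE_MAX;
            out[len++] = OP_CHECK_RA;   /* inserted check goes first */
        }
        if (len == cap) return SIZE_MAX;
        out[len++] = in[i];             /* original instruction is kept */
    }
    return len;
}
```

For the input `ADD, RET, CALL, RET` the output is `ADD, CHECK_RA, RET, CALL, CHECK_RA, RET`. Writing into a separate buffer sidesteps any question of modifying a sequence while iterating over it.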

I use IRBuilder’s CreateICmpEQ and CreateCondBr.

This combination is for branches between BasicBlocks, not function returns.

But I don’t know how to get the value of the return address.

I have found that there is an Intrinsic::returnaddress.

Can Intrinsic::returnaddress help me?

Intrinsic functions in LLVM are usually used for special purposes. I’m quite sure that Intrinsic::returnaddress is not what you want: it won’t be generated by a normal compiler frontend, and it only places a call in the IR. It isn’t evaluated while your pass runs, so it can’t give you the return address while you’re doing code optimizations.

I don’t know how to use Intrinsic::returnaddress, because few files use this intrinsic.

LLVM IR is neither (machine) assembly code nor something with the same role as assembly code; it’s a representation for compiler optimizations and should only be used for them.
My suggestion is to read the official documentation on what LLVM IR is. It’s a nice introduction, and it’s not long.

Best,
Bekket

To my knowledge that intrinsic IS generated by frontends like Clang when using __builtin_return_address(). I could be wrong, though.
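A minimal sketch of that source-level route, assuming GCC or Clang: Clang lowers `__builtin_return_address(0)` to a call to `@llvm.returnaddress(i32 0)`, and the backend turns the intrinsic into ordinary machine code, so a file like this compiles and links without any extra definition. The function name is hypothetical.

```c
#include <stddef.h>

/* noinline keeps this a genuine call, so the builtin reports an address
   inside whichever function called my_return_address(). */
__attribute__((noinline)) void *my_return_address(void) {
    return __builtin_return_address(0);  /* level 0 = this function's own return address */
}
```

Note that this only yields the address at run time, inside the compiled program; it does not make the value available to a pass at compile time.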

Zhang

Correct… you can always call that intrinsic explicitly. However, I don’t think that answers the original question, since the question is how to get the return address inside an LLVM pass, not how to get it at run time by executing a program that contains Intrinsic::returnaddress.
Also, executing a program containing Intrinsic::returnaddress won’t get you anything without special support (it will even fail at the linking stage), since the intrinsic is just a function declaration.

Bests,
Bekket

Thanks for your reply.

What I want to do is check the return address at every return site (for security purposes).

(I will also do some analysis to compute the candidate return targets.)

So each “ret” instruction will be transformed into:

pop ecx             // pop the return address; using ecx keeps a return value in eax intact
cmp ecx,0x08040000  // candidate 1
je 0x08040000
cmp ecx,0x08040004  // candidate 2
je 0x08040004
cmp ecx,0x08040008  // candidate 3
je 0x08040008
ud2                 // no candidate matched: trap (assumed policy)
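The cmp/je chain in the listing amounts to a membership test over the candidate set. A minimal C sketch of that decision, assuming the candidates are already known as integers; the three values are the example addresses from the listing, and `kCandidates` / `is_allowed_return` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Example candidate return targets (the addresses from the listing). */
static const uintptr_t kCandidates[] = {0x08040000u, 0x08040004u, 0x08040008u};

/* Returns 1 if `ra` equals one of the `n` candidates, else 0.
   One comparison per candidate, just like one cmp/je pair per candidate. */
int is_allowed_return(uintptr_t ra, const uintptr_t *candidates, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (ra == candidates[i]) return 1;
    }
    return 0;
}
```

A linear scan mirrors the chain one-to-one; with many candidates, a sorted table and binary search would be the obvious alternative.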

So if I want to do this transformation at the LLVM IR level rather than in the backend,

I need to get the return address of the current function in a FunctionPass, right?

I found that Intrinsic::returnaddress only returns a void pointer (i8*).

C code:

int main(){
    int a = (int)__builtin_return_address(0); /* explicit cast: int is pointer-sized on the 32-bit target below */
}

LLVM IR:

define i32 @main() #0 {
entry:
%a = alloca i32, align 4
%0 = call i8* @llvm.returnaddress(i32 0)
%1 = ptrtoint i8* %0 to i32
store i32 %1, i32* %a, align 4
ret i32 0
}

Can I compare the return value of Intrinsic::returnaddress with a “Function” in LLVM IR?

(Otherwise, I need to modify the backend to do my instrumentation.)
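On comparing against a Function: in IR a Function is used as a constant pointer to its entry point, whereas a return address points just past a call instruction inside the caller, so an equality test between the two should not match in practice. The sketch below illustrates this at source level with GCC/Clang; all names are hypothetical, and the volatile store only exists to stop the compiler from turning the call into a tail call.

```c
#include <stdint.h>

/* Hypothetical demo state (not part of any API). */
static volatile uintptr_t last_return_address;

/* noinline keeps this a real call; the builtin reads its return address. */
__attribute__((noinline)) static uintptr_t read_return_address(void) {
    return (uintptr_t)__builtin_return_address(0);
}

/* The store after the call prevents a tail call, so the value read above
   points just past the call instruction inside call_site_demo(), not at
   the entry of call_site_demo() or of read_return_address(). */
__attribute__((noinline)) uintptr_t call_site_demo(void) {
    last_return_address = read_return_address();
    return last_return_address;
}
```

This is one reason why comparing against call-site addresses, rather than function addresses, looks like the more promising direction.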

Thanks

On Wed, Sep 5, 2018 at 9:41 PM, Bekket McClane <bekket.mcclane@gmail.com> wrote:

Hi,

If this is mainly x86, you can use @llvm.addressofreturnaddress to get what you need, I think. The docs say it only works on that platform, but it does more or less exactly what you want. I think the Value of a Function is the address of the function when used in that manner, so it can be stored in a constant and used.

Similarly, if you narrow the blocks containing calls to the function you’re modifying down to just the call and a branch to the rest of the block (and have a reasonable idea of how parameters will be pushed onto the stack before the call), you can take the address of the block the CallInst is in and derive a very narrow range that the return address is allowed to point to. For fixed-length instruction sets you should be able to do this almost perfectly with just a target triple. If you keep the block addresses as global constants and perform similar constant calculations on them, they will be emitted as label references with the calculations added on in the assembly, and as constants plus relocations in the output files. That’s probably easier than basing things on the functions. Blocks whose addresses are taken and used shouldn’t be removed by any optimization pass, so it follows that the constant / block won’t be removed unless both the callee and the caller are dead and can be eliminated.
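The “narrow range” idea reduces to simple address arithmetic. A minimal sketch, assuming the start of the calling block is known as an integer (for instance from a blockaddress constant) and that an upper bound on the bytes from the block start to the end of the call instruction is known; both function names are hypothetical, and the 4-byte instruction size in the test is only an example of a fixed-length ISA.

```c
#include <stdbool.h>
#include <stdint.h>

/* The return address points just past the call, so it must lie in
   (block_start, block_start + max_offset]. The `ra > block_start` test
   runs first so the subtraction below cannot wrap around. */
bool return_address_in_call_block(uintptr_t ra, uintptr_t block_start, uintptr_t max_offset) {
    return ra > block_start && ra - block_start <= max_offset;
}

/* With a fixed instruction size the bound is exact: the call ends after
   `insns_through_call` instructions of `insn_size` bytes each. */
uintptr_t call_end_fixed_length(uintptr_t block_start, unsigned insns_through_call, unsigned insn_size) {
    return block_start + (uintptr_t)insns_through_call * insn_size;
}
```

For example, a block at 0x1000 holding two argument-setup instructions and the call, with 4-byte instructions, ends its call at 0x100C, so only a return address in (0x1000, 0x100C] passes.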

Otherwise, @llvm.stacksave before the allocas and @llvm.stackrestore at the end will get rid of local allocations (and maybe the frame-pointer save?), so potentially you only have to deal with a saved frame pointer on ARM / MIPS / etc. On those platforms you won’t need to do this unless the function saves LR (or its equivalent) to the stack, but since that is usually only done in functions that call other functions, it shouldn’t be hard to determine.

@llvm.localescape (after the allocas) and @llvm.localrecover might be useful as well. The docs also mention a localaddress intrinsic in the section on those two, but nowhere else.

I’m pretty sure this would be the way to start, from the docs at least. I haven’t tried this and haven’t needed to play with this sort of thing in a while now.

Cheers,
-G