How to get the return address on the stack in LLVM

Hi all,

I want to implement the Xor random canary, so I have to get the return address in the prologue and epilogue of the function.
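For readers unfamiliar with the idea, here is a minimal source-level sketch of an XOR random canary. It is not the StackProtector pass discussed in this thread. It assumes GCC or Clang, where `__builtin_return_address(0)` is the source-level counterpart of the llvm.returnaddress(0) intrinsic. The names `g_canary_secret`, `xor_canary`, and `protected_function` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <cstdlib>
#include <random>

// Hypothetical per-process secret, drawn once at startup (sketch only:
// a real implementation would want a full-width, well-protected value).
static const std::uintptr_t g_canary_secret =
    static_cast<std::uintptr_t>(std::random_device{}());

// The canary binds the secret to one particular return address.
static inline std::uintptr_t xor_canary(void* return_address) {
    return g_canary_secret ^ reinterpret_cast<std::uintptr_t>(return_address);
}

__attribute__((noinline)) int protected_function(int x) {
    // "Prologue": store secret XOR return address in a local slot.
    volatile std::uintptr_t canary = xor_canary(__builtin_return_address(0));

    int result = x * 2;  // stands in for the real function body

    // "Epilogue": recompute from the return address and compare.
    if (canary != xor_canary(__builtin_return_address(0))) {
        std::abort();  // canary or return address was changed
    }
    return result;
}
```

Note that the check is only meaningful if the second read really reloads the return address from the stack. Nothing in this sketch forces the compiler to do that.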

In the prologue of the function, before I insert the canary onto the stack, I can get the return address with:

ConstantInt* ci = llvm::ConstantInt::get(Type::getInt32Ty(RI->getContext()), 0);
Value* Args1[] = {ci};
CallInst* callInst = CallInst::Create(Intrinsic::getDeclaration(M, Intrinsic::returnaddress),
                                      &Args1[0], array_endof(Args1), "Call Return Address", InsPt);

The CallInst gets the return address, and it works.

In the epilogue of the function, since the canary has already been inserted, I wrote similar code:

ConstantInt* ci2 = llvm::ConstantInt::get(Type::getInt32Ty(RI->getContext()), 1);
Value* Args3[] = {ci2};
CallInst* callInst1 = CallInst::Create(Intrinsic::getDeclaration(M, Intrinsic::returnaddress),
                                       &Args3[0], array_endof(Args3), "Caaall Return Address", BB);

But this time it does not work: I cannot get the return address.

What is the problem? How can I get the return address? Thank you!

Ying

Hello

In the prologue of the function, before I insert the canary onto
the stack, I can get the return address with:

Note that there is no prologue or epilogue at the IR level :)

But this time it does not work: I cannot get the return address.
What is the problem? How can I get the return address? Thank you!

What is the problem? It seems you're getting the return address via
the intrinsic call correctly.

Hi Anton,

Thanks for your reply!

I'm now stuck on a problem with the intrinsic function returnaddress.
I call it twice in the function.

The first time, it reads from the correct address, where the return address is stored.
It then saves the value it read to the local stack.
But when I call returnaddress again, it just reads from the local stack.

How can I solve this problem?

Thank you
Ying

Quoting Anton Korobeynikov <anton@korobeynikov.info> on Tue, 26 Jul 2011 21:12:07 +0400:

Hi all,

I want to implement the Xor random canary, so I have to get the return
address in the prologue and epilogue of the function.

First, two clarifications on the llvm.returnaddress() intrinsic to make sure you understand what your code is doing:

1) If I understand correctly, the llvm.returnaddress intrinsic returns the value of the return address stored on the stack. It does not return the location of the return address within the stack. In other words, the llvm.returnaddress intrinsic can tell you the program counter to which control flow will jump on function return, but it doesn't give you a way to modify the return address.

2) The llvm.returnaddress intrinsic is fragile. Optimizations can prevent it from returning the correct value, especially when you give it a non-zero argument. For example, frame pointer elimination may cause it to return incorrect results.
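As a small illustration of point 1, the sketch below uses `__builtin_return_address(0)`, the GCC/Clang builtin that corresponds to llvm.returnaddress(0). It assumes one of those compilers, and the helper name `local_copy_is_independent` is hypothetical. The builtin hands back a copy of the return address, so overwriting that copy changes only a local variable.

```cpp
#include <cstdint>

// Hypothetical helper: shows that the builtin yields a value, not a
// pointer to the stack slot that holds the return address.
__attribute__((noinline)) bool local_copy_is_independent() {
    std::uintptr_t copy =
        reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));

    copy = 0;  // overwrites only the local variable, not the saved slot

    // A second query still reports the real (non-null) return address.
    std::uintptr_t again =
        reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
    return again != copy;
}
```

The function also returns normally to its caller, since nothing here touches the stack slot itself. Only level 0 is used because, as point 2 says, non-zero levels are the fragile case.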

In the prologue of the function, before I insert the canary onto
the stack, I can get the return address with:

ConstantInt* ci = llvm::ConstantInt::get(Type::getInt32Ty(RI->getContext()), 0);
Value* Args1[] = {ci};
CallInst* callInst = CallInst::Create(Intrinsic::getDeclaration(M, Intrinsic::returnaddress),
                                      &Args1[0], array_endof(Args1), "Call Return Address", InsPt);

This generates a call to llvm.returnaddress(0). This returns the program counter of the call site that called the currently active function.

CallInst will get the return address and it works.

In the epilogue of the function, since the canary has already been
inserted, I wrote similar code:

ConstantInt* ci2 = llvm::ConstantInt::get(Type::getInt32Ty(RI->getContext()), 1);
Value* Args3[] = {ci2};
CallInst* callInst1 = CallInst::Create(Intrinsic::getDeclaration(M, Intrinsic::returnaddress),
                                       &Args3[0], array_endof(Args3), "Caaall Return Address", BB);

This code generates a call to llvm.returnaddress(1). This returns the program counter of the call site that called the function that called the currently active function.

-- John T.

Hi John,

Thanks for your reply!

I'm CC'ing this to the list in case anyone knows why you're seeing this behavior.

Now I know the difference between llvm.returnaddress(0) and llvm.returnaddress(1). I modified StackProtector.cpp, and I just want to get the value of the return address stored on the stack.

But I have a problem when I call llvm.returnaddress(0) twice. The first time, it reads from the correct address, where the return address is stored.

Fascinating. The first thing to do is to see whether one of the LLVM IR optimizations is eliminating the second call to llvm.returnaddress(0). To do that, run your instrumentation pass and then run opt with -std-compile-opts. Disassemble the output to LLVM assembly code and check whether there are one or two calls to llvm.returnaddress(0).

If there are still two calls, then it's probably a code generator optimization and not a mid-level optimization that's causing this behavior.

I guess it then saves the value it reads to the local stack, because when I call llvm.returnaddress(0) again, it just reads from the local stack instead of reading the return address.

Here is a simple example and the assembly generated for it:

int main()
{
   return 0;
}

asm generated:

main: # @main
# BB#0:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl 4(%ebp), %eax      # first call to llvm.returnaddress(0)
    movl %eax, -4(%ebp)
    movl $0, -8(%ebp)
    movl -4(%ebp), %ecx     # I want to read 4(%ebp), the return address,
                            # again here, but the compiler saved the first
                            # value as a temporary variable and reloads that
                            # instead of reading the return address again
    cmpl %ecx, %eax
    jne .LBB0_2

How can I solve this problem?

If it's an optimization that's causing the problem, you'll have to figure out which optimization it is and disable it.

That said, I think implementing a stack canary system at the LLVM IR level is the wrong way to go. As I've suggested before, you should look into writing a pass that modifies the prologue/epilogue code at the level of MachineInstructions (i.e., you should write a MachineFunctionPass instead of a FunctionPass or ModulePass). A MachineFunctionPass should allow you to directly control code generation and give you a reliable way of fetching and examining the function return address and inserting the stack canary.

-- John T.