Preventing function call from being optimized out in LTO

Hello,

I am adding function calls to a program in an LLVM link-time optimization (LTO) pass, using the IRBuilder::CreateCall method. I want these calls to remain in the final x86 binary at every optimization level, but at -O2 and -O3 some of these calls are being optimized out.

So far, I’ve tried adding each function in the program (excluding LLVM intrinsics) to the llvm.used set, and I’ve also set noinline and optnone attributes on each function in the program. This has allowed me to retain most, but not all, of the calls I’ve added with IRBuilder::CreateCall.

Furthermore, I have confirmed that all the calls I’ve created are present in the LLVM IR immediately after my pass runs. Thus, I know some later LTO pass is optimizing out some of these calls.

How can I ensure that none of the calls I add are optimized out? Thanks for your help!

Best,
Shishir Jessu

optnone on such functions should suffice - well, unless the calls turn out to be dead & I don’t think there’s anything you can do to thwart dead code removal. So what are you trying to preserve the function calls for?

Hi David,

By “dead” do you mean unreachable? My understanding was that the removal of dead code is simply another optimization, which should be disabled after adding “optnone” (and adding the function to llvm.used so the function doesn’t later get deleted entirely).

I am instrumenting certain basic blocks in an LLVM pass, and would like to compile a binary that structures things the same way the LLVM pass does, to analyze some behavior. I observe that my calls are not removed at -O0 and -O1 for several programs, so it must be the higher optimization levels that remove them - yet adding “optnone” isn’t protecting all of those calls from being removed.

If you have any more insight I’d appreciate it! Thanks for your help.

Hi David,

By “dead” do you mean unreachable?

Yep!

My understanding was that the removal of dead code is simply another optimization,

It is, yes.

which should be disabled after adding “optnone”

Nah - optnone just means “don’t optimize this function” (and don’t do interprocedural analysis to optimize the call site based on details of this function’s implementation - treat it as though the function were in another translation unit/module that the compiler has no visibility into). But without any interprocedural analysis, and without any optimization of the function body, a call to an optnone function can still be removed if the call is dead.
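To make that concrete, here’s a minimal C sketch (the hook name is illustrative; optnone is a clang extension, so it’s guarded for portability):

```c
#include <stdio.h>

/* clang honors optnone; the guard keeps this sketch buildable with GCC. */
#if defined(__clang__)
#define KEEP_BODY __attribute__((noinline, optnone))
#else
#define KEEP_BODY __attribute__((noinline))
#endif

/* Hypothetical instrumentation hook. */
KEEP_BODY void my_hook(int id) { printf("hook %d\n", id); }

int compute(int a) {
    /* optnone protects my_hook's *body* from optimization, but this
     * call site is provably dead, so dead-code elimination in the
     * caller is still free to delete it. */
    if (0)
        my_hook(1);
    return a + 1;
}
```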

(and adding the function to llvm.used so the function doesn’t later get deleted entirely).

llvm.used is there to preserve functions that don’t otherwise need to exist - it sounds to me like this isn’t a useful tool for your situation. If you manage to preserve the call, then the function will be retained (unless it’s defined as “available externally” - then the definition of the function could be removed, leaving only a declaration). It sounds like you need to preserve the call - so if you succeed at that, the function body/definition should be retained without the need for llvm.used.

I am instrumenting certain basic blocks in an LLVM pass, and would like to compile a binary which structures things the same way the LLVM pass does, to analyze some behavior.

Hmm, can’t quite picture that from what you’ve described, sorry.

I observe that my calls are not removed at -O0 and -O1 for several programs, so it must be the higher optimization levels that remove them - yet adding “optnone” isn’t protecting all of those calls from being removed.

Any number of optimizations might make more code provably dead & then removed, losing your calls.

  • Dave

Hi David,

Thanks for your detailed comments.

but without any interprocedural analysis, nor any optimization of the body of the function, a call to an optnone function can still be removed if the call is dead.

I am not concerned with calls to an optnone function being removed, but rather call sites within an optnone function being removed. Although I place "optnone" on a given function F, dead code in F still sometimes gets removed. Is there any way to prevent this at the linking stage (since the bitcode emitted by clang still includes all my calls)?

Oh, sorry - my misunderstanding. Could you provide an example of that happening? (A specific optnone function’s IR, and a specific 'opt' command that optimizes that IR and modifies the optnone function in an unexpected/undesirable way.) That could be a bug.

- Dave

Hi David,

Sure! Here’s a function in sqlite3 called verifyDbFile, compiled with -O3. This is what it looks like when the intermediate bitcode is emitted by clang: link

And here’s what happens after I run opt -O3 (no additional command-line arguments) on the file containing this function: link.

I’m not 100% sure, but it seems like some pass within opt determines that a particular trap is guaranteed to occur on line 22, and so optimizes out the rest of the function, even though the pre-opt version of the function has the “optnone” attribute.

Does this seem like intended behavior or some sort of bug? Is there any way to disable this behavior? Thanks!

Best,
Shishir

Ah, OK, so the call is unconditionally dead based on local reasoning within the function - not sure if that qualifies as a bug in the optnone implementation. Even at -O0, we do some optimizations - just designed to be benign ones, from a program behavior/debugging perspective, I suppose.

+Paul Robinson who had some hand in the implementation of optnone, to see how he feels about it/whether he reckons removing dead code at -O0/optnone is desirable or not.

But otherwise, if you can make the code not dead, that would thwart any optimizations - eg: loading from a volatile variable (that could always be false in reality) to determine the condition to branch to the code you want to preserve.
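David’s volatile-guard suggestion might look like this in C (names are illustrative; the guard is always 0 at run time, but the optimizer can’t prove that):

```c
#include <stdio.h>

/* A volatile guard: loads of it can't be folded to a constant, so the
 * optimizer must assume the branch may be taken, keeping the call live. */
volatile int instrumentation_enabled = 0;

/* Hypothetical instrumentation hook. */
void my_hook(int id) { printf("hook %d\n", id); }

int compute(int a) {
    if (instrumentation_enabled)
        my_hook(1); /* preserved: the condition is opaque to DCE */
    return a + 1;
}
```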

  • David

I don’t know about calling the internal IR stuff directly - but when I ran into this in C code, I added __attribute__((used)) to the function so it wouldn’t be optimized out.
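For reference, the C-level attribute looks like this (a sketch; the stub name is made up):

```c
/* __attribute__((used)) forces this otherwise-unreferenced static
 * function to be emitted anyway -- roughly the C-level counterpart of
 * listing the function in @llvm.used. Note that it preserves the
 * function's definition, not any particular call site. */
__attribute__((used))
static int instrumentation_stub(void) {
    return 42;
}
```

This keeps the symbol in the object file; it does not protect individual call sites from dead-code elimination.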

Right, -O0/optnone is not actually completely free of optimization; the goal, I believe, is to compile quickly, producing code that is easy to debug. In this case, `br i1 false` can be cheaply transformed into an unconditional branch, which makes the `cont:` block have no predecessors, and pruning an entire block with no predecessors is also cheap (certainly cheaper than generating code for all those blocks). This cascades through the successors to `cont:`.
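The cascade Paul describes can be sketched in IR (function and block names are illustrative, not taken from the actual sqlite3 bitcode):

```llvm
declare void @my_hook(i32)
declare void @llvm.trap()

define void @f() noinline optnone {
entry:
  ; "br i1 false" folds cheaply to "br label %trap"...
  br i1 false, label %cont, label %trap
cont:
  ; ...leaving %cont with no predecessors, so the whole block --
  ; including this call -- is pruned, and the pruning cascades
  ; through %cont's successors.
  call void @my_hook(i32 1)
  ret void
trap:
  call void @llvm.trap()
  unreachable
}
```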

Looking at why that conditional branch has a false test would be the most productive path, I think.

–paulr


The answer, in short, is: don’t run any optimizations/normalizations or, better, only place calls where they are not statically dead.

Basically, if you ask for optimizations (O2/O3), you get them. We even have “reasonable” ways to tell optimizations to ignore some part of the code, e.g., optnone, but even then some trivial things will be removed (think `if (false) foo();`). That said, you can make the code “conditionally live” to avoid this. There are many ways; one would be something like:

Before:

```
int foo(int a) {
  if (a > 3)
    return a - 1;
  return 0;
}
```

After:

```
int return_zero() __attribute__((pure)); // linked in as object file

int foo(int a) {
  if (return_zero() != 0)
    goto calls;
  if (a > 3) {
call1:
    my_new_call(1, a);
    return a - 1;
  }
call2:
  my_new_call(2, a);
  return 0;
calls:
  switch (return_zero()) {
  case 1: goto call1;
  case 2: goto call2;
  default: goto call2;
  }
}
```

Now even if you inline the above into a call site like `foo(0)`, the `my_new_call(1, a)` call site will still be considered live, and thus not be removed.

Hope this helps.

Hi David,

Thanks for your help! I was able to solve my problem by changing the way this IR was written to change the “false” test to something else.

Best,
Shishir

Hi Tobias,

Thanks for your response! I was doing this as well, but it turns out that __attribute__((used)) prevents the function itself from being removed; it does not prevent calls to the function from being removed, which is what I was after. I fixed it by changing some IR that was producing unconditionally dead branches (“br i1 false”).

Best,
Shishir

Hi Paul,

Thanks for your help! I was able to rewrite that section of IR so it didn’t produce a ‘false’ test, which indeed fixed the problem.

Best,
Shishir