Which pass converts call printf to puts?

I found that LLVM optimizes the IR by replacing a call to printf with a call to puts. Which pass performs this optimization? And is puts generally faster than printf (or better by some other metric)?

I think it is called simplifylibcalls or something like that.

As mentioned already, it is part of the library simplification pass.
puts is faster than printf because it doesn't have to parse a format
string; it only needs to compute the string's length.

Joerg

Hi Thomson,

> I found that LLVM optimized the IR by replacing printf with puts. I
> wondered which pass did this optimization? And is it common that puts
> is faster (and some other metric) than printf?

yes, it's safe to assume that puts() is always faster than printf()
because it doesn't have to perform all the format-string parsing.

Also, the implementation of puts() is a lot smaller - which might matter
if the binary is linked statically and code size is important.
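
A minimal sketch of when such a rewrite is valid, assuming the simple case of a printf call with a constant format string and no further arguments. The helper name putsArgumentFor is hypothetical and is not part of LLVM; it only models the decision, not the IR transformation. Because puts() appends a newline itself, the format string must end in '\n', and it must not contain '%', since that would require printf's format parsing.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an LLVM API): decides whether printf(fmt), called
// with no extra arguments, could be rewritten as puts(out).
// Returns true and fills `out` on success; returns false otherwise.
bool putsArgumentFor(const std::string &fmt, std::string &out) {
  // Any '%' means printf would have to interpret the format string.
  // (This conservatively rejects "%%" as well.)
  if (fmt.find('%') != std::string::npos)
    return false;
  // puts() always appends '\n', so the format must end with one.
  if (fmt.empty() || fmt.back() != '\n')
    return false;
  // Strip the trailing newline; puts() will add it back.
  out = fmt.substr(0, fmt.size() - 1);
  return true;
}

// Small usage example for the helper above.
void examples() {
  std::string arg;
  assert(putsArgumentFor("hello\n", arg) && arg == "hello");
  assert(!putsArgumentFor("hello", arg));   // no trailing newline
  assert(!putsArgumentFor("%d\n", arg));    // needs format parsing
}
```

The check is deliberately conservative: when in doubt it leaves the printf call alone.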

Best regards,
Christoph

Hi Thomson,

the new call to puts() is inserted right away, whereas the old call to
printf() is removed a bit later in SimplifyLibCalls::runOnFunction(). If
you browse the code a bit and backtrack the call stack to see what
happens with the return value of PrintFOpt::OptimizeFixedFormatString(),
you will stumble upon this segment in SimplifyLibCalls.cpp:1703ff.:

// Try to optimize this call.
Value *Result = LCO->OptimizeCall(CI, TD, TLI, Builder);
if (Result == 0) continue;

DEBUG(dbgs() << "SimplifyLibCalls simplified: " << *CI;
      dbgs() << " into: " << *Result << "\n");

// Something changed!
Changed = true;
++NumSimplified;

// Inspect the instruction after the call (which was potentially just
// added) next.
I = CI; ++I;

if (CI != Result && !CI->use_empty()) {
  CI->replaceAllUsesWith(Result);
  if (!Result->hasName())
    Result->takeName(CI);
}
CI->eraseFromParent();

Best regards,
Christoph

P.S. When answering, don't forget to CC the mailing list.

OK. So it seems that CI->eraseFromParent() removes the old instruction, and in the printf->puts case the new one is inserted right after it in the inner function. There is another line, CI->replaceAllUsesWith(Result). I think this line could also do the replacement, in addition to inserting the new instruction in the inner function. What’s the difference between these two replacement methods?

Also, thanks for the reminder to CC the mailing list.
-Thomson

Hi Thomson,

> Ok. So it seems CI->eraseFromParent() removed the old instruction and
> the new one is inserted right after this one in the inner function in
> the case of printf->puts. There is another line
> CI->replaceAllUsesWith(Result). I think this line could also do the
> replacement besides inserting the new one in the inner function. What's
> the difference between these 2 replacement methods?

whenever an instruction is replaced, all other instructions that use its
result (as an input operand) have to be told to use the result of the
new instruction instead. That's what CI->replaceAllUsesWith(Result)
does. Only then can the old instruction be removed from the basic block.

Best regards,
Christoph

Could you give an example where the old instruction cannot be removed safely? Since the new instruction produces the same context/result as the old one, I suppose it is safe to remove the old one once the new instruction is ready (inserted after the old one). Is there anything I missed here?

Thanks,
-Thomson

Hi Thomson,

> Could you give an example where the old instruction cannot be removed
> safely? Since the new instruction would produce the same context/result
> as the old one, I suppose it is safe to remove the old one when the new
> instruction is ready (inserted after the old one). Anything I missed here?

yes, you're missing that LLVM programs are in SSA form: A new
instruction always produces a new result.

Imagine a graph in which every instruction is represented as a node and
every use-relationship is represented as a directed edge. Now if you
want to replace a node with another node, you have to make sure to
properly reconnect all incoming and outgoing edges of the old node,
otherwise they'll be dangling.
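
The graph picture can be sketched with a toy model. This is an illustration only: the Node type below is a made-up stand-in, not llvm::Value, and it tracks just the operand and user edges. It shows why the users of the old node have to be redirected before the old node can be dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an SSA value (NOT llvm::Value).
struct Node {
  std::string name;
  std::vector<Node *> operands; // values this node reads (outgoing edges)
  std::vector<Node *> users;    // nodes that read this value (incoming edges)

  explicit Node(std::string n) : name(std::move(n)) {}

  // Record that this node uses `op` as an input operand.
  void addOperand(Node *op) {
    operands.push_back(op);
    op->users.push_back(this);
  }

  // Redirect every user of this node to `replacement` instead.
  void replaceAllUsesWith(Node *replacement) {
    assert(replacement != this && "cannot replace a node with itself");
    for (Node *user : users) {
      std::replace(user->operands.begin(), user->operands.end(), this,
                   replacement);
      replacement->users.push_back(user);
    }
    users.clear();
  }

  // True when nothing reads this node's result any more.
  bool useEmpty() const { return users.empty(); }
};
```

In this model, a node is only safe to delete once useEmpty() returns true. Deleting it earlier would leave its users holding a pointer to a node that no longer exists.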

Best regards,
Christoph

This makes sense now. So if the instruction doesn’t make any assignment, there is no operand reference to replace, and the inner function returns the same instruction (CI) to skip this replacement. Is that right?

Thanks,
-Thomson

Hi Thomson,

> This makes sense now. So if the instruction doesn't make any assignment,
> it would be unnecessary to replace any operand reference and the same
> instruction (CI) would be returned from the inner function to avoid this
> replacement, is this right?

yep, looks like it. To use the correct terminology, though, I would
rather call it "the result of the instruction is unused" instead of "the
instruction doesn't make any assignment".

By the way, I'm a bit confused myself about the following line
(SimplifyLibCalls.cpp:1195) - maybe someone can comment on it:

return CI->use_empty() ? (Value*)CI :
              ConstantInt::get(CI->getType(), FormatStr.size()+1);

So does LLVM assume that puts() always succeeds and never returns EOF?
I'm not sure that's a safe assumption. For instance, stdout might have
been closed, or redirected to a file on a disk whose quota has been
exceeded, and the write operation could fail as a consequence.
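
To make the concern concrete, here is a small sketch comparing the two behaviours. Both function names are hypothetical and neither is LLVM code. The first reports failure the way printf would, while the second mirrors the folded constant from the line quoted above, which is computed from the string length alone. It assumes FormatStr at that point holds the format string with its trailing newline already removed, which is what the +1 suggests.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: writes a '%'-free string to `out` and reports the
// result the way printf would (character count, or negative on failure).
int printFixedString(std::FILE *out, const char *text) {
  assert(out != nullptr);
  if (std::fputs(text, out) == EOF)
    return -1; // the write failed
  return static_cast<int>(std::strlen(text));
}

// Hypothetical model of the folded constant: the length of the stripped
// format string plus one for the newline, regardless of whether the write
// actually succeeded.
int foldedPrintfResult(const char *strippedFormat) {
  return static_cast<int>(std::strlen(strippedFormat)) + 1;
}
```

The two agree whenever the write succeeds. They can differ only when it fails, which is exactly the case the question is about.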

Best regards,
Christoph