About NRVO (named return value optimization)

Hi,

For the following small test case,

// RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -O1 -o - %s | FileCheck %s

// Test code generation for the named return value optimization.

class X {
public:
  X();
};

void f(const X& x);
void test10(bool b) {
  f(X());
  f(X());
}

we are generating the following LLVM IR with "

%class.X = type { i8 }

; Function Attrs: nounwind
define void @_Z6test10b(i1 zeroext %b) #0 {
entry:
  %ref.tmp = alloca %class.X, align 1
  %ref.tmp1 = alloca %class.X, align 1
  call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
  call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
  call void @_ZN1XC1Ev(%class.X* %ref.tmp1) #2
  call void @_Z1fRK1X(%class.X* nonnull %ref.tmp1) #2
  ret void
}

declare void @_Z1fRK1X(%class.X* nonnull) #1
declare void @_ZN1XC1Ev(%class.X*) #1

So my question is: should NRVO be able to tell that ref.tmp and ref.tmp1 can be merged into a single temporary? That is, I'm expecting the following LLVM IR to be generated:

define void @_Z6test10b(i1 zeroext %b) #0 {
entry:
  %ref.tmp = alloca %class.X, align 1
  call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
  call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
  call void @_ZN1XC1Ev(%class.X* %ref.tmp) #2
  call void @_Z1fRK1X(%class.X* nonnull %ref.tmp) #2
  ret void
}

If we leave both ref.tmp and ref.tmp1 in the LLVM IR, it seems hard for the middle-end to combine them unless we demangle the function name _ZN1XC1Ev to learn that it is a C++ constructor and do more alias analysis.

Any ideas?

Thanks,
-Jiangning

This isn't related to NRVO - as the name suggests, NRVO is about named
return values. The example you gave has no return values and no named
values.

The optimization necessary here is stack reuse, which classically
LLVM/Clang haven't done a great job on. I'm not sure of the precise
details of the current state, but there have been some efforts to make
it better.

One part of that is the lifetime intrinsics (
http://llvm.org/docs/LangRef.html#memory-use-markers ), which would
allow the backend to know that the stack memory used by the first
temporary is dead before the first use of the stack memory for the
second temporary, and thus reuse the stack slot. I don't know what the
current state of the lifetime markers is (I guess we don't turn them
on by default? I'm not sure whether they're broken, inefficient, slow,
or not valuable enough yet) or whether they're a viable way forward,
but someone thought so at some point.
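
To make the non-overlapping lifetimes concrete, here is a small C++ sketch. The names Tracked, use, events and twoTemporaries are made up for illustration and are not part of the test case. It only records the order of construction, use and destruction: each temporary is destroyed at the end of its full-expression, so the first one is already dead when the second is constructed. Whether the compiler then reuses the stack slot is a separate question that this sketch does not answer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log, for illustration only.
static std::vector<std::string> events;

// Stand-in for X that records when it is constructed and destroyed.
struct Tracked {
  Tracked() { events.push_back("ctor"); }
  ~Tracked() { events.push_back("dtor"); }
};

// Takes the temporary by const reference, like f(const X&) in the test case.
void use(const Tracked &) { events.push_back("use"); }

void twoTemporaries() {
  events.clear();
  use(Tracked());  // temporary #1 is destroyed at the end of this statement
  use(Tracked());  // temporary #2 is constructed only after that
  assert(events.size() == 6);  // ctor, use, dtor for each temporary
}
```

After calling twoTemporaries(), the log holds ctor, use, dtor twice in a row, which is the ordering that lifetime markers would have to describe to the backend.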

- David

From: "David Blaikie" <dblaikie@gmail.com>
To: "Jiangning Liu" <liujiangning1@gmail.com>
Cc: "cfe-dev Developers" <cfe-dev@cs.uiuc.edu>
Sent: Wednesday, June 25, 2014 1:14:16 AM
Subject: Re: [cfe-dev] About NRVO (named return value optimization)

This isn't related to NRVO - as the name suggests, NRVO is about named
return values. The example you gave has no return values and no named
values.

The optimization necessary here is stack reuse, which classically
LLVM/Clang haven't done a great job on. I'm not sure of the precise
details of the current state, but there have been some efforts to make
it better.

There is now a stack coloring optimization that is enabled by default (and has been for a while), but Jiangning's comment that our lack of interprocedural alias analysis might be defeating it here is plausible.

One part of that is the lifetime intrinsics (
http://llvm.org/docs/LangRef.html#memory-use-markers ), which would
allow the backend to know that the stack memory used by the first
temporary is dead before the first use of the stack memory for the
second temporary, and thus reuse the stack slot. I don't know what the
current state of the lifetime markers is (I guess we don't turn them
on by default? I'm not sure whether they're broken, inefficient, slow,
or not valuable enough yet) or whether they're a viable way forward,
but someone thought so at some point.

As I recall, clang does generate lifetime markers by default (at least in some circumstances), they now work well, and this does seem like a good use case for them. I recommend investigating why this is not happening here. One thing to look at is in CodeGen/CGDecl.cpp:

/// Should we use the LLVM lifetime intrinsics for the given local variable?
static bool shouldUseLifetimeMarkers(CodeGenFunction &CGF, const VarDecl &D,
                                     unsigned Size) {
  // For now, only in optimized builds.
  if (CGF.CGM.getCodeGenOpts().OptimizationLevel == 0)
    return false;

  // Limit the size of marked objects to 32 bytes. We don't want to increase
  // compile time by marking tiny objects.
  unsigned SizeThreshold = 32;

  return Size > SizeThreshold;
}

Maybe the problem is that sizeof(X) < 32? If so, this limit's impact on compile time might be worth further investigation.
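
As a rough illustration of how that check would treat the test case, here is a standalone sketch of the same predicate. The function name wouldUseLifetimeMarkers, the plain unsigned parameters and the Big type are assumptions made for this example. It mirrors only the two conditions quoted above, not the rest of clang's CodeGen.

```cpp
#include <cassert>

// Stand-in for the quoted check. The real function takes a CodeGenFunction
// and a VarDecl; here the optimization level and size are passed directly.
bool wouldUseLifetimeMarkers(unsigned optLevel, unsigned sizeInBytes) {
  if (optLevel == 0)
    return false;                       // only in optimized builds
  const unsigned sizeThreshold = 32;
  return sizeInBytes > sizeThreshold;   // strictly greater than 32 bytes
}

// Same shape as X in the test case (constructor defined here so it links).
class X {
public:
  X() {}
};

// Hypothetical larger type, above the threshold.
struct Big {
  char bytes[64];
};

void checkThresholdExamples() {
  // An empty class is far below the threshold, so it would not be marked.
  assert(!wouldUseLifetimeMarkers(1, static_cast<unsigned>(sizeof(X))));
  // A 64-byte object at -O1 would pass the size check.
  assert(wouldUseLifetimeMarkers(1, static_cast<unsigned>(sizeof(Big))));
  // At -O0 nothing is marked, regardless of size.
  assert(!wouldUseLifetimeMarkers(0, static_cast<unsigned>(sizeof(Big))));
}
```

Note that the comparison is strict, so under this sketch an object of exactly 32 bytes would not be marked either.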

-Hal

This specific testcase shows 2 problems:

  • unnamed temporaries are not handled the same way as named variables
  • if a named variable is used instead, the object size becomes the second problem
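
A variant of the test case that sidesteps both points could look like the sketch below: the objects are named locals in their own scopes, and the class is padded past the 32-byte threshold in shouldUseLifetimeMarkers. The names BigX, g, calls and test10_named are hypothetical, and whether the two stack slots actually get merged for this variant has not been verified here.

```cpp
#include <cassert>

// Hypothetical padded variant of X, larger than 32 bytes.
class BigX {
public:
  BigX() : payload() {}
  char payload[64];
};

static_assert(sizeof(BigX) > 32, "BigX must exceed the 32-byte threshold");

// Call counter so the sketch has an observable effect.
static int calls = 0;

// Stand-in for f(const X&); it only counts calls so the sketch links and runs.
void g(const BigX &) { ++calls; }

void test10_named() {
  const int before = calls;
  {
    BigX x1;  // named local; its scope ends before x2 is created
    g(x1);
  }
  {
    BigX x2;  // lifetime does not overlap with x1
    g(x2);
  }
  assert(calls == before + 2);
}
```

Compiling this at -O1 with -emit-llvm and looking for lifetime markers on x1 and x2 would be one way to separate the two problems.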

Cheers,