Deduplication of memory allocation

Hi,

Even though this question does not only concern LLVM, it seems that only compilers guru can answer it. So I am giving a try here, hoping for the best.

In a recent talk by Chandler Carruth, “Performance with algorithms, efficiency with data structures” ( https://www.youtube.com/watch?v=fHNmRkzxHWs ), Chandler says that one should never return by reference for efficiency. Although I totally agree with him in most cases because pointers make it difficult for a compiler to optimise, I still don’t always have an efficient solution with value semantics. Here is the case that I am thinking of :

Why?

Look up "Return value optimization".
:slight_smile:

Of course, the allocation only occurs only once in every loop in the first example, inside the f function. We have 2 copy elision happening: one when the function returns and one when w is copy constructed. But we still have one memory allocation inside f.

With reference semantics, we don’t have a single memory allocation in the i-loop.

Hi,

Even though this question does not only concern LLVM, it seems that only
compilers guru can answer it. So I am giving a try here, hoping for the
best.

In a recent talk by Chandler Carruth, “Performance with algorithms,
efficiency with data structures” (
https://www.youtube.com/watch?v=fHNmRkzxHWs ), Chandler says that one
should never return by reference for efficiency. Although I totally agree
with him in most cases because pointers make it difficult for a compiler to
optimise, I still don’t always have an efficient solution with value
semantics. Here is the case that I am thinking of :

====
std::vector<double> f(std::size_t i);

auto v = std::vector<double>( n );
for (std::size_t i = 0; i < 1000; ++i) {
    auto w = f(i);
    for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] += w[k];
    }
}

which would be way slower than

====
void f(std::size_t i, std::vector<double>& w);

auto v = std::vector<double>( n );
auto w = std::vector<double>( n );
for (std::size_t i = 0; i < 1000; ++i) {
    f(i, w);
    for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] += w[k];
    }
}

because there is no memory allocation in the i-loop, inside the f-call. In
the Q&A where a guy seems to give him such an example (at 1:06:46), he says
that smart compilers such as LLVM can deduplicate memory allocation. It
does not seem to me to be applicable to this kind of algorithm. Does anyone
have a concrete example where a compiler deduplicates memory allocation?

Probably the easiest way to give the compiler an opportunity here would be
to have 'f's definition available for inlining - a compiler might be able
to inline that, then, depending on the particular allocation behavior of
'f', the allocation /may/ be reused (I don't think we do anything like
reusing yet - though we can, to some degree, inline a bunch of
std::vector's ops and then just remove the whole std::vector thing in favor
of a stack allocation, for example)

The following code is must faster with reference semantics than with value semantics. The “value” code is not transformed into the “reference” code by llvm. Just compile this code, and the difference in timings shows.

#include <iostream>
#include <vector>

std::vector<double> f_val(std::size_t i, std::size_t n) {
    auto v = std::vector<double>( n );
    for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] = static_cast<double>(k + i);
    }
    return v;
}

void f_ref(std::size_t i, std::vector<double>& v) {
    for (std::size_t k = 0; k < v.size(); ++k) {
        v[k] = static_cast<double>(k + i);
    }
}

int main (int argc, char const *argv[])
{
    const auto n = std::size_t{10};

    {
        auto v = std::vector<double>( n, 0.0 );
        for (std::size_t i = 0; i < 100000000; ++i) {
            auto w = f_val(i, n);
            for (std::size_t k = 0; k < v.size(); ++k) {
                v[k] += w[k];
            }
        }
        std::cout << v[n - 1] << std::endl;
    }

    {
        auto v = std::vector<double>( n, 0.0 );
        auto w = std::vector<double>( n );
        for (std::size_t i = 0; i < 100000000; ++i) {
            f_ref(i, w);
            for (std::size_t k = 0; k < v.size(); ++k) {
                v[k] += w[k];
            }
        }
        std::cout << v[n - 1] << std::endl;
    }

    return 0;
}

Probably the easiest way to give the compiler an opportunity here would be to have 'f's definition available for inlining - a compiler might be able to inline that, then, depending on the particular allocation behavior of 'f', the allocation /may/ be reused (I don't think we do anything like reusing yet - though we can, to some degree, inline a bunch of std::vector's ops and then just remove the whole std::vector thing in favor of a stack allocation, for example)

Do you have any concrete example where std::vector is finally allocated on the stack?

There will be no allocation in the first example. it will transform
the first into the equivalent of the second.

The following code is must faster with reference semantics than with value semantics. The “value” code is not transformed into the “reference” code by >llvm.

This is a question of "what does your current version of clang and llvm do.
Not a question of what can be done.
It *can*, it may not do it right now, but it in fact, can.