Missed opt: Clang can't see that captured variables in a lambda are unchanged through a loop

I have a C++ lambda that captures some variables by value and uses them inside a loop. Clang unrolls the loop as expected, but it generates code that reloads the captured variables from the function object on every iteration. Copying them into locals first speeds it up. The Godbolt link (juicy bit below) runs faster with RUIN_PERFORMANCE=0.

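As I understand it, captured variables are treated like members of the closure object. Since the lambda isn’t mutable, its call operator is already const, but that only stops the body from modifying the captures; it doesn’t tell the compiler that nothing else modifies them. I don’t know of a C++ syntax to declare that the lambda object stays unchanged while it’s running.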
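
To make the "members of an object" point concrete, here is a rough hand-written equivalent of such a closure. Val, Pad and MulClosure are hypothetical stand-ins (the real types aren’t shown), and this is a sketch of the usual lowering, not what clang literally emits:

```cpp
#include <cassert>

// Hypothetical stand-ins for the interpreter's types.
struct Val { int reg; int index; };  // index: element index into Pad::mem (assumption)
struct Pad { float mem[64]; };

// Roughly what the compiler generates for
//   [rv, av, bv](Pad const &pad) { ... }
// Each by-value capture becomes a data member of an unnamed class, and since
// the lambda isn't `mutable`, its operator() is a const member function.
struct MulClosure {
  Val rv, av, bv;  // rv is unused in this cut-down body

  float operator()(Pad const &pad) const {
    // Every use of av/bv here is really this->av / this->bv: a load through
    // the closure's `this` pointer. `const` only stops this function from
    // writing the members; it promises nothing about other pointers.
    assert(av.index >= 0 && av.index < 64 && bv.index >= 0 && bv.index < 64);
    return pad.mem[av.index] * pad.mem[bv.index];
  }
};
```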

The code is simplified from an interpreter. Pad is its working memory.

Search for fmul in the assembly, with RUIN_PERFORMANCE=0 or 1.

```cpp
function<void(Pad const &pad)> emit_mul()
{
  Val av{99, 0};
  Val bv{99, 4 * BATCH_SIZE};
  Val rv{99, 8 * BATCH_SIZE};

#if RUIN_PERFORMANCE
  return [rv, av, bv](Pad const &pad) {
#else
  return [rv1=rv, av1=av, bv1=bv](Pad const &pad) {
    // Assign to locals so clang can be extra-sure that they aren't
    // modified inside the loop
    Val rv=rv1, av=av1, bv=bv1;
#endif

    for (int batchi = 0; batchi < BATCH_SIZE; batchi++) {
      F &r = pad.wrreg_F(rv, batchi);
      F const &a = pad.rdreg_F(av, batchi);
      F const &b = pad.rdreg_F(bv, batchi);
      r = a * b;
    }
  };
}
```
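
For anyone without the Godbolt link, the snippet can be made to compile on its own with hypothetical definitions like the ones below. BATCH_SIZE = 8, F = float, and treating Val’s second field as a byte offset into Pad’s memory are all assumptions, not the interpreter’s real definitions:

```cpp
#include <cassert>
#include <functional>

// Hypothetical minimal definitions so the snippet compiles on its own.
constexpr int BATCH_SIZE = 8;
using F = float;

struct Val {
  int reg;  // the post uses 99 here; its meaning isn't shown
  int ofs;  // byte offset into the pad's working memory (assumption)
};

struct Pad {
  F *mem;  // working memory owned elsewhere, so a const Pad can still write it

  // Converts a byte offset plus batch index into an element index.
  static int index(Val v, int batchi) {
    assert(v.ofs % static_cast<int>(sizeof(F)) == 0);
    return v.ofs / static_cast<int>(sizeof(F)) + batchi;
  }
  F &wrreg_F(Val v, int batchi) const { return mem[index(v, batchi)]; }
  F const &rdreg_F(Val v, int batchi) const { return mem[index(v, batchi)]; }
};

std::function<void(Pad const &)> emit_mul() {
  Val av{99, 0};
  Val bv{99, 4 * BATCH_SIZE};
  Val rv{99, 8 * BATCH_SIZE};
  return [rv1 = rv, av1 = av, bv1 = bv](Pad const &pad) {
    // The post's workaround: copy the captures into locals. Their addresses
    // never escape, so nothing written through `pad` can change them.
    Val rv = rv1, av = av1, bv = bv1;
    for (int batchi = 0; batchi < BATCH_SIZE; batchi++) {
      F &r = pad.wrreg_F(rv, batchi);
      F const &a = pad.rdreg_F(av, batchi);
      F const &b = pad.rdreg_F(bv, batchi);
      r = a * b;
    }
  };
}
```

Only the RUIN_PERFORMANCE=0 variant is shown; capturing [rv, av, bv] directly and using them in the loop gives the slow version.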

The lambda’s call operator is effectively a function that takes two pointers (the closure object and pad); in general, it’s tricky to prove that the two don’t alias. We can’t do any interprocedural analysis because the address of the function escapes (it ends up stored in the std::function).
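
A minimal illustration of why the reload is required when the types don’t rule out aliasing: self stands in for the closure object and out for pad’s memory, and both are int, so type rules can’t help. Closure, sum_reload and sum_cached are hypothetical names for this sketch:

```cpp
#include <cassert>

// Hypothetical stand-in for the closure object: one captured int.
struct Closure { int n; };

// What the compiler must do without a no-alias proof: re-read self->n on
// every iteration, because the store through `out` may have changed it.
int sum_reload(const Closure *self, int *out, int iters) {
  assert(iters >= 0);
  int total = 0;
  for (int i = 0; i < iters; i++) {
    total += self->n;
    *out = 0;  // if out == &self->n, this overwrites the "capture"
  }
  return total;
}

// The manual workaround: copy the capture into a local first. This matches
// sum_reload only when `out` does not point at self->n.
int sum_cached(const Closure *self, int *out, int iters) {
  assert(iters >= 0);
  const int n = self->n;
  int total = 0;
  for (int i = 0; i < iters; i++) {
    total += n;
    *out = 0;
  }
  return total;
}
```

When out points at self->n, the two functions return different results, so the compiler may only hoist the load if it can prove the pointers don’t overlap.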

I’d normally expect TBAA (type-based alias analysis) to be able to disambiguate here, but apparently that isn’t happening. The relevant loads have !tbaa.struct metadata because you’re passing the whole “Val” object to the helper functions, and it looks like LLVM’s TBAA implementation currently doesn’t analyze that metadata. This is probably straightforward to fix in TypeBasedAAResult::alias, but I’m not sure how much work that would be.
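
For context, the rule TBAA relies on: under C++’s strict-aliasing rules a store through a float lvalue can’t modify an int object, so loads of Val’s int fields can in principle stay in registers across the float stores. The sketch below is hypothetical (Val’s layout, fill_scalar, slot and fill_by_value are assumptions) and only contrasts the two access patterns; whether clang actually hoists the loads depends on the metadata it attaches:

```cpp
#include <cassert>

// Hypothetical stand-in: the post's Val appears to hold two ints.
struct Val { int reg; int ofs; };  // ofs: element index into mem (assumption)

// Scalar access: v->ofs is read as an int. Since the loop only stores floats,
// strict aliasing lets the compiler load v->ofs once and keep it in a register.
void fill_scalar(const Val *v, float *mem, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; i++)
    mem[v->ofs + i] = 1.0f;
}

// Hypothetical helper taking the whole Val by value, like pad.wrreg_F(rv, batchi).
// Copying *v is an aggregate copy; per the reply, such copies carry !tbaa.struct
// metadata, which LLVM's TBAA doesn't currently use, so the copy may be
// reloaded after every float store.
static float &slot(float *mem, Val v, int i) { return mem[v.ofs + i]; }

void fill_by_value(const Val *v, float *mem, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; i++)
    slot(mem, *v, i) = 1.0f;
}
```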
