llvm.assume after CodeGenPrepare

With recent changes in BasicAA (mostly by Nikita Popov I believe) llvm.assumes can now guide in the AA decision making. Which is of course great.

For example for C input (or IR equivalent) as follows it can make a huge difference if the variable ‘x’ is known to be non-zero when AA is queried during scheduling

__builtin_assume(x != 0);

for (int i = 0; i < 64; i += 4) {

v[(i + 0) * x] = v[(i + 0) * x] >> 2;

v[(i + 1) * x] = v[(i + 1) * x] >> 2;

v[(i + 2) * x] = v[(i + 2) * x] >> 2;

v[(i + 3) * x] = v[(i + 3) * x] >> 2;

}

Unfortunately it appears that the CodeGenPrepare pass removes llvm.assume so that they never reach the code generator. Currently commit 91c9dee3fb6d89ab3 (and before that commit 6d20937c29a1a1d67) eliminate assumptions in CodeGenPrepare for reasons that appear to be optimization (avoiding blocks that would be empty if it was not for the llvm.assume and its predecessors).

It seems these two efforts are quite contradictory. Is there any deeper thinking behind this? I for one would be in favor of not eliminating assumes.

-Markus

Well, I'm fairly new to LLVM and don't really have much of a clue
about this. I just wanted to make a small improvement on top of the
existing work from 6d20937c29a1a1d67 without thinking all that much
about the overall design; and since codegenprepare was already
stripping it, I thought it would be safe to assume that no later pass
would use this information.

I was writing some code like this:

    assert(ptr != nullptr || !ret); // no-op in release builds
    __builtin_assume(ptr != nullptr || !ret);

Then I figured that it would look nicer and might (?) be nicer on the
optimizer to instead make it look like this:

    if (ptr == nullptr) {
        assert(!ret); // no-op in release builds
        __builtin_assume(!ret);
    }

but when I tried that, the result was that because the llvm.assume
only got stripped in a later part of the codegenprepare pass, the
branch condition was never eliminated (see
https://chromium-review.googlesource.com/c/chromium/src/+/2727400/2/base/memory/checked_ptr.h#120).
That seemed wrong to me; and since I saw that it was already stripped
later on in codegenprepare, I thought it'd be a good idea to do it a
bit earlier to allow removing the branch entirely.

I guess if you wanted to keep llvm.assume information around without
creating empty blocks, you'd have to instead let the codegenprepare
pass figure out whether blocks are empty except for llvm.assume and
its speculatable dependencies, and if so, either discard the whole
block or rewrite "if (COND1) __builtin_assume(COND2)" to something
like "__builtin_assume(!COND1 || COND2)"?

With recent changes in BasicAA (mostly by Nikita Popov I believe) llvm.assumes can now guide in the AA decision making. Which is of course great.

For example for C input (or IR equivalent) as follows it can make a huge difference if the variable ‘x’ is known to be non-zero when AA is queried during scheduling

__builtin_assume(x != 0);

for (int i = 0; i < 64; i += 4) {

   v[(i + 0) * x] = v[(i + 0) * x] >> 2;

   v[(i + 1) * x] = v[(i + 1) * x] >> 2;

   v[(i + 2) * x] = v[(i + 2) * x] >> 2;

   v[(i + 3) * x] = v[(i + 3) * x] >> 2;

}

Unfortunately it appears that the CodeGenPrepare pass removes llvm.assume so that they never reach the code generator. Currently commit 91c9dee3fb6d89ab3 (and before that commit 6d20937c29a1a1d67) eliminate assumptions in CodeGenPrepare for reasons that appear to be optimization (avoiding blocks that would be empty if it was not for the llvm.assume and its predecessors).

It seems these two efforts are quite contradictory. Is there any deeper thinking behind this? I for one would be in favor of not eliminating assumes.

Well, I'm fairly new to LLVM and don't really have much of a clue
about this. I just wanted to make a small improvement on top of the
existing work from 6d20937c29a1a1d67 without thinking all that much
about the overall design; and since codegenprepare was already
stripping it, I thought it would be safe to assume that no later pass
would use this information.

I was writing some code like this:

     assert(ptr != nullptr || !ret); // no-op in release builds
     __builtin_assume(ptr != nullptr || !ret);

Then I figured that it would look nicer and might (?) be nicer on the
optimizer to instead make it look like this:

     if (ptr == nullptr) {
         assert(!ret); // no-op in release builds
         __builtin_assume(!ret);
     }

but when I tried that, the result was that because the llvm.assume
only got stripped in a later part of the codegenprepare pass, the
branch condition was never eliminated (see
https://chromium-review.googlesource.com/c/chromium/src/+/2727400/2/base/memory/checked_ptr.h#120).
That seemed wrong to me; and since I saw that it was already stripped
later on in codegenprepare, I thought it'd be a good idea to do it a
bit earlier to allow removing the branch entirely.

I guess if you wanted to keep llvm.assume information around without
creating empty blocks, you'd have to instead let the codegenprepare
pass figure out whether blocks are empty except for llvm.assume and
its speculatable dependencies, and if so, either discard the whole
block or rewrite "if (COND1) __builtin_assume(COND2)" to something
like "__builtin_assume(!COND1 || COND2)"?

FWIW, I think the above is closer to what we want than 6d20937c29a1a1d67 is.
Deleting assumes should not happen deliberately if we don't have to. Instead,
canonicalizations should be put in place, as described above. I'd do it before
codegen though, simplifycfg, for example, could hoist assume from blocks that
just contain it; in order to simplify the CFG :wink: We could also add more of these
into an assume specific pass, unsure what is best here.

~ Johannes

I think there’s two interacting pieces here:

  • I don’t believe we have a way to represent an assume at the MI layer. If we let them flow through codegen, we’d need to add such a thing.
  • The tradeoffs between information preservation and lowering may be different at different points in the pipeline. Johannes frames this as a canonicalization problem. That sounds reasonable, but it’s also reasonable that we may need to give up on preserving assumes at some point. (As a silly example, we probably don’t want them at MC.) Where exactly that point is unclear, and is mostly a matter of practical engineering tradeoffs.

If you felt like exploring alternate lowering points, that would seem entirely reasonable. It might not work out, or it might require a bunch of work to make happen, but the basic idea seems entirely worth exploring.

Philip

I think Markus use case is that MachineInstr::mayAlias may use AA (and even possibly TBAA), and AA could make use of llvm.assume to make decisions.

So the alias analysis in cadegen is partly based on analysing the “final” IR representation before ISel, so the llvm.assume does not need to be present in the MI layer for that to work.

PS. It might seem a bit strange and dangerous to use the IR and IR analyses during the codegen (e.g. afte PHI elimination), but afaik it works as long as MachineMemOperands are handled correctly (being dropped or updated correctly when doing certain transforms such as store merging).

/Björn

That is correct. The only use case I am interested in is when AA is queried during scheduling of MI and as Bjorn points out (and afaiu) this falls back on assumes being present in the IR. The fact that AA, during the MI layer, falls back on IR can indeed seem a bit strange but that is simply how it works and not directly related to the presence of assumes or not.

AFAIU llvm.assume will iselect to nothing (which effectively also eliminates its predecessors if the llvm.assume was their only use) so at least for this use case no explicit llvm.assume representation is required in the MI layer. The requirement would possibly be different if there were other uses of llvm.assume in the MI layer but I am not sure what they would be.

-Markus

Right, the canonicalization as described put in simplifycfg seems like a good thing to do longer term. For starters though it might be enough, at least for my use case, to simply stop dropping llvm.assume in codegenprepare unless the llvm.assume (and its predecessors) are the only instructions in a block. I think that would keep most people reasonably happy.

-Markus

My second point should still stand. Not caring about the MI representation does simplify the work involved, so that’s a good observation.

Philip

I would encourage you to do the simplifycfg change instead. The less fragile we can have performance, the better off we are.

Philip