@llvm.assume blocks optimization

(I did a quick search on Discourse and the GitHub issues first.)

Reproducer: Compiler Explorer

For the IR:

; Function Attrs: nounwind
define fastcc void @_Z3Barv.destroy(ptr noundef nonnull align 8 dereferenceable(120) %0) #0 {
resume.entry:
  %index.addr = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 5
  %index = load i2, ptr %index.addr, align 8
  %switch = icmp eq i2 %index, 1
  br i1 %switch, label %cleanup21, label %CoroEnd

cleanup21:                                        ; preds = %resume.entry
  %.reload.addr = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 3
  %1 = load ptr, ptr %.reload.addr, align 8
  %2 = icmp eq ptr %1, null
  tail call void @llvm.assume(i1 %2)
  %index.addr.i2 = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 3, i64 72
  %index.i3 = load i2, ptr %index.addr.i2, align 8
  %switch.i4 = icmp eq i2 %index.i3, 1
  br i1 %switch.i4, label %cleanup21.i, label %CoroEnd

cleanup21.i:                                      ; preds = %cleanup21
  %.reload.addr.i5 = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 3, i64 24
  %3 = load ptr, ptr %.reload.addr.i5, align 8
  %4 = icmp eq ptr %3, null
  tail call void @llvm.assume(i1 %4)
  br label %CoroEnd

CoroEnd:                                          ; preds = %resume.entry, %cleanup21, %cleanup21.i
  tail call void @_ZdlPv(ptr noundef nonnull %0) #1
  ret void
}

It looks like the optimizer can’t simplify this any further, which seems odd to a human reader, since the only meaningful instruction in the function is tail call void @_ZdlPv(ptr noundef nonnull %0).

And if I remove the two assume instructions, opt can optimize it as expected: Compiler Explorer. That looks bad, since it shows that @llvm.assume blocks the optimization.
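What I would expect the fully simplified function to look like is simply this (a sketch based on the IR above, not actual opt output):

; Function Attrs: nounwind
define fastcc void @_Z3Barv.destroy(ptr noundef nonnull align 8 dereferenceable(120) %0) #0 {
  tail call void @_ZdlPv(ptr noundef nonnull %0) #1
  ret void
}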

Maybe we want to leave the job of cleaning up @llvm.assume to the code generators. But they don’t do a good job for this case, at least: Compiler Explorer

(the generated code)

Bar() [clone .destroy]:                        # @Bar() [clone .destroy]
        cmp     byte ptr [rdi + 112], 1
        jmp     operator delete(void*)@PLT                      # TAILCALL

There is a meaningless cmp instruction, and that is the motivation for this post.

I can understand that there are trade-offs here; e.g., it is hard to decide in which passes we should remove assume intrinsics. But it still looks odd to me that we push work the middle end could do onto the backend, and I am wondering whether we have any chance to improve this.

Note that assume recently became a standard C++ feature, so I am concerned this may get worse once users start to use assume widely in their source code.
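For context, both the C++23 [[assume(...)]] attribute and __builtin_assume lower to this intrinsic; a condition like p != nullptr becomes roughly the following (a sketch – the exact lowering varies with the frontend and the condition):

; C++: [[assume(p != nullptr)]] or __builtin_assume(p != nullptr)
%cond = icmp ne ptr %p, null
call void @llvm.assume(i1 %cond)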

Any thoughts?

I see three conceptually different ways forward:

  0. Drop assumptions more aggressively. We need some of this for sure, but I doubt it should be the only solution we want to adopt.

  1. A pass that applies heuristics to restructure/move assumes. In the example, it could hoist the loads and the assumes into the entry. We’d need to change the former into masked loads or something similar to prevent us from assuming they imply dereferenceability. The assumes would use the pointer and the path conditions to provide the assumption (see the sketch after this list). Overall, it would be complicated.

  2. We first adopt something along design idea #2 from [llvm-dev] [RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume. (The discourse ate some of the email, but here is the discussion: [RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume).
    Then it will be simpler to run the pass to hoist/canonicalize assumptions because we can move side-effects that only affect the assumption “into the assumption function”. We could even outline regions that only deal with assumptions and make them new assumption functions.
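To illustrate idea 1 on the reproducer, the hoisted entry block might look roughly like this – a sketch only, using @llvm.masked.load as one way to avoid implying dereferenceability and a select to fold the path condition into the assumption:

resume.entry:
  %index.addr = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 5
  %index = load i2, ptr %index.addr, align 8
  %switch = icmp eq i2 %index, 1
  %.reload.addr = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 3
  ; the load is now guarded by the path condition, so it no longer
  ; implies that %.reload.addr is dereferenceable
  %mask = insertelement <1 x i1> poison, i1 %switch, i64 0
  %vec = call <1 x ptr> @llvm.masked.load.v1p0.p0(ptr %.reload.addr, i32 8, <1 x i1> %mask, <1 x ptr> poison)
  %p = extractelement <1 x ptr> %vec, i64 0
  %null = icmp eq ptr %p, null
  ; the assumption only claims something on the path where %switch holds
  %cond = select i1 %switch, i1 %null, i1 true
  call void @llvm.assume(i1 %cond)
  ...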

I had a conversation with @alina, @aeubanks, and @chandlerc last week about assumes, and Chandler essentially explained that some of these optimization-blocking effects are unavoidable tradeoffs of the design. We were discussing libc++'s use of __builtin_assume and how it blocks optimizations, which led us to ⚙ D123175 [libcxx] Add flag to disable __builtin_assume in _LIBCPP_ASSERT.

To try to summarize the conversation, as designed, llvm.assume should only be used to communicate critical information to the optimizer that has a high likelihood of triggering optimizations, such as alignment information that will enable vectorization. Using it as a way to carry nice-to-have information through the IR is not consistent with the original design and the tradeoffs involved to make assumption information durable through optimization.

I guess this is all to say, these kinds of issues are a known part of the design, and that certainly needs to be better documented, or users of the C++ assume feature will experience surprisingly bad performance.

If resolving these performance problems is important, we should look at the design Johannes linked. This is, however, a challenging project. In the short term, if I were working on this problem, I would focus on approaches 0 and 1 of managing assume information.

I would slightly tweak this – I think the bar is lower than “critical”, but it isn’t at all free. It is a very material tradeoff, and so there needs to be a real, cognizant idea of a benefit that makes the tradeoff worthwhile.

One specific thing to add to this – there was also a very significant investment in reducing the impact of these on the optimizer to minimize this issue. But it doesn’t go to zero, it is just kept as small as possible (so far).

So if there are assumptions that should be valuable, then they should be added. And it may be necessary to improve still further the resilience of the optimizer to make the cost of that acceptable. But it’ll never go truly to zero.

The thing that I think this design argues strongly against, IMO, is adding assumptions without intent. They have a cost, and we should be paying that cost only when there is a reason to expect benefit.

IMHO, having __builtin_assume block optimization in any way is a bug, not intended behavior. It may be difficult to fix and we will face the consequences, but it is a bug.

While we can document that the implementation of assume may have a runtime cost, I am not sure how that helps users make the decision, as the ‘cost’ is not well defined – they really have to measure it case by case to avoid the negative side effects, which reduces usability.

FWIW, if we introduce a new memory effect we can also limit the impact of assumes w/o compromising their correctness.
So instead of
@llvm.assume(i1 %cond) ["bundles"(ptr %ptr)] mustprogress nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: readwrite)
we would have
@llvm.assume(i1 %cond) ["bundles"(ptr %ptr)] mustprogress nocallback nofree nosync nounwind willreturn memory(assume_mem: write)
So they do not interfere with malloc and other known functions.
In fact, we should be able to have some “unrelated”/“fresh” memory kind which doesn’t interact with anything. That way, each assume will only impact “itself”, nothing else, at least not through its effects.
If we then outline the associated side effects and control, we can annotate the llvm.assume_fn call with memory(fresh: write, argmem: read) to ensure we do not interchange it with write effects that would influence anything read inside.

Yeah, it surprised me a lot that assume is not free.

> To try to summarize the conversation, as designed, llvm.assume should only be used to communicate critical information to the optimizer that has a high likelihood of triggering optimizations, such as alignment information that will enable vectorization.

> So if there are assumptions that should be valuable, then they should be added. And it may be necessary to improve still further the resilience of the optimizer to make the cost of that acceptable. But it’ll never go truly to zero.

While this is relatively tolerable for frontend/middle-end communication, my major concern now is that assume is a standard feature. I am not sure whether WG21 was aware of this, but whether they were or not, I am pretty sure many C++ end users won’t get the intent. I can imagine them inserting a lot of “non-valuable”, “non-critical” assumptions into their code. That really doesn’t look good.

I am not sure I understand the method well. As I imagine it, it still can’t handle the case where the value %cond is only used by the assumption, as in the reproducer.

I mean something like:

%0 = ... ; some pointer
...
%4 = tail call i1 @whatever(ptr %3, ptr %2, ptr %1, ptr %0)
tail call void @llvm.assume(i1 %4)

Given that %0~%4 are only used by @llvm.assume, can we eliminate all of them now?
Or, if %4 is only used by @llvm.assume but provides some information, eliminating it may affect the optimizer too. How should we make the decision here?

IIUC, would it be possible to solve this with an @llvm.assume() that takes no i1 %cond? I feel there would be far fewer edge cases that way.

The i1 is already not used for various assumes that use bundles instead. (I think that was #1 in the RFC I linked earlier).
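For example, alignment or nonnull facts can already be carried entirely in a bundle on a trivially-true condition, with no interesting i1 operand:

call void @llvm.assume(i1 true) [ "align"(ptr %p, i64 16) ]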

Under the proposal, your initial example would end up looking like this:

define void @__assume_fn(ptr %0) {
resume.entry:
  %index.addr = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 5
  %index = load i2, ptr %index.addr, align 8
  %switch = icmp eq i2 %index, 1
  br i1 %switch, label %cleanup21, label %CoroEnd

cleanup21:                                        ; preds = %resume.entry
  %.reload.addr = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 3
  %1 = load ptr, ptr %.reload.addr, align 8
  %2 = icmp eq ptr %1, null
  tail call void @llvm.assume(i1 %2)
  %index.addr.i2 = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 3, i64 72
  %index.i3 = load i2, ptr %index.addr.i2, align 8
  %switch.i4 = icmp eq i2 %index.i3, 1
  br i1 %switch.i4, label %cleanup21.i, label %CoroEnd

cleanup21.i:                                      ; preds = %cleanup21
  %.reload.addr.i5 = getelementptr inbounds %_Z3Barv.Frame, ptr %0, i64 0, i32 3, i64 24
  %3 = load ptr, ptr %.reload.addr.i5, align 8
  %4 = icmp eq ptr %3, null
  tail call void @llvm.assume(i1 %4)
  br label %CoroEnd

CoroEnd:                                          ; preds = %resume.entry, %cleanup21, %cleanup21.i
  ret void
}

declare void @llvm.assume_fn(ptr, ...)

; Function Attrs: nounwind
define fastcc void @_Z3Barv.destroy(ptr noundef nonnull align 8 dereferenceable(120) %0) #0 {
  call void @llvm.assume_fn(ptr @__assume_fn, ptr %0)
  tail call void @_ZdlPv(ptr noundef nonnull %0) #1
  ret void
}

I’ve submitted ⚙ D153966 [SimplifyCFG] Allow dropping block that only contains ephemeral values to fix the specific issue reported here.

Woah, I didn’t know that was the reason for you folks asking for a way to disable __builtin_assume. As an extremely optimizer-naive user, I find this completely nuts (even though I know there are really good reasons for it). On the libc++ side, we were naively using __builtin_assume with the intent of giving more semantic information to the optimizer so it can do its job better. Seeing that’s usually not the case, I am tempted to simply remove __builtin_assume from libc++'s assertions until the compiler has been fixed/improved, if we think there’s a way we can get there.

Otherwise, if we don’t think we’ll get to a point where we can throw __builtin_assume around without preventing optimizations, I think we need to clearly document the cases where it makes sense to use the attribute and try to rely on a good shared understanding of how the attribute operates in the community. Barring that, I believe people are going to use it the exact same naive way libc++ has been using it and they will get the inverse of what they wanted, which would make this feature essentially a giant footgun.

I think that captures nicely what we haven’t been doing (in libc++). However, as a decent representative for optimizer-naive users, I think that requiring assumptions to have an intent makes them close to useless for the average programmer. You have to already know how the optimizer thinks in order to do that properly, and the average user just doesn’t.

I’m sure this is harder to fix than I can imagine. In the short term, I think we should probably do @jdoerfert’s step (0) and drop assumptions much more liberally. And in the even shorter term, I’m going to disable __builtin_assume in libc++’s assertions so we at least don’t ship LLVM 17 with an obvious case of “non-intentional assumptions” until we have more clarity.

Edit: The libc++ patch is ⚙ D153968 [libc++] Stop using __builtin_assume in _LIBCPP_ASSERT

If people want to get an idea how much the optimizer can use “information”, check out the conclusion/graphs in:

Long story short, we were not great at utilizing information.
That said, we are improving on that continuously.

Another way assumes may affect optimizations right now is that they (the intrinsics themselves and the IR instructions that are only used by assumes) usually impact all kinds of cost estimators and may prevent inlining/vectorization/unrolling/etc., even if they end up being removed before codegen (and that’s not always guaranteed, see ⚙ D152358 [CGP] Remove operand of llvm.assume more aggressively). There are lots of helper methods to “do something ignoring dbg intrinsics”, but no such thing for assumes (and the problem is harder).
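For example, in a snippet like the following, the load and the compare are “ephemeral” – dead except for feeding the assume – yet they still count toward size and use-count heuristics in many estimators (illustrative IR, not from the reproducer):

%v = load i64, ptr %p, align 8
%c = icmp sgt i64 %v, 0
call void @llvm.assume(i1 %c) ; %v and %c exist only for the assumption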

Not sure if that’s true, various passes use CodeMetrics::collectEphemeralValues and try to skip those.

Passes also rely on the number of uses of values, which can be affected by dbg-like intrinsics.

Thanks for following up on the libc++ assumes! We should have written up a proper issue for it with real data instead of anecdata and narrative, but that takes time and things get dropped every so often.

As an outsider: is there no notion of consuming, and therefore removing, an llvm.assume?

We don’t have that, AFAIK, but that is a good idea. Especially if we manifest equivalent information in the IR elsewhere, we can drop assumptions for sure.

AIUI AssumeSimplifyPass manifests equivalent information in the IR and drops redundant assumes. Perhaps it’s worth looking into running that pass by default? And we could extend it to heuristically drop assumes that may block optimizations, especially if this happens in the middle of the optimization pipeline where some passes have already had the chance to take advantage of the assumes.

For example, we currently don’t drop the assume in

declare void @llvm.assume(i1)
declare void @g(ptr)

define void @f(ptr nonnull %p) {
	call void @llvm.assume(i1 true) [ "nonnull"(ptr %p) ]	
	call void @g(ptr %p)
	ret void
}

@aeubanks Normal icmp-based assumes do get dropped if redundant – simply by dint of the condition simplifying to true. The AssumeSimplifyPass handles operand-bundle assumes, which are, at present, not used to any significant degree, which is presumably also why the pass is not enabled by default.
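For example (a minimal sketch), here the condition folds to true because of the nonnull attribute, and the now-trivial assume is erased along with it:

define void @h(ptr nonnull %p) {
  %c = icmp ne ptr %p, null      ; folds to true given nonnull
  call void @llvm.assume(i1 %c)  ; assume(true) is then deleted
  ret void
}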