[RFC] __builtin_assume, [[assume]] optimization behavior

(This RFC was co-authored with @Sirraide @erichkeane.)

Clang supports __builtin_assume to inform the LLVM optimizer about assumptions it is allowed to make based on an expression passed to the builtin (https://clang.llvm.org/docs/LanguageExtensions.html#builtin-assume). C++23 added similar functionality in the form of the [[assume]] attribute (https://wg21.link/P1774R8). This RFC is about the optimization behavior we’d like to see in both the builtin and the attribute.

There was a recent discussion (@llvm.assume blocks optimization) that observed that LLVM’s assumption capabilities (through llvm.assume (LLVM Language Reference Manual — LLVM 19.0.0git documentation)) can have a surprising negative impact on program performance that is likely to trip up programmers of all experience levels. As one example, the libc++ developers added use of __builtin_assume for conditions they assert in the STL and ended up removing that functionality (⚙ D153968 [libc++] Stop using __builtin_assume in _LIBCPP_ASSERT) because of surprising performance regressions. The consensus position on that discussion seemed to be that LLVM should be more aggressive about dropping assumption information (at least in the short term) with a longer-term plan in place ([RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume), but to wit, nobody is actively working on either approach. There were concerns raised in that thread about what level of support we should have for [[assume]] in the presence of these unresolved issues.

The C++ standard gives implementations wide latitude in how to implement the optimization-related effects of this standard attribute, so we have options available to us in terms of how to proceed:

  1. Have __has_cpp_attribute(assume) return 0 and not lower any assumption information into LLVM IR.

  2. Have __has_cpp_attribute(assume) return 202207L and not lower any assumption information into LLVM IR.

  3. Have __has_cpp_attribute(assume) return 202207L and lower assumption information into LLVM IR via llvm.assume.

Of the three options we see, we think Option #2 should be disregarded out-of-hand, as the effect is us telling users “we support this attribute” and then not doing anything useful with it in terms of optimization behavior. The standard expects 0 to be returned from __has_cpp_attribute for this attribute in that case ([dcl.attr]).

Option #1 is a conservative option; it doesn’t expose users to a footgun while still giving us a path forward to provide support for assumption attributes in the future once LLVM has improved support for llvm.assume (or provided different IR for us to lower to). The benefit to this option is that it gives us time and doesn’t give users a poor first impression of the feature, which may dissuade them from trying it again in the future. We have time to improve the QoI of our offerings while still being fully conforming and we have time to see how the attribute fares in other implementations (or whether it’s implemented in them at all – the feature was contentious with implementers within WG21). GCC implements the feature already while MSVC currently does not, which means we may need this option in -fms-compatibility mode regardless of what we decide on. The downside to this option is that it doesn’t meet user expectations when it comes to portability of the feature and it may give the impression that Clang is not fully implementing C++23 (despite being a conforming implementation choice). It may be worth thinking about whether “portability” is practically meaningful in this case given that different optimizers will likely react differently to the same assumptions, so the syntax is portable but the effects on optimizations may not be. Note, this option does not preclude us from checking assumption attributes in a constant expression context should we find that valuable.

Option #3 is a more direct implementation in that it exposes functionality we already support. The benefit to this option is that it provides users with a portable syntax for specifying assumptions across various C++ implementations. It’s also straightforward for us to implement in Clang (as easy as Option #1 would be to implement). The downside is that the functionality we are exposing is not necessarily going to be in line with user expectations (which tend to set a higher bar for standards features than for implementation extensions), and then we’ll get bug reports that aren’t actionable for Clang developers (it requires LLVM to change rather than Clang and there may be unavoidable tradeoffs due to the design of LLVM that might not be practical for them to change). We could help mitigate some of these downsides by issuing a congratulatory diagnostic when using the attribute, along the lines of “assumption attributes may worsen performance rather than improve it; test the effects of this assumption under optimized builds”. It’s a bit obnoxious, but users can disable the diagnostic easily enough.

Additionally, we should decide whether we want __builtin_assume and [[assume]] to be synonymous or whether we want them to be different. For example, we could decide to not allow [[assume]] to have optimization impacts but continue to do so for __builtin_assume. Due to common usage patterns of putting attributes behind a macro for people who want highly portable code, this could perhaps be a sufficiently portable stopgap approach for users if we picked Option #1. However, due to that same usage pattern, if [[assume]] has no effect on optimization behavior, we may drive users writing highly portable code to use the builtin and they’ll hit the same footguns we’re trying to help them avoid. We propose to leave the optimization behavior of __builtin_assume as it is today.

To help get a sense of community sentiment, do you prefer:

  • Option #1 & leave __builtin_assume alone
  • Option #1 & make __builtin_assume a noop
  • Option #3 & leave __builtin_assume alone
  • Other (see comments)
0 voters
4 Likes

+1 for issuing a warning.

1 Like

Since the assumption functionality, deficient though it may be, is already exposed via __builtin_assume, my vote is to align the [[assume]] behavior with it. This at least enables existing users of the former to migrate to the standardized spelling.

I’m disinclined towards issuing a diagnostic on use. My expectation is that users that are (understandably) surprised by performance degradation will file bug reports that might help to isolate the cause of such degradation and therefore motivate addressing the underlying optimizer behavior; particularly so if testing shows improvements for other compilers. If an excess of bug reports starts appearing, we can then reevaluate whether to emit a “sorry, you’re not going to like this, but…” diagnostic.

3 Likes

The project isn’t THAT compartmentalized. (And if your impression is that it IS that compartmentalized, that’s a different problem.)

This would be a pretty obnoxious diagnostic. If the feature doesn’t work, don’t support it. Or file bugs to help make it better.

I think Option #1 and leaving __builtin_assume alone is the most defensible option right now.

I worry that exposing the attribute with its optimization impacts today will lead to very little aside from frustration. We already have strong signals from the community that the current behavior of llvm.assume is problematic. We have a path forward and I would love to hear from @jdoerfert whether he is planning to work on the proposal in the near future or if thinking has changed on the topic, problems have arisen, etc. But also, I worry that exposing the attribute now will cause problems for that transition in the future because performance will change when LLVM improves (and “LLVM improves” != “your code gets faster in all cases”).

That actually touches on why I think we don’t want to expose our users to this attribute right now, or perhaps ever: we already see that libc++ developers ran into the trap of assuming (pun intended) that giving the optimizer more information will help the optimizer do its job and thus improve performance. However, you must profile your code to see what the impacts of this attribute are. But you don’t have to profile your code once, you have to profile it on every release of every compiler you work with. There’s nothing portable about this attribute in terms of what users care about. No two optimizers behave the same way and so on anything but the most trivial of uses, this attribute is going to be difficult to use without conditional compilation – which means a builtin is actually more usable for users because they’re already used to builtins being implementation-specific. Further, Microsoft does not implement this attribute and they indicated in WG21 that they never would because of how problematic __assume has been for their users in practice. We have similar feedback from our users about __builtin_assume and I think we should keep that in mind. I would love to hear more about the experience @ldionne and other libc++ folks had with using this functionality. Also, I’d love to hear what experiences @rnk had that led him to pointing out the issues in libc++.

I think [[assume]] is the new register, but with the distinction that we have more latitude to determine what user experience we give users. I’d urge us to remember that most users are not power users. Power users are comfortable using implementation-specific tools like builtins. It’s the other 98% of users out there I’m concerned for – they’re the ones who struggled with register, and inline, and all the other times the standards committees have standardized optimization hints. That’s why I’m strongly in favor of a diagnostic if we end up going with Option #3 – we could learn from the past and help users instead of assuming everyone has enough expertise to know how to use this tool properly. The whole point to [[assume]] is to induce undefined behavior, even in circumstances where there would not otherwise be undefined behavior, so we should also be considering our security posture when thinking about how we expose this feature.

All this said, if the community’s consensus is to go with Option #3, I can live with it.

3 Likes

It’s not so much about compartmentalization as it is about comfort; I think LLVM developers are more comfortable making significant changes in LLVM than Clang developers are making significant changes in LLVM. This is the sort of functionality where making something better for your use case may easily make things worse for someone else’s use case, and that’s a hard situation for folks to want to step into.

Part of the trouble is, what does it mean for this feature to “not work?” The feature can be used correctly, at least for one release of one compiler that you can tune against. But how about the next release? Or another compiler?

I think the benefit to a congratulatory diagnostic is that it tells users “there are dragons here” but still gives power users a way to say “shhhhh, I know.” For example, I suspect such a diagnostic could have saved the libc++ folks had we issued one for use of __builtin_assume.

Can you explain a bit more about why the standardized spelling is beneficial for users to migrate to given that this isn’t actually a portable feature in terms of optimization behavior?

Personally, I’m in favour of option #3; I think Aaron’s concerns are definitely valid, but I also see a way in which not supporting [[assume]] might not improve the status quo either: if we return 0 for __has_cpp_attribute, users might decide to just hide it behind a macro (which many people are already doing anyway, e.g. in a library header to support standard versions before C++23, as well as to support older compilers that don’t support [[assume]], but which do have an equivalent builtin) and have it fall back on __builtin_assume(). But the problem with that is that, now, we haven’t solved anything, because they’ll still run into the same issues because __builtin_assume() still suffers from them.

Sure enough, the __builtin might dissuade people from using it without
fully understanding the ramifications, but, in my opinion, there’s a crucial difference between __builtin_assume() and most other builtins: It’s the fact that [[assume]] is now part of the standard. On the surface, the situation here parallels that of [[likely]]/[[unlikely]], and __builtin_expect()—i.e. there was a compiler builtin that was supported by all major compilers in some fashion and that was widely used with the intent of improving performance, so much so that it got standardised.

In my opinion, this is likely to instead lead people to believe that __builtin_assume() is just ‘how you spell [[assume]] in Clang before C++23’, and that it’s fine to just use the ‘builtin equivalent’ until Clang supports the ‘standard spelling’. So, in other words, not supporting [[assume]] may well just make users use __builtin_assume() instead (again, if they’re not already doing that to support older versions of Clang or other compilers); and if that’s the case, we might as well let them spell it [[assume]], as the end result would be the same anyway.

Lastly, I’m not categorically opposed to option #1, but I just don’t see not implementing [[assume]] helping in the long run.

3 Likes

I think this concern is generally overblown. Yes, this specific pattern of “take all asserts and convert them into assumes” is a terrible idea. As long as you don’t do that, and only manually place assumes, you are unlikely to have a particularly bad time.

We have been using assumes to good effect in Rust – generally, as long as you can show that the assume actually improves optimization in at least one case, it’s probably worthwhile to have it.

It also seems quite weird to me that, of all the ways you can shoot off your own foot in C++, the optimization behavior of assumes is where you want to draw the line.

4 Likes

IMO, the problem here is that we are in between a rock and a hard place.

First, we would like to implement the C++ feature to be compatible/helpful to those who want to use it, and in the way that is compatible with what WG21 specified.

Second, the feature as designed is REALLY REALLY hard to use correctly. It exists to introduce UB in a way that isn’t particularly actionable unless EXPERTLY used, and at the danger of introducing some pretty nasty UB. Additionally, this is made worse by some additional performance regression characteristics introduced by the design of the LLVM attribute.

As a representative of our users, this is something we probably don’t want to expose them to: It is hard to use right, and a vast majority of use attempts are going to be harmful. However, ALSO as a representative of our users, they, via the committee and emails to the committee, have stated that this is something they want. So this part comes down to, “Do we do what is BEST for our users, or what they want”. Additionally, our usability concerns WERE brought up in the room in WG21, so I presume those who are asking for this are aware of the concerns.

In general, I’m in favor of #3. WG21/users have been told of the dangers and want the feature anyway.

I AM in vast favor of the “please be very very careful, this is way harder to use than you think” warning. While I see the concerns that this is obnoxious, I REALLY want us to do everything we can do to prevent this from being ‘cargo cult’. I REALLY don’t want anything but the most expert of users to be using it: it is incredibly sharp, and difficult to use.

And a warning of, “Hey, this is an expert-only feature that is really hard to use correctly, please make sure you have done repeated, continous benchmarks on this code to make sure it is doing the right thing” is going to save SOME people.

Finally, I get the argument of, “well, if it is ‘broken’, just fix it!”. That is unfortunately not something we have the ability to do. It is ‘broken’ as designed by the committee. We did our best to fix the design at the time (which really came down to: there isn’t a good design for something like this that isn’t incredibly dangerous), but the design is the design. The best we can do is help folks from using it accidentally and harmfully, while still giving our “experts” the opportunity to use it as they wish.

3 Likes

I also agree that a warning is probably not a bad idea—especially since [[assume]] can also cause UB—so issuing a warning is hopefully at least going to make some people look into this topic a bit more and help them figure out whether or not using [[assume]] is even a good idea in the first place—depending on their use case.

On that note, I forgot whether this has already been brought up here, but we should definitely update the documentation for __builtin_assume() either way to point out these pitfalls.

1 Like

Two things: 1) that is the common usage pattern for this feature for naive users; people expect that telling the optimizer more information will lead to better performance and that’s not actually the case. Asserting that something doesn’t happen is a natural place to want to tell the optimizer to “help it out”. 2) there is very little chance the optimization behavior of Clang matches the optimization behavior of GCC or MSVC, so I think you’re actually quite likely to have a bad time except when using the attribute with only a single implementation, so the builtin is a more natural way to express “this is basically a different-ish feature in all implementations” unlike the standard attribute. Users expect uniformity with standardized features more than they expect it with extensions.

Because this was standardized as an attribute, we have options in terms of conformance. We don’t have as much wiggle room with other footguns.

Thank you this actually summarizes things very nicely, IMO.

Heh, this is actually why I’m in favor of #1 – I think WG21, users who interact with WG21, and compiler developers all have a bias towards expert-level features. I think most users aren’t expert-level users and we should be careful with interpreting “experts want this” with “this is good for users”. Basically, I don’t think it’s accurate to assume that users are aware of the dangers (this is why I’m strongly in favor of a diagnostic if we go with #3).

It provides a standardized spelling for users to express their intent. Optimization is always a matter of QoI. There are certainly cases that don’t involve assumption where optimizers produce surprising results.

I don’t see sufficient data presented here to conclude that assumption (as currently implemented in LLVM) is more often harmful than it is beneficial.

I don’t think we should be making decisions for users here. WG21 has standardized a feature with an uncertain track record in the industry. I think we should let [[assume]] have its day in the marketplace. Coding standards can evolve to provide guidance regarding how and when to use it. QoI differences will highlight implementation deficiencies that can be addressed. We can document the risk that comes with [[assume]] (both inherently and due to QoI issues); I think doing so is preferable to issuing a terse congratulatory diagnostic that might cause users to avoid the feature just because of the diagnostic or because they don’t understand the implications.

An alternative diagnostic approach would be to temporarily issue a this-is-experimental diagnostic that points users at the fine documentation that will of course be written. Doing so might help to lower the adoption rate in order to provide more time to respond to issues that are raised without injecting fear-of-the-feature bias into the community.

1 Like

Worth noting that the reporting bias is going to be heavily skewed towards “it didn’t work as expected.” People won’t file issues to say “I tried this feature and it worked perfectly.”

5 Likes

OTOH there are plenty of expert-level users out there. My team gets reports all the time from game studios trying to do sophisticated things to squeeze the last possible microsecond out of their code. I guarantee that they will try it, and measure it, and tell us exactly when it doesn’t work.

6 Likes

100% agreed; that’s why I think we should leave __builtin_assume alone. This gives people a tool to eek out performance but it doesn’t give the illusion of portability where none exists in practice.

As Erich points out, we’re between a rock and a hard spot in some ways.

While I am in favor of implementing [[assume]] in the front-end (because i think we should start somewhere), I think there is an expectations from users that for a valid assumption,
the optimized code would perform better, or about the same.

The status quo is that LLVM might, in the presence of a valid assumption, produce substantially worse code, which is certainly “QoI”, but… surprising QoI nonetheless.

Do we have good reasons to believe this isn’t also the case for gcc or MSVC? gcc doesn’t implement __builtin_assume, but 79469 – Feature request: provide `__builtin_assume` builtin function to allow more aggressive optimizations and to match clang suggests that the alternative use of __builtin_unreachable can also pessimize code generation.

Discourse cut off the part on how to overcome many of the problems people mention (here and in the GCC thread).

https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Is a better reference.