Giving up using implicit control flow in guards

Hello Everyone,

I want to raise a discussion about @llvm.experimental.guard intrinsic and reasons why we should give up using it. Here is an alternative approach to representation of guards that resolves some of fundamental flaws that the current guards have.

Basically, this intrinsic was introduced to model the following situation: we want to check that some condition is true, and if it’s not, we should deoptimize at this point and quit execution of the compiled code. According to http://llvm.org/docs/LangRef.html#llvm-experimental-guard-intrinsic, the guard looks like this:

Hi Max,

I think this works, but I'm a bit wary around the complexity
introduced by widenable_condition. For instance, we'd have to be
careful in spec'ing out its memory effects (if it is
readnone/readonly, it will get CSE'ed (which may not be what you
want?) but if it can write memory it will inhibit optimizations). I'm
also worried that with this representation it will become difficult to
discover which conditions are widenable (e.g. after optimization we
may end up using a phi of a select of widenable_condition() in a
branch condition, obscuring the fact that the branch was widenable).

But I've not looked at this stuff for a while so I'm happy to trust
your judgement. :slight_smile:

-- Sanjoy

Hi Sanjoy,

Thanks for feedback! As for memory effects, currently I use " inaccessiblememonly " for it. It allows to prove that it doesn't alias with any other load/store in the function, but doesn't allow CSE to eliminate it. It is not actually super-cool, because there is no way that we can safely hoist it out of loop (and sometimes we want to, for example to make unswitching). So this part is a one I am still thinking about.

As an alternative to widenable_condition() call, we are considering loads from special dedicated global fields. The bad thing about them is that if a loop gets peeled, then a widenable condition in the loop can be optimized away basing on fact that this field was true on the 1st iteration.

Thanks,
Max

If we only need the guard representation only for widening, why not
first run some cleanup passes, widen the guards and then lower the
guards and then run the rest of the pipeline which will see the
explicit branching?

Also, in the past we had talked about piggybacking on
AssumptionCacheTracker to solve some of the problems (with the
implicit nature of guards) you highlight. Can that be made to work?

Hi Sanjoy,

Thanks for feedback! As for memory effects, currently I use " inaccessiblememonly " for it. It allows to prove that it doesn't alias with any other load/store in the function, but doesn't allow CSE to eliminate it. It is not actually super-cool, because there is no way that we can safely hoist it out of loop (and sometimes we want to, for example to make unswitching). So this part is a one I am still thinking about.

As an alternative to widenable_condition() call, we are considering loads from special dedicated global fields. The bad thing about them is that if a loop gets peeled, then a widenable condition in the loop can be optimized away basing on fact that this field was true on the 1st iteration.

You'd ideally want every *execution* of a guard to possibly have a
different widening, but using global variables hides that.

-- Sanjoy

Replying specifically to Sanjoy's questions inline. For the record, I've talked with Max about his proposal in depth offline and generally support the direction. As Sanjoy pointed out, there are some details to be worked out, but Max has gotten far enough with his prototyping at this point that I'm personally convinced they're all handleable w/effort.

If we only need the guard representation only for widening, why not
first run some cleanup passes, widen the guards and then lower the
guards and then run the rest of the pipeline which will see the
explicit branching?

Key thing is that we want to be able to widen conditions across function boundaries. Here's a trivial example:
int load(int[] a, int offset) { return a[offset]; }
int b(int[] a) { load(a, 0) + load(a,1); }

This example should require (at most) one runtime range check.

As such, we need the widenable representation through inlining which is a very sizable chunk of the pipeline.

Also, in the past we had talked about piggybacking on
AssumptionCacheTracker to solve some of the problems (with the
implicit nature of guards) you highlight. Can that be made to work?

The conclusion of the "let's just use invokes" thread I started a while ago was essentially the opposite. That is, we should be using invokes for assumes as well specifically because it eliminates the need for anything like an assumption cache.

Or to say it differently, yes, we can improve (a bunch) the current implementation by commoning some code between guards and assumes, and using explicit vs implicit control flow (i.e. terminators), but we still have the general problem of needing to update every single pass. You could argue that if we did a better job of sharing/structuring the code, this would at least mean we picked up support for assumes pretty much everywhere too, but I'm personally not convinced that's a large enough value to justify the investment.

It's also a bit odd to have an invoke without a meaningful unwind target which is what we end up with for both assumes and invokes. You essentially end up having to special case unreachable to leave the invoke in place. Doable, but a bit odd.

I was originally of the opinion that improving/commoning/merging the current representation was the easiest path forward, but Max's prototyping work has impressed me with how few changes he's needed to make to get full coverage of the proposed representation throughout the optimizer. I'm (weakly) convinced his proposal is the right path forward, but I think either approach could be made to work without an unsurmountable amount of effort.

I missed this when it went around, but I wanted to follow up to say that I really like the semantic direction here.

I have some nit-picky concerns that I’m following up on in the code review, but wanted to leave a record here that this seems like a really nice way to progress modelling these things.

-Chandler