generate llvm.assume calls in GVN?

Would it be wrong to generate the llvm.assume IR suggested below, in GVN?

Given more info via the AssumptionCache, InstCombine can then do more optimizing.

> Would it be wrong to generate the llvm.assume IR suggested below, in GVN?

I think so... Because:

> One very small tweak you could make would be to add an llvm.assume inside
> the if case of the if condition. (Yes, that is as silly as it sounds.) I
> tested that, and it did optimize as expected. It's essentially working
> around a deficiency in the optimizer around path constraints.
>
> Once upon a time, there was an optimization for this called predsimplify. I
> believe Nick can tell you a long, glorious, tragic tale about its life and
> ultimate death.

> In particular:
>
> define void @example_tweaked(i64* %x0) #0 {
>   %1 = load i64* %x0, align 8
>   %2 = and i64 %1, 3
>   %3 = icmp eq i64 %2, 2
>   br i1 %3, label %4, label %8
> ; <label>:4 ; preds = %0
>   call void @llvm.assume(i1 %3)

We don't need the assume here. If we want to do predicate-based
simplification, we can just have <whatever> look at dominating branch
conditions, much like it looks at dominating assume calls. Worst case is
that it requires more fancy-ness in the assumption tracking infrastructure
to actually funnel queries through to different ways of saying "this value
is true at this point".

However, that is the core essence of predsimplify. Historically, we have
found a really terrifying amount of complexity and compile-time hit for
relatively small gains in terms of generated code quality. =/ Maybe we just
need to handle the "obvious" cases much like the very limited calls to the
assumption tracker do, but I think this path needs to be trodden with great
care.
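For concreteness, the source-level shape behind the IR above might be
something like the following (a hypothetical reconstruction; the C names
are invented):

```c
#include <stdint.h>

/* Hypothetical C source for @example_tweaked: the branch condition itself
 * establishes (v & 3) == 2 on the taken path, so an explicit llvm.assume
 * there only restates what a walk over dominating branch conditions could
 * already prove. */
uint64_t example_tweaked(const uint64_t *x0) {
    uint64_t v = *x0;
    if ((v & 3) == 2) {
        /* Here bit 0 of v is known clear and bit 1 is known set. */
        return v >> 1;
    }
    return 0;
}
```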

> Would it be wrong to generate the llvm.assume IR suggested below, in GVN?
>
> I think so... Because:
>
> One very small tweak you could make would be to add an llvm.assume inside
> the if case of the if condition. (Yes, that is as silly as it sounds.) I
> tested that, and it did optimize as expected. It's essentially working
> around a deficiency in the optimizer around path constraints.
>
> Once upon a time, there was an optimization for this called predsimplify.
> I believe Nick can tell you a long, glorious, tragic tale about its life
> and ultimate death.

Okay, gather 'round the fire. :wink:

Predsimplify was the first optimization pass I tried to write, to fix
PR807. The basis was that it would propagate properties (icmp's between two
SSA values) down paths in the domtree. In its various rewrites it knew all
sorts of tricks. For instance it could do:

  a = b + c
  if b == 5:
    if c == 3:
      print a # replace 'a' with 8

which requires that it solve a bidirectional dataflow problem. It also knew
that a < b implied b > 0 (for unsigned values), and a < b < c implies that
c > 1. Some of the things predsimplify used to do are now implemented inside
gvn's processEquality. Predsimplify also had an odd trick:

  a = phi [0, a.i]
  b = phi [10, x]
  if a == 1:
    # here we know that b == x. (x could still equal 10 though.)

which I haven't seen replicated elsewhere in llvm.
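The first trick above corresponds, in C, to a fold like this (a minimal
sketch; the function is invented for illustration):

```c
/* Sketch of the bidirectional fact predsimplify could prove: on the path
 * where b == 5 and c == 3, the earlier sum a = b + c must equal 8, even
 * though a was computed before either condition was tested. */
static int sum_under_conditions(int b, int c) {
    int a = b + c;
    if (b == 5) {
        if (c == 3) {
            return a;   /* predsimplify could rewrite this to 'return 8;' */
        }
    }
    return -1;
}
```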

I went through many iterations of the implementation trying to minimize the
compile-time impact. So why did predsimplify die?

The first and largest problem is that it rarely fired. It turns out that
people don't write code which needs this sort of optimization in C very
often, or at least not in the llvm test suite. Instcombine already does a
ton of optimization that overlapped with predsimplify in its demanded-bits
analysis, and it does a much better job of getting the cases that show up
in real-world code. The cost of the analysis predsimplify was doing, eagerly
building up its value graph, was too expensive to justify in llvm,
especially when instcombine's demanded-bits analysis tended to get those
cases already. If your IR looks different, it might be worth adding
predsimplify to your pass pipeline.

I think a future VRP (value range propagation) pass would need to be a
superset of simplify-demanded-bits, so that we could remove the cost of
running simplify-demanded-bits to help pay for the VRP. I've been thinking
about the data structure we could use for that, but haven't worked out all
the operations yet.
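As a toy illustration of the demanded-bits overlap Nick mentions (this is
not LLVM's actual KnownBits API, just a sketch of the kind of fact such a
data structure would carry):

```c
#include <stdint.h>

/* Toy known-bits fact: which bits of a value are proven zero or one. */
typedef struct {
    uint64_t zeros;  /* bits known to be 0 */
    uint64_t ones;   /* bits known to be 1 */
} KnownBits;

/* Learning from a dominating condition of the form (v & mask) == value:
 * every bit covered by the mask becomes known. For example,
 * (v & 3) == 2 yields ones = 0b10 and zeros = 0b01. */
static KnownBits known_from_mask_eq(uint64_t mask, uint64_t value) {
    KnownBits k;
    k.ones  = value & mask;
    k.zeros = ~value & mask;
    return k;
}
```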

Nick


Thanks, LLVM Ol’ Timers! Your ghost stories are scary enough that I’ll tread quietly past predsimplify’s grave.

I was peeking into this because - and I’m happy to be corrected if I’ve misdiagnosed these - we’ve seen a few VRP-related bugs cropping up lately:

  1. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-November/079038.html (remove alignment checks)
  2. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-December/079345.html (remove tag bit tests)
  3. http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080314.html (remove FP range checks, patch submitted)
  4. http://llvm.org/bugs/show_bug.cgi?id=17683 (range check to improve division)
  5. http://llvm.org/bugs/show_bug.cgi?id=17713 (propagate FP equalities, patch committed)

The general problem of exploiting path-sensitive facts for optimization is a very open one. I’ve been thinking about this myself a bit recently, but haven’t come up with a particularly workable approach yet. The major problem is finding the right trade-off between compile time and optimization quality.

One simplification I think I’ve talked myself into is completely ignoring merging of conditions (i.e. dataflow) and simply relying on dominating conditions. At least so far, this seems to handle most of the cases I care about. (I’m mostly looking at cases involving either null checks or range checks.)
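The dominating-condition cases being described look roughly like this at
the source level (a hypothetical example; no merging of facts across paths
is needed, only a walk up the dominator tree):

```c
#include <stddef.h>

/* The inner test is dominated by the outer 'p != NULL', so an analysis
 * that only consults dominating conditions can prove it false and delete
 * the dead branch, without any dataflow merge. */
static int first_elem(const int *p, size_t n) {
    if (p != NULL && n > 0) {
        if (p == NULL)      /* provably false here */
            return -1;
        return p[0];
    }
    return -1;
}
```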

I’ve more or less convinced myself that any approach taken would have to be lazy, and make aggressive use of analysis caching. The only exception to this might be an early, dominance-based propagation restricted to specific narrow cases. (Either an extension to EarlyCSE, or something similar.)

In an ideal world, the analysis results would feed all other optimization passes (in particular, InstSimplify and InstCombine). Not sure if this is doable for version one though.

It’s also worth mentioning that, between LVI, GVN, and LoopUnswitch, we actually do get a lot of conditions inside loops. In particular, any check inside a loop header block with an immediately dominating check outside the loop is generally taken care of.
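That loop case, sketched in C (a hypothetical example): the redundant guard
inside the loop is immediately dominated by the identical check outside it,
so it can be folded away.

```c
/* The 'n > 0' fact established outside the loop dominates the loop body;
 * the repeated guard inside is provably false and can be removed. */
static long sum_first(const int *p, int n) {
    long s = 0;
    if (n > 0) {
        for (int i = 0; i < n; i++) {
            if (n <= 0)     /* dominated by 'n > 0': provably false */
                return -1;
            s += p[i];
        }
    }
    return s;
}
```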

Philip