Approaches for handling branches

Hi,

LLVM newbie here. I'm working on writing a compiler from a toy language to
LLVM assembly, and hoping to get some advice about how to handle certain
kinds of branches.

Consider an input program that might look like this if written in C:

   x1 = cond1 ? f(a1)
      : cond2 ? f(a2)
      : cond3 ? f(a1)
      : f(a4);

   x2 = cond3 ? f(a3)
      : cond4 ? f(a1)
      : f(a4);

   return g(x1, x2);

I'm hoping to compile something like the above in a "good" way that avoids
unnecessarily calling F.

To mock up how this might work, I wrote some LLVM assembly to try to model
the above (full code below: "try1"). I ended up with the following for x1,
where @f is declared as an external function, and @ite is a simple
if-then-else function that seems to nicely get inlined:

   %fa1 = call i64 @f (i64 %a1)
   %fa2 = call i64 @f (i64 %a2)
   %fa4 = call i64 @f (i64 %a4)

   %x1.cond3 = call i64 @ite (i64 %cond3, i64 %fa1, i64 %fa4)
   %x1.cond2 = call i64 @ite (i64 %cond2, i64 %fa2, i64 %x1.cond3)
   %x1 = call i64 @ite (i64 %cond1, i64 %fa1, i64 %x1.cond2)

Unfortunately this seems to produce "bad" x86 code, at least when compiled
like this:

     opt demo.ll -O2 -o=demo.ll.opt && llc -o demo.s demo.ll.opt

In particular, the resulting demo.s appears to be doing all of the calls of F
first, followed by some conditional moves to select the right answer. (This
would of course be great if F is very fast, but it would be bad if F is
slow.)

I've tried adding some promising looking switches to the opt command (like
-delinearize and -sink) and using different register allocation schemes in
the llc command, but so far I don't seem to be able to get LLVM to reorder
the instructions to push the calls of F into the branches. (I'm afraid I
don't really know much about what the switches do.)

I seem to be able to do much better by (manually) rewriting the assembly so
that I explicitly check the conditions first, and only call F on the various
arguments in the branches where they are needed (code below: "try2").

So I could write my frontend so that it generates code like try2. But I
worry about this approach because the code for try2 ends up repeating
computations like the call of f(a1) and f(a4), and it seems like this will
lead to code blowup. My "real" programs will have much more deeply nested
branches, and the expressions in the branches are going to be bigger than
just a single function call. So if I try to put these expressions only in
the branches where they are computed, it seems like that will result in code
that grows exponentially in the number of branches that lead to a common
subexpression.

So I have many questions. I wonder:

  - Are there other command-line options I should look into that might help
    LLVM be able able to optimize "try1" to push the calls down into the
    branches where they are used and avoid the unnecessary calls of F?

  - Is this kind of reordering anything I should expect LLVM to be able to
    take care of in the first place, or should I expect to need to work on
    making sure that my frontend produces code that is closer to what I want
    the machine to execute?

  - Are there other approaches that would work better than try1/try2 that I
    haven't considered?

  - Are there any examples of developing LLVM frontends for languages that
    deal with terms that are rich in branches and structure sharing? My real
    goal is to build a good compiler for efficiently executing DAGs of 64-bit
    integer operations, where some nodes fan out to several operations, but
    large chunks of the DAG may not be needed depending on the inputs. I'd
    be very interested in seeing other approaches to this kind of problem.

Thanks very much for your time.

Cheers,

Jared

Try writing the same C code, feeding it through Clang and see how it optimizes, that should give you a sense of what’s possible/practical.

The main issue here, I think, is that LLVM doesn’t know/isn’t guaranteed that your function calls don’t have side effects (printing things, changing the contents of memory, etc) so it can’t cause there to be extra (or fewer) invocations of the function than are present in your original program without making the program incorrect/behave differently.

If you know your function calls have no side effects and can be called multiple times I think the attribute you want to add to the function declarations is ‘readnone’, but I’m not entirely sure. If you add that attribute to your functions, you should be able to see them potentially cloned or coalesced as is ‘best’ for the code.

But if you don’t have that guarantee from your input program/language spec, then you need to make sure there aren’t excessive/duplicate calls in your input you give to LLVM to optimize.