Hi all,
I’m more impressed by LLVM every day, but I’m still unclear about the order in which optimization passes should run and how expensive each one is. I think I have a pretty solid understanding of what each pass does at a high level, but I couldn’t find any documentation about how they interact at a lower level.
I’d like to use LLVM for generating high-performance stream processing code at run-time. Obviously the resulting code should be as optimized as possible, but performing the optimizations themselves should also be very fast. The code I’m compiling is comparable to C (without any exception handling or garbage collection, so none of the related passes are needed). My first attempt at collecting useful optimizations looks like this:
passManager->add(new TargetData(*executionEngine->getTargetData()));
passManager->add(createScalarReplAggregatesPass()); // Convert to SSA form
passManager->add(createSCCPPass()); // Propagate constants
passManager->add(createInstructionCombiningPass()); // Peephole optimization
passManager->add(createDeadStoreEliminationPass()); // Dead store elimination
passManager->add(createAggressiveDCEPass()); // Aggressive dead code elimination
passManager->add(createCFGSimplificationPass()); // Control-flow optimization
I have several questions about this:
- Does ScalarReplAggregates totally supersede PromoteMemoryToRegister? I think I need it to optimize small arrays, but what is the expected added complexity?
- Does SCCP also eliminate multiplying/dividing by 1 and adding/subtracting 0?
- Is it arbitrary where to place InstructionCombining? Is there a better order?
- Is DeadStoreElimination still necessary when we have AggressiveDCE?
- What are the tradeoffs between the different dead code elimination variants (why not always use the aggressive one)?
- Is there a better place for CFGSimplification? Should I perform it at multiple points?
Also, my code will frequently contain vectors that are initialized to either all 0.0 or all 1.0. This offers a lot of opportunity for eliminating multiplications and additions, but I was wondering which passes I need for this (probably a Reassociate pass, what else?). I also wonder whether these passes actually work on vector types.
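To make the vector case concrete, here is a minimal sketch in plain C++ of the kind of kernel I mean. The names (Vec4, scaleAndBias, scaleAndBiasFolded) and the 4-wide float vector are made up for illustration and are not taken from my actual code. The second function is what I would hope the optimizer reduces the first one to when the scale is all 1.0 and the bias is all 0.0. I realize that, strictly speaking, x + 0.0 is not an identity for x = -0.0 under IEEE 754, so this assumes the sign of zero does not matter.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-wide float vector, standing in for an LLVM <4 x float>.
using Vec4 = std::array<float, 4>;

// Hypothetical stream kernel: out[i] = in[i] * scale[i] + bias[i].
Vec4 scaleAndBias(const Vec4 &in, const Vec4 &scale, const Vec4 &bias) {
    Vec4 out{};
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] * scale[i] + bias[i];
    return out;
}

// The form I would like to end up with when scale is all 1.0 and bias is
// all 0.0: both operations disappear (assuming signed zeros are ignored).
Vec4 scaleAndBiasFolded(const Vec4 &in) {
    return in;
}
```

In the generated code the scale and bias would be constants known at compile time, so the question is which passes get from the first form to the second.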
Is there any other highly recommended pass for this kind of application that I’m forgetting? Are there any passes I should avoid because they give poor gains for a long optimization time?
Sorry for the many question marks. I don’t expect a definitive answer to each of them, but some guidelines and insights would be very welcome and much appreciated.
Thanks!
Nicolas Capens