I want to present my project for GSoC 2011 for LLVM below. It would be very nice to hear suggestions and your opinion, thanks!
Superoptimization for LLVM IR
This project focuses on implementing superoptimization algorithms targeted at
the LLVM IR. The project uses arbitrary LLVM bitcode as a training set to
discover new peephole optimizations that can be later integrated into LLVM
instruction simplify phase or as a separate peephole optimization pass.
Code optimization is the task of using a faster sequence of instructions which
compute the same information, compared to a previous code sequence. The old
sequence may then be substituted by the new one, creating a faster program
with the same behavior. The problem of ﬁnding a code sequence that does some
computation in optimal time is undecidable and, in practice, optimizations are
discovered by humans using analytical reasoning. Finding new optimizations in
an automated fashion, using a search algorithm, is a diﬀerent approach to the
problem and became known in literature as superoptimization, term coined by
Using computer-aided optimization discovery, diﬀerent kind of optimizations
unnoticed by humans may be detected, specially those speciﬁc to the target
machine code . This project will focus on implementing a superoptimizer
which will exhaustively investigate new optimizations for the LLVM IR. Using
this approach, I hope to greatly contribute to the current LLVM instruction
simplify phase, introducing new substitutions identiﬁed in this project.
There is already a superoptimizer for LLVM in initial stages of development.
The existing superoptimizer currently limits its search space to a few transfor-
mations. This project will focus on improving this current superoptimizer by
extending the number of transformations applied and thus greatly increasing
the search space.
In order to test the superoptimizer eﬃciency, it will use LLVM bitcodes
compiled from the entire LLVM test suite and SPEC to harvest possible code
sequence optimizations. When successfully identiﬁed, these optimizations could
be implemented directly as peephole optimizations for LLVM IR.
This project is expected to last three months, as follows.
• 1st month, 1st week - Study the LLVM IR in order to establish a
list of axioms, that is, valid transformations that determine the search
algorithm capability. There is an important trade-oﬀ to investigate here,
since using fewer transformations will lead to faster searches, but several
good optimizations may be missed.
• 1st month, 2nd week - Thorough study of the current implemented
base and new algorithm design to support arbitrary transformations.
• 1st month, 3rd week until end of project - Implementation of the
• 2nd month - Test phase begins. Test algorithm with simple transforma-
tions searching for optimized code sequences.
• 3rd month - Harvesting new optimizations and implementing ﬁxes to
the search algorithm as more performance feedback from training data is
I am a Master student at the University of Campinas (UNICAMP), Brazil (time-
zone GMT -3). In my master thesis, I have already worked with superoptimiza-
tion theory and automatic code sequence searching. But instead of using it for
optimization purposes, I use this to automatically generate a LLVM backend
based on a machine description using the ArchC architecture description lan-
guage. The backend patterns are ﬁxed and the implementation of the pattern
is inferred using a search algorithm similar to the one used on superoptimizers.
Thus, I have experience both on superoptimizer theory and LLVM code.
 Sorav Bansal and Alex Aiken. Automatic generation of peephole superopti-
mizers. SIGPLAN Not., 41:394–403, October 2006.
 Henry Massalin. Superoptimizer: a look at the smallest program. In
ASPLOS-II: Proceedings of the second international conference on Architec-
tual support for programming languages and operating systems, pages 122–
126, Los Alamitos, CA, USA, 1987. IEEE Computer Society Press.
Auler,GSoC2011.pdf (100 KB)