IR2Builder
Hello, I have created a tool - IR2Builder - for converting LLVM IR into C++
IRBuilder API calls. The purpose of this post is to gather ideas and see if this
would be a valuable addition to the set of tools that are part of the LLVM
repository.
The tool itself is not fully finished (see the testing section below), but it
can already be used (and has been). It can be found in my fork of LLVM: mark-sed/llvm-project on GitHub, branch ir2builder.
Motivation
The main motivation for me was all the cases where I had to generate functions
using IRBuilder. It is not always possible to use a .ll file and just load it, as
there might be compile-time or runtime dependencies.
In almost all of these cases I first wrote the LLVM IR version and then rewrote
it using IRBuilder API calls. This tool helps with that rewriting.
Another use case is learning LLVM. When I started learning LLVM (not that
long ago) I had issues with code generation and with understanding all the
connections in IRBuilder. With this tool, one can write the LLVM IR and then see
how it can be constructed through IRBuilder.
Some users might also find it useful to convert .ll files that they always
include into C++ code.
Implementation and use
The current implementation is a standalone LLVM tool implemented in a single
cpp file. It uses LLVM to parse the input IR file, then traverses it and
generates C++ code in textual form.
Currently it can generate just a function (probably the most useful mode),
but also a whole module or a complete program with main. The latter is mostly
for testing purposes.
Users can also specify the variable names for IRBuilder, LLVMContext, Module or
the llvm scope, so the output fits straight into their code.
./ir2builder code.ll -o code.cpp
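To make the traverse-and-print idea concrete, here is a minimal sketch. It is not the tool's actual code: BinaryInst and emitBinary are hypothetical names, and the real tool reads LLVM's parsed IR rather than a hand-filled struct. It only shows how one instruction such as %sum = add i32 %a, %b could be turned into the text of an IRBuilder call, with the IRBuilder variable name passed in the way a user-specified name would be.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for one parsed binary instruction,
// e.g. "%sum = add i32 %a, %b". Not a type from LLVM or from the tool.
struct BinaryInst {
  std::string Result; // "sum"
  std::string Opcode; // "add"
  std::string LHS;    // "a"
  std::string RHS;    // "b"
};

// Emit one line of C++ text that would rebuild the instruction through
// IRBuilder. BuilderName stands for the user-chosen IRBuilder variable name.
std::string emitBinary(const BinaryInst &I, const std::string &BuilderName) {
  // Only three opcodes are covered; this is an illustration, not a full table.
  static const std::map<std::string, std::string> Methods = {
      {"add", "CreateAdd"}, {"sub", "CreateSub"}, {"mul", "CreateMul"}};
  auto It = Methods.find(I.Opcode);
  assert(It != Methods.end() && "opcode not covered by this sketch");
  return "Value *" + I.Result + " = " + BuilderName + "." + It->second + "(" +
         I.LHS + ", " + I.RHS + ", \"" + I.Result + "\");";
}
```

For example, emitBinary({"sum", "add", "a", "b"}, "Builder") yields the line Value *sum = Builder.CreateAdd(a, b, "sum"); and passing "B" as the second argument would make the generated line call B.CreateAdd instead.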
Testing and known issues
I created a simple bash script that ran this tool on all .ll files for opt in
the whole llvm/test directory. For each file, the script:
- Created fully runnable cpp code (./ir2builder test.ll --runnable > test.cpp)
- Compiled it using llvm-conf (g++ test.cpp $(./llvm-conf ...) -I./include/ -o test)
- Ran the generated binary (./test > generated.ll)
- Compared the original with the generated IR using llvm-diff (./llvm-diff test.ll generated.ll)
A test was considered successful (pass) if there was no compilation error and
llvm-diff succeeded. Tests that were not for opt are marked as "skipped".
I can share the full results (in a Google Sheet or similar), but here are the
main numbers:
Test set | Passed | Failed | Skipped |
---|---|---|---|
Analysis | 1003 | 450 | 3 |
CodeGen | 501 | 301 | 18166 |
Transforms | 5895 | 3581 | 162 |
Assembler | 18 | 16 | 436 |
Bitcode | 38 | 10 | 223 |
Examples | 5 | 2 | 4 |
Feature | 15 | 6 | 62 |
Instrumentation | 355 | 101 | 2 |
LTO | 30 | 16 | 104 |
Other | 84 | 40 | 27 |
ThinLTO | 99 | 79 | 104 |
Verifier | 16 | 6 | 317 |
Σ | 8059 | 4608 | 19610 |
Pass rate: 63.6 %
Fail rate: 36.4 %
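The percentages appear to be computed over executed tests only (passed plus failed), with skipped tests left out. That reading is an assumption, but it reproduces the totals in the table. A small sketch of the arithmetic, with passRatePercent as a hypothetical helper name:

```cpp
#include <cassert>

// Percentage of executed tests that passed. Skipped tests are not part of
// the denominator (assumption based on the totals in the table).
double passRatePercent(int Passed, int Failed) {
  assert(Passed + Failed > 0 && "no executed tests");
  return 100.0 * Passed / (Passed + Failed);
}
```

With the totals from the Σ row, passRatePercent(8059, 4608) is 8059 / 12667, which rounds to 63.6, and the fail rate is the remainder up to 100.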
I would personally prefer to see a higher pass rate, but most of these failures
are tied to some of the known issues, which can be fixed (see below).
Known issues
- Incorrect GEP return type generation - This is mostly just me not
knowing the correct call to get the return type that I need to provide to the
CreateGEP call. This one should be easy to fix; in the test set I can see
1030 mentions of this error (there can be multiple in one test).
- ConstantExpr creation - Once again, I was not sure how to correctly generate
ConstantExpr from the IR. This one is present 17316 times in the test set,
so adding it should help the pass rate the most.
- BasicBlock addresses - Every function is generated in its own scope to
avoid issues with same-named variables across functions, but this is an
issue for BlockAddress creation.
- Float errors - llvm-diff sometimes fails because of differing floats, which is
caused by rounding error and the value not always being output correctly.
- Incorrect tests - Some tests fail when being parsed and loaded into the tool.
- Missing call bundles
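The float issue in the list above is a general property of printing floating-point values as decimal text, and it can be illustrated without LLVM. The sketch below is not taken from the tool; printDouble and roundTrips are hypothetical helpers that show how a value printed with too few digits no longer parses back to the same double, while printing with max_digits10 significant digits does.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Number of significant decimal digits that is enough to round-trip a double.
const int ExactDigits = std::numeric_limits<double>::max_digits10;

// Print a double with the given number of significant digits.
std::string printDouble(double V, int Digits) {
  assert(Digits > 0 && "need at least one significant digit");
  std::ostringstream OS;
  OS << std::setprecision(Digits) << V;
  return OS.str();
}

// True if the value survives printing at that precision and parsing back.
bool roundTrips(double V, int Digits) {
  return std::stod(printDouble(V, Digits)) == V;
}
```

For 0.1 + 0.2, six digits print as 0.3, which parses to a different double, so roundTrips returns false; with ExactDigits the text parses back to the identical value. A generator that writes constants into C++ source would need that kind of exact output for the regenerated IR to match.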
The current output is not well formatted, but my idea is that the generated
code can easily be run through clang-format, which sorts out this issue.
Current and general solution for known and unknown issues
The tool has to expect that new constructs will be added to LLVM before it is
updated to support them. For that case, and for the known issues, there is a
simple approach: let the user write the missing code. Currently, for missing
parts such as ConstantExprs, ir2builder places a comment such as
/* TODO: ConstantExpr creation */ in the place where the construct should be
created, and the user can write it there. This will also usually fail the
compilation, and the comment is visible in the error snippet.
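A minimal sketch of this placeholder approach follows. It is an illustration under assumptions, not ir2builder's code: OperandKind, emitOperand and emitRet are hypothetical names, and only the TODO comment text is taken from the description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of an operand the generator has to print.
enum class OperandKind { NamedValue, ConstantExpr };

// Return C++ source text for an operand. A construct the generator cannot
// translate becomes a comment, leaving a hole for the user to fill in.
std::string emitOperand(OperandKind Kind, const std::string &Name) {
  switch (Kind) {
  case OperandKind::NamedValue:
    return Name;
  case OperandKind::ConstantExpr:
    return "/* TODO: ConstantExpr creation */";
  }
  assert(false && "unhandled operand kind");
  return "";
}

// Emit the text of a return instruction that uses the operand.
std::string emitRet(OperandKind Kind, const std::string &Name) {
  return "Builder.CreateRet(" + emitOperand(Kind, Name) + ");";
}
```

For a ConstantExpr operand this produces Builder.CreateRet(/* TODO: ConstantExpr creation */);, a call left without its argument. The compiler would normally reject that line, and the comment shows up in the quoted source of the error, pointing the user at the spot to complete.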
Summary
This tool still needs some tweaks (which I am happy to add), but currently I
find it to be in a usable state.
It is also designed to remain useful even when it lacks support for some newer
constructs, and it could be used for learning as well.
I personally think it would be a good addition to the LLVM tooling.
What do you think?