LLVM IR to C/C++ conversion


I am a beginner with LLVM. In one of my projects I need to convert LLVM IR back into C/C++. I read on Stack Overflow that LLVM has discontinued support for this. Can someone please explain the drawbacks of such a conversion? Can I safely use the old backend code with current versions of LLVM to convert IR into C, or is there some other safe way to do this?


Abhishek Kumar

I'm not aware of theoretical drawbacks; it was removed due to lack of a maintainer (no one cared enough to keep it working). I suspect you'd have a lot of work to do to get the old one working again.

Skimming through the Kaleidoscope tutorial should give you enough
clues on the "right" way to build IR using the LLVM C++ APIs.

The cpp back end was removed because it had been broken for some time,
and didn't generate nice code anyway.

There are two discontinued back ends in LLVM that are often confused. The C++ back end generated calls to LLVM APIs for constructing the IR that was fed into it. It was removed largely because it never generated good API calls (e.g. it created instructions directly rather than via IRBuilder) and it bit-rotted to the extent that the code that it generated didn’t compile with new APIs.

The other back end was the C back end. This was largely discontinued because of lack of interest, but it always had problems because LLVM IR is strictly more expressive than C. There are many LLVM intrinsics that can’t be represented by ISO C, though a lot have GNU builtins that can be used instead. C can’t represent exception-handling constructs at all. C++ can, but implies the use of the C++ exception personality function and a specific format for the type info, whereas LLVM IR can include any personality function. There are also some subtle issues with regard to calling conventions: LLVM IR has some leaky abstractions in this regard, for example on some i386 targets a struct {int; int;} in C is returned as i64 and so transforming this back into C is very difficult, because there is no way to differentiate this from a function that returns a long long (or a union).
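
To make the calling-convention point concrete, here is a minimal sketch in plain C. It assumes a target where a struct of two 32-bit ints is exactly 64 bits wide and is returned as a single i64; the names `pair`, `make_pair_lowered`, `make_wide` and `unpack_pair` are made up for illustration and come from no LLVM tool. The two `int64_t` functions stand in for what a C back end would see after lowering: both have the same signature, so nothing in it says which one originally returned a struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical source-level type: two 32-bit ints. */
struct pair {
    int32_t a;
    int32_t b;
};

/* The sketch only makes sense if the struct fills exactly 64 bits. */
static_assert(sizeof(struct pair) == sizeof(int64_t),
              "pair must be exactly 64 bits");

/* Stand-in for the lowered form of "struct pair f(int, int)":
 * the struct's bytes travel back to the caller inside an i64. */
int64_t make_pair_lowered(int32_t a, int32_t b)
{
    struct pair p = { a, b };
    int64_t bits;
    memcpy(&bits, &p, sizeof bits);
    return bits;
}

/* A function that really does return a 64-bit integer.
 * Its lowered signature is identical to the one above. */
int64_t make_wide(int32_t hi, int32_t lo)
{
    uint64_t bits = ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
    return (int64_t)bits;
}

/* Recovering the struct needs outside knowledge that the i64 was a
 * struct pair in the first place; the signature alone does not say so. */
struct pair unpack_pair(int64_t bits)
{
    struct pair p;
    memcpy(&p, &bits, sizeof p);
    return p;
}
```

Calling `unpack_pair(make_pair_lowered(3, -7))` gives back the fields 3 and -7, but only because the caller already knows the 64 bits hold a `struct pair`. A tool that sees only the two `int64_t` signatures has to guess.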

If the source code that generated the LLVM IR is C/C++ and you know about the target-specific lowering then you can probably reconstruct some C/C++ from it that will generate a compatible binary when compiled. Whether that is a sensible thing to do is debatable.