C Backend Resurrected

Hi All,

Two of my summer interns (Aimee Dipietro and Greg Simpson) used their time over the summer to resurrect the LLVM C Backend:

https://github.com/draperlaboratory/llvm-cbe

Improvements include recovery of simple for/while loops (instead of goto), better variable naming, inline asm support, and support for a more recent version of LLVM. I believe they used the repository here as a starting point:

https://github.com/glycerine/llvm/tree/cbe_revival
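
As a purely illustrative sketch of what "recovery of simple for/while loops (instead of goto)" means: the code below is hand-written, not actual llvm-cbe output, and the function names `sum_goto` and `sum_for` are hypothetical. A backend that transcribes basic blocks one by one tends to produce something shaped like the first function, while loop recovery aims for something shaped like the second.

```c
#include <stddef.h>

/* Goto-style output: a direct transcription of the basic blocks
 * (header, body, exit) of the control-flow graph. Hypothetical example. */
int sum_goto(const int *a, size_t n) {
    int total = 0;
    size_t i = 0;
loop_header:
    if (i >= n) goto loop_exit;
    total += a[i];
    i++;
    goto loop_header;
loop_exit:
    return total;
}

/* Structured output: the same loop expressed as a for statement. */
int sum_for(const int *a, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Both functions compute the same result; only the readability of the emitted C differs.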

Feedback is welcomed. I would like to see this feature put back into LLVM, and any help on how to make that happen would be appreciated.

Thanks,

-Rick

While I've no particular dog in this race, I will say that the usual
objection/point of contention here is: who will maintain it?

It was removed because it was bitrotting and unmaintained. Given the
burden it places on active developers to fix API changes, etc, in it,
there's a general expectation that whoever wants this feature must be
actively contributing to the project to, in some sense, pay in kind
for the work that will be necessary on the part of others over time.

- David

From: David Blaikie [mailto:dblaikie@gmail.com]
Sent: Monday, August 18, 2014 10:58 AM
To: Carback, Richard T., III
Cc: llvmdev@cs.uiuc.edu
Subject: Re: [LLVMdev] C Backend Ressurected

I'm happy to sign up to do it.

Would you mind explaining what this backend is for, and its applications?

I guess it is to get C code as the output of the backend, but I don't get
why we need it.

thanks,
Jun

I don’t know how good it is, but the applications seem obvious: for example, compiling programs in any of a number of original formats to run natively on CPUs that have a working simple C compiler (maybe only K&R or C89) but don’t have an LLVM back end.

The source program could be in modern C, C++, or any other high level language or assembly language with a translator to LLVM.

Hi,

Or similar, but not identical.
Code generation: you have some software that plugs something together
in a domain-specific language, but the result should run on a not so
common microcontroller where you may only have a C compiler.
A lot of real-time system environments use that kind of method to
get the final code. There it could also be beneficial to pre-optimize
the code at this level before you write out the C code.

Greetings
Mathias

Same here.

Dear LLVM developers,

I would like to ask about the current phase of development.

Last summer I intended to attempt much the same thing, but I ended up working on other academic projects instead.

My work was an attempt to port the CBackend to LLVM 3.3 or later, and I would distinguish four stages:

  1. Compile the CBackend from LLVM 2.9 against the new LLVM: finished.
  2. Run the CBackend to produce output: not finished yet, because of segfaults.
  3. Run the CBackend to produce compilable output: not started.
  4. Run the CBackend to produce compilable and correctly working output: not started.

I hope you reach the 4th stage as soon as possible.

My idea was to use the CBackend to analyze code from several languages using LLVM-based analysis tools for the C language. Your work is very useful information for me.

McSema makes my idea more practical:

A New Program Exists To Translate x86 Machine Code Into LLVM Bitcode

Published on 11 August 2014 08:18 AM EDT

Written by Michael Larabel in Compiler

McSema has been officially open-sourced as an advanced program for translating x86 machine code into LLVM bitcode.

McSema is the latest program trying to allow taking x86 binaries and turning them back into LLVM bitcode.
http://www.phoronix.com/scan.php?page=homeom/scan.php?page=home

I look forward to hearing from you.

Yours faithfully

Peter Fodrek

Hi Rick,

Good to see that you've got it working this far!

I've added a link to your repository from my old page that listed the patches that went into what became your starting point.

Cheers,
  Roel

This is part of the problem with the C backend. This is very much not what it’s useful for, yet it very much looks like it is. The LLVM IR is target dependent, including things like structure layout, pointer size, and other ABI issues. Even with a resurrected C backend, you can’t use it as a substitute for real target support.
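
A small hedged sketch of the target dependence described above. The struct `record` and the helper function names are hypothetical, chosen only for illustration. A C front end such as Clang evaluates `sizeof` and `offsetof` for the chosen target and emits the results into the IR as constants, so by the time a C backend sees the IR these values are already fixed numbers rather than portable expressions.

```c
#include <stddef.h>

/* Hypothetical struct: its padding and total size depend on the
 * target's pointer width and alignment rules. */
struct record {
    char  tag;
    void *ptr;
    long  count;
};

/* Each of these returns a value that the front end resolves for one
 * specific target; typically the pointer size is 8 bytes on 64-bit
 * targets and 4 bytes on 32-bit targets. */
size_t pointer_size(void) { return sizeof(void *); }
size_t record_size(void)  { return sizeof(struct record); }
size_t ptr_offset(void)   { return offsetof(struct record, ptr); }
```

C regenerated from IR that was built for one target would carry that target's numbers, which is why the result cannot simply be recompiled for a target with a different ABI.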

-Jim

It provides a useful starting point, but I agree with Jim that it is not a complete solution and requires rework of the results in a lot of cases. I think we could improve it further to address these issues, but that work is nontrivial.

If you are deciding between a quick and dirty implementation of a custom backend vs. the C backend, then the C backend is sometimes preferable in my experience, although it depends on the complexity of the code you are trying to run and how often you need to change it.

Is the C backend at all suitable to be adapted to emit OpenCL code? Or
do the target-dependence, and/or things that C can do but OpenCL can't,
make that hopeless?
-Isaac

I can’t see why you’d want to do this, no.

-eric

I am not sure implementing a C backend is really what you want.
LLVM IR is in SSA CFG form.
I would have thought that the next step would be to convert it into
an AST, i.e. the format Clang uses.
Once it is in AST form, it is trivial to output C or whatever other
language/format you wish.
I know going from CFG to AST is hard, but I think that would be a
better problem to solve.
As building an AST from a CFG is very different from building a C
backend, I would have thought that it might be good to use LLVM as a
library to read the LLVM IR format, manipulate the CFG into other
forms that are more AST friendly, and then write your own classes to
convert that to an AST.
Once in AST form, you could then use existing refactoring tools to
output more readable C source code.

Well, that is just my idea. I am currently writing a tool to go from
binary to LLVM IR. Once I have that done, I was going to write the next
tool to go from LLVM IR to AST.

Kind Regards

James

James - you should check out fracture (Binary -> LLVM IR open source project):

https://github.com/draperlaboratory/fracture

Another tool of this type is McSema:

https://github.com/trailofbits/mcsema

And yet another is Dagger:

http://dagger.repzret.org/

I'm biased towards the first as it's mine, and I know we're looking for contributors!

Thanks,

-R

I would like to ship an application that can compute on multiple brands
of modern GPU. I would like to write my GPU code in a slightly higher
level language than OpenCL's C variant -- for example, C++ templates
would be useful to have. One way might be compiling some higher level
language to OpenCL that I ship. Can you suggest better ways to do this?

Best,
-Isaac

This was the intended use case for SPIR