[RFC] Resurrecting the C back-end

Hello all,

I am in need of a working C back-end for LLVM for my current research. I know that the previous incarnation of this back-end was removed from the tree in the 3.1 release, and I have gone through the archives to restore it to its previous 'glory'.

So far, I have restored most of the previous version (excluding some of the parts that needed changes outside of the lib/Target/CBackend directory) and I have made the necessary changes to get it back into a 'working' state.

I have already had a short discussion on the IRC channel (with baldric IIRC), and he suggested including type legalization in the list of passes to run before generating the output, in order to support arbitrarily sized types.
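To make "arbitrarily sized types" concrete, here is a minimal sketch of how a 17-bit integer (i17 in the IR) could be carried in a wider C type, masking after each operation. The width, the macro, and the helper names are assumptions for illustration only; this is not what the old back-end emitted.

```c
#include <stdint.h>

/* Hypothetical mask for a 17-bit integer stored in a 32-bit container. */
#define I17_MASK ((UINT32_C(1) << 17) - 1)

/* 'add i17': add in the wider type, then wrap to 17 bits. */
uint32_t add_i17(uint32_t a, uint32_t b) {
    return (a + b) & I17_MASK;
}

/* 'sext i17 to i32': interpret bit 16 as the sign bit. */
int32_t sext_i17(uint32_t v) {
    v &= I17_MASK;
    if (v & (UINT32_C(1) << 16))
        return (int32_t)v - (INT32_C(1) << 17);
    return (int32_t)v;
}
```

Legalizing such types to the nearest native width before printing would let the printer deal only with widths that have a direct C counterpart.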

Some other things I am considering for inclusion as improvements to the new CBackend include the following:

* Simplification of the output
   o Printing only the required set of headers/defines for a specific module
   o Reducing the number of explicit type casts in the generated code
   o Optionally removing the current prefix 'llvm_cbe_' to named variables
   o Only printing full definitions of structs when their internal fields are actually referred to within the module (e.g. when using library calls like fopen, a complete description of the struct FILE is generated, whereas a simple 'struct FILE;' should be sufficient).

* An option to insert software floating-point calls and/or library calls for things like division (my research targets an embedded processor that cannot always support costly operations)
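As a rough sketch of the last item: instead of printing `a / b`, the back-end could print a call to a helper that needs only shifts, compares and subtracts. The name cbe_udiv32 is hypothetical, and the shift-subtract loop is just one possible implementation, not an existing runtime routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division without a divide instruction. */
uint32_t cbe_udiv32(uint32_t n, uint32_t d) {
    uint32_t q = 0;
    uint32_t r = 0;
    int i;

    assert(d != 0);  /* division by zero is undefined, as for 'udiv' in the IR */
    for (i = 31; i >= 0; --i) {
        /* Bring down the next bit of the dividend. */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= UINT32_C(1) << i;
        }
    }
    return q;
}

/* What the generated code for 'udiv i32 %sum, %n' might then look like. */
uint32_t average(uint32_t sum, uint32_t n) {
    return cbe_udiv32(sum, n);
}
```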

My hope is that, by generating simpler output, it is possible to produce code that is more readable yet still portable.
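For the struct item in the list above, a small sketch of what I mean. The struct name cbe_stream and the function are placeholders. As long as the module only passes pointers around, an opaque declaration is all the C compiler needs:

```c
#include <stddef.h>

/* Opaque declaration: no field list is printed because no field is accessed. */
struct cbe_stream;

/* Hypothetical module function that never looks inside the struct. */
struct cbe_stream *pick_stream(struct cbe_stream *primary,
                               struct cbe_stream *fallback) {
    return primary != NULL ? primary : fallback;
}
```

The full definition would only have to be printed once the module contains a field access or needs the size of the struct.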

Furthermore, some of the current features are outside of the scope of my current work and could make it more difficult for me to maintain the code.

For example, the previous back-end seems to put quite some emphasis on the different linkage types and the properties of various C compilers that are required in order to correctly represent them. My guess is that this is irrelevant for most of the use-cases of the C back-end while it could take me quite some time to support.

A similar example is the handling of inline assembler statements, which required per-target support for, e.g., the translation of register names. For now, this is not something I need (my target architecture is not supported by LLVM anyway) and I do not yet consider myself familiar enough with the LLVM internals to offer support for this feature.

Anyway, that brings me to my final question: Which features are critical/important/wanted/unwanted for a C back-end?

Cheers,
  Roel

Dear Roel,

Thank you for working on this!
The C backend definitely deserves to be resurrected.

Good luck with your effort,
- D.

Shouldn't this be done by the C compiler instead?

Philipp

Will this allow users to compile C++ (or some other language that LLVM
has a frontend for) to C, which then can be compiled using a C compiler
for a target architecture, for which only a C compiler exists?
Which use-cases do you have in mind for this backend?

Philipp

I think the C backend would also allow people to perform source-to-source
transformations with LLVM (instead of Clang).

ether

Hi Roel,

  It's good to know that you're working on the C backend. But IMO, the reason the
C backend was removed in LLVM 3.1 is that no one maintained it. If you bring
it back, would you be willing to take responsibility for maintaining it?

Regards,
chenwj

I do not believe that this would be the case nor that it should be a goal. Source-to-source transformation requires a lot of accurate information about the AST, and conversion to LLVM IR is way too lossy. Signedness, for example, is lost at IR generation, as is any pretense of machine independence.
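To illustrate the signedness point with a minimal sketch (the function names are made up, and this is not the old back-end's actual output): the IR has one i32 type, and signedness survives only in the choice of operation, such as sdiv versus udiv. A C printer therefore has to pick one C type for the value and cast at each use.

```c
#include <stdint.h>

/* 'add i32' is the same operation for signed and unsigned operands. */
uint32_t ir_add(uint32_t a, uint32_t b) { return a + b; }

/* 'udiv i32' maps directly onto unsigned C division. */
uint32_t ir_udiv(uint32_t a, uint32_t b) { return a / b; }

/* 'sdiv i32' needs casts. Converting an out-of-range uint32_t to int32_t
 * is itself implementation-defined in C (two's complement is assumed here). */
uint32_t ir_sdiv(uint32_t a, uint32_t b) {
    return (uint32_t)((int32_t)a / (int32_t)b);
}

/* 'icmp slt' versus 'icmp ult' on the very same bits. */
int ir_icmp_slt(uint32_t a, uint32_t b) { return (int32_t)a < (int32_t)b; }
int ir_icmp_ult(uint32_t a, uint32_t b) { return a < b; }
```

Whether the original variable was declared int or unsigned cannot be recovered from this; only the per-operation choice remains.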

Why is it not possible to have the C backend emit machine-independent
code (i.e. C code that does not rely on implementation-defined behaviour)?

Philipp

Because LLVM IR already includes that implementation-defined behaviour.

Cameron

I'd like it to be easy to configure (e.g. to tell which size int is
assumed to have).

I'd prefer the resulting code to not rely on implementation-defined
behaviour (e.g. not make any assumptions about the size of int).

I'd like the resulting code to contain a lot of information (use of data types,
keywords such as const and restrict) that can be used to generate
optimized code.

As an example, assume I feed some code that uses an int variable into
LLVM. LLVM finds that the value of the variable is assigned only once,
and has a value in the range between 4 and 212. Then the corresponding
variable in the output could be a const uint_fast8_t.
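A small sketch of what I have in mind. The function names and the masking expression are made up so that the range 4 to 212 is easy to see. Whether LLVM keeps this range information around until the back-end runs is an assumption on my part.

```c
#include <stdint.h>

/* Input as it might have been written by hand. */
int scale_original(int n) {
    int v = (n & 0xD0) + 4;  /* assigned once, always between 4 and 212 */
    return v * 3;
}

/* Output I would like to see: same behaviour, tighter type, const. */
int scale_narrowed(int n) {
    const uint_fast8_t v = (uint_fast8_t)((n & 0xD0) + 4);
    return v * 3;
}
```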

Philipp

Sorry, it seems I misunderstood Joshua's mail the first time I read
it. While the question I asked is one I want to ask, the context may
give a false impression.

Shouldn't LLVM with the C backend work this way:

* The original input is read, and implementation-defined behaviour in
there is assumed to have meaning based on some extra information
supplied (e.g. signedness of char, size of an int, etc)
* LLVM does transformations
* The C backend generates C code, which is machine-independent (i.e.
will behave the same no matter which C compiler it is compiled with).

The last point is what I meant by "have the C backend emit
machine-independent code (i.e. C code that does not rely on
implementation-defined behaviour)". And I do not see how
implementation-defined behaviour included in LLVM-IR would prevent that.

Philipp

Without even getting too interesting:

- If your original C program uses uintptr_t (even within the bounds allowed by the standard), that will get turned into some concrete integer type in LLVM IR. But that type might not be large enough to fit a pointer on your target, e.g. if your implementation of C uses fat pointers.

- LLVM assumes that all pointers have the same width (inside of the same address space), but C does not require this.

- LLVM assumes that null pointers are represented with a zero bit pattern, but C does not require this.
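A quick sketch of probes for these three assumptions (the function names are made up). On mainstream hosted targets all three return true, which is exactly why the assumptions usually go unnoticed. The C standard only guarantees the last one, and only where uintptr_t exists at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if a null pointer is stored as all-zero bytes on this implementation. */
bool null_is_all_zero_bits(void) {
    void *p = NULL;
    unsigned char zeros[sizeof p];
    memset(zeros, 0, sizeof zeros);
    return memcmp(&p, zeros, sizeof p) == 0;
}

/* True if a few object pointer types happen to share one size. */
bool object_pointers_share_width(void) {
    return sizeof(char *) == sizeof(int *) &&
           sizeof(int *) == sizeof(double *) &&
           sizeof(double *) == sizeof(void *);
}

/* True if a pointer survives a round trip through uintptr_t. The IR has
 * already replaced uintptr_t with a fixed-width integer chosen for one target. */
bool uintptr_round_trips(void) {
    int x = 0;
    void *p = &x;
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits == p;
}
```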

Cameron

> Will this allow users to compile C++ (or some other language that LLVM
> has a frontend for) to C, which then can be compiled using a C compiler
> for a target architecture, for which only a C compiler exists?
> Which use-cases do you have in mind for this backend?

Possibly, yes. Compiling C++ to C would require support for things like exception handling, which requires more work to represent in C. I expect that LLVM has routines to translate exception handling into more C-compatible structures for use in the other backends. However, this approach would probably restrict exception handling to one specific mechanism when translated to C, which might not be what the user of a C++-to-C compilation flow would like.

In short, I'd need to think about how this should work and how much would need to be configurable for the user.

My current goal is to be able to use the C backend for my research. I work within the ASAM project [1] on datapath synthesis for application-specific processors. I have created some application analysis methods within the LLVM framework and I want to compare their predictions with real-life results on our target architecture. It is difficult for me to implement my analysis within our target compiler, as it is closed source, but I still want to be sure that the application code has been optimized in the same way. Therefore I would like to be able to translate the optimized IR back to C and compile it using the target compiler without further optimizations. That way I can also better control some optimizations that are difficult to control in the target compiler. (Writing a complete backend for my target architecture is 'a bit' too much work for me.)

  Roel

Philipp
_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

[1] https://www.asam-project.org/

Hi chenwj,

I am aware of this and willing to support, at least, the basic operation of the C backend. I will need to do so for myself anyway, and I see that there are many others who might benefit from it as well.

I am not sure, though, how much time I will have to support the larger, more complex features that are outside of my use case, as some of those might be difficult for me to replicate or outside of my current knowledge.

Anyway, I'd be interested to have the C backend back in LLVM and am willing to cooperate on that part for as much as I am able.

Regards,
  Roel

> Possibly, yes. Compiling C++ to C would require support for things like exception handling, which requires more work to represent in C. I expect that LLVM has routines to translate exception handling into more C-compatible structures for use in the other backends.

Not really. We have an IR representation of stack unwinding, which doesn't translate well to C. You could probably implement a very expensive SJLJ mechanism.

> However, this approach would probably restrict exception handling to one specific mechanism when translated to C, which might not be what the user of a C++-to-C compilation flow would like.

A combination of setjmp/longjmp might work, but would be costly on the non-exceptional path. Anyway, this was a big limitation of the old C backend.
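For illustration, a minimal sketch of such a setjmp/longjmp scheme. All names are hypothetical and this is not how the old back-end worked. Note that setjmp runs on every entry to the guarded region, which is the cost on the non-exceptional path.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Innermost active handler and the in-flight exception code (hypothetical). */
static jmp_buf *cbe_handler = NULL;
static int cbe_exception = 0;

/* Stand-in for 'throw': jump to the innermost handler, or abort if none. */
static void cbe_throw(int code) {
    if (cbe_handler == NULL)
        abort();
    cbe_exception = code;
    longjmp(*cbe_handler, 1);
}

/* A callee that "throws" on negative input. */
static int half_or_throw(int x) {
    if (x < 0)
        cbe_throw(42);
    return x / 2;
}

/* Stand-in for: try { return half_or_throw(x); } catch (int e) { return -e; } */
int guarded_half(int x) {
    jmp_buf here;
    jmp_buf *outer = cbe_handler;
    int result;

    cbe_handler = &here;          /* paid on every call, even if nothing throws */
    if (setjmp(here) == 0) {
        result = half_or_throw(x);
    } else {
        result = -cbe_exception;  /* the "catch" block */
    }
    cbe_handler = outer;          /* restore the enclosing handler */
    return result;
}
```

A real scheme would also have to run destructors for every frame that is skipped, which this sketch ignores entirely.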

Sebastian

>> Anyway, that brings me to my final question: Which features are
>> critical/important/wanted/unwanted for a C back-end?

> I'd like it to be easy to configure (e.g. to tell which size int is
> assumed to have).

> I'd prefer the resulting code to not rely on implementation-defined
> behaviour (e.g. not make any assumptions about the size of int).

Ok, Cameron showed me that this one isn't possible with LLVM.

> I'd like the resulting code to contain a lot of information (use of data types,
> keywords such as const and restrict) that can be used to generate
> optimized code.

Here's my use-case: I would use LLVM as a kind of language and
optimization frontend, and use the free sdcc compiler (which has
excellent machine-specific optimization, but is somewhat lacking in
machine-independent optimization) as a backend.

Philipp