guidance on backend writing; canonical example?

> *Chris Lattner, Mon Nov 15 12:06:18 CST 2010, wrote:*
>
> If anyone was really interested in this, I’d strongly suggest a complete rewrite of the C backend: make use of the existing target independent code generator code (for legalization etc) and then just put out a weird “.s file” at the end. -Chris

I see that Chris made the above suggestion a while ago. Are there other suggestions for how to re-architect the C backend? I’m thinking of helping with Roel Jordan’s effort to revive the C backend, but I don’t know the best way to implement backends. Is there a prototype to follow?

Thanks,
Jason

Hi Jason,

I have been struggling with the same question myself. After thinking about it for quite a while I see several possible solutions, but I am having difficulty deciding which way to go.

First of all, we have a somewhat working version in place. However, as Chris already said, it could use a rewrite.

There are several problems with the current version that I've found so far.

The two main problems:
(A) Supporting target specific features, e.g. legal types, intrinsics, inline assembler, and many more...
(B) Supporting different compilers. Currently the CBE generates a huge list of defines and other support code even for an empty input bitcode. Some of these are related to possibly unsupported target features (e.g. floating point) and most should only be printed when they are actually needed.
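
To illustrate point (B), here is a minimal sketch of emitting support code only when the input actually needs it. It is written in Python for brevity and is not how the CBE is implemented; `SUPPORT_SNIPPETS`, `TYPE_FEATURES`, `CBE_NORETURN`, and `emit_prelude` are all hypothetical names, and the feature-to-snippet mapping is an assumption made for the example.

```python
# Hypothetical sketch: none of these names come from the real CBE.
# Each support snippet is keyed by the feature that requires it.
SUPPORT_SNIPPETS = {
    "float": "#include <math.h>",
    "bool": "#include <stdbool.h>",
    "fixed_int": "#include <stdint.h>",
    "noreturn": "#define CBE_NORETURN __attribute__((noreturn))",
}

# Assumed mapping from (simplified) IR type names to the feature they need.
TYPE_FEATURES = {
    "i1": "bool",
    "i8": "fixed_int",
    "i16": "fixed_int",
    "i32": "fixed_int",
    "i64": "fixed_int",
    "float": "float",
    "double": "float",
}


def required_features(used_types, used_attrs=()):
    """Collect the features needed by the types and attributes in a module."""
    feats = {TYPE_FEATURES[t] for t in used_types if t in TYPE_FEATURES}
    feats.update(a for a in used_attrs if a in SUPPORT_SNIPPETS)
    return feats


def emit_prelude(used_types, used_attrs=()):
    """Return only the support code the module needs, in a stable order."""
    feats = required_features(used_types, used_attrs)
    return "\n".join(s for f, s in SUPPORT_SNIPPETS.items() if f in feats)
```

With this scheme an empty module yields an empty prelude, and a module that never touches floating point never mentions `<math.h>`.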

The effect of both these problems is that a fairly large proportion of the CBE code exists just to handle all the target and compiler specific features, which makes it difficult for everyone (or at least for me) to work on the CBE.

In the past, some of the target specific features were implemented as parts of the targets themselves. The handling of inline assembly was one of them. This made it everyone's problem to help support the CBE when working on a target backend.

On the other hand, I have been thinking about a radically different approach. My idea was that C output is probably not best treated as a separate backend. It's an output format, just like assembler or binary code...

However, implementing it as such would imply an even further integration into all of the existing backends, basically creating a third output type for llc.

Currently an LLVM backend goes through several stages as described in [0]. In short, it selects the appropriate target instructions, schedules the code, allocates registers, and emits the resulting assembler in either binary or textual format.

The current CBE does only a small part of the first step [1]. It builds the initial operation graph and applies some optimizations (the first two steps from the list).

In order for the CBE to properly support target specific features, it should probably also do the next few steps (type legalization and possibly operation legalization). The problem here is that this requires a lot of knowledge about the target in the CBE.

I have been looking into this, and it seems that we either need to figure out a way to load an existing target description into the CBE, or we need to build the CBE in such a way that it only kicks in after these steps are done. The latter would make C an output format that branches off from the normal backend flow after the operation legalization step.
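
To make the type legalization step mentioned above a bit more concrete, here is a small sketch of what promoting an illegal integer type could look like when the output is C. It is written in Python for brevity and only illustrates the idea; it is not the SelectionDAG legalizer. `LEGAL_WIDTHS`, `promote_width`, and `emit_add` are hypothetical names, and the set of legal widths is an assumption about the target.

```python
# Assumed set of integer widths the hypothetical target supports natively.
LEGAL_WIDTHS = (8, 16, 32, 64)


def promote_width(bits):
    """Return the smallest legal width that can hold a `bits`-bit integer."""
    for width in LEGAL_WIDTHS:
        if bits <= width:
            return width
    # Wider types would have to be expanded into several legal values,
    # which this sketch deliberately leaves out.
    raise ValueError(f"i{bits} needs expansion, not promotion")


def emit_add(dest, lhs, rhs, bits):
    """Emit one C statement for an unsigned add of two i<bits> values."""
    legal = promote_width(bits)
    expr = f"({lhs} + {rhs})"
    if legal != bits:
        # The add runs in the wider legal type, so mask the result back
        # down to the original width to keep wrap-around semantics.
        mask = (1 << bits) - 1
        expr = f"({expr} & 0x{mask:X}u)"
    return f"uint{legal}_t {dest} = {expr};"
```

For example, `emit_add("t0", "a", "b", 17)` produces `uint32_t t0 = ((a + b) & 0x1FFFFu);`, while a 32-bit add is emitted without a mask. The point of the sketch is that `LEGAL_WIDTHS` is exactly the kind of target knowledge the CBE would need to get from somewhere.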

Anyway, I have submitted a talk proposal to the upcoming LLVM conference in Paris to talk about exactly these choices and their effects. Hopefully it will be accepted, so that we can get some feedback and discussion about the possible designs.

I hope that I haven't confused things too much ;-)

Cheers,
  Roel

[0] http://llvm.org/docs/CodeGenerator.html
[1] http://llvm.org/docs/CodeGenerator.html#selectiondag-instruction-selection-process

I think the biggest challenge with a C backend is making it useful to its audience. Most other backends do not have to produce human-readable output, so the layout does not need to be pretty; they only need to create efficient code for a target CPU, which always has a restricted register space.

The C backend would be a very different animal: it has unlimited register space, and the LLVM IR optimizations it needs are ones that make the resulting C source code more readable, in effect the opposite of most of the current LLVM IR optimizations.

There is also the conversion from a CFG (as used in LLVM IR) to an AST (as used in clang) in order to output C. Conversion from CFG to AST is difficult, and I think it would be useful to have a whole new infrastructure that lets the user easily refactor the AST so that the final generated C is more readable. The reason for using an AST is to reduce the number of "goto"s and provide structure instead.

Are there any other existing backends that have an unlimited register space or do AST?

> Are there any other existing backends that have an unlimited register space or do AST?

The PTX backend is the only other in-tree backend that has unlimited register space, if I remember correctly.

- Roel