New LLVM C front-end: "clang"

Hi Everyone,

I'm happy to say that we just got approval to open source the new C front-end for LLVM we've been working on.

The goal of this work is to provide a high quality front-end for LLVM that is built with the same principles as the rest of LLVM (it is built as a set of reusable libraries, integrates well with the rest of the LLVM architecture, uses the same license, etc). Among other things, this means that LLVM can now be used for a variety of source-level analysis and transformation tasks that it was not suitable for before.

For more information about the motivation behind this work, please see Steve's talk at the LLVM '07 developer meeting:
   http://llvm.org/devmtg/2007-05/09-Naroff-CFE.pdf
   http://llvm.org/devmtg/2007-05/09-Naroff-CFE.mov

Currently, there is not a lot of documentation for the project, but the code is well commented, and there is a README.txt file at the root of the tree with some notes, here: http://llvm.org/svn/llvm-project/cfe/trunk/README.txt

While this work aims to provide a fully functional C/C++/ObjC front-end, it is *still very early work*. In particular, there is no real ObjC or C++ support yet (these are obviously big projects), and C support is still missing some features. Some of the more notable missing pieces of C support are:

1. The parser currently skips initializers in braces (e.g. "int A = { 1,2,3,};") without trying to understand them.
2. The semantic analyzer does not produce all of the warnings and errors it should.
3. The LLVM code generator is still very early on. It does not yet support many important things, such as structs and unions. That said, it does handle scalar operations and vectors.
4. We don't consider the API to be stable yet, and reserve the right to change fundamental things :)
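For anyone who wants to help with items 1 and 3, here is a small, standard C test case that exercises brace-enclosed initializers (including a trailing comma) together with a struct and a union. It is only a sketch of the kind of input that should eventually work; the names A, P, U and sum_A are made up for illustration and don't come from the clang tree.

```c
/* Brace initializer with a trailing comma: legal C, and the element
   count (3) is deduced from the initializer list. */
int A[] = { 1, 2, 3, };

/* Aggregate initialization of a struct, member by member in order. */
struct point { int x, y; };
struct point P = { 4, 5 };

/* A brace initializer for a union initializes its first member (i). */
union num { int i; float f; };
union num U = { 7 };

/* Sum the elements of A so a test can check the initializer took effect. */
int sum_A(void) {
    int s = 0;
    for (unsigned i = 0; i < sizeof A / sizeof A[0]; i++)
        s += A[i];
    return s;
}
```

With a complete front-end and code generator, A should have three elements summing to 6, P should be {4, 5}, and U.i should be 7.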

We plan to continue chipping away at these issues until C works really well, but we'd love help from other interested contributors. I did create a new CFE component in bugzilla, but problems are probably better reported on cfe-dev for now.

If you are interested in contributing, or just want to follow along with the project, I recommend signing up for the cfe-dev and/or cfe-commits lists:
   http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
   http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Note that we primed the cfe-commits list with all the revision history of the project, so it doesn't normally get this much traffic ;-).

If you'd like to check out and build the project, the current scheme is (many thanks to Reid for his help converting our CVS repository to SVN):

1. Check out llvm and build it (the C front-end uses libsupport and libsystem). You don't need llvm-gcc :)
2. cd llvm/tools
3. svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
4. cd clang
5. make

We will eventually integrate this better as a sub-project, but for now it builds a single tool named 'clang'. If you're not on a Mac, you'll need to make one change: paths to system header files are currently hard-coded into the tool. To get this to work for you, you'll probably have to change clang/Driver/clang.cpp:606 to include the paths that 'touch empty.c; gcc -v empty.c -fsyntax-only' prints (they are listed on the lines after "#include <...> search starts here:").
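Editing clang.cpp is a manual step, but pulling the directory list out of the gcc -v output is easy to script. Below is a minimal sketch, assuming the usual gcc output format: each directory on its own line, indented by one space, with the list terminated by "End of search list.". extract_include_paths is a hypothetical helper written for this note; it is not part of clang.

```c
#include <string.h>

/* Hypothetical helper: given the (stderr) text printed by
   'gcc -v empty.c -fsyntax-only', copy each directory listed between
   "#include <...> search starts here:" and "End of search list."
   into out[]. Returns the number of paths stored (at most max).
   Lines of 256 characters or more are skipped. */
int extract_include_paths(const char *text, char out[][256], int max) {
    const char *start = strstr(text, "#include <...> search starts here:");
    if (!start)
        return 0;
    const char *p = strchr(start, '\n');   /* end of the marker line */
    int n = 0;
    while (p && n < max) {
        p++;                               /* step past the newline */
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        if (strncmp(p, "End of search list.", 19) == 0)
            break;
        /* gcc indents each directory; drop the leading spaces. */
        while (len > 0 && *p == ' ') { p++; len--; }
        if (len > 0 && len < 256) {
            memcpy(out[n], p, len);
            out[n][len] = '\0';
            n++;
        }
        p = eol;                           /* NULL ends the loop */
    }
    return n;
}
```

Each extracted string is one directory to add to the hard-coded list; the exact form of that list in clang.cpp isn't shown here.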

Once this is built, you can compile C code (amazing, I know!). The clang driver takes a lot of GCC compatible options, which you can see with 'clang --help'. As a simple example:

$ cat ~/t.c

typedef float V __attribute__((vector_size(16)));
V foo(V a, V b) { return a+b*a; }

Preprocessing:
$ clang ~/t.c -E
# 1 "/Users/sabre/t.c" 1

typedef float V __attribute__((vector_size(16)));

V foo(V a, V b) { return a+b*a; }

Type checking:
$ clang -fsyntax-only ~/t.c

GCC options:
$ clang -fsyntax-only ~/t.c -pedantic
/Users/sabre/t.c:2:17: warning: extension used
typedef float V __attribute__((vector_size(16)));
                 ^
1 diagnostic generated.
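The warning points at __attribute__, which is a GCC extension. For comparison, here is the same computation (a + b*a, element by element) written in standard C on plain float arrays. foo_portable is a name made up for this sketch; since it uses no extensions, it would not be expected to trigger this particular warning, although it gives up the explicit 16-byte vector type used in t.c.

```c
/* Standard C version of foo from t.c: computes a + b*a for each of the
   four elements, using plain arrays instead of a GCC vector type. */
void foo_portable(const float a[4], const float b[4], float out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i] * a[i];
}
```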

Pretty printing from the AST:
$ clang ~/t.c -parse-ast-print
typedef float V __attribute__(( vector_size(16) ));

V foo(V a, V b) {
   return a + b * a;
}

LLVM code generation:
$ clang ~/t.c -emit-llvm | llvm-as | opt -std-compile-opts | llvm-dis
define <4 x float> @foo(<4 x float> %a, <4 x float> %b) {
entry:
         %mul = mul <4 x float> %b, %a ; <<4 x float>> [#uses=1]
         %add = add <4 x float> %mul, %a ; <<4 x float>> [#uses=1]
         ret <4 x float> %add
}
$ clang ~/t.c -emit-llvm | llvm-as | opt -std-compile-opts | llc -march=ppc32 -mcpu=g5
..
_foo:
         vmaddfp v2, v3, v2, v2
         blr
$ clang ~/t.c -emit-llvm | llvm-as | opt -std-compile-opts | llc -march=x86 -mcpu=yonah
..
_foo:
         mulps %xmm0, %xmm1
         addps %xmm0, %xmm1
         movaps %xmm1, %xmm0
         ret

etc.
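The commands above only compile foo. To actually call it and look at the result, a small driver like the one below could be used. It is a sketch only: run_foo and the VLanes union are names invented here, and because the code generator doesn't handle structs and unions yet, this driver is meant to be built with a compiler that does (e.g. GCC), possibly linked against a clang-generated foo.

```c
typedef float V __attribute__((vector_size(16)));

/* Same function as in t.c. */
V foo(V a, V b) { return a+b*a; }

/* A union gives access to the four float lanes of a V without relying
   on vector subscripting. */
typedef union { V v; float f[4]; } VLanes;

/* Calls foo on fixed inputs and copies the four result lanes to out[]. */
void run_foo(float out[4]) {
    VLanes a = { .f = { 1.0f, 2.0f, 3.0f, 4.0f } };
    VLanes b = { .f = { 2.0f, 2.0f, 2.0f, 2.0f } };
    VLanes r;
    r.v = foo(a.v, b.v);
    for (int i = 0; i < 4; i++)
        out[i] = r.f[i];
}
```

With these inputs each lane is a + b*a, so the result should be 3, 6, 9, 12.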

In any case, we welcome questions, comments, and especially patches :). To avoid spamming llvmdev, please take all discussion to the cfe-dev mailing list, and patch submission/discussion to cfe-commits.

Thanks!

-Chris

This is very good news. One step closer to World Domination.

                                         -Dave