clang vs llvm for verifying code

Hi,

Next week I'm going to start to write a tool to verify the absence of buffer overflows in C code. I already wrote the backend (a model checker), but now I need a frontend to dump the states+transitions info.
My question is the following: should I dump this info directly from clang's AST or from LLVM bitcode? From what I understand, both have their advantages and disadvantages:

clang:
+ we're close to the source. it's easy to pinpoint the bug down to the line number
+ simple constant folding
- AFAIK, merging the separate files' AST into a single, whole-program AST is not there yet. I may be able to merge the info later in the chain, though.
- no (inter-procedural) alias analysis

llvm:
+ easy merging of the whole program into a single bitcode file
+ inter-procedural alias analysis. especially necessary for figuring out the callees of indirect function calls
+ possibility to run cheap optimizations to reduce the code size
+ the semantics of the C language are already lowered to simpler instructions
- not clear to me if it's possible to accurately pinpoint the location of an error to the source program (maybe take advantage of the debug info?)

CIL:
+ seems to be used by many in the verification community
+ easy merging of the whole program AST
+ many different alias analysis algorithms
- no C++ support
- written in ocaml... bah!

Does anyone has any other point to add? Any recommendation?

Thanks,
Nuno

Nuno Lopes wrote:

Hi,

Next week I'm going to start to write a tool to verify the absence of buffer
overflows in C code.

Clang contains the static checker, which detects this and similar
situations. It doesn't yet do interprocedural analysis, but I think it
would be easier and more useful to extend the checker than starting your
own project.

http://clang.llvm.org/StaticAnalysis.html

Is there any reason you can't use it as a starting point?

Sebastian

Next week I'm going to start to write a tool to verify the absence of buffer
overflows in C code.

Clang contains the static checker, which detects this and similar
situations. It doesn't yet do interprocedural analysis, but I think it
would be easier and more useful to extend the checker than starting your
own project.

http://clang.llvm.org/StaticAnalysis.html

Is there any reason you can't use it as a starting point?

In fact, I know clang's static analyzer quite well :slight_smile: I think I can claim that I've implemented the first static analyzer based on clang about 2 years ago (with some help and advise from Ted)..
Anyway, I'm now looking to another problem. The goal is not to find bugs per-se, but to give a proof that a particular piece of code doesn't have, for example, buffer overflows. For this, I'll apply a (abstraction refinement) model checker.
In summary, I don't see how I could reuse clang's static analyzer for this project.

Nuno

Hi Nuno,

I'll throw my 2 cents in since I have some experience in related areas
(I have working on source checking tools on top of both LLVM IR and
CIL).

My main recommendation would be to go with LLVM IR, unless you really
feel there are some semantics of the source language which are not
accurately captured in the IR. Especially for a "verification" tool,
it is very important that the core implementation be correct and this
is much easier to verify if the language you accept is LLVM IR. But,
of course, I'm biased... :slight_smile:

Some other comments interleaved:

Hi,

Next week I'm going to start to write a tool to verify the absence of buffer
overflows in C code. I already wrote the backend (a model checker), but now
I need a frontend to dump the states+transitions info.
My question is the following: should I dump this info directly from clang's
AST or from LLVM bitcode? From what I understand, both have their advantages
and disadvantages:

clang:
+ we're close to the source. it's easy to pinpoint the bug down to the line
number

If line number information is good enough for you, you can get this as
well from the debug info in LLVM IR. Klee does this and it works quite
well, and probably works better now that the optimization passes are
friendlier to debug info.

+ simple constant folding

With IR you can of course use the whole set of IR optimization passes.

- AFAIK, merging the separate files' AST into a single, whole-program AST
is not there yet. I may be able to merge the info later in the chain,
though.
- no (inter-procedural) alias analysis

A big '+' here is the ability to use information that is present in
the language but not the LLVM IR. For example, there are things in C
which are undefined behavior, but are well defined once they have been
compiled. If your analysis cares about this distinction, then you need
the C input.

llvm:
+ easy merging of the whole program into a single bitcode file
+ inter-procedural alias analysis. especially necessary for figuring out
the callees of indirect function calls
+ possibility to run cheap optimizations to reduce the code size
+ the semantics of the C language are already lowered to simpler
instructions
- not clear to me if it's possible to accurately pinpoint the location of
an error to the source program (maybe take advantage of the debug info?)

Another + you didn't mention is the capability of taking inputs which
are not written in C. Also, for debugging and testing you can test .ll
files directly, which may be simpler or easier to debug than their
corresponding C inputs.

CIL:
+ seems to be used by many in the verification community

Yes, although as far as I know it hasn't been under active development
for a while, although it does still get maintained. However, in
general it works very well for C code. One of the additional '+'s of
CIL is that it lowers many annoying constructs in C so that your
transformation/analysis gets to deal with a much more regular
language.

+ easy merging of the whole program AST
+ many different alias analysis algorithms
- no C++ support
- written in ocaml... bah!

This is a '+'. Ocaml is a language that is definitely worth learning
if you haven't used it, and doing source transformation with CIL is
quite elegant. That said, CIL does have a number of limitations and
these days I think it is easier to build a robust system on either
LLVM or Clang.

- Daniel

Hi,

Thanks Daniel for sharing your thoughts on this. I really appreciate it.
My initial intuition seem to agree with your points. Now I just need to convince my advisor :slight_smile: (your e-mail will certainly help)

Thanks,
Nuno

P.S.: about ocaml, well I've only written 2 lines of it last week (to fix a bug in a CIL plugin) and, well, it looks "different" at first sight, at least.