"Information retrieval"-y idea suggestions for course project.

I'm currently taking an "information retrieval" course (e.g. search
engines, text categorization, information extraction, information
visualization, etc.) and I have to do a fairly large, relatively
open-ended project. I was thinking of doing something related to LLVM
or Clang, and would like to get some ideas from the community.

I've pinged a couple of community members already, and here is a
sampler of ideas that have come up:

* a "blame" that digs up as much info as it can: commit messages,
relevant code review threads, etc.
* a way to track trends in C++ usage (naked new/delete, pimpl, RAII, etc.)
* using MC to dig around in machine code looking for performance bugs
* astmatchers ??? PROFIT

I'd like to hear any interesting ideas in this general area.

Here is a larger idea that I think I could get a first prototype of:

"Clang Auto-Patcher": Look at textual diffs and use Clang to get a
more semantic diff, and then suggest other places in the codebase
where the same change could happen (and directly produce patches, if
requested). Think of something like migrating to a new API. I think that Bill's
recent work with attributes would make a good testbed to get a
prototype working.
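
To make the idea a bit more concrete, here is a very rough sketch of
the "learn a change from one diff, then suggest it elsewhere" loop. It
is purely textual (tokens and regexes, no Clang at all), it only
handles the case where a single identifier was renamed, and the names
infer_rename and suggest_patches are hypothetical helpers made up for
this illustration. A real prototype would presumably replace the
tokenizer with something AST-based.

```python
import difflib
import re

# Crude tokenizer: identifiers/numbers as one token, any other
# non-space character as its own token. This is an assumption made
# for the sketch, not how Clang lexes C++.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(line):
    return TOKEN.findall(line)


def infer_rename(before, after):
    """Return (old, new) if the two lines differ by exactly one
    identifier, otherwise None."""
    a, b = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if len(changes) != 1:
        return None
    tag, i1, i2, j1, j2 = changes[0]
    if tag != "replace" or i2 - i1 != 1 or j2 - j1 != 1:
        return None
    old, new = a[i1], b[j1]
    if not (old.isidentifier() and new.isidentifier()):
        return None
    return old, new


def suggest_patches(lines, old, new):
    """Yield (line_number, original, patched) for every line that
    mentions `old` as a whole identifier."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    for number, line in enumerate(lines, start=1):
        patched = pattern.sub(lambda match: new, line)
        if patched != line:
            yield number, line, patched
```

Usage would look something like calling infer_rename on the "-" and
"+" lines of a diff hunk, then feeding the result to suggest_patches
over the rest of the codebase. The interesting part of the project is
everything this sketch ignores: scoping, overloads, argument changes,
and ranking the suggestions.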

-- Sean Silva

You may want to look at http://coccinelle.lip6.fr/

Additionally, there's spdiff (http://www.diku.dk/~jespera/), which I am the
author of. Please feel free to contact me if there is anything at all that
I can help with.