This is the midterm report of this year's GSoC project "Finding and
reporting bugs caused by copy and paste".
The goal of the project is to scan C++ source code for identical or
similar pieces of code to reduce code redundancy and detect bugs
caused by copy-pasting.
The project approaches this problem by hashing all Stmts in the AST
and then searching the hash codes for identical values.
For performance reasons, all hashes are calculated with new AST
hashing code that needs only linear time to hash all Stmts in an AST.
The hashing also uses a locality-sensitive hash function that maps
similarly structured Stmts into the same hash buckets, which lets us
search for similar Stmts as well.
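
To illustrate the idea, here is a minimal sketch of postorder hashing on a
toy tree. It is not the code from the patch: Node, Buckets, mix, leaf,
binary, hashPostorder and countCloneGroups are hypothetical names, Node is
only a stand-in for a clang Stmt, and the FNV-style mixing function was
picked for brevity. Ignoring everything except the node kind is a crude
substitute for the locality-sensitive hash, assumed here so that
`a + b` and `c + d` land in the same bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a clang Stmt: a node kind plus child nodes.
struct Node {
  std::string kind;
  std::vector<std::unique_ptr<Node>> children;
};

// Maps a hash value to all nodes whose subtree produced that value.
using Buckets = std::unordered_map<std::uint64_t, std::vector<const Node *>>;

// Mixes one value into a running hash (FNV-1a style, chosen for brevity).
inline std::uint64_t mix(std::uint64_t h, std::uint64_t v) {
  return (h ^ v) * 1099511628211ULL;
}

// Postorder traversal: children are hashed before their parent, so every
// node is visited exactly once and reuses the hashes of its children.
std::uint64_t hashPostorder(const Node &n, Buckets &buckets) {
  std::uint64_t h =
      mix(14695981039346656037ULL, std::hash<std::string>{}(n.kind));
  for (const auto &child : n.children) {
    assert(child && "child nodes must not be null");
    h = mix(h, hashPostorder(*child, buckets));
  }
  buckets[h].push_back(&n);
  return h;
}

// Counts the buckets that hold more than one node (clone candidates).
std::size_t countCloneGroups(const Buckets &buckets) {
  std::size_t groups = 0;
  for (const auto &entry : buckets)
    if (entry.second.size() > 1)
      ++groups;
  return groups;
}

// Small helpers for building the toy tree.
std::unique_ptr<Node> leaf(std::string kind) {
  std::unique_ptr<Node> n(new Node());
  n->kind = std::move(kind);
  return n;
}

std::unique_ptr<Node> binary(std::string kind, std::unique_ptr<Node> lhs,
                             std::unique_ptr<Node> rhs) {
  std::unique_ptr<Node> n = leaf(std::move(kind));
  n->children.push_back(std::move(lhs));
  n->children.push_back(std::move(rhs));
  return n;
}
```

Calling hashPostorder on the root fills the buckets in a single pass.
Any bucket with more than one entry is a clone candidate that a real
checker would still have to confirm, since different subtrees can
collide on the same hash value.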
So far, the project has produced a finished patch that adds
postorder-traversal support for the RecursiveASTVisitor  and a
work-in-progress patch for adding a checker implementing the above
functionality  (see  for the working branch).
The checker is currently able to find similar code pieces, detect
potential errors in them, and provide suggestions for fixing them (see
 for an example use case).
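
As a hypothetical illustration of the kind of bug this targets (not taken
from the checker's tests or output), consider two nearly identical
statements where one occurrence of a member name was not updated after
pasting. Point, buggyMax and fixedMax are made-up names for this sketch.

```cpp
#include <algorithm>

// Hypothetical type used only for this illustration.
struct Point {
  int x;
  int y;
};

// The second statement was pasted from the first, but only one of the
// two occurrences of 'x' was renamed to 'y'.
Point buggyMax(const Point &a, const Point &b) {
  Point result;
  result.x = std::max(a.x, b.x);
  result.y = std::max(a.y, b.x); // copy-paste bug: should be b.y
  return result;
}

// The corrected clone: both statements follow the same pattern.
Point fixedMax(const Point &a, const Point &b) {
  Point result;
  result.x = std::max(a.x, b.x);
  result.y = std::max(a.y, b.y);
  return result;
}
```

The two assignments in buggyMax have the same structure, so they count as
similar code. The one place where the pattern is broken is the suspicious
spot.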
The next steps are testing the checker on real-world projects and
preparing it for merging into upstream. Merging the checker into
upstream is especially important for the project as it would pave the
way for testing the code in production environments on real code.
After the checker is finished, we will focus on researching cross-TU
support for the clang SA checker framework and ensuring that the
hashing code stays in sync with the clang AST API. Also scheduled are
improving the checker with new ways for finding code clones and
investigating how other parts of clang (for example Stmt::Profile or
the IdenticalExprChecker) can benefit from this project.
All in all, the project is currently following the proposed work
items, with the exception that we work through the work items in an
order that allows an incremental development process that improves
existing infrastructure instead of starting from scratch as originally
planned.
That's everything for now. Feel free to mail me if you have questions.