Can you give me advice on the proper strategy for a Clang-using tool I'm writing?

Hi!

I started working with Clang a couple of weeks ago with two aims: to develop a
small project that would be useful to me (and perhaps to others as well), and to get
involved with the Clang project and community. After some
experimentation I’ve reached the point where it would be helpful to ask for
advice.

(By way of background, I have commercial experience in compiler development,
which is somewhat dated but still relevant, but I have no experience using or
modifying either LLVM or Clang.)

I’d like to briefly describe two versions of my project and then ask for your
advice on the best way to proceed.

I want to “break encapsulation” on C++ classes to better enable unit testing of
legacy code (or just plain poorly written code) that can’t be refactored, for
which developing better tests is a higher priority (given the constraints I’m
subject to) than changing the design.

I have two approaches in mind:

The first (which I’ve tested by hand) is to build a C++ source-to-source
processor that, given a source file and some class names, parses the
translation unit and emits two files. The first is a C++ header containing
the named classes (renamed), with all methods and fields in the same order as
the original but with all access public (plus the appropriate #includes). The
second is an assembly language file of thunks that implement the methods of the
new proxy class by jumping to the methods of the original class (this file needs
the mangled names of both the original and the proxy class methods). To write a
unit test, you include the proxy header, cast your object-under-test to the proxy
type, and link with the assembly thunks; you can then transparently access all
public, protected, and private methods and fields of the instance in hand. (As
I’ve said, this works when I’ve done it by hand.)
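The layout requirement in that description can be made concrete. Below is a minimal sketch, with hypothetical class names (Widget, Widget_Proxy), of what a generated proxy header might contain for a small class, plus compile-time checks a generator could emit to catch some kinds of layout drift. The assembly thunks are platform-specific and omitted, so the proxy’s method is declared but deliberately not defined. Note that accessing a Widget through a Widget_Proxy pointer is not sanctioned by the C++ standard; the technique relies on the compiler laying out both classes identically.

```cpp
#include <cassert>

// Original legacy class (unchanged; the interesting state is private).
class Widget {
public:
    int area() const { return width_ * height_; }
private:
    int width_ = 2;
    int height_ = 3;
};

// Hypothetical generated proxy: renamed, same members in the same order,
// everything public. Per the description above, area() would be defined by an
// assembly thunk jumping to Widget::area; that part is not shown, so it must
// not be called in this sketch.
struct Widget_Proxy {
    int area() const;
    int width_;
    int height_;
};

// Cheap guards a generator could emit: if these fail, the cast-based access
// is certainly wrong (passing them does not prove it is right).
static_assert(sizeof(Widget_Proxy) == sizeof(Widget), "proxy size mismatch");
static_assert(alignof(Widget_Proxy) == alignof(Widget), "proxy alignment mismatch");
```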

The second approach is to use Clang/LLVM as a C++ compiler (not just as the front
end of a tool) and invent a new statement, with its own special keyword, that
acts like a reverse friend declaration. Placed in a method, it names a class or
method whose encapsulation you want to break, and it acts as if that class or
method had a friend declaration pointing back to the method containing the
statement. Because the keyword can easily be grepped for, you can make sure the
statement is used only in unit tests and not in production code. (In fact, like
any other language extension, it could be enabled only when a compiler switch is
present, so you could easily ensure that only unit test projects have that
switch.)
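For comparison, here is what the proposed statement would stand in for in today’s C++: a friend declaration added by hand to the class under test. This is a minimal sketch with hypothetical names (Account, test_balance_after_deposit); the keyword shown in the comment is invented for illustration and is neither standard C++ nor an existing Clang extension.

```cpp
#include <cassert>

// Legacy class under test. The friend declaration is the edit to the class
// that the proposed "reverse friend" statement would make unnecessary.
class Account {
public:
    void deposit(int amount) { balance_ += amount; }
private:
    int balance_ = 0;
    friend int test_balance_after_deposit(int amount);  // points back at the test
};

// Unit test function. Under the proposal it would instead contain something like
//     __break_encapsulation Account;   // hypothetical keyword, not real C++
// and the class definition above would stay untouched.
int test_balance_after_deposit(int amount) {
    Account a;
    a.deposit(amount);
    return a.balance_;  // legal only because Account befriends this function
}
```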

Here are my questions about these approaches:

  1. What packaged Clang functionality do I need?

a. Can approach 1 be done strictly with libclang (using the AST and the lexer
to guide modification of the source and to identify methods that need
assembly thunks)? Or do I need to step up to LibTooling + LibASTMatchers
(or LibAST)? Or does it need a plugin? What is your recommendation?

b. Can approach 2 be done with LibTooling + LibASTMatchers plus minimal changes
to Clang so that it accepts the new grammar and produces new AST nodes to match?
My idea is, after parsing and traversing my new “break encapsulation”
declarations, to go straight to the definitions of the targeted classes,
modify the AST in place to add an actual friend declaration pointing
back, and then finish compiling the modified AST.

  2. How much of this work can be done on the Windows platform with Clang? And
    can it be done with Visual Studio, or do I need to use an alternate native
    compiler for Windows?

I’ve become aware of some restrictions on developing on the Windows platform.
Leave aside restrictions on what you can do with Clang/LLVM as a compiler
on Windows (e.g., at this time no exceptions, or anything else that requires
compiler-rt), which would only affect my second approach. Even so, I’ve found
things that make Windows/Visual Studio a less than perfect development
environment for Clang.

For example, I can’t confirm that my Clang/LLVM build is correct (using the
VS 12 Win64 platform): even though it compiles without error and the
unit tests all pass, I can’t successfully run the “command line tests”
(http://clang.llvm.org/hacking.html#testingCommands), as I reported here
earlier (http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/042170.html). I
got some help from the community there, but the thread petered out and I’ve
been unable to continue with them. (For reference, I’ve attached to
this email my last log of running the tests, with 122 unexpected failures,
which are all some kind of lock error I don’t understand.)

Anyway, if I continue with the Windows platform can I be successful (or
should I switch to Linux)?

This email has been quite long, and I apologize, but I’d really appreciate your
help. I’d like to start my Clang/LLVM development with some chance of success
without getting greatly frustrated by not knowing some basic things that everyone
who is working in the code “just knows” from experience. So thanks in advance!
And I hope to contribute back to the Clang/LLVM community in the future …

– David Bakin

command-line-tests.zip (25.9 KB)

Hi!

I started working with Clang a couple of weeks ago with two aims: to develop a
small project that would be useful to me (and perhaps to others as well), and to get
involved with the Clang project and community. After some
experimentation I’ve reached the point where it would be helpful to ask for
advice.

(By way of background, I have commercial experience in compiler development,
which is somewhat dated but still relevant, but I have no experience using or
modifying either LLVM or Clang.)

I’d like to briefly describe two versions of my project and then ask for your
advice on the best way to proceed.

I want to “break encapsulation” on C++ classes to better enable unit testing of
legacy code (or just plain poorly written code) that can’t be refactored, for
which developing better tests is a higher priority (given the constraints I’m
subject to) than changing the design.

I have two approaches in mind:

The first (which I’ve tested by hand) is to build a C++ source-to-source
processor that, given a source file and some class names, parses the
translation unit and emits two files. The first is a C++ header containing
the named classes (renamed), with all methods and fields in the same order as
the original but with all access public (plus the appropriate #includes). The
second is an assembly language file of thunks that implement the methods of the
new proxy class by jumping to the methods of the original class (this file needs
the mangled names of both the original and the proxy class methods). To write a
unit test, you include the proxy header, cast your object-under-test to the proxy
type, and link with the assembly thunks; you can then transparently access all
public, protected, and private methods and fields of the instance in hand. (As
I’ve said, this works when I’ve done it by hand.)

This sounds very close to putting #define private public before including the headers. Is there any reason that doesn’t work for you? (For protected members, you can probably derive from the classes and provide accessors that way.)
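A minimal sketch of the derive-and-expose idea for protected members, using hypothetical class names (Parser, TestableParser): a test-only subclass re-declares the inherited protected members with using-declarations in its public section. The #define private public trick itself is not shown; the C++ standard does not allow a translation unit that includes a standard library header to #define a keyword, and with MSVC the member access level is also encoded in the decorated (mangled) name, so the trick can cause link errors there.

```cpp
#include <cassert>

// Legacy class with protected state that a test wants to inspect.
class Parser {
public:
    void feed(char c) { if (c == '\n') ++lines_; }
protected:
    int lines_ = 0;
    bool saw_newline() const { return lines_ > 0; }
};

// Test-only subclass: using-declarations in a public section make the
// inherited protected members public through TestableParser.
class TestableParser : public Parser {
public:
    using Parser::lines_;
    using Parser::saw_newline;
};
```

In a test you construct a TestableParser instead of a Parser and read p.lines_ directly; the legacy class itself is untouched. This only works for protected members; private members stay inaccessible to the subclass.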

The second approach is to use Clang/LLVM as a C++ compiler (not just as the front
end of a tool) and invent a new statement, with its own special keyword, that
acts like a reverse friend declaration. Placed in a method, it names a class or
method whose encapsulation you want to break, and it acts as if that class or
method had a friend declaration pointing back to the method containing the
statement. Because the keyword can easily be grepped for, you can make sure the
statement is used only in unit tests and not in production code. (In fact, like
any other language extension, it could be enabled only when a compiler switch is
present, so you could easily ensure that only unit test projects have that
switch.)

Here are my questions about these approaches:

  1. What packaged Clang functionality do I need?

a. Can approach 1 be done strictly with libclang (using the AST and the lexer
to guide modification of the source and to identify methods that need
assembly thunks)? Or do I need to step up to LibTooling + LibASTMatchers
(or LibAST)? Or does it need a plugin? What is your recommendation?

I’d probably go with LibTooling. You usually want libclang if you’re shipping a tool to customers and therefore need a stable interface to build against. LibTooling gives you more power at the cost of keeping up with upstream Clang changes (the core interfaces don’t change that wildly, though; it mostly happens when we allow Clang to use more parts of a new C++ standard). Plugins are more for when you want to do some extra checking as part of every build.

b. Can approach 2 be done with LibTooling + LibASTMatchers plus minimal changes
to Clang so that it accepts the new grammar and produces new AST nodes to match?
My idea is, after parsing and traversing my new “break encapsulation”
declarations, to go straight to the definitions of the targeted classes,
modify the AST in place to add an actual friend declaration pointing
back, and then finish compiling the modified AST.

I think both approaches are overengineering the problem.
There are often much simpler approaches; I recommend Feathers’ “Working Effectively with Legacy Code”, which has many great ideas on how to minimally change legacy systems to get access points for unit tests.
http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052
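As one illustration, the book’s “Subclass and Override Method” technique makes a troublesome member function virtual (a small, low-risk change to the legacy class) and overrides it in a test-only subclass. A minimal sketch with hypothetical class names (ReportGenerator, FakeReportGenerator):

```cpp
#include <cassert>
#include <string>

// Legacy class: the hard-to-test dependency is buried in a member function.
class ReportGenerator {
public:
    virtual ~ReportGenerator() = default;
    std::string summary() { return "rows=" + std::to_string(row_count()); }
protected:
    // The seam: made virtual so a test can replace it. The body stands in
    // for something expensive, such as a real database query.
    virtual int row_count() { return -1; }
};

// Test-only subclass that replaces the dependency with a fixed value.
class FakeReportGenerator : public ReportGenerator {
protected:
    int row_count() override { return 42; }
};
```

A test then exercises the real summary() logic through FakeReportGenerator without touching the database, and no access rules are bypassed.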