Using Clang as a metadata generator.

Hey,

I need to expose the binary layout of several C++ classes to a
third-party system. I would like to use Clang to parse the header files
of these classes and output a metadata table describing the count and
types of all member variables of a given class.
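Here is a minimal sketch of the kind of table meant here, for a single class. `Particle`, `FieldInfo` and the table layout are hypothetical. Clang produces nothing like this by itself, and a generator would fill in the rows from the class's fields in the AST rather than writing them by hand.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical class from one of the parsed headers.
struct Particle {
    float position[3];
    float mass;
    int   id;
};

// One row of a hypothetical metadata table: member name and its spelled type.
struct FieldInfo {
    std::string_view name;
    std::string_view type;
};

// What a generator might emit for Particle. Written by hand here; a real
// tool would produce these rows by walking the class's fields in the AST.
constexpr FieldInfo kParticleFields[] = {
    {"position", "float[3]"},
    {"mass",     "float"},
    {"id",       "int"},
};

// Member count, derived from the table itself so the two cannot disagree.
constexpr std::size_t kParticleFieldCount =
    sizeof(kParticleFields) / sizeof(kParticleFields[0]);
```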

I have never looked at Clang before, and this is very much a prototype
project, but it could help a lot of people who need to write, read, and
manipulate their C++ classes. One example would be automatically
producing boost::serialization bindings for your classes, or a C++ to
XML (or, god forbid, OODB) mapping, without needing to change your
original classes at all.
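For the serialization case, a generator could emit a non-intrusive `serialize` function like the one below and leave the class itself untouched. Boost.Serialization's non-intrusive form expects this function in namespace `boost::serialization`. This sketch instead uses a made-up `generated` namespace and a toy `RecordingArchive`, both hypothetical, so that it compiles without Boost.

```cpp
#include <vector>

// Hypothetical user class; nothing in it changes.
struct Particle {
    float mass;
    int   id;
};

// Toy archive standing in for a Boost archive. Like Boost archives it
// overloads operator&, but it simply records every value it is handed.
struct RecordingArchive {
    std::vector<double> values;

    template <class T>
    RecordingArchive& operator&(T& value) {
        values.push_back(static_cast<double>(value));
        return *this;
    }
};

namespace generated {

// Generated code: one line per data member, in declaration order.
// With real Boost this would live in namespace boost::serialization.
template <class Archive>
void serialize(Archive& ar, Particle& p, const unsigned int /*version*/) {
    ar & p.mass;
    ar & p.id;
}

}  // namespace generated
```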

Has anyone tried to do this before using Clang?

Is it an appropriate usage? It seems like it would be a great test or
example usage of Clang as a pure front end.

If I should start, where should I start? My initial plan is to get
Clang, figure out which libs I need to produce the AST in memory (not
output to a file) and figure out how to manipulate this AST to find
all the target types and output their binary layout.

Chris

> Is it an appropriate usage? It seems like it would be a great test or
> example usage of Clang as a pure front end.

Yes, absolutely. Clang was designed for this kind of thing.

> If I should start, where should I start? My initial plan is to get
> Clang, figure out which libs I need to produce the AST in memory (not
> output to a file) and figure out how to manipulate this AST to find
> all the target types and output their binary layout.

That's about it! One easy way to play with this is to check out the BoostConAction in the source tree, by looking through everything referred to via "boostcon". That will quickly show you how to parse an AST into memory and then walk it. The magic incantation to try out the BoostCon action is:

  clang -cc1 -boostcon source-file.cpp

The BoostCon action will eventually go away, once someone has time to write a decent tutorial on creating a new action.

  - Doug

OK, cool. I have it producing an AST. I don't need it to compile or
even produce a perfect AST, but I do need it to attempt to produce as
much of the AST as it can. It appears to do that; can I always count
on that behavior?

This is on a large, established code base (PhysX) so I can't mess with
the code base much nor do I want to.

So, just some notes: unless I use -cc1, I get a bizarre error message
stating that clang itself can't execute "clang". It appears the
driver wraps the clang invocation with many more command-line
arguments and relaunches. This doesn't appear to work on Windows, at
least not right now.

__int64 isn't recognized as a built-in type, so the Windows C headers
won't compile. Judging by the compiler output, it looks like you would
want a truly portable standard library implementation or something like that.

For this stage of the project, I just need to know variable names.

For the next stage, when I need to mangle binary data, I will need to
know the exact binary offsets of member variables. How do you
recommend I go about getting this information? It seems much later in
the pipeline but something must annotate the AST graph with such
information somewhere...

Chris

> OK, cool. I have it producing an AST. I don't need it to compile or
> even produce a perfect AST, but I do need it to attempt to produce as
> much of the AST as it can. It appears to do that; can I always count
> on that behavior?

Yes.

> So, just some notes: unless I use -cc1, I get a bizarre error message
> stating that clang itself can't execute "clang". It appears the
> driver wraps the clang invocation with many more command-line
> arguments and relaunches. This doesn't appear to work on Windows, at
> least not right now.

I've not seen this issue before. Please file a bug with more information when you get the chance.

> __int64 isn't recognized as a built-in type, so the Windows C headers
> won't compile. Judging by the compiler output, it looks like you would
> want a truly portable standard library implementation or something like that.

Pass -fms-extensions to turn on Microsoft compatibility mode, which includes a definition of __int64.

> For the next stage, when I need to mangle binary data, I will need to
> know the exact binary offsets of member variables. How do you
> recommend I go about getting this information? It seems much later in
> the pipeline but something must annotate the AST graph with such
> information somewhere...

ASTContext::getASTRecordLayout will give you this information for any complete RecordDecl.
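Here is a minimal sketch of how such offsets might be used downstream. Inside the generator, each offset would come from something like `Ctx.getASTRecordLayout(RD).getFieldOffset(FD->getFieldIndex())`, where `Ctx`, `RD` and `FD` stand for the ASTContext, the RecordDecl and one of its FieldDecls. `getFieldOffset` reports bits, so the value has to be divided by the target's char width (`Ctx.getCharWidth()`) before it is written out as a byte offset. The emitted header can then check itself with `offsetof` in whatever compiler builds the consuming code. `Contact` and the hard-coded offsets are hypothetical, and they assume 4-byte `float`/`int` with no padding.

```cpp
#include <cstddef>

// Hypothetical class whose layout the tool describes.
struct Contact {
    float normal[3];
    float depth;
    int   bodyA;
};

// ASTRecordLayout::getFieldOffset() returns bits; convert using the
// target's char width (assumed to be 8 here).
constexpr std::size_t kCharWidth = 8;
constexpr std::size_t bitsToBytes(std::size_t bits) { return bits / kCharWidth; }

// Offsets as the generator might have emitted them, in bits
// (hypothetical values assuming 4-byte float/int and no padding).
constexpr std::size_t kNormalOffsetBits = 0;
constexpr std::size_t kDepthOffsetBits  = 96;
constexpr std::size_t kBodyAOffsetBits  = 128;

// Self-check: fails to compile if the layout Clang saw differs from the
// layout the consuming compiler actually uses.
static_assert(offsetof(Contact, normal) == bitsToBytes(kNormalOffsetBits), "normal moved");
static_assert(offsetof(Contact, depth)  == bitsToBytes(kDepthOffsetBits),  "depth moved");
static_assert(offsetof(Contact, bodyA)  == bitsToBytes(kBodyAOffsetBits),  "bodyA moved");
```

Suppose the header is later compiled with different packing or defines than the ones Clang saw. The build then fails on the static_asserts, rather than the binary data being silently misread.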

  - Doug

Thanks a lot Doug!

I have enough compiling to take the next step. I just wanted to
finish up with where I am in case there is another quick set of fixes
that will help.

The exact error text is:

clang: error: unable to execute command: Couldn't execute program 'clang'
clang: error: clang frontend command failed due to signal 1 (use -v to see invocation)

I will certainly file a bug; I'm going through the Bugzilla login process now.

I needed -fms-extensions and -nobuiltininc, because clang has its own
float.h header, which conflicted with the platform float.h.

Doing that, I am down to just three errors:

if(!ignore) __debugbreak();

The __debugbreak intrinsic doesn't appear to exist, even with
-fms-extensions. It should be equivalent to __asm { int 3 }.

Next:

In file included from C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/new:6:
In file included from C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/exception:40:
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/eh.h(61) : error: unknown type name 'type_info'
_CRTIMP int __cdecl _is_exception_typeof(_In_ const type_info &_Type, _In_ struct _EXCEPTION_POINTERS * _ExceptionPtr);

Don't know what to make of that yet.

In file included from C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/new:6:
In file included from C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/exception:41:
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/malloc.h(252) : error: expected expression
            _ASSERTE(("Corrupted pointer passed to _freea", 0));

Continuing with the macro expansion:

C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/crtdbg.h:485:25: note: instantiated from:
#define _ASSERTE(expr) _ASSERT_EXPR((expr), _CRT_WIDE(#expr))
                        ^
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/crtdbg.h:478:18: note: instantiated from:
                (_CrtDbgBreak(), 0))
                 ^
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include/crtdbg.h:593:24: note: instantiated from:
#define _CrtDbgBreak() __asm { int 3 }
                       ^

The caret is under the first underscore of the asm block.

I have got enough compiling that I am confident about taking the
next steps, btw, so thanks for your help, Doug!

Chris

Clang does not currently implement enough of Microsoft's extensions to
compile most code that uses the C++ standard library included with
Visual Studio.

- Michael Spencer

Yep, that is cool; I have been completely successful with what I
wanted to do and am really excited to start working with Clang much
more. C++ has needed a good, free front end for a very long time
(really, ever since it was conceived) so that you could auto-generate
metadata about your system. Clang is an excellently designed system
that really feels solid to work with.

To close this thread with some answers for anyone else going the
same route:

To compile as much MS code as possible, you need
-fms-extensions -D_DEBUG -DWIN32 -nobuiltininc (for a debug build).

It was actually easier to copy cc1_main.cpp into my own project and
build my own executable, reusing most of the code but hardcoding my own
action into it. This is because:

LLVM doesn't build correctly into DLLs (shared objects) on Windows.
Thus Clang doesn't build correctly into DLLs on Windows.
Thus writing a shared-object plugin for Clang isn't possible on Windows.

Thus the easiest thing to do was to write an adjunct clang! It is
easier for my users, anyway, as they don't need to know more command
line options than absolutely necessary.
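The "hardcode it" approach roughly amounts to this: the tool prepends the flags listed above to whatever the user passes, then hands the combined list to its copy of the `cc1_main` logic. The helper below is a hypothetical sketch of only that argument-building step. It leaves out how the vector is turned into the argv-style array that `cc1_main` takes.

```cpp
#include <string>
#include <vector>

// Flags this thread found necessary for the Visual Studio 2008 headers
// (debug configuration). Baking them into the tool spares users from
// having to know about them.
const std::vector<std::string>& defaultMsFlags() {
    static const std::vector<std::string> flags = {
        "-fms-extensions", "-D_DEBUG", "-DWIN32", "-nobuiltininc"};
    return flags;
}

// Hypothetical helper: the argument list the adjunct tool would hand to
// its copied cc1_main logic, with the hardcoded defaults first and the
// user's own arguments (input file, extra include paths) after them.
std::vector<std::string> buildFrontendArgs(const std::vector<std::string>& userArgs) {
    std::vector<std::string> args = defaultMsFlags();
    args.insert(args.end(), userArgs.begin(), userArgs.end());
    return args;
}
```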

I noticed a couple of things about Clang that I wanted to mention.

1. The objects in the AST have too many responsibilities. I
understand why, but I don't think it is very clean. Printing and
dumping, in my opinion, should be handled completely outside the
AST hierarchy, for several reasons. The first is to avoid mixed
responsibilities and code bloat: most of the AST objects do more than
one thing (although probably only one thing well). The second is to
ensure that someone coming along *could* write these routines as
plugins; it would orient those objects more towards being libraries
and less towards being end-all-be-all objects.

2. The clang macro for add_executable doesn't handle lists of files.
Thus if you use a glob routine to automatically add all the files in a
directory, clang_add_executable won't work. To make this work, the
CMake list object would need a flatten function; to do this, you would
need a test to know whether each object in the list was in fact another
list or just a single string name. I couldn't figure out how to do
this, or I would have just posted my solution.

3. There really isn't an automatic plugin facility for Clang, in the
sense of "put your directory, CMakeLists.txt and files here and the
build process will automatically pick it up." I added this to the
Clang tree, but I am unsure where to submit a patch. I added a
plugin directory and also added a special CMakeLists.txt to that
directory that simply scans for subdirectories and adds every one,
ignoring those that start with a period and the CMake-created CMakeFiles
directory. That CMakeLists.txt is attached.
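The separation suggested in point 1 can be sketched with toy types: nodes that only hold data, plus a printer that sits outside them and only reads them. These types are made up for illustration and are not Clang's AST classes. The point is that a second output format, XML for example, could be added as another printer without touching the node types.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative node types (not Clang's): plain data, no print() or dump().
struct FieldNode {
    std::string type;
    std::string name;
};

struct RecordNode {
    std::string name;
    std::vector<FieldNode> fields;
};

// Printing lives outside the node types and only reads them, so another
// output format could be added as a separate component (or plugin).
class RecordPrinter {
public:
    std::string print(const RecordNode& record) const {
        std::ostringstream out;
        out << "struct " << record.name << " {\n";
        for (const FieldNode& field : record.fields)
            out << "  " << field.type << ' ' << field.name << ";\n";
        out << "};\n";
        return out.str();
    }
};
```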

Hope this feedback helps! I am really impressed and excited about the
possibilities that clang enables!

Chris

CMakeLists.txt (616 Bytes)