Getting meta-data into Clang's AST

I completed (quite easily) the first part of my task which was to get
clang to parse some c++ objects and write out some serialization
mappings for them.

Now I have a seemingly much harder task, which is to parse some c++
objects and not only get their binary layout but also to be able to
read some pretty arbitrary meta data about each member.

Is there a recommended way to add in meta data to the Clang AST from
the parser? I noticed comments seem to be completely stripped; is
this the case or did I mis-read something?

Comments would be ideal as the other compilers for these objects will
then definitely ignore the data, but I could also declare special
objects with string constants or something like that, assuming it
didn't change the binary layout of the object.

Has anyone done something like this before?

Chris

I completed (quite easily) the first part of my task which was to get
clang to parse some c++ objects and write out some serialization
mappings for them.

Great!

Now I have a seemingly much harder task, which is to parse some c++
objects and not only get their binary layout

ASTRecordLayout has that information.

but also to be able to
read some pretty arbitrary meta data about each member.

Is there a recommended way to add in meta data to the Clang AST from
the parser?

Attributes, pragmas, and comments are the typical approaches. I would recommend *not* using pragmas, because associating them with specific declarations is a real pain. Attributes are okay for lightweight metadata, and in the future it'll be far easier to add your own attributes. Comments give the most flexibility, although you'll still need to solve the problem of associating a comment with the declaration(s) it applies to.
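
To make the comment-association problem concrete, here is a minimal sketch that works on raw source text rather than on the Clang AST. It assumes a hypothetical convention in which metadata is written as a trailing "//@" comment on the same line as the member it describes. The marker, MemberMeta, and collect_member_meta are names invented for this illustration and are not part of Clang, and the scan only handles simple single-line declarations such as "int m;".

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: a member name plus the metadata text attached to it.
struct MemberMeta {
    std::string member;
    std::string meta;
};

// Strip leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Scan the source line by line. A line of the form
//     <type> <name>; //@ <metadata>
// yields {name, metadata}. The "//@" marker is an assumed convention.
std::vector<MemberMeta> collect_member_meta(const std::string& source) {
    std::vector<MemberMeta> out;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t marker = line.find("//@");
        if (marker == std::string::npos) continue;   // no metadata comment
        std::size_t semi = line.rfind(';', marker);
        if (semi == std::string::npos) continue;     // no declaration before it
        std::string decl = trim(line.substr(0, semi));
        std::size_t split = decl.find_last_of(" \t*&");
        if (split == std::string::npos) continue;    // cannot separate type and name
        std::string name = decl.substr(split + 1);
        if (name.empty()) continue;
        out.push_back({name, trim(line.substr(marker + 3))});
    }
    return out;
}
```

Same-line association is the easy case. A comment placed on the line above a declaration, or one that applies to several declarators at once, needs source locations from the parser to resolve, which is the part that remains hard.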

I noticed comments seem to be completely stripped; is
this the case or did I mis-read something?

Yes, this is the case, although it was actually a recent change. See

  http://llvm.org/viewvc/llvm-project?view=rev&revision=99007

where I ripped out our handling of comments because they were completely unused. I'm not opposed to bringing comments back if they're actually going to be used for something real *cough* Doxygen parsing *cough*.

Comments would be ideal as the other compilers for these objects will
then definitely ignore the data, but I could also declare special
objects with string constants or something like that, assuming it
didn't change the binary layout of the object.

Has anyone done something like this before?

Not in Clang, but it's fairly common to use comments for metadata.

  - Doug

OK, let's look at attributes for a second.

What are they? A Google search for Clang attributes of course doesn't
turn up much.

Really, if I could embed arbitrary text in an attribute associated
with a member variable this would do it; I could design a small DSL to
do what I need. Does this sound feasible, even if this text is
somewhat verbose?
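
As a rough idea of how small such a DSL could be, here is a sketch that parses a "key=value;key=value" string into a map. The syntax and the name parse_meta are assumptions made up for this illustration; nothing here is provided by Clang, and the string would still have to be pulled out of the attribute by whatever tool walks the AST.

```cpp
#include <cassert>
#include <map>
#include <string>

// Parse "key=value;key=value" into a map. Items without '=' are stored with
// an empty value so that bare flags such as "transient" are allowed.
std::map<std::string, std::string> parse_meta(const std::string& text) {
    std::map<std::string, std::string> out;
    std::size_t pos = 0;
    while (pos <= text.size()) {
        std::size_t end = text.find(';', pos);
        if (end == std::string::npos) end = text.size();
        std::string item = text.substr(pos, end - pos);
        if (!item.empty()) {                       // skip empty items like ";;"
            std::size_t eq = item.find('=');
            if (eq == std::string::npos) {
                out[item] = "";                    // bare flag
            } else {
                out[item.substr(0, eq)] = item.substr(eq + 1);
            }
        }
        pos = end + 1;                             // step past the separator
    }
    return out;
}
```

A flat key/value form keeps the text short per member; anything richer (nesting, escaping of ';' or '=') would need a real grammar.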

If not, the changelist about the comments looks pretty involved; I
will want to discuss that further but I am really hoping that
attributes will do.

Chris

I'm actually looking into this (not Doxygen, mind you, but Synopsis). I tried a little with the CIndex API, and concluded that it's likely more efficient to use the AST API directly (the RecursiveASTVisitor, notably), and I'm prepared for frequent adjustments. This code will be as much "in development" as Clang itself, though perhaps with the next (2.8) release things will be sufficiently stable for me to be able to do an official Synopsis release using that API.

As soon as I have some basic LLVM AST -> Synopsis ASG translation working, I'll come back with requests for comments (pun intended!).

FWIW,
         Stefan

OK, now I know what you mean by attributes; these won't work for me.

You meant things like _cdecl and such, correct?

I am going to try to use the generator to generate a portion of the
translation layer from a complex representation of a problem back to a
simple one. I need to be able to note type and variable names at the
minimum and perhaps more. Thus comments would be more appropriate.

So what is the best way to go forward? Should I merge this patch back
into my system? I believe this is still doable.

Chris

Not really. Attributes are part of the C++0x specification, using the syntax [[...]], but GCC has long had its own syntax (__attribute__((...))):
http://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
MSVC has something similar with __declspec.

If you want other compilers to ignore your attribute, I think you need to put it in a macro:

#if METADATAPARSING
#define ADD_MYATTRIB(param) __attribute__((myattrib(param)))
#else
#define ADD_MYATTRIB(param)
#endif

class A {
     int m ADD_MYATTRIB("blabla");
};

or something like that.

Your project seems interesting, will the code be public?

regards,

Ah, I see, thanks for that info.

So Clang at this point supports both types of attributes, I would
guess? Are there examples using one or the other (ideally the c++0x
standard)? Where are these objects placed on the AST?

This sounds much better than comments so far.

My project is being paid for by my employer, NVIDIA, so making any of
it public would be a negotiation to say the least. It really would
depend on how independent I can make it of the PhysX code base.

At this point I am not pursuing that option but the first portion was
completely independent of the code base and if I do it correctly this
portion will be also. If people are interested I will certainly press
for it once my current set of objectives are met.

I do feel that c++ has needed good introspective capabilities for a
long time and I am carefully walking down that general path.

Chris

Ah, I see, thanks for that info.

So Clang at this point supports both types of attributes, I would
guess?

Yes.

Are there examples using one or the other (ideally the c++0x
standard)?

The GCC documentation has examples of GNU-style attributes. The C++0x standard and the attributes papers have examples of C++0x attributes.

Where are these objects placed on the AST?

Typically, they are Attr objects (subclasses of Attr) that reside on Decl nodes in the AST.

Some attributes also have an effect on the type system, and are translated into bits in the Type classes, but you should think long and hard before adding another such attribute: changing types often has more consequences than one realizes.

  - Doug

Hi!

Douglas Gregor wrote on 06.07.2010 18:58:

So Clang at this point supports both types of attributes, I would
guess?

Yes.

But to make use of the C++0x style attributes you have to use the
-std=c++0x option, correct?

Is it possible to add custom attributes to Clang without changing its
code? Are there any hooks? I'm trying to build on top of the binary
distribution of LLVM and Clang just using their libs. Up to now I found
hooks in the preprocessor to add pragma and comment handlers but nothing
for attributes.

When I just add some custom attributes to the source they are parsed
without any error but then just discarded. All I need is to have them in
the AST besides the built-in ones.

Regards, Jan.

But to make use of the C++0x style attributes you have to use the
-std=c++0x option, correct?

Currently, yes. Also note that the implementation is a little outdated; there were changes to the way attributes are parsed in the most recent draft and those are not reflected in the source.

Is it possible to add custom attributes to Clang without changing its
code? Are there any hooks? I'm trying to build on top of the binary
distribution of LLVM and Clang just using their libs. Up to now I found
hooks in the preprocessor to add pragma and comment handlers but nothing
for attributes.

No. Adding custom attribute hooks is a good idea, and I'll consider it after I finish my current attribute rewrite.

When I just add some custom attributes to the source they are parsed
without any error but then just discarded. All I need is to have them in
the AST besides the built-in ones.

To do this much, you would currently need to add the attribute to include/clang/Basic/Attr.td, include/clang/AST/Attr.h, include/clang/Parse/AttributeList.h, lib/Parse/AttributeList.cpp, and lib/Sema/SemaDeclAttr.cpp, I believe. If you want them to use the C++0x syntax, you would additionally need to look at lib/Parse/ParseDeclCXX.cpp, near the bottom.

My project is to simplify this process somewhat, but it's currently somewhat stuck awaiting reply from the GCC devs about their attribute syntax.

Sean

Sean Hunt wrote on 09.07.2010 09:54:

When I just add some custom attributes to the source they are parsed
without any error but then just discarded. All I need is to have them in
the AST besides the built-in ones.

To do this much, you would currently need to add the attribute to
include/clang/Basic/Attr.td, include/clang/AST/Attr.h,
include/clang/Parse/AttributeList.h, lib/Parse/AttributeList.cpp, and
lib/Sema/SemaDeclAttr.cpp, I believe.

Thanks for the details, but I really want to avoid modifying the Clang
sources. Fortunately, searching through the docs I happened to notice
the annotate attribute. So instead of

__attribute__((foo))

I use

__attribute__((annotate("foo")))

This gets transferred to the AST just fine. It would be great to have
hooks for custom attributes some day, until then this is a nice
workaround IMHO.
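
For anyone wanting to try this, here is a minimal sketch that combines the annotate attribute with the macro trick from earlier in the thread, so that compilers other than Clang never see the attribute. The macro name META, the two structs, and layout_matches are made up for this example. The static_asserts check the assumption that the annotation adds no storage and so leaves the binary layout alone.

```cpp
#include <cassert>
#include <cstddef>

// Expand to the annotate attribute only under Clang; other compilers see
// nothing. META is a hypothetical macro name chosen for this sketch.
#if defined(__clang__)
#define META(text) __attribute__((annotate(text)))
#else
#define META(text)
#endif

// Reference struct without any metadata.
struct Plain {
    int id;
    float mass;
};

// Same members, each carrying a free-form metadata string.
struct Annotated {
    int id META("key");
    float mass META("units=kg");
};

// The annotation should carry no storage, so both structs are expected to
// have the same size and member offsets.
static_assert(sizeof(Plain) == sizeof(Annotated),
              "annotation changed the struct size");
static_assert(offsetof(Plain, mass) == offsetof(Annotated, mass),
              "annotation changed a member offset");

// Runtime form of the same check, convenient for a unit test.
bool layout_matches() {
    return sizeof(Plain) == sizeof(Annotated) &&
           offsetof(Plain, mass) == offsetof(Annotated, mass);
}
```

The string passed to the attribute is arbitrary text, so it can hold whatever small metadata language the tool reading the AST agrees on.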

Regards, Jan.