walking macros with clang-c

Hi,

I am trying to use clang-c/Index.h to analyze header files. I can access most things of interest, but macro definitions appear to be completely opaque.

Is there a clang-c equivalent of the MacroInfo class?
http://clang.llvm.org/doxygen/classclang_1_1MacroInfo.html

In particular, I want to know isFunctionLike/isObjectLike and to access the argument and token lists. Variadic and builtin information would be nice but not necessary at the moment.
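Until libclang exposes MacroInfo, the object-like/function-like distinction can at least be approximated from the raw tokens. This is a minimal sketch. It assumes you have already collected each token's spelling and starting file offset, for example with clang_getTokenSpelling() and with clang_getSpellingLocation() applied to clang_getTokenLocation(). The MacroTok type and macro_is_function_like() are hypothetical names. The sketch applies the C rule that a macro is function-like only when '(' follows the name with no intervening whitespace, so checking token kinds alone is not enough:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: spelling plus starting file offset. */
typedef struct {
    const char *spelling;
    unsigned begin;
} MacroTok;

/* toks[0] is assumed to be the macro name. The macro is function-like iff
   the next token is "(" and begins exactly where the name ends. Assumes the
   name's spelling length equals its length in the source (true unless the
   name is split by a backslash-newline). */
int macro_is_function_like(const MacroTok *toks, size_t n)
{
    if (n < 2)
        return 0;
    return strcmp(toks[1].spelling, "(") == 0 &&
           toks[1].begin == toks[0].begin + (unsigned)strlen(toks[0].spelling);
}
```

For H5T_IEEE_F32BE, the '(' is separated from the name by a space. The macro is therefore object-like even though its first body token is a parenthesis.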

Here are two example macros that I'd like to parse. They are from H5Tpublic.h in the HDF5 library.

#define H5OPEN H5open(),
...
#define H5T_IEEE_F32BE (H5OPEN H5T_IEEE_F32BE_g)
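H5OPEN ends in a comma. As a result, H5T_IEEE_F32BE expands to the comma expression (H5open(), H5T_IEEE_F32BE_g), which calls H5open() and then yields the global's value. The sketch below demonstrates this with hypothetical stand-ins for hid_t, H5open() and the global. These are not the real HDF5 declarations.

```c
/* Hypothetical stand-ins so the HDF5 macros compile without the real
   library: H5open() only counts calls, and the global holds a dummy id. */
typedef int hid_t;
static int h5open_calls = 0;
static hid_t H5T_IEEE_F32BE_g = 42;
static int H5open(void) { h5open_calls++; return 0; }

/* The two definitions from H5Tpublic.h, as quoted above. */
#define H5OPEN H5open(),
#define H5T_IEEE_F32BE (H5OPEN H5T_IEEE_F32BE_g)

/* Expands to (H5open(), H5T_IEEE_F32BE_g): the comma operator calls
   H5open() for its side effect, then yields the global's value. */
hid_t f32be_type(void) { return H5T_IEEE_F32BE; }
```

This is why a tool walking these macros finds an expression in the body rather than a constant.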

Thanks,
Daniel

P.S. If the answer is "help wanted, shouldn't be hard to add", please direct me to the relevant source files. Otherwise, I can stick with my existing hodge-podge of sed scripts for macros.

> Is there a clang-c equivalent of the MacroInfo class?
> http://clang.llvm.org/doxygen/classclang_1_1MacroInfo.html
>
> In particular, I want to know isFunctionLike/isObjectLike and to access
> the argument and token lists. Variadic and builtin information would be
> nice but not necessary at the moment.

libclang doesn't actually expose that information, but it would be really cool if it did.

> P.S. If the answer is "help wanted, shouldn't be hard to add", please
> direct me to the relevant source files. Otherwise, I can stick with my
> existing hodge-podge of sed scripts for macros.

Yeah, that's pretty much the answer for this one :)

Here's how I'd tackle it: add a function that takes a macro-definition cursor, which would go into tools/libclang/CIndex.cpp. This function would re-preprocess that line of source code to create a MacroInfo object and return information about it (e.g., the tokens can be CXTokens, the macro arguments can be an array of strings, etc.).

There are other implementation approaches, but they would require more invasive changes.

  -Doug
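In the meantime, Doug's idea of returning the macro arguments as an array of strings can be approximated at the token level, without re-preprocessing. This is a minimal sketch. It assumes the macro is already known to be function-like and that its token spellings, starting with the name, are available as C strings. MacroParams and macro_params() are hypothetical names, and the 16-parameter limit is an arbitrary choice for the example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result record: parameter names in order, plus a flag for
   "..." (variadic). The fixed capacity is an arbitrary example limit. */
typedef struct {
    const char *params[16];
    size_t nparams;
    int is_variadic;
} MacroParams;

/* toks: spellings of a function-like macro, name first, then "(".
   Records parameter names and returns the index of the first body token,
   or -1 if the list is malformed or too long. Comma placement is not
   validated. */
long macro_params(const char *const *toks, size_t n, MacroParams *out)
{
    out->nparams = 0;
    out->is_variadic = 0;
    if (n < 2 || strcmp(toks[1], "(") != 0)
        return -1;
    for (size_t i = 2; i < n; i++) {
        if (strcmp(toks[i], ")") == 0)
            return (long)(i + 1);          /* body starts after ')' */
        if (strcmp(toks[i], ",") == 0)
            continue;
        if (strcmp(toks[i], "...") == 0) { /* variadic marker */
            out->is_variadic = 1;
            continue;
        }
        if (out->nparams == sizeof out->params / sizeof out->params[0])
            return -1;
        out->params[out->nparams++] = toks[i];
    }
    return -1;                             /* no closing ')' */
}
```

The returned index marks where the replacement list begins, so the body tokens can be kept as CXTokens while the parameters become strings.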

>> Is there a clang-c equivalent of the MacroInfo class?
>> http://clang.llvm.org/doxygen/classclang_1_1MacroInfo.html
>>
>> In particular, I want to know isFunctionLike/isObjectLike and to access
>> the argument and token lists. Variadic and builtin information would be
>> nice but not necessary at the moment.
>
> ...
>
> libclang doesn't actually expose that information, but it would be really cool if it did.
>
> ...
>
> Here's how I'd tackle it: add a function that takes a macro-definition cursor, which would go into tools/libclang/CIndex.cpp. This function would re-preprocess that line of source code to create a MacroInfo object and return information about it (e.g., the tokens can be CXTokens, the macro arguments can be an array of strings, etc.).

Thanks for the reply. Finally made some progress on this. My initial experiment is giving "interesting" results... The following snippet is C code, but compiled with C++ because I was experimenting with MacroInfo in the same file. Haven't gotten far enough to know whether the MacroInfo API will help.

   if (kind == CXCursor_MacroDefinition)
     {
       /* Source range covered by the macro-definition cursor. */
       CXSourceRange sr = clang_getCursorExtent(cursor);

       CXToken *tokens = NULL;
       unsigned num = 0;
       clang_tokenize(tu, sr, &tokens, &num);
       for (unsigned i = 0; i < num; i++)
         {
           CXString s = clang_getTokenSpelling(tu, tokens[i]);
           CXTokenKind tk = clang_getTokenKind(tokens[i]);
           /* TokenSpellings: table of kind names indexed by CXTokenKind
              (defined elsewhere in this program). */
           printf("\ttoken, %s: %s\n", TokenSpellings[tk], clang_getCString(s));
           clang_disposeString(s);
         }
       clang_disposeTokens(tu, tokens, num);
     }
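The snippet does not show TokenSpellings. One minimal way to define it assumes the enumerator order of CXTokenKind in clang-c/Index.h: CXToken_Punctuation, CXToken_Keyword, CXToken_Identifier, CXToken_Literal, CXToken_Comment. token_kind_name() is a hypothetical bounds-checked wrapper:

```c
#include <stddef.h>

/* Names indexed by CXTokenKind; the order is assumed to follow the enum
   in clang-c/Index.h (Punctuation, Keyword, Identifier, Literal, Comment). */
static const char *const TokenSpellings[] = {
    "Punctuation", "Keyword", "Identifier", "Literal", "Comment"
};

/* Bounds-checked lookup, in case a newer libclang adds token kinds. */
const char *token_kind_name(unsigned kind)
{
    if (kind < sizeof TokenSpellings / sizeof TokenSpellings[0])
        return TokenSpellings[kind];
    return "Unknown";
}
```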

Here's a sample header file snippet to be walked.

#define H5T_UNIX_D32LE (H5OPEN H5T_UNIX_D32LE_g)
#define H5T_UNIX_D64BE (H5OPEN H5T_UNIX_D64BE_g)
#define H5T_UNIX_D64LE (H5OPEN H5T_UNIX_D64LE_g)
H5_DLLVAR hid_t H5T_UNIX_D32BE_g;
H5_DLLVAR hid_t H5T_UNIX_D32LE_g;

and here's some sample output.

visited H5T_UNIX_D32LE (macro definition)
         token, Identifier: H5T_UNIX_D32LE
         token, Punctuation: (
         token, Identifier: H5OPEN
         token, Identifier: H5T_UNIX_D32LE_g
         token, Punctuation: )
         token, Punctuation: #
visited H5T_UNIX_D64BE (macro definition)
         token, Identifier: H5T_UNIX_D64BE
         token, Punctuation: (
         token, Identifier: H5OPEN
         token, Identifier: H5T_UNIX_D64BE_g
         token, Punctuation: )
         token, Punctuation: #
visited H5T_UNIX_D64LE (macro definition)
         token, Identifier: H5T_UNIX_D64LE
         token, Punctuation: (
         token, Identifier: H5OPEN
         token, Identifier: H5T_UNIX_D64LE_g
         token, Punctuation: )
         token, Identifier: H5_DLLVAR
visited H5T_C_S1 (macro definition)
         token, Identifier: H5T_C_S1
         token, Punctuation: (
         token, Identifier: H5OPEN
         token, Identifier: H5T_C_S1_g
         token, Punctuation: )
         token, Identifier: H5_DLLVAR

Notice how each macro's token list includes the first token from the following line? Either there's a bug in clang_getCursorExtent, or I misunderstand its documentation. I can always skip the last token if that's the right thing to do.

Clarification from somebody with experience would be appreciated. I'd like to understand this before converting the CXSourceRange back into C++.

- Daniel
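A more defensive option than always dropping the last token is to keep only the tokens that start at or before the end of the cursor's extent. This is a minimal sketch. It assumes file offsets obtained with clang_getSpellingLocation() from clang_getRangeEnd(sr) and from clang_getTokenLocation(tu, tokens[i]). tokens_within_extent() is a hypothetical name. Using <= keeps the macro's last token whether the extent's end points at the start of that token or just past it. The stray token from the next line always starts later, because at least a newline separates it from the directive:

```c
#include <stddef.h>

/* Returns how many leading tokens to keep: those whose start offset is at
   or before range_end. Offsets are assumed to be file offsets in the same
   file, in increasing order (e.g. from clang_getSpellingLocation()). */
size_t tokens_within_extent(const unsigned *tok_begin, size_t n,
                            unsigned range_end)
{
    size_t keep = 0;
    while (keep < n && tok_begin[keep] <= range_end)
        keep++;
    return keep;
}
```

Take the H5T_UNIX_D32LE line, with offsets measured from the start of that line. There the ')' starts at 47 and the next line's '#' at 49, so the stray '#' is dropped whether the extent ends at 47 or 48.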