Hopefully this is an acceptable list to ask a question about libtooling on:
Ultimately I’m trying to pull out relevant structures from thousands of existing C and C++ header files. I’ve been able to use libtooling to pull out a structure and all of the structures/enums/typedefs etc. it relies on from various different headers. Unfortunately, when I get the source range backing the Decls, it still references the macros defined therein. I’m currently trying to find a way to access and print the source of these macros, but I’m not having much luck when multiple macros are involved.
For example:
#define INT int
#define UNSIGNED unsigned
#define NAME name
typedef struct {
UNSIGNED long INT NAME;
} test;
When I get the FieldDecl corresponding to name and get the SourceRange, I see the spelling location pointing to “#define UNSIGNED unsigned”. I’d like to know how to get to the other macro definitions’ source locations. I know that when I change “UNSIGNED long INT NAME;” to “unsigned long INT NAME;” the spelling location will then point to “#define INT int”. It seems as if declaration names are treated differently, though, as changing to “unsigned long int NAME;” leaves me with no spelling location.
Is there a way to get multiple spelling locations given a SourceRange? Do I need to narrow down the source range some other way? I’ve tried lexing to the next token, but that doesn’t leave me with a new spelling location. I’m also going to have to account for macros in arrays such as “int bob[MAX_WIDTH][MAX_HEIGHT]”, but I’m hoping that will become clear once I figure out my issues here.
Thanks in advance for any help that can be provided.
john
When I get the FieldDecl corresponding to name and get the SourceRange I see the spelling location pointing to “#define UNSIGNED unsigned”.
With that you probably mean the spelling location of the start location? A SourceRange doesn’t have a spelling location
I’d like to know how to get to the other macro definition’s source locations. I know that when I change “UNSIGNED long INT NAME;” to “unsigned long INT NAME;” the spelling location will then point to “#define INT int”.
Again, I’m not sure which location you’re using.
All the info is in the SourceRange / SourceLocation; SourceLocation actually provides all relevant instantiation points.
It depends on:
- which source location you’re querying against; if you have the Decl, like FieldDecl, generally getLocation() will get you the name (that is, the spelling loc will point at ‘name’ and the expansion locs will point at the #define NAME and the NAME; respectively).
- whether you really want a range; for ranges, there’s Lexer::makeFileCharRange and Lexer::getSourceText for that
When I get the FieldDecl corresponding to name and get the SourceRange I
see the spelling location pointing to "#define UNSIGNED unsigned".
With that you probably mean the spelling location of the start location? A
SourceRange doesn't have a spelling *location*
That's correct. I do mean the start location of the range. Sorry for being
confusing here. I guess my first question is whether, given a range, I can
get to all SourceLocations that contain a macro with an associated spelling
location, or whether I need to go back to the Decl to get the next
range/SourceLocation.
Is there an easy way to iterate through all the source locations that
would contain macro expansions? I've had good luck with nested macros by
tracing the immediate expansion locations from the original spelling
location, but no luck in trying to get to another SourceLocation that has
a different spelling location than the first macro in the statement. I
thought perhaps I needed to walk through the different QualType/Type
classes associated with the field, but then I wasn't sure how to peel
those back so that I got every macro expansion, nor how to get from those
back to their SourceLocations. To summarize: given a generic statement
like the one above, "UNSIGNED long INT NAME;", I want to be able to pull
out the (in this case 3) SourceLocations that are associated with the
appropriate spelling locations.
Thanks so much,
john
I think that’ll be hard, mainly because I think nobody has imagined that use case yet.
Perhaps you can tell us the higher level picture of what you’re trying to do? Given that, often there is a much simpler solution.
Thanks for the response. As I alluded to in the opening paragraph, I have thousands of legacy header files with a mix of various linkages and interdependencies. I need to strip out various structures and typedefs, along with all the structures/enums etc. they depend on, from these files and make just one small cohesive header. I looked around a lot to see what technologies I could leverage to this end and ultimately decided that libtooling would give me the best shot at it. I was able to get everything pulled out except for the statements that have multiple macros, as described earlier. I’m using Lexer::getSourceText to access the source for the given range.
Perhaps there is a way to print out the source after it has already been preprocessed and macros expanded? I’m very open to switching gears if it is believed there is an easier way to do this.
Thanks,
john
Ok, yea, in that case just running it past -E first seems like that might make things considerably easier; generally, clang tries to preserve all the macro expansion history, but in your case, it sounds like you’d actually prefer a completely flat view.
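To make the "completely flat view" concrete, here is a minimal sketch in plain C++ of what flattening does to the example declaration. It is not Clang code and not a substitute for running the real preprocessor with -E: the names expandFlat and MacroTable are hypothetical, the macro table is filled in by hand, and the sketch assumes only object-like macros (no arguments, no # or ##, no string literals or comments in the text).

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical macro table: macro name -> replacement text.
using MacroTable = std::map<std::string, std::string>;

// Returns `text` with every object-like macro replaced by its expansion.
// `active` holds the macros currently being expanded, so a macro that
// (directly or indirectly) names itself is left alone instead of looping.
std::string expandFlat(const std::string& text, const MacroTable& macros,
                       std::set<std::string> active = {}) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c) || c == '_') {
            // Collect one identifier.
            std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) ||
                    text[i] == '_')) {
                ++i;
            }
            std::string ident = text.substr(start, i - start);
            auto it = macros.find(ident);
            if (it != macros.end() && active.count(ident) == 0) {
                // Expand the replacement text too, to handle nested macros.
                std::set<std::string> nested = active;
                nested.insert(ident);
                out += expandFlat(it->second, macros, nested);
            } else {
                out += ident;
            }
        } else if (std::isdigit(c)) {
            // Copy a number with its suffix so "10UL" is not split at "UL".
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) ||
                    text[i] == '_' || text[i] == '.')) {
                out += text[i++];
            }
        } else {
            out += text[i++];  // punctuation and whitespace pass through
        }
    }
    return out;
}
```

With the three macros from the example in the table, expandFlat("UNSIGNED long INT NAME;", table) gives back the declaration with no macro names left in it. That is exactly the trade-off of the flat view: the text is self-contained, but the information about which macros were used is gone.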
Thank you for the suggestion. I had actually thought of it earlier in the process but then moved away from it, as at the time I thought the macro definitions had to stay in the source. Given that you say it’s challenging or impossible to get at the multiple macro SourceLocations in a statement, I’ll continue forward with preprocessing first. If you or anyone else can think of a way or approach to potentially get at all the expandable SourceLocations in a statement, I’d love to hear about it, as I’ll probably come back at a later date and try to tackle this again. Thanks again for all the help.
After going through more and more structures I’ve found a case where I really need to retain the macro definition. If anyone can think of an idea to try and pull out the multiple expansion locations with their spelling locations I’d be grateful. Thanks.
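One idea that avoids the expansion history altogether is to walk the written text of the declaration token by token and check each identifier against the set of known macro names. The sketch below shows that walk in plain C++ with no Clang dependency. Everything in it is an assumption made for illustration: MacroUse and findMacroUses are hypothetical names, the macro table is supplied by hand, and string literals, comments and function-like macros are not handled. In a Clang tool the text would presumably come from Lexer::getSourceText over the file range of the Decl, and the macro table from the preprocessor; that wiring is not shown here and has not been confirmed in this thread.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One macro use found in a piece of declaration text.
struct MacroUse {
    std::string name;        // macro name as written, e.g. "UNSIGNED"
    std::size_t offset;      // byte offset of the name within the text
    std::string definition;  // replacement text recorded for that macro
};

// Hypothetical helper: scans `text` identifier by identifier and reports,
// in source order, every identifier that is a key of `macros`
// (macro name -> replacement text).
std::vector<MacroUse> findMacroUses(
    const std::string& text,
    const std::map<std::string, std::string>& macros) {
    std::vector<MacroUse> uses;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c) || c == '_') {
            // Collect one identifier and look it up.
            std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) ||
                    text[i] == '_')) {
                ++i;
            }
            std::string ident = text.substr(start, i - start);
            auto it = macros.find(ident);
            if (it != macros.end()) {
                uses.push_back({ident, start, it->second});
            }
        } else if (std::isdigit(c)) {
            // Skip a number with its suffix so "10UL" is not read as "UL".
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) ||
                    text[i] == '_' || text[i] == '.')) {
                ++i;
            }
        } else {
            ++i;  // punctuation and whitespace
        }
    }
    return uses;
}
```

For "UNSIGNED long INT NAME;" and a table holding the three macros from the example, this reports UNSIGNED, INT and NAME in that order, each with its offset in the text. The same walk covers the array case, since MAX_WIDTH and MAX_HEIGHT in "int bob[MAX_WIDTH][MAX_HEIGHT]" are ordinary identifiers between brackets. The offsets can then be mapped back to file positions, and the matching #define lines copied into the output header.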