macro-expanded/preprocessed string (clangLex) question

Hi,

I have a need for a function that will do macro substitution in a string.

This is for the modularize tool I am trying to enhance to show where preprocessor conditional directives’ condition expressions differ in different instances of preprocessing a header. I use a PPCallbacks-derived class to track the #if/#elif/#ifdef/#ifndef directives. The callback function arguments include a SourceRange. I can use that to get the unpreprocessed source snippet for the condition expression. I want to take either that source snippet or the SourceRange and get a string that has had any macros in the unpreprocessed source snippet expanded, including function-style macros. White space is not important, as it just needs to be consistent. I do a comparison of the macro-expanded condition strings to find out if an instance of a header differs from other instances.
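As a rough illustration of the "white space just needs to be consistent" comparison described above, here is a minimal sketch in standard C++. It assumes the macro-expanded condition text is already available as a std::string; normalizeWhitespace is a hypothetical helper, not something provided by Clang. It only collapses runs of white space, so "A&&B" and "A && B" would still compare as different unless the expansion step always spells tokens the same way.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not a Clang API): collapse every run of white space
// to a single space and trim both ends, so that two condition strings that
// differ only in spacing or line breaks compare equal.
std::string normalizeWhitespace(const std::string &Text) {
  std::string Result;
  bool PendingSpace = false;
  for (unsigned char C : Text) {
    if (std::isspace(C)) {
      // Remember the gap, but never emit one before the first character.
      PendingSpace = !Result.empty();
      continue;
    }
    if (PendingSpace)
      Result += ' ';
    PendingSpace = false;
    Result += static_cast<char>(C);
  }
  // A trailing gap is left pending and never emitted, which trims the end.
  return Result;
}
```

Two instances of a header could then be compared with normalizeWhitespace(A) == normalizeWhitespace(B).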

Looking at the Preprocessor class, I didn’t see such a function. Because this has to happen inside a PPCallbacks callback, I’m concerned about interfering with the preprocessor or lexer state. My current thinking is therefore that I will probably need to instantiate a separate Lexer, and probably a separate Preprocessor too, with the new Preprocessor linked to the original one so it can find the macro definitions. (There seems to be a pointer for this in Preprocessor.) The pragma mode in Lexer seems comparable, but I don’t want the raw lexing mode, so I’d probably need a new constructor that lets me set the buffer pointers and the right mode settings. Also, I probably don’t want the Preprocessor to be calling PPCallbacks callbacks such as the MacroExpanded callback.

Could someone point me to existing code that could do this, or otherwise help me get the information or code I need to do this?

Thanks.

-John

Or let me rephrase it another way.

Perhaps there is another way of getting the preprocessed snippet without reprocessing, such as might be done in a diagnostic?

-John

It seems to me that you mainly need to keep track of whether the conditional directive block was skipped or not, so how about adding a bool parameter to the If/Elif/Else callbacks to have them provide that info?

Then you apparently want to inform the user which macros were different (the ones that caused the condition to evaluate differently), so maybe keep track of macro expansions inside the condition and point/warn at the macro definitions when they differ?

Argyrios,

Sorry, I totally missed your message.

> It seems to me that you mainly need to keep track of whether the conditional directive block was skipped or not, so how about adding a bool parameter to the If/Elif/Else callbacks to have them provide that info?
>
> Then you apparently want to inform the user what macros were different (that caused the condition to evaluate differently) so maybe keep track of macro expansions inside the condition and point/warn at the macro definitions when they differ?

Yes, that is what I was thinking. But I wanted to have a go at it first without modifying the PPCallbacks API. Currently I’m collecting macro expansions from the MacroExpanded callback, but function-style macros are a bit trickier.

-John

To be clearer, what I'm suggesting is to keep track of only the location of the macro expansion and the location of its macro definition; this is simple, and function-style macros add no complication that way.
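A minimal sketch of what such a record might look like, in standard C++ only. The Location and ExpansionRecord types and the resolvesDifferently function are hypothetical names introduced for illustration; a real tool would fill the fields from the preprocessor's source locations, and how it does so is not shown here.

```cpp
#include <string>
#include <tuple>

// Hypothetical plain-data location (file, line, column).
struct Location {
  std::string File;
  unsigned Line;
  unsigned Column;

  bool operator==(const Location &Other) const {
    return std::tie(File, Line, Column) ==
           std::tie(Other.File, Other.Line, Other.Column);
  }
};

// One macro use: where the macro name appeared and where the definition
// that was active at that point lives. No expanded text is stored, so
// object-like and function-style macros are handled identically.
struct ExpansionRecord {
  Location Expansion;
  Location Definition;
};

// True when both records describe the same use site but the macro
// resolved to different definitions in the two preprocessing passes.
bool resolvesDifferently(const ExpansionRecord &A, const ExpansionRecord &B) {
  return A.Expansion == B.Expansion && !(A.Definition == B.Definition);
}
```

A tool could report both Definition locations whenever resolvesDifferently returns true for the same use site in two instances of a header.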

Argyrios,

Yeah, your comment made me think that perhaps I should just track macro expansions in general, not limiting it to conditional directives, as macro instances anywhere with different values could be problematic as well for modules. Is that what you meant as well?

What do you think, Sean?

The program structure doesn’t change a whole lot. Instead of storing the condition location and values, I store the macro information.

-John

I think the program structure will have to change a good amount anyway in
order to make it clear what the algorithm is doing. Unless it is unable to
express the check that is desired (but I think it can), I strongly believe
that the pseudocode I suggested is the right approach, which nicely models
the problem as:

1) Defining an equivalence relation on macro expansions or preprocessor
conditions or whatever (call this type T)
2) Maintaining a map keyed on the physical source file location of the T,
with T as values, and using that to ensure that all physical source
locations expand to the same thing across TUs.

This approach should work for either of the scenarios. Moreover, it has
obvious time and space complexity and it's easy to see how to parallelize
it (operate on independent maps in independent threads and at the end
perform a "union" operation on the maps).
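The two steps and the final "union" can be sketched roughly as follows, in standard C++ only. This is not the pseudocode referred to above, just one possible reading of it: T is assumed to be a std::string compared with plain equality, each map entry keeps the set of distinct values seen at a location, and SourceKey, record, mergeInto and findInconsistent are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Physical source location used as the map key.
struct SourceKey {
  std::string File;
  unsigned Line;
  unsigned Column;

  bool operator<(const SourceKey &Other) const {
    return std::tie(File, Line, Column) <
           std::tie(Other.File, Other.Line, Other.Column);
  }
};

// Every distinct value of T (here: a string) seen at each location.
using ExpansionMap = std::map<SourceKey, std::set<std::string>>;

// Record what one location expanded to in the current TU.
void record(ExpansionMap &Map, const SourceKey &Key,
            const std::string &Expanded) {
  Map[Key].insert(Expanded);
}

// The "union" step: fold a map built independently (for example on
// another thread) into Dest.
void mergeInto(ExpansionMap &Dest, const ExpansionMap &Src) {
  for (const auto &Entry : Src)
    Dest[Entry.first].insert(Entry.second.begin(), Entry.second.end());
}

// Locations with more than one distinct value did not expand to the
// same thing everywhere.
std::vector<SourceKey> findInconsistent(const ExpansionMap &Map) {
  std::vector<SourceKey> Result;
  for (const auto &Entry : Map)
    if (Entry.second.size() > 1)
      Result.push_back(Entry.first);
  return Result;
}
```

Each lookup or insert by location is logarithmic in the number of recorded locations. Because mergeInto only adds to the sets, the order in which independent maps are merged does not affect the final result.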

-- Sean Silva

Sean,

Why a map instead of just a vector?

Why do you think it should be set up to do stuff in parallel with threads? I’ve run the existing version over a fairly large group of headers and it just took a few seconds.

-John

> Why a map instead of just a vector?

Because the key operation (the "inner loop") is looking up macro expansions
(or preprocessor conditionals, or whatever) by their physical source
location and seeing if this expansion differs from any previously seen
expansions. The lookup by physical source location needs to be fast, hence
a map.

> Why do you think it should be set up to do stuff in parallel with
> threads? I’ve run the existing version over a fairly large group of
> headers and it just took a few seconds.

I never said it should be. All I said is that it was easy to see how to
parallelize it. However, most codebase-at-a-time tools will probably want
to take advantage of multicore someday, so it's a good thing to keep in
mind.

-- Sean Silva