Emulation of MSVC's macro argument substitution

Hi all!

I was just looking at:

r184968 | rnk | 2013-06-26 13:16:08 -0400 (Wed, 26 Jun 2013) | 9 lines

Match MSVC’s handling of commas during macro argument expansion

This allows clang to parse the type_traits header in Visual Studio 2012,
which is included widely in practice.

This is a rework of r163022 by João Matos. The original patch broke
preprocessing of gtest headers, which this patch addresses.

Patch by Will Wilson!

… which makes a mistake very similar to the one I made in my first attempt at implementing this in PC-lint.

Initially, I was focussing on the comma token, and implemented a rule where, in certain contexts, it would not be regarded as an argument delimiter.

But that approach is shown to be wrong by, e.g.:

#define Comma ,
#define E1(a) E2(a)
#define E2(a,b) a * b

E1(1 Comma 2)

… for which Clang, GCC, and EDG produce “1 * 2”, and for which MSVC reports:

x.cpp(8) : warning C4003: not enough actual parameters for macro ‘E2’
1 , 2 *

With -fms-compatibility, Clang ToT (top of tree) does not currently reproduce MSVC’s output for this case.
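The ISO result can be checked mechanically on a conforming preprocessor. The following is a minimal sketch, assuming GCC or Clang without MSVC emulation; the operands 3 and 4 and the helper iso_product() are illustrative and not part of the original example.

```cpp
#define Comma ,
#define E1(a) E2(a)
#define E2(a, b) a * b

// ISO rules: the argument `3 Comma 4` is fully expanded to `3 , 4` before it
// is substituted into E1's body, so the rescan sees E2(3 , 4) with two
// arguments and produces 3 * 4.
static_assert(E1(3 Comma 4) == 12, "expected the ISO expansion 3 * 4");

// Hypothetical helper so the same result is also reachable from other code.
constexpr int iso_product() { return E1(3 Comma 4); }
```

A preprocessor that produced “3 , 4 *” instead would be expected to fail to compile this, since that token sequence is not a valid expression.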

To make sense of it all, it may help to start by digesting Dave Prosser’s original expand() algorithm from 1986 (which was used as the basis for the English text that eventually wound up in ISO C90—see also [c++std-core-9035]); the original version and a more recent version with corrections can be found here:

http://www.spinellis.gr/blog/20060626/

(Hint: In Dave Prosser’s formulation, if you think in terms of “expanding a macro”, you’re doing it wrong. Instead, a token sequence is expanded, and this may involve the expansion of the token sequence that results from replacing an invocation by the subst()’d expansion-list of the invoked macro.)

With that in mind, it might be easier to talk about how MSVC’s rule differs. In particular, James McNellis provided a very helpful response to:

http://stackoverflow.com/questions/12945911/gcc-vs-visual-studio-macro-expansion

… which was: “This is a known bug in the Visual C++ preprocessor: it does not expand macros prior to rescanning. gcc’s behavior is correct.”

I take this to mean that, where DaveProsser::subst() has:

return subst(IS’,FP,AP,HS,OS • expand(select(i,AP)));

…MSVC::subst() instead has the effect of:

return subst(IS’,FP,AP,HS,OS • select(i,AP));

This seems consistent with the test results that I’ve seen so far, except that we need some adjustment to the implementation of the stringize operator. (For #define S(t) #t, if an identifier is mentioned directly in the argument to t, then it needs to be added to the hide set for the evaluation of #t; and anything that is not in the hide set needs to be expanded. At least, that’s what seems to happen.)
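The difference between the two subst() variants can be modelled with a toy expander. This is only a sketch of the hypothesis above, not Dave Prosser's full algorithm and not MSVC's actual implementation: it has no hide sets (so self-referential macros would loop forever), no # or ## handling, and it requires space-separated tokens. All names in it (Macro, lex, expand, expand_text, pre_expand_args) are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

struct Macro {
    bool function_like;
    Tokens params;
    Tokens body;
};
using Table = std::map<std::string, Macro>;

// Toy lexer: tokens must be separated by spaces.
Tokens lex(const std::string& text) {
    std::istringstream in(text);
    Tokens out;
    for (std::string t; in >> t;) out.push_back(t);
    return out;
}

std::string join(const Tokens& ts) {
    std::string out;
    for (const auto& t : ts) out += (out.empty() ? "" : " ") + t;
    return out;
}

// pre_expand_args == true  : arguments are expanded before substitution (ISO).
// pre_expand_args == false : arguments are substituted unexpanded (the
//                            behavior hypothesized for MSVC in this thread).
Tokens expand(Tokens ts, const Table& macros, bool pre_expand_args) {
    Tokens out;
    std::size_t i = 0;
    while (i < ts.size()) {
        auto it = macros.find(ts[i]);
        if (it == macros.end()) { out.push_back(ts[i++]); continue; }
        const Macro& m = it->second;
        Tokens replacement;
        std::size_t end = i + 1;  // one past the end of the invocation
        if (!m.function_like) {
            replacement = m.body;
        } else {
            // A function-like macro name not followed by "(" is left alone.
            if (end >= ts.size() || ts[end] != "(") { out.push_back(ts[i++]); continue; }
            ++end;  // skip "("
            std::vector<Tokens> args(1);
            int depth = 1;
            while (end < ts.size()) {
                const std::string& t = ts[end++];
                if (t == "(") ++depth;
                if (t == ")" && --depth == 0) break;
                if (t == "," && depth == 1) args.emplace_back();
                else args.back().push_back(t);
            }
            assert(depth == 0 && "unterminated macro invocation");
            for (const auto& b : m.body) {
                auto p = std::find(m.params.begin(), m.params.end(), b);
                if (p == m.params.end()) { replacement.push_back(b); continue; }
                std::size_t k = static_cast<std::size_t>(p - m.params.begin());
                // A missing argument is treated as empty (MSVC warns C4003).
                Tokens arg = k < args.size() ? args[k] : Tokens{};
                if (pre_expand_args) arg = expand(arg, macros, pre_expand_args);
                replacement.insert(replacement.end(), arg.begin(), arg.end());
            }
        }
        // Rescan: splice the replacement in and continue from its first token.
        ts.erase(ts.begin() + i, ts.begin() + end);
        ts.insert(ts.begin() + i, replacement.begin(), replacement.end());
    }
    return out;
}

// The macros from the first example in this message.
Table example_macros() {
    Table m;
    m["Comma"] = Macro{false, Tokens{}, lex(",")};
    m["E1"] = Macro{true, lex("a"), lex("E2 ( a )")};
    m["E2"] = Macro{true, lex("a b"), lex("a * b")};
    return m;
}

std::string expand_text(const std::string& text, bool pre_expand_args) {
    return join(expand(lex(text), example_macros(), pre_expand_args));
}
```

With these definitions, expand_text("E1 ( 1 Comma 2 )", true) yields “1 * 2”, and expand_text("E1 ( 1 Comma 2 )", false) yields “1 , 2 *”, which matches the MSVC output quoted earlier. The model makes no attempt to cover the stringize adjustment or the ## behavior.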

That doesn’t cover everything, though: I still need to review MSVC’s ## behavior, and then there’s something odd going on with:

#define Comma ,
#define L (
#define E1(a) E2 L a )
#define E2(a, b) ( a + b )

E1( 1 Comma 2 )

… for which ISO behavior appears to produce:

E2 ( 1 , 2 )

… but MSVC produces:

( 1 + 2 )

… which might be explained by a version of expand() where the substitution of an invocation and the substitution of parameters both happen in the same frame. (So, first replace “L” by “(”, then replace “a” by “1 Comma 2”, then consume “)”, and then do the rescan.)

But, I’m not sure if that’s a priority. (I don’t know of any library header that actually depends on that.)

James Widman

That’s really interesting! As you guessed, though, it probably isn’t a priority for me until I find some code that relies on the remaining subtle differences.

Understood. And to be clear, I’m not advocating for a change to Clang at this time; I’m just experimenting with MSVC’s behavior and documenting what I can.

It might matter to code that uses Boost.Preprocessor, however.

Here’s another example:

#define Empty
#define T() T1 Empty ()
#define T1() T2 Empty ()
#define T2() T3 Empty ()
#define T3() T4 Empty ()
#define T4() T5 Empty ()
#define T5() T6 Empty ()
#define T6() T7 Empty ()
#define T7() T8 Empty ()
#define T8() T9 Empty ()
#define T9() >>INFINITE RECURSION ERROR<<

#define A(p) p

A(T()) // ISO C/C++: T2 ()
       // MSVC: >>INFINITE RECURSION ERROR<<

In this example, ISO rules indicate that the sub-sequence “T()” will be expanded twice. (Once for the “argument as a separate file” phase, and once again when the body of A is rescanned after parameter substitution.)

But since MSVC apparently performs two scans of a macro’s replacement list per macro invocation (one before parameter replacement and one “after”, even when there are no parameters to replace), every step of the chain is expanded immediately, and the expansion runs all the way to T9’s “infinite recursion” marker.
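The ISO half of that claim can be checked at compile time by stringizing the result. This is a sketch, assuming a conforming preprocessor such as GCC or Clang without MSVC emulation. The chain is shortened to T3, and STR, XSTR, same_tokens, and the CHAIN_RAN_TOO_FAR marker are hypothetical names that do not appear in the example above.

```cpp
#define Empty
#define T()  T1 Empty ()
#define T1() T2 Empty ()
#define T2() T3 Empty ()
#define T3() CHAIN_RAN_TOO_FAR
#define A(p) p

// Hypothetical helpers: XSTR expands its argument completely, then STR turns
// the resulting tokens into a string literal without expanding them again.
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

// Compares two strings while ignoring spaces, so the check does not depend
// on how a given compiler spaces stringized tokens.
constexpr bool same_tokens(const char* a, const char* b) {
    return *a == ' ' ? same_tokens(a + 1, b)
         : *b == ' ' ? same_tokens(a, b + 1)
         : *a != *b ? false
         : *a == '\0' ? true
         : same_tokens(a + 1, b + 1);
}

// Two expansion passes over T(): one for the argument, one for the rescan.
static_assert(same_tokens(XSTR(A(T())), "T2 ()"),
              "ISO rules should stop the chain at T2 ()");
```

Under the double-scan behavior described above, the same assertion would be expected to fail, because the chain would run to the end marker.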

Since I can’t use the pattern above to probe MSVC::expand(), I’m instead writing test templates that expand to uses of a trace list like “T4 ... T0”, as in:

#define T0() ()
#define T1() ()
#define T2() ()
#define T3() ()
#define T4() ()

#define A(p) p
A(T4 T3 T2 T1 T0() )
    // ISO: T4 T3 T2 ()
    // MSVC: T4 T3 T2 ()

A(A(T4 T3 T2 T1 T0() ))
    // ISO: T4 T3 ()
    // MSVC: T4 T3 T2 ()

A(A(A(T4 T3 T2 T1 T0() )))
    // ISO: T4 ()
    // MSVC: T4 T3 T2 ()
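The ISO column of this table can be verified the same way. This is a sketch, again assuming a conforming preprocessor; STR, XSTR, TRACE, and same_tokens are hypothetical helpers, not part of the test templates above. Each extra level of A(...) adds one rescan, and each rescan consumes one more name from the right end of the trace list.

```cpp
#define T0() ()
#define T1() ()
#define T2() ()
#define T3() ()
#define T4() ()
#define A(p) p

// Hypothetical helpers: expand fully, then stringize the resulting tokens.
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)
#define TRACE T4 T3 T2 T1 T0()

// Space-insensitive comparison, so token spacing in the string is irrelevant.
constexpr bool same_tokens(const char* a, const char* b) {
    return *a == ' ' ? same_tokens(a + 1, b)
         : *b == ' ' ? same_tokens(a, b + 1)
         : *a != *b ? false
         : *a == '\0' ? true
         : same_tokens(a + 1, b + 1);
}

static_assert(same_tokens(XSTR(A(T4 T3 T2 T1 T0() )), "T4 T3 T2 ()"), "one level");
static_assert(same_tokens(XSTR(A(A(T4 T3 T2 T1 T0() ))), "T4 T3 ()"), "two levels");
static_assert(same_tokens(XSTR(A(A(A(T4 T3 T2 T1 T0() )))), "T4 ()"), "three levels");
```

The MSVC column cannot be checked this way on a conforming compiler; it comes from the observations reported above.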

James Widman