Clang macro expansion differs from GCC

I have found a discrepancy in behavior between the clang preprocessor and gcc.

$ cat t.c
#define X 0 … 1

0 … 1 // Ok
X // Bad: space between …

$ gcc -E t.c

0 … 1
0 … 1

$ clang -E t.c

0 … 1
0 . . 1

The problem is that when “…” is in the macro and the macro gets expanded, it adds a space between the dots. As you can see, the space is not added when the “…” is entered directly in the text. GCC does not insert a space in either case.

I’m not sure this qualifies as a bug, but the reason I need this to work is that I am trying to compile GHC (a Haskell compiler) with clang. It uses GCC to preprocess c-- files (http://www.cminusminus.org) before parsing them and “…” is a valid operator in c–. Adding the extra space messes up the subsequent parsing phase.

I’d be happy to take a look at fixing this myself if someone could point me in the right direction.

Thanks!
-Dave

I have found a discrepancy in behavior between the clang preprocessor and gcc.

$ cat t.c
#define X 0 … 1

0 … 1 // Ok
X // Bad: space between …

$ gcc -E t.c

0 … 1
0 … 1

$ clang -E t.c

0 … 1
0 . . 1

The problem is that when “…” is in the macro and the macro gets expanded, it adds a space between the dots. As you can see, the space is not added when the “…” is entered directly in the text. GCC does not insert a space in either case.

I’m not sure this qualifies as a bug, but the reason I need this to work is that I am trying to compile GHC (a Haskell compiler) with clang. It uses GCC to preprocess c-- files (http://www.cminusminus.org) before parsing them and “…” is a valid operator in c–. Adding the extra space messes up the subsequent parsing phase.

Hi David,

I’m not particularly interested in fixing this: people abusing the C preprocessor to preprocess things that aren’t C don’t get much sympathy from me. However, if you are able to produce a patch that works, doesn’t break anything else, and doesn’t affect performance, I’d be happy to apply it.

I’d be happy to take a look at fixing this myself if someone could point me in the right direction.

lib/Frontend/PrintPreprocessedOutput.cpp is the code that outputs a .i file, the “AvoidConcat” stuff is what you want. It is what decides to add spaces to prevent three . tokens from looking like a … for example.

-Chris

Hi Chris,

Thanks for the quick response.

I have found a discrepancy in behavior between the clang preprocessor and gcc.

Hi David,

I’m not particularly interested in fixing this: people abusing the C preprocessor to preprocess things that aren’t C don’t get much sympathy from me. However, if you are able to produce a patch that works, doesn’t break anything else, and doesn’t affect performance, I’d be happy to apply it.

I can fully understand your position on this point. I thought I’d report it since the behavior was different than GCC and did prevent clang from being used as a drop-in replacement.

I’d be happy to take a look at fixing this myself if someone could point me in the right direction.

lib/Frontend/PrintPreprocessedOutput.cpp is the code that outputs a .i file, the “AvoidConcat” stuff is what you want. It is what decides to add spaces to prevent three . tokens from looking like a … for example.

Great, thanks for the pointer. I’ll take a look and if I can come up with something reasonable, I will send a patch.

-Dave

Hi Chris,

I took a look at modifying the preprocessor output and it was pretty
straightforward. I have attached a patch that makes the preprocessed output
less conservative in adding spaces between periods. It will now add a space to
avoid turning “… .” into “…”, but does not always add a space between two
consecutive dots.

The change was done by keeping track of the last two tokens that were output
and testing against those two tokens to avoid creating “…”. Previously only
the last output token was tracked and tested against, which caused any two
consecutive . tokens to have a space between them.

The patch passes all the tests in the tests subdirectory. I modified one test
to reflect the output changing from “. . .” to “… .”. This output happens to
match the expected output in the test comment. I also added a test for the
original case that was causing me problems.

Please let me know if there are more tests I should run or anything else that
needs to happen for the patch to be accepted. Thanks!

-David

PPdots.patch (4.75 KB)

This patch looks great, except that it happens to conflict on mainline. :frowning: Please send and updated patch and I’ll be happy to apply it for you. One minor thing is to watch & placement in "const Token& PrevPrevTok, "

Thanks for working on this!

-Chris

Hi Chris,

Here is an updated patch that applied cleanly to the trunk. I can see that it is quite a moving target so hopefully it is still valid by the time you get it :slight_smile: I also fixed the formatting of the & token to be consistent with the rest of the code. Please let me know if you see any more issues. Thanks!

-Dave

PPdots2.patch (4.72 KB)

Applied here, thanks!
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20100412/029325.html

If there is a PR tracking this, please close it.

-Chris