[PATCH] GCC Preprocessor Assertions revision 2

Hi all,

I’d like to get feedback and consensus on whether we should add “GCC Preprocessor Assertions” to clang. This gcc extension has been deprecated since gcc-3, here’s a quote from gcc 3.1 docs, (bold added by me):

From http://gcc.gnu.org/onlinedocs/gcc-3.1/cpp/Assertions.html

Assertions are a deprecated alternative to macros in writing conditionals to test what sort of computer or system the compiled program will run on. Assertions are usually predefined, but you can define them with preprocessing directives or command-line options.

Assertions were intended to provide a more systematic way to describe the compiler’s target system. However, in practice they are just as unpredictable as the system-specific predefined macros. In addition, they are not part of any standard, and only a few compilers support them. Therefore, the use of assertions is less portable than the use of system-specific predefined macros. We recommend you do not use them at all.

Do note that, after checking on http://ideone.com, gcc still supports them on gcc-4.3.4.

Andrew did great work on implementing the feature; apart from the preprocessor changes, he added support for them in the preprocessing record and serialization/deserialization support in the ASTReader/ASTWriter.
This is a significant amount of code that we have to maintain and make sure it works well with other parts of the codebase, and I’m worried about the maintenance burden that this deprecated feature will impose, e.g.

-it reserves 1 bit out of the IdentifierInfo object. There’s only 1 bit left currently so afterwards there will be non left to easily use for other purposes, without doubling the IdentifierInfo object’s size.

-it increases the size of the IdentifierInfo data that are stored in the serialized AST.

Although the changes are extensive, there may be more that are needed:

-gcc docs say that you can specify assertions by command-line options; I did not see this implemented in Andrew’s patches

-the feature was not integrated into our PCH validation, to be more specific, validating and giving errors when assertions from PCH clash with predefined assertions or assertions from the command-line

-It is not clear to me how this feature should interact with modules.

So, to recap, the question is this, should we implement and maintain a gcc extension that was deprecated since gcc 3 ?

-Argyrios

I think you’ve outlined well why we shouldn’t. This is a high-cost addition to Clang, and I completely agree with your assessment of that cost.

That leaves me with the question: why should we support this extension? I see two possible reasons, but perhaps others have better justifications. Here are my two cents. =]

  1. There is a large body of (likely legacy) proprietary code that uses this feature, and a company is interested in having Clang support for that body of code.

  2. There are enough open-source codebases still using this feature that largely open-source-based code eco-systems are likely to have this missing feature as a barrier to entry.

To address these in reverse order, #2 I simply don’t believe to be true. We have built a substantial amount of open source code with Clang and have not once run into this. I’m confident that it will remain untrue as GCC has also left this feature behind. I don’t think there is any compelling reason to believe this extension matters for adoption of Clang.

On the other hand, #1 is certainly plausible. That said, it is only a good reason if the company with this large body of code is going to fund long-term maintenance and improvements of Clang. I think it’s risky to accept such extensive changes until the long-term commitment has actually materialized. I think a branch or something similar to reduce their merge costs until we’re satisfied there is a good maintenance story would be better.

-Chandler

What software out there depends on this feature?

Sebastian

From a cursory investigation of this feature, it looks like it should be straightforward to write a tiny awk script to convert any users of it, which don’t use #unassert, to macros:

#assert a(b) → #define a 1 #define a_b 1
#a (inside pp condition) → defined(a)
#a(b) (inside pp condition) → defined(a_b)

Given that, and your analysis of the costs, I think we would need quite a compelling reason to accept this extension. Unless someone can present a solid argument in favor of this extension, it does not seem valuable to me.

Hi Argyrios,

Thank you for reviewing my patches.

I have replied to your comments inline below.

I ran into preprocessor assertions when I was trying to build an older version of Grub which is what prompted my implementation.

Kind regards,
Andrew

Hi all,

I’d like to get feedback and consensus on whether we should add “GCC Preprocessor Assertions” to clang. This gcc extension has been deprecated since gcc-3, here’s a quote from gcc 3.1 docs, (bold added by me):

From http://gcc.gnu.org/onlinedocs/gcc-3.1/cpp/Assertions.html

Assertions are a deprecated alternative to macros in writing conditionals to test what sort of computer or system the compiled program will run on. Assertions are usually predefined, but you can define them with preprocessing directives or command-line options.

Assertions were intended to provide a more systematic way to describe the compiler’s target system. However, in practice they are just as unpredictable as the system-specific predefined macros. In addition, they are not part of any standard, and only a few compilers support them. Therefore, the use of assertions is less portable than the use of system-specific predefined macros. We recommend you do not use them at all.

Do note that, after checking on http://ideone.com, gcc still supports them on gcc-4.3.4.

Andrew did great work on implementing the feature; apart from the preprocessor changes, he added support for them in the preprocessing record and serialization/deserialization support in the ASTReader/ASTWriter.
This is a significant amount of code that we have to maintain and make sure it works well with other parts of the codebase, and I’m worried about the maintenance burden that this deprecated feature will impose, e.g.

-it reserves 1 bit out of the IdentifierInfo object. There’s only 1 bit left currently so afterwards there will be non left to easily use for other purposes, without doubling the IdentifierInfo object’s size.

If anyone has suggestions on how to avoid this, I am happy to consider reworking my patch to incorporate such an improvement.

-it increases the size of the IdentifierInfo data that are stored in the serialized AST.

I think this is unavoidable if the feature is going to be implemented, but again suggestions on how to improve things are most welcome.

Although the changes are extensive, there may be more that are needed:

-gcc docs say that you can specify assertions by command-line options; I did not see this implemented in Andrew’s patches

I am aware the command-line option (-A) to assert a key-value pair is not implemented in the patches I have submitted. I didn’t write this support yet because clang uses -A for other things and I wasn’t sure if that needed to be changed or if we were going to use a different flag - the actual patch will be relatively small. If the feature is accepted and guidance is provided as to what flag we should use I am happy to write and contribute this patch

-the feature was not integrated into our PCH validation, to be more specific, validating and giving errors when assertions from PCH clash with predefined assertions or assertions from the command-line

I do not believe there is anything that needs to be validated in terms of PCH vs command-line vs source preprocessor assertions. You can assert multiple values for a given key quite correctly (unlike macros) and assertions are designed to be added and removed using all of these features and it is not wrong to do so. Further, asserting the same key-value pair twice is not an error and so I don’t think there needs to be validation of clashes since, by design, these clashes are permitted.

-It is not clear to me how this feature should interact with modules.

I am not sure how these would interact since the reference implementation, GCC, does not support modules in the versions I have tested. Doug, do you have any thoughts on this since you have worked on modules? Since the behavior is undefined we can probably do whatever is easiest and makes the most sense.

So, to recap, the question is this, should we implement and maintain a gcc extension that was deprecated since gcc 3 ?

I think clang should implement and maintain this feature because

  1. It’s widely supported by existing compilers including GCC 4.6.X (), Intel C/C++ (), and ARM’s toolchain () 2) Preprocessor assertions are an optional extension defined in System V Release 4 (SVR4) see the System V Interface manual Vol 3 page 270 () and so should be implemented to support SVR4 systems which have chosen to implement this feature. 3) It was clearly identified as a missing feature by FIXMEs in Chris’ original commit of PPDirectives.cpp and currently the compiler produces only cryptic errors when this feature is encountered. 4) It is not possible to work around this feature without having to modify the source and doing so with an automated sed or awk script is risky and difficult to integrate into some build processes. I did consider trying to emulate assertions using macros, but I found this wasn’t possible using only the preprocessor. Asserts and macros exist in different namespaces and so you can define and assert the same symbol. In many cases this is not a problem because all we want to do is test if a macro is defined, but if the value of the macro is important then a collision in the names would cause a problem. The assertion tests themselves would have to be found and rewritten using a sed or awk script which is a fragile hack rather than a proper fix and having to run this script before a compile would complicate the build process etc. This kind of workaround also assumes that the source can be modified prior to compilation. 5) Clang has publicly stated that it will aim for GCC compatibility where possible and these patches make compatibility with this feature possible.

Hi Andrew, I’m going to comment a bit out-of-order.

-the feature was not integrated into our PCH validation, to be more specific, validating and giving errors when assertions from PCH clash with predefined assertions or assertions from the command-line

I do not believe there is anything that needs to be validated in terms of PCH vs command-line vs source preprocessor assertions. You can assert multiple values for a given key quite correctly (unlike macros) and assertions are designed to be added and removed using all of these features and it is not wrong to do so. Further, asserting the same key-value pair twice is not an error and so I don’t think there needs to be validation of clashes since, by design, these clashes are permitted.

I had in mind checking whether there are differences in the assertions in command-line and predefined ones, (missing assertions, newly introduced), causing differences in PCH inclusion, like we do with macros, but I noticed that gcc does not care for assertion differences in command-line (when including the PCH) and even if the predefined ones are different, e.g including a PCH with

#assert cpu(i386)

when you are on x86_64, the PCH will probably be rejected by PCH validation anyway, so we probably don’t need to enhance validation to include assertions.

So, to recap, the question is this, should we implement and maintain a gcc extension that was deprecated since gcc 3 ?

I think clang should implement and maintain this feature because

  1. It’s widely supported by existing compilers including GCC 4.6.X (), Intel C/C++ (), and ARM’s toolchain ()2) Preprocessor assertions are an optional extension defined in System V Release 4 (SVR4) see the System V Interface manual Vol 3 page 270 () and so should be implemented to support SVR4 systems which have chosen to implement this feature.3) It was clearly identified as a missing feature by FIXMEs in Chris’ original commit of PPDirectives.cpp and currently the compiler produces only cryptic errors when this feature is encountered.4) It is not possible to work around this feature without having to modify the source and doing so with an automated sed or awk script is risky and difficult to integrate into some build processes. I did consider trying to emulate assertions using macros, but I found this wasn’t possible using only the preprocessor. Asserts and macros exist in different namespaces and so you can define and assert the same symbol. In many cases this is not a problem because all we want to do is test if a macro is defined, but if the value of the macro is important then a collision in the names would cause a problem. The assertion tests themselves would have to be found and rewritten using a sed or awk script which is a fragile hack rather than a proper fix and having to run this script before a compile would complicate the build process etc. This kind of workaround also assumes that the source can be modified prior to compilation.5) Clang has publicly stated that it will aim for GCC compatibility where possible and these patches make compatibility with this feature possible.

“aim for GCC compatibility” does not mean “implement the exact behavior of gcc unconditionally”.
There are gcc extensions that we don’t have interest in implementing, like nested functions, and when the standard disagrees with gcc we side with the standard, see http://clang.llvm.org/compatibility.html.

It does not help the case for this feature that gcc itself is aggressively discouraging people from using it, e.g. this is from gcc 4.6.2:

$ cat t.c
#assert a(b)
#if #a(b)
#endif

$ gcc -fsyntax-only t.c
t.c:1:2: warning: #assert is a deprecated GCC extension [-Wdeprecated]
t.c:2:5: warning: assertions are a deprecated extension [-Wdeprecated]

I find the prospect that we would implement a new feature and then put it under a “deprecated” warning, at least unfortunate.

Since we in clang would discourage people from using the feature as well, we would implement it mainly to support legacy codebases or projects that, for some reason, still depend on a deprecated (and not widely used) gcc extension.
Under the this frame of discussion:

I ran into preprocessor assertions when I was trying to build an older version of Grub which is what prompted my implementation.

Could you elaborate more, is this legacy codebase that nobody wants to touch or move to a newer version ?
-Is the feature widely used in your company’s codebase or not ?
-Is it only a minority of projects that you care about that depend on it ?
-Could these just keep using gcc for building ?

-Argyrios

>From a cursory investigation of this feature, it looks like it should be
straightforward to write a tiny awk script to convert any users of it, which
don't use #unassert, to macros:

#assert a(b) -> #define a 1 #define a_b 1
#a (inside pp condition) -> defined(a)
#a(b) (inside pp condition) -> defined(a_b)

Given that, and your analysis of the costs, I think we would need quite a
compelling reason to accept this extension. Unless someone can present a
solid argument in favor of this extension, it does not seem valuable to me.

+1

- Daniel

HI Argyrios,

Comments below in response to your questions and statements.

Regards,
Andrew

Hi Andrew, I’m going to comment a bit out-of-order.

-the feature was not integrated into our PCH validation, to be more specific, validating and giving errors when assertions from PCH clash with predefined assertions or assertions from the command-line

I do not believe there is anything that needs to be validated in terms of PCH vs command-line vs source preprocessor assertions. You can assert multiple values for a given key quite correctly (unlike macros) and assertions are designed to be added and removed using all of these features and it is not wrong to do so. Further, asserting the same key-value pair twice is not an error and so I don’t think there needs to be validation of clashes since, by design, these clashes are permitted.

I had in mind checking whether there are differences in the assertions in command-line and predefined ones, (missing assertions, newly introduced), causing differences in PCH inclusion, like we do with macros, but I noticed that gcc does not care for assertion differences in command-line (when including the PCH) and even if the predefined ones are different, e.g including a PCH with

#assert cpu(i386)

when you are on x86_64, the PCH will probably be rejected by PCH validation anyway, so we probably don’t need to enhance validation to include assertions.

That agrees with what I was thinking.

So, to recap, the question is this, should we implement and maintain a gcc extension that was deprecated since gcc 3 ?

I think clang should implement and maintain this feature because

  1. It’s widely supported by existing compilers including GCC 4.6.X (), Intel C/C++ (), and ARM’s toolchain () 2) Preprocessor assertions are an optional extension defined in System V Release 4 (SVR4) see the System V Interface manual Vol 3 page 270 () and so should be implemented to support SVR4 systems which have chosen to implement this feature. 3) It was clearly identified as a missing feature by FIXMEs in Chris’ original commit of PPDirectives.cpp and currently the compiler produces only cryptic errors when this feature is encountered. 4) It is not possible to work around this feature without having to modify the source and doing so with an automated sed or awk script is risky and difficult to integrate into some build processes. I did consider trying to emulate assertions using macros, but I found this wasn’t possible using only the preprocessor. Asserts and macros exist in different namespaces and so you can define and assert the same symbol. In many cases this is not a problem because all we want to do is test if a macro is defined, but if the value of the macro is important then a collision in the names would cause a problem. The assertion tests themselves would have to be found and rewritten using a sed or awk script which is a fragile hack rather than a proper fix and having to run this script before a compile would complicate the build process etc. This kind of workaround also assumes that the source can be modified prior to compilation. 5) Clang has publicly stated that it will aim for GCC compatibility where possible and these patches make compatibility with this feature possible.

“aim for GCC compatibility” does not mean “implement the exact behavior of gcc unconditionally”.
There are gcc extensions that we don’t have interest in implementing, like nested functions, and when the standard disagrees with gcc we side with the standard, see http://clang.llvm.org/compatibility.html.

I am fully aware that Clang has chosen not to implement some gcc quirks and extensions and I appreciate that Clang is not going to blindly implement features just because gcc does. I was not trying to argue that Clang should do it just because gcc does it, but I was trying to say that because GCC and other compilers implement the feature, there is a reason Clang may want to do so as well and that doing so would increase compatibility.

It does not help the case for this feature that gcc itself is aggressively discouraging people from using it, e.g. this is from gcc 4.6.2:

$ cat t.c
#assert a(b)
#if #a(b)
#endif

$ gcc -fsyntax-only t.c
t.c:1:2: warning: #assert is a deprecated GCC extension [-Wdeprecated]
t.c:2:5: warning: assertions are a deprecated extension [-Wdeprecated]

I find the prospect that we would implement a new feature and then put it under a “deprecated” warning, at least unfortunate.

While the feature is deprecated, it has been deprecated for a long time without being removed or disabled. The deprecation has continued across a major version and multiple minor versions, so while it may be slated for removal eventually, the fact that it has continued to exist for such a long time, I feel, says something about the feature especially in light of its inclusion in the SVR4 standard and implementation in other compilers from other vendors.

Since we in clang would discourage people from using the feature as well, we would implement it mainly to support legacy codebases or projects that, for some reason, still depend on a deprecated (and not widely used) gcc extension.
Under the this frame of discussion:

I ran into preprocessor assertions when I was trying to build an older version of Grub which is what prompted my implementation.

Could you elaborate more, is this legacy codebase that nobody wants to touch or move to a newer version ?
-Is the feature widely used in your company’s codebase or not ?
-Is it only a minority of projects that you care about that depend on it ?
-Could these just keep using gcc for building ?

Here at Oracle we are using clang to compile C/C++ source to LLVM bitcode as part of the Parfait project () which is a highly scalable bug checking tool with a very low false positive rate. We have migrated from llvm-gcc to clang as our compiler for a number of reasons including the EOL announcement for llvm-gcc. Using a standard gcc is not an option because we have to produce bitcode. We have considered dragonegg, but it does not support all the target platforms we support. We have encountered this feature in codebases that are actively used in the company, these are codebases undergoing active development and that are not in a maintenance mode, and these products are being sold to and used by our customers. As a result, we feel that support for this feature in mainline Clang is needed. Unfortunately, I cannot make any more specific comments about the content of Oracle’s codebases or future development plans.

Hi Andrew,

I think it’s fair to say that the consensus of the community is that we won’t implement it.

We have encountered this feature in codebases that are actively used in the company, these are codebases undergoing active development and that are not in a maintenance mode, and these products are being sold to and used by our customers

Since they are in active development, would you find it reasonable to consider “migrating” them away from the use of preprocessor assertions ? The nature of the feature allows doing it through an automatic tool (e.g. based on clang for more accuracy) to rewrite the assertions to using macros instead.

-Argyrios