Partial Pre-processing Question

I am wondering if there is some hidden option or mechanism in Clang that I can use to pre-process like ‘-E’ or ‘-frewrite-includes’, but which does not expand the headers from the system include directories?

There are often times when I would like this type of capability, so that I can create test cases for reported bugs and have them remain valid with future revisions of the compiler and its supporting headers and libraries. It is not uncommon for a new version of a system header (ISO C, ISO C++, or our own extended headers) to have changes that are not 100% compatible with the expanded output of an older version.

The headers belonging to the programmer and their source need to be expanded, but I would like to retain the unexpanded system headers, and in particular not expand the macros defined in a system header, and leave the system headers as ‘#include <sysheadername>’.

Thanks,

MartinO - Movidius Ltd.

I don’t know how to restrict expansion but compiler tests are usually reduced to be independent of system headers, for the exact reason you stated.
C-reduce is highly useful for this task:

https://embed.cs.utah.edu/creduce/

I am wondering if there is some hidden option or mechanism in Clang that I
can use to pre-process like ‘-E’ or ‘-frewrite-includes’, but which does
not expand the headers from the system include directories?

There are often times when I would like this type of capability, so that I
can create test cases for reported bugs and have them remain valid with
future revisions of the compiler and its supporting headers and libraries.
It is not uncommon for a new version of a system header (ISO C, ISO C++, or
our own extended headers) to have changes that are not 100% compatible with
the expanded output of an older version.

I'm not quite following what you've said here /\ - if the test case has
fully preprocessed code then it won't have any system headers to conflict
with updated system headers, no? It'll be standalone. I've certainly found
these sort of test cases to be fairly robust - but I usually do the
reduction (as Yaron was mentioning) up-front, so I may not've run into the
situations you're describing. Given how much the compiler has to be robust
to user code backwards compatibility, I'd be surprised if it's often the
case that a system header used a feature that the compiler soon became
incapable of processing.

Let’s say for example I have:

#include <stddef.h>
#include "myfile.h"

int main() {
  foo(NULL);
}

and “myfile.h” has:

extern void foo(); // Non-prototype declaration

When fully pre-processed this will have ‘<stddef.h>’ and “myfile.h” fully expanded. If my definition of NULL is just ‘0’, this becomes:

# 1 "myfile.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 306 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "myfile.c" 2
# 1 "<path-to-stdinc>/stddef.h" 1 3 4
# 51 "<path-to-stdinc>/stddef.h" 3 4
typedef int ptrdiff_t;
# 62 "<path-to-stdinc>/stddef.h" 3 4
typedef unsigned int size_t;
# 90 "<path-to-stdinc>/stddef.h" 3 4
typedef unsigned char wchar_t;
# 118 "<path-to-stdinc>/stddef.h" 3 4
# 1 "<path-to-stdinc>/__stddef_max_align_t.h" 1 3 4
# 35 "<path-to-stdinc>/__stddef_max_align_t.h" 3 4
typedef struct {
  long long __clang_max_align_nonce1
      __attribute__((__aligned__(__alignof__(long long))));
  long double __clang_max_align_nonce2
      __attribute__((__aligned__(__alignof__(long double))));
} max_align_t;
# 119 "<path-to-stdinc>/stddef.h" 2 3 4
# 1 "myfile.c" 2
# 1 "./myfile.h" 1
extern void foo();
# 2 "myfile.c" 2
int main() {
  foo(0);
}

But what I would really like is something like:

# 1 "myfile.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 306 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "myfile.c" 2
#include <stddef.h>
# 1 "myfile.c" 2
# 1 "./myfile.h" 1
extern void foo();
# 2 "myfile.c" 2
int main() {
  foo(NULL);
}

The example is contrived, but assume that ‘int’ is 32-bit, ‘void*’ is 64-bit, and the actual definition of ‘foo’ takes a pointer argument. Then the original definition in ‘<stddef.h>’ is wrong, and while debugging I realise “hey, I should have defined NULL to be ‘((void*)0)’”. If my test case contained all of the user code pre-processed but not the system includes, I could quickly evaluate the fix.
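To make the width mismatch concrete, here is a minimal sketch. The macro names NULL_AS_INT and NULL_AS_PTR and the helper functions are made up for illustration and are not taken from any real header. Because ‘foo’ is declared without a prototype, the argument is not converted to the parameter type, so the size of the expression itself is what gets passed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two candidate definitions of NULL. */
#define NULL_AS_INT 0
#define NULL_AS_PTR ((void *)0)

/* Size of the argument that a call through a non-prototype declaration
 * would pass for each form (no conversion to the parameter type occurs). */
size_t int_form_size(void) { return sizeof(NULL_AS_INT); }
size_t ptr_form_size(void) { return sizeof(NULL_AS_PTR); }

/* Returns 1 when the int form is narrower than a pointer on this target,
 * e.g. with a 32-bit int and a 64-bit void*. */
int int_form_is_too_narrow(void) {
    return int_form_size() < sizeof(void *);
}

/* Sanity checks that hold on any conforming implementation. */
void check_null_forms(void) {
    assert(int_form_size() == sizeof(int));
    assert(ptr_form_size() == sizeof(void *));
}
```

On a target like the one described above, int_form_is_too_narrow() would return 1, which is exactly the case where ‘foo(0)’ in the fully expanded test case hides the fix made in the header.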

This is a really trivialised example; in practice bug reports are usually much larger. It also assumes I have a bug in the definition of NULL, which is unlikely. In the real world, though, I may be updating the system headers from one version to a newer one, or I might decide to distribute ‘uClibc’ headers instead of ‘newlib’ headers, and I want my test cases to stay valid against the revised system headers.

Having an option to allow this kind of partial pre-processing could be very useful. I can write scripts which collapse the expanded header back into ‘#include <stddef.h>’ from the normal pre-processed file, but I can’t undo the expansion of the macros it defines, such as NULL in this case. I could also write an external tool to do this, but it would be hard to ensure that it accurately mimics the search paths used by Clang, and there is plenty of opportunity for error and divergence over time. An integrated option would not have this problem.

I’m not proposing the addition of such an option, but was curious if Clang already had such a feature that I had missed when browsing the huge number of options, especially since some options have no help text associated with them and are invisible to both ‘--help’ and ‘--help-hidden’.
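The collapsing script mentioned above could look roughly like the following sketch. It relies on the flags at the end of each line marker (1 = entering a file, 2 = returning to a file, 3 = system header). The names collapser, collapse_line and collapse_stream are hypothetical, and using the basename of the path as the header name is a simplifying assumption that would be wrong for something like <sys/types.h>. As noted, it cannot restore macros such as NULL that were already expanded in user code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Nesting depth inside a system header; 0 means we are in user code. */
struct collapser { int sys_depth; };

/* Feed one line (no trailing newline). Returns 1 and fills `out` with the
 * text to emit, or returns 0 if the line is dropped. */
int collapse_line(struct collapser *c, const char *line,
                  char *out, size_t cap) {
    char file[256];
    int pos = 0, enter = 0, leave = 0, is_system = 0;

    assert(cap > 0);
    out[0] = '\0';
    /* A line marker looks like: # <line> "<file>" <flags...> */
    if (sscanf(line, "# %*d \"%255[^\"]\"%n", file, &pos) == 1 && pos > 0) {
        for (const char *p = line + pos; *p; ++p) {
            if (*p == '1') enter = 1;
            else if (*p == '2') leave = 1;
            else if (*p == '3') is_system = 1;
        }
        if (c->sys_depth > 0) {
            if (enter) {
                c->sys_depth++;            /* nested include */
            } else if (leave) {
                c->sys_depth--;
                if (c->sys_depth == 0) {   /* back in user code */
                    snprintf(out, cap, "%s", line);
                    return 1;
                }
            }
            return 0;
        }
        if (enter && is_system) {
            /* Assumption: the basename is what the user wrote in <...>. */
            const char *base = strrchr(file, '/');
            c->sys_depth = 1;
            snprintf(out, cap, "#include <%s>", base ? base + 1 : file);
            return 1;
        }
    } else if (c->sys_depth > 0) {
        return 0;                          /* body of a system header */
    }
    snprintf(out, cap, "%s", line);        /* user code or user marker */
    return 1;
}

/* Filter a whole pre-processed file, e.g. from stdin to stdout. */
void collapse_stream(FILE *in, FILE *out) {
    struct collapser c = {0};
    char line[4096], text[4096];
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\n")] = '\0';
        if (collapse_line(&c, line, text, sizeof text))
            fprintf(out, "%s\n", text);
    }
}
```

Calling collapse_stream(stdin, stdout) on the fully expanded output shown earlier should leave the user code and its markers intact and replace the whole ‘stddef.h’ region with a single ‘#include <stddef.h>’ line, but ‘foo(0)’ stays as it is.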

Thanks,

MartinO - Movidius Ltd.

I’m not proposing the addition of such an option, but was curious if Clang
already had such a feature that I had missed when browsing the huge number
of options, especially since some options have no help text associated with
them and are invisible to both ‘--help’ and ‘--help-hidden’.

Right - we don't. I don't think we'd object to having one, we just haven't
had a need for one so no one's done the work.

- Dave

Thanks Dave. Mostly I have not needed to alter Clang, as I am busy enough with our LLVM backend, but I will keep in mind the possibility of adding this kind of facility and eventually proposing it for inclusion.

All the best,

MartinO

Thanks for this link. I hadn’t come across this before and I can see how I can put it to good use :-)

MartinO

It is feasible to add a command-line switch that skips preprocessing of system include files and instead prints the include directive as-is. A starting point would be to hack around in lib/Lex/PPDirectives.cpp.