If so, I'm assuming new ones have to be written from scratch due to
the license these files carry in GCC. (GPL v2+, with exceptions for
binaries compiled by GCC.) Correct?
Yep, in the fullness of time, we will tackle that. For now, we're
just assuming everyone has GCC, and we're glomming onto the headers
they already provide.
Are replacement headers something the project would like at this time?
If nothing else, it'll give me an excuse to dig into the C standard to a
far greater extent than I ever have.
Yes, that would be very useful. The GCC includes have a whole lot of "stuff", much of which is target specific (e.g. SSE/altivec headers). In order to incrementally deploy this, we can set clang up to search its header directory before the GCC header directory (which is hacked into clang right now). For example, we could replace *just* iso646.h (which is trivial) while leaving xmmintrin.h alone.
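To give a sense of how small a from-scratch iso646.h can be, here is a rough sketch written from the macro list in the C standard (section 7.9, "Alternative spellings") rather than from the GCC file. The include-guard name and the in_range helper are made up for this example.

```c
/* Sketch of a standalone iso646.h; the guard name is hypothetical. */
#ifndef EXAMPLE_ISO646_H
#define EXAMPLE_ISO646_H

/* In C++ these spellings are keywords, so only define them for C. */
#ifndef __cplusplus
#define and    &&
#define and_eq &=
#define bitand &
#define bitor  |
#define compl  ~
#define not    !
#define not_eq !=
#define or     ||
#define or_eq  |=
#define xor    ^
#define xor_eq ^=
#endif

#endif /* EXAMPLE_ISO646_H */

/* Hypothetical helper that exercises a few of the spellings. */
static int in_range(int x, int lo, int hi) {
    return x >= lo and x <= hi and not (lo > hi);
}
```

Nothing in it depends on the target, which is why it is a good first header to replace.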
One big thing I really dislike about the GCC headers is that they are target-specific. I think that clang is a great opportunity to finally get some of this stuff right vs GCC. Some headers (such as iso646.h) are target independent, and simple enough to do. In other stuff like limits.h GCC has mostly the right idea. They basically have it boil down to stuff like:
#define CHAR_BIT __CHAR_BIT__
#define SCHAR_MAX __SCHAR_MAX__
#define SCHAR_MIN (-SCHAR_MAX - 1)
#if __SCHAR_MAX__ == __INT_MAX__
# define UCHAR_MAX (SCHAR_MAX * 2U + 1U)
#else
# define UCHAR_MAX (SCHAR_MAX * 2 + 1)
#endif
etc. The nice thing about this is that the header itself is target-independent, being derived from the builtin macros (like __CHAR_BIT__) that get dumped into the preprocessor when the compiler starts up.
The problem with this approach is that it requires dumping a ton of macros into the compiler when it starts up, which is suboptimal. Instead of having the current grab-bag of pre-defined macros, I'd like to move to a more consistent set of extension points. Specifically, I think we should extend the grammar to support a new builtin, and use it to query properties of the machine. For example, we could use:
#define CHAR_BIT __builtin_config_info("target_char_bit")
#define SCHAR_MAX __builtin_config_info("target_schar_max")
This should be parsed as a "builtin" builtin like __builtin_type_compatible_p, which has its own parsing logic and builds its own explicit AST. The nice thing about this is that it preserves some amount of target parameterization in the AST, reduces the amount of stuff we have to slam into the macro table at startup, reduces pressure on the identifier table, and is nicely extensible to other things in the future.
Getting this right will require updating the code to be able to handle __builtin_config_info as an [integer] constant expression, handle its use in the preprocessor conditional, etc.
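Purely as an illustration of the key/value lookup such a builtin implies, here is a small runtime model in C. It is not how the builtin would be implemented (the real thing would be folded by the compiler into a constant expression); the struct, the function name example_config_info, and the table values for an imaginary target with 8-bit chars are all assumptions. Only the two key strings come from the proposal above.

```c
#include <string.h>

/* Hypothetical model of the key/value table a compiler might consult. */
struct config_entry {
    const char *key;
    long value;
};

/* Assumed values for an imaginary target with 8-bit chars. */
static const struct config_entry example_target[] = {
    { "target_char_bit", 8 },
    { "target_schar_max", 127 },
};

/* Returns 1 and stores the value in *out if the key is known,
   otherwise returns 0 and leaves *out untouched. */
static int example_config_info(const char *key, long *out) {
    size_t n = sizeof example_target / sizeof example_target[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(example_target[i].key, key) == 0) {
            *out = example_target[i].value;
            return 1;
        }
    }
    return 0;
}
```

An unknown key is reported as a failure here; in the compiler it would presumably be a diagnostic, since the query has to fold to a constant.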
The 'risk' to this is that it will change the preprocessed output of the compiler vs GCC. For example, something silly like this will expand differently.
#define foo(x) # x
However, anything that relies on that is dangerously non-conformant anyway, so I don't feel too bad about breaking it.
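One way the difference could show up is with the usual two-level stringification idiom, sketched below. The EX_ names are hypothetical, and the value 8 stands in for whatever the predefined macro would expand to on a given target. The builtin is never actually called here; it only appears as text that gets turned into a string.

```c
/* Classic two-level stringification: the outer macro expands its
   argument before the inner one turns it into a string literal. */
#define EX_STR(x)  #x
#define EX_XSTR(x) EX_STR(x)

/* Today: the limit is an ordinary macro that ends up as a number.
   8 is an assumed stand-in for the target's value. */
#define EX_CHAR_BIT_MACRO 8

/* Proposed: the limit expands to a call to the new builtin. */
#define EX_CHAR_BIT_BUILTIN __builtin_config_info("target_char_bit")

/* Preprocessed text seen by code that stringifies the limit. */
static const char *ex_macro_text(void)   { return EX_XSTR(EX_CHAR_BIT_MACRO); }
static const char *ex_builtin_text(void) { return EX_XSTR(EX_CHAR_BIT_BUILTIN); }
```

With the macro form the string is the number itself; with the builtin form it is the spelling of the builtin call, which is the kind of change in preprocessed output described above.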
Looking forward, I think we should aim to have a single directory of headers for clang, that are not "autoconfed". This means that arch-specific headers like xmmintrin.h need to be included in the directory of headers. We would just add something like '#ifndef __i386__ / #error "This is an i386-specific header" / #endif' to the top of the file. Having a single unified header directory makes it much easier for clang to support an arbitrary "--triple" option to control target selection at runtime, instead of only working for the arch it was configured for.
I shudder to think of having to get the standard C++ headers in place...
Heh, no worries, we'll just use libstdc++ or some other well known STL when the time comes.