Unicode update

Just a quick post to update my plans for finishing the Unicode patches.

I've been missing in action with C++ committee work for the last couple of weeks, but I plan to resume where I left off next week.

Currently I have a reviewed patch outstanding that extends support to the broad range of string literals in C++0x. I'll need to bring it up to date with the latest branch, but I expect to clear that up on Monday before starting on the next steps.

While my own focus is clearly C++, especially C++0x, there seems to be a reasonable overlap with C in this area, so I would like to make sure this is handled well before we freeze for the first Clang release. The specific features I think I should focus on are:

i/ UCN (universal character name) support, as required by C99.
ii/ Unicode TR (ISO/IEC TR 19769) support, which essentially means typedefs for the Unicode character types (char16_t/char32_t) and support for their literals.
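
A minimal C sketch of the two features, assuming a compiler that already provides <uchar.h> and encodes u"" literals as UTF-16 and U"" literals as UTF-32 (as C11 later standardised). The array names and the len16/len32 helpers are hypothetical, for illustration only:

```c
#include <assert.h>
#include <stddef.h>  /* size_t */
#include <uchar.h>   /* char16_t, char32_t */

/* i/ UCN: \u00E9 names U+00E9 (LATIN SMALL LETTER E WITH ACUTE) using only
   ASCII in the source file. */
static const char16_t cafe16[] = u"caf\u00E9";   /* 4 code units + NUL */

/* ii/ TR literals: u"" has char16_t elements, U"" has char32_t elements.
   U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair. */
static const char16_t grin16[] = u"\U0001F600";  /* 2 code units + NUL */
static const char32_t grin32[] = U"\U0001F600";  /* 1 code unit + NUL */

/* Count code units up to (not including) the terminating NUL. */
static size_t len16(const char16_t *s) {
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}

static size_t len32(const char32_t *s) {
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

With these definitions, checks such as assert(len16(grin16) == 2) and assert(len32(grin32) == 1) should hold: U+1F600 takes the surrogate pair 0xD83D 0xDE00 in UTF-16 but a single char32_t.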

The Unicode TR has some library requirements as well, but I'm not sure what to do about those.

Should I require a command-line flag to enable Unicode in C99, or should this be the default?

If a flag is preferred, could someone suggest something appropriate?

Also, are there any other outstanding issues with Unicode, UCN or literal support in C I should be aware of? While I am in this part of the parser, I would prefer to sign off that we are feature complete before moving on to other issues.

AlisdairM

For Unicode in comments and strings, I don't see why we'd need a
flag. We might want to copy gcc's -fextended-identifiers option,
though, because I'm not sure such constructs will actually work
correctly. gcc generates and binutils accepts assembler constructs
like "call 風", but llvm-gcc doesn't generate ABI-compatible code, and
I'm not sure what other toolchains do.
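
To make the identifier case concrete, a small sketch, assuming a compiler that accepts UCNs in identifiers in C99 mode (current GCC documents -fextended-identifiers as enabled by default for C99 and later; older GCC needed the flag). The names call_wind and wind_utf8 are hypothetical. The string literal raises no linkage questions; the externally visible function name is what has to reach the assembler and linker:

```c
#include <assert.h>

/* C99 permits UCNs in identifiers; U+98A8 is in the CJK range that C99
   Annex D allows. Because this function has external linkage, its
   non-ASCII name has to survive into the assembler and object file. */
int \u98A8(void) {
    return 42;
}

/* Without optimisation, a compiler may emit a direct call here whose target
   is the UTF-8 spelling of the name (the kind of assembler construct
   discussed in this thread); the exact spelling varies by compiler/target. */
int call_wind(void) {
    return \u98A8();
}

/* Extended characters in a string literal are just data: with a UTF-8
   execution character set this is 3 bytes plus the terminating NUL. */
static const char wind_utf8[] = "\u98A8";
```

For example, assert(call_wind() == 42) and assert(sizeof wind_utf8 == 4) should hold on a compiler whose execution character set is UTF-8.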

-Eli

Sorry, I was thinking here about support for char16_t/char32_t and the new literals from the Unicode TR, rather than about UCNs and extended Unicode characters in source files (which I assumed we should simply always support in C99 mode).

So should I add a command line switch to explicitly enable the Unicode TR?

Should I add an experimental C1x mode to support Unicode and _Static_assert in C?
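
For reference, a sketch of what such a mode would need to accept, written against these features as C1x was eventually published (C11). It assumes a C11-capable compiler with <uchar.h>; the variable names are illustrative only:

```c
#include <assert.h>
#include <limits.h>  /* CHAR_BIT */
#include <uchar.h>   /* char16_t, char32_t */

/* _Static_assert: checked at compile time, allowed at file scope. */
_Static_assert(sizeof(char16_t) * CHAR_BIT >= 16,
               "char16_t must hold a UTF-16 code unit");
_Static_assert(sizeof(char32_t) * CHAR_BIT >= 32,
               "char32_t must hold any Unicode code point");

/* Unicode character constants: u'' has type char16_t, U'' has char32_t. */
static const char16_t omega16 = u'\u03A9';      /* GREEK CAPITAL LETTER OMEGA */
static const char32_t gclef32 = U'\U0001D11E';  /* MUSICAL SYMBOL G CLEF */
```

Run-time checks such as assert(omega16 == 0x03A9) should then hold, while the _Static_assert lines cost nothing at run time and fail the build if either type is too narrow.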

AlisdairM

Oh, oops, I wasn't reading carefully enough.

I think an experimental C1x mode would be the best way to deal with
this, similar to the C++0x mode. I don't think a separate Unicode
command-line switch is really necessary; it might be simpler to have a
"UnicodeTypes" boolean in LangOpts so that you don't have to write
constructs like "if c++0x or c1x" all over the place, though.
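
A minimal sketch of that suggestion, in C to match the other examples. struct LangOpts here is a hypothetical, cut-down stand-in rather than Clang's real LangOptions, and set_derived_opts and lexer_accepts_u_prefix are made-up names; the point is only that the "C++0x or C1x" test is made once:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for a compiler's language options. */
struct LangOpts {
    unsigned C99 : 1;
    unsigned C1X : 1;          /* experimental C1x mode */
    unsigned CPlusPlus0x : 1;  /* C++0x mode */
    unsigned UnicodeTypes : 1; /* char16_t/char32_t and u""/U"" literals */
};

/* Compute derived flags once, after the language mode has been chosen. */
static void set_derived_opts(struct LangOpts *opts) {
    /* A translation unit is either C or C++, never both. */
    assert(!(opts->C1X && opts->CPlusPlus0x));
    opts->UnicodeTypes = opts->C1X || opts->CPlusPlus0x;
}

/* Call sites test one flag instead of repeating "if c++0x or c1x". */
static bool lexer_accepts_u_prefix(const struct LangOpts *opts) {
    return opts->UnicodeTypes;
}
```

With only C99 set, lexer_accepts_u_prefix() returns false; setting either C1X or CPlusPlus0x makes it true. Because call sites consult only UnicodeTypes, supporting another mode later would mean changing set_derived_opts rather than every check in the parser.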

-Eli