Recompile of LLVM7.0.0-git Results in Unicode Errors

I recompiled LLVM from git using exactly the same process as in the past. Previous binaries have had no problem with my code.

This time, the resulting binary chokes on the first Unicode character it encounters: error: expected identifier or ‘{’ (“namespace \u2063 {”).

What has changed? Is this a bug?

See D104975, “Implement P1949”, and the corresponding standard proposal, “C++ Identifier Syntax using Unicode Standard Annex 31”.

Could you provide a little more context for what you’re doing?

What context would you like?

I have a namespace whose name is the Unicode character ‘\u2063’. This has worked for years. Suddenly (with a newly compiled LLVM from git), it will not compile. Simply declaring a namespace named ‘\u2063’ is sufficient to cause a compile error.
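For anyone who wants to try this, here is a minimal sketch of the kind of file involved. The file name and the answer() function are made up for illustration, and the failing line is left commented out so the rest still compiles. The second namespace is only there for contrast: it uses a letter (U+03BB), which has the XID_Start property that Unicode Standard Annex 31 requires of the first character of an identifier.

```cpp
// repro.cpp (hypothetical file name)
//
// Uncommenting the next line is expected to reproduce the reported error on a
// Clang build that includes D104975, because U+2063 (INVISIBLE SEPARATOR) is
// a format character and does not have the XID_Start property:
//
// namespace \u2063 { inline int answer() { return 42; } }

// For contrast: U+03BB (GREEK SMALL LETTER LAMBDA) is a letter with the
// XID_Start property, so it is still accepted as an identifier.
namespace \u03BB {
inline int answer() { return 42; }
}  // namespace \u03BB
```

Something like clang++ -fsyntax-only repro.cpp should be enough to check whether a given build accepts or rejects each spelling.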

So, if I am understanding correctly, TL;DR is:

  • previously UTF-8 character support was part of clang, not the standard
  • now it is being normalized
  • the result is that some characters that were previously allowed are no longer allowed
  • this includes U+2063, even though it is below U+FFFF
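On the last point, as far as I can tell the old rule was never about being below U+FFFF. It was a fixed list of code point ranges, and U+2063 happened to fall inside one of them. A rough sketch, assuming the older Clang followed the ranges listed in C++11 Annex E.1 (the table below is only a small excerpt of that list, and the names are mine, not Clang's):

```cpp
// Hypothetical helper types and names, for illustration only.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Excerpt only: a few of the ranges that C++11 Annex E.1 allowed in
// identifiers. The full list is much longer.
const CodePointRange kOldAllowedExcerpt[] = {
    {0x200B, 0x200D},
    {0x202A, 0x202E},
    {0x203F, 0x2040},
    {0x2054, 0x2054},
    {0x2060, 0x206F},  // contains U+2063
};

// Returns true if cp falls inside one of the excerpted ranges above.
inline bool in_old_allowed_excerpt(char32_t cp) {
    for (const CodePointRange& r : kOldAllowedExcerpt) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}
```

Under that old list, in_old_allowed_excerpt(0x2063) is true. The new rule replaces the range list with the XID_Start and XID_Continue properties from Unicode Standard Annex 31, and U+2063 has neither, which would explain why it stopped compiling.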