Making Clang support UTF-16 input?

When trying out the new LLVM Visual Studio integration extension [1]
that now supports VS 2017, we learned that at some point, VS changed
the editor to encode new project files as utf-16 by default [2].

Since Clang doesn't support utf-16 input, this creates a bad
experience for users trying out Clang on a new Visual Studio project.

Should we make Clang support utf-16 input?

Nico pointed out there was a patch [3] by Scott Conger a long time ago
to support -finput-charset=.

That seems a bit more ambitious than what's necessary here. What I was
thinking was something like, if -fallow-utf16 is passed (maybe a
clang-cl default), instead of erroring out on a byte-order mark, Clang
would try to convert to utf-8.

Scott's patch hooked into FileManager, but that's also used for PCH
and other non-source files, which makes me a little nervous. Maybe
SourceManager would be a better place. One idea would be to do this in
SourceManager::ContentCache::getBuffer where byte-order marks are
currently diagnosed: instead of emitting an error, if the flag is set
we'd convert to utf-8 and swap out the buffer.

The complexity around Clang's virtual filesystem, remapped files and
stuff makes me a little nervous though. Are there more gotchas, or
does this sound like a good way to do it?

Thanks,
Hans

1. LLVM Compiler Toolchain - Visual Studio Marketplace
2. Visual Studio Feedback
3. http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20110711/044059.html

> When trying out the new LLVM Visual Studio integration extension [1]
> that now supports VS 2017, we learned that at some point, VS changed
> the editor to encode new project files as utf-16 by default [2].

It looks like that is only the case for the files that are pre-created as part of the project (not new files you create yourself), and it is now being treated as a bug:

“[Jul 29 at 10:23 PM] We are so sorry for you are experience, We have changed the state of the feedback and also we tracked the issue with a active bug. Thanks for making VS better.”

> Since Clang doesn’t support utf-16 input, this creates a bad
> experience for users trying out Clang on a new Visual Studio project.

Since that’s due to VS doing something silly, which seems likely to be fixed soon…

> Should we make Clang support utf-16 input?

…I’d say it’s not really worth the bother.