UTF-8 conversion speed

At CppCon last week, I saw a talk by Bob Steagall called
"Fast Conversion From UTF-8 with C++, DFAs, and SSE Intrinsics."
Part of this talk included data from a half-dozen or so conversion
libraries... one of which was labeled "LLVM".

The LLVM converters were invariably the slowest.
On Windows, the mbtowc (or something like that) syscall was pretty good.

Steagall's converters were of course wicked fast, even before he started
playing tricks with SSE intrinsics. I found his stuff at the following
link (note C++Now, not CppCon) if anyone is interested in following up.
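
The talk title names the general technique: classify each input byte, then step a small deterministic state machine. What follows is a minimal, hypothetical C++17 sketch of that idea only. It is not Steagall's code, and his table layout and SSE fast paths are not reproduced. It assumes UTF-8 in and UTF-32 out, and it rejects ill-formed input (overlong forms, surrogates, values above U+10FFFF) following the Unicode Standard's table of well-formed UTF-8 byte sequences. The names `utf8_dfa`, `ByteClass`, `State`, and `decode` are made up for this example.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical sketch of a table-driven UTF-8 DFA decoder (not Steagall's code).
namespace utf8_dfa {

// Byte classes chosen so that every well-formedness rule is one table lookup.
enum ByteClass : std::uint8_t {
    kAscii,    // 00..7F
    kCont80,   // 80..8F continuation
    kCont90,   // 90..9F continuation
    kContA0,   // A0..BF continuation
    kIllegal,  // C0..C1, F5..FF never appear in well-formed UTF-8
    kLead2,    // C2..DF starts a 2-byte sequence
    kLeadE0,   // E0: next byte must be A0..BF (no overlongs)
    kLead3,    // E1..EC, EE..EF
    kLeadED,   // ED: next byte must be 80..9F (no surrogates)
    kLeadF0,   // F0: next byte must be 90..BF (no overlongs)
    kLead4,    // F1..F3
    kLeadF4,   // F4: next byte must be 80..8F (max U+10FFFF)
    kNumClasses
};

enum State : std::uint8_t {
    kAccept, kNeed1, kNeed2, kNeed3, kAfterE0, kAfterED, kAfterF0, kAfterF4, kReject,
    kNumStates
};

constexpr ByteClass classify(unsigned b) {
    if (b <= 0x7F) return kAscii;
    if (b <= 0x8F) return kCont80;
    if (b <= 0x9F) return kCont90;
    if (b <= 0xBF) return kContA0;
    if (b <= 0xC1) return kIllegal;
    if (b <= 0xDF) return kLead2;
    if (b == 0xE0) return kLeadE0;
    if (b == 0xED) return kLeadED;
    if (b <= 0xEF) return kLead3;
    if (b == 0xF0) return kLeadF0;
    if (b <= 0xF3) return kLead4;
    if (b == 0xF4) return kLeadF4;
    return kIllegal;
}

constexpr State transition(State s, ByteClass c) {
    const bool cont = (c == kCont80 || c == kCont90 || c == kContA0);
    switch (s) {
    case kAccept:
        switch (c) {
        case kAscii:  return kAccept;
        case kLead2:  return kNeed1;
        case kLeadE0: return kAfterE0;
        case kLead3:  return kNeed2;
        case kLeadED: return kAfterED;
        case kLeadF0: return kAfterF0;
        case kLead4:  return kNeed3;
        case kLeadF4: return kAfterF4;
        default:      return kReject;  // stray continuation or illegal byte
        }
    case kNeed1:   return cont ? kAccept : kReject;
    case kNeed2:   return cont ? kNeed1 : kReject;
    case kNeed3:   return cont ? kNeed2 : kReject;
    case kAfterE0: return c == kContA0 ? kNeed1 : kReject;
    case kAfterED: return (c == kCont80 || c == kCont90) ? kNeed1 : kReject;
    case kAfterF0: return (c == kCont90 || c == kContA0) ? kNeed2 : kReject;
    case kAfterF4: return c == kCont80 ? kNeed2 : kReject;
    default:       return kReject;
    }
}

// Both tables are computed at compile time; the hot loop only does lookups.
constexpr auto kClassOf = [] {
    std::array<ByteClass, 256> t{};
    for (unsigned b = 0; b < 256; ++b) t[b] = classify(b);
    return t;
}();

constexpr auto kNext = [] {
    std::array<std::array<State, kNumClasses>, kNumStates> t{};
    for (int s = 0; s < kNumStates; ++s)
        for (int c = 0; c < kNumClasses; ++c)
            t[s][c] = transition(static_cast<State>(s), static_cast<ByteClass>(c));
    return t;
}();

// Payload bits carried by a lead byte of each class.
constexpr std::uint8_t leadMask(ByteClass c) {
    switch (c) {
    case kAscii: return 0x7F;
    case kLead2: return 0x1F;
    case kLeadE0: case kLead3: case kLeadED: return 0x0F;
    case kLeadF0: case kLead4: case kLeadF4: return 0x07;
    default: return 0;
    }
}

// Returns the UTF-32 decoding, or std::nullopt if the input is ill-formed.
std::optional<std::u32string> decode(std::string_view in) {
    std::u32string out;
    out.reserve(in.size());
    State state = kAccept;
    char32_t cp = 0;
    for (unsigned char b : in) {
        const ByteClass c = kClassOf[b];
        if (state == kAccept) cp = b & leadMask(c);   // lead byte starts a code point
        else cp = (cp << 6) | (b & 0x3Fu);           // continuation adds 6 bits
        state = kNext[state][c];
        if (state == kReject) return std::nullopt;
        if (state == kAccept) out.push_back(cp);
    }
    if (state != kAccept) return std::nullopt;  // input ended mid-sequence
    return out;
}

}  // namespace utf8_dfa
```

For example, `utf8_dfa::decode("\xE2\x82\xAC")` yields U+20AC, while the overlong `"\xC0\xAF"` and the encoded surrogate `"\xED\xA0\x80"` are both rejected. Faster converters commonly add a bulk path that skips runs of ASCII bytes (the natural place for SIMD intrinsics) before dropping into a DFA like this one; that part is left out here.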



UTF conversion is not on any hot path, as far as I know, so nobody has spent any time optimizing it. If you're interested in the history of the LLVM code, see https://reviews.llvm.org/rC68208 ; it's mostly untouched since then, except for a few bugfixes.


It may not be a hot path in LLVM, but it is in Android. I did NEON versions of the UTF functions in H2 2014, and they've been in Samsung's Android versions for several years, giving single-digit percentage speedups in benchmarks.




I think moving away from it should be encouraged, assuming use of that file can be removed.



That bug is clearly bogus. Whether the copyright notice is correct
is a separate question, but the claim made in the PR is just wrong.


Probably best to leave the license lawyering to lawyers & not public mailing lists.

Just as an FYI, I’m relaying questions to the Foundation’s lawyer and will report back if there is anything we need to do here.