Hi,
I have decided to go down the path of generating LLVM IR source from my bootstrap compiler so that I can write it in C# instead of C++ (it is going to be thrown away anyway). As my language is going to support Unicode, I'm wondering whether LLVM is capable of reading UTF-8 encoded source files. I'm not thinking about string literals, which are easy to handle by outputting them as a raw, albeit encoded, byte sequence, but rather about Unicode-enabled program identifiers.
Do I need to keep my LLVM IR source file strictly ASCII, or can the reader handle UTF-8?
Thanks in advance,
Mikael Lyngvig
– Frogs dug channels on Earth millions of years before Man looked to Mars.
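For the string literals, "outputting them as a raw, albeit encoded, byte sequence" can be sketched roughly as below. This is a minimal sketch, assuming the character-array syntax from the LLVM Language Reference (`c"..."` with `\XX` two-digit hex escapes). The function name `llvm_string_constant` is hypothetical, and the caller is assumed to pass a global name that is already a valid bare identifier (such as `.str`). Escaping every non-printable or non-ASCII byte keeps the emitted `.ll` line pure ASCII, so the question of whether the reader accepts UTF-8 never arises for literals.

```python
def llvm_string_constant(name: str, text: str) -> str:
    """Hypothetical helper: emit a NUL-terminated private global for `text`.

    The text is encoded as UTF-8 and a trailing NUL is appended. Printable
    ASCII bytes are written as-is; every other byte (including each byte of
    a multi-byte UTF-8 sequence, plus '"' and '\\') becomes a \\XX escape.
    Assumes `name` is already a valid bare LLVM identifier such as ".str".
    """
    data = text.encode("utf-8") + b"\x00"
    body = "".join(
        chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\{b:02X}"
        for b in data
    )
    # The array length counts bytes, not characters, including the NUL.
    return f'@{name} = private unnamed_addr constant [{len(data)} x i8] c"{body}"'
```

For example, `llvm_string_constant(".str", "héllo")` yields `@.str = private unnamed_addr constant [7 x i8] c"h\C3\A9llo\00"`: the two-byte UTF-8 encoding of `é` makes the array 7 bytes long, not 6.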
Since no one has responded, and although I'm not sure, I'll wager a guess of
"probably the former", since as far as I can tell all of the strings
from other languages end up being encoded as escape sequences.
If I'm wrong, I'm sure someone will correct me.
-eric
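Following the escape-sequence approach suggested in the reply, identifiers can also be kept pure ASCII. The LLVM Language Reference describes bare names as matching `[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*`, and says other names may be wrapped in quotes with special characters written as `\xx` hex escapes. The sketch below relies on that rule: it leaves a name bare when it fits the pattern and otherwise quotes it, escaping every byte of its UTF-8 encoding that is not printable ASCII. The function name `llvm_name` is hypothetical. This is not a claim about whether the IR reader would also accept raw UTF-8 inside quotes; it simply sidesteps the question.

```python
import re

# Bare-name pattern from the LLVM Language Reference, minus the % or @ sigil.
_BARE_NAME = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


def llvm_name(sigil: str, ident: str) -> str:
    """Hypothetical helper: render `ident` as an ASCII-only LLVM IR name.

    `sigil` is "@" for globals or "%" for locals. Names that fit the bare
    pattern are emitted unquoted; anything else is quoted, and each byte of
    its UTF-8 encoding outside printable ASCII (plus '"' and '\\') is
    written as a \\XX hex escape.
    """
    if _BARE_NAME.fullmatch(ident):
        return sigil + ident
    body = "".join(
        chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\{b:02X}"
        for b in ident.encode("utf-8")
    )
    return f'{sigil}"{body}"'
```

For example, `llvm_name("@", "main")` stays `@main`, while `llvm_name("@", "größe")` becomes `@"gr\C3\B6\C3\9Fe"`. Because the mapping works on bytes, the front end can recover the original Unicode name by decoding the escapes as UTF-8.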