Russian characters lead to wrong SourceLocation start/end!


I’ve just found that if an UnsavedFile has Russian characters in its content, SourceLocation’s start/end is calculated wrong: each symbol’s length is counted as 2 bytes instead of 1. This leads to a wrong end offset for that token and wrong locations for all following tokens returned by clang_tokenize().

I’ve checked this with 3.3 on Mac and Linux (Android via JNI), since I can’t find release notes or downloads for 3.4 yet (though it was scheduled for release in December).

Any confirmation/suggestions?

PS. I’ve also posted a bug report, but I can’t see it yet.

Regards, Anton.

bug report:

I’ve just checked out, built, and installed llvm, clang, compiler-rt etc. from branches/release_34, and this issue affects the 3.4 release too.
IMHO it’s a pretty important issue that should be fixed before the 3.4 release…

Regards, Anton.

Source locations are measured in bytes, not in characters, with the assumption that most editors are better-equipped to deal with byte offsets. So I think this is correct.
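
If your editor works with character offsets instead, you can convert on your side. Here is a minimal sketch, assuming the unsaved buffer is UTF-8 encoded (which is why each Cyrillic letter takes 2 bytes) and that the offsets you get back are byte offsets into that buffer. The helper name byte_offset_to_char_offset is made up for illustration; it is not part of libclang.

```python
def byte_offset_to_char_offset(source: str, byte_offset: int,
                               encoding: str = "utf-8") -> int:
    """Convert a byte offset into the encoded buffer to a character offset.

    Hypothetical helper, not a libclang API. Assumes the same text and
    encoding that were handed to the tokenizer.
    """
    encoded = source.encode(encoding)
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError("byte offset out of range")
    # Decode only the bytes before the offset and count the characters.
    # This raises UnicodeDecodeError if the offset falls inside a
    # multi-byte sequence.
    return len(encoded[:byte_offset].decode(encoding))


# "привет" is 6 characters but 12 bytes in UTF-8.
source = "int привет = 1;"
name_bytes = "привет".encode("utf-8")
start_byte = source.encode("utf-8").index(name_bytes)  # byte offset of the identifier
end_byte = start_byte + len(name_bytes)                # byte offset just past it

start_char = byte_offset_to_char_offset(source, start_byte)
end_char = byte_offset_to_char_offset(source, end_byte)
identifier = source[start_char:end_char]
```

For long files with many tokens it would be cheaper to build a byte-to-character table once rather than decoding a prefix for every offset, but the idea is the same.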