LLDB sometimes escapes unicode characters, sometimes not

Hey,

We’ve noticed that LLDB sometimes escapes certain characters in const char* strings (e.g. bytes with values of 128 and above, which are negative when char is signed) and sometimes it doesn’t. In particular, this happens for UTF-8 encoded Unicode strings:

C++:
const char* str = u8"😂";

LLDB:
(lldb) expr str
(const char *) $0 = 0x00007ff662489d18 "≡ƒÿé"
(lldb) expr (const char *)str
(const char *) $1 = 0x00007ff662489d18 "\xfffffff0\xffffff9f\xffffff98\xffffff82"

To my understanding, evaluating ‘str’ and ‘(const char*)str’ should give the same result, since str is already a const char*.
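As a side note, the shape of the escaped output in $1 (eight hex digits per byte, with leading f's) is what you get when a signed byte is widened to int before being printed as hex. The sketch below only illustrates that effect. The helper names are hypothetical, it assumes a 32-bit int, and it is not taken from LLDB's source.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: widen the signed byte to int first, so 0xf0 (-16)
// becomes 0xfffffff0 before it is printed as hex (assumes 32-bit int).
std::string escape_sign_extended(signed char c) {
    char buf[16];
    unsigned int widened = static_cast<unsigned int>(static_cast<int>(c));
    std::snprintf(buf, sizeof buf, "\\x%x", widened);
    return buf;
}

// Hypothetical helper: reinterpret as unsigned char first, so only the
// low 8 bits are printed.
std::string escape_byte(signed char c) {
    char buf[16];
    unsigned int value = static_cast<unsigned char>(c);
    std::snprintf(buf, sizeof buf, "\\x%02x", value);
    return buf;
}

// Apply one of the helpers to every byte of the UTF-8 encoding of U+1F602,
// i.e. the bytes f0 9f 98 82 from the example string.
std::string escape_all(std::string (*escape)(signed char)) {
    const unsigned char utf8[] = {0xf0, 0x9f, 0x98, 0x82};
    std::string out;
    for (unsigned char b : utf8) out += escape(static_cast<signed char>(b));
    return out;
}

// Compare against the escaped string LLDB printed for $1.
void check_against_observed_output() {
    assert(escape_all(escape_sign_extended) ==
           "\\xfffffff0\\xffffff9f\\xffffff98\\xffffff82");
    assert(escape_all(escape_byte) == "\\xf0\\x9f\\x98\\x82");
}
```

Calling check_against_observed_output() from main passes on platforms with a 32-bit int: the sign-extended variant reproduces the string shown for $1, while the unsigned variant yields the plain two-digit escapes.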

We’ve found that the code takes a different path at this location:

https://source.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/lldb/source/Core/FormatEntity.cpp;l=865;rcl=294541377

Any idea what’s going on? We’d like to get the unescaped strings. Is it possible to enforce this?

Thanks,

Lutz

Sorry, the link was internal. Here’s the proper one:
https://github.com/llvm-mirror/lldb/blob/master/source/Core/FormatEntity.cpp#L861

Attempt at a fix: https://reviews.llvm.org/D76650