Strlen segfault with TrailingObjects?

I keep getting a segfault in StringLiteral when using getTrailingObjects(), related to new/strlen, and I don’t understand why.

I’m trying to create an ArrayRef from StringLiteral’s trailing objects. They’re already there, so do I need to allocate them again?

I don’t understand any of this.

I don’t have a good idea of exactly what you’re trying to do, but instead of trying to access the trailing objects directly, I would suggest just using getBytes or getString. The former doesn’t check the character byte width, so it is likely useful if you aren’t considering that case.

If the char byte width is not 1, you can cast the pointer to the right type (uint16_t or uint32_t).
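As a rough sketch of the "right type" idea, here is a standalone example that doesn't touch the clang classes at all. `copyCodeUnits16` is a hypothetical helper, and the plain byte buffer only stands in for whatever getBytes() hands back. It copies with memcpy instead of casting the pointer, on the assumption that you'd rather not think about alignment:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of clang: view a raw byte buffer
// (standing in for the bytes returned by getBytes()) as 16-bit code units.
std::vector<std::uint16_t> copyCodeUnits16(const char *Bytes,
                                           std::size_t ByteLength) {
  // The byte length must be a whole number of 2-byte code units.
  assert(ByteLength % sizeof(std::uint16_t) == 0);
  std::vector<std::uint16_t> Units(ByteLength / sizeof(std::uint16_t));
  // memcpy sidesteps the alignment and aliasing questions a cast would raise.
  if (!Units.empty())
    std::memcpy(Units.data(), Bytes, ByteLength);
  return Units;
}
```

The same shape would work for 4-byte code units with uint32_t.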

Likewise it is hard to diagnose without knowing more of what you’re doing, but StringLiteral is intended for nul terminated strings with static lifetimes (e.g. “foo” in C/C++). Are you using it with dynamically allocated things?

I’m working on adding format string checking for wchar_t/char16_t/char32_t strings. I’m converting the StringLiteral to UTF-8 and putting that through the checker.

@erichkeane

return ArrayRef<llvm::UTF16>(reinterpret_cast<llvm::UTF16>(getStrDataAsChar()), reinterpret_cast<llvm::UTF16>(getStrDataAsChar() + getByteLength()));

"error: cast from pointer to smaller type ‘llvm::UTF16’ (aka ‘unsigned short’) loses information"

So yeah, back to square one I guess.

original:

```
ArrayRef<llvm::UTF16> getArrayRef16() const {
  return ArrayRef<llvm::UTF16>(
      reinterpret_cast<const llvm::UTF16 *>(getTrailingObjects()),
      getLength() * sizeof(llvm::UTF16));
}
```

I think you just need return ArrayRef<llvm::UTF16>(reinterpret_cast<const llvm::UTF16 *>(getTrailingObjects<char>()), getLength()); since ArrayRef takes the number of elements in the array, not the size in bytes.
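To illustrate the element-count point without pulling in LLVM, here is a minimal sketch. `View16`, `makeView16` and `byteSize` are hypothetical names; `View16` only mimics the pointer-plus-count shape of an ArrayRef and is not the real class:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ArrayRef<llvm::UTF16>: a pointer plus an
// ELEMENT count (not a byte count).
struct View16 {
  const char16_t *Data;
  std::size_t Size; // number of char16_t elements
};

// Build a view over NumUnits code units. Passing
// NumUnits * sizeof(char16_t) instead would make the view claim
// sizeof(char16_t) times as many elements as actually exist.
View16 makeView16(const char16_t *Data, std::size_t NumUnits) {
  assert(Data != nullptr || NumUnits == 0);
  return View16{Data, NumUnits};
}

// Bytes covered by the view; only this quantity scales with sizeof.
std::size_t byteSize(const View16 &V) { return V.Size * sizeof(char16_t); }
```

By the same reasoning, a byte count such as getBytes().size() would presumably need dividing by the character width before being used as an element count.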

It crashes with just getLength() without the times sizeof part too, I’ve tried everything I can think of.

Does StringLiteral::getBytes() work?

I’m recompiling using getBytes().str().c_str(). There are warnings about returning temporaries, but I just want to know if the pointer is valid before I start copying data for real.

Edit: That failed with the strlen error again using getBytes().str().c_str(). Now trying getBytes().data().

Error still there.

The ArrayRef code is in StringLiteral; it’s called by getStringAsChar().

READ of size 19 at 0x62500010e580 thread T0
    #0 0x119412554 in wrap_strlen (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x15554)
    #1 0x111dbe2d3 in clang::StringLiteral::getStringAsChar() const (/Users/Marcus/Source/LLVM_NINJA/./bin/clang:x86_64+0x10c15d2d3)

getStringAsChar():

  std::string StringLiteral::getStringAsChar() const {
  std::string Output;
  switch (getKind()) {
  case StringKind::Ascii:
    LLVM_FALLTHROUGH;
  case StringKind::UTF8:
    return getTrailingObjects<char>();
    break;
  case StringKind::UTF16: {
    llvm::convertUTF16ToUTF8String(getArrayRef16(), Output);
    return Output;
    break;
  }
  case StringKind::UTF32: {
    llvm::convertUTF32ToUTF8String(getArrayRef32(), Output);
    return Output;
    break;
  }
  case StringKind::Wide: {
    llvm::convertWideToUTF8(getStringAsWChar(), Output);
    return Output;
    break;
  }
  }
  }

getArrayRef16():

ArrayRef<llvm::UTF16> getArrayRef16() const {
    assert(getCharByteWidth() == 2 &&
           "This function is used in places that assume strings use char16_t");
    return ArrayRef<llvm::UTF16>(reinterpret_cast<const llvm::UTF16 *>(getBytes().data()), getBytes().size());
    }

getArrayRef32():

ArrayRef<llvm::UTF32> getArrayRef32() const {
    assert(getCharByteWidth() == 4 &&
           "This function is used in places that assume strings use char32_t");
    return ArrayRef<llvm::UTF32>(reinterpret_cast<const llvm::UTF32 *>(getBytes().data()), getBytes().size());
  }

It still isn’t clear to me what you’re attempting to do. The problem with:
"error: cast from pointer to smaller type ‘llvm::UTF16’ (aka ‘unsigned short’) loses information"
is of course because you tried to reinterpret_cast the getStrDataAsChar result (which is a pointer) to a short rather than to a pointer type.

Anything with getStringAsChar that does not store its return value as a std::string is obviously going to have a pointer to deleted memory (since the std::string destructor will run). Given the current interface of clang::StringLiteral, I believe Eli’s suggestion of getArrayRef16 without the ‘sizeof’ is the correct thing. You can do something similar by acquiring the pointer with getBytes if you are outside of the StringLiteral type.
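A small standalone sketch of the lifetime point, with no clang types involved. `makeUtf8` is a hypothetical stand-in for any function that returns a std::string by value, and `lengthViaCStr` is likewise just for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a function like getStringAsChar() that
// returns a std::string by value.
std::string makeUtf8() { return std::string("hello"); }

// Safe pattern: the caller keeps the std::string alive and only borrows
// c_str() while that object exists.
std::size_t lengthViaCStr() {
  std::string Owned = makeUtf8();  // owns the buffer
  const char *Ptr = Owned.c_str(); // valid for as long as Owned lives
  return std::strlen(Ptr);
  // By contrast, `const char *P = makeUtf8().c_str();` leaves P dangling
  // once the temporary is destroyed at the end of that statement.
}
```

The getBytes().str().c_str() chain has the same shape as the dangling case in the last comment, which would explain the warnings about temporaries.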

The valgrind-looking crash you have looks like someone is trying to run strlen on one of the outputs. However, the trailing storage is NOT stored with a trailing NUL terminator (the docs say: " “foo” or L"bar" (wide strings). The actual string data can be obtained with getBytes() and is NOT null-terminated. The length of the string data is determined by calling getByteLength().").
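One possible source of that strlen, judging only from the code posted above: in the Ascii/UTF8 case, `return getTrailingObjects<char>();` converts a `const char *` to std::string, and that constructor scans for a terminating NUL. A minimal sketch of the length-based alternative follows; `stringFromBytes` is a hypothetical helper and the raw buffer just stands in for the literal's storage:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::string from string data that is NOT
// NUL-terminated, using an explicit length (for example, the value that
// getByteLength() reports).
std::string stringFromBytes(const char *Data, std::size_t ByteLength) {
  assert(Data != nullptr || ByteLength == 0);
  // The (pointer, length) constructor never reads past Data + ByteLength.
  // Writing `return Data;` would use the const char* constructor, which
  // scans for a terminating NUL (effectively strlen) and can read out of
  // bounds when none is present.
  return std::string(Data, ByteLength);
}
```

Inside StringLiteral, something along the lines of getBytes().str() should have the same effect, since StringRef carries its own length.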

So I have to copy the data to a temp buffer that is getByteLength + 1 for a NULL terminator you’re saying?

If you want a Null Terminator, I believe that is what you’ll have to do (OR, implement whatever this is in a way that does not depend on one being present).
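If you do go the copy route, a minimal sketch could look like the following. `copyWithTerminator` is a hypothetical helper, and it assumes 1-byte character data; for char16_t/char32_t data a single zero byte is not a proper terminator, and the data may already contain zero bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy ByteLength bytes into a new buffer and append
// a single NUL byte so byte-oriented C APIs such as strlen can be used.
// Only suitable for 1-byte character data; wider code units would need a
// terminator of the matching width.
std::vector<char> copyWithTerminator(const char *Data,
                                     std::size_t ByteLength) {
  assert(Data != nullptr || ByteLength == 0);
  std::vector<char> Buffer(ByteLength + 1, '\0'); // last element stays NUL
  if (ByteLength != 0)
    std::memcpy(Buffer.data(), Data, ByteLength);
  return Buffer;
}
```

A std::string built with an explicit length gives you the same guarantee through c_str() without managing the buffer yourself.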

Oh I’m sorry, I was referring to llvm::StringLiteral and I realize now you’re probably talking about clang::StringLiteral
