Clang produces corrupt pch leading to crash

Hi all,

I've run into (what I believe is) a memory bug with clang while producing a precompiled header. In short, under certain circumstances, clang will write a pch that causes a crash when trying to use this pch in a later compilation unit.

Occasionally, while clang is compiling the pch, malloc complains that one of its checksums has been overwritten; while at other times, clang throws an error that a definition has a different exception specification than the declaration two lines above it (when both have no exception specification). Both these symptoms lead me to believe that somewhere clang overwrites memory.

I have stripped the code that reliably causes this crash down to a few hundred lines and have created a little script to reproduce the bug (details at http://llvm.org/bugs/show_bug.cgi?id=20026). I'm not sure, however, about your process for handling such bugs, which is why I'm cross-posting here. My main question is if there is anything else I could provide you with to help fixing this issue.

Thank you very much in advance!

Best,
Tobias

Ableton AG, Schoenhauser Allee 6-7, 10119 Berlin, Germany
Sitz (Registered Office) Berlin, Amtsgericht Berlin-Charlottenburg, HRB 72838
Vorstand (Management Board): Gerhard Behles, Jan Bohl
Vorsitzender des Aufsichtsrats (Chair of the Supervisory Board): Uwe Struck

Thanks for the detailed report! The only thing you can do more is try and debug this yourself :wink:

It might make sense to file this with apple if you’re using clang shipped with XCode as they ship their own releases. What version of clang are you using? Is the isystem flag important? What about #include ? I’ve tried to reproduce this but it’s not so straightforward because I’m on linux and some of the stuff in that bash script assumes mac os… I was wondering if it’s possible to reduce this to something that’s reproducible everywhere or if this is a mac specific issue.

Thanks for the quick feedback :slight_smile:

I’ve reproduced this on the latest master as well as several versions that ship with Xcode (and I’ve also reported it to Apple). The isystem include is only necessary in versions built from source, otherwise clang doesn’t seem to find libc++. The versions shipping with Xcode don’t need this, they seem to find it using a compiled in path.

The include seems necessary; it doesn’t crash for me when I remove it. I’ll see if I can find a Linux-box where I can reproduce it.

Thanks for looking into it.
Tobias

I’ve built a debug version of clang with address sanitizer support (-fsanitize=address), and it also suggests that it is a use-after-free bug when using (not compiling as I mistakenly claimed before) the precompiled header:

SUMMARY: AddressSanitizer: heap-use-after-free ??:0 clang::LazyOffsetPtr<clang::Decl, unsigned int, &(clang::ExternalASTSource::GetExternalDecl(unsigned int))>::operator=(clang::Decl*)
Shadow bytes around the buggy address:

0x1c3200005890: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x1c32000058a0: fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd
0x1c32000058b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c32000058c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x1c32000058d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd

Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd

(I have attached the full output from asan.)

Best,
Tobias

crashIt.log (25.5 KB)

Hi Nikola,

I can reproduce the bug on linux if I tell clang to cross-compile for osx. Attached is a minimal example. You need to build clang with asan to reproduce the crash:

…/llvm/configure --enable-libcpp CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address"

Thanks again for looking into this crash!

Best,
Tobias

clang-crash.zip (2.43 KB)

Awesome work Tobias, very interesting sequence of events. Adding Richard as he’ll know how to best reshuffle this code :smiley:

The precompiled header is good, the trouble starts in ASTContext::getCurrentKeyFunctions:

LazyDeclPtr &Entry = KeyFunctions[RD];
if (!Entry)
Entry = const_cast<CXXMethodDecl*>(computeKeyFunction(*this, RD));

KeyFunctions DenseMap will undergo reallocation inside computeKeyFunctions (see two markers in the attached callstack). Upon return Entry will point to freed memory. Good lord it took me a while to figure this out, I suck at debugging on Linux…

stack.txt (3.39 KB)

A classic :slight_smile: I've updated bugzilla to include your findings.

Awesome, thanks for hunting it down!

Tobias

Awesome work Tobias, very interesting sequence of events. Adding Richard
as he'll know how to best reshuffle this code :smiley:

The precompiled header is good, the trouble starts in
ASTContext::getCurrentKeyFunctions:

LazyDeclPtr &Entry = KeyFunctions[RD];
  if (!Entry)
    Entry = const_cast<CXXMethodDecl*>(computeKeyFunction(*this, RD));

That's a classic :slight_smile:

It should be straightforward to fix this, something like:

LazyDeclPtr *Entry = &KeyFunctions[RD];
if (!*Entry) {
  auto *KeyFunction = computeKeyFunction(*this, RD);
  // Entry might have been invalidated by computeKeyFunction.
  Entry = &KeyFunctions[RD];
  *Entry = KeyFunction;
}
// do stuff with *Entry

Do we have a reduced testcase? I suspect that such a test may be fragile to
the point of being useless, so maybe we should go without =(

KeyFunctions DenseMap will undergo reallocation inside computeKeyFunctions

The last test case I sent around is as minimal as I could get it, but still over a hundred loc. It only crashes reliably when clang is built with asan (valgrind would probably be another option). I’ve tried a patch very similar to yours; and it fixes the issue.

Maybe one could construct a simpler test case with the knowledge that it has to force the KeyFunctions DenseMap to re-allocate?

I’ve also thought about how to unit-test this, but it seems impossible to write a unit test without exposing the KeyFunctions cache in ASTContext. At this point I stopped because I have no idea what your policy is for such changes and in what general directions this code is meant to evolve.

Best,
Tobias

I think this may be one of those changes for which we need to accept that a
unit test would not be useful. Does the change I suggested upthread fix the
issue for you?

Hey Richard,

I’ll check your patch in a minute, but except for a missing const_cast I believe it should be fine just by comparing it with the patch I came up with (attached below).

Best,
Tobias

0001-Fix-crash-when-KeyFunctions-cache-reallocates-while-.patch (1.28 KB)

I was about to do the same :slight_smile: But one thing I wanted to ask is, why change Entry into a pointer? Why not just call operator[] twice? As Tobias said we’ll need to cast to CXXRecordDecl. And we’ll have to change Entry.get to Entry->get if Entry needs to be a pointer.

Ignore my last email.

Richard’s change (plus const_cast and changing Entry. to Entry->) fixes the issue. The patch Tobias proposed should also work and might be a little cleaner? It definitely needs a comment justifying the second lookup. There’s also no need to call Entry.get in this case as we know the pointer is null?

Yep, that one fixes the issue as well and passes regression tests. Disregard my comment about Entry.get. Are you ok with me committing his version Richard? With ampersand aligned with variable of course :stuck_out_tongue:

Hey Richard and Nikola,

sorry for taking so long to respond.

I checked, and both Richard's patch and mine fix the reduced test-case I reported.

However, I tried running them on the original code that caused the crash, and unfortunately, it turns out that this is not the only situation that can cause the DenseMap to grow (and in turn make Entry stale). If the LazyOffsetPtr is in offset-format and needs to be deserialized using GetExternalDecl, this deserialization in turn can also cause a reallocation of the KeyFunctions DenseMap (see the attached stack trace):

#1 llvm::DenseMap<clang::CXXRecordDecl const*, clang::LazyOffsetPtr<clang::Decl, unsigned int, &(clang::ExternalASTSource::GetExternalDecl(unsigned int))>, llvm::DenseMapInfo<clang::CXXRecordDecl const*> >::grow(unsigned int)
<...>
#7 clang::ASTDeclReader::VisitCXXRecordDeclImpl(clang::CXXRecordDecl*)
<...>
#19 clang::ASTReader::loadPendingDeclChain(unsigned int)
#20 clang::ASTReader::finishPendingActions()
#21 clang::ASTReader::FinishedDeserializing()
#22 non-virtual thunk to clang::ASTReader::FinishedDeserializing()
#23 clang::ExternalASTSource::Deserializing::~Deserializing()
#24 clang::ExternalASTSource::Deserializing::~Deserializing()
#25 clang::ASTReader::ReadDeclRecord(unsigned int)
#26 clang::ASTReader::GetDecl(unsigned int)
#27 clang::ASTReader::GetExternalDecl(unsigned int)
#28 non-virtual thunk to clang::ASTReader::GetExternalDecl(unsigned int)
#29 clang::LazyOffsetPtr<clang::Decl, unsigned int, &(clang::ExternalASTSource::GetExternalDecl(unsigned int))>::get(clang::ExternalASTSource*) const
#30 clang::ASTContext::getCurrentKeyFunction(clang::CXXRecordDecl const*)

I've failed so far to get a minimal case to reproduce this issue. The new patch I've attached fixes both crashes, but to me it seems more like a workaround than a proper fix. There seems to be a conceptual problem: Anytime you try to deref a lazy pointer in KeyFunctions that happens to be in offset format can cause this pointer to be destructed while deserializing it. Right now, I can't seem to think of any suggestion how to fix this issue that I really like. The safest would fix would probably be to change the type of KeyFunctions from

   DenseMap<const CXXRecordDecl*, LazyDeclPtr>

to something like a

  DenseMap<const CXXRecordDecl*, std::shared_ptr<LazyDeclPtr>>

(ugly) or using a map that doesn't relocate its contents (like the persistent maps from Clojure).

Thoughts?

Best,
Tobias

Hey Richard and Nikola,

sorry for taking so long to respond.

I checked, and both Richard's patch and mine fix the reduced test-case I
reported.

However, I tried running them on the original code that caused the crash,
and unfortunately, it turns out that this is not the only situation that
can cause the DenseMap to grow (and in turn make Entry stale). If the
LazyOffsetPtr is in offset-format and needs to be deserialized using
GetExternalDecl, this deserialization in turn can also cause a reallocation
of the KeyFunctions DenseMap (see the attached stack trace):

#1 llvm::DenseMap<clang::CXXRecordDecl const*,
clang::LazyOffsetPtr<clang::Decl, unsigned int,
&(clang::ExternalASTSource::GetExternalDecl(unsigned int))>,
llvm::DenseMapInfo<clang::CXXRecordDecl const*> >::grow(unsigned int)
<...>
#7 clang::ASTDeclReader::VisitCXXRecordDeclImpl(clang::CXXRecordDecl*)
<...>
#19 clang::ASTReader::loadPendingDeclChain(unsigned int)
#20 clang::ASTReader::finishPendingActions()
#21 clang::ASTReader::FinishedDeserializing()
#22 non-virtual thunk to clang::ASTReader::FinishedDeserializing()
#23 clang::ExternalASTSource::Deserializing::~Deserializing()
#24 clang::ExternalASTSource::Deserializing::~Deserializing()
#25 clang::ASTReader::ReadDeclRecord(unsigned int)
#26 clang::ASTReader::GetDecl(unsigned int)
#27 clang::ASTReader::GetExternalDecl(unsigned int)
#28 non-virtual thunk to clang::ASTReader::GetExternalDecl(unsigned int)
#29 clang::LazyOffsetPtr<clang::Decl, unsigned int,
&(clang::ExternalASTSource::GetExternalDecl(unsigned
int))>::get(clang::ExternalASTSource*) const
#30 clang::ASTContext::getCurrentKeyFunction(clang::CXXRecordDecl const*)

I've failed so far to get a minimal case to reproduce this issue. The new
patch I've attached fixes both crashes, but to me it seems more like a
workaround than a proper fix. There seems to be a conceptual problem:
Anytime you try to deref a lazy pointer in KeyFunctions that happens to be
in offset format can cause this pointer to be destructed while
deserializing it. Right now, I can't seem to think of any suggestion how to
fix this issue that I really like. The safest would fix would probably be
to change the type of KeyFunctions from

   DenseMap<const CXXRecordDecl*, LazyDeclPtr>

to something like a

  DenseMap<const CXXRecordDecl*, std::shared_ptr<LazyDeclPtr>>

(ugly) or using a map that doesn't relocate its contents (like the
persistent maps from Clojure).

Thoughts?

Fixed in r212437.

You even came up with a test, nice. I’ll close the bug.

Awesome! I checked, this fixes all I could reproduce crashes (also under asan).

Great work, thanks!

Tobias