Crash on invalid during LLVMContext destruction MDNode::dropAllReferences

Hi Duncan,

I came across something like the following recently which I guess might be related to your recent work. Any ideas?

$ clang+±tot -cc1 crash_on_invalid.cpp -g -emit-obj -fexceptions -fcxx-exceptions
crash_on_invalid.cpp:13:1: error: C++ requires a type specifier for all declarations
x;
^
1 error generated.
*** Error in `clang+±tot’: corrupted double-linked list: 0x000000000754f340 ***
^C
blaikie@blaikie-linux:/tmp/dbginfo$ cat crash_on_invalid.cpp
// RUN: %clang_cc1 -fexceptions -fcxx-exceptions -g -std=c++11 -S -emit-llvm %s -o - | FileCheck %s

extern “C” __complex float complex_src();

struct foo {
__complex float k;
foo();
};

foo::foo()
: k(complex_src()) {
}
x;

In some nearby/related test cases rather than a vague corrupted double-linked list error, I get a stack something like:

#0 0x1efe5de llvm::sys::PrintStackTrace(_IO_FILE*) /usr/local/google/home/blaikie/dev/llvm/src/lib/Support/Unix/Signals.inc:422:15
#1 0x1eff37b PrintStackTraceSignalHandler(void*) /usr/local/google/home/blaikie/dev/llvm/src/lib/Support/Unix/Signals.inc:481:1
#2 0x1f01653 SignalHandler(int) /usr/local/google/home/blaikie/dev/llvm/src/lib/Support/Unix/Signals.inc:198:60
#3 0x7f6893e22340 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10340)
#4 0x1ae3a9f bool llvm::DenseMapBase<llvm::SmallDenseMap<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long>, 4u, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> > >, void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long>, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> > >::LookupBucketFor<void*>(void* const&, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> > const*&) const /usr/local/google/home/blaikie/dev/llvm/src/include/llvm/ADT/DenseMap.h:495:34
#5 0x1ae3968 bool llvm::DenseMapBase<llvm::SmallDenseMap<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long>, 4u, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> > >, void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long>, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> > >::LookupBucketFor<void*>(void* const&, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> >&) /usr/local/google/home/blaikie/dev/llvm/src/include/llvm/ADT/DenseMap.h:525:10
#6 0x1ad2453 llvm::DenseMapBase<llvm::SmallDenseMap<void
, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long>, 4u, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> > >, void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long>, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, std::pair<llvm::PointerUnion<llvm::MetadataAsValue*, llvm::Metadata*>, unsigned long> > >::erase(void* const&) /usr/local/google/home/blaikie/dev/llvm/src/include/llvm/ADT/DenseMap.h:200:10
#7 0x1ad23da llvm::ReplaceableMetadataImpl::dropRef(void*) /usr/local/google/home/blaikie/dev/llvm/src/lib/IR/Metadata.cpp:134:8
#8 0x1ae5c6a llvm::MetadataTracking::untrack(void*, llvm::Metadata&) /usr/local/google/home/blaikie/dev/llvm/src/lib/IR/MetadataTracking.cpp:43:1
#9 0xa1282c llvm::MetadataTracking::untrack(llvm::Metadata*&) /usr/local/google/home/blaikie/dev/llvm/src/include/llvm/IR/MetadataTracking.h:69:59
#10 0x1ae5221 llvm::MDOperand::untrack() /usr/local/google/home/blaikie/dev/llvm/src/include/llvm/IR/Metadata.h:562:3
#11 0x1ad6d84 llvm::MDOperand::reset(llvm::Metadata*, llvm::Metadata*) /usr/local/google/home/blaikie/dev/llvm/src/include/llvm/IR/Metadata.h:545:5
#12 0x1ad4efd llvm::MDNode::setOperand(unsigned int, llvm::Metadata*) /usr/local/google/home/blaikie/dev/llvm/src/lib/IR/Metadata.cpp:764:1
#13 0x1ad58cd llvm::MDNode::dropAllReferences() /usr/local/google/home/blaikie/dev/llvm/src/lib/IR/Metadata.cpp:492:49
#14 0x1aaccc9 llvm::LLVMContextImpl::~LLVMContextImpl() /usr/local/google/home/blaikie/dev/llvm/src/lib/IR/LLVMContextImpl.cpp:142:5
#15 0x1aaa24d llvm::LLVMContext::~LLVMContext() /usr/local/google/home/blaikie/dev/llvm/src/lib/IR/LLVMContext.cpp:97:31
#16 0x2555d3b clang::CodeGenAction::~CodeGenAction() /usr/local/google/home/blaikie/dev/llvm/src/tools/clang/lib/CodeGen/CodeGenAction.cpp:578:5
#17 0x2558295 clang::EmitLLVMAction::~EmitLLVMAction() /usr/local/google/home/blaikie/dev/llvm/src/tools/clang/include/clang/CodeGen/CodeGenAction.h:77:7
#18 0x25582b9 clang::EmitLLVMAction::~EmitLLVMAction() /usr/local/google/home/blaikie/dev/llvm/src/tools/clang/include/clang/CodeGen/CodeGenAction.h:77:7
#19 0x213a2e2 std::default_deleteclang::FrontendAction::operator()(clang::FrontendAction*) const /usr/local/google/home/blaikie/install/bin/…/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/…/…/…/…/include/c++/4.9.0/bits/unique_ptr.h:77:7
#20 0x213a256 std::unique_ptr<clang::FrontendAction, std::default_deleteclang::FrontendAction >::~unique_ptr() /usr/local/google/home/blaikie/install/bin/…/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/…/…/…/…/include/c++/4.9.0/bits/unique_ptr.h:237:2
#21 0x228a7c1 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /usr/local/google/home/blaikie/dev/llvm/src/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:226:1
#22 0x9f8a47 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /usr/local/google/home/blaikie/dev/llvm/src/tools/clang/tools/driver/cc1_main.cpp:110:3
#23 0x9ee003 ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) /usr/local/google/home/blaikie/dev/llvm/src/tools/clang/tools/driver/driver.cpp:369:12
#24 0x9ed057 main /usr/local/google/home/blaikie/dev/llvm/src/tools/clang/tools/driver/driver.cpp:415:12
#25 0x7f689354eec5 __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:321:0
#26 0x9ecaf4 _start (/mnt/fast/dev/llvm/build/clang/debug/split/notypes/nostandalone/bin/clang-3.5+0x9ecaf4)

Not at a computer right now, but it looks like teardown isn't working correctly. Do you have an asserts build? Does an assertion fire there?

Looking at the stack trace, dropAllReferences() is being called on a node, so it sets its operands to nullptr, and some operand has RAUW support (so the tracking needs to be dropped) but looks like it might have been deleted or is otherwise corrupt. Hard to tell though.

Does this reproduce from preprocessed source? Can you send it to me?

Or maybe that's a test case in your email. I'll try it in the morning.

-- dpnes

Asan didn't find it either, and then I realized I was using the RUN line
instead of the command-line you were using (with -g). So the asan dump
follows.

I'll look into this when I get to work. Definitely from my stuff somehow.

$ /Users/dexonsmith/data/llvm.asan/staging/bin/clang -cc1 crash.cpp -g -emit-obj -fexceptions -fcxx-exceptions
crash.cpp:13:1: error: C++ requires a type specifier for all declarations
x;
^
1 error generated.

>
>>
>>
>>
>>
>> Not at a computer right now, but it looks like teardown isn't working
correctly. Do you have an asserts build? Does an assertion fire there?
>>
>> That was with an asserts build.
>>
>> Looking at the stack trace, dropAllReferences() is being called on a
node, so it sets its operands to nullptr, and some operand has RAUW support
(so the tracking needs to be dropped) but looks like it might have been
deleted or is otherwise corrupt. Hard to tell though.
>>
>> Does this reproduce from preprocessed source? Can you send it to me?
>>
>> Or maybe that's a test case in your email. I'll try it in the morning.
>>
>> Yeah, just the test code in the original email is what I reproduced the
linked list error with - some variations of it produced the assertion...
maybe valgrinding or asanified clang would make the failure more reliable,
etc.
>>
>
> The version here doesn't repro for me (don't have an asan build handy --
> I'll build one -- but I tried the weaker gmalloc). I tried messing with
> it but nothing happened.
>
> Can you send a version that gets the stack trace?
>
> (What revision is this, by the way? ToT as of last night?)

Asan didn't find it either, and then I realized I was using the RUN line
instead of the command-line you were using (with -g).

Ah, right - sorry about that red herring.

So the asan dump follows.

Looks related to the stack I was seeing.

I'll look into this when I get to work. Definitely from my stuff somehow.

OK - wasn't any particular rush for me, I just ran into this while writing
test cases for my debug line quality stuff in clang (accidentally
introduced a call to a function that didn't exist, which produced the
crash).

Based on how I saw it, I'm guessing it's something to do with "LLVMContext
has trouble being destroyed if <some task that we only do when successfully
finishing codegen and is skipped if we abort codegen due to an error in the

Thanks for the report. I'm really glad you told me now, since
moving MDLocation into place would have removed this way of
exposing it and it would have been way harder to track down
(MDLocation doesn't use ValueAsMetadata for its integer
operands, and that Value-indirection (with its RAUW behaviour)
is necessary to expose this).

This is the fix:

diff --git a/lib/IR/Metadata.cpp b/lib/IR/Metadata.cpp
index ab83252..302ffb4 100644
--- a/lib/IR/Metadata.cpp
+++ b/lib/IR/Metadata.cpp
@@ -167,6 +171,8 @@ void ReplaceableMetadataImpl::replaceAllUsesWith(Metadata *MD) {
     return L.second.second < R.second.second;
   });
   for (const auto &Pair : Uses) {
+ if (!UseMap.count(Pair.first))
+ continue;
     OwnerTy Owner = Pair.second.first;
     if (!Owner) {
       // Update unowned tracking references directly.

I just need to figure out a formula for reproducing reliably (in
a unit test) and I'll commit.

Another thing this exposed (which would have also hidden this
problem) is that we're not proactively dropping references on
`ValueAsMetadata` during teardown. We should. This RAUW traffic
is unnecessary. (It doesn't happen in the "common" case, where
the graph is complete (and RAUW has been dropped from MDTuples),
which is why this bug only fires when there's an error.)

Not at a computer right now, but it looks like teardown isn't working correctly. Do you have an asserts build? Does an assertion fire there?

That was with an asserts build.

Looking at the stack trace, dropAllReferences() is being called on a node, so it sets its operands to nullptr, and some operand has RAUW support (so the tracking needs to be dropped) but looks like it might have been deleted or is otherwise corrupt. Hard to tell though.

Does this reproduce from preprocessed source? Can you send it to me?

Or maybe that's a test case in your email. I'll try it in the morning.

Yeah, just the test code in the original email is what I reproduced the linked list error with - some variations of it produced the assertion... maybe valgrinding or asanified clang would make the failure more reliable, etc.

The version here doesn't repro for me (don't have an asan build handy --
I'll build one -- but I tried the weaker gmalloc). I tried messing with
it but nothing happened.

Can you send a version that gets the stack trace?

(What revision is this, by the way? ToT as of last night?)

Asan didn't find it either, and then I realized I was using the RUN line
instead of the command-line you were using (with -g).

Ah, right - sorry about that red herring.

So the asan dump follows.

Looks related to the stack I was seeing.

I'll look into this when I get to work. Definitely from my stuff somehow.

OK - wasn't any particular rush for me, I just ran into this while writing test cases for my debug line quality stuff in clang (accidentally introduced a call to a function that didn't exist, which produced the crash).

Based on how I saw it, I'm guessing it's something to do with "LLVMContext has trouble being destroyed if <some task that we only do when successfully finishing codegen and is skipped if we abort codegen due to an error in the source> is not done first".

Thanks for the report. I'm really glad you told me now, since
moving MDLocation into place would have removed this way of
exposing it and it would have been way harder to track down
(MDLocation doesn't use ValueAsMetadata for its integer
operands, and that Value-indirection (with its RAUW behaviour)
is necessary to expose this).

This is the fix:

diff --git a/lib/IR/Metadata.cpp b/lib/IR/Metadata.cpp
index ab83252..302ffb4 100644
--- a/lib/IR/Metadata.cpp
+++ b/lib/IR/Metadata.cpp
@@ -167,6 +171,8 @@ void ReplaceableMetadataImpl::replaceAllUsesWith(Metadata *MD) {
    return L.second.second < R.second.second;
  });
  for (const auto &Pair : Uses) {
+ if (!UseMap.count(Pair.first))
+ continue;
    OwnerTy Owner = Pair.second.first;
    if (!Owner) {
      // Update unowned tracking references directly.

I just need to figure out a formula for reproducing reliably (in
a unit test) and I'll commit.

r226029, thanks again. (I'll pull this into 3.6 when I get a moment.)

BTW, I also did this (in r226044).