Metadata/Value split has landed

The `Metadata`/`Value` split (PR21532) landed in r223802 -- at least, the
C++ side of it. This was a rocky day, but I suppose that's what I get
for failing to stage the change in smaller pieces.

As of r223916 (lldb), I'm not aware of any remaining (in-tree) breakage,
so if I've missed some problem in the sea of buildbot errors, please
flag me down.

I'll follow up soon with bitcode and assembly changes!

The `Metadata`/`Value` split (PR21532) landed in r223802 -- at least, the
C++ side of it. This was a rocky day, but I suppose that's what I get
for failing to stage the change in smaller pieces.

As of r223916 (lldb), I'm not aware of any remaining (in-tree) breakage,
so if I've missed some problem in the sea of buildbot errors, please
flag me down.

I'll follow up soon with bitcode and assembly changes!

Hi Duncan,

I started getting random assertion failures in some tests yesterday, and I think
it may be related to this change. Here is the stack trace:

#0 0x00007ffff59f4c39 in raise () from /lib64/libc.so.6
#1 0x00007ffff59f6348 in abort () from /lib64/libc.so.6
#2 0x00007ffff59edb96 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff59edc42 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffff3a30e92 in llvm::LeakDetectorImpl<void>::addGarbage(void const*) [clone .part.19] () from /opt/buildbot/lib/libLLVM-3.6svn.so
#5 0x00007ffff3a30fd3 in llvm::LeakDetector::addGarbageObjectImpl(void*) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#6 0x00007ffff3a40eed in llvm::MDNode::getTemporary(llvm::LLVMContext&, llvm::ArrayRef<llvm::Metadata*>) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#7 0x00007ffff3426b3f in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
   from /opt/buildbot/lib/libLLVM-3.6svn.so
#8 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
   from /opt/buildbot/lib/libLLVM-3.6svn.so
#9 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
   from /opt/buildbot/lib/libLLVM-3.6svn.so
#10 0x00007ffff3426eed in llvm::MapValue(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
   from /opt/buildbot/lib/libLLVM-3.6svn.so
#11 0x00007ffff3426f39 in llvm::MapValue(llvm::MDNode const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
   from /opt/buildbot/lib/libLLVM-3.6svn.so
#12 0x00007ffff3427174 in llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
   from /opt/buildbot/lib/libLLVM-3.6svn.so
#13 0x00007ffff3755786 in (anonymous namespace)::ModuleLinker::linkGlobalValueBody(llvm::GlobalValue&) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#14 0x00007ffff375767f in llvm::Linker::linkInModule(llvm::Module*) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#15 0x00007ffff3758cfb in llvm::Linker::LinkModules(llvm::Module*, llvm::Module*, std::function<void (llvm::DiagnosticInfo const&)>) ()
   from /opt/buildbot/lib/libLLVM-3.6svn.so
#16 0x00007ffff6c9d8cf in clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) () from /opt/buildbot/lib/libOpenCL.so.1
#17 0x00007ffff6e61f23 in clang::ParseAST(clang::Sema&, bool, bool) () from /opt/buildbot/lib/libOpenCL.so.1
#18 0x00007ffff6c9e6bb in clang::CodeGenAction::ExecuteAction() () from /opt/buildbot/lib/libOpenCL.so.1
#19 0x00007ffff6b7ead6 in clang::FrontendAction::Execute() () from /opt/buildbot/lib/libOpenCL.so.1
#20 0x00007ffff6b5d179 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /opt/buildbot/lib/libOpenCL.so.1
#21 0x00007ffff6b1282c in (anonymous namespace)::compile_llvm (llvm_ctx=...,
    source="\n__kernel void test_fn(__local float *sSharedStorage, __global float *srcValues, __global uint *offsets, __global float *destBuffer, uint alignmentOffset )\n{\n int tid = get_global_id( 0 );\n sSha"..., headers=..., name="input.cl", triple="r600--", processor="verde", opts="",
    address_spaces=..., optimization_level=@0x7fffffff21cc: 2, r_log=...) at llvm/invocation.cpp:255
#22 0x00007ffff6b140c8 in clover::compile_program_llvm (source=..., headers=..., ir=ir@entry=PIPE_SHADER_IR_NATIVE, target=..., opts=..., r_log=...)
    at llvm/invocation.cpp:710
#23 0x00007ffff6b0a371 in clover::program::build (this=this@entry=0x23a0530, devs=..., opts=opts@entry=0x7ffff793dc0d "", headers=...)
    at core/program.cpp:63
#24 0x00007ffff6af31c4 in clBuildProgram (d_prog=0x23a0538, num_devs=0, d_devs=0x0, p_opts=<optimized out>, pfn_notify=0x0, user_data=0x0)
    at api/program.cpp:182

Does this look related? If so, let me know what other information you need to
try to debug this issue.

Thanks,
Tom

This could be related; I'm not sure.

It looks like a leak detection assertion, and I didn't need to change
that logic at all. `ValueMap` calls `MDNode::getTemporary()` and
`MDNode::deleteTemporary()` in the same ways it used to (and I didn't
touch the implementation of those).

Can you reproduce this with `llvm-link`? If so, that sounds like the
best place to start.

>
>> The `Metadata`/`Value` split (PR21532) landed in r223802 -- at least, the
>> C++ side of it. This was a rocky day, but I suppose that's what I get
>> for failing to stage the change in smaller pieces.
>>
>> As of r223916 (lldb), I'm not aware of any remaining (in-tree) breakage,
>> so if I've missed some problem in the sea of buildbot errors, please
>> flag me down.
>>
>> I'll follow up soon with bitcode and assembly changes!
>
> Hi Duncan,
>
> I started getting random assertion failures in some tests yesterday, and I think
> it may be related to this change. Here is the stack trace:
>
> #0 0x00007ffff59f4c39 in raise () from /lib64/libc.so.6
> #1 0x00007ffff59f6348 in abort () from /lib64/libc.so.6
> #2 0x00007ffff59edb96 in __assert_fail_base () from /lib64/libc.so.6
> #3 0x00007ffff59edc42 in __assert_fail () from /lib64/libc.so.6
> #4 0x00007ffff3a30e92 in llvm::LeakDetectorImpl<void>::addGarbage(void const*) [clone .part.19] () from /opt/buildbot/lib/libLLVM-3.6svn.so
> #5 0x00007ffff3a30fd3 in llvm::LeakDetector::addGarbageObjectImpl(void*) () from /opt/buildbot/lib/libLLVM-3.6svn.so
> #6 0x00007ffff3a40eed in llvm::MDNode::getTemporary(llvm::LLVMContext&, llvm::ArrayRef<llvm::Metadata*>) () from /opt/buildbot/lib/libLLVM-3.6svn.so
> #7 0x00007ffff3426b3f in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
> from /opt/buildbot/lib/libLLVM-3.6svn.so
> #8 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
> from /opt/buildbot/lib/libLLVM-3.6svn.so
> #9 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
> from /opt/buildbot/lib/libLLVM-3.6svn.so
> #10 0x00007ffff3426eed in llvm::MapValue(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
> from /opt/buildbot/lib/libLLVM-3.6svn.so
> #11 0x00007ffff3426f39 in llvm::MapValue(llvm::MDNode const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
> from /opt/buildbot/lib/libLLVM-3.6svn.so
> #12 0x00007ffff3427174 in llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
> from /opt/buildbot/lib/libLLVM-3.6svn.so
> #13 0x00007ffff3755786 in (anonymous namespace)::ModuleLinker::linkGlobalValueBody(llvm::GlobalValue&) () from /opt/buildbot/lib/libLLVM-3.6svn.so
> #14 0x00007ffff375767f in llvm::Linker::linkInModule(llvm::Module*) () from /opt/buildbot/lib/libLLVM-3.6svn.so
> #15 0x00007ffff3758cfb in llvm::Linker::LinkModules(llvm::Module*, llvm::Module*, std::function<void (llvm::DiagnosticInfo const&)>) ()
> from /opt/buildbot/lib/libLLVM-3.6svn.so
> #16 0x00007ffff6c9d8cf in clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) () from /opt/buildbot/lib/libOpenCL.so.1
> #17 0x00007ffff6e61f23 in clang::ParseAST(clang::Sema&, bool, bool) () from /opt/buildbot/lib/libOpenCL.so.1
> #18 0x00007ffff6c9e6bb in clang::CodeGenAction::ExecuteAction() () from /opt/buildbot/lib/libOpenCL.so.1
> #19 0x00007ffff6b7ead6 in clang::FrontendAction::Execute() () from /opt/buildbot/lib/libOpenCL.so.1
> #20 0x00007ffff6b5d179 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /opt/buildbot/lib/libOpenCL.so.1
> #21 0x00007ffff6b1282c in (anonymous namespace)::compile_llvm (llvm_ctx=...,
> source="\n__kernel void test_fn(__local float *sSharedStorage, __global float *srcValues, __global uint *offsets, __global float *destBuffer, uint alignmentOffset )\n{\n int tid = get_global_id( 0 );\n sSha"..., headers=..., name="input.cl", triple="r600--", processor="verde", opts="",
> address_spaces=..., optimization_level=@0x7fffffff21cc: 2, r_log=...) at llvm/invocation.cpp:255
> #22 0x00007ffff6b140c8 in clover::compile_program_llvm (source=..., headers=..., ir=ir@entry=PIPE_SHADER_IR_NATIVE, target=..., opts=..., r_log=...)
> at llvm/invocation.cpp:710
> #23 0x00007ffff6b0a371 in clover::program::build (this=this@entry=0x23a0530, devs=..., opts=opts@entry=0x7ffff793dc0d "", headers=...)
> at core/program.cpp:63
> #24 0x00007ffff6af31c4 in clBuildProgram (d_prog=0x23a0538, num_devs=0, d_devs=0x0, p_opts=<optimized out>, pfn_notify=0x0, user_data=0x0)
> at api/program.cpp:182
>
> Does this look related? If so, let me know what other information you need to
> try to debug this issue.

This could be related; I'm not sure.

I'm pretty sure that this commit is the cause of the regression.

r223801 works and r223810 does not, and I don't think any of the other
commits in that range could cause this.

It looks like a leak detection assertion, and I didn't need to change
that logic at all. `ValueMap` calls `MDNode::getTemporary()` and
`MDNode::deleteTemporary()` in the same ways it used to (and I didn't
touch the implementation of those).

Can you reproduce this with `llvm-link`? If so, that sounds like the
best place to start.

I can't reproduce this using llvm-link unfortunately. Any other ideas?

-Tom

(Continuing via email, since Tom stepped away IRC.)

Tom, from the trace [1], it the problematic pointer (0x27e4c80) only shows
up once.

[1]: http://people.freedesktop.org/~tstellar/md-crash.out

That means that something *else* -- other than `MDNode::getTemporary()`
-- must be adding that address to garbage and failing to remove it.

I just dug into `LeakDetector::addGarbageObject()` and it stores *all*
calls to `addGarbage()` in the same place. There are a fair number of
these in the IR:

$ git grep -e addGarbageObject -w -- lib/IR/
lib/IR/BasicBlock.cpp: LeakDetector::addGarbageObject(this);
lib/IR/BasicBlock.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Instruction.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Instruction.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Instruction.cpp: if (!P) LeakDetector::addGarbageObject(this);
lib/IR/Metadata.cpp: LeakDetector::addGarbageObject(N);

I think the next step is to identify who called `addGarbageObject()` with
the problematic address, and what the stack trace was.

The weird thing is, I also noticed a semantic change I made here
accidentally. `addGarbageObject()` has two overloads: `void*` and
`const Value*`. `MDNode::getTemporary()` used to match the latter, but
now it matches the former.

The weird part: all the other calls to `addGarbageObject()` look like they
send in a `Value *`.

Do you have any other calls to `addGarbageObject()`? Do they match
`void *`?

Also, what happens with the attached patch? (If this fixes your problem,
I think it's just papering over something...)

0001-IR-Detect-Metadata-leaks-separately-from-generic-obj.patch (4.27 KB)

+zalman@google.com

The `Metadata`/`Value` split (PR21532) landed in r223802 -- at least, the
C++ side of it. This was a rocky day, but I suppose that's what I get
for failing to stage the change in smaller pieces.

As of r223916 (lldb), I'm not aware of any remaining (in-tree) breakage,
so if I've missed some problem in the sea of buildbot errors, please
flag me down.

I'll follow up soon with bitcode and assembly changes!

Hi Duncan,

I started getting random assertion failures in some tests yesterday, and I think
it may be related to this change. Here is the stack trace:

#0 0x00007ffff59f4c39 in raise () from /lib64/libc.so.6
#1 0x00007ffff59f6348 in abort () from /lib64/libc.so.6
#2 0x00007ffff59edb96 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff59edc42 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffff3a30e92 in llvm::LeakDetectorImpl<void>::addGarbage(void const*) [clone .part.19] () from /opt/buildbot/lib/libLLVM-3.6svn.so
#5 0x00007ffff3a30fd3 in llvm::LeakDetector::addGarbageObjectImpl(void*) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#6 0x00007ffff3a40eed in llvm::MDNode::getTemporary(llvm::LLVMContext&, llvm::ArrayRef<llvm::Metadata*>) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#7 0x00007ffff3426b3f in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
from /opt/buildbot/lib/libLLVM-3.6svn.so
#8 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
from /opt/buildbot/lib/libLLVM-3.6svn.so
#9 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
from /opt/buildbot/lib/libLLVM-3.6svn.so
#10 0x00007ffff3426eed in llvm::MapValue(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
from /opt/buildbot/lib/libLLVM-3.6svn.so
#11 0x00007ffff3426f39 in llvm::MapValue(llvm::MDNode const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
from /opt/buildbot/lib/libLLVM-3.6svn.so
#12 0x00007ffff3427174 in llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()
from /opt/buildbot/lib/libLLVM-3.6svn.so
#13 0x00007ffff3755786 in (anonymous namespace)::ModuleLinker::linkGlobalValueBody(llvm::GlobalValue&) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#14 0x00007ffff375767f in llvm::Linker::linkInModule(llvm::Module*) () from /opt/buildbot/lib/libLLVM-3.6svn.so
#15 0x00007ffff3758cfb in llvm::Linker::LinkModules(llvm::Module*, llvm::Module*, std::function<void (llvm::DiagnosticInfo const&)>) ()
from /opt/buildbot/lib/libLLVM-3.6svn.so
#16 0x00007ffff6c9d8cf in clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) () from /opt/buildbot/lib/libOpenCL.so.1
#17 0x00007ffff6e61f23 in clang::ParseAST(clang::Sema&, bool, bool) () from /opt/buildbot/lib/libOpenCL.so.1
#18 0x00007ffff6c9e6bb in clang::CodeGenAction::ExecuteAction() () from /opt/buildbot/lib/libOpenCL.so.1
#19 0x00007ffff6b7ead6 in clang::FrontendAction::Execute() () from /opt/buildbot/lib/libOpenCL.so.1
#20 0x00007ffff6b5d179 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /opt/buildbot/lib/libOpenCL.so.1
#21 0x00007ffff6b1282c in (anonymous namespace)::compile_llvm (llvm_ctx=...,
source="\n__kernel void test_fn(__local float *sSharedStorage, __global float *srcValues, __global uint *offsets, __global float *destBuffer, uint alignmentOffset )\n{\n int tid = get_global_id( 0 );\n sSha"..., headers=..., name="input.cl", triple="r600--", processor="verde", opts="",
address_spaces=..., optimization_level=@0x7fffffff21cc: 2, r_log=...) at llvm/invocation.cpp:255
#22 0x00007ffff6b140c8 in clover::compile_program_llvm (source=..., headers=..., ir=ir@entry=PIPE_SHADER_IR_NATIVE, target=..., opts=..., r_log=...)
at llvm/invocation.cpp:710
#23 0x00007ffff6b0a371 in clover::program::build (this=this@entry=0x23a0530, devs=..., opts=opts@entry=0x7ffff793dc0d "", headers=...)
at core/program.cpp:63
#24 0x00007ffff6af31c4 in clBuildProgram (d_prog=0x23a0538, num_devs=0, d_devs=0x0, p_opts=<optimized out>, pfn_notify=0x0, user_data=0x0)
at api/program.cpp:182

Does this look related? If so, let me know what other information you need to
try to debug this issue.

This could be related; I'm not sure.

I'm pretty sure that this commit is the cause of the regression.

r223801 works and r223810 does not, and I don't think any of the other
commits in that range could cause this.

It looks like a leak detection assertion, and I didn't need to change
that logic at all. `ValueMap` calls `MDNode::getTemporary()` and
`MDNode::deleteTemporary()` in the same ways it used to (and I didn't
touch the implementation of those).

Can you reproduce this with `llvm-link`? If so, that sounds like the
best place to start.

I can't reproduce this using llvm-link unfortunately. Any other ideas?

(Continuing via email, since Tom stepped away IRC.)

Tom, from the trace [1], it the problematic pointer (0x27e4c80) only shows
up once.

[1]: http://people.freedesktop.org/~tstellar/md-crash.out

That means that something *else* -- other than `MDNode::getTemporary()`
-- must be adding that address to garbage and failing to remove it.

I just dug into `LeakDetector::addGarbageObject()` and it stores *all*
calls to `addGarbage()` in the same place. There are a fair number of
these in the IR:

$ git grep -e addGarbageObject -w -- lib/IR/
lib/IR/BasicBlock.cpp: LeakDetector::addGarbageObject(this);
lib/IR/BasicBlock.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Instruction.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Instruction.cpp: LeakDetector::addGarbageObject(this);
lib/IR/Instruction.cpp: if (!P) LeakDetector::addGarbageObject(this);
lib/IR/Metadata.cpp: LeakDetector::addGarbageObject(N);

I think the next step is to identify who called `addGarbageObject()` with
the problematic address, and what the stack trace was.

The weird thing is, I also noticed a semantic change I made here
accidentally. `addGarbageObject()` has two overloads: `void*` and
`const Value*`. `MDNode::getTemporary()` used to match the latter, but
now it matches the former.

The weird part: all the other calls to `addGarbageObject()` look like they
send in a `Value *`.

Do you have any other calls to `addGarbageObject()`? Do they match
`void *`?

Also, what happens with the attached patch? (If this fixes your problem,
I think it's just papering over something...)

<0001-IR-Detect-Metadata-leaks-separately-from-generic-obj.patch>

Zalman also had a reproduction, and he's been able to track it down to
an `addGarbageObject()` call from `MachineBasicBlock`. It looks like
the MBB gets deallocated but `removeGarbageObject()` isn't called yet.

CC'ing him here so he can join the thread once he's gotten a little
further.

BTW, if this is blocking anyone, I can commit the patch I attached to
my previous email (or you can apply it locally). I think it's probably
the right thing eventually -- it improves the output when there *is* an
issue -- but I haven't committed it yet since it'll cover up the
problem.

I need to take off until later this evening. So far I’ve found the following:

The assertion failure I’m seeing is due to the to the addGarbageObject call in:
void ilist_traits::removeNodeFromList(MachineInstr *N)
in MachineBasicBlock.cpp . This is somewhat puzzling as the destructor for MachineBasicBlock calls removeGarbageObject so it should be impossible for new to return that pointer again. The only explanation I can come up with is that this class does not have a virtual destructor and the object in question is being deleted with a different type than MachineBasicBlock.

Also, changing removeGarbage in LeaksContext.h to:

void removeGarbage(const T* o) {
if (o == Cache)
Cache = nullptr; // Cache hit
else {
assert(Ts.count(o) == 1 && “Removing Object not in set!”);
Ts.erase(o);
}
}

causes the added assertion to fail in my code. I’m not sure if the leaks detector is supposed to disallow removing an object that was never added, but if so, it seems worth adding the assertion. (And if not, perhaps explicitly documenting that it is allowed to remove and object that was never added or has already been removed.)

And while I don’t expect anyone to pull and build Halide to repro this, here’s how to do it:

  1. Clone https://github.com/halide/Halide.git to a directory I’ll call “halide”.
  2. Symlink halide/llvm to your llvm build. (This may not be necessary as llvm-config may take care of everything.)
  3. Make sure llvm-config from the right development build is on your path.
  4. Type “make test_boundary_conditions” in the halide dir. (Or “make OPTIMIZE-g test_boundary_conditions” )

This should work on Linux or Mac. I’ve been debugging on Linux, but I expect this repros on Mac OS X as well.

Thanks,
-Z-

+zalman@google.com

Hi Duncan,

This patch plus another small change fixes the assertion failure for
me. With the patch alone, the void* overload of addGarbageObject()
was being used by MDNode::getTemporary(), so I had to cast the object as
an MDNode*:

diff --git a/lib/IR/Metadata.cpp b/lib/IR/Metadata.cpp
index cd5edd2..916d216 100644
--- a/lib/IR/Metadata.cpp
+++ b/lib/IR/Metadata.cpp
@@ -564,7 +564,7 @@ MDNode *MDNode::getMDNode(LLVMContext &Context,
ArrayRef<Metadata *> MDs,
MDNodeFwdDecl *MDNode::getTemporary(LLVMContext &Context,
                                     ArrayRef<Metadata *> MDs) {
   MDNodeFwdDecl *N = new (MDs.size()) MDNodeFwdDecl(Context, MDs);
- LeakDetector::addGarbageObject(N);
+ LeakDetector::addGarbageObject((MDNode*)N);
   return N;
}

I'm in favor of committing this.

-Tom

Sorry, after more extensive testing, this doesn't work. It looks like
you need to add const MDNode * overloads to addGarbageObject() adding
them for addGarbageObjectImpl() doesn't seem to work:

diff --git a/include/llvm/IR/LeakDetector.h
b/include/llvm/IR/LeakDetector.h
index e0b131e..b272eaf 100644
--- a/include/llvm/IR/LeakDetector.h
+++ b/include/llvm/IR/LeakDetector.h
@@ -79,6 +79,17 @@ struct LeakDetector {
#endif
   }

+ static void addGarbageObject(const MDNode *Object) {
+#ifndef NDEBUG
+ addGarbageObjectImpl(Object);
+#endif
+ }
+ static void removeGarbageObject(const MDNode *Object) {
+#ifndef NDEBUG
+ removeGarbageObjectImpl(Object);
+#endif
+ }

I committed:

r224058 = 966942da9e68b59c31ce770e7f94c55a63482c6b
r224060 = da75f7277e3a129aed8ef8aa4e0d84de40b76fd4
r224061 = f88e4c8e9171045454b2c8e05054c2af8da3fe4f

Let me know if somehow you're still hitting the problem.

r224061 removes leak detection entirely from `MachineInstr`. There aren't
any leaks to be had there, since they're allocated in a custom allocator.
They're just dropped away once `MachineFunction` is deleted.

@Zalman, thanks again for your help digging into this.

The assertion no longer fires in Halide with top-of-tree llvm. Thank you for the fix.

-Z-

The assertion no longer fires in Halide with top-of-tree llvm. Thank you
for the fix.

+1. Thanks for the quick fix.

-Tom

Hi Duncan,

I’m in the following situation for which this change caused an assertion failure:

Three modules, let’s say A B C
Two function, F in A, G in B

Now I CloneFunction F into B (call the new function F’) and inline F’ into G.
Now, for the problematic part, where I try to extract G (and all referenced values) into C:

upon encountering any debug node in the inlined code, it tries to clone the DISubprogram for F’, so it creates a temporary. Since that refers to F’, it’ll now go ahead and copy F’. However, here once again it tries to copy the DISubprogram, which now just uses the temporary value from above (this is fine). Unfortunately, right after, it calls resolveCycles on the debug info annotation, which crashes with

Assertion failed: (!isa(Op) && “Expected all forward declarations to be resolved”), function resolveCycles, file /Users/kfischer/julia/deps/llvm-svn/lib/IR/Metadata.cpp, line 459.

because we still have the temporary DISubprogram in there.

Any ideas what to do about this?

The logic I have between `MapValue(Metadata*)` and `MapValueImpl(Metadata*)`
was supposed to solve this. (Actually, those names are awful, I may try to
change them to `MapMetadata()`.)

The logical flow should be:

  - MapValue(Metadata*) calls MapValueImpl().
  - MapValueImpl() introduces a temporary.
  - MapValueImpl() recursively calls MapValueImpl().
  - MapValueImpl() resolves its temporary.
  - MapValue(Metadata*) calls resolveCycles().

There could be a bug here, but I just scanned the code and it seems to
match up. I suspect, rather, that there's a temporary node in the metadata
graph on entry to `CloneFunction()`; this isn't supported, and would cause
the same assertion to fire.

Try running the verifier right before calling `CloneFunction()`: I put a
check for this in `Verifier::visitMDNode()`.

Or maybe you're not using `MapValue()`, and doing your own thing? If that's
the case, you just need to do something similar with your logic!