buildbot failure in LLVM on clang-x86_64-debian-fnt

All,

This buildbot is getting lots of assertion failures in the test suite.
They were probably caused by my commit:


r151049 | foad | 2012-02-21 09:25:52 +0000 (Tue, 21 Feb 2012) | 6 lines
Changed paths:
M /llvm/trunk/lib/VMCore/LLVMContextImpl.h
M /llvm/trunk/lib/VMCore/Type.cpp

PR1210: make uniquing of struct and function types more efficient by
using a DenseMap and Talin’s new GeneralHash, avoiding the need for a
temporary std::vector on every lookup.

Patch by Meador Inge!


… but I can’t reproduce the failures on my machine. I’m also running
x86_64 Linux, I’ve checked out the same revision of
llvm+clang+testsuite as the buildbot, and I’m doing a Release+Asserts
build.

Is anyone else seeing these failures? Any ideas how I can reproduce them here?

If the hashing is the culprit, there are many more variables to consider: different pointer values entering the hash can cause drastically different results.

The best way to spot this is often through scrutiny and direct testing of the hashing. We’ve already found one bug in the hashing library, so it could likely be tested significantly more thoroughly. I don’t have time tonight, but I can scrutinize the actual commit using the hashing library to see if we’re just putting bad data into it.

Either way, I would try speculatively reverting to ensure it is this patch. If so, we can keep experimenting with the patch, but it’s important to get the builders back. There are several others seeing the failure as well.
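
To illustrate the point above about pointer values feeding the hash, here is a minimal sketch. It is not the code from the reverted patch; hashPointer is a hypothetical stand-in for any hash that mixes in a pointer key. The same logical key can live at a different address on each machine or run, so bucket placement and collision patterns can differ even when the inputs look identical.

```cpp
#include <cstdint>

// Hypothetical stand-in for a hash over a pointer key. This is NOT the
// function used by the patch; it only shows that the result depends on
// the address, not on what the pointer points to.
inline uint64_t hashPointer(const void *p) {
  uint64_t x = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(p));
  x *= 0x9E3779B97F4A7C15ULL; // odd multiplier, so the step is reversible
  return x ^ (x >> 32);       // fold the high bits into the low bits
}
```

Two objects with equal contents but different addresses hash differently here, so a bug that only shows up for one particular collision pattern may appear on the build slave and not on a developer machine.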

OK, I've reverted it. Duncan is letting me play on the build slave
machine, where I'm having more luck reproducing it. I think the
failures might depend on which version of GCC you use to compile LLVM.
Or something.

Cheers,
Jay.

There is definitely undefined behavior in the current implementation of the hashing logic that might be differently exploited by different optimizers. You brought up some of it in code review already.

I tested on x86_64 Fedora 16 with GCC 4.6.2 and OS X Lion with clang
3.0 and could not reproduce the problem. I will keep investigating.

-- Meador

I only get the failures with a Release build using GCC 4.4 (not 4.5 or
4.6). See this thread:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120220/137504.html

Thanks,
Jay.

Jay,

I can reproduce with -O2 and -O3. See
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/185

gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC)

The failures could be suppressed with -O2 -fno-strict-aliasing.
I am suspicious of the type punning in ADT/Hashing.h.

...Takumi
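
For readers unfamiliar with why -fno-strict-aliasing can hide this kind of failure, the following is a sketch under stated assumptions, not the actual contents of ADT/Hashing.h. The names loadWord and hashWords are hypothetical, and the mixing step borrows the FNV-1a constants purely as a stand-in. It shows the usual way to read raw words out of an object without type punning through a cast pointer: copy the bytes with std::memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from ADT/Hashing.h.
// Reading an object through a pointer of an unrelated type, e.g.
//   uint32_t w = *reinterpret_cast<const uint32_t *>(p);
// violates the strict aliasing rule when p does not point to a uint32_t,
// and an optimizer is allowed to assume it never happens.
// Copying the bytes is well defined for any object.
inline uint32_t loadWord(const void *p) {
  uint32_t w;
  std::memcpy(&w, p, sizeof(w));
  return w;
}

// Illustrative hash over whole 32-bit words (FNV-1a constants used as a
// stand-in mixing step; the real hashing library may differ entirely).
inline uint32_t hashWords(const void *data, std::size_t sizeInBytes) {
  assert(sizeInBytes % sizeof(uint32_t) == 0 &&
         "sketch only handles whole words");
  const unsigned char *p = static_cast<const unsigned char *>(data);
  uint32_t h = 2166136261u;
  for (std::size_t i = 0; i < sizeInBytes; i += sizeof(uint32_t)) {
    h ^= loadWord(p + i); // aliasing-safe read of the next word
    h *= 16777619u;
  }
  return h;
}
```

Compilers generally turn a fixed-size memcpy like this into a single load, so the well-defined version should not cost anything over the cast.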