How many failures from "make check-all" are acceptable for production use?

When I build a program or library that ships with unit tests and I’m going to depend on it for building my applications, my requirement for production use is zero unit-test failures. If even a single unit test fails, I won’t use that tool or library in production, because diagnosing bugs later can be a nightmare if the culprit is that third-party software.
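To make that policy concrete, here is a small sketch of the kind of gate I have in mind, assuming the standard lit summary format (the `summary` variable here is just sample data, not real output):

```shell
# Hypothetical zero-failure gate for a build pipeline.
# Assumes lit's summary format, e.g. "  Failed           :    47".
summary='Testing Time: 3589.34s
  Passed           : 78429
  Expectedly Failed:   151
  Failed           :    47'

# Extract the "Failed" count; the ^ anchor avoids matching "Expectedly Failed".
failed=$(printf '%s\n' "$summary" | awk -F: '/^ *Failed *:/ {gsub(/ /,"",$2); print $2}')

if [ "${failed:-0}" -eq 0 ]; then
  echo "OK for production"
else
  echo "Rejected: $failed unit-test failures"
fi
```

(Note that "Expectedly Failed" is deliberately not counted; those are tests marked XFAIL.)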

That said, when the official LLVM binaries are uploaded, I see from the mailing list that most of them are accepted even with dozens of failures from “make check-all”. So I’m assuming these tests are not considered critical for production use.

Which obviously raises a question: how can we know that an LLVM build can be considered suitable for production use?

For example, I just built on macOS Monterey a fully static universal (x86_64 + arm64) build of LLVM + Clang, compiler-rt, openmp, and all runtimes, and got 47 failures (which kinda worries me, because on a similar build of LLVM 7.0.0 that I did years ago, I got only 1 failure).
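For anyone wanting to reproduce a similar setup, the configure step was roughly along these lines (a simplified sketch, not my exact invocation; the project/runtime lists are abbreviated and the static-linking details are omitted):

```shell
# Sketch of a universal (x86_64 + arm64) LLVM configure step on macOS.
# Illustrative only; my real invocation had additional flags.
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_OSX_ARCHITECTURES="x86_64;arm64" \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi;openmp"
ninja
ninja check-all   # the target whose failures are listed below
```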

The results I got are:

Failed Tests (47):
  AddressSanitizer-Unit :: ./Asan-x86_64-calls-Test/AddressSanitizerInterface/DeathCallbackTest
  AddressSanitizer-Unit :: ./Asan-x86_64-inline-Test/AddressSanitizerInterface/DeathCallbackTest
  AddressSanitizer-x86_64-darwin :: TestCases/Posix/new_array_cookie_uaf_test.cpp
  AddressSanitizer-x86_64-darwin :: TestCases/large_func_test.cpp
  AddressSanitizer-x86_64-darwin :: TestCases/malloc_context_size.cpp
  AddressSanitizer-x86_64-darwin :: TestCases/scariness_score_test.cpp
  AddressSanitizer-x86_64-darwin :: TestCases/use-after-delete.cpp
  Clang :: Preprocessor/nonportable-include-with-hmap.c
  LeakSanitizer-Standalone-x86_64 :: TestCases/swapcontext.cpp
  SanitizerCommon-asan-x86_64-Darwin :: allocator_returns_null.cpp
  SanitizerCommon-asan-x86_64-Darwin :: max_allocation_size.cpp
  SanitizerCommon-lsan-x86_64-Darwin :: allocator_returns_null.cpp
  SanitizerCommon-lsan-x86_64-Darwin :: compress_stack_depot.cpp
  SanitizerCommon-lsan-x86_64-Darwin :: malloc_hook.cpp
  SanitizerCommon-lsan-x86_64-Darwin :: max_allocation_size.cpp
  ThreadSanitizer-x86_64 :: Darwin/
  ThreadSanitizer-x86_64 :: cxa_guard_acquire.cpp
  ThreadSanitizer-x86_64 :: static_init2.cpp
  ThreadSanitizer-x86_64 :: static_init4.cpp
  UBSan-AddressSanitizer-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
  UBSan-AddressSanitizer-x86_64 :: TestCases/TypeCheck/vptr-virtual-base-construction.cpp
  UBSan-AddressSanitizer-x86_64 :: TestCases/TypeCheck/vptr.cpp
  UBSan-Standalone-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
  UBSan-Standalone-x86_64 :: TestCases/TypeCheck/vptr-virtual-base-construction.cpp
  UBSan-Standalone-x86_64 :: TestCases/TypeCheck/vptr.cpp
  UBSan-ThreadSanitizer-x86_64 :: TestCases/TypeCheck/vptr-non-unique-typeinfo.cpp
  UBSan-ThreadSanitizer-x86_64 :: TestCases/TypeCheck/vptr-virtual-base-construction.cpp
  UBSan-ThreadSanitizer-x86_64 :: TestCases/TypeCheck/vptr.cpp
  libomp :: barrier/omp_barrier.c
  libcxx/memory/aligned_allocation_macro.compile.pass.cpp
  std/depr/depr.c.headers/stdlib_h.aligned_alloc.compile.pass.cpp
  std/input.output/iostream.objects/init.pass.cpp
  std/
  std/
  std/
  std/
  std/
  std/
  std/
  std/
  std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.create/allocate_shared.array.bounded.pass.cpp
  std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.create/allocate_shared.array.unbounded.pass.cpp
  std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.create/make_shared.array.bounded.pass.cpp
  std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.create/make_shared.array.unbounded.pass.cpp
  std/utilities/utility/mem.res/mem.res.monotonic.buffer/mem.res.monotonic.buffer.mem/allocate_overaligned_request.pass.cpp
  std/utilities/utility/mem.res/mem.res.pool/mem.res.pool.mem/sync_allocate_overaligned_request.pass.cpp
  std/utilities/utility/mem.res/mem.res.pool/mem.res.pool.mem/unsync_allocate_overaligned_request.pass.cpp

Testing Time: 3589.34s
  Skipped          :    47
  Unsupported      : 18923
  Passed           : 78429
  Expectedly Failed:   151
  Failed           :    47

BTW, this is not my final build yet, because next I’m going to build all the remaining projects (except libc)… but I don’t really know whether these failures mean I shouldn’t use this in production, or whether they could be considered “acceptable”.

Isn’t there a test suite whose purpose is to tell whether the build can be trusted for production use?


I would question whether the llvm binaries are intended to be “production ready” at all. They are by their nature snapshots of development at a certain time, and there’s no commitment to iron out all failures in a release, though we do our best.

It’s also not the stated responsibility of the people building those releases to go and fix all those things. I’ll take Armv7 as an example. We (Linaro, not speaking for llvm here) of course do our best to fix what we find and make sure that when a release branch is made, it’s as good as it can be. Ultimately though, the first priority is providing that snapshot and making sure it isn’t worse than last time.

Would we decline to upload a release if it was truly dire? Yes, we did that recently: we skipped a point release because we were waiting on fixes to be merged. The next build still had failures, but vastly fewer, and it was functional.

We have also shipped Windows on Arm builds that many would not consider production ready due to missing features. Again, the goal was to provide convenience, not to commit to a quality level.

So how can you know a build is suitable? By bringing your own definition of production, which I know is a hard question to answer.

If you were happy using release x of llvm, you can probably use x+1 if the test results have not regressed on the platforms you consider important (I realise that is a chicken-and-egg situation, though: how does one choose x in the first place? But one has to start somewhere).
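Checking for regressions between two releases can be as simple as diffing the failed-test lists from each run (a sketch; the two files here are hypothetical samples, one lit test name per line):

```shell
# Hypothetical failure lists from release x and release x+1,
# one failed lit test name per line, sorted for comm(1).
printf 'a.cpp\nb.cpp\n' | sort > old_failures.txt
printf 'b.cpp\nc.cpp\n' | sort > new_failures.txt

# Lines unique to the new list are regressions to investigate.
comm -13 old_failures.txt new_failures.txt
# → c.cpp
```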

For Linux at least, I would start by looking at the major distros. If they ship clang x.y and don’t have a ton of their own patches, it’s probably good for most of what I do.

You will have to look at why they failed and decide if that bothers you. In my experience running buildbots, some tests are simply sensitive to hardware resources. Running a buildbot on a well-resourced server is fine, but building a release on a limited local machine can run into contention issues.

Test failed because we ran out of threads? Probably not something to be worried about as long as it passed elsewhere.
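One cheap way to rule contention in or out is to re-run just the suspect tests with parallelism disabled; resource-sensitive failures often pass when they aren’t competing with a full parallel run. A sketch, run from the build directory (the test path and filter pattern here are illustrative, and the exact layout depends on your configuration):

```shell
# Re-run only matching tests, one at a time, with verbose output.
# -j 1 disables parallelism; --filter takes a regex over test names.
./bin/llvm-lit -j 1 -v --filter 'max_allocation_size' \
  projects/compiler-rt/test/sanitizer_common
```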

In this specific case, perhaps there is an interaction between universal builds and sanitizers running under emulation; I see that they are all x86_64 failures. It may be worth reporting, as it could be an important issue.

If you want to do that, you can either capture the whole log, or run the tests individually. Then you can raise issues at Issues · llvm/llvm-project · GitHub.
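Running a single test looks roughly like this, from the build directory (a sketch; the exact path under the build tree depends on how you configured the runtimes):

```shell
# Run one lit test with -v so the full failing command line and its
# output are printed, which is what you'd paste into a bug report.
./bin/llvm-lit -v \
  projects/compiler-rt/test/asan/TestCases/large_func_test.cpp
```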

Sort of. You can run the external llvm-test-suite (GitHub - llvm/llvm-test-suite), which contains whole programs and various conformance suites for C and Fortran. But again, it’s just one part of whatever your definition of production ready is.
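Getting the test-suite running against a freshly built clang looks roughly like this (a sketch; `/path/to/llvm-build` is a placeholder for your own build tree, and the O3 cache file is just one of the provided configurations):

```shell
# Build and run the external llvm-test-suite with a just-built clang.
git clone https://github.com/llvm/llvm-test-suite.git test-suite
mkdir test-suite-build && cd test-suite-build
cmake -G Ninja \
  -DCMAKE_C_COMPILER=/path/to/llvm-build/bin/clang \
  -C ../test-suite/cmake/caches/O3.cmake \
  ../test-suite
ninja
# Run with the llvm-lit from your LLVM build.
/path/to/llvm-build/bin/llvm-lit -v .
```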

If you’d like to try that, here is an example build bot that does it: Buildbot. You can get the commands from the steps there.

I believe the release build script does run the test suite also.