Forcing a build on sanitizer-aarch64-linux-bootstrap-hwasan to test fixes

Hi all. I made a PR to the LLVM repo, but it had to be reverted due to a failure during a hwasan check on the sanitizer-aarch64-linux-bootstrap-hwasan buildbot.
I want to reproduce the error locally to find a fix for the issue, but I do not have access to a system that supports hwasan.
One potential way around this issue is to reland the PR a few times, each time trying a different fix, but I would rather not disrupt the entire build pipeline by doing experiments in this fashion.

Therefore I ask: Is there a way to schedule a build on the sanitizer-aarch64-linux-bootstrap-hwasan buildbot so I can test my fixes before relanding my PR?

The test in question checks for an error message emitted by a call to report_fatal_error in llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp. The test is located in llvm/test/CodeGen/SPIRV/opencl.

Three fixes I want to try are:

  1. Add the --crash argument to the not command in the tests: ; RUN: not --crash ...
  2. Prepend HWASAN_OPTIONS="abort_on_error=0" to the tests: ; RUN: HWASAN_OPTIONS="abort_on_error=0" ...
  3. Mark sanitizer-aarch64-linux-bootstrap-hwasan as unsupported by the test: ; UNSUPPORTED: sanitizer-aarch64-linux-bootstrap-hwasan
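For readers unfamiliar with the difference between not and not --crash in fix 1, here is a rough shell sketch of how I understand the two modes to behave. not_sketch is a hypothetical stand-in written for illustration, not the real LLVM utility, and it assumes a POSIX shell where a child killed by a signal is reported with an exit status above 128.

```shell
#!/bin/sh
# Hypothetical emulation of LLVM's `not` utility, for illustration only.
not_sketch() {
  expect_crash=0
  if [ "$1" = "--crash" ]; then
    expect_crash=1
    shift
  fi
  status=0
  "$@" || status=$?
  # POSIX shells report death-by-signal as an exit status above 128.
  if [ "$status" -gt 128 ]; then
    # The command crashed: only `not --crash` accepts that.
    [ "$expect_crash" -eq 1 ]
    return $?
  fi
  # The command ran to completion: only plain `not` accepts that,
  # and only when the command reported failure.
  [ "$expect_crash" -eq 0 ] && [ "$status" -ne 0 ]
}

# A clean exit(1), e.g. a tool that reports an error and exits.
if not_sketch sh -c 'exit 1'; then echo "not: passes on a clean failure"; fi

# An abort(), e.g. a sanitizer running with abort_on_error=1.
if ! not_sketch sh -c 'kill -ABRT $$'; then echo "not: fails on a crash"; fi
if not_sketch --crash sh -c 'kill -ABRT $$'; then echo "not --crash: passes on a crash"; fi
```

If this model is right, a test that uses plain not will fail whenever the tool aborts instead of exiting with a nonzero status, which is one way a sanitizer setting could turn an expected failure into a test failure.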

I noticed this thread by accident. It’s better to create a PR and add as reviewers whoever did the revert and the bot owner (me).

For the reland PR, I suggest splitting it into a pure “revert of revert” patch, followed by patches with the fixes. That is easier to review.

report_fatal_error should not be a problem; I guess there are other tests which do that.
It should not trigger HWASAN. If you don’t have a guess as to why it fails, I can connect to the bot and get more information.

To your main question:

  1. Yes, we can try a PR on the bot before submitting, but the commit needs to be in the upstream git repo, e.g. in a users/ branch. I don’t know if you have permission to create those.
  2. It’s OK to land without a trial run if you have a good guess; we can revert/reland again if needed.
  3. If you really want to, you should be able to reproduce the issue with Debian on ARM on any common cloud service.

Thanks for the reply! I will create a draft reland PR and tag you in it.

My colleagues and I don’t really have a good guess as to why the test is failing. Our current guess is the hwasan check is being aborted when an error is reported via report_fatal_error in the test.

I do not yet have commit access, but my colleague does (the person who reverted the PR). Perhaps he could commit the changes to upstream git for you to test the PR on the buildbot?

I gained access to an aarch64 linux box and managed to reproduce the hwasan failure I saw w.r.t. the reflect-error.ll test.

Solutions I found to work:

  • Using not --crash in the RUN lines of the test, and changing report_fatal_error(..., /*GenCrashDiag=*/false) to report_fatal_error(..., /*GenCrashDiag=*/true)
  • Adding UNSUPPORTED: hwasan to the test so that it is ignored when running under hwasan
  • Defining HWASAN_OPTIONS="abort_on_error=0" in the environment variables, or modifying buildbot_functions.sh in the llvm-zorg repo to replace export HWASAN_OPTIONS="abort_on_error=1" with export HWASAN_OPTIONS="abort_on_error=0"
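The two variants in the last bullet differ in scope. A minimal shell sketch, assuming a POSIX shell: the export line stands in for the one in buildbot_functions.sh, child_sees is a hypothetical helper, and no sanitized binary is actually run.

```shell
#!/bin/sh
# Stand-in for the exported setting described for buildbot_functions.sh.
export HWASAN_OPTIONS="abort_on_error=1"

# Hypothetical helper: prints the value a child process would inherit.
child_sees() { sh -c 'printf "%s\n" "$HWASAN_OPTIONS"'; }

# A prefix assignment overrides the variable for that one command only.
per_command=$(HWASAN_OPTIONS="abort_on_error=0" sh -c 'printf "%s\n" "$HWASAN_OPTIONS"')

# Later commands still inherit the exported value.
afterwards=$(child_sees)

echo "overridden command sees: $per_command"
echo "later commands see:      $afterwards"
```

So prepending the variable to a single command affects only that command, while changing the export affects everything the script runs afterwards.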

What I noticed is that in buildbot_functions.sh, hwasan is the only sanitizer that does export HWASAN_OPTIONS="abort_on_error=1". Why is this the case?

HWASAN_OPTIONS="abort_on_error=0"
Do you know why there is an error?

There is no reason for llc to crash under HWASAN, if that is the case.

I believe I fixed this leak recently.

Can you create a PR and assign to me? I will check again.

==llc==3790505==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 10 byte(s) in 1 object(s) allocated from:
    #0 0xae9fa5562824 in malloc /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_allocation_functions.cpp:147:3
    #1 0xe18449add62c in strdup (/lib/aarch64-linux-gnu/libc.so.6+0x9d62c) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
    #2 0xae9facb2ebfc in FileToRemoveList /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:105:55
    #3 0xae9facb2ebfc in insert /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:120:37
    #4 0xae9facb2ebfc in llvm::sys::RemoveFileOnSignal(llvm::StringRef, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:446:3
    #5 0xae9faca5a9d0 in llvm::ToolOutputFile::ToolOutputFile(llvm::StringRef, std::__1::error_code&, llvm::sys::fs::OpenFlags) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/ToolOutputFile.cpp:42:7
    #6 0xae9fa55b3ccc in make_unique<llvm::ToolOutputFile, llvm::cl::opt<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, false, llvm::cl::parser<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > &, std::__1::error_code &, llvm::sys::fs::OpenFlags &, 0> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/libcxx_install_hwasan/include/c++/v1/__memory/unique_ptr.h:767:30
    #7 0xae9fa55b3ccc in GetOutputStream /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/tools/llc/llc.cpp:312:16
    #8 0xae9fa55b3ccc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/tools/llc/llc.cpp:616:7
    #9 0xae9fa55b0070 in main /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #10 0xe18449a684c0  (/lib/aarch64-linux-gnu/libc.so.6+0x284c0) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
    #11 0xe18449a68594 in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x28594) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
    #12 0xae9fa555a4ac in _start (/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llc+0x567a4ac)

Let’s continue there: Reland "[HLSL] Implement the reflect HLSL function" by Icohedron · Pull Request #125599 · llvm/llvm-project · GitHub?

Yea, that PR has the more recent discussion. In particular, I found that the problem is related to using -o /dev/null with llc when it encounters a report_fatal_error(..., false) that results in an exit(1). Changing -o /dev/null to -o - makes hwasan no longer fail when running the test.