Hi all. I made a PR to the LLVM repo, but it had to be reverted due to a failure during a hwasan check on the sanitizer-aarch64-linux-bootstrap-hwasan buildbot.
I want to reproduce the error locally to find a fix for the issue, but I do not have access to a system that supports hwasan.
One potential way around this issue is to reland the PR a few times, each time trying a different fix, but I would rather not disrupt the entire build pipeline by doing experiments in this fashion.
Therefore I ask: is there a way to schedule a build on the sanitizer-aarch64-linux-bootstrap-hwasan buildbot so I can test my fixes before relanding my PR?
The test in question checks for an error message produced by a call to report_fatal_error in llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp; the test itself is located in llvm/test/CodeGen/SPIRV/opencl.
Three fixes I want to try are:
- Add the --crash argument to the not command in the tests: ; RUN: not --crash ...
- Prepend HWASAN_OPTIONS="abort_on_error=0" to the tests: ; RUN: HWASAN_OPTIONS="abort_on_error=0" ...
- Mark sanitizer-aarch64-linux-bootstrap-hwasan as unsupported by the test: ; UNSUPPORTED: sanitizer-aarch64-linux-bootstrap-hwasan
I noticed this thread by accident; it’s better to create a PR and add whoever did the revert, and the bot owner (me), as reviewers.
For the reland PR, I suggest splitting it into a pure “revert of revert” patch followed by patches with the fixes. That is easier to review.
report_fatal_error should not be a problem; I guess there are other tests which do that.
It should not trigger HWASAN. If you don’t have a guess as to why it fails, I can connect to the bot and get more information.
To your main question:
- Yes, we can try a PR before submitting, but the commit needs to be in the upstream git repository, e.g. in a users/ branch. I don’t know if you have permission to create those.
- It’s OK to land without a trial run if you have a good guess; we can revert/reland again if needed.
- If you really want to, you should be able to reproduce the issue with Debian on ARM on any common cloud service.
Thanks for the reply! I will create a draft reland PR and tag you in it.
My colleagues and I don’t really have a good guess as to why the test is failing. Our current guess is that the hwasan check is aborted when an error is reported via report_fatal_error in the test.
I do not yet have commit access, but my colleague does (the person who reverted the PR). Perhaps he could commit the changes to upstream git for you to test the PR on the buildbot?
I gained access to an aarch64 Linux box and managed to reproduce the hwasan failure I saw with the reflect-error.ll test.
Solutions I found to work:
- Using not --crash in the RUN lines of the test, and changing report_fatal_error(..., /*GenCrashDiag=*/false) to report_fatal_error(..., /*GenCrashDiag=*/true)
- Adding UNSUPPORTED: hwasan to the test so that it is ignored when running under hwasan
- Defining HWASAN_OPTIONS="abort_on_error=0" in the environment variables, or modifying buildbot_functions.sh in the llvm-zorg repo to replace export HWASAN_OPTIONS="abort_on_error=1" with export HWASAN_OPTIONS="abort_on_error=0"
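The first solution hinges on how the process terminates. As I understand it, not expects a non-zero exit and treats a crash as a failure, while not --crash expects the crash; likewise, GenCrashDiag=true makes report_fatal_error end in abort() rather than exit(1). A rough POSIX sh sketch of that distinction follows, with sh -c children standing in for llc. The classify helper is hypothetical and is not part of lit or not.

```shell
# classify STATUS: describe how a child ended, based on the shell's $?.
# Hypothetical helper for illustration only. Most shells report death by
# signal N as 128+N, so an ordinary exit code above 128 would be
# misclassified by this simple check.
classify() {
  if [ "$1" -eq 0 ]; then
    echo "success"
  elif [ "$1" -gt 128 ]; then
    echo "killed by signal $(( $1 - 128 ))"
  else
    echo "exited with code $1"
  fi
}

# Child that ends the way exit(1) does: a normal exit with a non-zero code.
status_exit=0
sh -c 'exit 1' || status_exit=$?

# Child that ends the way abort() does: it sends SIGABRT to itself.
status_abort=0
sh -c 'kill -s ABRT $$' || status_abort=$?

echo "exit(1)-like child: $(classify "$status_exit")"
echo "abort()-like child: $(classify "$status_abort")"
```

A wrapper that only accepts the first kind of termination will reject the second, which is one plausible reading of why the plain not RUN line fails once the sanitizer is configured to abort.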
What I noticed is that in buildbot_functions.sh, hwasan is the only sanitizer for which abort_on_error=1 is exported (export HWASAN_OPTIONS="abort_on_error=1"). Why is this the case?
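For reference, here is a minimal sketch of how an exported value and a per-command override interact in plain POSIX sh. It is not lit itself: sh -c is a placeholder for the real tool, and whether lit's internal shell accepts the bare VAR=value prefix (rather than requiring env) is an assumption to verify.

```shell
# Stand-in for the global export done by the buildbot script.
export HWASAN_OPTIONS="abort_on_error=1"

# Per-command override: only this one child process sees abort_on_error=0.
# `sh -c` is a placeholder for the real tool under test.
seen_by_child=$(HWASAN_OPTIONS="abort_on_error=0" sh -c 'printf "%s" "$HWASAN_OPTIONS"')

# The exported value in the parent shell is left unchanged.
seen_by_parent=$HWASAN_OPTIONS

echo "child:  $seen_by_child"
echo "parent: $seen_by_parent"
```

The prefix form keeps the override local to a single RUN line, whereas editing the export in buildbot_functions.sh changes the behaviour for every test on the bot.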
Regarding HWASAN_OPTIONS="abort_on_error=0": do you know why there is an error in the first place?
There is no reason for llc to crash under HWASAN, if that is what is happening.
I believe I fixed this leak recently.
Can you create a PR and assign it to me? I will check again.
==llc==3790505==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 10 byte(s) in 1 object(s) allocated from:
#0 0xae9fa5562824 in malloc /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_allocation_functions.cpp:147:3
#1 0xe18449add62c in strdup (/lib/aarch64-linux-gnu/libc.so.6+0x9d62c) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
#2 0xae9facb2ebfc in FileToRemoveList /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:105:55
#3 0xae9facb2ebfc in insert /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:120:37
#4 0xae9facb2ebfc in llvm::sys::RemoveFileOnSignal(llvm::StringRef, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:446:3
#5 0xae9faca5a9d0 in llvm::ToolOutputFile::ToolOutputFile(llvm::StringRef, std::__1::error_code&, llvm::sys::fs::OpenFlags) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/ToolOutputFile.cpp:42:7
#6 0xae9fa55b3ccc in make_unique<llvm::ToolOutputFile, llvm::cl::opt<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, false, llvm::cl::parser<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > &, std::__1::error_code &, llvm::sys::fs::OpenFlags &, 0> /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/libcxx_install_hwasan/include/c++/v1/__memory/unique_ptr.h:767:30
#7 0xae9fa55b3ccc in GetOutputStream /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/tools/llc/llc.cpp:312:16
#8 0xae9fa55b3ccc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/tools/llc/llc.cpp:616:7
#9 0xae9fa55b0070 in main /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
#10 0xe18449a684c0 (/lib/aarch64-linux-gnu/libc.so.6+0x284c0) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
#11 0xe18449a68594 in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x28594) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
#12 0xae9fa555a4ac in _start (/home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/llc+0x567a4ac)
Yeah, that PR has more recent discussion. In particular, I found that the problem is related to using -o /dev/null with llc when it encounters a report_fatal_error(..., false) that results in an exit(1). Changing -o /dev/null to -o - makes hwasan no longer fail when running the test.