Remaining Compiler-RT failures in ARM

Folks,

As of this run:

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/746

There are three classes of failures that need fixing before we get the
bot green:

1. AddressSanitizer.BuiltinLongJmpTest Unit Test

Two configurations fail:
* Asan-arm-inline-Test
* Asan-arm-with-calls-Test

I wonder what's the best way to run it individually and reduce the
error. I'm not very proficient with the unit tests, and I'd be happy
with some documentation on how to reduce them.

2. Illegal instruction

* AddressSanitizer-arm-linux :: TestCases/mmap_limit_mb.cc
* UndefinedBehaviorSanitizer-Standalone :: TestCases/Misc/bounds.cpp

That board doesn't have NEON. It's uncommon for an ARMv7 SoC to not
have it, but some don't (for example, NVidia Tegra3), so we can't
assume they all have it.
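
For anyone wanting to confirm whether a given board advertises NEON, here is a minimal sketch, assuming a Linux system whose /proc/cpuinfo contains a "Features" line listing the CPU extensions. The function name cpuinfo_has_feature is made up for illustration and is not part of any LLVM or Compiler-RT tooling.

```cpp
#include <sstream>
#include <string>

// Returns true if a "Features" line in the given /proc/cpuinfo text lists
// `feature` as a whole word (so "vfpv3" does not match "vfpv3d16").
// Hypothetical helper, written only for illustration.
bool cpuinfo_has_feature(const std::string &cpuinfo, const std::string &feature) {
  std::istringstream lines(cpuinfo);
  std::string line;
  while (std::getline(lines, line)) {
    // Only "Features ..." lines are of interest.
    if (line.compare(0, 8, "Features") != 0)
      continue;
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos)
      continue;
    // Split the text after the colon on whitespace and compare each word.
    std::istringstream words(line.substr(colon + 1));
    std::string word;
    while (words >> word)
      if (word == feature)
        return true;
  }
  return false;
}
```

On a real board the input would be the contents of /proc/cpuinfo read into a string; a result of false for "neon" would be consistent with the illegal instruction failures described here.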

The CMake option clearly had "-mfpu=vfpv3" on both C and C++ flags and
the tests should honour that. This is a wider problem, and the
test-suite buildbots also don't honour it and fail on non-NEON ARMv7
boards.

I'm not an expert in CMake and even less so in Compiler-RT's setup for
the tests, so pointers on how to fix this would be very welcome.

3. UndefinedBehaviorSanitizer-Standalone :: TestCases/Misc/missing_return.cpp

This test fails in Standalone mode and passes in AddressSanitizer mode,
making it impossible to mark it as XFAIL. Before making XFAIL more
powerful, I'd rather find the problem and try to solve it.

The test checks for two stack frames in missing_return.cpp, one for f()
and one for main(), but the stack trace on ARM contains only one:

$ UBSAN_OPTIONS=print_stacktrace=1 ./missing_return.cpp.tmp
.../compiler-rt/test/ubsan/TestCases/Misc/missing_return.cpp:6:5: runtime error: execution reached the end of a value-returning function without returning a value

    #0 0x1c7cf in f() .../compiler-rt/test/ubsan/TestCases/Misc/missing_return.cpp:6:9
    #1 0x195cf in __ubsan::ScopedReport::~ScopedReport() .../compiler-rt/lib/ubsan/ubsan_diag.cc:341
    #2 0x1ae79 in handleMissingReturnImpl(__ubsan::UnreachableData*, __ubsan::ReportOptions) [clone .constprop.7] .../compiler-rt/lib/ubsan/ubsan_handlers.cc:254
    #3 0x1b31f in __ubsan_handle_missing_return .../compiler-rt/lib/ubsan/ubsan_handlers.cc:259

Maybe the stack is not deep enough? Is it really necessary to check for
main in the stack trace?

cheers,
--renato

Are you saying that CMAKE_C_FLAGS from LLVM tree are not used in compiler-rt?

This is way too complicated. We've got
COMPILER_RT_TEST_COMPILER_CFLAGS, but it never seems to be set in a
non-standalone build. Alexey, do we simply miss an assignment
somewhere, or did you have something else in mind?

Ok, I have changed the buildbot to add the new CMake flag. Whenever
it restarts, we'll see what that fixes.

cheers,
--renato

So, problem 3 was fixed by avoiding the check for main as well as the
noreturn failure, and problem 2 will hopefully be fixed by adding the
RT_COMPILER_FLAGS to CMake, which has gone in but needs a server
restart to be sure. That leaves us with problem 1.

1. AddressSanitizer.BuiltinLongJmpTest Unit Test

Two configurations fail:
* Asan-arm-inline-Test
* Asan-arm-with-calls-Test

What I get here is quite odd: "Illegal Instruction", which made me
believe it was the NEON problem from item 2, but it's not. It seems to be
related to the setjmp/longjmp routines.
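
For readers less familiar with this control-flow pattern, here is a minimal sketch of a non-local jump. It uses the standard <csetjmp> functions, not the __builtin_setjmp/__builtin_longjmp variants that the failing test calls, and the names run_longjmp_demo and jump_back are invented for illustration.

```cpp
#include <csetjmp>

// Jump buffer shared between the setjmp site and the function that jumps.
static std::jmp_buf g_jump_buffer;

// Hypothetical helper: never returns normally, it transfers control back
// to the point where setjmp() was called.
[[noreturn]] static void jump_back() {
  std::longjmp(g_jump_buffer, 1);
}

// Returns 2 if control came back through longjmp.
int run_longjmp_demo() {
  // volatile so the value is well defined after the non-local jump.
  volatile int stage = 0;
  if (setjmp(g_jump_buffer) == 0) {
    stage = 1;    // first pass: setjmp returned 0
    jump_back();  // does not return here
  } else {
    stage = 2;    // second pass: setjmp "returned" the value given to longjmp
  }
  return stage;
}
```

The point of the sketch is that nothing after the jumping call is ever expected to run, so the compiler is free to place no code there.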

Asan-arm-with-calls-Test:
   0x0014b4a4 in BuiltinLongJmpFunc1(__jmp_buf_tag*) () at /work/llvm/src/compiler-rt/lib/asan/tests/asan_test.cc:580
   580 __builtin_longjmp((void**)buf, 1);

   0x0014b494 <+192>: andeq r1, r0, r4, asr #23
   0x0014b498 <+196>: andseq r8, r2, r12, lsl #24
   0x0014b49c <+200>: andeq r1, r0, r12, ror r7
   0x0014b4a0 <+204>: ; <UNDEFINED> instruction: 0x00128bd0
=> 0x0014b4a4 <+208>: ; <UNDEFINED> instruction: 0xfff99bda

Asan-arm-inline-Test:

   0x001d8e7c in BuiltinLongJmpFunc1(__jmp_buf_tag*) () at /work/llvm/src/compiler-rt/lib/asan/tests/asan_test.cc:580
   580 __builtin_longjmp((void**)buf, 1);

   0x001d8e6c <+248>: andeq r1, r0, r4, asr #23
   0x001d8e70 <+252>: andseq r0, r10, r12, ror #4
   0x001d8e74 <+256>: andeq r1, r0, r12, ror r7
   0x001d8e78 <+260>: andseq r0, r10, r0, lsr r2
=> 0x001d8e7c <+264>: ; <UNDEFINED> instruction: 0xfffa45ba

Trying to decode those hex numbers as instructions, the only one that
made any sense (in ARM, Thumb, x86 or x86_64) was 0x00128bd0 (as
little-endian), but I may be getting this wrong. However, not even GDB
recognised those instructions, so I'm guessing it's a bug in the jump
library itself.
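
As a side note on reading such dumps, here is a small sketch of how the top four bits of a 32-bit ARM-mode word select the condition code. A word whose top nibble is 0 decodes with the EQ condition, which may be why a disassembler prints small data-like values as andeq or andseq, while a top nibble of 0xF falls in the unconditional encoding space. The helper names are made up for illustration.

```cpp
#include <cstdint>

// Bits 31:28 of an A32 (ARM-mode) instruction word hold the condition field.
constexpr unsigned arm_condition_field(std::uint32_t word) {
  return word >> 28;
}

// 0b0000 is the EQ condition.
constexpr bool is_eq_condition(std::uint32_t word) {
  return arm_condition_field(word) == 0x0;
}

// 0b1111 is not an ordinary condition; such words belong to the
// unconditional instruction space, where many encodings are undefined.
constexpr bool is_unconditional_space(std::uint32_t word) {
  return arm_condition_field(word) == 0xF;
}
```

Applied to the dumps above, 0x00128bd0 has the EQ condition field, while 0xfff99bda and 0xfffa45ba both land in the unconditional space.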

Looking for the implementation in Compiler-RT or LLVM, I haven't found
much aside from sanitizer interceptors. Where is this done?

Any ideas?

cheers,
--renato

For context, the code is:

   0x001d8e5c <+232>: mov r0, r4
   0x001d8e60 <+236>: bl 0x2ec4f0 <____asan_report_store4_veneer>
   0x001d8e64 <+240>: str r0, [r4]
   0x001d8e68 <+244>: bl 0x2ec488 <____asan_handle_no_return_veneer>
   0x001d8e6c <+248>: andeq r1, r0, r4, asr #23
   0x001d8e70 <+252>: andseq r0, r10, r12, ror #4
   0x001d8e74 <+256>: andeq r1, r0, r12, ror r7
   0x001d8e78 <+260>: andseq r0, r10, r0, lsr r2
=> 0x001d8e7c <+264>: ; <UNDEFINED> instruction: 0xfffa45ba

Inspecting the whole function, there isn't a jump past 0x001d8e68 (the
final branch), so it's only possible for the PC to be there if
____asan_handle_no_return_veneer has indeed returned.

Investigating it further, it seems that the
____asan_handle_no_return_veneer *does* return:

  0x00099c72 <+106>: ldr.w pc, [sp], #4

Where is that implemented?

cheers,
--renato

Could this be some kind of linker-generated compatibility magic?

I'm not sure. Searching for "____asan_handle_no_return_veneer" on
Google gets me this thread. :)

I'm tempted to disable that test on ARM+Linux, since we use EHABI
instead of SjLj... At least for now...

--renato

Sounds like an arm-thumb interworking veneer, generated by the linker... the
real function should be called 'asan_handle_no_return' (with some number of '_'
prefixing it; I don't remember how many get added).

Jon

It is a veneer which has just a jump and a word after it, which points
to a place in memory that held what I believe was the implementation
of the asan check.

I was wondering if the no_return on that veneer was meant to jump to a
no_return function or just that the veneer itself doesn't return
(which would be silly).

Is it possible that the asan check has a no_return attribute but for
some reason it returns? Wouldn't the compiler warn/err on that?

--renato

It is a veneer which has just a jump and a word after it, which points
to a place in memory that held what I believe was the implementation
of the asan check.

Yeah, that's what I would expect :)

I was wondering if the no_return on that veneer was meant to jump to a
no_return function or just that the veneer itself doesn't return
(which would be silly).

The function is "__asan_handle_no_return()", declared in
./lib/asan/asan_interface_internal.h, implemented in ./lib/asan/asan_rtl.cc. "no
return" has nothing to do with the semantics of the asan handler itself (nor the
veneer), rather it says something about the condition that the handler cleans up.
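
To illustrate that distinction, here is a minimal sketch, assuming the hook is simply called right before a function that does not return. The names fake_handle_no_return, leave_by_throwing and call_with_hook are hypothetical stand-ins, not the real runtime or instrumentation code.

```cpp
#include <stdexcept>

// Counts how often the stand-in hook has run.
static int g_hook_calls = 0;

// Hypothetical stand-in for a runtime hook such as __asan_handle_no_return().
// It does its clean-up and then returns to its caller like any other function.
void fake_handle_no_return() { ++g_hook_calls; }

// The function that really does not return: it leaves by throwing.
[[noreturn]] void leave_by_throwing() {
  throw std::runtime_error("leaving without returning");
}

// Mimics what instrumentation might emit: the hook first, then the call
// that never returns. Returns the hook count seen after unwinding.
int call_with_hook() {
  int observed = -1;
  try {
    fake_handle_no_return();  // returns normally
    leave_by_throwing();      // control never comes back here
  } catch (const std::runtime_error &) {
    observed = g_hook_calls;
  }
  return observed;
}
```

In this picture the hook returning is expected; what must not happen is control continuing past the call that follows it.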

Cheers,
Jon

Ok, I looked through it and can't see any obvious problem, and it seems
to be occurring on PPC as well, so I updated the bug with a link to this
thread, so that whoever tries to fix this in the future gets a head start.
Marking the test non-ARM.

cheers,
--renato