Unsupported MCJIT tests on ARM?

Hi folks,

Three MCJIT tests are failing on both our buildbots (check-all and self-host) and I’m not sure of the best way to fix them.

Some tests pass and some don’t on { A9, A15 } x { Ubuntu 12.10, Ubuntu 12.04 }. The error is:

lli: /home/user/devel/llvm/src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:307: void llvm::RuntimeDyldELF::resolveARMRelocation(const llvm::SectionEntry&, uint64_t, uint32_t, uint32_t, int32_t): Assertion `(*TargetPtr & 0x000F0FFF) == 0' failed.
Stack dump:
0. Program arguments: /home/user/devel/llvm/build/bin//lli -use-mcjit -remote-mcjit /home/user/devel/llvm/src/test/ExecutionEngine/MCJIT/test-global-init-nonzero-remote.ll

Since the test is marked as XFAIL on ARM, I suspect this is expected, and I’m surprised that it stopped failing on some runs and not others. Are they meant to be supported? Are they meant to be passing? Failing?

Most tests pass, and I understand MCJIT is being worked on for ARM, so there’s no reason to mark the whole directory unsupported on ARM, but it would be good if the flaky tests were marked unsupported until properly investigated.

I tried to add “UNSUPPORTED: arm” according to the docs, but it seems it’s not making much difference. Any ideas? Should we move the “remote” tests to their own directory and mark that as unsupported? But, to be honest, if it’s easy to fix, I’d rather someone fixed it for good.

Thanks,
--renato

Hi Renato,

/home/user/devel/llvm/src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:307:
void llvm::RuntimeDyldELF::resolveARMRelocation(const llvm::SectionEntry&,
uint64_t, uint32_t, uint32_t, int32_t): Assertion `(*TargetPtr & 0x000F0FFF)
== 0' failed.

I think there's a discussion going on about this already at
http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/145699 (and
surrounding).

According to Amara that assertion was a bit of paranoia so we'd know
if someone tried emitting .rel relocations and sending the result
through MCJIT. However, now we routinely re-relocate using explicit
addends so as he says it can probably just be removed.
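
For illustration, here is one way to read what that assertion checks. This is a minimal sketch, assuming the mask 0x000F0FFF corresponds to the split 16-bit immediate of an ARM MOVW/MOVT instruction (imm4 in bits 19:16, imm12 in bits 11:0). The helper names are hypothetical and this is not the actual RuntimeDyldELF code.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the two immediate fields of an ARM (A32) MOVW/MOVT
// instruction: imm4 in bits 19:16 and imm12 in bits 11:0.
constexpr uint32_t kMovImmMask = 0x000F0FFF;

// Hypothetical helper: true when the instruction already carries a value
// in its immediate fields (what a .rel-style implicit addend looks like).
inline bool hasImplicitAddend(uint32_t Insn) {
  return (Insn & kMovImmMask) != 0;
}

// Hypothetical helper: reassemble the 16-bit immediate from the two fields.
inline uint32_t extractMovImm16(uint32_t Insn) {
  return ((Insn >> 4) & 0xF000) | (Insn & 0x0FFF);
}

// Hypothetical helper: write a 16-bit value into the split immediate,
// clearing the old contents first so the patch can be applied repeatedly.
inline uint32_t patchMovImm16(uint32_t Insn, uint32_t Value) {
  Value &= 0xFFFF;
  Insn &= ~kMovImmMask;                 // drop any previous immediate
  Insn |= Value & 0x0FFF;               // low 12 bits -> imm12
  Insn |= ((Value >> 12) & 0xF) << 16;  // high 4 bits -> imm4
  assert(extractMovImm16(Insn) == Value && "immediate round-trip failed");
  return Insn;
}
```

Because the fields are cleared before being written, patching an already-relocated instruction with a new value gives the same result as patching a fresh one, which is the property re-relocation with explicit addends relies on.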

Cheers.

Tim.

Hi Tim,

Sorry, I saw that thread earlier on, but Amara's answer slipped past me.

Amara, what you're saying is that it should be ok to remove the assertion and
mark the tests as passing on ARM, right?

I wasn't sure whether the tests were supposed to pass (despite the assertion)
or were passing because something was wrong.

I'll prepare a patch and send it for your review.

cheers,
--renato

So, it seems David beat me to it, and the assert has already been removed, but the failures are still inconsistent.

A9-check-all, compiled with GCC:
Tests XPASS:

LLVM :: ExecutionEngine__MCJIT__test-common-symbols-remote.ll
LLVM :: ExecutionEngine__MCJIT__test-global-init-nonzero-remote.ll
LLVM :: ExecutionEngine__MCJIT__test-ptr-reloc-remote.ll
Unit-tests pass.

A9-self-host, compiled with GCC:
Tests XPASS:

LLVM :: ExecutionEngine__MCJIT__test-data-align-remote.ll
LLVM :: ExecutionEngine__MCJIT__test-ptr-reloc-remote.ll
Unit-tests pass.

A9-self-host, compiled with Clang:
Tests XPASS:

LLVM :: ExecutionEngine__MCJIT__test-common-symbols-remote.ll
LLVM :: ExecutionEngine__MCJIT__test-data-align-remote.ll
LLVM :: ExecutionEngine__MCJIT__test-global-init-nonzero-remote.ll
LLVM :: ExecutionEngine__MCJIT__test-ptr-reloc-remote.ll
Unit-tests pass.

Both A9 bots are running the same Ubuntu (13.03), with the same GCC (4.7.2), and are the same hardware (Panda ES RevB), so it really strikes me as odd that they behave so differently.

ARM920, compiled with GCC:
All tests pass.
Unit-tests fail:

MCJITTest.return_global

If I take out the XFAIL on those tests, some bots will fail and others will pass, so that is not the solution. The tests should either pass on all bots or on none. All errors started with David’s patch, so I’m assuming it enabled something. It could be a dormant bug unrelated to his patch, but it was certainly activated by it.

cheers,
--renato

Both A9 bots are running the same Ubuntu (13.03), with the same GCC (4.7.2),
and are the same hardware (Panda ES RevB), so it really strikes me as odd
that we have such a different behaviour between them.

Hmm. I'll see what I can do on my tablet (I've not tried building LLVM
there before, so it could take a while). It seems like there are *some*
failures everywhere. If we're lucky it'll just be a matter of fixing
the PR16013 that David reported.

Tim.

Remote MCJIT has never worked on ARM. With the assertion removed, the failures are no longer deterministic. I think by fixing PR16013 we’ll enable support for remote MCJIT on ARM, and can then remove the XFAILs altogether.

In the meantime, marking the tests as unsupported seems to be the best option. I don’t have a machine for a few days but I can take a look at finally cleaning up resolveRelocations() next week.

Amara

Contrary to what the docs say,

; UNSUPPORTED: arm

has no effect. The only way I know to make it work is to move the tests to a
separate dir and mark the dir as unsupported for ARM, but that's not a good
temporary solution.

--renato

Thanks for looking at this, Tim. On a pandaboard, at least with the Release+Asserts config I tried, those tests do complain on stderr, but llvm-lit reports them as passes or expected failures; they don't count as failures the way they do on the buildbot.

As for PR16013, fixing it looks like a relatively tractable job (on both 32-bit ARM and AArch64) IF you're already familiar with the implications of what the instruction set does; unfortunately that doesn't include me...

Cheers,
Dave

Hi David,

I'll move all remote tests to a dir and mark them unsupported for now. What
about the ExecutionEngine unittest below? Can you fix it? Or disable it on
ARM?

http://lab.llvm.org:8011/builders/llvm-armv5-linux/builds/298/steps/test-llvm/logs/LLVM-Unit%20%3A%3A%20ExecutionEngine__MCJIT____wd__buildbot__llvm-armv5-linux__llvm__unittests__ExecutionEngine__MCJIT__Debug%2BAsserts__MCJITTests__MCJITTest.return_global

cheers,
--renato

I'll move all remote tests to a dir and mark them unsupported for now. What
about the ExecutionEngine unittest below? Can you fix it? Or disable it on
ARM?

If I read your summary correctly it's only failing on armv5. I wouldn't
want to stop testing it on everything else because of an ancient box
like that.

Cheers.

Tim

If I read your summary correctly it's only failing on armv5. I wouldn't
want to stop testing it on everything else because of an ancient box
like that.

Well, if we don't care about ARMv5, then please let's turn that bot off.
We either keep a bot green or don't have it at all.

We can selectively disable LIT tests on "armv5" or "armv7" and I've done so
before because it was clear that ARMv5 would never support things like the
old JIT. The bot is green, and if there isn't a very strong reason against
it, I'd like it to remain green. Disabling an unimplemented feature on that
hardware seems like the correct thing to do, to me.

Since that's a unit-test, you probably have access to the target triple and
it should be trivial to disable it on "armv5", maybe even *all* JIT tests,
since we'll never implement it for that architecture anyway.
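
Something along these lines is what I have in mind. It is only a sketch: the helper name is hypothetical, it uses plain string handling as a stand-in for whatever triple support the unit tests really have access to, and it assumes the architecture is the first dash-separated component of the triple.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true when the architecture component of a
// target triple (e.g. "armv5te-unknown-linux-gnueabi") names ARMv5.
// Assumes the architecture is everything before the first '-'.
inline bool isARMv5Triple(const std::string &Triple) {
  assert(!Triple.empty() && "expected a non-empty target triple");
  const std::string Arch = Triple.substr(0, Triple.find('-'));
  // Matches "armv5", "armv5te", "armv5tej", and so on.
  return Arch.compare(0, 5, "armv5") == 0;
}
```

A test could then return early when isARMv5Triple() is true for the host triple, leaving it enabled everywhere else.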

Either that, or let's turn the bot off for good.

cheers,
--renato