JIT on armhf

I'm using the Debian LLVM package to try and do JIT on a Linux armhf
device. Unfortunately it seems to be generating armel code rather than
armhf code, and since the ABIs don't match nothing works.

I've tried overriding the triple to arm-unknown-linux-gnueabihf and
arm-linux-gnueabihf (via module->setTargetTriple), and while the triples
are accepted, the actual generated code doesn't change with either.

Unfortunately I don't have enough RAM on any of my ARM boxes to build
LLVM from source, so I can't check to see whether this is a Debian
misconfiguration or an intrinsic LLVM issue.

Before I go filing bugs, does anyone know if the LLVM 3.2 JIT
actually works on an armhf device?

I've tried overriding the triple to arm-unknown-linux-gnueabihf and
arm-linux-gnueabihf (via module->setTargetTriple), and while the triples
are accepted, the actual generated code doesn't change with either.

Hi David,

If you set the triple to arm it won't help, since it'll default to ARMv4
which doesn't have hard float.

Try setting armv7a-unknown-linux-gnueabihf and see if it works better.

Before I go filing bugs, does anyone know if the LLVM 3.2 JIT
actually works on an armhf device?

JIT was never the forte of ARM and I haven't tried yet, but I doubt it'll
be any Debian misconfiguration. The whole architecture configuration is a
bit odd...

cheers,
--renato

Renato Golin wrote:
[...]

Try setting armv7a-unknown-linux-gnueabihf and see if it works better.

No, that doesn't work either.

[...]

JIT was never the forte of ARM and I haven't tried yet, but I doubt
it'll be any Debian misconfiguration. The whole architecture
configuration is a bit odd...

Debian's clang packages are totally broken on armhf --- the compiler
emits a confused warning about the platform being unrecognised, and then
generates softfloat code --- so I was wondering about LLVM itself. (See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=693208; although I
notice that the bug seems to have been resolved without being fixed.
clang 3.2 seems to have hit experimental, but that fails in a different
way...)

If it makes any difference, I'm not using the just-in-time part of the
JIT, as it were. I have lazy compilation turned off and have a model
where the entire script is compiled into IR code and then to machine
code when my app starts.

See the init() method here:

https://cowlark.com/calculon/artifact/1e496bfd00104bd392b8da9dece45156dffe3039

Debian's clang packages are totally broken on armhf --- the compiler
emits a confused warning about the platform being unrecognised, and then
generates softfloat code --- so I was wondering about LLVM itself.

I'm using Ubuntu on Pandas and Chromebooks and LLVM itself behaves well,
with the right set of command line options. Though, I remember that I also
had hard-float issues when compiling to IR and then to object, but once I
started using the full options that Clang provides by default, things
started working.

Can you paste the output of "clang -v -mcpu=CPU file.c" on your box? I
want to see which arguments it passes and which assembler/linker it
chooses. What CPU are we talking about?

If that works, it's possible that you'll need to set the flags that Clang
sets by default in your front-end, too.

If it makes any difference, I'm not using the just-in-time part of the
JIT, as it were. I have lazy compilation turned off and have a model
where the entire script is compiled into IR code and then to machine
code when my app starts.

This is very similar to what I did in my toy compiler, but I didn't use ARM
at all. However, that should not have made any difference.

cheers,
--renato

Hi David,

For ARM, you will need to use the MCJIT ExecutionEngine as the legacy
one is broken for ARM. (call EngineBuilder::setUseMCJIT()).

When creating your TargetOptions, setting FloatABIType to
FloatABI::Hard should trigger codegen for the correct ABI.

Amara

For ARM, you will need to use the MCJIT ExecutionEngine as the legacy
one is broken for ARM. (call EngineBuilder::setUseMCJIT()).

Also remember to include the MCJIT header rather than the old JIT one:
if only the old JIT header is included, calling setUseMCJIT() just
constructs an old JIT; it doesn't emit a helpful warning like "I can't
build an MCJIT JIT for you". (This bit me until Amara pointed it out.)

Cheers,
Dave

[...]

Can you paste the output of "clang -v -mcpu=CPU file.c" on your box? I
want to see which arguments it passes and which assembler/linker it
chooses. What CPU are we talking about?

The box itself is an Allwinner A10; armv7l. /proc/cpuinfo says it's got
swp half thumb fastmult vfp edsp neon vfpv3.
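
As a rough illustration of reading that feature list programmatically, here is a minimal C++ sketch. The helper names (parseFeatures, hasVfp, hasNeon) are hypothetical and not part of LLVM, and treating the "vfp"/"vfpv3" and "neon" flags as the relevant ones for hard float is an assumption based on the flags quoted above.

```cpp
#include <set>
#include <sstream>
#include <string>

// Split the value of a /proc/cpuinfo "Features" line into flag names.
// (Hypothetical helper for illustration only.)
std::set<std::string> parseFeatures(const std::string &features) {
    std::istringstream in(features);
    std::set<std::string> flags;
    std::string flag;
    while (in >> flag)
        flags.insert(flag);
    return flags;
}

// Assumption: a VFP unit shows up as "vfp" and/or "vfpv3".
bool hasVfp(const std::set<std::string> &flags) {
    return flags.count("vfp") > 0 || flags.count("vfpv3") > 0;
}

// Assumption: NEON support shows up as the "neon" flag.
bool hasNeon(const std::set<std::string> &flags) {
    return flags.count("neon") > 0;
}
```

Feeding it the string above, "swp half thumb fastmult vfp edsp neon vfpv3", both checks should come back true. That only says the hardware can do hard float; it says nothing about what the compiler will choose by default.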

I've been unable to find any values for CPU which are accepted (it just
says 'unknown target CPU'). I've tried arm, armv7, armv7a, armv7l, arm7,
armv4t... Any suggestions? Is there a way to get clang and llc to emit a
list of what triples they support?

Since I posted my message I notice that clang 3.2 has hit Debian
experimental. This no longer produces the warning about an unrecognised
platform, but it still generates softfloat code --- I've compared
clang's output and gcc, and gcc produces hardfloat code. (Incidentally,
I was wrong earlier about clang 3.2 failing. Its output causes ld to
produce an assertion message, and it didn't occur to me then to look to
see whether it had actually created a binary or not.)

Here's what clang 3.2 says with 'clang -v -S -O3 test.c':

Debian clang version 3.2-1~exp3 (tags/RELEASE_32/final) (based on LLVM 3.2)
Target: arm-unknown-linux-gnueabihf
Thread model: posix
"/usr/bin/clang" -cc1 -triple armv4t-unknown-linux-gnueabihf -S
-disable-free -disable-llvm-verifier -main-file-name test.c
-mrelocation-model static -mdisable-fp-elim -fmath-errno
-mconstructor-aliases -fuse-init-array -target-abi aapcs-linux
-target-cpu arm7tdmi -mfloat-abi hard -target-linker-version 2.22
-momit-leaf-frame-pointer -v -coverage-file
/home/dg/shared/workspace/calculon/test.s -resource-dir
/usr/bin/../lib/clang/3.2 -fmodule-cache-path
/var/tmp/clang-module-cache -internal-isystem /usr/local/include
-internal-isystem /usr/bin/../lib/clang/3.2/include -internal-isystem
/usr/include/clang/3.2/include/ -internal-externc-isystem
/usr/include/arm-linux-gnueabihf -internal-externc-isystem
/usr/include/arm-linux-gnueabihf -internal-externc-isystem /usr/include
-O3 -fno-dwarf-directory-asm -fdebug-compilation-dir
/home/dg/shared/workspace/calculon -ferror-limit 19 -fmessage-length 80
-mstackrealign -fno-signed-char -fobjc-runtime=gcc
-fdiagnostics-show-option -fcolor-diagnostics -o test.s -x c test.c

Clang 3.1:

Debian clang version 3.1-8 (branches/release_31) (based on LLVM 3.1)
Target: arm-unknown-linux-gnueabihf
Thread model: posix
clang: warning: unknown platform, assuming -mfloat-abi=soft
"/usr/bin/clang" -cc1 -triple armv4t-unknown-linux-gnueabihf -S
-disable-free -disable-llvm-verifier -main-file-name test.c
-mrelocation-model static -mdisable-fp-elim -mconstructor-aliases
-target-abi apcs-gnu -target-cpu arm7tdmi -msoft-float -mfloat-abi soft
-target-feature +soft-float -target-feature +soft-float-abi
-target-feature -neon -target-linker-version 2.22
-momit-leaf-frame-pointer -v -coverage-file test.s -resource-dir
/usr/bin/../lib/clang/3.1 -fmodule-cache-path
/var/tmp/clang-module-cache -internal-isystem /usr/local/include
-internal-isystem /usr/bin/../lib/clang/3.1/include -internal-isystem
/usr/include/clang/3.1/include/ -internal-externc-isystem
-internal-externc-isystem /usr/include/arm-linux-gnueabihf
-internal-externc-isystem /usr/include/arm-linux-gnueabihf
-internal-externc-isystem /usr/include -O3 -fno-dwarf-directory-asm
-fdebug-compilation-dir /home/dg/shared/workspace/calculon -ferror-limit
19 -fmessage-length 80 -mstackrealign -fno-signed-char -fgnu-runtime
-fobjc-runtime-has-arc -fobjc-runtime-has-weak -fobjc-fragile-abi
-fdiagnostics-show-option -fcolor-diagnostics -o test.s -x c test.c

And here's clang 3.0:

Debian clang version 3.0-6 (tags/RELEASE_30/final) (based on LLVM 3.0)
Target: arm-unknown-linux-gnueabihf
Thread model: posix
clang: warning: unknown platform, assuming -mfloat-abi=soft
"/usr/bin/clang" -cc1 -triple armv4t-unknown-linux-gnueabihf -S
-disable-free -disable-llvm-verifier -main-file-name test.c
-mrelocation-model static -mdisable-fp-elim -mconstructor-aliases
-target-abi apcs-gnu -target-cpu arm7tdmi -msoft-float -mfloat-abi soft
-target-feature +soft-float -target-feature +soft-float-abi
-target-feature -neon -target-linker-version 2.22
-momit-leaf-frame-pointer -v -coverage-file test.s -resource-dir
/usr/bin/../lib/clang/3.0 -fmodule-cache-path
/var/tmp/clang-module-cache -internal-isystem /usr/local/include
-internal-isystem /usr/bin/../lib/clang/3.0/include
-internal-externc-isystem /usr/include/arm-linux-gnueabihf
-internal-externc-isystem /usr/include -O3 -ferror-limit 19
-fmessage-length 80 -fno-signed-char -fgnu-runtime
-fobjc-runtime-has-arc -fobjc-runtime-has-weak -fobjc-fragile-abi
-fdiagnostics-show-option -fcolor-diagnostics -o test.s -x c test.c

(Sorry for the spammage, but I thought it better to snip too little than
too much...)

I'm particularly curious about the way that the triple passed into the
compiler backend starts 'armv4t' when it's rejected as a CPU type if I
specify it manually.
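
For comparing those three dumps without reading them by eye, a small C++ sketch like the following could pull out the interesting settings from a "-cc1" line. The function name cc1Options is hypothetical, and it assumes each option of interest is followed by its value as a separate whitespace-delimited token, which is how they appear in the output above.

```cpp
#include <map>
#include <sstream>
#include <string>

// Collect the value that follows each option of interest on a clang
// "-cc1" command line. (Hypothetical helper; assumes "option value"
// pairs separated by whitespace, as in the dumps above.)
std::map<std::string, std::string> cc1Options(const std::string &line) {
    std::map<std::string, std::string> found;
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) {
        if (tok == "-triple" || tok == "-target-cpu" ||
            tok == "-mfloat-abi" || tok == "-target-abi") {
            std::string value;
            if (in >> value)
                found[tok] = value;
        }
    }
    return found;
}
```

Run over the clang 3.2 line it should report the armv4t triple, arm7tdmi as the CPU and a hard float ABI; over the 3.1 and 3.0 lines it should report the same triple and CPU but a soft float ABI.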

[...]

If that works, it's possible that you'll need to set the flags that Clang
sets by default in your front-end, too.

But this *should* all be autodetected, right? If I'm using the JIT, I
shouldn't need platform-specific knowledge to set up the code generator?

So I'm using "llvm/ExecutionEngine/MCJIT.h" instead of
"llvm/ExecutionEngine/JIT.h", and I've added setUseMCJIT(true) to
EngineBuilder, but what actually happens is:

LLVM ERROR: Target does not support MC emission!

Do I need to do anything else?

Also, what's the code quality of MCJIT compared to the old JIT? As I'm
basically compiling statically at runtime, I don't mind if LLVM spends
time generating the code --- I'm using a module pass, eager code
generation and setOptLevel(llvm::CodeGenOpt::Aggressive) --- and right
now I'm extremely happy with the quality of the code being emitted for
amd64.

So I'm using "llvm/ExecutionEngine/MCJIT.h" instead of
"llvm/ExecutionEngine/JIT.h", and I've added setUseMCJIT(true) to
EngineBuilder, but what actually happens is:

LLVM ERROR: Target does not support MC emission!

Do I need to do anything else?

IIRC, this error might be due to not linking against the MCJIT library
component. Add the appropriate flags to your build config, and also
ensure that you call InitializeNativeTargetAsmPrinter() and
InitializeNativeTargetAsmParser().

Also, what's the code quality of MCJIT compared to the old JIT? As I'm
basically compiling statically at runtime, I don't mind if LLVM spends
time generating the code --- I'm using a module pass, eager code
generation and setOptLevel(llvm::CodeGenOpt::Aggressive) --- and right
now I'm extremely happy with the quality of the code being emitted for
amd64.

The MCJIT uses the exact same mechanism as the static code generator
so the code quality is therefore good. You should note though that
using just the ExecutionEngine codegen optimization level parameter
won't run the full set of target independent optimizations that a tool
like opt will run.

Unfortunately there are inconsistencies in the triple handling between
clang and the rest of LLVM. I'm not at my workstation so I can't give
much more information at the moment.

Amara

[...]

IIRC, this error might be due to not linking against the MCJIT library
component. Add the appropriate flags to your build config, and also
ensure that you call InitializeNativeTargetAsmPrinter() and
InitializeNativeTargetAsmParser().

Thanks; that now makes MCJIT generate code. (For reference, the secret
magic is apparently to call LLVMLinkInMCJIT().)

Unfortunately... after doing the code generation it crashes. The top of
the stack trace looks like this, according to valgrind:

==7470== at 0xE9B2C9:
llvm::RuntimeDyldImpl::emitSection(llvm::ObjectImage&,
llvm::object::SectionRef const&, bool) (RuntimeDyld.cpp:258)
==7470== by 0xE9AAC5:
llvm::RuntimeDyldImpl::findOrEmitSection(llvm::ObjectImage&,
llvm::object::SectionRef const&, bool,
std::map<llvm::object::SectionRef, unsigned int,
std::less<llvm::object::SectionRef>,
std::allocator<std::pair<llvm::object::SectionRef const, unsigned int> > >&)
(RuntimeDyld.cpp:314)
==7470== by 0xE9A0C2:
llvm::RuntimeDyldImpl::loadObject(llvm::ObjectBuffer*) (RuntimeDyld.cpp:120)
==7470== by 0xE9BF7C:
llvm::RuntimeDyld::loadObject(llvm::ObjectBuffer*) (RuntimeDyld.cpp:497)
==7470== by 0xE9572A: llvm::MCJIT::emitObject(llvm::Module*)
(MCJIT.cpp:103)
==7470== by 0xE959A0:
llvm::MCJIT::getPointerToFunction(llvm::Function*) (MCJIT.cpp:146)

And, from looking at the machine code dumped out by turning on
llvm::TargetOptions::printMachineCode, it's *still* generating ARM
softfloat code.

The good news about the crash is that it's happening on both ARM and
amd64, which means that it's most likely something I'm doing wrong,
which means I can fix it. I, er, just don't know how. Any suggestions as
to what I'm not doing right? My usual tactic of compiling against a
version of LLVM with assertions turned on isn't helping here.

In addition, does anyone happen to have a set of vanilla LLVM 3.2
libraries compiled for armhf that they can send me? I'd like to verify
or rule out whether there's a problem with the Debian packages.

[...]

The MCJIT uses the exact same mechanism as the static code generator
so the code quality is therefore good.

That's very good to hear.

The box itself is an Allwinner A10; armv7l. /proc/cpuinfo says it's got
swp half thumb fastmult vfp edsp neon vfpv3.

Yes, it's a Cortex-A8.

I've been unable to find any values for CPU which are accepted (it just
says 'unknown target CPU'). I've tried arm, armv7, armv7a, armv7l, arm7,
armv4t... Any suggestions? Is there a way to get clang and llc to emit a
list of what triples they support?

armv7 and armv7a should default to Cortex-A8, "arm" will default to 7TDMI.
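
To make that mapping concrete, here is a minimal C++ sketch of the defaulting behaviour as described in this thread. It is not LLVM's actual implementation; archOf and defaultCpuFor are hypothetical names, and only the architecture names mentioned here are covered.

```cpp
#include <string>

// Return the architecture component of a triple, i.e. everything before
// the first '-'. (Hypothetical helper for illustration only.)
std::string archOf(const std::string &triple) {
    return triple.substr(0, triple.find('-'));
}

// Default CPU per architecture name, following the rules stated in this
// thread: armv7/armv7a -> Cortex-A8, arm/armv4t -> ARM7TDMI.
// Anything else is reported as "unknown" rather than guessed.
std::string defaultCpuFor(const std::string &triple) {
    const std::string arch = archOf(triple);
    if (arch == "armv7" || arch == "armv7a")
        return "cortex-a8";
    if (arch == "arm" || arch == "armv4t")
        return "arm7tdmi";
    return "unknown";
}
```

With this, "arm-unknown-linux-gnueabihf" maps to arm7tdmi while "armv7a-unknown-linux-gnueabihf" maps to cortex-a8, which is why the "gnueabihf" suffix alone is not enough to get a CPU with hard float.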

Target: arm-unknown-linux-gnueabihf

Auto-detected wrongly (as expected),

"/usr/bin/clang" -cc1 -triple armv4t-unknown-linux-gnueabihf -S

Defaulted to armv4t == ARM7TDMI.

I'm particularly curious about the way that the triple passed into the
compiler backend starts 'armv4t' when it's rejected as a CPU type if I
specify it manually.

What does 'clang -v -mcpu=cortex-a8 -S -O3 test.c' print as a target?

Just need to do that on the latest Clang, and only paste the target line.

But this *should* all be auto-detected, right?

Should, but it isn't. Unfortunately, auto-detection in the ARM world is not
as simple as in the Intel world, and it's just not implemented.

If your compiler's name is "armv7a-unknown-linux-gnueabihf-clang", you
might get it right, since that part is implemented and should guess
cortex-a8 (not because it'll detect your CPU, but because it's hard-coded
armv7 -> A8).

If I'm using the JIT, I
shouldn't need platform-specific knowledge to set up the code generator?

You do need it. The IR (which the execution engine runs) is not platform
independent. The front-end has to make some assumptions when generating IR,
depending on the platform, so you need to generate the correct code to
begin with.

Also, as Amara said, you can set hard-float manually in the execution
engine, but if your CPU is still v4, I don't think it'll work. You should
make sure you have a v7 target, then force hard-float and NEON, and then
you'll get correct execution. However, if your command line contains
"-mcpu=cortex-a8", you should get all that for free when you build your
Target with the triple above.

cheers,
--renato

[...]

What does 'clang -v -mcpu=cortex-a8 -S -O3 test.c' print as a target?

Target: arm-unknown-linux-gnueabihf

...and the triple passed in to clang is armv7-unknown-linux-gnueabihf,
and it generated hardfloat code! That works!

[...]

Should, but it isn't. Unfortunately, auto-detection in the ARM world is
not as simple as in the Intel world, and it's just not implemented.

This sounds like something the Debian people should be doing: they know
what ABI clang is being built for, therefore they should be doing
whatever configuration is needed to make sure that the default
configuration lets the compiler produce binaries that actually work.
I'll file a bug. Thank you very much.

Incidentally, clang -help doesn't list the -mcpu= option. And is there
any way to make the compiler list the supported set of architectures? It
would never have occurred to me to try cortex-a8...