JIT on armhf, again

Hello,

Last year I tried --- and failed --- to generate float-heavy ARM code
via the JIT on an armhf platform. No matter what I did, it would always
generate armel code instead. This was on LLVM 3.2, which was all that
was available then.

Now I'm running into a requirement to do this again: while it's much
less crashy than it was, I still can't seem to persuade the JIT to
generate hard-float code. This is with LLVM 3.3, 3.4 and 3.5. I'm using
MCJIT (although I've tried without, as well), and I've tried setting
FloatABI to HardFloat.

Does anyone know if this is actually working yet? If so, are there any
examples of how to do this successfully?

Hi David,

Now I'm running into a requirement to do this again: while it's much
less crashy than it was, I still can't seem to persuade the JIT to
generate hard-float code. This is with LLVM 3.3, 3.4 and 3.5. I'm using
MCJIT (although I've tried without, as well),

Using MCJIT is definitely the right thing to do.

and I've tried setting FloatABI to HardFloat.

My best guess is that you're only specifying an architecture (i.e. the
equivalent of "-march=arm" for llc). That means LLVM will default to
the older APCS calling convention, and won't ever use hard-float
(there just isn't a hard-float APCS ABI).

You should probably be specifying a triple directly, and making it an
AAPCS-VFP one for good measure: "armv7-linux-gnueabihf" for example,
or "thumbv7-none-eabihf". You shouldn't even need to set FloatABI for
those two.

Cheers.

Tim.

[...]

You should probably be specifying a triple directly, and making it an
AAPCS-VFP one for good measure: "armv7-linux-gnueabihf" for example,
or "thumbv7-none-eabihf". You shouldn't even need to set FloatABI for
those two.

How do I do this? (I can't find any examples, and the API is decidedly
unclear...)

It looks like it's a case of calling Module::setTargetTriple. As with
most JIT setup questions, though, often the best way to find out is to
get something working in lli and then look at what it does
(tools/lli/lli.cpp).

Cheers.

Tim.

[...]

It looks like it's a case of calling Module::setTargetTriple. As with
most JIT setup questions, though, often the best way to find out is to
get something working in lli and then look at what it does
(tools/lli/lli.cpp).

Well, it's *almost* working --- hardfloat code is now being generated,
and it even seems to be right most of the time!

Unfortunately it looks like it's getting calling conventions wrong. This
IR code:

define void @Entrypoint(float %in, float* %out) {
  store float %in, float* %out
  ret void
}

...gets compiled to this:

  STRi12 %R0<kill>, %R1<kill>, 0, pred:14, pred:%noreg; mem:ST4[%out]
  BX_RET pred:14, pred:%noreg

(typed by hand, so may contain typos).

So it looks like it's assuming that float parameters are being passed in
integer registers, which isn't the case on armhf.

Could it be under the impression that I'm running on an armel system? In
which case the above code is correct. This would explain why the default
setting appears to generate armel code. Is this controllable?
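
To make the difference between the two conventions concrete, here is a
minimal sketch of how arguments get assigned to registers under soft-float
(armel) and hard-float (armhf) rules. It is deliberately simplified and is
not LLVM's implementation: it assumes only 32-bit floats and 32-bit
integers/pointers, ignores doubles and aggregates, and just reports
"stack" once registers run out. The names ParamKind, assignArgRegisters
and checkEntrypointSignature are hypothetical, made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified parameter kinds: a 32-bit float, or a 32-bit integer/pointer.
enum class ParamKind { Float32, IntOrPointer };

// Illustrative only: pick an argument register for each parameter.
// Soft-float passes everything in core registers r0..r3; hard-float passes
// floats in VFP registers s0..s15 and everything else in r0..r3.
std::vector<std::string> assignArgRegisters(const std::vector<ParamKind> &params,
                                            bool hardFloat) {
  std::vector<std::string> regs;
  std::size_t nextCore = 0; // next free core register (r0..r3)
  std::size_t nextVfp = 0;  // next free VFP register (s0..s15)
  for (ParamKind kind : params) {
    const bool useVfp = hardFloat && kind == ParamKind::Float32;
    if (useVfp) {
      regs.push_back(nextVfp < 16 ? "s" + std::to_string(nextVfp++)
                                  : std::string("stack"));
    } else {
      regs.push_back(nextCore < 4 ? "r" + std::to_string(nextCore++)
                                  : std::string("stack"));
    }
  }
  return regs;
}

// The signature from the IR above: (float %in, float* %out).
void checkEntrypointSignature() {
  const std::vector<ParamKind> sig = {ParamKind::Float32,
                                      ParamKind::IntOrPointer};
  // Soft-float: %in in r0, %out in r1 (what the JIT output above assumes).
  assert((assignArgRegisters(sig, false) ==
          std::vector<std::string>{"r0", "r1"}));
  // Hard-float: %in in s0, %out in r0 (what an armhf caller provides).
  assert((assignArgRegisters(sig, true) ==
          std::vector<std::string>{"s0", "r0"}));
}
```

Under this model the STRi12 of R0 to the address in R1 is exactly what a
soft-float callee would do, which is consistent with the JIT believing it
is targeting armel.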

Well, it's *almost* working --- hardfloat code is now being generated,
and it even seems to be right most of the time!

OK, so that's probably coming from the default parameters in the
armv7/thumbv7 part of the triple. An alternative would have been
setting the CPU type manually.

Unfortunately it looks like it's getting calling conventions wrong. This
IR code:

Which triple are you using? And is the correct code used when you run
the same IR through "llc -mtriple=whatever"?

Finally, which version of LLVM are you using? Newer is better: LLVM
changes very quickly and we have been working on improving various
defaults to make it easier to use.

Could it be under the impression that I'm running on an armel system? In
which case the above code is correct. This would explain why the default
setting appears to generate armel code. Is this controllable?

It ought to be the environment part of the triple that controls it:
eabihf or gnueabihf, for example.
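
As a rough illustration of what "the environment part of the triple"
means, the sketch below takes the last dash-separated component of a
triple string and checks whether it ends in "eabihf". This is only a toy
string check under that assumption; LLVM's own triple parsing is more
involved and is not reproduced here. The helper names
lastTripleComponent, environmentRequestsHardFloat and checkExampleTriples
are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the text after the last '-' of a triple, or "" if there is no '-'
// (for example a bare architecture name such as "arm").
std::string lastTripleComponent(const std::string &triple) {
  const std::size_t pos = triple.rfind('-');
  return pos == std::string::npos ? std::string() : triple.substr(pos + 1);
}

// Toy check: does the environment component end in "eabihf"?
bool environmentRequestsHardFloat(const std::string &triple) {
  const std::string env = lastTripleComponent(triple);
  const std::string suffix = "eabihf";
  return env.size() >= suffix.size() &&
         env.compare(env.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// The triples mentioned in this thread, plus two soft-float counterexamples.
void checkExampleTriples() {
  assert(environmentRequestsHardFloat("armv7-linux-gnueabihf"));
  assert(environmentRequestsHardFloat("thumbv7-none-eabihf"));
  assert(!environmentRequestsHardFloat("armv7-linux-gnueabi"));
  assert(!environmentRequestsHardFloat("arm")); // architecture only
}
```

A check like this can be handy as a sanity test on whatever triple string
your application ends up handing to the module, before blaming the
backend.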

Did you try setting a breakpoint in that function I mentioned and
finding out just which test fails on the path to returning AAPCS_VFP?

Cheers.

Tim.

[...]

Which triple are you using? And is the correct code used when you run
the same IR through "llc -mtriple=whatever"?

armv7-linux-gnueabihf, as suggested; and if I use llc -mtriple then the
code compiles to:

  vstr s0, [r0]
  bx lr

...which I would consider correct. (What's more interesting is that *without*
specifying the triple, llc generates armel code. Should llc default to
generating code which will actually run on a given platform? Is it
possible my version of llvm has been compiled with the wrong options?
clang generates correct code, but it looks like it's not going via llc.)

Finally, which version of LLVM are you using?

llvm-3.5, as supplied by Debian.

[...]

It ought to be the environment part of the triple that controls it:
eabihf or gnueabihf, for example.

Does the system support linking together multiple modules with different
triples? Since my function is externally visible, is it deliberately
using the armel calling convention because it thinks it's being called
from code where my triple doesn't apply? Should I be setting the triple
on the execution engine as a whole (which I've avoided so far because
the API looks painful)?

[...]

Did you try setting a breakpoint in that function I mentioned and
finding out just which test fails on the path to returning AAPCS_VFP?

No; I don't have a debugger on this platform.

Should llc default to
generating code which will actually run on a given platform? Is it
possible my version of llvm has been compiled with the wrong options?

It's a configure-time option, I believe. It's entirely possible the
Debian packages have it wrong, but I'd always set an explicit override
and test with that; if I could make a default work, that would just be
a bonus.

Does the system support linking together multiple modules with different
triples?

Not really. You'll probably get some code out the other end, but with
a pretty much randomly chosen triple. More features along those lines
are planned (it's useful for CPU-specific routines, particularly
during LTO).

Did you try setting a breakpoint in that function I mentioned and
finding out just which test fails on the path to returning AAPCS_VFP?

No; I don't have a debugger on this platform.

That's probably the first thing I'd look at fixing.

Beyond that, before even thinking about implementing a JITing
compiler, I'd build LLVM with debugging symbols and make sure I build
against that.

LLVM's API just isn't packaged for people to use as a black box. It's
the ad-hoc mixture of all the bells and whistles existing users have
found useful, without the important ones being marked.

I could suggest various diagnostics at this point, but they'd all come
back to a debugger very quickly.

Cheers.

Tim.

[...]

No; I don't have a debugger on this platform.

That's probably the first thing I'd look at fixing.

Oddly enough, I find myself rather disinclined to debug gdb...

[...]

LLVM's API just isn't packaged for people to use as a black box. It's
the ad-hoc mixture of all the bells and whistles existing users have
found useful, without the important ones being marked.

I've noticed.

I should add that my application works absolutely fine on amd64 on Linux
and OS X; I'm not trying to *develop* on ARM, merely get an existing app
working. I'm reasonably confident that the logic is correct.

When developing an LLVM backend, I've used tracing to find out what the
compiler was doing while generating code; is the tracing logic built in
to the JIT? Can it be enabled? That should at least tell me *why* it's
picking the registers it is.

Hi David,

The behaviour of LLVM tools should be independent of the platform you
run them on, so a triple should have the same effect on any host
environment.

Your problem seems to be related to the defaults that Clang/llc/lli set
on top of *just* setting the triple, which is a quite widespread issue.
We have extensively discussed a common API for the drivers (any
front-end), so that every architectural decision would be shared, even
for out-of-tree projects like yours, but that's easier said than done.

For now, what I recommend is that you look into the llc or lli drivers,
check how they set up the ARM parameters from the triple, and do the
same in your code. On most tools, setting "arm-*-gnueabihf" should
be enough to get you hard-float, but that doesn't mean that setting
Triple = "arm-*-gnueabihf" will set the right flags in the execution
engine's / back-end's structures. You may have to do that manually.

cheers,
--renato