ARM NEON intrinsics in clang

Hello LLVM Devs,

I am starting my PhD on Automatic Parallelization for DSP and want to play with some ARM NEON intrinsics for a start. I spent the last three days trying to compile a version of LLVM that would allow me to compile sources that contain these intrinsics, but with no success.

In the process I found out that clang doesn’t support NEON (as per http://blog.llvm.org/2010/04/arm-advanced-simd-neon-intrinsics-and.html), but there has been at least some effort in adding it (https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch).

I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times and I gave up. After some discussions with colleagues (notably Alberto Magni, who added OpenCL support to clang some time ago http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-November/012293.html) my current plan is to implement the ARM NEON intrinsics as a shared library, using attributes as in:

typedef float float4 attribute((ext_vector_type(4)));

or if that doesn’t work, I will try to implement the intrinsics in clang itself (not sure this is the best way of doing it).

Ideally, I want to be able to compile C code that includes ARM NEON intrinsics to other targets (TI processors, e.g.).

Suggestions, comments, and recommendations are very welcome.

Kind regards,

  • Stan

In the process I found out that clang doesn't support NEON (as per
ARM Advanced SIMD (NEON) Intrinsics and Types in LLVM - The LLVM Project Blog),
but there has been at least some effort in adding it (
https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch
).

Hi Stanislav,

LLVM does support NEON on ARM32 for a very long time. The commit you're
referring is about AArch64, and yes, support for ARM64 NEON is patchy at
the moment, but it's progressing quite quickly. What back-end are you
trying to use? 32-bits or 64-bits?

I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times

and I gave up. After some discussions with colleagues (notably Alberto
Magni, who added OpenCL support to clang some time ago
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-November/012293.html) my
current plan is to implement the ARM NEON intrinsics as a shared library,
using attributes as in:

LLVM 2.9 is really old, and llvm-gcc is discontinued, so I wouldn't even
try that. If you don't want to use trunk, I recommend you to use LLVM with
Clang 3.3 and see what you get.

typedef float float4 __attribute__((ext_vector_type(4)));

or if that doesn't work, I will try to implement the intrinsics in clang
itself (not sure this is the best way of doing it).
Ideally, I want to be able to compile C code that includes ARM NEON
intrinsics to other targets (TI processors, e.g.).

So, if I get it right, you have a file with ARM NEON intrinsics (the ones
defined in arm_neon.h) and passed it through LLVM 2.9 with LLVM-GCC
front-end and failed.

As of 2010, LLVM can compile every single NEON instruction, but you should
use LLVM's own version of arm_neon.h, since the type definitions do vary
between toolchains. In the end, they amount to the same thing on each
toolchain, but their representation can be different.

I suggest you try with Clang 3.3 and if that fails, we'll start from there.

cheers,
--renato

Hi Stan,

I spent the last three days trying to compile a version of LLVM that would
allow me to compile sources that contain these intrinsics, but with no success.

Ok. This we can probably help with. Did you manage to build a version
of Clang (preferably from git/subversion)?

If so, you're probably having problems cross-compiling. Renato's
recently worked on some documentation in this area:
Cross-compilation using Clang — Clang 18.0.0git documentation.

But for a quick hack, you could try:

$ cat > neon.c
#include <arm_neon.h>

float32x4_t my_func(float32x4_t lhs, float32x4_t rhs) {
  return vaddq_f32(lhs, rhs);
}
$ clang --target=arm-linux-gnueabihf -mcpu=cortex-a15 -ffreestanding
-O3 -S -o - neon.c

("ffreestanding" will dodge any issues with your supporting toolchain,
but won't work for larger tests. You've got to actually solve the
issues before you start running code).

In the process I found out that clang doesn't support NEON (as per
ARM Advanced SIMD (NEON) Intrinsics and Types in LLVM - The LLVM Project Blog),

That's rather out of date, I'm afraid. 32-bit ARM does support both
NEON intrinsics and a reasonable amount of LLVM's own
auto-vectorisation (which is in its early stages, but we have some
kind of loop and SLP vectorisation going on).

but there has been at least some effort in adding it
(https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch).

That patch is part of the effort to implement NEON (instructions and
intrinsics) on the 64-bit ARM architecture (AArch64).

I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times
and I gave up.

Yep. llvm-gcc is long dead, and LLVM 2.9 isn't much healthier.

current plan is to implement the ARM NEON intrinsics as a shared library,
using attributes as in:

That would probably be possible, but very bad from a performance
perspective. The whole point of NEON intrinsics is to speed up vector
code; if you've got the overhead of a call/return for each intrinsic
and completely fixed registers around even that you'll be in for a
world of pain.

Ideally, I want to be able to compile C code that includes ARM NEON
intrinsics to other targets (TI processors, e.g.).

Now that's going to be harder. LLVM itself doesn't support any TI
processors, for a start. And many of the NEON intrinsics (those with
more complex semantics) compile to LLVM IR with LLVM-level intrinsics,
which are only supported in the ARM backend.

Your shared library idea would work semantically, of course. But I'm
not sure what useful information could be extracted from it.

Cheers.

Tim.

Hello Tim,

Hello Renato,

It turned out I just didn’t do the cross-compilation correctly, and Tim Northover already pointed me to a guide you have written on it (http://clang.llvm.org/docs/CrossCompilation.html), so I will read that before continuing with my efforts.

To answer your question I am testing on a pandaboard currently, which has an arm cortex-a9 processor, which I think is 64-bit.

I am much happy to compile the latest code and am successfully doing so. I tried to compile release 2.9, as I (wrongly) believed that I need llvm-gcc in order to compile NEON code on LLVM.

Tim’s minimalist example worked on my clang3.4:

$ cat > neon.c
#include <arm_neon.h>

float32x4_t my_func(float32x4_t lhs, float32x4_t rhs) {
return vaddq_f32(lhs, rhs);
}
$ clang --target=arm-linux-gnueabihf -mcpu=cortex-a15 -ffreestanding
-O3 -S -o - neon.c

however it doesn’t if I remove the -ffreestanding flag. I need to figure this out next.

Thank you for your help.

Cheers,

  • Stan

To answer your question I am testing on a pandaboard currently, which has
an arm cortex-a9 processor, which I think is 64-bit.

Cortex-A9 is still 32-bits, so you'll have all support you need. :wink:

however it doesn't if I remove the -ffreestanding flag. I need to figure

this out next.

Can you at least assemble the file to .s? You won't be able to compile
Tim's example to executable because you don't have a main in there.

cheers,
--renato

To answer your question I am testing on a pandaboard currently, which has

an arm cortex-a9 processor, which I think is 64-bit.

Cortex-A9 is still 32-bits, so you'll have all support you need. :wink:

Ah, Okay, embarrassing...

however it doesn't if I remove the -ffreestanding flag. I need to figure

this out next.

Can you at least assemble the file to .s? You won't be able to compile
Tim's example to executable because you don't have a main in there.

I can compile to assembly with the -ffreestanding flag on, but without it I
get:

In file included from neon.c:1:
In file included from
/home/stan/Fortress/Dev/llvm/build-trunk/Debug+Asserts/bin/../lib/clang/3.4/include/arm_neon.h:31:
In file included from
/home/stan/Fortress/Dev/llvm/build-trunk/Debug+Asserts/bin/../lib/clang/3.4/include/stdint.h:64:
In file included from /usr/include/stdint.h:25:
In file included from /usr/include/features.h:341:
/usr/include/stdc-predef.h:30:10: fatal error: 'bits/predefs.h' file not
found
#include <bits/predefs.h>

which I suspect has something to do with the fact that in /usr/include I
have a folder called x86_64-linux-gnu but not one
called arm-linux-gnueabihf. Am I even remotely right?

Cheers,
- Stan

Yes, you are, and the docs should (hopefully) have all the information you
need to get past that, and other common problems. :wink:

cheers,
--renato