ARM assembler bug on LLVM 3.5

Hi

I have the following ARM Linux program. It detects whether the
processor has a division instruction; if it does, the program uses it,
otherwise it falls back to a slower library call.

The program works with gcc, but not with clang: clang reports an
error on the sdiv instruction in the assembler.

The problem is this - you either compile this program with
-mcpu=cortex-a9, then clang reports error on the sdiv instruction because
cortex a9 doesn't have sdiv. Or - you compile the program with
-mcpu=cortex-a15, then clang compiles it, but it uses full cortex-a15
instruction set and the program crashes on cortex a9 and earlier cores.

Even if I use -no-integrated-as (as suggested in bug 18864), clang still
examines the string in "asm" statement and reports an error. GCC doesn't
examine the string in "asm" and works.

I'd like to ask how to write this program correctly so that it works in
clang. Or - if it's not possible - I'd like to ask whether you could drop that
pointless restriction on the instruction set in the assembler, so that it can
generate all ARM instructions regardless of the cpu switch. This
restriction doesn't exist on x86 - on x86, you can compile the program
with -march=pentium2 and still use SSE instructions in the assembler, even
though pentium2 doesn't have SSE. The ARM backend seems overly
protective in preventing such instructions.

Mikulas

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int have_hardware_division = 0;

int divide(int a, int b)
{
  int result;
  if (have_hardware_division)
    asm (".cpu cortex-a15 \n sdiv %0, %1, %2" : "=r"(result) : "r"(a), "r"(b));
  else
    result = a / b;
  return result;
}

int main(void)
{
  int h, i;
  unsigned a;
  h = open("/proc/self/auxv", O_RDONLY);
  if (h != -1) {
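    uint32_t cap[2];   /* one auxv entry: cap[0] = type, cap[1] = value */
    while (read(h, &cap, 8) == 8) {
      if (cap[0] == 16) {   /* 16 = AT_HWCAP */
#if defined(__thumb2__)
        if (cap[1] & (1 << 18))   /* HWCAP_IDIVT: sdiv/udiv in Thumb mode */
          have_hardware_division = 1;
#else
        if (cap[1] & (1 << 17))   /* HWCAP_IDIVA: sdiv/udiv in ARM mode */
          have_hardware_division = 1;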
#endif
        break;
      }
    }
    close(h);
  }
  a = 0;
  for (i = 1; i < 100000000; i++) {
    a += divide(100000000, i);
  }
  printf("%u\n", a);
  return 0;
}
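
For comparison, the detection loop in main() could probably also be written
with glibc's getauxval() instead of reading /proc/self/auxv by hand. This is
only a sketch: it assumes glibc 2.16 or later, the helper names
hwcap_has_idiv and detect_hardware_division are made up for illustration, and
the bit positions are simply the ones the program above tests (they are only
meaningful on 32-bit ARM Linux).

```c
#include <sys/auxv.h>  /* getauxval(), AT_HWCAP; assumes glibc 2.16 or later */

/* Same bit positions the program above tests:
   18 when built as Thumb-2 code, 17 when built as ARM code. */
#if defined(__thumb2__)
#define IDIV_BIT (1UL << 18)
#else
#define IDIV_BIT (1UL << 17)
#endif

/* Hypothetical helper: returns 1 if the given hwcap word has the
   integer-division bit set, 0 otherwise. */
int hwcap_has_idiv(unsigned long hwcap)
{
  return (hwcap & IDIV_BIT) != 0;
}

/* Hypothetical helper: asks the C library for the AT_HWCAP word that the
   kernel passed to the process and tests the division bit. */
int detect_hardware_division(void)
{
  return hwcap_has_idiv(getauxval(AT_HWCAP));
}
```

With this, main() would set have_hardware_division =
detect_hardware_division() and would need neither open() nor read(). It does
not change the inline-asm problem described above in any way.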

> The problem is this - you either compile this program with
> -mcpu=cortex-a9, then clang reports error on the sdiv instruction because
> cortex a9 doesn't have sdiv. Or - you compile the program with
> -mcpu=cortex-a15, then clang compiles it, but it uses full cortex-a15
> instruction set and the program crashes on cortex a9 and earlier cores.

LLVM always validates inline assembly. This may be too restrictive but
it has been for a while and there isn't yet a good incentive to turn
that off, even with a flag. This may change in the future for some
cases, but your case is not one of them.

The problem is that there isn't currently a way to pass flags to the
integrated assembler as of now (I'm working on it). When that starts
to work, you will be able to just pass "-mcpu=cortex-a9
-Wa,-mcpu=cortex-a15" and it'll do what you want. That should work on
GCC, too.

Right now, -Wa only works with an external assembler.

> Even if I use -no-integrated-as (as suggested in bug 18864), clang still
> examines the string in "asm" statement and reports an error. GCC doesn't
> examine the string in "asm" and works.

Have you tried passing -Wa,-mcpu=cortex-a15 together with -no-integrated-as?

cheers,
--renato

>> The problem is this - you either compile this program with
>> -mcpu=cortex-a9, then clang reports error on the sdiv instruction because
>> cortex a9 doesn't have sdiv. Or - you compile the program with
>> -mcpu=cortex-a15, then clang compiles it, but it uses full cortex-a15
>> instruction set and the program crashes on cortex a9 and earlier cores.

> LLVM always validates inline assembly. This may be too restrictive but
> it has been for a while and there isn't yet a good incentive to turn
> that off, even with a flag. This may change in the future for some
> cases, but your case is not one of them.

What's the purpose of rejecting "sdiv" and other instructions?

I understand that some time ago ARM was used mainly for embedded systems
and there was no need to write ARM applications portable across different
cores - so rejecting instructions for other cores didn't do any harm.
But today it definitely makes sense to write portable ARM applications
that run on different cores.

> The problem is that there isn't currently a way to pass flags to the
> integrated assembler as of now (I'm working on it).

gas unlocks all instructions if these directives are used ".cpu
cortex-a15\n .fpu neon-vfpv4".

The LLVM integrated assembler chokes on these directives and produces an
invalid object file. You can try:
int main(void)
{
        asm volatile (".cpu cortex-a15\n .fpu neon-vfpv4\n");
        return 0;
}
/usr/bin/ld: Warning: /tmp/as-b084f5.o: Unknown EABI object attribute 488
/usr/bin/ld: /tmp/as-b084f5.o: Unknown mandatory EABI object attribute 8890
/usr/bin/ld: failed to merge target specific data of file /tmp/as-b084f5.o

Maybe, all you need to do is fix handling of these directives in the
integrated assembler.

> When that starts
> to work, you will be able to just pass "-mcpu=cortex-a9
> -Wa,-mcpu=cortex-a15" and it'll do what you want. That should work on
> GCC, too.
>
> Right now, -Wa only works with an external assembler.

>> Even if I use -no-integrated-as (as suggested in bug 18864), clang still
>> examines the string in "asm" statement and reports an error. GCC doesn't
>> examine the string in "asm" and works.

> Have you tried passing -Wa,-mcpu=cortex-a15 together with -no-integrated-as?

I have tried it and it doesn't work (clang 3.5 in Ubuntu Trusty on ARM).
With -no-integrated-as, clang uses the system assembler, but it parses the
asm string nonetheless and reports an error on sdiv instruction.

Mikulas

> What's the purpose of rejecting "sdiv" and other instructions?

Warning the user that some of the options are not consistent. Handling
the .cpu in inline assembly is not always a good thing
(Renato Golin - GAS .fpu directive) and a
compile-time warning system should be used. It's not because GAS does
it that it's right.

> I have tried it and it doesn't work (clang 3.5 in Ubuntu Trusty on ARM).
> With -no-integrated-as, clang uses the system assembler, but it parses the
> asm string nonetheless and reports an error on sdiv instruction.

Ubuntu 3.5 is not the same as LLVM 3.5, unfortunately. I've discussed
this with Debian packagers and we'll have to fix this there.

It's not because the (x86) host assembler doesn't "recognize" the
(arm) target's instruction that we should just allow any instruction
in inline asm. This has been discussed on both LLVM and GCC lists
extensively and the LLVM position is that inline assembly validation
is a good thing. In this case, you're not being kosher and the
assembler is complaining, which is good and I don't think we should
change that.

cheers,
--renato

>> What's the purpose of rejecting "sdiv" and other instructions?

> Warning the user that some of the options are not consistent. Handling
> the .cpu in inline assembly is not always a good thing
> (Renato Golin - GAS .fpu directive) and a
> compile-time warning system should be used. It's not because GAS does
> it that it's right.

>> I have tried it and it doesn't work (clang 3.5 in Ubuntu Trusty on ARM).
>> With -no-integrated-as, clang uses the system assembler, but it parses the
>> asm string nonetheless and reports an error on sdiv instruction.

> Ubuntu 3.5 is not the same as LLVM 3.5, unfortunately. I've discussed
> this with Debian packagers and we'll have to fix this there.
>
> It's not because the (x86) host assembler doesn't "recognize" the (arm)
> target's instruction that we should just allow any instruction in inline
> asm. This has been discussed on both LLVM and GCC lists extensively and
> the LLVM position is that inline assembly validation is a good thing.

Why is it good?

Read the thread that you referenced above (and bug
http://llvm.org/bugs/show_bug.cgi?id=20447 that is referenced in that
thread) and you'll see that other people have the same problems as me.
They can't write NEON-optimized piece of assembler because the assembler
rejects it.

All these problems would go away if the assembler didn't limit the
instruction set - then we wouldn't have to use hacks with .cpu and .fpu.

> In this case, you're not being kosher and the
> assembler is complaining, which is good and I don't think we should
> change that.

So, I ask again - how can I use the sdiv instruction on ARM cores that
support it and yet make the program work on ARM cores that don't have
sdiv? If you think that the program that I posted at the beginning of this
thread is wrong - please say how to do it correctly.

> cheers,
> --renato

Mikulas

I downloaded pre-compiled llvm/clang 3.5 from the llvm.org website. Unlike
the Ubuntu clang, it doesn't attempt to parse the content of the asm
statement when -no-integrated-as is used, so that trick with asm (".cpu
cortex-a15 \n sdiv %0, %1, %2"...) works.

Using "-Wa,-mcpu=cortex-a15" doesn't work: clang puts ".cpu cortex-a8" on
the third line of the assembler file, and that overrides the cpu=cortex-a15
set on the command line.

Setting ".cpu" in the assembler statement is hacky - I'd still like to
know how to generate the sdiv instruction properly.

Mikulas

> Why is it good?

I encourage you to search this list on the topics of inline assembly
validation, as we already have discussed it extensively. The thread I
sent you was the other side, arguments for the non-validation, as well
as our efforts to sync with the GNU community (which I think was a
good one).

People have different problems with assembly, we're trying to fix the
best way we can, not the fastest, nor the one that works only in one
case. Most importantly, we're trying not to break other things while
we fix this one. Inline assembly is tricky and people tend to only
care about their own niche little hack, which only makes things worse.

> So, I ask again - how can I use the sdiv instruction on ARM cores that
> support it and yet make the program work on ARM cores that don't have
> sdiv? If you think that the program that I posted at the beginning of this
> thread is wrong - please say how to do it correctly.

You cannot do that yet on LLVM, at least not for ARM. This has been
discussed at Linaro Connect and the fix that has gone into the 3.5
release is *not* good.

I have a plan, to implement it in a way that works without .cpu hacks,
by allowing -Wa to work with both integrated and external assemblers,
and it may be that you should change your example a bit (maybe by
moving all A15 code into a separate .S file) or something. Please add
your program to the bug and I'll consider it when I fix it, because
that's more or less what IFUNC does, which *has* to work. You can
subscribe yourself to those bugs and I'll keep you posted.

If you read my .fpu proposal in that bug, and my -Wa,-mfpu proposal on
the other bug, you'll get what I'm trying to do, which I believe will
work on your program. But none of that has anything to do with turning
the inline asm validation off.

cheers,
--renato
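
The "separate file" idea mentioned above might look roughly like the sketch
below, written in C rather than as a .S file. It is an assumption, not
something tested in this thread: the names divide_fn, divide_soft, divide_hw
and select_divide are made up for illustration, and in a real build divide_hw
would have to live in its own source file compiled with an -mcpu that has
sdiv (for example cortex-a15), so that the compiler itself may choose sdiv for
the plain C division and no inline asm or .cpu directive is needed. Both
functions are kept in one block here only so that the sketch is
self-contained.

```c
/* Pointer type for the division routine chosen at run time. */
typedef int (*divide_fn)(int, int);

/* Built with the baseline -mcpu (e.g. cortex-a9). On a core without sdiv
   the compiler lowers a / b to a library call. */
int divide_soft(int a, int b)
{
  return a / b;
}

/* Assumption: in a real build this function would be in a separate source
   file compiled with an -mcpu that has hardware division, so the compiler
   is free to emit sdiv for the same C expression. */
int divide_hw(int a, int b)
{
  return a / b;
}

/* Picks the routine once, based on the flag computed from the hwcap bits
   (have_hardware_division in the original program). */
divide_fn select_divide(int have_hardware_division)
{
  return have_hardware_division ? divide_hw : divide_soft;
}
```

The main loop would then call select_divide(have_hardware_division) once
before the loop and invoke the returned pointer inside it. The caveat is that
everything in the file built for the newer core must only be reached when the
runtime check has passed.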