Incorrect code generated for arm64

Hi all,

I’ve narrowed down a problem in my code to the following test case:

- - - -

typedef struct {float v[2];} vec2;
typedef struct {float v[3];} vec3;

vec2 getVec2();

vec3 getVec3()
{
  vec2 myVec = getVec2();

  vec3 res;
  res.v[0] = myVec.v[0];
  res.v[1] = myVec.v[1];
  res.v[2] = 1;
  return res;
}

- - - -

Compiling this with any level of optimization for arm64 gives incorrect code, unless my test case above is triggering some undefined behaviour that I’m not aware of. Other architectures appear to work OK.

$ clang -arch arm64 -O1 test.c -S -o -
  .section __TEXT,__text,regular,pure_instructions
  .ios_version_min 5, 0
  .globl _getVec3
  .align 2
_getVec3: ; @getVec3
; BB#0:
  b _getVec2

.subsections_via_symbols

- - - -

I’m happy to file a bug for this, but not sure quite where it belongs - clang, LLVM or direct to Apple. Can someone test the top of tree and see if it suffers the same issue?

I’m currently using the latest Xcode from Apple (clang -v gives “Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)”) but the LLVM 3.5-based compiler in the previous Xcode release also generates the same code.

Simon

I can confirm that, with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)

Very strange!

I don’t know if it helps you, but this looks correct:

typedef struct {float v0, v1;} vec2;
typedef struct {float v0, v1, v2;} vec3;

vec2 getVec2();

vec3 getVec3()
{
vec2 myVec = getVec2();

vec3 res;
res.v0 = myVec.v0;
res.v1 = myVec.v1;
res.v2 = 1;
return res;
}

.section __TEXT,__text,regular,pure_instructions
.globl _getVec3
.align 2
_getVec3: ; @getVec3
; BB#0:
stp fp, lr, [sp, #-16]!
mov fp, sp
bl _getVec2
fmov s2, #1.000000e+00
ldp fp, lr, [sp], #16
ret

Thanks Bruce.

I can confirm that, with Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)

Very strange!

Yes, that’s what I thought. I’ve also checked the binary downloads for OS X from llvm.org and get the same broken code from both the 3.5.2 and 3.6.0 releases. -arch arm64 wasn’t supported on the 3.4 branch, so wasn’t able to test there, but I imagine this has been there from the start of the arm64 backend.

I don’t know if it helps you, but this looks correct:

My guess was that as the result elements are returned in registers there might be a bug somewhere around the final element being treated as unused and optimized away, but I’d expect your code to trigger the same issue in that case. My actual code uses a Vector class templated on the size of the vector so your approach so your approach doesn’t really help there. I’m able to work around it easily enough though - as soon as the function does any actual work with the value it seems to work OK.

It is quite a worrying bug for me though - there are quite a lot of functions in my codebase that return small (length 2 or 3) Vector types - generally in headers so they can be inlined an optimized well - but I’d definitely like to understand the root cause of this one so I can be on the lookout for any other similar failures.

Simon

I’ve gone ahead and created a bug for this, as it seems to be a genuine issue rather than me just overlooking something obvious.
https://llvm.org/bugs/show_bug.cgi?id=23408

Simon

Thanks for taking the trouble to report this Simon. I think I've found
and fixed the issue now (r236457). It should make its way into Xcode's
clang without too much delay.

Cheers.

Tim.

Happy to help, and thanks very much for the rapid investigation and fix!

Do you think it’s worth getting into 3.6.1? Is the bug limited to tail call stuff (so any functions that make use of the extended vec3 will be fine - that’s my usual case), or does it potentially have wider effects?

Cheers

Simon

Do you think it’s worth getting into 3.6.1? Is the bug limited to tail call stuff (so any functions that make use of the extended vec3 will be fine - that’s my usual case), or does it potentially have wider effects?

I think Tom said he'd stopped taking nominations for 3.6.1. I'll try
to remember to put it on the branch after that if we do a 3.6.2
though.

Tim.