Problem with struct arguments passed by value

I have the following program that I'm trying to compile with clang on Mac OS X on Intel, but clang generates incorrect code for it.

$ cat a.c

typedef struct {
     long location;
     long length;
} CFRange;

void *CFArrayCreate(void *allocator, const void **values, long numValues,
             const void *callBacks);
char CFArrayContainsValue(void *theArray, CFRange range, const void *value);

#if CHECK
int checkrange(CFRange range);
#endif

int main(int argc, char *argv[]) {
  void *array;
  CFRange range;
  const void *values[1] = { "foo" };
  
  array = CFArrayCreate((void *)0, values, 1, (void *)0);
  
  range.location = 0;
  range.length = 1;
  
#if CHECK
  if (checkrange(range)) {
    return 2;
  }
#endif

  if (CFArrayContainsValue(array, range, values[0])) {
    return 0;
  } else {
    return 1;
  }
}

#if CHECK
int checkrange(CFRange range)
{
  if (range.length != 1 || range.location != 0) return 1;

  return 0;
}
#endif
$

The problem is that clang, with both -arch i386 and -arch x86_64, generates LLVM IR that passes a pointer to a CFRange as the second argument to CFArrayContainsValue. It should instead either copy the structure inline into the stack parameter area (32-bit mode) or split the structure across the integer registers %rsi/%rdx (64-bit mode). llvm-gcc-4.2 gets this right (as does gcc, obviously), so I think the problem is in how clang codegens function calls, and also in how it decodes struct arguments inside a function. clang is at least self-consistent between the caller and callee, even though it doesn't match the ABI of code compiled with gcc; building with CHECK defined verifies this.
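
To make the mismatch concrete, here is a rough source-level sketch of the two argument shapes involved. The names Range, contains_by_value, and contains_by_pointer are made up for illustration and are not CoreFoundation APIs; the layout of Range is assumed to match CFRange from a.c.

```c
/* Hypothetical stand-in for CFRange; same two-long layout as in a.c. */
typedef struct {
    long location;
    long length;
} Range;

/* The shape an ABI-conforming callee expects: the struct itself is the
   argument, so its fields arrive in the parameter area or registers. */
int contains_by_value(const long *items, Range range, long value) {
    long i;
    for (i = range.location; i < range.location + range.length; i++) {
        if (items[i] == value) {
            return 1;
        }
    }
    return 0;
}

/* Roughly the shape of the clang IR shown in this report: the caller
   passes the address of a copy, and the callee must dereference it. */
int contains_by_pointer(const long *items, const Range *range, long value) {
    return contains_by_value(items, *range, value);
}
```

Both functions compute the same answer when caller and callee agree on the shape. The crash in the report comes from a caller using the second shape against a library compiled to expect the first.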

$ gcc -o a a.c -framework CoreFoundation -arch i386
$ ./a
$ echo $?
0
$ gcc -o a a.c -framework CoreFoundation -arch x86_64
$ ./a
$ echo $?
0
$ clang -arch i386 a.c -emit-llvm-bc -o - | llc -o - | as -arch i386 - -o a.o
$ gcc -o a a.o -framework CoreFoundation -arch i386
$ ./a
Segmentation fault
$ clang -arch x86_64 a.c -emit-llvm-bc -o - | llc -o - | as -arch x86_64 - -o a.o
$ gcc -o a a.o -framework CoreFoundation -arch x86_64
$ ./a
Segmentation fault
$

Comparing just the i386 LLVM IR from llvm-gcc-4.2 and clang shows:

$ llvm-gcc-4.2 -S -o - -emit-llvm a.c
...
  %tmp11 = getelementptr %struct.CFRange* %range, i32 0, i32 0 ; <i32*> [#uses=1]
  %tmp12 = load i32* %tmp11 ; <i32> [#uses=1]
  %tmp13 = getelementptr %struct.CFRange* %range, i32 0, i32 1 ; <i32*> [#uses=1]
  %tmp14 = load i32* %tmp13 ; <i32> [#uses=1]
  %tmp15 = call i8 @CFArrayContainsValue( i8* %tmp10, i32 %tmp12, i32 %tmp14, i8* %tmp9 ) signext nounwind ; <i8> [#uses=1]
...
$ clang a.c -o - -emit-llvm
...
  %range = alloca %struct.CFRange ; <%struct.CFRange*> [#uses=3]
  %values = alloca [1 x i8*] ; <[1 x i8*]*> [#uses=3]
  %tmp4 = alloca %struct.CFRange ; <%struct.CFRange*> [#uses=2]
...
  %tmp5 = bitcast %struct.CFRange* %tmp4 to i8* ; <i8*> [#uses=1]
  %tmp6 = bitcast %struct.CFRange* %range to i8* ; <i8*> [#uses=1]
  call void @llvm.memcpy.i32( i8* %tmp5, i8* %tmp6, i32 8, i32 4 )
...
  %call9 = call i8 @CFArrayContainsValue( i8* %tmp3, %struct.CFRange* %tmp4, i8* %tmp8 ) ; <i8> [#uses=1]

When the system CoreFoundation.framework attempts to decode the arguments to CFArrayContainsValue, it explodes pretty spectacularly.

From a quick look, this appears to be intentional (although perhaps not correct) in CodeGenFunction::EmitCallExpr in clang/lib/CodeGen/CGExpr.cpp, which uses CreateTempAlloca/EmitAggExpr. Maybe something different is needed?

Shantonu Sen
ssen@apple.com

llvm-gcc:
declare i8 @CFArrayContainsValue(i8*, i32, i32, i8*) signext

clang:
declare i8 @CFArrayContainsValue(i8*, %struct.anon*, i8*)

What the heck is llvm-gcc doing here?!

-Eli

It looks as though llvm-gcc is assuming the Darwin function calling ABI:

http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/IA32.html#//apple_ref/doc/uid/TP40002492-SW4

-bw

llvm-gcc:
declare i8 @CFArrayContainsValue(i8*, i32, i32, i8*) signext

clang:
declare i8 @CFArrayContainsValue(i8*, %struct.anon*, i8*)

What the heck is llvm-gcc doing here?!

Heh, it is lowering the structure to make it match the ABI. This is gross, but correct.
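
As a hand-written illustration of what "lowering the structure" means here, the sketch below passes each field as its own scalar argument, mirroring the (i32, i32) pair in the llvm-gcc declaration. Range, range_end, range_end_lowered, and call_lowered are hypothetical names, and this is only a source-level analogy for what the front end does in IR.

```c
/* Hypothetical stand-in for CFRange from a.c. */
typedef struct {
    long location;
    long length;
} Range;

/* Source-level view: the range is a single struct argument. */
long range_end(Range range) {
    return range.location + range.length;
}

/* Hand-lowered view: each field travels as a separate scalar argument. */
long range_end_lowered(long location, long length) {
    return location + length;
}

/* Caller-side lowering: load each field, then pass the fields one by one,
   much like the getelementptr/load pairs in the llvm-gcc output. */
long call_lowered(const Range *range) {
    return range_end_lowered(range->location, range->length);
}
```

On i386 the two scalars end up in the same stack slots the inline struct would occupy, which is why the lowered declaration is ABI-compatible with the by-value C prototype.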

Clang hasn't done much of anything to try to match the ABI. At the very least though, structs passed by value should be marked with the byval attribute. The code should work better if compiled to:

declare i8 @CFArrayContainsValue(i8*, %struct.anon* byval, i8*) signext
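
The byval attribute tells LLVM that the pointer argument really denotes a value: the callee receives its own copy of the pointee and cannot modify the caller's object. A minimal C sketch of that difference, using the hypothetical names Range, consume_by_value, and consume_by_pointer:

```c
/* Hypothetical stand-in for CFRange from a.c. */
typedef struct {
    long location;
    long length;
} Range;

/* By-value semantics (what byval expresses in IR): the callee mutates
   only its private copy, so the caller's struct is left untouched. */
long consume_by_value(Range range) {
    range.location += range.length;
    range.length = 0;
    return range.location;
}

/* Plain pointer semantics: the same writes land in the caller's struct. */
long consume_by_pointer(Range *range) {
    range->location += range->length;
    range->length = 0;
    return range->location;
}
```

After consume_by_value the caller's Range is unchanged, while consume_by_pointer updates it in place. A plain pointer parameter without byval gives the code generator no way to tell these two cases apart.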

Also, note that '-arch x86_64' doesn't currently switch the target information fully; I think clang will still assume that pointers are 32 bits wide, etc.:

From include/clang/Basic/TargetInfo.h:

   /// getPointerWidth - Return the width of pointers on this target, for the
   /// specified address space. FIXME: implement correctly.
   uint64_t getPointerWidth(unsigned AddrSpace) const { return 32; }
   uint64_t getPointerAlign(unsigned AddrSpace) const { return 32; }
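
One quick way to check which data model a compiler actually targeted is to have the compiled code report its own widths. This is only a sketch; pointer_width_bits and long_width_bits are hypothetical helper names, and the values they return depend on the target the code was built for.

```c
#include <limits.h>

/* Width of a data pointer, in bits, as seen by the compiled code. */
unsigned pointer_width_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Width of long, in bits; this determines the size of each CFRange field. */
unsigned long_width_bits(void) {
    return (unsigned)(sizeof(long) * CHAR_BIT);
}
```

Printing these from a test program built with each -arch flag shows whether the front end switched to 64-bit pointers and longs, or kept the hard-coded 32-bit assumption.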

-Chris