Question about alignment of structures on the stack (ARM 32)

Dear community,

I came across code generated by LLVM whose assembly instructions rely on 8-byte alignment for structures on the stack.
The relevant part of the Objective-C code is the following:
    -(void)getCharacters:(unichar *)unicode {
        NSRange range;
        range.location = 0;
        range.length = [self length];
        printf("%p, %p\n", &range.location, &range.length);

Before the printf call I see the arguments being prepared, and the most interesting instruction is:

    orr    r3, r2, #4 ; for address of range.length

Does this mean LLVM always expects the address of "range" to be 8-byte aligned?
Is it possible to tweak this with a clang command-line option, e.g. to set 4-byte alignment,
so that it generates different code instead of orr, e.g. add?

Best Regards,
Alexey

This is certainly odd, and I can't reproduce the behaviour here. Even
if the stack itself is 8-byte aligned (it's not on iOS), that struct
would usually only be 4-byte aligned. LLVM shouldn't be using "orr"
there.

Do you have a self-contained example (code, compiler version & command
line flags)?

Cheers.

Tim.

Hello Tim, thanks for the response.

I'm using a Mach-O loader (https://github.com/LubosD/darling/) and trying to make it work on ARM.
The scenario is to load a Mach-O binary (e.g. one compiled in Xcode) that calls functions from
an ELF library which implements libobjc2 and CoreFoundation.

In Mach-O on ARM the stack is 4-byte aligned, but the code produced for ELF expects 8-byte alignment.
So in about 50% of cases, when a call is made from Mach-O to ELF, the stack pointer register holds an address that is not 8-byte aligned.

Ah, that could do it. I see that LLVM does indeed make use of stack
alignment in this case. Regardless, this approach is going to go
really badly.
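
To illustrate why the alignment assumption matters here: orr and add compute the same address only when bit 2 of the base is already zero, which is exactly what 8-byte alignment guarantees. A minimal C sketch of the arithmetic follows; the function names are made up for illustration and are not anything LLVM emits.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the second 4-byte field as computed under the assumption that
   the struct is 8-byte aligned: bit 2 of base is taken to be zero, so
   setting it with OR is treated as equivalent to adding 4. */
uint32_t field_addr_orr(uint32_t base)
{
    assert((base & 3u) == 0); /* the struct is at least 4-byte aligned */
    return base | 4u;
}

/* Address of the same field computed without any 8-byte assumption. */
uint32_t field_addr_add(uint32_t base)
{
    assert((base & 3u) == 0);
    return base + 4u;
}

/* Returns 1 when both computations agree for the given base address. */
int orr_matches_add(uint32_t base)
{
    return field_addr_orr(base) == field_addr_add(base);
}
```

For a base of 0x1000 both forms give 0x1004. For a base of 0x1004 (4-byte but not 8-byte aligned) the OR form returns 0x1004 again, i.e. the address of range.location rather than range.length.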

By default almost all ELF platforms use an ABI called AAPCS (either
hard or soft float). iOS uses an older ABI called APCS. You can't mix
code from these two worlds in any kind of non-trivial case without a
translation layer.

You've discovered one issue: AAPCS requires 8-byte alignment for sp,
APCS only requires 4. It's the first of many without a more thorough
approach to the interface between the two.

I have not yet tested __attribute__((pcs("aapcs"))) or -target-abi. Maybe there is a magic pcs attribute that I could apply to the dangerous functions, but I would prefer to solve the problem in general.

I don't think so; __attribute__((pcs("apcs"))) might work, if it
existed. But it doesn't. You might find it's fairly easy to add it in
Clang, but I worry about the assumptions being made in the backend.

Either way, I'd recommend against trying to hack just this one stack
alignment issue.

Tim.

By default almost all ELF platforms use an ABI called AAPCS (either
hard or soft float). iOS uses an older ABI called APCS. You can't mix
code from these two worlds in any kind of non-trivial case without a
translation layer.

Do you mean a translation layer in the loader? If so, the loader could replace any ELF invocation with a call to a stub function that adjusts the stack and so on. But in that case the stub would have to know the signature of the function being invoked, otherwise
arguments on the stack could be lost.

Yep, that's pretty much exactly what I had in mind. You'd probably
need at least some assembler component.
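
As a rough sketch of the stack-adjustment part of such a stub, the arithmetic could look like the C below. The helper name and parameters are hypothetical; a real stub would need assembler to actually switch sp, copy the stack arguments, and restore sp on return. It assumes a downward-growing stack and that the stub knows how many bytes of stack arguments the callee expects, which is why the signature matters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the caller's sp (4-byte aligned, which is all
   APCS guarantees) and the number of bytes of stack-passed arguments in the
   callee's layout, return the sp a stub would install before branching to
   AAPCS code. The stack arguments would then be copied to
   [result, result + stack_arg_bytes). */
uint32_t stub_callee_sp(uint32_t caller_sp, uint32_t stack_arg_bytes)
{
    assert((caller_sp & 3u) == 0);
    assert((stack_arg_bytes & 3u) == 0);
    /* Reserve room for the copied arguments, then round down to 8 bytes. */
    return (caller_sp - stack_arg_bytes) & ~UINT32_C(7);
}
```

With no stack arguments, a caller sp of 0x7ffc becomes 0x7ff8, while an already aligned 0x7ff8 is left unchanged. Without the argument byte count the stub cannot tell how much to copy, so any realignment would move sp away from the arguments the callee expects to find there.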

I think it's the compiler's responsibility.

Compilers generally don't take the responsibility for making two ABIs
compatible, with certain exceptions (ironically, the main one I know
of *is* in ARM, where AAPCS and AAPCS-VFP have some accommodations).

I ran into bugs here due to stack alignment, but as I wrote before, I think realigning the stack or replacing orr with add could solve it.

Large data types (larger than 4 bytes) are 4-byte aligned.

This is a big one. It means structs will be laid out differently
unless you're careful, but the most difficult aspect is that it
applies to function calls too. Consider:

    void func(int x, long long y)

iOS will pass y in registers r1 and r2. ELF code will expect it in
registers r2 and r3. Similar effects happen to arguments that get
passed on the stack.
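
A small C model of the two register-assignment rules may make this concrete. It is a simplified sketch: it only handles 4- and 8-byte integer arguments, and it ignores the case where an argument is split between r3 and the stack. The names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_ARG_REGS = 4, ON_STACK = -1 };

/* Assigns each integer argument its first core register (0 means r0, ...).
   sizes[i] is 4 or 8 bytes. When align_i64 is nonzero, an 8-byte argument
   must start in an even-numbered register (the AAPCS rule); when it is zero,
   the next free register is used (the behaviour described above for iOS).
   Arguments that do not fit in r0-r3 are marked ON_STACK. */
void assign_core_regs(const int *sizes, size_t n, int align_i64, int *first_reg)
{
    int next = 0; /* next free register number */
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        int regs = sizes[i] / 4;
        if (regs == 2 && align_i64 && (next & 1))
            next++; /* skip one register, leaving a hole */
        if (next + regs <= NUM_ARG_REGS) {
            first_reg[i] = next;
            next += regs;
        } else {
            first_reg[i] = ON_STACK;
            next = NUM_ARG_REGS; /* later arguments go to the stack too */
        }
    }
}
```

For func(int x, long long y), i.e. sizes {4, 8}, the model puts y at r1 when align_i64 is 0 and at r2 when it is 1, matching the r1/r2 versus r2/r3 difference.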

+ Register R7 is used as a frame pointer
If I understood correctly, it's for debugging purposes only, but the disassembly of my CoreFoundation (ELF) shows r7 being used. The frame pointer on my system is r11.
+ Register R9 has special usage
The document says r9 can be used since iOS 3.0, and I found it used in my CoreFoundation. So I don't think it will be a problem.

Yes, these ones are probably harmless.

There are other issues too, particularly when you get to C++ (name
mangling and exceptions spring to mind). But I expect you've got
enough to worry about for now.

- orr r2, r1, #4
+ add r2, r1, #4
add instead of orr. Unfortunately, I haven't yet put clang 3.6 into my chroot to build with it (I'm not cross-compiling).
But if somebody could point me to the relevant source code or name the patch, I'd really appreciate it.

I wouldn't rely on this. Trunk emits orr again, it's likely just a
random code perturbation and will bite you elsewhere without a real
solution.

Tim.

void func(int x, long long y)

iOS will pass y in registers r1 and r2. ELF code will expect it in
registers r2 and r3. Similar effects happen to arguments that get
passed on the stack.

Strange, but in that simple case on ELF I got:
    mov r0, #1
    mov r1, #18
    mov r2, #0
    bl long_long_func
with the same endianness as on iOS.

That almost certainly means you're using the wrong triple on Linux
(and so would have problems calling system library functions with that
signature). Most ARM linux distributions these days use
arm-linux-gnueabihf (or possibly arm-linux-gnueabi, the difference
being where floating-point arguments get passed).

If that's changed in your upgrade to 3.6, it might account for you
seeing "add" instead of "orr" now, too.

I wouldn't rely on this. Trunk emits orr again, it's likely just a
random code perturbation and will bite you elsewhere without a real
solution.

The trunk of LLVM's source code?

Yes.

Tim.

Hello Tim,

I have one more question regarding calling conventions.

For example, the command line is the following:
clang -target arm-apple-darwin10 -mabi=apcs-gnu -fPIC -o main_36.llvm main.c -O1 -mfloat-abi=soft -emit-llvm

(I'm using clang from the llvm-mirror/clang GitHub repository, branch master, on top of the "clang-format: Fix for #pragma option formatting." commit,
with the llvm tree from the llvm-mirror/llvm GitHub repository, branch master, on top of the "[AArch64] Disable complex GEP optimization by default" commit.
I'm not an svn user :) )
For the function
void main_long_long_func(int a, long long b) __attribute__((pcs("aapcs"))) ;

I can see:
define i32 @main(i32 %argc, i8** nocapture readnone %argv) #2 {
entry:
tail call arm_aapcscc void @main_long_long_func(i32 1, i64 19)
And that's expected: the whole file is in apcs-gnu, and only main_long_long_func is in AAPCS.
But when I produce just the assembly code,
clang -target arm-apple-darwin10 -mabi=apcs-gnu -fPIC -o main_36.s main.c -O1 -mfloat-abi=soft -S
or without -mabi=apcs-gnu,
I get the call
mov r0, #1
mov r1, #19
mov r2, #0
bl _main_long_long_func
in the APCS convention rather than AAPCS, and the function itself expects the second argument in r1, r2.
Maybe I misunderstood, but I thought the attribute changes the calling convention?
The situation is the same if main_long_long_func is in a file which was compiled with -target armv7l-linux-gnueabi:
the attribute doesn't change the calling convention at the call site.

clang -target arm-apple-darwin10 -mabi=apcs-gnu -fPIC -o main_36.llvm main.c -O1 -mfloat-abi=soft -emit-llvm

The most reliable way to emit ARM code that's compatible with iOS is probably:
    clang -target x86_64-apple-macosx10.10 -arch armv7s ...

With that you certainly won't need to override the float-abi or abi.
I've been meaning to get "thumbv7s-apple-ios8.0" or whatever working
for a while too, but at the moment you need to specify x86 and then
"-arch" to set the real target for best results.

in APCS convention, but not AAPCS, and function itself expects second argument in r1, r2.

Unfortunately, it looks like __attribute__((pcs("aapcs"))) doesn't work
when the general compilation environment is APCS. The immediate reason
is that it looks for the 64-bit alignment of the i64 to decide whether
to leave a hole (see ARMCallingConv.td:124), but i64 is only 32-bit
aligned in APCS. This particular issue could be fixed.

At a higher level though, it was designed as an attribute to interwork
between hard and soft-float versions of otherwise compatible ABIs.
There's no way it could reliably work on APCS (e.g. what do you do
when passing "struct { int a; long long b; }", where the layouts are
different on either side).
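
The layout difference for that struct can be reproduced on an ordinary host compiler. The sketch below emulates the two alignments rather than targeting either ABI: _Alignas(8) stands in for the 8-byte-aligned long long, and #pragma pack(4) (a GCC/Clang extension) stands in for the 4-byte-aligned one. The struct names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout where 64-bit members are 8-byte aligned. _Alignas forces this
   even on hosts whose default alignment for int64_t is weaker. */
struct pair_align8 {
    int32_t a;
    _Alignas(8) int64_t b;
};

/* Layout where 64-bit members are only 4-byte aligned, emulated here by
   capping member alignment at 4 with #pragma pack. */
#pragma pack(push, 4)
struct pair_align4 {
    int32_t a;
    int64_t b;
};
#pragma pack(pop)

/* With 8-byte alignment there are 4 bytes of padding after a. */
_Static_assert(offsetof(struct pair_align8, b) == 8, "b at offset 8");
_Static_assert(sizeof(struct pair_align8) == 16, "size 16");

/* With 4-byte alignment b follows a directly. */
_Static_assert(offsetof(struct pair_align4, b) == 4, "b at offset 4");
_Static_assert(sizeof(struct pair_align4) == 12, "size 12");
```

Code on one side would read b at offset 8 while the other side wrote it at offset 4, so passing a pointer to such a struct across the boundary needs a conversion, not just a different calling convention.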

Cheers.

Tim.