Incorrect execution of global constructor with JIT on ARM

Hello, llvm developers!

I am running LLVM with JIT on ARM. Simple programs run OK, but with larger code I have stumbled upon some issues.
Here is the C++ code to which I have reduced the problem:

#include <stdio.h>
struct Global {
  typedef unsigned char ArrayType[4];
  ArrayType value;
  Global(const ArrayType& arg) {
    for (int i = 0; i < 4; i++) this->value[i] = arg[i];
  }
};
static const unsigned char arr[] = { 1, 2, 3, 4 };
static const Global Constant(arr);
int main() {
  for (int i=0; i<4; i++) printf("%i", Constant.value[i]);
}

I am compiling it with llvm-gcc (with -O3 or -O2 optimization) and running it with LLVM version 2.6. Instead of printing 1234, it prints 4444.
I verified the contents of the memory behind Constant with this code:

        const llvm::GlobalValue* v = module->getGlobalVariable("_ZL8Constant", true);
        void* addr = EE->getPointerToGlobal(v);
        const unsigned char* ptr = (const unsigned char*)addr;
        for (int i=0; i<6; i++)
        {
            outs() << (int)ptr[i] << ',';
        }

The memory really is filled with 4,4,4,4, which is incorrect.
When I move the global constant definition const Global Constant(arr); into the main function as a local variable, everything runs fine and the program prints 1234.
Is this an issue with the LLVM JIT for ARM, or with LLVM in general? The same code runs fine on Windows with the same version of LLVM.
The global constructor code looks like this:

define internal void @_GLOBAL__I_main() nounwind {
entry:
  store i8 1, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 0), align 8
  store i8 2, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 1), align 1
  store i8 3, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 2), align 2
  store i8 4, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 3), align 1
  ret void
}

I don't see any problems with it.
When I compile the same bitcode file with llc.exe -march=arm and use the generated assembler file on my ARM device, the code runs fine.
What else could I check in this situation to find out more about the problem?

#include <stdio.h>
struct Global {
typedef unsigned char ArrayType[4];
ArrayType value;
Global(const ArrayType& arg) {
for (int i = 0; i < 4; i++) this->value[i] = arg[i];
}
};
static const unsigned char arr[] = { 1, 2, 3, 4 };
static const Global Constant(arr);
int main() {
for (int i=0; i<4; i++) printf("%i", Constant.value[i]);
}

Compiling with clang I got lots of errors, but it boils down to two problems:

typedef unsigned char ArrayType[4];

const_array.cpp:3:2: error: type name does not allow storage class to
be specified
typedef unsigned char ArrayType[4];
^

As far as I can tell, clang is confusing ArrayType[4] with a
declaration of an unsigned char[4] type.

I've changed your code slightly to make it compile with clang, but I
haven't been able to make it print 4444, not even with your own code,
not even at -O3. There seems to be nothing wrong with the LLVM IR
generated by your code, too, even at -O3.

#include <stdio.h>
typedef unsigned char ArrayType;
struct Global {
ArrayType value[4];
Global(const ArrayType* arg) {
   for (int i = 0; i < 4; i++) this->value[i] = arg[i];
}
};
static const unsigned char arr[] = { 1, 2, 3, 4 };
static const struct Global Constant(arr);
int main() {
for (int i=0; i<4; i++) printf("%i", Constant.value[i]);
}

See if that helps. I think it has nothing to do with code generation, though.

cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm

Thanks for answer, Renato.
But I still think that there is some issue with ARM codegen.
When I tried the code you modified, I got exactly the same LLVM IR (verified by comparing the output of llvm-dis), and the program still produces the wrong result at runtime.

With some help from another developer we managed to reduce the issue to the following, simpler C code:

#include <stdio.h>
void init(int* value, int val) {
*value = val;
printf("Values: %08x\n", *value);
}
int main() {
static struct {
int a;
int b;
} value;

init(&value.b, 11);
init(&value.a, 10);

printf("%i\n", value.a);
printf("%i\n", value.b);
}

The correct result would be the following output (I get this when running ARM + Interpreter, or Windows + JIT):

Values: 0000000b
Values: 0000000a
10
11

But LLVM 2.6 + JIT on ARM, with the code compiled by llvm-gcc -O3, produces this:
Values: 0000000b
Values: 0000000a
10
10

Here is the LLVM IR of the main function:

define i32 @main() nounwind {

entry:
store i32 11, i32* getelementptr inbounds (%struct…0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4
%0 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0), i32 11) nounwind ; [#uses=0]
store i32 10, i32* getelementptr inbounds (%struct…0._6* @_ZZ4mainE5value, i32 0, i32 0), align 8
%1 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0), i32 10) nounwind ; [#uses=0]
%2 = load i32* getelementptr inbounds (%struct…0._6* @_ZZ4mainE5value, i32 0, i32 0), align 8 ; [#uses=1]
%3 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str1, i32 0, i32 0), i32 %2) ; [#uses=0]
%4 = load i32* getelementptr inbounds (%struct…0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4 ; [#uses=1]
%5 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str1, i32 0, i32 0), i32 %4) ; [#uses=0]
ret i32 0
}

It looks like the JIT compiler doesn’t handle the following bitcode instructions correctly:
store i32 11, i32* getelementptr inbounds (%struct…0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4
and
%4 = load i32* getelementptr inbounds (%struct…0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4 ; [#uses=1]
It ignores the “i32 1” offset in the getelementptr bitcode instruction.
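To make the expected addressing concrete: the constant getelementptr with indices "i32 0, i32 1" selects the second field of the struct, which for two int members should sit sizeof(int) bytes past the base address. Here is a minimal C sketch of that expectation. The struct name Pair and the two helper functions are made up for illustration, and a 4-byte int is an assumption that holds on this ARM target:

```c
#include <stddef.h>

/* Hypothetical stand-in for the anonymous static struct in main(). */
struct Pair {
    int a;
    int b;
};

/* Byte offset that indices "i32 0, i32 0" should add to the base: zero. */
size_t offset_of_a(void) {
    return offsetof(struct Pair, a);
}

/* Byte offset that indices "i32 0, i32 1" should add to the base:
   the size of the preceding int field (4 on the ARM target assumed here). */
size_t offset_of_b(void) {
    return offsetof(struct Pair, b);
}
```

So a correct store to value.b should go to base + offset_of_b(), not to the base itself.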

Here is the ARM assembly produced for the main function, as displayed by GDB:

(gdb) x/30i FPtr

0x40029010: sub sp, sp, #16 ; 0x10
0x40029014: str lr, [sp, #12]
0x40029018: str r11, [sp, #8]
0x4002901c: str r5, [sp, #4]
0x40029020: str r4, [sp]

Allocate four entries on the stack, and save the return address, r11, r5, and r4.

0x40029024: ldr r4, [pc, #88] ; 0x40029084

Address of the value variable in r4.

0x40029028: mov r1, #11 ; 0xb
0x4002902c: str r1, [r4]
0x40029030: ldr r5, [pc, #80] ; 0x40029088
0x40029034: mov r0, r5
0x40029038: bl 0x40009008

Inline the init function: store 11 at the address of the “value” variable, call printf with the string from r5. This is a bug, should have stored at an offset of four (str r1, [r4,4]).

0x4002903c: mov r1, #10 ; 0xa
0x40029040: str r1, [r4]
0x40029044: mov r0, r5
0x40029048: bl 0x40009008

Inline the init function: store 10 at the address of the “value” variable, call printf with the string from r5. This looks OK.

0x4002904c: ldr r1, [r4]
0x40029050: ldr r5, [pc, #52] ; 0x4002908c
0x40029054: mov r0, r5
0x40029058: bl 0x40009008

Load first number from the structure and print its value.

0x4002905c: ldr r1, [r4]
0x40029060: mov r0, r5
0x40029064: bl 0x40009008

Load the first number from the structure again and print its value. This is also a bug; it should have been “ldr r1, [r4,4]”.

0x40029068: mov r0, #0 ; 0x0
0x4002906c: ldr r4, [sp]
0x40029070: ldr r5, [sp, #4]
0x40029074: ldr r11, [sp, #8]
0x40029078: ldr lr, [sp, #12]
0x4002907c: add sp, sp, #16 ; 0x10
0x40029080: bx lr

And at the end restore registers from the stack and return.
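As a sanity check on this reading of the disassembly, the effect of the dropped offset can be modelled in plain C. This is only a sketch of the suspected behavior, not of what the JIT actually does internally. The struct name Pair, the function load_b_after_stores, and the drop_offset flag are hypothetical names introduced for the illustration:

```c
/* Hypothetical model of the JITed main(). */
struct Pair {
    int a;
    int b;
};

/* Performs the same two stores as main() and returns what the final
   load of field b observes. When drop_offset is nonzero, accesses to b
   use the struct base address, mimicking "str r1, [r4]" / "ldr r1, [r4]". */
int load_b_after_stores(int drop_offset) {
    struct Pair value = {0, 0};
    int *pa = &value.a;
    int *pb = drop_offset ? &value.a : &value.b;

    *pb = 11;   /* init(&value.b, 11) */
    *pa = 10;   /* init(&value.a, 10) */
    return *pb; /* the value passed to the last printf */
}
```

With drop_offset set, the store of 10 overwrites the earlier 11 and the function returns 10, which matches the second "10" in the wrong output. Without it, the function returns 11 as expected.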

Can somebody confirm that this is a bug? Or am I missing something here?

Inline the init function: store 11 at the address of the "value" variable,
call printf with the string from r5. This is a bug, should have stored at an
offset of four (str r1, [r4,4]).

Exactly! The IR is correct, the bug seems to be lower down.

I'm no expert in the ARM back-end, though. But your report is detailed
enough to help whoever is. ;)

cheers,
--renato


Thanks for confirming this.

Is there anybody with experience in ARM JIT codegen who could take a look at this?
Or can somebody point me to where the JIT on ARM processes the getelementptr instruction?

Hi.
Sorry for bringing this up again.
I really need to find the source of this bug. Can somebody give me the name or address of someone who knows the ARM implementation in LLVM in more detail, so that I can address my questions to them directly?

Hello


Have you at least tried the ARM JIT on svn head?

Hi, Anton.
I have tried with sources taken from trunk one or two weeks ago, and they show the same behavior.
I will try again now with the latest sources.