Arm generation of 32-bit load following 8-bit store to stack at -O0

I observe what appears (to me) to be incorrect code generation when compiling the following test case for Arm Cortex-M0+ at -O0:

char myArray[6];

void set_byte(char value) {
    myArray[0] = value;
}
I observe that an 8-bit store (STRB) is correctly used to store the value on the stack (PC 0x4 below), but a 32-bit load (LDR) is used to load the value back from the same location (PC 0x6 below):

       0: 81 b0         sub     sp, #4
       2: 69 46         mov     r1, sp
       4: 08 70         strb    r0, [r1]
       6: 00 98         ldr     r0, [sp]
       8: 01 49         ldr     r1, [pc, #4]
       a: 08 70         strb    r0, [r1]
       c: 01 b0         add     sp, #4
       e: 70 47         bx      lr

Since the 32-bit load is mismatched with the 8-bit store, the upper bytes of the loaded word were never actually written. On devices with parity-checked RAM, reading them can yield an invalid parity condition and a hard fault.

I’d like to figure out why the compiler thinks this is OK to do. I can’t find any outstanding defects pertaining to this.


-Alan Phipps

As far as I know, LLVM just doesn’t model memory that may need to be written before it’s read, and that’s probably OK.

In this particular example, using a 32-bit load saves an extra mov, since there’s no SP-relative 8-bit load on a Cortex-M0+. Not that you necessarily care about that at -O0, but in general it doesn’t take many examples like that to add up to a bigger cost than an early reset loop that zeroes RAM.
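For concreteness, the two alternatives on Thumb1 look roughly like this (a hand-written sketch, not actual compiler output):

```asm
    @ With the extload-to-word pattern: one SP-relative word load
    ldr   r0, [sp]        @ loads 4 bytes, though only the low byte was written

    @ Without it: Thumb1 LDRB has no SP-relative addressing mode, so the
    @ base address must first be materialized in a low register
    mov   r1, sp
    ldrb  r0, [r1]        @ loads exactly the byte that was stored
```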

Thanks for your response! It looks like a commit was added in 2015 to improve stack accesses in Thumb1 for precisely the reason you state (commit b9887ef32a5d06108dfabbbe181bd8e4ea7abbfe):

+// extload from the stack -> word load from the stack, as it avoids having to
+// materialize the base in a separate register. This only works when a word
+// load puts the byte/halfword value in the same place in the register that the
+// byte/halfword load would, i.e. when little-endian.
+def : T1Pat<(extloadi1  t_addrmode_sp:$addr), (tLDRspi t_addrmode_sp:$addr)>,
+      Requires<[IsThumb, IsThumb1Only, IsLE]>;
+def : T1Pat<(extloadi8  t_addrmode_sp:$addr), (tLDRspi t_addrmode_sp:$addr)>,
+      Requires<[IsThumb, IsThumb1Only, IsLE]>;
+def : T1Pat<(extloadi16 t_addrmode_sp:$addr), (tLDRspi t_addrmode_sp:$addr)>,
+      Requires<[IsThumb, IsThumb1Only, IsLE]>;

Removing these lines yields the expected behavior (STRB followed by LDRB).

I observed that GCC does not make the same improvement. Yes, the improvement makes sense, but, as I mentioned, reading uninitialized memory can be a problem on devices where parity checking is done.


In general, LLVM doesn’t guarantee it won’t generate loads from uninitialized memory. (gcc is similar, as far as I know.) They can happen for a variety of reasons, including padding in variables of struct type and load hoisting/sinking optimizations. Hacking the compiler so it never implicitly introduces loads like that might be possible, but it’s probably a lot of work. And you’d still have issues with user code that explicitly contains such loads.

If you want to be safe, just initialize your stack and heap. This isn’t worth chasing.
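As a rough sketch of what that initialization can look like, here is a word-granularity zero loop; the function name `zero_ram` and the idea of passing in linker-script bounds are illustrative, not code from any particular runtime:

```c
#include <stdint.h>

/* Sketch: zero a RAM region word-by-word before it is otherwise used,
 * so that any later load wider than a preceding store still reads
 * memory whose parity/ECC bits are valid.  In real firmware this would
 * run from the reset handler before main(), with start/end taken from
 * linker-script symbols (the names here are purely illustrative). */
static void zero_ram(volatile uint32_t *start, volatile uint32_t *end)
{
    while (start < end)
        *start++ = 0u;   /* word-wide stores establish valid parity/ECC */
}
```

On a real part this loop must cover every RAM region the program (or the compiler, as in the LDR example above) may read, including the stack and heap.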
