ARMLoadStoreOptimizer bug

Evan & llvmdev,

I’m seeing a case where the ARM Load/Store optimizer is breaking code. I have not had any luck coming up with a minimal example; it is breaking in our stage 2 LLVM build.

But here’s what I’m seeing in the debug output:

Before ARMLoadStoreOptimizer:

BB#21: derived from LLVM BB %cond.end

Live Ins: %LR %R0 %R1 %R7 %R10 %R11
Predecessors according to CFG: BB#14 BB#18
STRi12 %R7, %R1, 0, pred:14, pred:%noreg; mem:ST4%first257
%R1 = ADDri %R1, 4, pred:14, pred:%noreg, opt:%CPSR
Bcc <BB#23>, pred:0, pred:%CPSR
B <BB#22>
Successors according to CFG: BB#23 BB#22

After ARMLoadStoreOptimizer:

BB#21: derived from LLVM BB %cond.end

Live Ins: %LR %R0 %R1 %R7 %R10 %R11
Predecessors according to CFG: BB#14 BB#18
%R1 = STR_POST_IMM %R7, %R1, %noreg, 4, pred:14, pred:%noreg
Bcc <BB#23>, pred:0, pred:%CPSR
B <BB#22>
Successors according to CFG: BB#23 BB#22

It appears that the ARM Load/Store optimizer has rolled the ADDri and STRi12 into the STR_POST_IMM, but has ignored the fact that ADDri sets CPSR (which is used by the following Bcc), whereas STR_POST_IMM does not set CPSR.
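A minimal C++ sketch of the kind of guard this implies follows. Everything in it is a hypothetical model written for illustration: the `Inst` struct, `canFoldIntoPostIndex` and `foldIntoPostIndex` are assumed names, not LLVM's MachineInstr API and not the optimizer's real code. The check is deliberately conservative and refuses any flag-setting ADD without looking at liveness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for machine instructions. These are NOT
// LLVM's MachineInstr API; they only mirror fields visible in the dump.
enum class Op { STRi12, ADDri, STR_POST_IMM, Bcc, B };

struct Inst {
  Op op;
  int reg;        // stored register (stores only), -1 otherwise
  int base;       // base register (stores) or source register (ADDri)
  int dst;        // destination register (ADDri) or written-back base, else -1
  int imm;        // offset (STRi12) or increment (ADDri, STR_POST_IMM)
  bool setsCPSR;  // true when the optional flag operand is %CPSR
};

// True if block[i] (STRi12 with offset 0) and block[i + 1] (ADDri updating the
// same base register) can be merged without losing a CPSR definition.
bool canFoldIntoPostIndex(const std::vector<Inst> &block, std::size_t i) {
  if (i + 1 >= block.size()) return false;
  const Inst &st = block[i];
  const Inst &add = block[i + 1];
  if (st.op != Op::STRi12 || st.imm != 0) return false;
  if (add.op != Op::ADDri || add.dst != st.base || add.base != st.base)
    return false;
  // The post-indexed store does not write CPSR, so folding a flag-setting
  // ADD would drop the flags. Conservative: refuse without checking liveness.
  if (add.setsCPSR) return false;
  return true;
}

// Performs the merge in place when it is safe; returns whether it happened.
bool foldIntoPostIndex(std::vector<Inst> &block, std::size_t i) {
  if (!canFoldIntoPostIndex(block, i)) return false;
  const Inst st = block[i];       // copy before mutating the vector
  const Inst add = block[i + 1];
  Inst merged = {Op::STR_POST_IMM, st.reg, st.base, st.base, add.imm, false};
  block[i] = merged;
  block.erase(block.begin() + static_cast<std::ptrdiff_t>(i) + 1);
  assert(block[i].op == Op::STR_POST_IMM);
  return true;
}
```

With the BB#21 sequence encoded this way (STRi12 at offset 0, then an ADDri of 4 on %R1 with `setsCPSR` set), `foldIntoPostIndex` leaves the block untouched; the same sequence with a non-flag-setting ADDri is merged.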

  • pdox

Hello David,

I'm seeing a case where ARM Load/Store optimizer is breaking code. I have
not had any luck trying to come up with a minimal example; it is breaking in
our stage 2 LLVM build.

Still, *any* testcase is better than no testcase :)

Anton,

I’m afraid I really can’t produce a meaningful example. The bug is extremely sensitive to code placement and optimization. I had to do a terrible amount of drudgery to find it in the first place.

Here’s how I found the bug:

  1. Stage 1: Compile LLVM with build/host x86, target ARM.
  2. Stage 2: Cross-compile LLVM with host ARM, target ARM, using the stage 1 Clang/LLVM.
  3. Use the stage 2 LLVM (in an ARM emulator) to compile an application, Foo.
  4. Run Foo.
  5. Foo malfunctions.

To find the bug, I had to do the following:

  1. Notice that Foo does not malfunction when compiled using the stage 1 LLVM.

  2. Diff between Foo.s generated by stage 1 LLVM and stage 2 LLVM. This made it obvious which instructions in Foo.s were causing the malfunction.

  3. Determine why the stage 2 LLVM produced those instructions, instead of the correct ones.

  4. Find the instructions (in the LLVM stage 2 executable) which are causing the incorrect behavior.

  5. Determine why the stage 1 LLVM emitted these instructions, instead of the correct ones.

My search ended at the ARM Load/Store rewrite pass. By scanning the debug output by hand, I determined that the bad code appeared during this pass. After I disabled this pass, all bugs in the stage 2 build went away.

Although I can’t produce a reasonable example, I think the debug output I gave above makes it clear what is wrong. I did a quick review of the code in the ARM Load/Store pass, and it never seems to take into account whether CPSR is live after the ADDri. Please let me know if I’m mistaken in this assessment.
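The liveness question can be illustrated with a small forward scan over a basic block. This is only a sketch under assumptions: `FlagEffect` and `isCPSRDefLive` are hypothetical names, not LLVM data structures, and it does not describe how the pass itself is implemented. Whether CPSR is live out of the block is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instruction summary; not an LLVM data structure.
struct FlagEffect {
  bool readsCPSR;   // e.g. a conditional branch such as Bcc
  bool writesCPSR;  // e.g. an ADDri whose optional def is %CPSR
};

// Returns true if the CPSR value written by block[def] may still be read:
// by a later instruction before CPSR is overwritten, or by a successor block
// (modelled by the caller-supplied cpsrLiveOut flag).
bool isCPSRDefLive(const std::vector<FlagEffect> &block, std::size_t def,
                   bool cpsrLiveOut) {
  assert(def < block.size() && block[def].writesCPSR);
  for (std::size_t i = def + 1; i < block.size(); ++i) {
    // Check the read first: an instruction that both reads and writes CPSR
    // still consumes the earlier value.
    if (block[i].readsCPSR) return true;
    if (block[i].writesCPSR) return false;  // killed by a later definition
  }
  return cpsrLiveOut;
}
```

For BB#21 above, the ADDri writes CPSR and the following Bcc reads it, so the scan reports the definition as live and the ADDri cannot simply be folded away.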

Thanks,

  • pdox

David,

I'm afraid I really can't produce a meaningful example. The bug is extremely
sensitive to code placement and optimization. I had to do a terrible amount of
drudgery to find it in the first place.

Been there, done that :)

6) Determine why the stage 1 LLVM emitted these instructions, instead of the
correct ones.

Aha, ok... perfectly makes sense to me!

I've committed a fix: r149970. Please try it. I would really appreciate it if you can provide us with a test case (unreduced test case is fine).

Evan

Evan,

A test case is extremely hard to pin down. For months now, we’ve noticed that our stage 2 LLVM ARM build has sporadic failures. Tests would start failing, then start working, then start failing again, for no apparent reason.

The test case I have (llc.bc, which is all of llc in bitcode form, 44.8 MB) only works against r149814. Even then, the miscompile occurs only twice in the 93 MB .s file. I cannot reproduce the failure in ToT LLVM using the same bitcode file, or any other bitcode I have. All it takes is a slight code change for the schedule to change completely, and suddenly the problematic instruction sequence no longer exists.

And even if I give you this bitcode file, you won’t be able to run it to evaluate its functionality, because it targets OS=NativeClient (and uses a private syscall interface that only works inside the NaCl environment in a particular context).

Do any public LLVM buildbots (internal or external) do a full three-stage ARM build? We do a two-stage build, followed by rebuilding our entire system/universe.

  • pdox

Evan,

Thanks for the fix. I will apply it to our tree in a day or two, and let you know how it goes. It may take a couple of weeks before I can declare a full victory (which would be that all the sporadic flakiness has gone away).

Thanks!

  • pdox