Evan & llvmdev,
I’m seeing a case where the ARM Load/Store optimizer is breaking code. I have not had any luck coming up with a minimal example; it is breaking in our stage 2 LLVM build.
But here’s what I’m seeing in the debug output:
Before ARMLoadStoreOptimizer:
BB#21: derived from LLVM BB %cond.end
Live Ins: %LR %R0 %R1 %R7 %R10 %R11
Predecessors according to CFG: BB#14 BB#18
STRi12 %R7, %R1, 0, pred:14, pred:%noreg; mem:ST4%first257
%R1 = ADDri %R1, 4, pred:14, pred:%noreg, opt:%CPSR
Bcc <BB#23>, pred:0, pred:%CPSR
B <BB#22>
Successors according to CFG: BB#23 BB#22
After ARMLoadStoreOptimizer:
BB#21: derived from LLVM BB %cond.end
Live Ins: %LR %R0 %R1 %R7 %R10 %R11
Predecessors according to CFG: BB#14 BB#18
%R1 = STR_POST_IMM %R7, %R1, %noreg, 4, pred:14, pred:%noreg
Bcc <BB#23>, pred:0, pred:%CPSR
B <BB#22>
Successors according to CFG: BB#23 BB#22
It appears that the ARM Load/Store optimizer has rolled the ADDri and STRi12 into the STR_POST_IMM, but has ignored the fact that ADDri sets CPSR (which is used by the following Bcc), whereas STR_POST_IMM does not set CPSR.
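To make the suspected missing check concrete, here is a toy model in Python. It is not LLVM code: the `Inst` record and the helpers `reg_live_after` and `can_fold_add_into_post_store` are hypothetical names invented for illustration. The sketch assumes the fold is only safe when the ADD's CPSR definition is dead, meaning nothing later in the block (or in a successor) reads CPSR before it is redefined.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    """Toy stand-in for a machine instruction (hypothetical, not an LLVM type)."""
    opcode: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()


def reg_live_after(block, index, reg, live_out=frozenset()):
    """Return True if `reg` is read after block[index] before being redefined."""
    for inst in block[index + 1:]:
        if reg in inst.uses:
            return True
        if reg in inst.defs:
            return False
    # Not touched again in this block: live only if a successor needs it.
    return reg in live_out


def can_fold_add_into_post_store(block, add_index, live_out=frozenset()):
    """Allow the fold only if the ADD's CPSR definition (if any) is dead."""
    add = block[add_index]
    if "CPSR" not in add.defs:
        return True
    return not reg_live_after(block, add_index, "CPSR", live_out)


# Block modeled on BB#21 from the dump above.
bb21 = [
    Inst("STRi12", uses=frozenset({"R7", "R1"})),
    Inst("ADDri", defs=frozenset({"R1", "CPSR"}), uses=frozenset({"R1"})),
    Inst("Bcc", uses=frozenset({"CPSR"})),
    Inst("B"),
]

print(can_fold_add_into_post_store(bb21, 1))  # False: Bcc still reads CPSR
```

In this model the fold in BB#21 is rejected because the Bcc reads the CPSR value that the ADDri produces.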
Hello David,
> I'm seeing a case where the ARM Load/Store optimizer is breaking code. I
> have not had any luck coming up with a minimal example; it is breaking in
> our stage 2 LLVM build.
Still, *any* testcase is better than no testcase 
Anton,
I’m afraid I really can’t produce a meaningful example. The bug is extremely sensitive to code placement and optimization. I had to do a terrible amount of drudgery to find it in the first place.
Here’s how I found the bug:
- Stage 1: Compile LLVM with build/host x86, target ARM.
- Stage 2: Cross-compile LLVM with host ARM, target ARM, using the stage 1 Clang/LLVM.
- Use the stage 2 LLVM (in an ARM emulator) to compile an application, Foo.
- Run Foo.
- Foo malfunctions.
To find the bug, I had to do the following:
- Notice that Foo does not malfunction when compiled using the stage 1 LLVM.
- Diff between Foo.s generated by stage 1 LLVM and stage 2 LLVM. This made it obvious which instructions in Foo.s were causing the malfunction.
- Determine why the stage 2 LLVM produced those instructions, instead of the correct ones.
- Find the instructions (in the LLVM stage 2 executable) which are causing the incorrect behavior.
- Determine why the stage 1 LLVM emitted these instructions, instead of the correct ones.
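The diffing step above can be sketched with Python's standard `difflib`. This is only an illustration: the function name `differing_hunks` and the two sample assembly snippets are invented here, not taken from the real Foo.s, and for files in the tens of megabytes a dedicated diff tool is probably the better choice.

```python
import difflib


def differing_hunks(stage1_asm, stage2_asm, context=2):
    """Return unified-diff lines between two assembly listings."""
    return list(
        difflib.unified_diff(
            stage1_asm.splitlines(),
            stage2_asm.splitlines(),
            fromfile="stage1/Foo.s",  # labels only; nothing is read from disk
            tofile="stage2/Foo.s",
            n=context,
            lineterm="",
        )
    )


# Invented sample listings, for illustration only.
STAGE1_ASM = "str r7, [r1]\nadds r1, r1, #4\nbeq .LBB0_23\n"
STAGE2_ASM = "str r7, [r1], #4\nbeq .LBB0_23\n"

for line in differing_hunks(STAGE1_ASM, STAGE2_ASM):
    print(line)
```

Lines starting with `-` appear only in the stage 1 output and lines starting with `+` only in the stage 2 output, which narrows the search to the instructions that differ.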
My search ended at the ARM Load/Store rewrite pass. By scanning the debug output by hand, I determined that the bad code appeared during this pass. After I disabled this pass, all bugs in the stage 2 build went away.
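Scanning the dump by hand could in principle be automated. The sketch below is a rough Python filter over one basic block of dump text. It assumes, based only on the dump quoted earlier in this thread, that `opt:%CPSR` marks an instruction that defines CPSR and `pred:%CPSR` marks one that reads it; other spellings that real dumps may use are not handled. The function name `undefined_cpsr_uses` is made up for this example.

```python
def undefined_cpsr_uses(block_text):
    """Return instruction lines that read CPSR with no earlier def in the
    block and no %CPSR entry on the block's 'Live Ins:' line."""
    cpsr_available = False
    suspects = []
    for raw in block_text.splitlines():
        line = raw.strip()
        if line.startswith("Live Ins:"):
            cpsr_available = "%CPSR" in line.split()
            continue
        # Check the read first: an instruction reads its inputs before writing.
        if "pred:%CPSR" in line and not cpsr_available:
            suspects.append(line)
        if "opt:%CPSR" in line:
            cpsr_available = True
    return suspects


# The two versions of BB#21 quoted earlier in the thread.
BEFORE = """\
Live Ins: %LR %R0 %R1 %R7 %R10 %R11
STRi12 %R7, %R1, 0, pred:14, pred:%noreg; mem:ST4%first257
%R1 = ADDri %R1, 4, pred:14, pred:%noreg, opt:%CPSR
Bcc <BB#23>, pred:0, pred:%CPSR
B <BB#22>
"""

AFTER = """\
Live Ins: %LR %R0 %R1 %R7 %R10 %R11
%R1 = STR_POST_IMM %R7, %R1, %noreg, 4, pred:14, pred:%noreg
Bcc <BB#23>, pred:0, pred:%CPSR
B <BB#22>
"""

print(undefined_cpsr_uses(BEFORE))
print(undefined_cpsr_uses(AFTER))
```

Under these assumptions the "before" block produces no suspects, while the "after" block flags the Bcc, since nothing in the block defines CPSR and CPSR is not a live-in.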
Although I can’t produce a reasonable example, I think it should be obvious from the debug output I gave above what is wrong. I did a quick review of the code in the ARM Load/Store pass, and it seems to never take into account the liveness state of CPSR after ADDri. Please let me know if I’m mistaken in this assessment.
Thanks,
David,
> I'm afraid I really can't produce a meaningful example. The bug is extremely
> sensitive to code placement and optimization. I had to do a terrible amount
> of drudgery to find it in the first place.
Been there, done that 
> 6) Determine why the stage 1 LLVM emitted these instructions, instead of the
> correct ones.
Aha, ok... perfectly makes sense to me!
I've committed a fix: r149970. Please try it. I would really appreciate it if you could provide us with a test case (an unreduced test case is fine).
Evan
Evan,
A test case is extremely hard to pin down. For months now, we’ve noticed that our stage 2 LLVM ARM build has sporadic failures. Tests would start failing, then start working, then start failing again, etc., for no apparent reason.
The test case I have (llc.bc, which is all of llc in bitcode form, 44.8 MB) only works against r149814. And in this case, the miscompile occurs only 2 times in the 93 MB .s file. I cannot reproduce the failure in ToT LLVM using the same bitcode file, or any other bitcode I have. All it takes is a slight code change for the schedule to completely change, and suddenly the problematic instruction sequence no longer exists.
And even if I give you this bitcode file, you won’t be able to run it to evaluate its functionality, because this bitcode file is targeting OS=NativeClient (and it uses a private syscall interface that only works inside the NaCl environment in a particular context).
Do any public LLVM buildbots (internal or external) do a full three-stage ARM build? We do a two stage build, followed by rebuilding our entire system/universe.
Evan,
Thanks for the fix. I will apply it to our tree in a day or two, and let you know how it goes. It may take a couple of weeks before I can declare a full victory (which would be that all the sporadic flakiness has gone away).
Thanks!