Here’s our testcase:

#include <stdio.h>

struct flags {
    unsigned frog : 1;
    unsigned foo  : 1;
    unsigned bar  : 1;
    unsigned bat  : 1;
    unsigned baz  : 1;
    unsigned bam  : 1;
};

int main() {
    struct flags flags;
    flags.bar = 1;
    flags.foo = 1;
    if (flags.foo == 1) {
        printf("Pass\n");
        return 0;
    } else {
        printf("FAIL\n");
        return 1;
    }
}
When we compile this with LLVM 3.9 we get the "FAIL" message; with LLVM 3.6 it passes. (This is only an issue at -O0; higher optimization levels work fine.)
After some investigation we found the problem. Here’s the relevant part of the assembly generated by LLVM 3.9:
load r0, r510, 24, 8
slr r0, r0, 1, 8
cmpimm r0, r0, 1, 0, 8, SNE
bitop1 r0, r0, 1<<0, AND, 64
jct .LBB0_2, r0, 0, N
jrel .LBB0_1
Notice that the slr (shift logical right) instruction shifts right by 1 position to bring flags.foo into bit 0 of r0. The problem is that the compare (cmpimm) compares not just that single bit but the whole 8-bit value in r0 against 1. If we insert a logical AND with 1 to mask r0 just before the compare, it works fine.
And as it turns out, the LLVM IR generated with -O0 and -emit-llvm does include the AND:
…
%bf.lshr = lshr i8 %bf.load4, 1
%bf.clear5 = and i8 %bf.lshr, 1
%bf.cast = zext i8 %bf.clear5 to i32
%cmp = icmp eq i32 %bf.cast, 1
br i1 %cmp, label %if.then, label %if.else
(compiled with: clang -O0 -emit-llvm -S failing.c -o failing.ll )
I re-ran, passing -debug through to the backend, to see what’s happening at the various stages of DAG optimization:
clang -O0 -mllvm -debug -S failing.c -o failing.s
The initial selection DAG has the AND op node:
t22: i8 = srl t19, Constant:i64<1>
t23: i8 = and t22, Constant:i8<1>
t24: i32 = zero_extend t23
t27: i1 = setcc t24, Constant:i32<1>, seteq:ch
t29: i1 = xor t27, Constant:i1<-1>
t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>
The optimized lowered selection DAG no longer contains the AND node, but it does have a truncate, which would seem to stand in for it, given that the result is only 1 bit wide and the xor that follows operates on 1-bit values:
t22: i8 = srl t19, Constant:i64<1>
t35: i1 = truncate t22
t29: i1 = xor t35, Constant:i1<-1>
t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>
Next we get to the Type-legalized selection DAG:
t22: i8 = srl t19, Constant:i64<1>
t40: i8 = xor t22, Constant:i8<1>
t31: ch = brcond t18, t40, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>
The truncate is now gone.
Next we have the optimized type-legalized DAG:
t22: i8 = srl t19, Constant:i64<1>
t43: i8 = setcc t22, Constant:i8<1>, setne:ch
t31: ch = brcond t18, t43, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>
The xor has been replaced with a setcc. The legalized selection DAG is essentially the same, as is the optimized legalized selection DAG.
So if t19 contains 0b00000110, then t22 contains 0b00000011.
setcc then compares t22 with the constant 1 and, since they’re not equal (setne), sets bit 0 of t43.
brcond then tests bit 0 of t43 and, since it’s set, branches to the else block (which prints “FAIL” in this case).
If instead t22 contained 0b00000001 (as it would if the mask were still there), setcc would find the two values equal and, since setne is specified, the branch in brcond would not be taken — the correct behavior.
Things seem to have gone wrong when the type-legalized selection DAG was optimized and the xor node was changed to a setcc (and, in fact, the xor looks cheaper than the setcc anyway).
Any ideas about why this is happening?
[In 3.6 we don’t see this issue, but then again, the 3.6 assembly is a bit different: no srl is used to get at the foo field of the struct.]
Phil