struct bitfield regression between 3.6 and 3.9 (using -O0)

Here’s our testcase:

#include <stdio.h>

struct flags {
    unsigned frog: 1;
    unsigned foo : 1;
    unsigned bar : 1;
    unsigned bat : 1;
    unsigned baz : 1;
    unsigned bam : 1;
};

int main() {
    struct flags flags;
    flags.bar = 1;
    flags.foo = 1;
    if (flags.foo == 1) {
        printf("Pass\n");
        return 0;
    } else {
        printf("FAIL\n");
        return 1;
    }
}

When we compile this using LLVM 3.9 we get the “FAIL” message. However, when we compile with LLVM 3.6 it passes. (This is only an issue with -O0; higher levels of optimization work fine.)

After some investigation we discovered the problem. Here’s the relevant part of the assembly generated by LLVM 3.9:

load r0, r510, 24, 8
slr r0, r0, 1, 8
cmpimm r0, r0, 1, 0, 8, SNE
bitop1 r0, r0, 1<<0, AND, 64
jct .LBB0_2, r0, 0, N
jrel .LBB0_1

Notice that the slr (shift logical right) instruction shifts right by 1 position in order to get flags.foo into bit 0 of r0. The problem is that the compare (cmpimm) compares not just that single bit but the whole value in r0 (an 8-bit value) against 1. If we insert a logical AND with ‘1’ to mask r0 just prior to the compare, it works fine.

And as it turns out, the LLVM IR generated using -O0 and -emit-llvm does include the AND:

%bf.lshr = lshr i8 %bf.load4, 1
%bf.clear5 = and i8 %bf.lshr, 1
%bf.cast = zext i8 %bf.clear5 to i32
%cmp = icmp eq i32 %bf.cast, 1
br i1 %cmp, label %if.then, label %if.else

(compiled with: clang -O0 -emit-llvm -S failing.c -o failing.ll )

I reran passing -debug to llc to see what’s happening at various stages of DAG optimization:

clang -O0 -mllvm -debug -S failing.c -o failing.s

The initial selection DAG has the AND op node:

t22: i8 = srl t19, Constant:i64<1>
t23: i8 = and t22, Constant:i8<1>
t24: i32 = zero_extend t23
t27: i1 = setcc t24, Constant:i32<1>, seteq:ch
t29: i1 = xor t27, Constant:i1<-1>
t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The Optimized lowered selection DAG does not contain the AND node, but it does have a truncate, which would seem to stand in for it: the result is only 1 bit wide, and the xor following it operates on 1-bit values:

t22: i8 = srl t19, Constant:i64<1>
t35: i1 = truncate t22
t29: i1 = xor t35, Constant:i1<-1>
t31: ch = brcond t18, t29, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

Next we get to the Type-legalized selection DAG:

t22: i8 = srl t19, Constant:i64<1>
t40: i8 = xor t22, Constant:i8<1>
t31: ch = brcond t18, t40, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The truncate is now gone.

Next we have the Optimized type-legalized DAG:

t22: i8 = srl t19, Constant:i64<1>
t43: i8 = setcc t22, Constant:i8<1>, setne:ch
t31: ch = brcond t18, t43, BasicBlock:ch<if.else 0xa5f8d48>
t33: ch = br t31, BasicBlock:ch<if.then 0xa5f8c98>

The xor has been replaced with a setcc. The legalized selection DAG is essentially the same, as is the optimized legalized selection DAG.

So if t19 contains 0b00000110 then

t22 contains 0b00000011
setcc then compares t22 with a constant 1 and since they’re not equal (setne) it sets bit 0 of t43.

brcond will then test bit 0 of t43 and since it’s set it branches to the else branch (prints FAIL in this case)

If instead t22 contained 0b00000001 (as would be the case if the mask were still there), the setcc would find the two values equal and, since setne is specified, the branch in brcond would not be taken (the correct behavior).

Things seem to have gone wrong when the Type-legalized selection DAG was optimized and the xor node was changed to a setcc (and actually, the xor seems like it was more optimal than the setcc anyway).

Any ideas about why this is happening?

[In 3.6 we don’t see this issue, but then again, in 3.6 the assembly is a bit different: no srl is used to get at the foo field of the struct.]

Phil

I would suggest starting with DAGTypeLegalizer::PromoteIntOp_BRCOND, I think… -Eli

Given that this is compiled with -O0, would there be a way to skip the optimization of the Type-legalized selection DAG? It’s fine until it optimizes the Type-legalized selection DAG into the Optimized Type-legalized selection DAG.

Phil

Umm, I wouldn't really suggest shoving the problem under the rug... I mean, turning off the optimization might make this particular testcase work the way you want it to, but the problem will still be lurking, waiting to be triggered by a different configuration.

There are patches floating around to turn off DAGCombine, and various parts of it, at -O0; you should be able to find past email threads on llvmdev discussing it. IIRC it causes problems for various targets because it exercises different codepaths.

-Eli

Possibly, but this testcase was distilled from some larger libraries
which we found to be failing. We need to be able to compile those
libraries correctly by year end (they need them built with -O0 -g, i.e.
a debug build; otherwise I'd tell them to just compile with -O2, which
works).

Since this is happening after the Type-legalized selection DAG is optimized
and prior to Instruction Selection, I'm thinking this is an upstream LLVM
bug and thus won't be fixed in time for us to get done what we need to get
done. Of course, I could be wrong about this assessment. How might this be
a target-specific bug?

I suspect the best short-term solution is probably to turn off
DAGCombine if -O0 is specified.

Phil

Is this