Incorrect loop optimization when building the Linux kernel

I was trying to build the Linux kernel with clang and observed a crash due to incorrect loop optimization:

drivers/base/firmware_class.c

extern struct builtin_fw __start_builtin_fw[];
extern struct builtin_fw __end_builtin_fw[];

static bool fw_get_builtin_firmware(struct firmware *fw, const char *name)
{
  struct builtin_fw *b_fw;
  for (b_fw = __start_builtin_fw; b_fw != __end_builtin_fw; b_fw++) {
    if (strcmp(name, b_fw->name) == 0) {
      fw->size = b_fw->size;
      fw->data = b_fw->data;
      return true;
    }
  }
  return false;
}

When compiled with -O0, the comparison (b_fw != __end_builtin_fw) is executed before loop body, which is the expected behavior.

But with -O1 optimization (opt -O1), the comparison is moved to the end of the loop, which causes the strcmp to use incorrect argument and finally crashes the kernel.

define internal fastcc i1 @fw_get_builtin_firmware(%struct.firmware* nocapture %fw, i8* nocapture readonly %name) #0 {
  tail call void @llvm.dbg.value(metadata !{%struct.firmware* %fw}, i64 0, metadata !3985, metadata !3750), !dbg !3986
  tail call void @llvm.dbg.value(metadata !{i8* %name}, i64 0, metadata !3987, metadata !3750), !dbg !3988
  tail call void @llvm.dbg.value(metadata !3839, i64 0, metadata !3989, metadata !3750), !dbg !3990
  br label %1, !dbg !3991

; <label>:1 ; preds = %13, %0
  %b_fw.02 = phi %struct.builtin_fw* [ getelementptr inbounds ([0 x %struct.builtin_fw]* @__start_builtin_fw, i64 0, i64 0), %0 ], [ %14, %13 ]
  %2 = getelementptr inbounds %struct.builtin_fw* %b_fw.02, i64 0, i32 0, !dbg !3995
  %3 = load i8** %2, align 8, !dbg !3995
  %4 = tail call i32 @strcmp(i8* %name, i8* %3) #8, !dbg !3995
  %5 = icmp eq i32 %4, 0, !dbg !3995
  br i1 %5, label %6, label %13, !dbg !3995

; <label>:6 ; preds = %1
  %b_fw.02.lcssa = phi %struct.builtin_fw* [ %b_fw.02, %1 ]
  %7 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 2, !dbg !3999
  %8 = load i64* %7, align 8, !dbg !3999
  %9 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 0, !dbg !3999
  store i64 %8, i64* %9, align 8, !dbg !3999
  %10 = getelementptr inbounds %struct.builtin_fw* %b_fw.02.lcssa, i64 0, i32 1, !dbg !4001
  %11 = load i8** %10, align 8, !dbg !4001
  %12 = getelementptr inbounds %struct.firmware* %fw, i64 0, i32 1, !dbg !4001
  store i8* %11, i8** %12, align 8, !dbg !4001
  br label %.loopexit, !dbg !4002

; <label>:13 ; preds = %1
  %14 = getelementptr %struct.builtin_fw* %b_fw.02, i64 1, !dbg !400
  tail call void @llvm.dbg.value(metadata !{%struct.builtin_fw* %14}, i64 0, metadata !3989, metadata !3750), !dbg !3990
  %15 = icmp eq %struct.builtin_fw* %14, getelementptr inbounds ([0 x %struct.builtin_fw]* @__end_builtin_fw, i64 0, i64 0), !dbg !3991
  br i1 %15, label %.loopexit.loopexit, label %1, !dbg !3991

.loopexit.loopexit: ; preds = %13
  br label %.loopexit

.loopexit: ; preds = %.loopexit.loopexit, %6
  %.0 = phi i1 [ true, %6 ], [ false, %.loopexit.loopexit ]
  ret i1 %.0, !dbg !4004
}

Could anyone explain why this is unhappening and how to avoid it.

Thanks,
Chengyu

Could anyone explain why this is unhappening and how to avoid it.

It's difficult to say without a full example, but I'm very suspicious
of those global declarations. I think the compiler would be entirely
justified in assuming you could *never* get from __start_builtin_fw to
__end_builtin_fw, let alone on the first iteration: they're distinct
array objects and by definition (within C99) can't overlap.

Trouble is, any perturbation of that code would change things so
drastically that it's going to be difficult to prove that's the cause
without digging into the exact LLVM pass that does the transformation
to see what it's basing the decision on.

Cheers.

Tim.

LLVM seems to assume that two non-weak declarations can never point to
the same thing (so "__start_builtin_fw == __end_builtin_fw" is always
false). In fact, foo in the following program constant folds to
return false (with a warning from clang):

struct F {};
extern struct F a[];
extern struct F b[];

int foo() {
  return a == b;
}

I don't know enough about the difference between different linkage
types to say if llvm's behavior is reasonable or not.

One place where this property is exploited (there are probably others):

(llvm/lib/IR/ConstantFold.cpp)

static ICmpInst::Predicate areGlobalsPotentiallyEqual(const GlobalValue *GV1,
                                                      const GlobalValue *GV2) {
  // Don't try to decide equality of aliases.
  if (!isa<GlobalAlias>(GV1) && !isa<GlobalAlias>(GV2))
    if (!GV1->hasExternalWeakLinkage() || !GV2->hasExternalWeakLinkage())
      return ICmpInst::ICMP_NE;
  return ICmpInst::BAD_ICMP_PREDICATE;
}

It's difficult to say without a full example, but I'm very suspicious
of those global declarations. I think the compiler would be entirely
justified in assuming you could *never* get from __start_builtin_fw to
__end_builtin_fw, let alone on the first iteration: they're distinct
array objects and by definition (within C99) can't overlap.

I think this should be the root cause. Once I changed the declaration to pointers (instead of arrays):

extern struct builtin_fw* __start_builtin_fw;
extern struct builtin_fw* __end_builtin_fw;

The generated code will not skip the first comparison.

Sadly, Linux kernel may contain more such declarations.

Thanks a lot!
Chengyu

Chengyu Song wrote:

It's difficult to say without a full example, but I'm very suspicious
of those global declarations. I think the compiler would be entirely
justified in assuming you could *never* get from __start_builtin_fw to
__end_builtin_fw, let alone on the first iteration: they're distinct
array objects and by definition (within C99) can't overlap.

I think this should be the root cause. Once I changed the declaration to pointers (instead of arrays):

extern struct builtin_fw* __start_builtin_fw;
extern struct builtin_fw* __end_builtin_fw;

The generated code will not skip the first comparison.

Sadly, Linux kernel may contain more such declarations.

That's a great point. Maybe clang could issue a warning on these, any for loop that declares a pointer to the address of a non-weak object, compares it to a different object, and increments only the pointer (without doing other changes to the pointer inside the loop body). We know that's a buggy pattern. Could you file a feature request for this on llvm.org/bugs ?

Nick

Nick, consider:

char a[0];
char b[0];

at run-time:

(gdb) p &a
$1 = (char ()[]) 0x60103c
(gdb) p &b
$2 = (char (
)[]) 0x60103c

Even if this is not safe at the C or C++ level, this comparison could return equal or not equal depending on what the linker chooses to do.

Do we have a bug in the constant folder?

It is invalid C as is the original code. If you want to play such linker
games, mark one of the objects involved as weak.

Joerg

Nick, consider:

char a[0];
char b[0];

at run-time:
(gdb) p &a
$1 = (char ()[]) 0x60103c
(gdb) p &b
$2 = (char (
)[]) 0x60103c

Even if this is not safe at the C or C++ level, this comparison could
return equal or not equal depending on what the linker chooses to do.

It is invalid C as is the original code. If you want to play such linker
games, mark one of the objects involved as weak.

Right, but nothing forbids this comparison at the IR level.

How is detecting the "buggy" pattern helpful to the user? The linker has
provided them with the interface of __start_my_section and __end_my_section
symbols, and they are using it in the most obvious way possible. I have
personally written this pattern before. I can't think of any other way to
use this interface.

What will the user do if the compiler tells them their code is wrong and
optimizes it away? They will probably just mutilate the code until the
compiler fails to warn and optimize on this case. We need something better.

We need a way to express iteration over the concatenated contents of a
section in LLVM IR that has a logical lowering from C. We should either
adopt the GNU binutils convention of special __start_* / __end_* symbols,
or invent a new representation that is compelling.

Joerg suggested adding 'weak' as a workaround, but it doesn't seem
particularly compelling to me:

extern void *__start_my_ptr_section[];
extern void *__end_my_ptr_section[] __attribute__((weak)); // I'm assuming
Joerg implied the attribute goes here...
void iterate_my_section() {
  for (void **i = __start_my_ptr_section; i != __end_my_ptr_section; i++) {
    use_ptr(*i);
  }
}

LLVM has had the concept of "appending" globals for a very long time, but
it doesn't have an actual object file lowering. It also provides no
mechanism for iteration, because it is impossible to determine the ending
point of the concatenated array after linking.

I’m pretty sure this is fixed in r223684. This particular use of zero-sized arrays should defeat any chance of compile-time address equality.

However, this commit does not close any open questions about the validity of the source.

I object that change. It's a horrible special case hack for either a
fundamental issue in the IR or plain UB on the source level. As such, it
should be reverted.

Joerg

The problem is that the interface depends on aliasing symbols. The
interface should provide pointers to the first and the last (or
the one-beyond-last) element. That would be semantically fine for C.

That can also be expressed already e.g. like

extern char __start_my_ptr_section, __end_my_ptr_section;
void *__start = &__start_my_ptr_section, *end =
&__end_my_ptr_section;

and "only" depending on the inability to compute the pointer value at
compile time.

Joerg

From: "Joerg Sonnenberger" <joerg@britannica.bec.de>
To: "David Majnemer" <david.majnemer@gmail.com>
Cc: "Chengyu Song" <csong84@gatech.edu>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Monday, December 8, 2014 2:22:47 PM
Subject: Re: [LLVMdev] Incorrect loop optimization when building the Linux kernel

> I'm pretty sure this is fixed in r223684. This particular use of
> zero-sized arrays should defeat any chance of compile-time address
> equality.

I object that change. It's a horrible special case hack for either a
fundamental issue in the IR or plain UB on the source level.

I don't understand why you feel this way. It is a special case, yes, but zero-sized types often induce special cases. The fact that we need to apply this to opaque types is somewhat unfortunate, but regardless, the fact that a zero-sized object can be assigned an address overlapping with another object just seems like a fact of which the constant folder needs to be aware.

-Hal

> It's difficult to say without a full example, but I'm very suspicious
> of those global declarations. I think the compiler would be entirely
> justified in assuming you could *never* get from __start_builtin_fw to
> __end_builtin_fw, let alone on the first iteration: they're distinct
> array objects and by definition (within C99) can't overlap.

I think this should be the root cause. Once I changed the declaration to
pointers (instead of arrays):

extern struct builtin_fw* __start_builtin_fw;
extern struct builtin_fw* __end_builtin_fw;

The generated code will not skip the first comparison.

I don't think this code means the same thing. AFAIK this code says that
there is a pointer (to struct builtin_fw) in memory at __start_builtin_fw,
whereas the original code says that there is a `struct builtin_fw` (or zero
of them) in memory at __start_builtin_fw.

-- Sean Silva

There is a difference between zero-sized types and objects. For
zero-sized types, address operations are generally useless. There is no
object identity of any meaningful way. Zero-sized arrays share the same
property and that's why many languages declare them as invalid.

So back to the point -- either IR has to explicitly change to attach
semantics beyond C's "Don't do that" to zero-sized arrays and other
object OR the commit is plainly wrong.

Joerg