interesting possible compiler bug

This code is essentially from an LTP test ( http://ltp.sourceforge.net/ ).

#include <stdlib.h>

int main() {
void *curr;

do {
curr = malloc(1);
} while (curr);

return 0;

}

If you compile it with no optimization, it will keep the malloc calls.

If you compile it with -O2, it will create an infinite loop, i.e. assuming that malloc always returns a non zero result and the result is not used.

~/llvmpb/install/bin/clang loop.c -O2 -S

.file “loop.c”
.text
.globl main
.align 16, 0x90
.type main,@function
main: # @main
.cfi_startproc

BB#0: # %entry

.align 16, 0x90
.LBB0_1: # %do.body

=>This Inner Loop Header: Depth=1

jmp .LBB0_1
.Ltmp0:
.size main, .Ltmp0-main
.cfi_endproc

.section “.note.GNU-stack”,“”,@progbits

This code is essentially from an LTP test ( http://ltp.sourceforge.net/ ).

#include <stdlib.h>

int main() {
void *curr;

do {
curr = malloc(1);
} while (curr);

return 0;

}

If you compile it with no optimization, it will keep the malloc calls.

If you compile it with -O2, it will create an infinite loop, i.e. assuming that malloc always returns a non zero result and the result is not used.

As far as I know, this optimization is legal. Fix the test with a volatile pointer:

int main() {
volatile char *curr;

do {
curr = malloc(1);
int i = *curr;
(void)i;
} while (curr);

return 0;
}

which produces:

pushq %rax

.Ltmp1:

movl $1, %edi

callq malloc
movb (%rax), %cl
testq %rax, %rax
jne .LBB0_1

BB#2: # %do.end

xorl %eax, %eax
popq %rdx
ret

Oh how people don’t appreciate the luxury of having an infinite memory machine!

Nick

As far as I know, this optimization is legal. Fix the test with a volatile pointer:

Why would that be required? malloc() is defined by the standard to return a pointer that is distinct from any other valid pointer, or NULL. Any optimisation that makes any assumptions about its next value is invalid.

int main() {
  volatile char *curr;

  do {
    curr = malloc(1);
    int i = *curr;

This, in particular, looks very wrong. If curr is void, then you are dereferencing an invalid pointer, and so you are doing something undefined. In fact, this version of the code is completely free to elide the conditional loop, because by dereferencing the pointer you are asserting that it is not NULL (or, at least, that if it is then after this point the program is in an undefined state and so any behaviour is legal) and so it is completely free to generate the code that it in fact does generate without this test. So here we have another bug, because the testq in your output is redundant after the movb.

David

David Chisnall wrote:

As far as I know, this optimization is legal. Fix the test with a volatile pointer:

Why would that be required?

It isn't. My suggestion for fixing the test is to make use of the returned pointer in a fashion that the compiler is forbidden to elide it: make it an observable side-effect using volatile. Another way would be to take the pointer and print it to file or stdout. Perhaps you can think of others.

   malloc() is defined by the standard to return a pointer that is distinct from any other valid pointer, or NULL. Any optimisation that makes any assumptions about its next value is invalid.

Nowhere do we make assumptions about malloc's next value.

This is a straight-forward application of the as-if rule. The malloc call behaves as-if it allocated memory. Because we prove that the code doesn't use that memory, we can get away with allocating no memory at all and not change the behaviour of the program.

But we did change the behaviour of the program, didn't we? Well, we haven't changed the behaviour of the program in any way that is observable through a well-defined mechanism. Crashes, running out of memory, or asking the kernel for your processes' memory usage isn't behaviour you get to rely on. The first two really aren't defined in the language, and the last one goes through I/O which is permitted to do its own thing. (For instance, we don't forbid constant-folding "1+1" because a program may fopen and disassemble itself to look for the "add 1, 1" instruction.)

int main() {
   volatile char *curr;

   do {
     curr = malloc(1);
     int i = *curr;

This, in particular, looks very wrong. If curr is void, then you are dereferencing an invalid pointer, and so you are doing something undefined.

Do you mean, if curr is NULL? It's a char*, not void*.

In fact, this version of the code is completely free to elide the conditional loop, because by dereferencing the pointer you are asserting that it is not NULL (or, at least, that if it is then after this point the program is in an undefined state and so any behaviour is legal) and so it is completely free to generate the code that it in fact does generate without this test. So here we have another bug, because the testq in your output is redundant after the movb.

Yes, good point, I totally missed the NULL dereference. I haven't checked what happens if you write:

   curr = malloc(1);
   if (curr)
     int i = *curr;

but I expect that would work.

Nick

I've sent this issue to some friends on the C++ committee.

The first response I received indicates that the person thinks the optimizer is within it's rights to optimize away the call to malloc.

Here are some points, extracted from the response:

"There is no observable behavior (a technical term in the standard) that violates requirements of the standard. In particular, malloc in the abstract machine defined by the standard need not ever fail."

"On linux in particular, malloc (almost) never fails, because Linux does not actually make malloc'd memory available until it is used. Here the allocated memory is never used, so if the compiler recognizes malloc as a standard library function with well-defined semantics, it can eliminate the actual call and pretend it succeeded.

Since variable curr is not visible outside main and is not declared volatile, it can be optimized away."

See also this proposal for the next C++ committee meeting, which aims to clarify this case and others like it:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3433.html

My friend sent me the first days worth of discussion on this topic from the C committee discussion list and it was very long with many contradictory opinions and citing evidence from a variety of sources.

I don't feel comfortable posting the discussion because it was not from a public forum.

But I can say there was far from any consensus regarding this issue I originally raised.

    This code is essentially from an LTP test (

    > http://ltp.sourceforge.net/ ).

    > #include <stdlib.h>
    > int main() { void *curr;
    > do { curr = malloc(1); } while (curr);
    > return 0;
    > }

    > If you compile it with no optimization, it will keep the
    > malloc calls.

    > If you compile it with -O2, it will create an infinite
    > loop, i.e. assuming that malloc always returns a non zero
    > result and the result is not used.

    > As far as I know, this optimization is legal.

Well, indeed the result is used by the while and we have a control
dependence on the future execution. The termination or not of the
program is a side effect from the user perspective.

   [...]

    > Oh how people don't appreciate the luxury of having an
    > infinite memory machine!

You have also a potentially infinite loop, which is another kind of
luxury. More boring, especially on the end. :slight_smile:

Even if you have a lazy allocation as with the default libC on Linux
that allocates the memory pages only on demand, the memory will blow
up just because the malloc() library itself has to track the allocations
for a potential future free(). Of course, if you have an infinite
memory, you can avoid implementing free() or your interprocedural
analysis may prove you do not call free() in your program. :slight_smile:

This makes me feel uncomfortable to have this kind of optimizations with
too many hypothesis on the run-time, even if it addresses domains not
well captured by the norm...

How to know about the real target? It may be another allocation library,
some LD_PRELOAD hack to insert memory verification code or at the end
the code is to be run inside a tool like valgrind to track this very
malloc() bug that would be hidden by this optimization... :frowning:

Just to give an example from another compiler, in the PIPS
source-to-source compiler the side effect of malloc() is taken into
account by registering a write effect on an abstract location which is
tested later by some optimization phases to avoid too much liberal
optimizations, like the one above.

Anyway, this is an interesting limit case to discuss tonight at the LLVM
Bay-area Social. :slight_smile:

They have been discussing this for a solid two days on the C committee list with many pro and con opinions, all supported by various parts of the standard or other sources.

My personal opinion is that it's only a bug if some real program can be made to fail because of this.

If I were writing this test case for LPT, I would call an external function and pass the value curr to it.

If I was really paranoid about compiling it with an -O4 compiler, I would write an assembly language program.

Back in the day, in the ACVS (Ada compiler validation suite), they used to call functions that were noops but that had some complex recursion that no compiler could unwind and realise that it was a noop. So that is probably the safest. I can't remember anymore the exact form of these recursive functions in the suite as it was 28 years ago that I worked with that suite.

For what it’s worth, I will be at the social tonight, and am happy to discuss this paper. I’m a co-author and the instigator the work here to clarify the spec and allow a large number of exciting optimizations on code that have previously not be employed merely because of the use of malloc rather than (for example) alloca.

I'll be at the social tonight wearing my black Institute of Martial Arts t-shirt.

It's not legal.
1. malloc is not known to always return a non-null pointer.
2. malloc has side-effects and hence cannot be eliminated.

-K

Are you saying that we do this kind of an analysis at -O2? This could only work if the entire program was contained in a single source file, and even then it would require interprocedural analysis to get it right.

-K

The system malloc isn't, but the compiler can change any given call to
malloc to call an implementation which is known to return a non-null
pointer.

-Eli

Do we do that in LLVM? That would be surprising... Optimizing calls to malloc (like memory pooling for example) is not a trivial thing to do, and it requires a fairly strong interprocedural analysis. The fact that this seems to happen with LLVM at -O2 looks more like a bug than a clever optimization.

-K

The system malloc isn't, but the compiler can change any given call to
malloc to call an implementation which is known to return a non-null
pointer.

Do we do that in LLVM? That would be surprising... Optimizing calls to
malloc (like memory pooling for example) is not a trivial thing to do, and
it requires a fairly strong interprocedural analysis.

Why would it require that? If you can see that the malloc doesn't
escape you can, in such simple cases, simply move the data from malloc
memory into an alloca instead.

You're right about moving data from heap to stack, but I think it would be somewhat limited in its scope. The compiler should be careful doing such things anyways, since the user can set stack and heap memory limits independently, and have expectations as to where the memory will be allocated at run time.

Another thing is that if malloc is called twice, and neither call returns null, then the two pointers returned are guaranteed to be different. If you replace the call to malloc with alloca, the alloca (i.e. stack allocation) still needs to run in a loop. Of course, this doesn't have to happen in this particular example, but an optimization targeting this case is quite pointless.

-K

The optimization which triggers here is essentially dead malloc
elimination; we can end up with IR which looks a lot like the original
example in non-trivial cases because sometimes we can eliminate all
accesses to a block of malloc'ed memory (via inlining/GVN/etc.).

-Eli

You've made a number of claims on this thread, but I'm not sure what you're basing these claims on - certainly not the C standard.

If you don't want the compiler to touch well known functions like malloc, build with -fno-builtin and you'll get the behavior you want.

-Chris

The only claim you can object to is about side-effects. It was more of a general remark, but yes, the compiler can delete calls where the result is not used. This is indeed a case of a "dead malloc", which Eli pointed out.

Is there anything else you disagree with?

-K