optimization assumes malloc return is non-null

Consider the following C code:

#include <stdlib.h>

int main(int argc, char** argv){
   if(malloc(sizeof(int)) == NULL){ return 0; }
   else{ return 1; }
}

When I compile it with -O3, it produces the following LLVM IR:

define i32 @main(i32 %argc, i8** %argv) {
entry:
  ret i32 1
}

Is this an error? It should be possible for malloc to return NULL if it cannot allocate more space. In fact, some programs need to be able to handle such situations gracefully.

Regards,
Ryan

For C code this is unquestionably a bug.

It's an allowable program transformation because a call to malloc is not in itself a side effect. See e.g. 5.1.2.3 in the C standard.

  Daveed

Daveed:

Perhaps I am looking at the wrong version of the specification. Section
5.1.2.3 appears to refer to objects having volatile-qualified type. The
type of malloc() is not volatile qualified in the standard library
definition.

In general, calls to procedures that are outside the current unit of
compilation are presumed to involve side effects performed in the body
of the external procedure (at least in the absence of annotation).

Can you say what version of the standard you are referencing, and (just
so I know) why section 5.1.2.3 makes a call to malloc() different from
any other procedure call with respect to side effects?

Thanks

Jonathan

Daveed:

Perhaps I am looking at the wrong version of the specification. Section
5.1.2.3 appears to refer to objects having volatile-qualified type. The
type of malloc() is not volatile qualified in the standard library
definition.

More importantly, malloc() is not specified to access a volatile object, modify an object, or modify a file (directly or indirectly); i.e., it has no side effect from the language point of view.

In general, calls to procedures that are outside the current unit of
compilation are presumed to involve side effects performed in the body
of the external procedure (at least in the absence of annotation).

That may often be done in practice, but it's not a language requirement. In particular, for standard library functions (like malloc) an optimizer can exploit the known behavior of the function.

Can you say what version of the standard you are referencing, and (just
so I know) why section 5.1.2.3 makes a call to malloc() different from
any other procedure call with respect to side effects?

I'm looking at ISO/IEC 9899:1999.

<begin quote>
1 The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

2 Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, [footnote about floating-point effects] which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)

3 In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
<end quote>

The same concept exists in C++, and we often refer to it as the "as if" rule; i.e., implementations can do all kinds of things, as long as the effect is "as if" specified by the abstract machine.

  Daveed

> Daveed:
>
> Perhaps I am looking at the wrong version of the specification.
> Section
> 5.1.2.3 appears to refer to objects having volatile-qualified type.
> The
> type of malloc() is not volatile qualified in the standard library
> definition.

...malloc() is not specified to access a volatile
object, modify an object, or modify a file (directly or
indirectly); i.e., it has no side effect from the language point of
view.

Daveed:

Good to know that I was looking at the correct section. I do not agree
that your interpretation follows the as-if rule, because I do not agree
with your interpretation of the C library specification of malloc().

The standard library specification of malloc() clearly requires that it
allocates storage, and that such allocation is contingent on storage
availability. Storage availability is, in turn, a function (in part) of
previous calls to malloc() and free(). Even if free() is not called, the
possibility of realloc() implies a need to retain per-malloc() state. In
either case, it follows immediately that malloc() is stateful, and
therefore that any conforming implementation of malloc() must modify at
least one object in the sense of the standard.

If I understand your position correctly, your justification for the
optimization is that the C library standard does not say in so many
words that malloc() modifies an object. I do not believe that any such
overt statement is required in order for it to be clear that malloc() is
stateful. The functional description of malloc() and free() clearly
cannot be satisfied under the C abstract machine without mutation of at
least one object.

Also, I do not read 5.1.2.3 in the way that you do. Paragraph 2 defines
"side effect", but it does not imply any requirement that side effects
be explicitly annotated. What Paragraph 3 gives you is leeway to
optimize standard functions when you proactively know their behavior. A
standard library procedure is not side-effect free for optimization
purposes by virtue of the absence of annotation. It can only be treated
as side-effect free by virtue of proactive knowledge of the
implementation of the procedure. In this case, we clearly have knowledge
of the implementation of malloc, and that knowledge clearly precludes
any possibility that malloc is simultaneously side-effect free and
conforming.

So it seems clear that this optimization is wrong. By my reading, not
only does the standard fail to justify it under 6.1.2.3 paragraph 3, it
*prohibits* this optimization under 5.1.2.3 Paragraph 1, because
there is no conforming implementation that is side-effect free.

Exception: there are rare cases where, under whole-program optimization,
it is possible to observe that free() is not called, that there is an
upper bound on the number of possible calls to malloc() and also an
upper bound on the total amount of storage allocated. In this very
unusual case, the compiler can perform a hypothetical inlining of the
known implementation of malloc and then do partial evaluation to
determine that no heap size tracking is required. If so, it can then
legally perform the optimization that is currently being done.

But I don't think that the current compiler is actually doing that
analysis in this case...

> In general, calls to procedures that are outside the current unit of
> compilation are presumed to involve side effects performed in the body
> of the external procedure (at least in the absence of annotation).

That may often be done in practice, but it's not a language
requirement. In particular, for standard library functions (like
malloc) an optimizer can exploit the known behavior of the function.

I disagree. In the malloc() case, the known behavior is side effecting.
In the general case, the compiler cannot assume side-effect freedom
unless it can prove it, and in the absence of implementation knowledge
the standard requires conservatism.

The same concept exists in C++, and we often refer to it as the "as
if" rule; i.e., implementations can do all kinds of things, as long as
the effect is "as if" specified by the abstract machine.

Yes. But the C++ version of this is quite different, because in any
situation where new would return NULL it would instead be raising an out
of memory exception. In consequence, the optimization is correct for
operator new whether or not operator new is side effecting.

Setting the matter of the standard entirely aside, the currently
implemented behavior deviates so blatantly from common sense and
universal convention that it really must be viewed as a bug.

Finally, I strongly suspect that LLVM will fail the standard conformance
suites so long as this optimization is retained.

shap

Jonathan S. Shapiro wrote:

Daveed:

Perhaps I am looking at the wrong version of the specification.
Section
5.1.2.3 appears to refer to objects having volatile-qualified type.
The
type of malloc() is not volatile qualified in the standard library
definition.

...malloc() is not specified to access a volatile
object, modify an object, or modifying a file (directly or
indirectly); i.e., it has no side effect from the language point of
view.

Daveed:

Good to know that I was looking at the correct section. I do not agree
that your interpretation follows the as-if rule, because I do not agree
with your interpretation of the C library specification of malloc().

Before I go on, let me state that this is not a contentious issue among WG14: There is no doubt that the intent of the standard is that this be a valid optimization.

So at most we're debating whether the wording implements the intent in this case (which it does IMO).

The standard library specification of malloc() clearly requires that it
allocates storage, and that such allocation is contingent on storage
availability.

(I think that's a contradiction: A perfectly standard-conforming implementation of malloc is:

  void *malloc(size_t size) { return 0; }

Admittedly a moot point here.)

Storage availability is, in turn, a function (in part) of
previous calls to malloc() and free().

While that's certainly true of every reasonable implementation, it's not a requirement (see above).

Even if free() is not called, the
possibility of realloc() implies a need to retain per-malloc() state.

Realloc is permitted to fail on every call.

In
either case, it follows immediately that malloc() is stateful, and
therefore that any conforming implementation of malloc() must modify at
least one object in the sense of the standard.

For every useful implementation malloc will indeed be stateful (but again, that's not a language requirement). So in practice malloc will indeed have side effects. However even in the practical cases, 5.1.2.3/3 still allows the optimization on the example shown:

<begin quote>
3 In the abstract machine, all expressions are evaluated as specified
by the semantics. An actual implementation need not evaluate part of
an expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
<end quote>

(I should have been more careful in my earlier response and clarified that modifying a nonvolatile object is a side-effect that can be optimized away in many cases.)

If I understand your position correctly, your justification for the
optimization is that the C library standard does not say in so many
words that malloc() modifies an object. I do not believe that any such
overt statement is required in order for it to be clear that malloc() is
stateful. The functional description of malloc() and free() clearly
cannot be satisfied under the C abstract machine without mutation of at
least one object.

As I showed above, that isn't actually correct.

Also, I do not read 5.1.2.3 in the way that you do. Paragraph 2 defines
"side effect", but it does not imply any requirement that side effects
be explicitly annotated. What Paragraph 3 gives you is leeway to
optimize standard functions when you proactively know their behavior. A
standard library procedure is not side-effect free for optimization
purposes by virtue of the absence of annotation. It can only be treated
as side-effect free by virtue of proactive knowledge of the
implementation of the procedure.

I agree with that, and in this case a compiler can know that malloc won't change a volatile object or write to a non-temporary file. (Alternatively, it can just "use" a strange malloc implementation for this particular case -- a platonic exercise, since the optimization it allows then makes the implementation unneeded.)

(It might also be worth noting that standard library functions can be "magical macros". That also opens the door to all kinds of optimizations, perhaps.)

In this case, we clearly have knowledge
of the implementation of malloc, and that knowledge clearly precludes
any possibility that malloc is simultaneously side-effect free and
conforming.

So it seems clear that this optimization is wrong. By my reading, not
only does the standard fail to justify it under 6.1.2.3 paragraph 3, it
*prohibits* this optimization under 5.1.2.3 Paragraph 1, because
there is no conforming implementation that is side-effect free.

I'm afraid I don't understand how 5.1.2.3/1 applies there, nor why 5.1.2.3/3 (which I assume was the intended reference) would not apply. Even if malloc changes some state, that side effect isn't a "needed side effect" in the sense of 5.1.2.3/3.

Exception: there are rare cases where, under whole-program optimization,
it is possible to observe that free() is not called, that there is an
upper bound on the number of possible calls to malloc() and also an
upper bound on the total amount of storage allocated. In this very
unusual case, the compiler can perform a hypothetical inlining of the
known implementation of malloc and then do partial evaluation to
determine that no heap size tracking is required. If so, it can then
legally perform the optimization that is currently being done.

Right. And this is such a case.

However, the optimization is allowed even without that: Just by noting that the return value of malloc isn't used (nor leaked); I suspect that's what the compiler does in this case (though I don't know).

But I don't think that the current compiler is actually doing that
analysis in this case...

I cannot speak to that.

In general, calls to procedures that are outside the current unit of
compilation are presumed to involve side effects performed in the body
of the external procedure (at least in the absence of annotation).

That may often be done in practice, but it's not a language
requirement. In particular, for standard library functions (like
malloc) an optimizer can exploit the known behavior of the function.

I disagree. In the malloc() case, the known behavior is side effecting.

See above.

In the general case, the compiler cannot assume side-effect freedom
unless it can prove it, and in the absence of implementation knowledge
the standard requires conservatism.

Except for the leeway offered by 5.1.2.3/3.

This is no different from not requiring that an actual store operation happen in something like:

  void f(int x) { x = 3; }

The same concept exists in C++, and we often refer to it as the "as
if" rule; i.e., implementations can do all kinds of things, as long as
the effect is "as if" specified by the abstract machine.

Yes. But the C++ version of this is quite different, because in any
situation where new would return NULL it would instead be raising an out
of memory exception. In consequence, the optimization is correct for
operator new whether or not operator new is side effecting.

I wasn't talking about malloc vs. new: C++ also has malloc. I was only talking about C++ because I actively work on that standard, so I'm more familiar with the "jargon" that applies to it.

Setting the matter of the standard entirely aside, the currently
implemented behavior deviates so blatantly from common sense and
universal convention that it really must be viewed as a bug.

That's a different topic, and perhaps a matter of opinion. I personally think it's a perfectly decent optimization, though this particular instance of it isn't very useful as far as I can tell.

Finally, I strongly suspect that LLVM will fail the standard conformance
suites so long as this optimization is retained.

I'm quite familiar with the significant suites out there, and I can assure you that this won't be an issue. (Though I don't believe the gcc front end is fully conforming either -- for other reasons.)

  Daveed

Maybe I missed something, but aren't we all talking about the wrong thing
here? It seems to me that this isn't about side effects, it's about the
return value of malloc. Why can LLVM assume malloc will always return
non-zero?

                                                -Dave

I hope that Daveed will correct me on this, but I think that the theory
is as follows:

  Since the effect of malloc is not captured, the entire malloc can be
  discarded. Any call to malloc that is discarded can be presumed
  (arbitrarily) to succeed, and therefore to return non-null.

shap

It cannot assume that in the general case. It can do so here because the return value is not used further on. The side effects issue comes about because the compiler needs to know that the call to malloc was not observable other than through the return value. To re-quote a part of 5.1.2.3/3: "An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced". So there are two conditions to optimize this: Track the return value to make sure nothing interesting (i.e., unknown) happens to it, and be sure that there are no observable side effects from the call itself (in this case, take advantage of the fact that "malloc" is part of the language implementation).

Note that more interesting optimizations are possible. E.g., it's perfectly valid to transform:

  void f(size_t n) {
    char *str = (char*)malloc(n);
    // use str[0 .. 99 ]
    free(str);
  }

into

  void f(size_t n) {
    char *str = (char*)alloca(n);
    // use str[0 .. 99 ]
  }

(Can LLVM do that? Is that maybe why the optimization happens?)

  Daveed

Never mind -- this argument is okay as far as it goes, but I don't think
Daveed is relying on this.

Correct. It's an extreme form of garbage collection, I suppose. ;-)

(In theory, it can also be assumed to fail -- because an implementation is allowed to make any call to malloc fail -- though that's probably not useful.)

  Daveed

> Daveed:
>
> Good to know that I was looking at the correct section. I do not agree
> that your interpretation follows the as-if rule, because I do not
> agree
> with your interpretation of the C library specification of malloc().

Before I go on, let me state that this is not a contentious issue
among WG14: There is no doubt that the intent of the standard is that
this be a valid optimization.

And of course, ANSI C working groups are infallible. Consider
"noalias". :-)

But I think I am starting to understand your argument, and if so I
concur that the optimization is correct in this case. Let me see ask
some further questions just to confirm.

Let me attempt to replay what I believe is happening here:

1. There is an observably linear chain of calls to malloc() in the
example, and because of the structure of the example, the compiler can
see that there are no *unobserved* calls to malloc() in the program.

2. After all inlining of malloc() has proceeded, the compiler is free to
observe that all calls succeed by partial evaluation (thus the non-zero
return value) and that after this is done there is no witness to the
internal heap limit variable.

3. Progressive application of dead variable elimination will then delete
the entire chain of updates to the heap limit variable, eliminating the
side effects that would normally be induced by malloc(). They will also
delete the uncaptured pointer result from malloc().

4. Since the result pointer is not captured, and the heap limit variable
is not consulted, the entire result of the call to malloc() is the
return of a non-NULL pointer.

5. Constant folding then reduces the IF condition (malloc() == NULL) to
false.

6. Dead code elimination then reduces the if/then to simply the then
body.

7. Optimizer now reduces the whole thing to "return 1".

Is this more or less the chain of events here (or if not this, then
something of a similar nature)?

If so, I understand and concur that this optimization is valid in this
example, primarily due to perverse and non-obvious consequences of the
particular bit of sample code.

Just to cement my understanding, I suspect that *either* of the
following two changes to the sample code would make this optimization
invalid:

  1. Rename main() to f()

  2. Introduce a call to some external procedure anywhere in main().

In either case, the compiler can no longer witness all reachable calls
to malloc(), and must assume conservatively that the internally produced
side effect results of malloc() may be sourced by other units of
compilation, and therefore cannot be elided.

Is this correct?

Hmm. No. I can weasel out of this in case (2), because the standard does
not require that calls to malloc() be processed in order, so in some
cases calls to malloc() appearing in main() can still be eliminated by
logically rearranging them to occur before any of the external calls to
malloc(). I bet LLVM doesn't know about this. :-)

(I should have been more careful in my earlier response and clarified
that modifying a nonvolatile object is a side-effect that can be
optimized away in many cases.)

If I understand the standard correctly, this can be done exactly if
there is no witness to the side effect. Furthermore, chains of
intermediate unwitnessed side effects can be collapsed if the optimizer
can prove that the intermediate results are unwitnessed. Yes?

...However, the optimization is allowed even without that: Just by noting
that the return value of malloc isn't used (nor leaked); I suspect
that's what the compiler does in this case (though I don't know).

I think that is not quite strong enough. If there is any possibility of
preceding or succeeding calls to malloc() in compilation units that are
not visible at compile time, then the compiler cannot determine that the
current call to malloc() must succeed or that its side effects are
unwitnessed.

I suppose that the compiler can do special case things in main on the
theory that these calls to malloc() may be executed out of order by a
conforming implementation, but I suspect that goes well beyond the
cleverness of the current LLVM implementation. It's the sort of thing
that the VAX/VMS optimizer or the IBM PL.8 research optimizer might
actually have done. And certainly the Bulldog optimizer.

> Setting the matter of the standard entirely aside, the currently
> implemented behavior deviates so blatantly from common sense and
> universal convention that it really must be viewed as a bug.

Just to be explicit: I now see that my statement above is wrong in the
present example.

Daveed: Thanks for being patient on this. I appreciate your willingness
to help me see this one.

shap

I think it cannot in general assume that in this case, for reasons
outlined in my other note.

LLVM should not (and does not, afaik) assume the malloc succeeds in general.

If LLVM is able to eliminate all users of the malloc assuming the malloc succeeded (as in this case), then it is safe to assume the malloc returned success.

-Chris

This isn't safe in general unless you can (tightly) bound "n". You don't want to overflow the stack.

We do delete "free(malloc(n))" though.

-Chris

Note that more interesting optimizations are possible. E.g., it's
perfectly valid to transform:

  void f(size_t n) {
    char *str = (char*)malloc(n);
    // use str[0 .. 99 ]
    free(str);
  }

into

  void f(size_t n) {
    char *str = (char*)alloca(n);
    // use str[0 .. 99 ]
  }

(Can LLVM do that? Is that maybe why the optimization happens?)

This isn't safe in general unless you can (tightly) bound "n". You don't
want to overflow the stack.

Ah yes, of course. Does LLVM do this for known & small constant n?

(I suppose it could be transformed into:

  void f(size_t n) {
    bool __opt = n < __L;
    char *str = (char*)(__opt ? alloca(n) : malloc(n));
    // ...
    if (!__opt) free(str);
  }

The payoff is less obvious here.)

We do delete "free(malloc(n))" though.

Cool.

  Daveed

We don't do this currently, primarily because I haven't seen a case where it is a win yet: it would be very easy to do for some cases. The trick with this is that you have to prove a couple of interesting properties of the program. For example, you effectively have to prove that the malloc can only be executed once for each invocation of the function. You don't want to turn something like this:

for ()
   malloc(12) // perhaps a linked list.

Into unbounded stack allocation (even though it would work as long as you don't run out of stack).

There are interesting cases that could be caught like:

for () {
   p = malloc(42)
   use(p);
   free(p);
}

etc, which could also be done. In this case, the alloca could even be a fixed alloca coded into the prolog of the function, not a dynamic alloca.

Personally to me, I have a bigger axe to grind with C++ operator new. AFAIK, the standard doesn't give leeway to do a number of interesting optimizations for new/delete because the user is explicitly allowed to override them and the std doesn't require them to behave "as expected". Very interesting properties to me would be:

1) Safety to remove "delete (new int);" and friends.
2) Aliasing guarantees about the result of new. There are a huge number of code pessimizations that happen because the optimizer has to assume that 'new' can return a pointer that already exists in the program.
3) Lifetime guarantees. It would be really nice to be able to delete the store to X in: " double *X = ...; *X = 4.0; delete X;" which is safe with 'free'.

etc. A lot of nice guarantees that we have with malloc/free aren't available with new/delete. Also, since new/delete can be overridden at any time (as late as runtime with LD_PRELOAD and friends), there is really no way the compiler can assume anything spiffy about new/delete except with some magic "standards violation is ok" compiler option, which is gross.

Any thoughts on how to improve this situation?

-Chris

Daveed:

Good to know that I was looking at the correct section. I do not agree
that your interpretation follows the as-if rule, because I do not
agree
with your interpretation of the C library specification of malloc().

Before I go on, let me state that this is not a contentious issue
among WG14: There is no doubt that the intent of the standard is that
this be a valid optimization.

And of course, ANSI C working groups are infallible. Consider
"noalias". :-)

Ah, but with the help of DMR, noalias was avoided and infallibility maintained.

(Kidding.)

Seriously, I don't think this particular item is a committee booboo, but I didn't mean to imply it cannot be. I only intended to note that the intent is that such optimization be allowed (and that to the best of my knowledge the words reflect that).

But I think I am starting to understand your argument, and if so I
concur that the optimization is correct in this case. Let me see ask
some further questions just to confirm.

Let me attempt to replay what I believe is happening here:

1. There is an observably linear chain of calls to malloc() in the
example, and because of the structure of the example, the compiler can
see that there are no *unobserved* calls to malloc() in the program.

(You meant "no *observed*", right?)

2. After all inlining of malloc() has proceeded, the compiler is free to
observe that all calls succeed by partial evaluation (thus the non-zero
return value) and that after this is done there is no witness to the
internal heap limit variable.

3. Progressive application of dead variable elimination will then delete
the entire chain of updates to the heap limit variable, eliminating the
side effects that would normally be induced by malloc(). They will also
delete the uncaptured pointer result from malloc().

4. Since the result pointer is not captured, and the heap limit variable
is not consulted, the entire result of the call to malloc() is the
return of a non-NULL pointer.

5. Constant folding then reduces the IF condition (malloc() == NULL) to
false.

6. Dead code elimination then reduces the if/then to simply the then
body.

7. Optimizer now reduces the whole thing to "return 1".

Is this more or less the chain of events here (or if not this, then
something of a similar nature)?

I think it's a valid set of transformations, but I don't think it's the only approach that can lead to this result. I was thinking that a compiler could be aware of the semantics of malloc: It wouldn't necessarily "inline" the call, but if it could prove that the result was not leaked or "used" (indirected), then a non-heap address could be produced. Alternatively, the malloc call could be promoted to alloca (as Chris noted in his reply, only if a small bound of allocation was known; but clearly the case here), and then it becomes akin to a normal dead local variable thing.

If so, I understand and concur that this optimization is valid in this
example, primarily due to perverse and non-obvious consequences of the
particular bit of sample code.

Just to cement my understanding, I suspect that *either* of the
following two changes to the sample code would make this optimization
invalid:

1. Rename main() to f()

2. Introduce a call to some external procedure anywhere in main().

In either case, the compiler can no longer witness all reachable calls
to malloc(), and must assume conservatively that the internally produced
side effect results of malloc() may be sourced by other units of
compilation, and therefore cannot be elided.

Is this correct?

I'm of the opinion that even with such changes the optimization is valid. Let me rename main to f (and drop the unused parameters) and add calls to a function g. The original example is then:

  #include <stdlib.h>

  int f() {
    g();
    int r = (malloc(sizeof(int)) == NULL) ? 0 : 1;
    g();
    return r;
  }

The malloc value cannot escape and the allocated storage isn't used (in other words, it can be "garbage collected" right away). So it can be replaced by a unique address (though the same address can be used for each invocation, IMO).

Note that within the confines of the C (or C++) standard, g has no way of observing the result of the call to malloc. (We might "know" that there is a static-lifetime variable somewhere that changed, but that variable isn't required by the standard, and a compiler may instead posit that for this very call to malloc, a separate heap is used.)

There is (at least) a subtlety with that reasoning in C having to do with "lifetime": The allocated _object_ has a lifetime (7.20.3/1) that extends until deallocation. My view is that the storage only becomes an object when the pointer returned is bound to a typed pointer (7.20.3/1). If that's right, every instance of the particular malloc call can produce the same value; if not, we'd need to be able to bound the number of instances of that call. (C++ has a convenient "conformance within resource limits" clause that provides a different way out.)

Hmm. No. I can weasel out of this in case (2), because the standard does
not require that calls to malloc() be processed in order, so in some
cases calls to malloc() appearing in main() can still be eliminated by
logically rearranging them to occur before any of the external calls to
malloc(). I bet LLVM doesn't know about this. :slight_smile:

(I should have been more careful in my earlier response and clarified
that modifying a nonvolatile object is a side-effect that can be
optimized away in many cases.)

If I understand the standard correctly, this can be done exactly if
there is no witness to the side effect. Furthermore, chains of
intermediate unwitnessed side effects can be collapsed if the optimizer
can prove that the intermediate results are unwitnessed. Yes?

Yes, I think so.

...However, the optimization is allowed even without that: Just by noting
that the return value of malloc isn't used (nor leaked); I suspect
that's what the compiler does in this case (though I don't know).

I think that is not quite strong enough. If there is any possibility of
preceding or succeeding calls to malloc() in compilation units that are
not visible at compile time, then the compiler cannot determine that the
current call to malloc() must succeed or that its side effects are
unwitnessed.

See above. If the allocation is small enough that it can be modelled by a "reserved heap", I think it can.

I suppose that the compiler can do special case things in main on the
theory that these calls to malloc() may be executed out of order by a
conforming implementation, but I suspect that goes well beyond the
cleverness of the current LLVM implementation. It's the sort of thing
that the VAX/VMS optimizer or the IBM PL.8 research optimizer might
actually have done. And certainly the Bulldog optimizer.

(I should probably mention that I'm no optimizer pro. So my reasoning is based on language specs.)

Setting the matter of the standard entirely aside, the currently
implemented behavior deviates so blatantly from common sense and
universal convention that it really must be viewed as a bug.

Just to be explicit: I now see that my statement above is wrong in the
present example.

Daveed: Thanks for being patient on this. I appreciate your willingness
to help me see this one.

No problem at all. I should probably worry that I find ISO exegesis mildly fun. :-P

  Daveed