Call to address 0 gets removed

Dale Johannesen wrote:

Paul Schlie wrote:

Dale Johannesen wrote:

Marius Wachtler wrote:
...
The call to address 0 gets removed.

define i32 @t(i32 %a) noreturn nounwind readnone {
entry:
   unreachable
}

How can I prevent the call from being removed without making the
function address volatile?
Does anyone know which optimization removes it?

Calling 0 is undefined behavior; the optimizer is within its
rights to remove this. Why do you want to call 0?
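
To make the construct under discussion concrete, here is a minimal C sketch. The names `handler_fn`, `call_unchecked`, `call_checked`, and `twice` are hypothetical and do not come from the original report. The unchecked form has undefined behavior when the pointer is null, which is what entitles an optimizer to drop the call; the checked form is defined for every input.

```c
#include <stddef.h>

/* Hypothetical function-pointer type, for illustration only. */
typedef int (*handler_fn)(int);

/* Undefined behavior in C when target is null: a compiler may assume
   this never happens and treat such a call site as unreachable. */
int call_unchecked(handler_fn target, int a) {
    return target(a);
}

/* Well defined for every input: the null case is handled explicitly
   by returning the caller-supplied fallback value. */
int call_checked(handler_fn target, int a, int fallback) {
    return target != NULL ? target(a) : fallback;
}

/* Sample callee used in the usage note. */
int twice(int x) {
    return 2 * x;
}
```

With these definitions, `call_checked(twice, 21, -1)` yields 42 and `call_checked(NULL, 21, -1)` yields -1, while `call_unchecked(NULL, 21)` has no defined result at all.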

Although a C translation unit may arguably not assign a
correspondingly defined function as having a pointer value
(address) comparing equal to ((void *) 0);

Nothing arguable about it, see C99 6.3.2.3

- yes agreed, however:

it's not clear that the standard forbids the invocation of such a
function

No such function can exist. I don't think the standard forbids you
to call 0, but it makes calling 0 undefined behavior ("behavior, upon
use of a nonportable or erroneous program construct or of erroneous
data"), since there can't possibly be a valid function there.

- also yes, however ((void *) 0) need not have a storage representation
  equivalent to ((int) 0); it may, for example, be represented as
  ((int) -1) if desired (which may even be desirable in some
  circumstances), in which case broadly stripping calls to "0" would
  arguably be wrong. Although more practically, as:

nor does it seem like a good idea to silently strip any such
invocations especially if allowed to be specified; as to do so
would seemingly only mask potentially legitimate problems, and/or
prevent that intended from being performed although potentially
relying on an undefined behavior.

(As for example, it's not hard to imagine that it may be desirable
to allow a machine which may trap such calls to do so, and/or to allow
the invocation of some otherwise specified behavior although
considered undefined by the standard itself.)
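
The point raised above about the storage representation of a null pointer can be checked on a given implementation with a small C sketch. The function names are hypothetical. The first function inspects the bytes of a null pointer and may return 0 on an implementation that does not use all-zero bits (mainstream platforms do use all-zero bits, but the standard does not require it); the second shows that the source-level constant 0 still converts to the null pointer whatever the representation is.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if this implementation stores a null data pointer as
   all-zero bytes, 0 otherwise. The C standard permits either. */
int null_is_all_bits_zero(void) {
    void *null_ptr = NULL;                 /* a null pointer, whatever its bits */
    unsigned char zeros[sizeof null_ptr];
    memset(zeros, 0, sizeof zeros);
    return memcmp(&null_ptr, zeros, sizeof null_ptr) == 0;
}

/* Always returns 1: the integer constant 0 converts to the null
   pointer regardless of how that pointer is represented in storage. */
int constant_zero_is_null(void) {
    void *from_constant = 0;
    return from_constant == NULL;
}
```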

In general, a C compiler is not the right tool to use for
functionality outside the C language, so I'm not inclined to be
sympathetic to this line of reasoning.

- I presume you mean things like supporting inline assembly and a
  multitude of other useful extensions which correspondingly
  have no well-defined behavior? (a rhetorical question)

I understand your position; I simply don't agree that useful extensions
which do not violate the standard should be broadly rejected, unless their
undefined behavior can be capitalized upon to justify some more useful
optimization. However, as stripping calls to "0" does not seem very
useful to anyone in practice, such calls may be more usefully preserved,
possibly in combination with a useful non-portability warning. Merely IMHO.

There's another point that hasn't been raised yet here, which is that the
undefinedness of calling (void*) 0 is a property of C, not necessarily of
the LLVM abstract language. I think you can make an excellent case that
the standard optimizations should not be enforcing C language semantics,
or at least should allow such optimizations to be disabled.

Case in point: calls/loads/stores to null may be undefined behavior in C,
but they're certainly not undefined behavior in (say) Java. There's a
well-known implementation trick in JVMs where you optimistically emit code
assuming non-null objects, and then you install signal handlers to raise
exceptions in the cases where you're wrong. Now, obviously that trick
is going to have implications for the optimizers beyond "don't mark null
stores as unreachable", but even so, it really shouldn't be totally precluded
by widespread assumptions of C semantics.
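
The signal-handler trick described above can be sketched in C on a POSIX system. This is only an illustration under stated assumptions, not how any particular JVM implements it: `guarded_load`, `on_fault`, and `recover_point` are hypothetical names, the code assumes a faulting load raises SIGSEGV or SIGBUS, and reading through a null pointer remains undefined behavior as far as the C standard is concerned, so an optimizer that enforces C semantics is free to break it. That is exactly the tension being raised here.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Recovery point shared with the handler. Not thread-safe; sketch only. */
static sigjmp_buf recover_point;

/* Handler for the fault: jump back to the recovery point instead of dying. */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Optimistically load *addr without a null check.
   Returns 0 and stores the value in *out on success,
   or -1 if the load faulted (where a JVM would raise an exception). */
int guarded_load(const volatile int *addr, int *out) {
    struct sigaction sa, old_segv, old_bus;
    int status = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* Second argument 1: save the signal mask so it is restored on the jump. */
    if (sigsetjmp(recover_point, 1) == 0) {
        *out = *addr;   /* may fault; platform-defined, not standard C */
        status = 0;     /* reached only if the load succeeded */
    }

    /* Restore whatever handlers were installed before. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return status;
}
```

A caller would pass a possibly-null pointer and turn a -1 result into its own error path. A real runtime would install the handler once rather than per load; doing it per call here just keeps the sketch self-contained.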

John.

2009/6/10 John McCall <rjmccall@apple.com>

There’s another point that hasn’t been raised yet here, which is that the
undefinedness of calling (void*) 0 is a property of C, not necessarily of
the LLVM abstract language. I think you can make an excellent case that
the standard optimizations should not be enforcing C language semantics,
or at least should allow such optimizations to be disabled.

All sorts of optimizations rely on this, from something as simple as eliminating comparisons of alloca against null to knowing that two malloc’d pointers can never alias (what if malloc returns null? if null is valid then you can store data there…).
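
As a rough C-level illustration of the two assumptions just mentioned (the function names are hypothetical, and the comments describe what an optimizer is permitted to do, not what any specific LLVM pass does):

```c
#include <stddef.h>
#include <stdlib.h>

/* The address of a local (an alloca in LLVM terms) is never null, so a
   compiler may fold this comparison to 0 without emitting it. */
int local_is_null(void) {
    int local = 0;
    int *p = &local;
    return p == NULL;
}

/* Two live, successful malloc results never alias, so a store through
   one cannot change what is read through the other.
   Returns 1 if that held, 0 if not, -1 if an allocation failed. */
int fresh_blocks_are_distinct(void) {
    int *a = malloc(sizeof *a);
    int *b = malloc(sizeof *b);
    int result = -1;

    if (a != NULL && b != NULL) {
        *a = 1;
        *b = 2;                         /* cannot modify *a */
        result = (a != b) && (*a == 1);
    }
    free(a);                            /* free(NULL) is a no-op */
    free(b);
    return result;
}
```

If null were a valid, storable address, neither simplification would be safe without further information.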

Case in point: calls/loads/stores to null may be undefined behavior in C,
but they’re certainly not undefined behavior in (say) Java. There’s a
well-known implementation trick in JVMs where you optimistically emit code
assuming non-null objects, and then you install signal handlers to raise
exceptions in the cases where you’re wrong. Now, obviously that trick
is going to have implications for the optimizers beyond “don’t mark null
stores as unreachable”, but even so, it really shouldn’t be totally precluded
by widespread assumptions of C semantics.

The current workaround is to use an alternate address space for your pointers. At some point we may extend the load/store/call instructions to specify their exact semantics similarly to the integer overflow proposal ( http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt ).

Nick

2009/6/10 John McCall <rjmccall@apple.com>

There’s another point that hasn’t been raised yet here, which is that the
undefinedness of calling (void*) 0 is a property of C, not necessarily of
the LLVM abstract language. I think you can make an excellent case that
the standard optimizations should not be enforcing C language semantics,
or at least should allow such optimizations to be disabled.

All sorts of optimizations rely on this, from something as simple as eliminating comparisons of alloca against null to knowing that two malloc’d pointers can never alias (what if malloc returns null? if null is valid then you can store data there…).

I’m not saying we should never make any assumptions about null, or that C-specific assumptions should be totally unwelcome in standard passes. I’m saying that current practice makes it very difficult to avoid certain C-specific assumptions.

Let’s take your examples. The assumption that alloca never produces null seems like a reasonable cross-language assumption to me, based on alloca’s status as a compiler-defined (and totally unstandardized) intrinsic; if I need more rigid semantics, I shouldn’t be using alloca. The assumption that the function called malloc never returns aliasing pointers is indeed a C-specific assumption, but it’s one that I can easily avoid if necessary by, well, not using C-specific libcall optimizations. And most of these C-inspired assumptions fall into one of those two categories: it’s either generally valid or easily disabled.

On the other hand, the assumption that calls to null are undefined behavior is so hard-coded into instcombine that I can only avoid it by refusing to run the entire instcombine pass, or by carefully guarding how I emit calls that might be to null. And I do think this is inappropriate for a core pass, just as if someone made BasicAliasAnalysis do type-based alias analysis based on C’s strict-aliasing rules, or if someone modified a loop-counting pass to use C’s signed-overflow semantics, or so on. At the very least, there should be some way to configure this on the pass.

Case in point: calls/loads/stores to null may be undefined behavior in C,
but they’re certainly not undefined behavior in (say) Java. There’s a
well-known implementation trick in JVMs where you optimistically emit code
assuming non-null objects, and then you install signal handlers to raise
exceptions in the cases where you’re wrong. Now, obviously that trick
is going to have implications for the optimizers beyond “don’t mark null
stores as unreachable”, but even so, it really shouldn’t be totally precluded
by widespread assumptions of C semantics.

The current workaround is to use an alternate address space for your pointers. At some point we may extend the load/store/call instructions to specify their exact semantics similarly to the integer overflow proposal ( http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt ).

I’ll note that instcombine actually marks stores to null as unreachable regardless of the address space of the pointer, unless I’m missing something subtle.

John.

That's not intentional; I just filed http://llvm.org/bugs/show_bug.cgi?id=4366 .

-Eli

For the default address space, LLVM IR *should* currently treat load/store to null as an undefined operation. To support Java-style "potentially trapping" load/store, we need something like this:
http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt

-Chris