The undef story

Date: Fri, 30 Jun 2017 09:16:54 -0700
From: Peter Lawrence via llvm-dev <llvm-dev@lists.llvm.org>
To: Hal Finkel <hfinkel@anl.gov>
Cc: llvm-dev <llvm-dev@lists.llvm.org>, John Regehr
<regehr@cs.utah.edu>
Subject: Re: [llvm-dev] The undef story

Hal,
     Mehdi points out I mis-quoted you, I apologize sincerely.

Mehdi,
[ . . . ]
The problem is I don’t know how to interpret what Hal said here,
I can’t construct the “real” example from the “representative” example
given his advise.

Hopefully, the description I later provided helps. I can't share the
"real" code, but I reconstructed this example from my memory of the
original code plus some of the intermediate IR I'd looked at.

Hal,
     I’d like to see if I understand you, here’s what I think, let me know
if this is correct

1. Sometimes there are abstraction penalties in C++ code
2. That can be optimized away after template instantiation, function
inlining, etc
3. When they for example exhibit this pattern
if (A) {
stuff;
} else {
other stuff including “undefined behavior”;
}
4. Where the compiler assumes “undefined behavior” doesn’t actually happen
because
   In the C language standard it is the users responsibility to avoid it
5. Therefore in this example the compiler can a) delete the else-clause
    b) delete the if-cond, c) assume A is true and propagate that
information

Peter Lawrence.

Hal,
      How about these generalizations

if (A) {
stuff including “undefined behavior”
} else {
other stuff including “undefined behavior”
}
Then conclude that A itself must be “undefined behavior”
Therefore delete the entire diamond and propagate A as “undefined
behavior” (!?)

while (A) {
stuff including "undefined behavior";
}
Delete the loop and propagate A as being false

do {
stuff including “undefined behavior”
} while (A)
This means the encompassing scope scope has “undefined behavior"

These are all “well formed” CFGs, but not all CFGs are,
So we might want to come up with a more general algorithm

But my favorite is the CFG that collapses the entire function into a
black-hole.

What do we do then, replace it with a TRAP, or inline it into all its call
sites
and continue. And if we inline it, is just the call an instance of UB at
the call
Site, or is the result value of the function, which might be used in other
blocks,
a UB value, so it becomes more than just the call-site that is UB, and
what
about other out-args of the function.

I’m going to take a SWAG here and say that we probably haven’t entirely
thought this all through, or have we ?

We definitely have thought this through and have a very general answer that
is surprisingly easy to describe with compiler jargon.

The generalization of your examples is "all code post-dominated by
undefined behavior is dead".

In other words, if we can prove "when program statement A executes then
program statement B is guaranteed to execute and have undefined behavior"
then we can assume that program statement A is never executed.

In particular, deleting program statement A is a correct transformation.
Deleting program statement B itself is a special case of this (when A = B).

And yes, this does include everything up to and including `main`,
intraprocedurally and interprocedurally.

In fact, one interesting application of this is to provide compile-time
diagnostics for things that would be caught by our various sanitizers
(UBSan, ASan, etc.) at run time. In particular, after optimization is done,
we look to see if a call to the sanitizer's error-reporting function
post-dominates the function entry. If it does, then we can report that
diagnostic at compile-time.
(We don't currently implement this, but I remember hearing an approach like
this from Vedant at one of the LLVM socials)

-- Sean Silva

So, the email you replied to here Sean was not approved for the mailing list.

The reason is pretty simple. It comes down to the sentence you’re quoting here. ANyways, I don’t think we should really continue this discussion as it seems to not be resulting in any progress (and other paths forward have already been suggested). Moreover…

Peter, the language of your email has moved back outside of what is really appropriate for the LLVM mailing lists.

If you want to know whether people have thought of something, just ask politely. Please don’t descend to things like (in your own words) wild speculation about failure of LLVM developers to think through the work they have done. That isn’t constructive and isn’t appropriate for the list.

But I want to reiterate: this discussion isn’t productive. Paths forward have been offered. Peter, the ball is in your court to implement your ideas around undef. Continuing to debate here does not seem like a path forward.

I’m going to take a SWAG here and say that we probably haven’t entirely
thought this all through, or have we ?

We definitely have thought this through and have a very general answer
that is surprisingly easy to describe with compiler jargon.

So, the email you replied to here Sean was not approved for the mailing
list.

The reason is pretty simple. It comes down to the sentence you're quoting
here. ANyways, I don't think we should really continue this discussion as
it seems to not be resulting in any progress (and other paths forward have
already been suggested).

I think there have been a number of examples in this thread of stuff that
while it didn't really help this discussion forward per se, it help
articulate some interesting piece of information that not all onlookers
(especially newbies) would easily be exposed to (Hal's awesome example,
John's post about refinement, etc.). I was only really replying in that
sense. If I'd known that it didn't make it through to the mailing list I
wouldn't have bothered to reply.

I agree though, until Peter or someone else actually steps up to the plate
to implement a potential alternative approach, this discussion here per se
isn't going anywhere.

-- Sean Silva