GEP with a null pointer base

Message: 5
Date: Tue, 25 Jul 2017 00:12:35 -0700
From: Sean Silva via llvm-dev <llvm-dev@lists.llvm.org>
To: Peter Lawrence <peterl95124@sbcglobal.net>
Cc: llvm-dev <llvm-dev@lists.llvm.org>, John Regehr
<regehr@cs.utah.edu>
Subject: Re: [llvm-dev] GEP with a null pointer base

Sean,

Dan Gohman’s “transform” changes a loop induction variable, but does not
change the CFG,

Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.

These are two totally different transforms.

And even the analysis is different,

The first is based on an assumption of non-UB (actually there is no
analysis to perform)

the second Is based on a proof of existence of UB (here typically some
non-trivial analysis is required)

These have, practically speaking, nothing in common.

Ah, I think I see your point of confusion now. They are actually the same
thing, but one has a hidden step. You can consider the induction variable
transformation to be something like this:

a = add nsw i32 b, c
  1. ==> widen a and expand the add into:
if ("add i32 b, c" does not overflow) {
a = add nsw i64 b, c
} else {
a = (cast to i64) add nsw i32 b, c
}
  1. ==> if the else branch executes, then we provably have UB.
a = add nsw i64 b, c

The last step is the same as in Hal’s example where we delete a part of the
CFG by proving that if we reach it there is guaranteed to be UB. In Hal’s
example, the frontend marked the shift as having UB if the shift amount
exceeds the bitwidth. In this case, the frontend marked it as
non-overflowing. In this case, the fact that it is guaranteed to overflow
in the else branch has to be proven by looking at a dominating
conditional check instead of being manifest from a peephole observation
(after inlining/constant folding) as it is in Hal’s example.

Hopefully that makes sense. If not, could you indicate specifically which
of transformations 1. and/or 2. you think is illegal and why?

– Sean Silva

Sean,
Yes I can see your point, but in practice that’s not how it’s done,
no one expands the “+nsw” into control flow and then applies Hal’s
transform to it to get Dan’s transform. Dan’s transform doesn’t alter
the original CFG and is done as part of induction-variable analysis
and simplification. Hal’s does alter the original CFG, and uses a totally
different analysis as well as different transformation. The only thing they
have in common is the assumption that the program is UB-free at runtime,
but they have nothing in common in implementation.

So I’m not saying your transforms 1 & 2 are illegal, I’m saying they are
unnecessary.

And we’re missing the point, which is this, why ever delete UB code
which can cover up a bug, and more importantly, why ever make that
the default, given the emphasis in software engineering of avoiding UB
at the source level. Why are we implementing both -fsanitize=undefined
and deleting undefined as an optimization.

Peter.

Message: 5
Date: Tue, 25 Jul 2017 00:12:35 -0700
From: Sean Silva via llvm-dev <llvm-dev@lists.llvm.org>
To: Peter Lawrence <peterl95124@sbcglobal.net>
Cc: llvm-dev <llvm-dev@lists.llvm.org>, John Regehr
<regehr@cs.utah.edu>
Subject: Re: [llvm-dev] GEP with a null pointer base

Sean,

Dan Gohman’s “transform” changes a loop induction variable, but does not
change the CFG,

Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.

These are two totally different transforms.

And even the analysis is different,

The first is based on an *assumption* of non-UB (actually there is no
analysis to perform)

the second Is based on a *proof* of existence of UB (here typically some
non-trivial analysis is required)

These have, practically speaking, nothing in common.

Ah, I think I see your point of confusion now. They are actually the same
thing, but one has a hidden step. You can consider the induction variable
transformation to be something like this:

a = add nsw i32 b, c

1. ==> widen `a` and expand the add into:

if ("add i32 b, c" does not overflow) {
 a = add nsw i64 b, c
} else {
 a = (cast to i64) add nsw i32 b, c
}

2. ==> if the `else` branch executes, then we provably have UB.

a = add nsw i64 b, c

The last step is the same as in Hal's example where we delete a part of the
CFG by proving that if we reach it there is guaranteed to be UB. In Hal's
example, the frontend marked the shift as having UB if the shift amount
exceeds the bitwidth. In this case, the frontend marked it as
non-overflowing. In this case, the fact that it is guaranteed to overflow
in the `else` branch has to be proven by looking at a dominating
conditional check instead of being manifest from a peephole observation
(after inlining/constant folding) as it is in Hal's example.

Hopefully that makes sense. If not, could you indicate specifically which
of transformations 1. and/or 2. you think is illegal and why?

-- Sean Silva

Sean,
          Yes I can see your point, but in practice that’s not how it’s
done,
no one expands the “+nsw” into control flow and then applies Hal’s
transform to it to get Dan’s transform. Dan’s transform doesn’t alter
the _original_ CFG and is done as part of induction-variable analysis
and simplification. Hal’s does alter the _original_ CFG, and uses a totally
different analysis as well as different transformation. The only thing they
have in common is the assumption that the program is UB-free at runtime,
but they have nothing in common in implementation.

So I’m not saying your transforms 1 & 2 are illegal, I’m saying they are
unnecessary.

And we’re missing the point, which is this, why ever delete UB code
which can cover up a bug, and more importantly, why ever make that
the default, given the emphasis in software engineering of avoiding UB
at the source level. Why are we implementing both -fsanitize=undefined
and deleting undefined as an optimization.

Did you ever manage to get those performance numbers on 32-bit code with
-fwrapv? In the other thread you volunteered to do that as a means of
demonstrating that at least the +nsw transformations (excluding induction
variable widening) were actually unnecessary for performance, contrary to
everybody else's beliefs (and supporting your own position).

-- Sean Silva