Message: 5
Date: Tue, 25 Jul 2017 00:12:35 -0700
From: Sean Silva via llvm-dev <llvm-dev@lists.llvm.org>
To: Peter Lawrence <peterl95124@sbcglobal.net>
Cc: llvm-dev <llvm-dev@lists.llvm.org>, John Regehr
<regehr@cs.utah.edu>
Subject: Re: [llvm-dev] GEP with a null pointer baseSean,
Dan Gohman’s “transform” changes a loop induction variable, but does not
change the CFG,Hal’s “transform” deletes blocks out of the CFG, fundamentally altering it.
These are two totally different transforms.
And even the analysis is different,
The first is based on an assumption of non-UB (actually there is no
analysis to perform)the second Is based on a proof of existence of UB (here typically some
non-trivial analysis is required)These have, practically speaking, nothing in common.
Ah, I think I see your point of confusion now. They are actually the same
thing, but one has a hidden step. You can consider the induction variable
transformation to be something like this:a = add nsw i32 b, c
- ==> widen
a
and expand the add into:if ("add i32 b, c" does not overflow) { a = add nsw i64 b, c } else { a = (cast to i64) add nsw i32 b, c }
- ==> if the
else
branch executes, then we provably have UB.a = add nsw i64 b, c
The last step is the same as in Hal’s example where we delete a part of the
CFG by proving that if we reach it there is guaranteed to be UB. In Hal’s
example, the frontend marked the shift as having UB if the shift amount
exceeds the bitwidth. In this case, the frontend marked it as
non-overflowing. In this case, the fact that it is guaranteed to overflow
in theelse
branch has to be proven by looking at a dominating
conditional check instead of being manifest from a peephole observation
(after inlining/constant folding) as it is in Hal’s example.Hopefully that makes sense. If not, could you indicate specifically which
of transformations 1. and/or 2. you think is illegal and why?– Sean Silva
Sean,
Yes I can see your point, but in practice that’s not how it’s done,
no one expands the “+nsw” into control flow and then applies Hal’s
transform to it to get Dan’s transform. Dan’s transform doesn’t alter
the original CFG and is done as part of induction-variable analysis
and simplification. Hal’s does alter the original CFG, and uses a totally
different analysis as well as different transformation. The only thing they
have in common is the assumption that the program is UB-free at runtime,
but they have nothing in common in implementation.
So I’m not saying your transforms 1 & 2 are illegal, I’m saying they are
unnecessary.
And we’re missing the point, which is this, why ever delete UB code
which can cover up a bug, and more importantly, why ever make that
the default, given the emphasis in software engineering of avoiding UB
at the source level. Why are we implementing both -fsanitize=undefined
and deleting undefined as an optimization.
Peter.