killing undef and spreading poison

  1. —————
    Dan,
    The reasoning given below for how GVN operates seems odd to me,

Poison is an attribute of a value, just like nsw is an attribute of an operation,

So when GVN sees a pair of equal values, one of which has an extra attribute,
The proper choice for representative value is the one without the attribute,

Just like when GVN sees a pair of add operations, one of which has an extra attribute,
The proper choice for representative is the one without the attribute

  1. —————
    Nuno,
    If Dan agrees with the above then can we come up with a better example
    for why branch-on-poison should be “undefined behavior”.

Thoughts ?
Comments ?
Questions ?

Peter Lawrence.

1. —————
Dan,
       The reasoning given below for how GVN operates seems odd to me,

Poison is an attribute of a value, just like nsw is an attribute of an
operation,

So when GVN sees a pair of equal values, one of which has an extra
attribute,
The proper choice for representative value is the one without the
attribute,

There are many many choices of what make *good* representative values in
NewGVN.
I would deny any assertion that there is a proper choice, and that if there
is one, it must be what you suggest :slight_smile:
You actually can prove that there is no optimal such choice. Different
choices will lead you to discover different congruences in any practical
algorithms, even complete ones (as the issues here are orthogonal to
herbrand equivalence., and instead related to inference of facts from
control flow and other things, as well as what the results from symbolic
evaluation and simplification)
There are examples in the paper cited by newgvn of situations where
different choices of values for predicate and phi inference will enable you
to discover different congruences.

More importantly, given the other purpose of the representatives is to be
used during simplification and symbolic evaluation, something we know we
*cannot* possibly be optimal at, i do not believe you can assert that there
is such an easily knowable best representative.

There is also no representation we could pick that would avoid all issues
for others

Thoughts ?
Comments ?
Questions ?

The middle one.

I'm going to be frank and honest:
I don't feel like i have a good understanding of what your goal here is, or
what you see your role as.
I can tell you, again, completely personally, that i find your approach
off-putting at times.
To give a concrete example, here, what i see (again, from *my*
perspective), is you are asserting things to me and asking me to confirm
them or disprove them. I generally have not found this to be a good
collaborative approach to increasing understanding. You then say"If Dan
agrees with the above then can we come up with a better example". In
practice, so far, that has meant "can *Nuno* come up with a better example"
(IE the we seems to be turning into work for others).

I do not say this as a true complaint so much as a explanation of why i
don't feel like i understand your perspective on things, and i think it
would be helpful for me to be able to.
I also certainly do not claim that there are not multiple ways to approach
problem solving, communities, etc, or that what I or anyone else does is
best.

Daniel,
Thanks for taking the time to respond.

Regarding GVN and newGVN, I recently finished a search through the
llvm-dev archives for “nsw” in the subject line, and GVN was discussed in
some of those threads [1].

In particular it was claimed that there was a right choice for GVN to make given
two ADD instructions, one with the “nsw” attribute and one without, the one
without ‘nsw’ must be the representative.

This was described as a correctness issue, not an optimality issue. IE if you
are going to CSE those two ADDs, it is safe to loose information (drop the ‘nsw’)
but it is unsafe to put ‘nsw’ on to an operation that didn’t already have it
because that is like inserting an “llvm.assume" where there was none.

So I am extrapolating from ’nsw’ instruction attribute, to ‘poison’ value attribute,
And wondering if (expecting that) you think the same logic applies.

I make no claim as to optimality, which seems to be what you are getting at.

Are you saying the post I am citing is incorrect ?
Or that I misunderstood it ?
Or could it be that you misread my email ?

Yes I am hoping Nuno can come up with a better example. There is a claim
that “branch on poison is undefined behavior” is the right definition, which is
not justified in the Paper [2], which John is waffling on, which no one seems to
be able to justify, and which gives the appearance at least of flying in the face
of common sense.

This is the IR definition we’re talking about here, the literal heart and soul of
the compiler, everyone should be at least a little bit concerned.

Peter Lawrence.

[1. I’m having difficulty locating the post, here’s one that is similar

Tue Jul 22, 2014 “On semantics of add instruction - nsw, nuw”

It also occurs to me that the more detailed post I am remembering but
can’t find might have been about SCEV rather than GVN ]

[2. “Taming undefined behavior in LLVM” ]

The GVN example we provided was supposed to be a sufficient argument.
But.. see section 3.2 of the paper you mentioned for another example of why a non-deterministic jump on poison/undef is not a good idea.
Alternatively see slide 12 of this: http://llvm.org/devmtg/2016-11/Slides/Lopes-LongLivePoison.pdf
And a miscompilation: https://bugs.llvm.org/show_bug.cgi?id=21412

See also my reply to your email yesterday. You cannot drop poison like you drop nsw; that doesn't exist (currently).

Nuno

Citando Peter Lawrence via llvm-dev <llvm-dev@lists.llvm.org>:

Nuno,
I am confused, you have not shown why branch on poison should
be defined to be UB, you have shown why “each use of ‘undef’ can yield
a different result” is a bad definition.

So we still don’t have a good reason that justifies “branch on poison is UB”.

Peter Lawrence.

The GVN example we provided was supposed to be a sufficient argument.
But… see section 3.2 of the paper you mentioned for another example of why a non-deterministic jump on poison/undef is not a good idea.

Here’s what 3.2 says:

“Since each use of ‘undef’ can yield a different result, we can have the top level if-condition being true and still divide by zero”,

it says nothing about branch on undefined or branch on poison.

Alternatively see slide 12 of this: http://llvm.org/devmtg/2016-11/Slides/Lopes-LongLivePoison.pdf

Slide 12 is an exact duplicate of the example in the Paper.

It actually has arrows pointing to the two uses of ‘k’ that can have different values
because of the bad definition of “undef"

And a miscompilation: https://bugs.llvm.org/show_bug.cgi?id=21412

Again, this bug shows the exact same thing once you reduce it to simplest form.

Daniel,
           Thanks for taking the time to respond.

This is going to be my last response.

Regarding GVN and newGVN, I recently finished a search through the
llvm-dev archives for “nsw” in the subject line, and GVN was discussed in
some of those threads [1].

In particular it was claimed that there was a right choice for GVN to make
given
two ADD instructions, one with the “nsw” attribute and one without, the one
without ‘nsw’ must be the representative.

GVN does not choose representatives based on the kind of attributes you
refer to because they are not properties of the value for the purposes of
representation, nor do i see how it would ever make sense to do so.
GVN can always patch the instructions after the fact to make sure the set
of attributes of the *operations* where the instructions occur is correct
for where they came from.

This was described as a correctness issue, not an optimality issue. IE if
you
are going to CSE those two ADDs, it is safe to loose information (drop the
‘nsw’)
but it is unsafe to put ‘nsw’ on to an operation that didn’t already have
it
because that is like inserting an “llvm.assume" where there was none.

This is unrelated to the choice of representative, and related to ensuring
that operations are modified for the properties of the representative
chosen.
Anything else would avoid value numbering

"add a, b" and "add nsw a, b"

to the same thing, when at worst, i just may have to drop the NSW if i
replace the second with the first