The semantics of nonnull attribute

I guess in many cases Rust/Swift functions can be lowered into IR functions with not_poison flags attached (though I’m not a Rust/Swift expert, so this is just a guess).
For C, people may want to allow idioms like the one below, so further investigation is needed:

int x, y;
if (cond) x = 1; else y = 1;
f(cond, x, y);

The same thing can be done in Rust, though I expect it to be less common:

let mut x = MaybeUninit::<i32>::uninit();
let mut y = MaybeUninit::<i32>::uninit();
unsafe {
if cond {
x.as_mut_ptr().write(1);
} else {
y.as_mut_ptr().write(1);
}
}
f(cond, x, y);

Jacob Lifshay

Hello Philip,

I agree that the current (UB) semantics is simpler in terms of interprocedural analysis.

My concern was that the semantics of nonnull was too strong (too undefined) for optimizations that attach nonnull, such as the InstCombine optimization. It seems there is a class of such transformations, such as deriving nonnull from inttoptr(or x, 1).

If violating nonnull raises UB, attaching nonnull based on this is problematic, as these operations don’t raise UB but propagate poison.
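
As a rough illustration of why nonnull can be derived from inttoptr(or x, 1), here is a minimal C sketch of the integer side of that pattern. The function names are hypothetical and chosen only for this example. C has no notion of poison, so the sketch covers only well-defined inputs; the concern above is about what happens when x itself is poison, in which case the `or` yields poison rather than UB.

```c
#include <assert.h>
#include <stdint.h>

/* Integer analogue of inttoptr(or x, 1): setting the low bit makes the
   result nonzero for every well-defined input x. */
uintptr_t set_low_bit(uintptr_t x) {
    return x | (uintptr_t)1;
}

/* Returns 1 when the tagged value is nonzero, which is the property a
   nonnull deduction on the resulting pointer relies on. */
int tagged_is_nonnull(uintptr_t x) {
    return set_low_bit(x) != 0;
}
```

For any concrete x the result is nonzero, so the deduction holds whenever the input is not poison.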

Alternatively, this could be supported by adding a ‘nonnull_or_poison’ attribute.
Would this make sense? It wouldn’t touch the existing semantics of nonnull.

Also, if an argument is both not_poison and nonnull_or_poison, the two attributes can be merged into nonnull.

Juneyoung Lee

Hello,

> Would it be correct to resolve this by saying that dereferenceable(N)
> *implies* not_poison? This would be helpful as a clarification of how
> it all fits together.

Yes, I think it makes sense.

I don't think we should do that.

Take the `gep inbounds` example:

char* foo(char *arg) {
  return `gep inbounds %arg, -100`
}

Here it depends on whether we want to deduce that the output is
dereferenceable(100). If we do, a violated dereferenceable needs to
yield poison, as with nonnull, because it is derived from poison. Only
if we don't derive dereferenceable for the return value can we make
dereferenceable violations UB.

In the end, I think, it boils down to the question of whether there are
situations where violation of some attributes should be poison and
violation of others should be UB. If such situations exist, it is
unclear to me what makes the UB/poison ones special.

Hello,

I don’t understand this. Why should the transformed code raise UB when
the original did not? I assume the loop is executed for sure. If the
loop is not executed for sure, the situation is different, but then I
don’t think the hoisting is “sound”. (Btw. nonnull should also be
applicable to integers.)

Sorry, I forgot to mention the assumption that the loop may not iterate at all, in which case the original code does not raise undefined behavior.

I am unsure if there are opportunities where you need “poison” for
nonnull and “UB” for the rest. With not_poison we get poison for
all or UB for all. We can think of other combinations but we should
determine if we actually need them.

I think at least a dereferenceable argument cannot have poison. It should guarantee that dereferencing it is not UB, so its memory access can be freely code-motioned, as shown in Nuno’s example.
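
To make the code-motion point concrete, here is a small C sketch of hoisting a load out of a loop that may run zero times. It is an illustration written for this summary with hypothetical function names, not the example referred to above. The hoisted version reads through the pointer unconditionally, so it is only equivalent to the original if the pointer is guaranteed to be dereferenceable on entry.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: *p is only read if the loop runs at least once, so a
   call with n == 0 never touches the pointer. */
int sum_in_loop(const int *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *p;
    return total;
}

/* Hoisted form: *p is read before the loop, even when n == 0. This is
   only a valid rewrite if p is known to be dereferenceable. */
int sum_hoisted(const int *p, size_t n) {
    int value = *p;
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += value;
    return total;
}
```

With a valid pointer both functions return the same result. With n == 0 and an invalid pointer, only the hoisted form performs an invalid access, which is why the guarantee has to hold at the call boundary rather than at the first use.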

This doesn’t mean that all attributes should raise UB, however, as discussed so far.
Conversely, I have been thinking about having only a small set of attributes that are allowed to be poison, such as a new ‘nonnull_or_poison’.
This can be gracefully merged with ‘not_poison’ and be promoted to ‘nonnull’.

Hi Johannes,

> Hello,
>
> > Would it be correct to resolve this by saying that dereferenceable(N)
> > *implies* not_poison? This would be helpful as a clarification of how
> > it all fits together.
>
> Yes, I think it makes sense.

I don't think we should do that.

Take the `gep inbounds` example:

char* foo(char *arg) {
  return `gep inbounds %arg, -100`
}

Here it depends if we want to deduce the output is dereferenceable(100)
or not. If we do, we need dereferenceable to mean poison if violated, as
with nonnull, because it is derived from poison. Only if we don't derive
dereferenceable for the return value we can go for dereferenceable
violations are UB.

That's a fair point. The same kind of argument actually applies to an
analog of example 4 that started this thread. If the argument is
dead, then:

f(dereferenceable(N) %ptr)
==>
f(dereferenceable(N) undef)

... would introduce UB and therefore be forbidden. So if that example
serves as motivation for making nonnull weaker and introducing
not_poison, then it should really serve as motivation for making _all_
function argument attributes weaker.

Besides, there's a certain elegance in treating _all_ attributes on
function arguments the same:

* passing a function argument that does not satisfy the attribute
constraints turns the corresponding value into poison
* passing poison for a not_poison argument causes immediate UB (and
this rule applies _after_ the first rule had a chance to poison
things)

That seems like it would be far less error prone than having to
remember on a case-by-case basis whether certain attributes cause
poison or immediate UB.
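
The two rules can be sketched as a toy model in C. Everything here (the `ArgValue` type, the `Outcome` enum, and the function names) is hypothetical and exists only to illustrate the ordering of the rules; it is not how LLVM represents values or attributes, and nonnull stands in for any property attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: an argument is either poison or a concrete
   address. */
typedef struct {
    bool is_poison;
    uintptr_t addr;
} ArgValue;

typedef enum { OUTCOME_OK, OUTCOME_POISON, OUTCOME_UB } Outcome;

ArgValue concrete_arg(uintptr_t addr) {
    ArgValue v = { false, addr };
    return v;
}

ArgValue poison_arg(void) {
    ArgValue v = { true, 0 };
    return v;
}

/* Rule 1: a value that violates a property attribute (here: nonnull)
   is turned into poison. */
ArgValue apply_nonnull(ArgValue v) {
    if (!v.is_poison && v.addr == 0)
        v.is_poison = true;
    return v;
}

/* Rule 2 is checked after rule 1: poison reaching a not_poison
   argument is immediate UB. */
Outcome pass_argument(ArgValue v, bool has_nonnull, bool has_not_poison) {
    if (has_nonnull)
        v = apply_nonnull(v);
    if (v.is_poison)
        return has_not_poison ? OUTCOME_UB : OUTCOME_POISON;
    return OUTCOME_OK;
}
```

In this model, passing null to an argument marked only nonnull gives poison, while passing null to an argument marked both nonnull and not_poison gives UB, because the first rule poisons the value before the second rule is checked.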

Cheers,
Nicolai

Hello,

Would it be correct to resolve this by saying that dereferenceable(N)
*implies* not_poison? This would be helpful as a clarification of how
it all fits together.

Yes, I think it makes sense.

I don't think we should do that.

Take the `gep inbounds` example:

char* foo(char *arg) {
  return `gep inbounds %arg, -100`
}

Here it depends if we want to deduce the output is dereferenceable(100)
or not. If we do, we need dereferenceable to mean poison if violated, as
with nonnull, because it is derived from poison. Only if we don't derive
dereferenceable for the return value we can go for dereferenceable
violations are UB.

Can you please clarify what it means for the output of dereferenceable to be poison? If we tag a memory address as dereferenceable, is the optimizer free to insert a load of the address immediately following that? Or do we need to see some other access (prior to any thread synchronization?) to say that's valid?

Thanks again,

Hal

In the end, I think, it boils down to the question of whether there are
situations where violation of some attributes should be poison and
violation of others should be UB. If such situations exist, it is
unclear to me what makes the UB/poison ones special.

Two thoughts:

  1. I think that we should aim for regularity, to the extent possible, and so we should treat nonnull, align, etc. similarly w.r.t. whether they produce poison or UB.

  2. I was thinking about the following last night, and it clarified for me why having a not_poison attribute makes sense and seems useful, and how poison/UB might affect things on a function-call boundary itself. Imagine that we had a fastcc lowering strategy that took a pointer argument with an alignment attribute, followed by a suitably small integer argument, and implemented a calling convention that passed both in the same register. If the pointer value might be poison, and thus violate the alignment attribute (or might violate the alignment attribute otherwise and produce poison), then we can’t implement this just by ORing together the two values (to pass them in the one register). We need to mask off the low bits first. If the value can’t be or generate poison, and violating the alignment constraint produces UB, then the masking is not needed and we can just OR together the two values (confident that the low bits of the pointer will always be zero).
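
The register-packing idea in the second point can be sketched in C as follows. This is a minimal illustration under stated assumptions: an 8-byte alignment (three free low bits), hypothetical names throughout, and a bitwise OR to combine the two values, since packing into disjoint bit ranges requires OR. It is not an actual fastcc lowering.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the pointer is 8-byte aligned, so its
   three low bits are free to carry a small integer. */
#define TAG_BITS 3u
#define TAG_MASK (((uintptr_t)1 << TAG_BITS) - 1)

/* Packs without masking the address. Only correct if the low bits of
   addr are guaranteed to be zero (alignment violation would be UB). */
uintptr_t pack_trusting_alignment(uintptr_t addr, unsigned small) {
    return addr | (small & TAG_MASK);
}

/* Packs defensively: clears the low bits of addr first, which is what
   is needed if the alignment guarantee might not hold. */
uintptr_t pack_with_mask(uintptr_t addr, unsigned small) {
    return (addr & ~TAG_MASK) | (small & TAG_MASK);
}

uintptr_t unpack_addr(uintptr_t word) {
    return word & ~TAG_MASK;
}

unsigned unpack_small(uintptr_t word) {
    return (unsigned)(word & TAG_MASK);
}
```

With an aligned address both packing functions round-trip the small integer. With a misaligned address, the unmasked version corrupts the small integer, which is why the extra masking instruction can only be dropped when the alignment guarantee is unconditional.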

-Hal

Hello all,

The problem with defining all attributes as yielding poison is that certain attributes are not meaningful if they yield poison, or are only meaningful when combined with not_poison.

If a dereferenceable argument can be poison, I don’t think the attribute will be useful on its own, because existing analyses cannot conclude that accessing the pointer is okay; to reach that conclusion, it always has to be accompanied by not_poison. Forgetting the not_poison check will lead to a bug.
Among other attributes, byval seems problematic to me.
The value pointed to by a byval pointer is copied to a temporary space, and the temporary address is passed to the callee.
This means that a byval argument should raise UB if it is not dereferenceable; f(byval null); raises a segmentation fault even if the null pointer is not used inside f(). (https://godbolt.org/z/sNC_RF) So we cannot define it as f(poison).

If the semantics of attributes should be consistent, I suggest that UB should be the one. not_poison will also raise UB if the input is poison, so it satisfies the consistency as well.
The gep inbounds optimization should be fixed then. For dead argument elimination and function call hoisting, we should be able to drop the relevant attributes.

Best regards,
Juneyoung Lee

The problem with defining all attributes as yielding poison is that certain attributes are not meaningful if they yield poison, or are only meaningful when combined with not_poison.
If a dereferenceable argument can be poison, I don’t think the attribute will be useful on its own, because existing analyses cannot conclude that accessing the pointer is okay; to reach that conclusion, it always has to be accompanied by not_poison. Forgetting the not_poison check will lead to a bug.
Among other attributes, byval seems problematic to me.
The value pointed to by a byval pointer is copied to a temporary space, and the temporary address is passed to the callee.
This means that a byval argument should raise UB if it is not dereferenceable; `f(byval null);` raises a segmentation fault even if the null pointer is not used inside f(). (Compiler Explorer) So we cannot define it as f(poison).

byval seems sufficiently different though. The attributes that have
been mostly discussed in this thread (nonnull, dereferenceable, align)
are attributes that claim a property of the value passed into the
argument. byval is about how the value is passed into the function.
There are other attributes that fall into different categories, such
as nocapture and nofree.

Cheers,
Nicolai