Inline assembly and poison values

Hello world,

I'm currently reviewing some code making heavy use of fast math, and
thus probably generating poison values.

This code uses an inline assembly code block in order to freeze the
poison values, as it's written in Rust and Rust doesn't currently expose
the freeze operation.

My question is, does inline assembly freeze its inputs or does it
propagate poison? I can't find in the documentation any explicit answer
with regards to freezing inline assembly inputs. My reading of the
current documentation for poison would make me think inline assembly
would propagate poison because there's no exception for it, but I'd be
surprised if that were the wanted semantics, as it'd mean inline
assembly would probably need to be UB as soon as any poison is even
transitively (through pointers) passed.

Would it be possible to add an explicit clause to the documentation
indicating whether inline assembly freezes its inputs or if it
propagates poison, or even if it generates UB when being passed a
poisoned value?

To be precise, the code uses an `in(reg) value.as_ptr()` (equivalent to
`"r" (&value)` with clang) with `nostack` and `preserves_flags`. I'd also be
interested in knowing whether `inlateout(reg) value` (approx. equivalent
to `"+r" (value)` with clang) with `pure` would still freeze, as it should
lead to better performance by not requiring LLVM to put the variable in
memory.
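
For readers unfamiliar with the pattern, here is a minimal sketch of the two variants described above. The helper names `launder_through_memory` and `launder_in_register` are hypothetical, the element type `f64` is an assumption, and the asm template is just a comment that mentions the operand. Whether either block actually freezes a poison value is exactly the open question of this thread; the code only shows the shape of the operands and options.

```rust
use std::arch::asm;

/// Hypothetical helper: hands a pointer to the data to an empty asm block,
/// mirroring `in(reg) value.as_ptr()` with `nostack` and `preserves_flags`.
/// Without `nomem`, the compiler must assume the block may touch memory.
fn launder_through_memory(value: &mut [f64]) {
    unsafe {
        asm!(
            "/* {0} */", // comment-only template that still names the operand
            in(reg) value.as_ptr(),
            options(nostack, preserves_flags),
        );
    }
}

/// Hypothetical helper: keeps the value in a register, mirroring
/// `inlateout(reg) value` with `pure`. `pure` needs an output operand and
/// must be combined with `nomem` or `readonly`; `nomem` is assumed here.
fn launder_in_register(mut value: f64) -> f64 {
    unsafe {
        asm!(
            "/* {0} */",
            inlateout(reg) value,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    value
}

fn main() {
    let mut buf = [1.0f64, 2.0, 3.0];
    launder_through_memory(&mut buf);
    let x = launder_in_register(1.5);
    println!("{x} {buf:?}");
}
```

Since the asm body is empty, both helpers leave ordinary (non-poison) values unchanged; the register variant simply avoids forcing the value out to memory.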

Either way, thank you for LLVM, it's an awesome piece of software!

Best,
  Léo

I very much hope inline asm can (in general) act like a freeze
but does not have to. That is, if we ever look into the box we
can determine if it does freeze or not, and consequently use the
information for follow-up argumentation. However, unless we look
into the box we cannot assume anything. Hence, asm does not
propagate poison but also does not freeze the inputs. That means
we shall not propagate poison through (uninterpreted) asm but also
not remove a subsequent freeze under the assumption the asm would
have implicitly frozen the poison already.
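
The rule above can be illustrated with a toy model. This is only a sketch of the reasoning, not LLVM code: the `PoisonState` type and every function below are hypothetical names introduced for illustration. An uninterpreted asm block maps any input to "unknown", so an optimizer may neither fold its result to poison nor drop a later freeze.

```rust
/// What an analysis may know about a value (toy abstraction, not an LLVM API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PoisonState {
    NotPoison,
    Poison,
    Unknown,
}

/// An ordinary propagating operation (think of a plain `add` without flags):
/// poison on either side poisons the result.
fn through_add(lhs: PoisonState, rhs: PoisonState) -> PoisonState {
    use PoisonState::*;
    match (lhs, rhs) {
        (Poison, _) | (_, Poison) => Poison,
        (NotPoison, NotPoison) => NotPoison,
        _ => Unknown,
    }
}

/// `freeze` always yields a value that is not poison.
fn through_freeze(_input: PoisonState) -> PoisonState {
    PoisonState::NotPoison
}

/// Uninterpreted inline asm: nothing may be assumed about the output.
fn through_opaque_asm(_input: PoisonState) -> PoisonState {
    PoisonState::Unknown
}

/// A freeze is redundant only if its operand is known not to be poison.
fn may_remove_freeze(operand: PoisonState) -> bool {
    operand == PoisonState::NotPoison
}

/// A value may be replaced by poison only if it is known to be poison.
fn may_fold_to_poison(value: PoisonState) -> bool {
    value == PoisonState::Poison
}

fn main() {
    let poisoned = through_add(PoisonState::Poison, PoisonState::NotPoison);
    let after_asm = through_opaque_asm(poisoned);
    println!("after add: {poisoned:?}, after asm: {after_asm:?}");
    println!("fold asm result to poison? {}", may_fold_to_poison(after_asm));
    println!("remove freeze after asm?   {}", may_remove_freeze(after_asm));
    println!("after explicit freeze:     {:?}", through_freeze(after_asm));
}
```

In this model, "looking into the box" would correspond to replacing `through_opaque_asm` with a more precise transfer function for a specific asm body.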

That all said, whatever we come up with needs documentation for
sure.

~ Johannes

Do I understand correctly if I say that this means that for defining
proper semantics of the assembly+IR group, this would require defining,
for each assembly backend, what “poison” translates to and from for it?

Or maybe documentation could just say “what exactly ‘poison’ means is
backend-specific, and the outputs of an assembly block handling poisoned
data can be, or not, poisoned depending on each backend” or something
similar, thus postponing the specification work for later?

Though it'd probably be better if it were possible to have a full spec
of what exactly poison means for each backend, I guess it can take a
while to check exactly how each poison can arise and what it should
translate to for each backend in order to enable as many optimizations
as possible.

I think the second solution is what you want. We also do not
define the semantics of inline ASM, so how could we say what
it means if poison goes in. Inline ASM, as it stands, should
always be allowed to produce poison, write poison into output
registers, etc. If we want ways to encode it does not, that's
a different story. That said, I'm not sure why explicit freeze
would be bad for the output, and for the input we probably
don't care as we do not "analyze" the asm anyway. No?

~ Johannes

It totally makes sense to me! Out of curiosity, what is the process for
adding a clause like the one below to the reference? (E.g. waiting for some
delay, then submitting a formal change, getting it reviewed and landing it?)

What exactly ‘poison’ means is backend-specific, and the outputs of
an assembly block handling poisoned data can be, or not, poisoned
depending on each backend and the exact contents of the assembly
block. In particular, the backend is allowed to peer into the
assembly block and optimize depending on that.

PS: Sorry for the duplicate mail on your personal email, I forgot to use
the correct sender address.

"like the below"? I'm confused. Adding an IR extension requires
an RFC on the mailing list; look for recent ones to see what format
they can have: motivation, semantics, etc. Then (or together) a
patch for the LangRef and the IR reader/writer etc. Then preferably
some users in the code base so it's not just pretty but useful :wink:

Does that answer the question?

~ Johannes

Hmm, so just to confirm, the thing I was thinking of adding was just the
following paragraph (which was maybe formatted poorly over email?). To the
best of my understanding it's not actually an IR extension, just
an explicit statement in the reference of the conclusion of the current
discussion about the wanted semantics of freezing and inline assembly.
Does that still need an RFC, or is there a lighter process for such
documentation changes? (Especially as this paragraph doesn't actually
specify anything new to the best of my knowledge; it just explicitly
states the absence of guarantees that were previously only implicit.)

Said paragraph:
----------8<----------
What exactly ‘poison’ means is backend-specific, and the outputs of an
assembly block handling poisoned data can be, or not, poisoned depending
on each backend and the exact contents of the assembly block. In
particular, the backend is allowed to peer into the assembly block and
optimize depending on that.
---------->8----------