(not) initializing assembly outputs with -ftrivial-auto-var-init

Hi JF et al.,

In the Linux kernel we often encounter the following pattern:

type op(...) {
  type retval;
  inline asm(... retval ...);
  return retval;
}

This pattern is used to implement low-level platform-dependent memory operations.

Some of these operations turn out to be very hot, so we probably don't
want to initialize |retval| given that it's always initialized in the
assembly.

However it's practically impossible to tell that a variable is being
written to by the inline assembly, or figure out the size of that
write.
Perhaps we could speculatively treat every scalar output of an inline
assembly routine as an initialized value (which is true for the Linux
kernel, but I'm not sure about other users of inline assembly, e.g.
video codecs).

WDYT?

Do the assembly routines set the constraints to indicate that they write to the input register? If so, we can do a simple dead store elimination change to notice that the assembly “call” changes the input.

I think this will do what you want.

Please be more specific about the problem, because your simplified example doesn’t actually show an issue. If I write this function:

  int foo() {
    int retval;
    asm("# ..." : "=r"(retval));
    return retval;
  }

it already does get treated as definitely writing retval, and optimizes away the initialization (whether you explicitly initialize retval, or use -ftrivial-auto-var-init).

Example: https://godbolt.org/z/YYBCXL

The constraints allow us to figure out which variables are used as
outputs, but that doesn't necessarily mean these variables are
actually written to.

From LLVM's point of view the statements:

  asm("mov %%rdi, %0" : "=m"(out));
and
  asm("" : "=m"(out));
don't differ much.

Therefore we can't let any analysis depend on the assembly
constraints; we can only use them as a signal (maybe under a flag).
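
To make this concrete, here is a minimal sketch, assuming an x86 target and a GCC-compatible compiler (the function names are hypothetical, and non-x86 targets get a plain C stand-in). Both asm statements present the same "=m" constraint to the compiler, yet only the first one stores anything:

```c
/* Sketch: two asm statements with identical "=m" constraints. */
#if defined(__x86_64__) || defined(__i386__)
#define X86_ASM 1
#else
#define X86_ASM 0
#endif

/* The template really stores 1 into the memory operand. */
int really_writes(void) {
  int out = 7;
#if X86_ASM
  __asm__("movl $1, %0" : "=m"(out));
#else
  out = 1; /* portable stand-in for the store */
#endif
  return out;
}

/* Same constraint, empty template: nothing is stored. The result is 7 only
 * if the compiler kept the initializer; a compiler that trusts "=m" may drop
 * it, and then the value is whatever happened to be in the stack slot. */
int only_claims_to_write(void) {
  int out = 7;
  __asm__("" : "=m"(out));
  return out;
}
```

Calling really_writes() yields 1 regardless of what happens to the initializer, while the result of only_claims_to_write() depends on whether the compiler believed the constraint.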

This is probably because you're passing retval as a register output.
If you change "=r" to "=m" (https://godbolt.org/z/ulxSgx), it won't be
optimized away.
(I admit I didn't know about the difference)
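
For anyone who wants to reproduce the difference locally, a small sketch follows, assuming an x86 target and a GCC-compatible compiler (function names and the constant 42 are made up for illustration; other targets fall back to plain C). Both functions carry an explicit initializer standing in for the store that -ftrivial-auto-var-init would emit. Going by the two Godbolt links above, Clang at the time of this thread removes the initializer in the "=r" version and keeps it in the "=m" version; comparing the -O2 -S output of the two functions should show whether that still holds for a given compiler.

```c
/* Sketch: the same asm with a register output and with a memory output. */
#if defined(__x86_64__) || defined(__i386__)
#define X86_ASM 1
#else
#define X86_ASM 0
#endif

/* Register output: the asm result is handed back in a register. */
int read_reg_output(void) {
  int retval = 0; /* stands in for the auto-init store */
#if X86_ASM
  __asm__("movl $42, %0" : "=r"(retval));
#else
  retval = 42; /* portable stand-in, not the case under discussion */
#endif
  return retval;
}

/* Memory output: the asm is given the stack slot of retval to write to. */
int read_mem_output(void) {
  int retval = 0; /* stands in for the auto-init store */
#if X86_ASM
  __asm__("movl $42, %0" : "=m"(retval));
#else
  retval = 42; /* portable stand-in, not the case under discussion */
#endif
  return retval;
}
```

Both functions return 42; the only thing that differs is whether the store of 0 survives in the generated code.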

I'm also unsure it's at all correct to optimize this store away in the
case of a register output.
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html is vague about
whether a well-formed asm directive is supposed to initialize its
register outputs.

If an asm’s constraints claim that the variable is an output, but then don’t actually write to it, that’s a bug (at least if the value is actually used afterwards). An output-only constraint on inline asm definitely does not mean “pass through the previous value unchanged, if the asm failed to actually write to it”. If you need that behavior, it’s spelled “+m”, not “=m”.

We do seem to fail to take advantage of this for memory outputs (again, this is not just for -ftrivial-auto-var-init; we ought to eliminate manual initialization just the same), which I'd definitely consider a missed-optimization bug.
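
A minimal sketch of the "+m" case, assuming an x86 target and a GCC-compatible compiler (clear_if is a hypothetical helper; non-x86 targets get a plain C stand-in). The asm stores to its operand only on one path, so the previous value has to pass through on the other path, and that is exactly what "+m" declares:

```c
/* Sketch: an asm that writes its operand only conditionally needs "+m". */
#if defined(__x86_64__) || defined(__i386__)
#define X86_ASM 1
#else
#define X86_ASM 0
#endif

/* Sets value to 0 when flag is non-zero, otherwise leaves it untouched. */
int clear_if(int value, int flag) {
#if X86_ASM
  __asm__("testl %1, %1\n\t"
          "jz 1f\n\t"       /* flag == 0: skip the store */
          "movl $0, %0\n"
          "1:"
          : "+m"(value)     /* read-write: the old value may survive */
          : "r"(flag)
          : "cc");
#else
  if (flag)
    value = 0; /* portable stand-in for the conditional store */
#endif
  return value;
}
```

With "=m" in place of "+m" the compiler would be entitled to treat the incoming contents of value as dead, so clear_if(5, 0) would no longer be guaranteed to return 5.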

Agreed on both counts. Maybe we don’t do the optimization because code tends to be wrong? If that’s the case, the optimization would make people sad (even though their code is wrong). We might need to validate asm statement constraints…

I don’t want us to just blindly do the optimization and break code, even if said code was incorrect.

I’m fairly certain it’s simply an accident of how it’s implemented in IR. At IR level, we mostly treat it as a call to a function that takes a pointer to memory. And we don’t have any general function-parameter attribute which says “this call definitely overwrites the memory pointed to by this argument”, nor special handling of the same sort for an inlineasm call. So, as far as the optimizer is concerned, there’s a call with a pointer argument. And, as with any other call, the called “function” may read, write, or do neither to the memory pointed to by the argument.

On the other hand, for register outputs, we do handle it as an output value of the call in IR, which is why that works properly.

GCC does seem to optimize this properly, so I think the risk that fixing this breaks a large body of code is fairly small.
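
A plain C analogy of that IR-level view, as a sketch only (all names are hypothetical, and the volatile function pointer is just a trick to keep the callee opaque within a single file). From the optimizer's side, an asm with an "=m" output looks much like the call below: a pointer escapes into code it cannot see, so the earlier store to retval has to stay.

```c
/* Sketch: an opaque call taking a pointer, analogous to an "=m" asm output. */
static void fill_impl(int *p) { *p = 42; }

/* Calling through a volatile function pointer hides the callee from the
 * optimizer, much like the contents of an asm template are hidden. */
static void (*volatile opaque_fill)(int *) = fill_impl;

int via_opaque_call(void) {
  int retval = 0;       /* not provably dead: the callee might read *p,
                           or return without writing to it */
  opaque_fill(&retval); /* may read, write, or do neither */
  return retval;
}
```

Here the callee happens to write 42, but nothing in the call itself tells the compiler so. A parameter attribute meaning "definitely overwritten" is the missing piece described above.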

Likely, I’m just advocating for caution when doing this.

I sent out my plan to add such an attribute in a prior discussion of auto-init. It’ll pay off in general, but especially so for auto-init. I’ll likely implement it soon.

You mean we assume C code is buggy and asm code is not buggy because
the compiler fails to disprove that there is a bug?
Doing this optimization without -ftrivial-auto-var-init looks
reasonable; compilers do optimizations assuming absence of bugs
throughout. But -ftrivial-auto-var-init is specifically about assuming
these bugs are everywhere.

Does kernel asm use "+m" or "=m"?

If asm _must_ write to that variable, then we could improve DSE in
the normal case (-ftrivial-auto-var-init is not enabled). If
-ftrivial-auto-var-init is enabled, then strictly speaking we should not
remove the initialization, because we did not prove that the asm actually
writes. But we may remove the initialization as well for practical
reasons.

Alex mentioned that in some cases we don't know the actual address/size of
asm writes. But we should know it if a local var is passed to the asm,
which should be the case for kernel atomic asm blocks.

Interestingly, -ftrivial-auto-var-init DSE must not be stronger than
non-ftrivial-auto-var-init DSE, unless we are talking about our own
emitted initialization stores. In that case -ftrivial-auto-var-init DSE
may remove them more aggressively than normal DSE would; we
don't actually have to _prove_ that the init store is dead.

IMO the auto var init mitigation shouldn’t change the DSE optimization at all. We shouldn’t treat the stores we add any differently. We should just improve DSE and everything benefits (auto var init more so).

But you realize that this "just" improve involves fully understanding
static and dynamic behavior of arbitrary assembly for any architecture
without even using integrated asm? :wink:

If you want to solve every problem however unlikely, yes. If you narrow what you’re doing to a handful of cases that matter, no.

How can we improve DSE to handle all main kernel patterns that matter?
Can we? It's still unclear to me. Extending this optimization to
generic DSE and all stores can make it a much harder (unsolvable)
problem...

Right now there's a handful of places in the kernel where we have to
use __attribute__((uninitialized)) just to avoid creating an extra
initializer: https://github.com/google/kmsan/commit/00387943691e6466659daac0312c8c5d8f9420b9
and https://github.com/google/kmsan/commit/2954f1c33a81c6f15c7331876f5b6e2fec0d631f
All those assembly directives are using local scalar variables of size
<= 8 bytes as "=qm" outputs, so we can narrow the problem down to "let
DSE remove redundant stores to local scalars that are used as asm()
"m" outputs".
False positives will surely be possible in theory, but hopefully rare in practice.
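
For readers who have not seen the workaround, here is a sketch of its shape, assuming an x86 target and a GCC-compatible compiler. The NO_AUTO_INIT macro and the longs_equal function are hypothetical and not taken from the linked commits; the attribute is only applied where the compiler reports support for it, and non-x86 targets get a plain C stand-in.

```c
/* Sketch: opting one local out of auto-init when it is an "=qm" asm output. */
#if defined(__has_attribute)
#if __has_attribute(uninitialized)
#define NO_AUTO_INIT __attribute__((uninitialized))
#endif
#endif
#ifndef NO_AUTO_INIT
#define NO_AUTO_INIT /* attribute not available: expands to nothing */
#endif

/* Returns 1 if a == b, else 0; the result comes out of the asm via "=qm". */
int longs_equal(long a, long b) {
  unsigned char res NO_AUTO_INIT; /* no initializing store emitted for res */
#if defined(__x86_64__) || defined(__i386__)
  __asm__("cmp %2, %1\n\t"
          "sete %0"
          : "=qm"(res)
          : "r"(a), "r"(b)
          : "cc");
#else
  res = (unsigned char)(a == b); /* portable stand-in for the asm */
#endif
  return res;
}
```

The narrowed-down DSE improvement would make the annotation unnecessary for this shape of code: a local scalar of at most 8 bytes whose only writer is the asm output.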

I would still love to know what's the main source of truth for the
semantics of asm() constraints.
For example, we've noticed that the BSF instruction, which can be used
as follows:

unsigned long ffs(unsigned long word) {
  unsigned long ret;
  asm("rep; bsf %1,%0" : "=r" (ret) : "rm" (word));
  return ret;
}

isn't guaranteed to initialize its output when |word| is 0
(according to an unnamed Intel architect, it just zeroes out the top 32
bits of the return value).
Therefore the elimination of dead stores to |ret| done by both Clang
and GCC is correct only if the callers are careful enough.
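
A sketch of what "careful enough" could look like on the caller side, assuming an x86 target and a GCC-compatible compiler (both function names and the -1 sentinel are hypothetical; non-x86 targets use __builtin_ctzl as a stand-in). The wrapper never hands 0 to the asm, so the output is written on every path that reaches it:

```c
/* Sketch: guarding the BSF-based helper against a zero argument. */
static unsigned long lowest_set_bit_raw(unsigned long word) {
#if defined(__x86_64__) || defined(__i386__)
  unsigned long ret;
  __asm__("rep; bsf %1,%0" : "=r"(ret) : "rm"(word));
  return ret;
#else
  return (unsigned long)__builtin_ctzl(word); /* stand-in, word must be != 0 */
#endif
}

/* Index of the lowest set bit, or -1 when no bit is set. */
long lowest_set_bit(unsigned long word) {
  if (word == 0)
    return -1; /* never reach the asm with an input it may not handle */
  return (long)lowest_set_bit_raw(word);
}
```

With the guard in place, lowest_set_bit(8) is 3 and lowest_set_bit(0) is -1 without ever executing the instruction on a zero input.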

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
_______________________________________________
cfe-dev mailing list
cfe-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Right, you only need to teach the optimizer about asm that matters. You don’t need “extending this optimization to generic DSE”. What I’m saying is: this is generic DSE, nothing special about variable auto-init, except we’re making sure it helps variable auto-init a lot. I.e. there’s no `if (VariableAutoInitIsOn)` in LLVM, there’s just some DSE smarts that are likely to kick in a lot more when variable auto-init is on.

I don’t think you can trust programmer-provided constraints, unless you also add diagnostics to warn on incorrect constraints.

It doesn't have to be "if (VariableAutoInitIsOn), turn on DSE", it
could be just "If this is an assembly output, emit an
__attribute__((uninitialized)) for it".

But then there's nothing left to trust. We sure don't want to parse the
assembly itself to reason about its behavior, so the constraints are
the only thing that lets us understand whether a variable is going to
be written to.

That’s something I would really rather avoid. It’s much better to make DSE more powerful than to play around with how clang generates variable auto-init.

I think you do want to look into the assembly. Have you tried instrumenting clang to dump out all assembly strings? What is in those strings?