[llvm] r175553 - Fix a bug in mayHaveSideEffects. Functions that do not return are now considered as instructions with side effects.

Hi Nadav,

I almost missed your email because you replied only to the list. I understand your argument. I think that my fix addresses a part of it. It says that instructions that do not return should not be removed.
The current implementation of mayReturn is to check the 'noreturn' function attribute. Are you suggesting to add a new attribute that is called 'mustreturn' ? It is as equally difficult to know if a function will halt.

a function would only be 'mustreturn' [*] if you can prove that it must return
in finite time. By default functions would not be 'mustreturn'. Of course it
is not always possible to prove that a function must return even when it does;
in these cases the function would not be marked 'mustreturn', because you only
mark functions that you can prove must return. This is analogous to 'noreturn'
where you only mark a function 'noreturn' if you can prove it doesn't return;
in difficult cases where it doesn't return but you failed to prove it then the
function is not marked 'noreturn'.

The big difference between the mustreturn and noreturn approaches to the
infinite loop problem is that when the noreturn approach gets it wrong then
you get miscompilations, while when the mustreturn approach gets it wrong you
get missed optimizations, not miscompilations. Summary: the MR approach is
always conservatively correct, while the NR rule is not.

Consider the different cases, comparing what happens with the following two
rules for a function (a function which otherwise wouldn't be considered to have
side effects eg because it is readonly and nounwind):

   NR rule: the function is considered to have side-effects iff it has the noreturn attribute;

   MR rule: the function is considered to have side-effects iff it does not have the mustreturn attribute.

Case 1: a function infinite loops in an obvious way. The function will be
marked NR, and will not be marked MR. Thus both rules say this function has
side effects.

Case 2: a function which obviously does not infinite loop. The function will
not be marked NR, and will be marked MR. Thus both rules say that this
function does not have side effects.

Case 3: a function which always or sometimes infinite loops, but it is too hard
for the optimizers to work this out. Then the function will not be marked NR
(because we failed to prove it is NR, even though it may be), and will not be
marked MR (because we must fail to prove it is MR, since it is not MR). Then
the NR rule will say that the function *does not have side-effects*, which is
wrong, and may lead to misoptimization. The MR rule says the function *does
have side-effects* which is correct.

Case 4: a function which never infinite loops, but it is too hard for the
optimizers to work this out. Then the function will not be marked NR (because
we must fail to prove it is NR, since it is not NR), and will not be marked MR
(because it was too hard to prove). Then the NR says that the function does not
have side-effects, which is correct. The MR rule says that the function does
have side-effects, which is wrong, but only results in missed optimizations,
*not* miscompilations.

Notice how in every case in which the NR rule says the function has side-effects
then the MR rule also says it has side-effects. Thus if we implement the MR
rule then the NR rule you added is not useful. On the other hand, as these
examples show, the NR rule is inadequate because it doesn't always say you have
side-effects when sometimes you have. By contrast, the MR rule is always
conservatively correct.

Ciao, Duncan.

[*] The difference between 'mustreturn' and my original 'finite' is the question
of what you say for functions containing a call to 'exit' or something like
that. Such a function doesn't return so 'mustreturn' would be confusing, on the
other hand it completes in finite time, thus 'finite' is not unreasonable. To
some extent which version is better is a judgement call, but I think the best
approach may fall out naturally once someone tries to implement a pass that
computes these attributes, as probably one of them is simpler to reason about,
but for the moment I don't know which one it is.