returns_twice / noreturn

Hello,

I am not sure about the semantics (if any) of the returns_twice and noreturn attributes.

int fork() __attribute__((returns_twice));
void join(int) __attribute__((noreturn));

int f(int n) {
  int t = fork();
  n++;
  if (t != 0)
    join(t);
  return n;
}

Produces the following LLVM IR:

; Function Attrs: nounwind uwtable
define i32 @f(i32 %n) local_unnamed_addr #0 {
entry:
  %call = call i32 (...) @fork() #3
  %cmp = icmp eq i32 %call, 0
  br i1 %cmp, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  call void @join(i32 %call) #4
  unreachable

if.end:                                           ; preds = %entry
  %inc = add nsw i32 %n, 1
  ret i32 %inc
}

; Function Attrs: returns_twice
declare i32 @fork(...) local_unnamed_addr #1

; Function Attrs: noreturn
declare void @join(i32) local_unnamed_addr #2

Here the n++ has been moved after the if. Is that legal?

Also, technically, f itself could be returns_twice or noreturn (depending on the return values of fork).

So my question is: do these attributes have semantics, or are they only "clues" for heuristic purposes?

Why wouldn't it be?

They have semantics. returns_twice, however, really means, "it may return more than once". noreturn is interpreted as the name implies. Thus the unreachable after the call.

-Hal

Because fork() could return 0, then n gets incremented (first time), and we go
into join(t), which does not return... but jumps back into fork(), which returns
again, but 1 this time; then n gets incremented (second time), and we return
n+2.

If we move the n++ outside of that "region", don't we change those semantics?
Basically, returns_twice and noreturn have SSA-reaching side effects.

That means we can encode a loop this way? :)

This is a valid transformation, which is why, to get the effect you
want in C/C++, the variable must be marked volatile.

That's correct. The relevant semantics here come from C's setjmp/longjmp, and there's an exception in that language to deal with this situation:

7.13.2.1p3: "All accessible objects have values, and all other components of the abstract machine have state, as of the time the longjmp function was called, except that the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and longjmp call are indeterminate."

We should probably import some version of this into the LangRef to more-accurately describe returns_twice/noreturn (because that's what we actually implement in this regard).

  -Hal
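
To make the quoted rule concrete, here is a minimal sketch using setjmp/longjmp directly in place of the hypothetical fork()/join() from the original question. The names env, jump_back and count_returns are made up for illustration, and __attribute__((noreturn)) assumes a GCC- or Clang-compatible compiler. The counter is declared volatile, so under 7.13.2.1p3 its value after the longjmp is well defined.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical names for illustration only. */
static jmp_buf env;

/* Plays the role of join(): it never returns to its caller. */
static void jump_back(int value) __attribute__((noreturn));

static void jump_back(int value) {
    assert(value != 0);   /* a nonzero value marks the second return */
    longjmp(env, value);
}

/* Plays the role of f(). The counter is modified between the setjmp call
   and the longjmp call, so it must be volatile for the read after the
   second return to have a determinate value. */
int count_returns(int n) {
    volatile int counter = n;
    if (setjmp(env) == 0) {
        counter++;        /* first return from setjmp */
        jump_back(1);     /* control re-emerges from setjmp, nonzero */
    }
    counter++;            /* reached only after the second return */
    return counter;       /* n + 2 with the volatile qualifier */
}
```

Calling count_returns(5) yields 7. Without the volatile qualifier the value of counter after the longjmp would be indeterminate, which is what permits the optimizer to move the increment.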

On Fri, Nov 3, 2017 at 6:06 PM, Hal Finkel via llvm-dev wrote:

That's correct. The relevant semantics here come from C's setjmp/longjmp,
and there's an exception in that language to deal with this situation:

7.13.2.1p3: "All accessible objects have values, and all other components of
the abstract machine have state, as of the time the longjmp function was
called, except that the values of objects of automatic storage duration that
are local to the function containing the invocation of the corresponding
setjmp macro that do not have volatile-qualified type and have been changed
between the setjmp invocation and longjmp call are indeterminate."

We should probably import some version of this into the LangRef to
more-accurately describe returns_twice/noreturn (because that's what we
actually implement in this regard).

We do not implement those restricted semantics correctly either -- see
https://bugs.llvm.org/show_bug.cgi?id=27190

IMO the Right(TM) fix is to add a CFG edge from all possibly
longjmp'ing function calls to all setjmps in a function. We can
probably do this by modeling the possibly longjmp'ing calls as invokes
that unwind to a special "setjmp" landingpad.

-- Sanjoy
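
As a rough illustration of the CFG-edge idea, here is a toy sketch in C. It is not LLVM code: struct toy_cfg, its fields and add_setjmp_edges are hypothetical names, and the adjacency matrix stands in for a real CFG. It only shows the shape of the proposal, one extra edge from every block that may longjmp to every block that calls setjmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 8

/* Hypothetical toy CFG; none of these names come from LLVM. */
struct toy_cfg {
    size_t num_blocks;
    bool calls_setjmp[MAX_BLOCKS];      /* block contains a returns_twice call */
    bool may_longjmp[MAX_BLOCKS];       /* block contains a call that may longjmp */
    bool edge[MAX_BLOCKS][MAX_BLOCKS];  /* edge[from][to] */
};

/* Adds an edge from every possibly-longjmp'ing block to every setjmp block.
   Returns the number of edges that were newly added. */
size_t add_setjmp_edges(struct toy_cfg *cfg) {
    assert(cfg->num_blocks <= MAX_BLOCKS);
    size_t added = 0;
    for (size_t from = 0; from < cfg->num_blocks; from++) {
        if (!cfg->may_longjmp[from])
            continue;
        for (size_t to = 0; to < cfg->num_blocks; to++) {
            if (cfg->calls_setjmp[to] && !cfg->edge[from][to]) {
                cfg->edge[from][to] = true;
                added++;
            }
        }
    }
    return added;
}
```

For the function f from the start of the thread (entry, if.then, if.end as blocks 0, 1, 2), this would add a single edge from if.then back to entry, which makes the second increment of n visible as an ordinary loop in the CFG.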

Haha, I wondered for a minute whether I should bring up that bug...
We've seen pretty nasty crashes due to it and had to work around
it.

Some of my recent work has also uncovered another (I believe) invalid
handling of returns_twice functions:
https://bugs.llvm.org/show_bug.cgi?id=35211
noalias returns seem to be treated as stack allocations, but neither
the LangRef nor the C standard requires that for returns_twice
function handling.

IMO the Right(TM) fix is to add a CFG edge from all possibly
longjmp'ing function calls to all setjmps in a function. We can
probably do this by modeling the possibly longjmp'ing calls as invokes
that unwind to a special "setjmp" landingpad.

This seems to be a way to handle setjmp without requiring volatile
anywhere? We've thought a little about doing that ourselves but
decided that it was too complicated compared to the issues left over
after the crash was worked around, and it's also hard to make it handle
longjmps from signal handlers very well.

For the record, functions that return twice are not even correctly
annotated by glibc; see bug 20382, "getcontext and setjmp should have __attribute__((returns_twice))".

Looks like gcc handles that using a pattern matching approach
(see Bernd Edlinger - [PATCH] Fix unsafe function attributes for special functions (PR 71876))

Maybe we should do so too?

We already have code in clang to mark calls to setjmp/sigsetjmp/etc. returns_twice.

-Eli
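
For readers unfamiliar with the name-based approach, here is a minimal sketch of what matching on function names could look like. It does not reproduce the actual gcc or clang logic: looks_like_returns_twice is a hypothetical helper, the name list contains only the functions mentioned in this thread, and stripping leading underscores is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guesses from the name alone whether a function
   should be treated as returns_twice. The list is illustrative only. */
bool looks_like_returns_twice(const char *name) {
    static const char *const known[] = { "setjmp", "sigsetjmp", "getcontext" };
    assert(name != NULL);

    /* Assumption: treat _setjmp, __sigsetjmp and similar as aliases. */
    while (*name == '_')
        name++;

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(name, known[i]) == 0)
            return true;
    }
    return false;
}
```

A check like this catches unannotated library declarations, at the cost of misclassifying any unrelated user function that happens to share one of the names.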

Is the idea to encode all dynamic execution traces statically in the
CFG?