missing return statement for non-void functions in C++

Hi,

In C++, the undefined behaviour of a missing return statements for a non-void function results in not generating the function epilogue (unreachable statement is inserted and the return statement is optimised away). Consequently, the runtime behaviour is that control is never properly returned from this function and thus it starts executing “garbage instructions”. As this is undefined behaviour, this is perfectly fine and according to the spec, and a compile warning for this missing return statement is issued. However, in C, the behaviour is that a function epilogue is generated, i.e. basically by returning uninitialised local variable. Codes that rely on this are not beautiful pieces of code, i.e are buggy, but it might just be okay if you for example have a function that just initialises stuff (and the return value is not checked, directly or indirectly); some one might argue that not returning from that function might be a bit harsh. So this email is to probe if there would be strong resistance to follow the C behaviour? I am not yet sure how, but would perhaps a compromise be possible/acceptable to make the undefined behaviour explicit and also generate the function epilogue?

The code snippet below is taken from CodeGenFunction.cpp, which implements this C++ behaviour and also nicely documents why that is:

919 // C++11 [stmt.return]p2:

920 // Flowing off the end of a function […] results in undefined behavior in

921 // a value-returning function.

922 // C11 6.9.1p12:

923 // If the ‘}’ that terminates a function is reached, and the value of the

924 // function call is used by the caller, the behavior is undefined.

925 if (getLangOpts().CPlusPlus && !FD->hasImplicitReturnZero() && !SawAsmBlock &&

926 !FD->getReturnType()->isVoidType() && Builder.GetInsertBlock()) {

927 if (SanOpts.has(SanitizerKind::Return)) {

928 SanitizerScope SanScope(this);

929 llvm::Value *IsFalse = Builder.getFalse();

930 EmitCheck(std::make_pair(IsFalse, SanitizerKind::Return),

931 “missing_return”, EmitCheckSourceLocation(FD->getLocation()),

932 None);

933 } else if (CGM.getCodeGenOpts().OptimizationLevel == 0) {

934 EmitTrapCall(llvm::Intrinsic::trap);

935 }

936 Builder.CreateUnreachable();

937 Builder.ClearInsertionPoint();

938 }

Cheers,

Sjoerd.

Hi,

In C++, the undefined behaviour of a missing return statements for a
non-void function results in not generating the function epilogue
(unreachable statement is inserted and the return statement is optimised
away). Consequently, the runtime behaviour is that control is never
properly returned from this function and thus it starts executing “garbage
instructions”. As this is undefined behaviour, this is perfectly fine and
according to the spec, and a compile warning for this missing return
statement is issued. However, in C, the behaviour is that a function
epilogue is generated, i.e. basically by returning uninitialised local
variable. Codes that rely on this are not beautiful pieces of code, i.e are
buggy, but it might just be okay if you for example have a function that
just initialises stuff (and the return value is not checked, directly or
indirectly); some one might argue that not returning from that function
might be a bit harsh.

I would not be one of those people.

So this email is to probe if there would be strong resistance to follow
the C behaviour? I am not yet sure how, but would perhaps a compromise be
possible/acceptable to make the undefined behaviour explicit and also
generate the function epilogue?

"undefined behavior" is exactly that.

You have no idea what is going to happen; there are no restrictions on what
the code being executed can do.

"it just might be ok" means on a particular version of
a particular compiler, on a particular architecture and OS, at
a particular optimization level. Change any of those things, and you can
change the behavior.

-- Marshall

Hi,

In C++, the undefined behaviour of a missing return statements for a
non-void function results in not generating the function epilogue
(unreachable statement is inserted and the return statement is optimised
away). Consequently, the runtime behaviour is that control is never
properly returned from this function and thus it starts executing “garbage
instructions”. As this is undefined behaviour, this is perfectly fine and
according to the spec, and a compile warning for this missing return
statement is issued. However, in C, the behaviour is that a function
epilogue is generated, i.e. basically by returning uninitialised local
variable. Codes that rely on this are not beautiful pieces of code, i.e are
buggy, but it might just be okay if you for example have a function that
just initialises stuff (and the return value is not checked, directly or
indirectly); some one might argue that not returning from that function
might be a bit harsh.

I would not be one of those people.

Nor me.

So this email is to probe if there would be strong resistance to follow
the C behaviour? I am not yet sure how, but would perhaps a compromise be
possible/acceptable to make the undefined behaviour explicit and also
generate the function epilogue?

"undefined behavior" is exactly that.

You have no idea what is going to happen; there are no restrictions on
what the code being executed can do.

"it just might be ok" means on a particular version of
a particular compiler, on a particular architecture and OS, at
a particular optimization level. Change any of those things, and you can
change the behavior.

In fact, the "it works kind of as you expected" is the worst kind of UB in
my mind. UB that causes a crash, stops or other "directly obvious that this
is wrong" are MUCH easier to debug.

So make this particular kind of UB explicit by crashing or stopping would
be a good thing. Making it explicit by "returning kind of nicely, but not
correct return value" is about the worst possible result.

Hi,

I’m definitely in the other camp to Marshall and Mats on this one. I don’t believe that such aggressive optimization of undefined behavior should be done unless there is a tangible benefit. I think we can all agree that some UB, like the treatment of signed/unsigned wrap, can eventually end up with optimizations on a correct program that wouldn’t otherwise be possible.

But what optimization can this treatment of UB enable? I can’t see how a well formed program would benefit from this. And if no well formed program can benefit, what is the use in deliberately destroying non-well-formed programs? We all know that legacy code exists; I don’t believe it’s right that we break them just because we can. Or because someone took a conforming C program and ran it through a C++ compiler!

I’m fully prepared for backlash here though (or someone showing the actual benefit of this treatment of UB on well-formed programs).

Cheers,

James

Hi,

I'm definitely in the other camp to Marshall and Mats on this one. I don't
believe that such aggressive optimization of undefined behavior should be
done unless there is a tangible benefit. I think we can all agree that some
UB, like the treatment of signed/unsigned wrap, can eventually end up with
optimizations on a correct program that wouldn't otherwise be possible.

But what optimization can this treatment of UB enable? I can't see how a
well formed program would benefit from this. And if no well formed program
can benefit, what is the use in deliberately destroying non-well-formed
programs? We all know that legacy code exists; I don't believe it's right
that we break them just because we can. Or because someone took a conforming
C program and ran it through a C++ compiler!

I would go one step further and claim that this optimization may be
actively harmful as it can result in security vulnerabilities
depending on the arbitrary instructions executed. If there's a good
optimization story for this UB on well-formed programs, I would be
perfectly happy to support the current behavior as it *is* UB.
However, if there's not a compelling story for the behavior, I would
strongly prefer something with a smaller security vulnerability
footprint.

~Aaron

Hi,

In C++, the undefined behaviour of a missing return statements for a non-void function results in not generating the function epilogue (unreachable statement is inserted and the return statement is optimised away). Consequently, the runtime behaviour is that control is never properly returned from this function and thus it starts executing “garbage instructions”. As this is undefined behaviour, this is perfectly fine and according to the spec, and a compile warning for this missing return statement is issued. However, in C, the behaviour is that a function epilogue is generated, i.e. basically by returning uninitialised local variable. Codes that rely on this are not beautiful pieces of code, i.e are buggy, but it might just be okay if you for example have a function that just initialises stuff (and the return value is not checked, directly or indirectly); some one might argue that not returning from that function might be a bit harsh.

I would not be one of those people.

Nor me.

So this email is to probe if there would be strong resistance to follow the C behaviour? I am not yet sure how, but would perhaps a compromise be possible/acceptable to make the undefined behaviour explicit and also generate the function epilogue?

“undefined behavior” is exactly that.

You have no idea what is going to happen; there are no restrictions on what the code being executed can do.

“it just might be ok” means on a particular version of a particular compiler, on a particular architecture and OS, at a particular optimization level. Change any of those things, and you can change the behavior.

In fact, the “it works kind of as you expected” is the worst kind of UB in my mind. UB that causes a crash, stops or other “directly obvious that this is wrong” are MUCH easier to debug.

So make this particular kind of UB explicit by crashing or stopping would be a good thing. Making it explicit by “returning kind of nicely, but not correct return value” is about the worst possible result.

At -O0 clang emits a trap instruction, making it more explicit as you suggest. At higher optimization levels it just falls through/off.

Hi,

If we’re going to emit a trap instruction (and thus create a broken binary), why don’t we error instead?

James

Hi,

If we’re going to emit a trap instruction (and thus create a broken binary), why don’t we error instead?

We warn, can’t error, because it may be dynamically unreached, in which case the program is valid and we can’t reject it.

From: "David Blaikie" <dblaikie@gmail.com>
To: "James Molloy" <james@jamesmolloy.co.uk>
Cc: "Marshall Clow" <mclow.lists@gmail.com>, "cfe-dev Developers" <cfe-dev@cs.uiuc.edu>
Sent: Wednesday, July 29, 2015 9:15:09 AM
Subject: Re: [cfe-dev] missing return statement for non-void functions in C++

>
> Hi,
>
> If we're going to emit a trap instruction (and thus create a broken
> binary), why don't we error instead?

We warn, can't error, because it may be dynamically unreached, in
which case the program is valid and we can't reject it.

I think this also explains why this is useful for optimization.

1. It is a code-size optimization
2. By eliminating unreachable control flow, we can remove branches and tests that are not actual necessary

int foo(int x) {
  if (x > 5) return 2*x;
  else if (x < 2) return 3 - x;
}

That having been said, there are other ways to express these things, and the situation often represents an error. I'd be fine with requiring a special flag (-fallow-nonreturning-functions or whatever) in order to put the compiler is a truly confirming mode (similar to the situation with sized delete).

-Hal

Hi,

As a missing return statements is a rare case, probably/hopefully, I am not sure how much room there would be for optimisations? Putting it under a flag would work, but is probably not what people will have on by default. So that means we're kind of in the same situation as right now. And the security vulnerabilities argument is a good one too ...

Cheers,
Sjoerd.

From: “David Blaikie” <dblaikie@gmail.com>
To: “James Molloy” <james@jamesmolloy.co.uk>
Cc: “Marshall Clow” <mclow.lists@gmail.com>, “cfe-dev Developers” <cfe-dev@cs.uiuc.edu>
Sent: Wednesday, July 29, 2015 9:15:09 AM
Subject: Re: [cfe-dev] missing return statement for non-void functions in C++

Hi,

If we’re going to emit a trap instruction (and thus create a broken
binary), why don’t we error instead?

We warn, can’t error, because it may be dynamically unreached, in
which case the program is valid and we can’t reject it.

I think this also explains why this is useful for optimization.

  1. It is a code-size optimization
  2. By eliminating unreachable control flow, we can remove branches and tests that are not actual necessary

int foo(int x) {
if (x > 5) return 2*x;
else if (x < 2) return 3 - x;
}

That having been said, there are other ways to express these things, and the situation often represents an error. I’d be fine with requiring a special flag (-fallow-nonreturning-functions or whatever) in order to put the compiler is a truly confirming mode (similar to the situation with sized delete).

Note that we already have a flag to trap on this: -fsanitize-trap=return. (You may also need -fsanitize=return, I don’t remember.) That seems consistent with how we treat most other forms of UB.

We exploit this for switches too:

unsigned f(color c) {
  switch(c) {
    case red: return 0xff0000;
    case green: return 0x00ff00;
    case blue: return 0x0000ff;
    case magenta: return 0xff00ff;
  }
}

becomes

movslq %edi, %rax
movl .Lswitch.table(,%rax,4), %eax
retq

i.e. the range check for the lookup table is omitted. (We don't do
this for jump tables yet, but it's on my todo list).