Zero length function pointer equality

LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we're adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)

Yes, I think at least if the optimizer turns a non-empty function into an empty function, that’s a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr, so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I'm unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we're adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)

Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you're drawing?)

that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how "f1 == f2" can be folded to a constant false. I haven't checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there's inlining, happens at IR generation if there's no function
indirection, etc)

so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)

LLVM can produce zero length functions from cases like this (when
optimizations are enabled):

void f1() { __builtin_unreachable(); }
int f2() { /* missing return statement */ }

This code is valid, so long as the functions are never called.

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I’m unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we’re adding an
instruction anyway, perhaps we might as well make it an int3?)

(I came across this due to DWARF issues with zero length functions &
thinking about if/how this should be supported)

Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (well, I guess at the
LLVM IR level, no function can be empty, because every basic block
must end in some terminator instruction - is that the distinction
you’re drawing?)

Here’s what I was thinking: a case could be made that the frontend is responsible for making sure that functions don’t start non-empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.
But you’re right, there really isn’t such a thing as an empty function at the IR level, because there’s always an entry block and it always has a terminator.

that’s a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too - which is
how “f1 == f2” can be folded to a constant false. I haven’t checked to
see exactly where that constant folding happens. (hmm, looks like it
happens in some constant folding utility - happens in the inliner if
there’s inlining, happens at IR generation if there’s no function
indirection, etc)

so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

Guess that still leaves the possibility of the last function in an
object file as being zero-length? (or I guess not, because otherwise
when linked it could still end up with the same address as the
function that comes after it)

Yes, I think that’s right. We should never put a non-unnamed_addr global at the end of a section because we don’t know if it will share an address with another global.

This is also a problem with identical function merging in the linker, which link.exe does quite aggressively. The special case of zero-length functions seems less common than the more general case of merging, in both cases you will end up with a single implementation in the binary that has two symbols for the same address. For example, consider the following trivial program:

#include <stdio.h>

int a()
{
         return 42;
}

int b()
{
         return 42;
}

int main()
{
         printf("a == b? %d\n", a == b);
         return 0;
}

Compiled with cl.exe /Gy, this prints:

a == b? 1

Given that functions are immutable, it's a somewhat odd decision at the abstract machine level to assume that they have identity that is distinct from their value (though it can simplify debugging - back traces in Windows executables are sometimes quite confusing when you see a call into a function that is structurally correct but nominally incorrect).

Given that link.exe can happily violate this guarantee in the general case, I'm not too concerned that LLVM can violate it in the special case. From the perspective of a programmer, I'm not sure what kind of logic would be broken by function equality returning true when two functions with different names but identical behaviour are invoked. I'm curious if you have any examples.

David

Maybe we can just expand this to always apply: https://reviews.llvm.org/D32330

Fair point, we have unnamed_addr that helps distinguish the important
cases - though that does mean addressing this problem wouldn't
coincidentally address my DWARF problem (zero length functions are
weird/problematic in DWARF for a few reasons).

Doesn't mean it isn't worth fixing, though.

> I believe C++ requires that all functions have a distinct address (ie:
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure)
>
> But these zero length functions can end up with identical addresses.
>
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would indicate that would allow distinct functions to have identical
> addresses - so should we do something about this in the LLVM backend?
> add a little padding? a nop instruction? (if we're adding an
> instruction anyway, perhaps we might as well make it an int3?)

This is also a problem with identical function merging in the linker,
which link.exe does quite aggressively.

Yeah, though that's a choice of the Windows linker to be
non-conforming (& can be disabled), both with the LLVM IR semantics
and the C++ semantics - which doesn't necessarily mean Clang and LLVM
should also be non-conforming.

The special case of zero-length
functions seems less common than the more general case of merging,

On Windows, to be sure - on Linux, for instance, not as much.

in
both cases you will end up with a single implementation in the binary
that has two symbols for the same address. For example, consider the
following trivial program:

#include <stdio.h>

int a()
{
         return 42;
}

int b()
{
         return 42;
}

int main()
{
         printf("a == b? %d\n", a == b);
         return 0;
}

Compiled with cl.exe /Gy, this prints:

a == b? 1

Given that functions are immutable, it's a somewhat odd decision at the
abstract machine level to assume that they have identity that is
distinct from their value (though it can simplify debugging - back
traces in Windows executables are sometimes quite confusing when you see
a call into a function that is structurally correct but nominally
incorrect).

Yep, when I used to work on Windows myself and my teammates disabled
the linker feature to make development/debugging/backtraces easier to
read.

I think there's value in LLVM's decision here - for debuggability, and
correctly implementing C++ semantics. I don't think it'd be great if
we went the other direction (defining LLVM IR to have no naming
importance - so that merging two LLVM modules could merge function
implementations and redirect function calls to the singular remaining
instance). Opt-in, maybe (I guess you could opt-in by marking all
functions unnamed_addr - indeed that's why unnamed_addr was
introduced, I think, to allow identical code folding to be implemented
in a way that was correct for C++).

Given that link.exe can happily violate this guarantee in the general
case, I'm not too concerned that LLVM can violate it in the special
case. From the perspective of a programmer, I'm not sure what kind of
logic would be broken by function equality returning true when two
functions with different names but identical behaviour are invoked. I'm
curious if you have any examples.

I don't have any concrete examples of C++ code that depends on pointer
inequality between zero-length functions, no. (though we do lots of
work to make Clang conforming in other ways even without code that
requires such conformance)

I believe C++ requires that all functions have a distinct address (ie:
&f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
gets optimized into an unconditional assertion failure)

But these zero length functions can end up with identical addresses.

I’m unaware of anything in the C++ spec (or the LLVM langref) that
would indicate that would allow distinct functions to have identical
addresses - so should we do something about this in the LLVM backend?
add a little padding? a nop instruction? (if we’re adding an
instruction anyway, perhaps we might as well make it an int3?)

This is also a problem with identical function merging in the linker,
which link.exe does quite aggressively. The special case of zero-length
functions seems less common than the more general case of merging, in
both cases you will end up with a single implementation in the binary
that has two symbols for the same address. For example, consider the
following trivial program:

#include <stdio.h>

int a()
{
return 42;
}

int b()
{
return 42;
}

int main()
{
printf(“a == b? %d\n”, a == b);
return 0;
}

Compiled with cl.exe /Gy, this prints:

a == b? 1

Given that functions are immutable, it’s a somewhat odd decision at the
abstract machine level to assume that they have identity that is
distinct from their value (though it can simplify debugging - back
traces in Windows executables are sometimes quite confusing when you see
a call into a function that is structurally correct but nominally
incorrect).

Given that link.exe can happily violate this guarantee in the general
case, I’m not too concerned that LLVM can violate it in the special
case. From the perspective of a programmer, I’m not sure what kind of
logic would be broken by function equality returning true when two
functions with different names but identical behaviour are invoked. I’m
curious if you have any examples.

This is a well-known conformance-violating bug in link.exe; LLVM should not be making things worse by introducing a similar bug itself. Smarter linkers (for example, I think both lld and gold) will do identical function combining only if all but one of the function symbols is only used as the target of calls (and not to actually observe the address). And yes, this non-conforming behavior (rarely) breaks things in practice. See this research paper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36912.pdf

Looks perfect to me!

well, a couple of questions: Why a noop, rather than int3/ud2/etc?
Might be worth using the existing code that places such an instruction
when building at -O0?
& you mention that this causes problems on Windows - but ICF done by
the Windows linker does not cause such problems? (I'd have thought
they'd result in the same situation - two functions described as being
at the same address?) is there a quick summary of why those two cases
turn out differently?

Looks perfect to me!

well, a couple of questions: Why a noop, rather than int3/ud2/etc?

Probably no reason.

Might be worth using the existing code that places such an instruction
when building at -O0?

I wasn't aware of that. Does it happen for all functions (e.g. I think
I got pulled into this due to functions with the naked attribute)?

& you mention that this causes problems on Windows - but ICF done by
the Windows linker does not cause such problems? (I'd have thought
they'd result in the same situation - two functions described as being
at the same address?) is there a quick summary of why those two cases
turn out differently?

The case that we hit was that the Control Flow Guard table of
addresses in the binary ended up listing the same address twice, which
the loader didn't expect. It may be that the linker took care to avoid
that for ICF (if two ICF'd functions got the same address, only list
it once in the CFG table) but still didn't handle the "empty function"
problem.

From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of Hans
Wennborg via llvm-dev
Sent: Monday, July 27, 2020 9:11 AM
To: David Blaikie <dblaikie@gmail.com>
Cc: llvm-dev <llvm-dev@lists.llvm.org>; Clang Dev <cfe-dev@lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] Zero length function pointer equality

>
> Looks perfect to me!
>
> well, a couple of questions: Why a noop, rather than int3/ud2/etc?

Probably no reason.

FTR there is TargetOptions.TrapUnreachable, which some targets turn on
(for X86 it's on for MachO and PS4), this turns 'unreachable' into ud2.
Clearly it covers more than "empty" functions but is probably the kind
of thing you're looking for.
--paulr

Just writing it down in this thread - this issue’s been discussed a bit in this bug: https://bugs.llvm.org/show_bug.cgi?id=49599

And yeah, I’m considering adopting MachO’s default (TrapUnreachable + NoTrapOnNoreturn) as the default for LLVM (will require some design discussion, no doubt) since it seems to capture most of the functionality desired. Maybe there are some cases where we have extra unreachables that could’ve otherwise been optimized away/elided, but hopefully nothing drastic.

(some platforms still need the trap-on-noreturn - Windows+AArch64 and maybe Sony, etc - happy for some folks to opt into that). I wonder whether TrapUnreachable shouldn’t even be an option anymore though, if it becomes load bearing for correctness - or should it become a fallback option - “no trap unreachable” maybe means nop instead of trap, in case your target can’t handle a trap sometimes (I came across an issue with AMDGPU not being able to add traps to functions that it isn’t expecting - the function needs some special attribute to have a trap in it - but I guess it can be updated to add that attribute if the function has an unreachable in it (though then it has to recreate the no-trap-on-noreturn handling too when deciding whether to add the attribute… ))

I’m not sure whether we’d want every unreachable to emit a trap, but I do think we should try not to let code fall out of one function and into a completely unrelated one. That is: I’d propose that the last basic-block in every function should get a trap instruction added unless it already ends in a control transfer instruction (jmp, ret, or branch – call doesn’t count, since it may return).

I’m not sure whether we’d want every unreachable to emit a trap, but I do think we should try not to let code fall out of one function and into a completely unrelated one.

Yeah, that was my gut reaction too - though TrapUnreachable doesn’t inhibit SimplifyCFG from eliminating unreachable blocks (can still leave unreachable in a block after a call - because the call may or may not return).

That is: I’d propose that the last basic-block in every function should get a trap instruction added unless it already ends in a control transfer instruction

That was my initial theory, but given MachO defaults to TrapUnreachable and it doesn’t inhibit SimplifyCFG, so most unreachable blocks do get eliminated - I’m sort of leaning towards that being sufficiently correct.

(jmp, ret, or branch – call doesn’t count, since it may return).

I’m inclined to omit the trap after a call to a noreturn function for brevity - even though it does leave the possibility of violating the noreturn contract leading to that fallthrough UB.

I'm inclined to omit the trap after a call to a noreturn function
for brevity - even though it does leave the possibility of
violating the noreturn contract leading to that fallthrough UB.

This is specifically the case where we needed it, at least when the
noreturn call is in the last block. Which is commonly the case, a
noreturn call is likely to have its block laid out at the end of a
function.
--paulr

Yeah, not suggesting removing that option or changing those who’ve opted into that behavior - only about the default when platforms haven’t otherwise cared/have already been living with zero length functions.

  • Dave

Did anything come of this?

The Linux folks (cc Nick) keep running into issues where a function which ends with unreachable can fall through to an unrelated function, which confuses some machine code analyser they run (e.g. https://github.com/ClangBuiltLinux/linux/issues/1440).

It seems TrapUnreachable would avoid such issues.

I’m not aware of anyone picking this up, but then I haven’t been watching for it.

LLVMTargetMachine supports a command-line “-trap-unreachable” so the Linux build could turn it on universally that way. Some targets already have it turned on, and are willing to take the (fairly small IIRC) size hit. Note that the flag tends to be set in target-specific code but the effect is implemented in generic code, so it should Just Work. If Linux doesn’t like the size hit, then someone will have to implement one of the more targeted ideas, e.g., add a nop/trap only after the last block of a function (ideally but not necessarily omitting that if the function ends with a ret or unconditional branch).

I’ll admit that Sony would benefit from the more targeted feature, as that’s the exact use-case we need, but we haven’t deemed the size cost of TrapUnreachable to be enough to warrant the additional effort.

–paulr

No, I haven’t managed to make progress on this - I think I kept some notes in a bug somewhere… (Hans you can look up b/179837110).

TrapUnreachable does sound like about the right thing - I think I hit a few kinks in making that the default. (I’d like the zero length function issue to be solved generically - across all targets, I don’t think it’s helpful to allow it in any case due to the DWARF issues, the C++ issues, the debuggability issues (falling off into another function rather than trapping), compared to any small benefit to code size it might provide) - maybe some targets (AMDGPU?) couldn’t handle it because they can’t emit a trap into an arbitrary function (some function attribute was required for that, I think?)

Oh, hmm, maybe I hit the AMDGPU issue when trying to do something more manually targeting only the end of function case, not the trap-unreachable broader feature. (Looking back at my various branches I see an AMDGPU workaround/fixme in my “trap at end of function” branch, not in my trap_unreachable branch)