> I believe C++ requires that all functions have a distinct address (i.e.
> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> gets optimized into an unconditional assertion failure).
>
> But these zero length functions can end up with identical addresses.
>
> I'm unaware of anything in the C++ spec (or the LLVM langref) that
> would allow distinct functions to have identical addresses - so
> should we do something about this in the LLVM backend?
> add a little padding? a nop instruction? (if we're adding an
> instruction anyway, perhaps we might as well make it an int3?)
This is also a problem with identical function merging in the linker,
which link.exe does quite aggressively.
Yeah, though that's a choice by the Windows linker to be non-conforming
with both the LLVM IR semantics and the C++ semantics (and it can be
disabled) - which doesn't necessarily mean Clang and LLVM should also
be non-conforming.
The special case of zero-length
functions seems less common than the more general case of merging,
On Windows, to be sure - on Linux, for instance, not as much.
in
both cases you will end up with a single implementation in the binary
that has two symbols for the same address. For example, consider the
following trivial program:
#include <stdio.h>

int a()
{
    return 42;
}

int b()
{
    return 42;
}

int main()
{
    printf("a == b? %d\n", a == b);
    return 0;
}
Compiled with cl.exe /Gy, this prints:
a == b? 1
Given that functions are immutable, it's a somewhat odd decision at the
abstract machine level to assume that they have identity that is
distinct from their value (though it can simplify debugging - back
traces in Windows executables are sometimes quite confusing when you see
a call into a function that is structurally correct but nominally
incorrect).
Yep, when I used to work on Windows, my teammates and I disabled the
linker feature to make development and debugging easier and backtraces
easier to read.
I think there's value in LLVM's decision here - for debuggability, and
correctly implementing C++ semantics. I don't think it'd be great if
we went the other direction (defining LLVM IR to have no naming
importance - so that merging two LLVM modules could merge function
implementations and redirect function calls to the singular remaining
instance). Opt-in, maybe (I guess you could opt-in by marking all
functions unnamed_addr - indeed that's why unnamed_addr was
introduced, I think, to allow identical code folding to be implemented
in a way that was correct for C++).
Given that link.exe can happily violate this guarantee in the general
case, I'm not too concerned that LLVM can violate it in the special
case. From the perspective of a programmer, I'm not sure what kind of
logic would be broken by function equality returning true when two
functions with different names but identical behaviour are compared. I'm
curious if you have any examples.
I don't have any concrete examples of C++ code that depends on pointer
inequality between zero-length functions, no (though we do lots of
work to make Clang conforming in other ways even without code that
requires such conformance).
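For the general merging case, though, here is a rough hypothetical sketch
of the kind of logic that relies on distinct addresses. It is not taken
from any real code base; handler_fn, on_open, on_close, registry and
handler_name are all invented names. It maps a handler's address back to
a label:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Two hypothetical handlers with identical bodies, so they are
 * candidates for identical function merging. */
static int on_open(void)  { return 0; }
static int on_close(void) { return 0; }

struct entry {
    handler_fn fn;
    const char *name;
};

/* Hypothetical registry keyed by function address. */
static const struct entry registry[] = {
    { on_open,  "open"  },
    { on_close, "close" },
};

/* Reverse lookup: returns the name of the first entry whose address
 * matches fn, or NULL if there is none. */
static const char *handler_name(handler_fn fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (registry[i].fn == fn)
            return registry[i].name;
    }
    return NULL;
}
```

If on_open and on_close keep distinct addresses, handler_name(on_close)
yields "close". If a linker folds them into one address, the same call
would match the first entry and yield "open" instead, which is the sort
of quiet misbehaviour I'd expect rather than a crash.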