ThinLTO failed to merge C strings

@vedantk reported the following suboptimal behavior when using ThinLTO, compared to a non-LTO or full-LTO build. It is both a size regression and a change in program behavior:

% cat a.c
static const char *X = "foo";

const char *bar();

int main() {
        return bar() == X;
}

% cat b.c
const char *bar() {
        return "foo";
}

When ThinLTO is used with optimization enabled (at least on Apple platforms, with both ld64 and lld), the C string “foo” is not deduplicated by the linker, and main() returns a different result. Here is how the problem manifests:

The C string in b.c is imported into module a as available_externally hidden, and is then converted to an extern symbol by the elim-avail-extern pass. The string in module b is promoted from a private symbol to a hidden symbol. All of the above is standard behavior. But because the C string in module b is now named, the linker can’t merge the two strings.
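A rough single-file C model of that state may help. It is only a sketch: ThinLTO operates on LLVM IR rather than C source, and the name `promoted_foo` is hypothetical, standing in for whatever name promotion assigns. The string returned by bar() behaves like a named object, while X still points at an anonymous literal:

```c
#include <string.h>

/* Models module b after promotion: the string now carries a symbol name.
   "promoted_foo" is a hypothetical name chosen for this sketch. */
const char promoted_foo[] = "foo";

const char *bar(void) {
    return promoted_foo;
}

/* Models module a: X still refers to an anonymous string literal. */
static const char *X = "foo";

/* Mirrors the address comparison in main() from a.c. */
int addresses_match(void) {
    return bar() == X;
}

/* Content comparison does not depend on whether the two copies were merged. */
int contents_match(void) {
    return strcmp(bar(), X) == 0;
}
```

In this model the two copies of "foo" are separate objects, so the address comparison no longer depends on literal merging, while the contents still compare equal.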

Here are two options to fix this:

  • Teach the importer to import certain unnamed_addr symbols by duplication, instead of as available_externally.
  • Teach the linker to merge C strings even when they have a label.

Any opinions here? @teresajohnson @int3 ?

This is basically what LLD currently does: we merge everything regardless of labels. This is certainly a bit unsafe, but we’re also working on having LLVM emit addrsig tables, which I think will make that a non-issue.
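To illustrate what "merging regardless of labels" means, here is a toy C model. It is not LLD's implementation: `struct cstring_piece` and `merge_by_contents` are hypothetical names for this sketch, and a real linker would hash the strings rather than compare every pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of one entry in a C-string section. */
struct cstring_piece {
    const char *label;    /* symbol name, or NULL if the string is unlabeled */
    const char *contents; /* NUL-terminated string data */
    size_t canonical;     /* index of the piece this one was merged into */
};

/* Merge pieces purely by contents, ignoring whether they carry a label.
   Returns the number of distinct strings that remain. */
size_t merge_by_contents(struct cstring_piece *pieces, size_t n) {
    size_t unique = 0;
    for (size_t i = 0; i < n; i++) {
        pieces[i].canonical = i;
        /* Look for an earlier surviving piece with identical contents. */
        for (size_t j = 0; j < i; j++) {
            if (pieces[j].canonical == j &&
                strcmp(pieces[j].contents, pieces[i].contents) == 0) {
                pieces[i].canonical = j;
                break;
            }
        }
        if (pieces[i].canonical == i) {
            unique++;
        }
    }
    return unique;
}
```

Under this model, an unlabeled "foo" and a labeled "foo" collapse into one entry. That is also why it is unsafe without more information: any code that relied on the labeled copy having its own address would observe the merge.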

I am pretty sure I reproduced the issue with lld on macOS too.