A problem with names that can not be demangled.

Hi,

We are running into a problem created by renaming of static symbols by llvm-link. It first

showed up using LTO, but we can illustrate this by using llvm-link as well.

Say we have two files with the same named static symbol Bye

--------------- t1.cpp ---------

static void Bye(int* ba1) { ba1[0] /= ba1[2] - 2; }

void main_a( int* inB) { void (func)(int) = Bye; func(inB); }

--------------- t2.cpp ---------

static void Bye(int* ba1) { ba1[0] *= ba1[2] + 2; }

void main_b( int* inB) { void (func)(int) = Bye; func(inB+1); }

--------- cmd sequence -------

$ clang++ -c -emit-llvm t1.cpp -o t1.bc

$ clang++ -c -emit-llvm t1.cpp -o t2.bc

$ llvm-link t1.bc t2.bc -o t23.bc

$ clang -c t23.bc

$ nm t23.o

t1.o and t2.o have the same named function “_ZL3ByePi”. In order to distinguish them,

one gets a ‘1’ appended to it, making it “_ZL3ByePi1”.

While the code is all correct, the problem is that this modified name cannot be demangled.

That is what I am trying to fix.

In similar situations gcc appends a ‘.’ before appending the discriminating number, making “_ZL3ByePi.1”

The following change in lib/IR/ValueSymbolTable.cpp seems to fix this problem.

------------ start diff -------------------

@@ -54,5 +54,5 @@ void ValueSymbolTable::reinsertValue(Value* V) {

// Trim any suffix off and append the next number.

UniqueName.resize(BaseSize);

  • raw_svector_ostream(UniqueName) << ++LastUnique;
  • raw_svector_ostream(UniqueName) << “.” << ++LastUnique;

// Try insert the vmap entry with this suffix.

-------------- end diff ---------------------

However it causes 60 test failures. These are tests where some names that are expecting

to get a plain numeric suffix now have a ‘.’ before it. These are all local symbols, so I think

the generated code will always be correct, but the tests as written do not pass. For

example, take test/CodeGen/ARM/global-merge-addrspace.ll

; RUN: llc < %s -mtriple=thumb-apple-darwin -O3 | FileCheck %s

; Test the GlobalMerge pass. Check that the pass does not crash when using

; multiple address spaces.

; CHECK: _MergedGlobals:

@g1 = internal addrspace(1) global i32 1

@g2 = internal addrspace(1) global i32 2

; CHECK: _MergedGlobals1:

@g3 = internal addrspace(2) global i32 3

@g4 = internal addrspace(2) global i32 4

With my change, the symbol is named MergedGlobals.1, hence it fails this test.

I could change these 60 tests to match the new behavior. That will fix these 60 failures.

However, I do have a concern that there may be other places in llvm that expect the

names to be pure identifiers. Adding a ‘.’ may cause them to fail. No such failure has been

seen in running the whole clang test, but the concern is still there.

I should note that even local symbols are treated similarly, so for example, a parameter

named ‘str’ becomes ‘str.1’ with my change, instead of ‘str1’ currently (an actual

example from a test).

Alternatively, I could try to limit my change to just mangled names.

Any suggestion about how this should be fixed ?

There is another similar change about 40 lines below in ValueSymbolTable::createValueName().

That is not needed to fix this particular problem, but looks similar, so perhaps should be treated

similarly for consistency. It causes 66 more failures of the same nature though.

Thanks

Sunil Srivastava

Sony Computer Entertainment

Adding llvm-dev as that might be a more suitable audience for this discussion.

(& I know Lang’s been playing around with the same problem in the Orc JIT, so adding him too)

Is there any basis/reason to believe that the .X suffix is a better, more principled one than straight X? Is that documented somewhere as a thing the demangling tools will ignore?

Adding llvm-dev as that might be a more suitable audience for this
discussion.

(& I know Lang's been playing around with the same problem in the Orc JIT,
so adding him too)

Is there any basis/reason to believe that the .X suffix is a better, more
principled one than straight X? Is that documented somewhere as a thing the
demangling tools will ignore?

At least for the Itanium ABI, it's possible for both ${foo} and ${foo}1 to
be valid mangled names, so a demangler isn't able to ignore our current
suffixes even in theory. Not an argument for .X, but it's an argument
against plain X.

Hi David,

Thanks for adding Lang and other mailing lists.

Ø Is there any basis/reason to believe that the .X suffix is a better, more principled one

Ø than straight X? Is that documented somewhere as a thing the demangling tools

Ø will ignore?

I don’t know of any documentation but the existing demangling tools, such as c++filt on linux, already stop at ‘.’. So there is precedence.

Gcc bugzillas refer to this point. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40831#c10 for an example.

Moreover, the issue exists in C as well, where the debugger will have no way of distinguishing Bye from Bye1. They may still need some work for figuring out Bye.1, but at least that is doable.

Sunil

GCC has a generalized mangling syntax for cloned functions. See GCC PR 40831:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40831

and the discussion thread here:

   https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01375.html

-cary

https://mentorembedded.github.io/cxx-abi/abi.html#mangling

So looks like just adding a "." wold work. My preference would be to
do it for all Values and update the tests.

Also, the code currently looks like