“ld64.lld fails to link some against some hidden symbols during LTO” (Issue #57864, llvm/llvm-project on GitHub) was filed recently as an LLD bug, but after digging into it, it looks like a subtle ThinLTO issue. I’m hoping to get some thoughts from ThinLTO experts.
The problem is that the GUIDs used in ThinLTO don’t account for mangling. The GUID is a hash of the pre-mangled name (as given by GlobalValue::getGlobalIdentifier()). This is typically not an issue, since mangling occurs during assembly codegen and is mostly a 1:1 mapping. However, LLVM IR names may be declared with a leading “\01” prefix, in which case the names are treated as “literal” and don’t get mangled during codegen. This also means that we no longer have a 1:1 mapping, since “\01_foo” and “foo” now get mangled into the same “_foo” name.
Not accounting for this leads to false “undefined symbol” errors at link time. If one module uses “\01_foo” to define a variable and another module uses “foo” to reference it, the two names hash to different GUIDs even though they refer to the same “_foo” symbol. ThinLTO may then wrongly internalize “\01_foo” because it thinks the variable is unreferenced outside its module.
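To make the mismatch concrete, here is a small standalone sketch. It does not use any LLVM API: toyGlobalIdentifier and toyMachOMangle are hypothetical helpers written for this post. The sketch assumes that the identifier fed to the GUID hash only drops the leading “\01” marker, and that the platform mangler prepends an underscore to every non-literal name, as on Mach-O.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the pre-mangled identifier that is hashed
// into a GUID. Assumption: only the leading '\1' marker is dropped.
std::string toyGlobalIdentifier(const std::string &irName) {
  if (!irName.empty() && irName[0] == '\1')
    return irName.substr(1);
  return irName;
}

// Hypothetical stand-in for Mach-O style mangling: names with a leading
// '\1' are emitted literally, everything else gets a '_' prefix.
std::string toyMachOMangle(const std::string &irName) {
  if (!irName.empty() && irName[0] == '\1')
    return irName.substr(1);
  return "_" + irName;
}
```

Under these assumptions, toyMachOMangle gives the same symbol name for “\01_foo” and “foo”, while toyGlobalIdentifier gives two different strings for them. Any hash of those two strings therefore yields two unrelated GUIDs for what the linker sees as one symbol.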
I’m posting here to see what people think the solution should be. I suppose the method getGlobalIdentifier() should be changed to account for name mangling, but that method is implemented in terms of a static method also called getGlobalIdentifier. Since it is static, it does not have access to the DataLayout, which is needed for mangling. So it seems like this fix will require a decent bit of refactoring in order to get the right parameters passed.
But before I dig in any further, I’m wondering if anyone has thoughts and/or has looked at this before. There is a comment here that indicates there were problems related to this mangling issue previously, so I figure someone must have run into this before. Are there subtle problems that make fixing this difficult?