[ThinLTO] assert(GS != DefinedGlobals.end()) failed in FunctionImport.cpp

Encountered “assert(GS != DefinedGlobals.end())” failure while running ThinLTO. The assertion statement is in MustPreserveGV lambda function in llvm::thinLTOInternalizeModule (lib/Transforms/IPO/FunctionImport.cpp).

It seems that the assertion fails because it fails to recover the “original name” of the global value. ModuleSummaryIndex::getOriginalNameBeforePromote attempts to get the original name by stripping .llvm.{HASH}, but what I observe is that “.1” is still appended to the expected original name.

Then where this extra “.1” comes from? It is appended when the global value is materialized. IRLinker::materialize function calls IRLinker::linkGlobalValueProto function, and inside that function if DGV is nullptr or ShouldLink is true then IRLinker::copyGlobalValueProto function is called to create a global variable in the destination module that corresponds to SGV. I found that newly created global variable has the extra “.1” added to the name of the SGV.

When this happens? I don’t have a complete understanding but I observed that the same global value is materialized twice during the single invocation of IRLinker::run, once with GlobalValueMaterializer and once with LocalValueMaterializer. First, the global value

; Materializable

; Function Attrs: nounwind uwtable

define weak_odr void @foo(%1*) unnamed_addr #7 comdat($comdat1) align 2 personality i32 (…)* @__gxx_personality_v0 {}

(I renamed the function and comdat)

is materialized with GlobalValueMaterializer, (so the IRLinker::materialize function is called with ForAlias == false), and the materialized value is

; Function Attrs: nounwind uwtable

declare void @foo(%“type1”*) unnamed_addr #2 align 2

Then, later, the same value is materialized again with LocalValueMaterializer (so ForAlias == true for IRLinker::materialize), and the materialized value is

; Function Attrs: nounwind uwtable

define internal void @foo.1(%“type 0x7efb6ee89d80”) unnamed_addr #2 comdat($comdat1) align 2 personality i32 (…) @__gxx_personality_v0 !dbg !12345 {

// function definition

}

(here, “type 0x7efb6ee89d80” is not “type1” )

So for me it seems that we need a mechanism to retrieve the original name of the global value even when it is renamed during the materialization. I submitted the bug report as well (https://llvm.org/bugs/show_bug.cgi?id=28760)).

Thanks,

Taewook

Hi Taewook,

Hello Teresa,

Thank you for your reply. I’m trying to create a small repro but find it hard to nail down because originally it is a big build. This happens with gold linker.

Thanks,

Taewook

Once you get the linker invocation that causes the crash you can use delta
to reduce the testcase.
http://delta.stage.tigris.org You can use that on the cmdline to reduce the
number of objects.

Hello Teresa,

Thank you for your reply. I’m trying to create a small repro but find it
hard to nail down because originally it is a big build. This happens with
gold linker.

I think I need to see a smaller test case, looking through the code I'm not
sure how we ended up in this situation. See analysis below.

Thanks,

Taewook

*From: *Teresa Johnson <tejohnson@google.com>
*Date: *Thursday, July 28, 2016 at 5:08 PM
*To: *Taewook Oh <twoh@fb.com>
*Cc: *via llvm-dev <llvm-dev@lists.llvm.org>
*Subject: *Re: [llvm-dev] [ThinLTO] assert(GS != DefinedGlobals.end())
failed in FunctionImport.cpp

Hi Taewook,

Encountered “assert(GS != DefinedGlobals.end())” failure while running
ThinLTO. The assertion statement is in MustPreserveGV lambda function in
llvm::thinLTOInternalizeModule (lib/Transforms/IPO/FunctionImport.cpp).

It seems that the assertion fails because it fails to recover the
"original name" of the global value.
ModuleSummaryIndex::getOriginalNameBeforePromote attempts to get the
original name by stripping .llvm.{HASH}, but what I observe is that ".1" is
still appended to the expected original name.

Then where this extra ".1" comes from? It is appended when the global
value is materialized. IRLinker::materialize function calls
IRLinker::linkGlobalValueProto function, and inside that function if DGV is
nullptr or ShouldLink is true then IRLinker::copyGlobalValueProto function
is called to create a global variable in the destination module that
corresponds to SGV. I found that newly created global variable has the
extra ".1" added to the name of the SGV.

When this happens? I don't have a complete understanding but I observed
that the same global value is materialized twice during the single
invocation of IRLinker::run, once with GlobalValueMaterializer and once
with LocalValueMaterializer. First, the global value

Looking at the IRLinker, it appears this second (local) copy should only

be created if there was no copy already linked in from the same source
module:

In IRLinker::linkGlobalValueProto, we should find the DGV created by the
GlobalValueMaterializer for the same SGV (first find it by name, then if
ShouldLink==true, we should find it in ValueMap and return that entry, or
if ShouldLink==false, since ForAlias=true we should return null.

Then when linkGlobalValueProto returns, in IRLinker::materialize there is
specific handling for this case:
  // When linking a global for an alias, it will always be linked. However
we
  // need to check if it was not already scheduled to satify a reference
from a
  // regular global value initializer. We know if it has been schedule if
the
  // "New" GlobalValue that is mapped here for the alias is the same as the
one
  // already mapped. If there is an entry in the ValueMap but the value is
  // different, it means that the value already had a definition in the
  // destination module (linkonce for instance), but we need a new
definition
  // for the alias ("New" will be different.
  if (ForAlias && ValueMap.lookup(SGV) == New)
    return New;

AFAICT the only way we should be creating a new local copy is if foo()
already had a copy in the dest module. But when we call ThinLTO
internalization we are doing so before any function importing. The
gold-plugin should have linked the module being compiled in the ThinLTO
backend into an empty module from thinLTOBackendTask, so we should not
already have a copy of foo() in the dest module.

I must be missing something...in lieu of a smaller test case, can you trace
through what happens when we call linkGlobalValueProto for the local
materializer, and see why we don't find the copy of SGV already
materialized by the global materializer which should be in the ValueMap?

Thanks,
Teresa

Yes, the IRLinker will append an integer if it encounters a naming

Hello Teresa,

Thank you for your analysis. One thing to note is that the global materializer materializes the value as a function declaration, not a function definition. As I pasted on my first email,

; Materializable

; Function Attrs: nounwind uwtable

define weak_odr void @foo(%1*) unnamed_addr #7 comdat($comdat1) align 2 personality i32 (…)* @__gxx_personality_v0 {}

is materialized to

; Function Attrs: nounwind uwtable

declare void @foo(%“type1”*) unnamed_addr #2 align 2

Inside IRLinker::linkGlobalValueProto, the materialized value is returned from getLinkedToGlobal(SGV) and assigned to DGV. However, as ForAlias is true and ShouldLink is false, DGV becomes nullptr later, and NewGV is created from copyGlobalValueProto(SGV, ShouldLink) call. Therefore, returned value from linkGlobalValueProto is different from the value obtained by ValueMap.lookup(SGV) in IRLinker::materialize.

BTW, I updated LLVM to the latest version in SVN this morning, and the assertion failure is gone. Instead, it ends up with broken module found errors:

Global is external, but doesn’t have external or weak linkage!

void (%2*, %2*, i32)* @"_foo.llvm.E5F1952C"

function declaration may not have a !dbg attachment

void (%2*, %2*, i32)* @"_foo.llvm.E5F1952C"

LLVM ERROR: Broken module found, compilation aborted!

Hope this provides additional clues to fix the issue.

Thanks,

Taewook

Hello Teresa,

Thank you for your analysis. One thing to note is that the global
materializer materializes the value as a function declaration, not a
function definition. As I pasted on my first email,

; Materializable

; Function Attrs: nounwind uwtable

define weak_odr void @foo(%1*) unnamed_addr #7 comdat($comdat1) align 2
personality i32 (...)* @__gxx_personality_v0 {}

is materialized to

; Function Attrs: nounwind uwtable

declare void @foo(%"type1"*) unnamed_addr #2 align 2

Inside IRLinker::linkGlobalValueProto, the materialized value is returned
from getLinkedToGlobal(SGV) and assigned to DGV. However, as ForAlias is
true and ShouldLink is false, DGV becomes nullptr later, and NewGV is
created from copyGlobalValueProto(SGV, ShouldLink) call. Therefore,
returned value from linkGlobalValueProto is different from the value
obtained by ValueMap.lookup(SGV) in IRLinker::materialize.

Ah, sorry, somehow I completely missed the fact that it was brought in as a
decl when I looked this morning.

BTW, I updated LLVM to the latest version in SVN this morning, and the
assertion failure is gone. Instead, it ends up with broken module found
errors:

Global is external, but doesn't have external or weak linkage!

void (%2*, %2*, i32)* @"_foo.llvm.E5F1952C"

function declaration may not have a !dbg attachment

void (%2*, %2*, i32)* @"_foo.llvm.E5F1952C"

LLVM ERROR: Broken module found, compilation aborted!

This looks like a promoted local from the name. What is the linkage type?
Is this the verifier run right after the importing pass?

I'm guessing the first problem went away due to luck, as I can't think of
any specific change that would have affected it. Is this from the same
module?

Do you know what SVN revision were you at before, even roughly?

Teresa

It was r274523. I’m not sure it was the same module. By mistake I restarted the build with the previous version without backing backing up the build artifacts :frowning:

Thanks,

Taewook

It was r274523. I’m not sure it was the same module. By mistake I
restarted the build with the previous version without backing backing up
the build artifacts :frowning:

So a couple things were added to gold since then, index-based linkonce/weak
resolution and some more aggressive internalization. I don't think either
of these would have made the original problem go away, so I'm guessing that
the latter provoked the new problem (in combination with pcc's earlier
change to allow metadata attachments on decls that added that verifier
assert). Presumably we started internalizing the promoted global, and
something (presumably a later pass) then dropped the symbol. Let me see if
I can provoke that particular issue with a debug ThinLTO build of SPEC
(recently I have been testing non-debug). In the meantime, can you drop
your debug flag and see if the first problem comes back?

Thanks,
Teresa

Yes, if I drop the debug flag then the original problem (assertion failure) comes back.

Thanks,

Taewook

Ok, good to know.

Any luck on a smaller test case for either of the problems?

I tried building all the C/C++ SPEC cpu2006 benchmarks with -g -flto=thin at head and didn’t get the failure. Let me try to look into how the debug metadata is normally dropped from the imported decl. In the meantime, could you find out the answers to the questions I had (see below) about the linkage type of the _foo symbol and when the verifier was being run when it hit the failures? Also, was _foo imported into the module?

Also, I am going to try to see if I can construct a test case to trigger the original problem. But do you want to try to make forward progress by changing your local version of LLVM so that instead of asserting in MustPreserveGV, we instead conservatively return true if it isn’t found in the map?

Teresa

Ok, good to know.

Any luck on a smaller test case for either of the problems?

I tried building all the C/C++ SPEC cpu2006 benchmarks with -g -flto=thin
at head and didn't get the failure. Let me try to look into how the debug
metadata is normally dropped from the imported decl. In the meantime, could
you find out the answers to the questions I had (see below) about the
linkage type of the _foo symbol and when the verifier was being run when it
hit the failures? Also, was _foo imported into the module?

Also, I am going to try to see if I can construct a test case to trigger
the original problem. But do you want to try to make forward progress by
changing your local version of LLVM so that instead of asserting in
MustPreserveGV, we instead conservatively return true if it isn't found in
the map?

I have a small test case to repro the original problem.

I think that the conservative fix I describe above is probably the right
approach - we could also try stripping off any .\d+ suffix and querying the
module index with that. It works but I am a little wary as the variable
could originally be named with a .\d+ suffix as well and we could
inadvertently use the index for a different variable. Although they should
both have index entries, but I'd rather do something conservatively correct
rather than cause a subtle bug due to using the wrong index entry.

Will update the bugzilla entry and send a patch.

I may still need help getting a test case for the second problem (the debug
metadata on the decl), although I might not be able to look at that for a
couple days.

Teresa

Hi Taewook,

I had a better fix for this approved (D23015), but when I went to merge this with the new LTO API I committed for pcc last week I discovered that his new API already has the same effect. I will update the bug with this info as well. Can you confirm that with a compiler built after 278338 that this problem no longer occurs?

Also, any luck on a smaller test case for the -g issue? Now that I have pcc’s patches merged I will try again to see if I can trigger it, probably with some larger internal benchmarks. But if you can give me more details on that problem that would be great.

Let me know if you run into any other issues too!

Thanks,
Teresa

Hello Teresa,

Sorry I was working on another LLVM issue that more urgent for us, so didn’t have much time to work on smaller test case. I’ll try the new API and see if the issus is gone. Thanks!

Best,

Taewook

No problem! I want to make sure you aren’t blocked on trying ThinLTO.

Thanks,
Teresa

Hello Teresa,

I confirmed that the problem no longer appears after 278338. Thanks!

– Taewook

Great! I am working on some other issues and haven’t had a chance yet to look into the debug info issue to see if I can reproduce it with more testing. Let me know if you are still hitting that, or any other issues.
Thanks,
Teresa