[ThinLTO] Using two different IRMovers for the same composite module? (related to PR28180)

Hello,

While trying ThinLTO, I ran into an assertion failure in IRMover: https://llvm.org/bugs/show_bug.cgi?id=28180. The assertion fails because IRMover tries to map metadata that is already mapped in the destination module, and this seems to happen because two different IRMovers are used for the same destination (or composite) module. During LTO, one IRMover is created in the thinLTOBackendTask function (tools/gold/gold-plugin.cpp). The linkInModule function, which is called by thinLTOBackendTask, calls the ‘move’ function of this IRMover. The other IRMover is created when “TheLinker” is created in FunctionImporter::importFunctions (lib/Transforms/IPO/FunctionImport.cpp). thinLTOBackendTask invokes FunctionImporter::importFunctions as well, through the call chain thinLTOBackendTask -> CodeGen::runAll (tools/gold/gold-plugin.cpp) -> CodeGen::runLTOPasses (tools/gold/gold-plugin.cpp) -> FunctionImporter::importFunctions.

As these two IRMovers share the same destination module, when the second IRMover tries to map metadata already mapped by the first IRMover, it eventually results in the assertion failure. It seems that IRMover maintains SharedMDs to keep the metadata mapping record across multiple calls of its move function, but that doesn’t help between two separate IRMovers.

What would be the right fix for this? Please let me know if I misunderstand something.

Thanks,

Taewook

Hello,

While trying ThinLTO, I ran into an assertion failure in IRMover: https://llvm.org/bugs/show_bug.cgi?id=28180.

Great, we encountered this bug last month and I have a fix internally, but I wasn’t sure how to reproduce it (I didn’t have any source with the internal bug report), so I haven’t upstreamed the patch yet.
Do you have the repro?

I found that the assertion failure is happening because IRMover tries to map the metadata that is already mapped in the destination module, and it seems that this happens because two different IRMovers are used for the same destination (or composite) module.

It is not clear to me how using two different IRMovers is the issue: as you mentioned, the assertion is hit when the metadata is already in the map, but a new IRMover would have a fresh map.

My debugging of this issue led me to the new “ODR type uniquing” feature in the context as the culprit. In this mode, when multiple modules are loaded in the context, the composite type metadata are uniqued by id. This means that the same composite type (the same pointer in memory) can be reached from two modules (here, source and destination). So the mapper may reach a metadata node in the source module and try to map it to the destination module while it is already there (but not in the map).

This happens only in ThinLTO and not in LTO because LTO starts with an empty module, so when you move the first module into the “merged module”, the map gets initialized. In ThinLTO, the mover starts with a destination module that is not empty.
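
To make the scenario concrete, here is a toy model in plain C++. It is not LLVM code: ToyType, ToyContext, ToyModule, ToyMover and the two helper functions are made-up names for illustration only, and the "module" is reduced to a set of reachable type nodes. It only sketches the state described above, where a uniqued node is already reachable from the destination but the mover's own map has no entry for it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Toy stand-in for a composite type node; NOT an LLVM class.
struct ToyType {
  std::string Id;
};

// Toy context that uniques types by identifier, loosely mimicking
// ODR type uniquing: the same Id always yields the same pointer.
class ToyContext {
  std::map<std::string, std::unique_ptr<ToyType>> Types;

public:
  ToyType *getOrCreate(const std::string &Id) {
    std::unique_ptr<ToyType> &Slot = Types[Id];
    if (!Slot)
      Slot.reset(new ToyType{Id});
    return Slot.get();
  }
};

// A "module" here is just the set of type nodes reachable from it.
using ToyModule = std::set<ToyType *>;

// Toy mover whose map is seeded only by what it moves itself.
class ToyMover {
  ToyModule &Dest;
  std::map<ToyType *, ToyType *> Mapped;

public:
  explicit ToyMover(ToyModule &D) : Dest(D) {}

  // True if T is reachable from the destination although this mover
  // has no map entry for it (the inconsistent state in question).
  bool isUnmappedButPresent(ToyType *T) const {
    return Dest.count(T) != 0 && Mapped.count(T) == 0;
  }

  // Record every node of Src in the map and add it to the destination.
  void move(const ToyModule &Src) {
    for (ToyType *T : Src) {
      Mapped[T] = T;
      Dest.insert(T);
    }
  }
};

// Full-LTO-like case: the destination starts empty.
bool conflictWithEmptyDest() {
  ToyContext Ctx;
  ToyModule Src = {Ctx.getOrCreate("Foo")};
  ToyModule Dest;
  ToyMover Mover(Dest);
  bool Before = Mover.isUnmappedButPresent(*Src.begin());
  Mover.move(Src); // the map is initialized by the first move
  bool After = Mover.isUnmappedButPresent(*Src.begin());
  return Before || After;
}

// ThinLTO-like case: the destination already references the same
// uniqued node before the mover has mapped anything.
bool conflictWithPopulatedDest() {
  ToyContext Ctx;
  ToyModule Src = {Ctx.getOrCreate("Foo")};
  ToyModule Dest = {Ctx.getOrCreate("Foo")};
  ToyMover Mover(Dest);
  return Mover.isUnmappedButPresent(*Src.begin());
}
```

In this sketch, conflictWithEmptyDest() returns false and conflictWithPopulatedDest() returns true, which matches the LTO versus ThinLTO difference described above.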

Yes, I have the repro, though I can’t publish it externally. It would be great if you can upstream the patch so I can try it. Thank you for your explanation as well!

– Taewook

There's a reproducer attached (obtained via the lld --reproduce option).
If that doesn't work, you can check out mozjs and try to reduce from
there.
It happens while doing an LTO build with lld.
I have a fix for that (Mehdi has one as well, apparently) in my local
tree, but I don't have time to reduce. If you can take care of that,
chances are that an upstream fix will be committed shortly after.

Just to clarify, I'm talking about the issue in PR28180. I can't
comment about the ThinLTO one, sorry.

Patch:

0001-Fix-ThinLTO-crash-with-debug-info.patch (32 KB)

If lld is setting enableDebugTypeODRUniquing(); on the context and isn’t using the IRMover to target an empty module, it can be the same bug.
I mentioned that it should touch only ThinLTO but I had ld64 in mind.

It seems that the patch works for me as well, though the linker crashes with another error after that. Thanks!

Mehdi, I couldn’t quite understand what you mean by not having a repro and therefore not being able to upstream the patch. Aren’t the .ll files you attached sufficient to submit along with the patch? If there is anything I can do to help you upstream it, please let me know.

-- Taewook

I could submit the patch as-is, but it is not my habit to do that without a full understanding of the situation: I’m not convinced the test case is fully reduced, and I need to “reverse engineer” the debug metadata to get a source-code construct that would trigger this bug.

Makes sense. I’ll try to get the code as well.

Thanks,
Taewook

I just filed a bug (https://llvm.org/bugs/show_bug.cgi?id=30248), which is different from the one discussed in this thread but seems to have the same root cause. I attached a small repro as well.

As you said, the assertion fails if the mapper tries to map metadata that is already in the destination module, when the same metadata is reached from the source module. More specifically, it is possible that the parameter V of Mapper::mapValue in lib/Transforms/Utils/ValueMapper.cpp is already in the destination module. Still, V is forced to be (re)materialized into the destination module, and during that process the original V can be erased from the module while NewV is created (last if statement of IRLinker::linkGlobalValueProto in lib/Linker/IRMover.cpp). In such a case, the “getVM()[V] = NewV” statement in Mapper::mapValue is invalid.
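
The hazard can be sketched with a toy model in plain C++. This is not the LLVM code: Handle, ToyModule, rematerialize, countStaleKeys and staleKeysAfterRemap are hypothetical names, and values are represented by integer handles rather than pointers so the example stays well-defined. It only illustrates why recording a mapping keyed by V after V has been erased leaves a stale key in the map.

```cpp
#include <cassert>
#include <map>
#include <set>

// Values are identified by integer handles in this toy model.
using Handle = int;

// Toy module that only tracks which handles are still alive.
class ToyModule {
  std::set<Handle> Live;
  Handle Next = 0;

public:
  Handle create() {
    Live.insert(Next);
    return Next++;
  }
  void erase(Handle H) { Live.erase(H); }
  bool isLive(Handle H) const { return Live.count(H) != 0; }
};

// Mimics a materialization step that replaces a value already in the
// destination: a new value is created, then the original is erased.
Handle rematerialize(ToyModule &Dest, Handle V) {
  Handle NewV = Dest.create();
  Dest.erase(V);
  return NewV;
}

// Counts map keys that refer to values no longer in the module.
int countStaleKeys(const ToyModule &M, const std::map<Handle, Handle> &VM) {
  int N = 0;
  for (const auto &KV : VM)
    if (!M.isLive(KV.first))
      ++N;
  return N;
}

// Runs the problematic sequence and reports how many stale keys remain.
int staleKeysAfterRemap() {
  ToyModule Dest;
  std::map<Handle, Handle> VM;
  Handle V = Dest.create();             // V already lives in the destination
  Handle NewV = rematerialize(Dest, V); // V is erased while NewV is created
  VM[V] = NewV;                         // analogous to getVM()[V] = NewV
  return countStaleKeys(Dest, VM);
}
```

In this sketch, staleKeysAfterRemap() returns 1: the only key in the map refers to a value that no longer exists in the module.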

I confirmed that the bug can be fixed by your internal fix that you’ve attached to this thread.

Thanks,
Taewook

Sorry, I missed that https://reviews.llvm.org/D23841 has already been submitted. Thanks!
