Linking modules across contexts crashes

I get a crash when I try to link multiple modules registered in their individual contexts.
Documentation for Linker::LinkModules doesn't mention anything about contexts, and the first link succeeds. But the second link crashes.

Is this not the right way to merge such modules? If not, then what is the right way?

In any case, documentation for Linker::LinkModules should say if contexts are or aren't expected to be the same.

Yuri

---testcase---
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Linker/Linker.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/DiagnosticPrinter.h"
#include <string>
#include <iostream>

using namespace llvm;
using namespace std;

int main() {
   // vars
   string Message;
   raw_string_ostream Stream(Message);
   DiagnosticPrinterRawOStream DP(Stream);
   LLVMBool Result;
   // create blank modules and contexts
   LLVMContext *ctx1 = new LLVMContext;
   Module* module1 = new Module("module1", *ctx1);
   LLVMContext *ctx2 = new LLVMContext;
   Module* module2 = new Module("module2", *ctx2);
   LLVMContext *ctx3 = new LLVMContext;
   Module* module3 = new Module("module3", *ctx3);
   // fill modules
llvm::Function::Create(llvm::FunctionType::get(Type::getInt32Ty(*ctx1), false), llvm::Function::ExternalLinkage, "f1", module1);
llvm::Function::Create(llvm::FunctionType::get(Type::getInt32Ty(*ctx2), false), llvm::Function::ExternalLinkage, "f2", module2);
llvm::Function::Create(llvm::FunctionType::get(Type::getInt32Ty(*ctx3), false), llvm::Function::ExternalLinkage, "f3", module3);

   // merge f1 <- f2
   Result = Linker::LinkModules(module1, module2, [&](const DiagnosticInfo &DI) {DI.print(DP);});
   cout << "merge result=" << Result << endl;
   delete ctx2;
   cout << "--done merge #1--" << endl;

   // merge f3 <- f1
   Result = Linker::LinkModules(module3, module1, [&](const DiagnosticInfo &DI) {DI.print(DP);});
   cout << "merge result=" << Result << endl;
   delete ctx1;
   cout << "--done merge #2--" << endl;

   return 0;
}

---output---
merge result=0
--done merge #1--
Bus error

rev.237344

I’m pretty sure module linking is expected to occur in the same LLVM context. IIRC Duncan had some proposal for how ld64 could do something clever with multiple contexts, but I’ve totally forgotten what it was.

I'm pretty sure module linking is expected to occur in the same LLVM context.

Correct.

IIRC Duncan had some proposal for how ld64 could do something clever with multiple contexts, but I've totally forgotten what it was.

This was for LTO (probably unrelated to Yuri's scenario?).

1. Lazy-load each module in its own context, parse its symbols
    (without actually loading functions, metadata, etc.), and
    destroy the module and context.
2. Decide which modules we should link (based on symbol parsing).
3. Load all the chosen modules into a single context and link them
    together.

lib/LTO has a bunch of the pieces for this, but I got the idea from
Rafael's work on tools/gold-plugin/gold-plugin.cpp. It might be
easier to look at there.

I get a crash when I try to link multiple modules registered in their individual contexts.
Documentation for Linker::LinkModules doesn't mention anything about contexts, and the first link succeeds. But the second link crashes.

The problem is that a fair number of things referenced by a Module
are owned by its LLVMContext (types, constants, metadata, and some
other things).

Is this not the right way to merge such modules? If not, then what is the right way?

You can round-trip to bitcode, reading the module into the
destination context. The following pseudo-code gives the idea:

    bool linkModuleFromDifferentContext(Module &D, const Module &S) {
      SmallVector<char, 256> Buffer;
      writeBitcodeToBuffer(S, Buffer);

      std::unique_ptr<Module> M = readBitcodeFromBuffer(D.getContext());
      return Linker::LinkModules(&D, M.get());
    }

In any case, documentation for Linker::LinkModules should say if contexts are or aren't expected to be the same.

Good idea; patch welcome.

Duncan,

Thanks for this workaround, i works. However, going to binary and back causes the significant bump in the process user time (14s->16s), while the wall clock time still lower due to parallelization.

There should be the function llvm::MergeContexts, similar to llvm::Linker::LinkModules. It should merge contexts as containers, with an error when this isn't possible, without changing most of the objects at all. It should cause the source context to disappear.

Yuri

If you want to work on a solution, I think a better approach would be
adding API for one (or both) of:

    class Module {
    public:
        /// Move this module to the given context.
        void changeContext(LLVMContext &NewContext);
    };

    /// Remap the given module in a different context.
    std::unique_ptr<Module> remapModuleToContext(
        const Module &M, LLVMContext &NewContext);

I'm not sure what your exact use case is, but I could see either/both
interfaces being useful. For example, you could change
`Linker::LinkModules()` to something like:

    bool Linker::LinkModules(Module &Dest, Module &Src) {
      if (&Src.getContext() != &Dest.getContext())
        // Recreate Src in Dest's context and link that.
        return LinkModules(
                   Dest,
                   remapModuleToContext(Src, Dest.getContext()).get());

      // Old code...
    }

or:

    bool Linker::LinkModules(Module &Dest, Module &Src) {
      if (&Src.getContext() != &Dest.getContext())
        // Destructively move Src to Dest's context.
        Src.changeContext(Dest.getContext());

      // Old code...
    }

Patches welcome!

While working on the patch, I have two questions:

1. How to find all constants in Module? Does this code find all of them, or they are somewhere else too?
   for (GlobalVariable &GV : globals()) {
     if (auto C = static_cast<Constant*>(GV.Op<0>().get())) {
       ... C is Constant*
     }
   }

2. How to create an equivalent ConstantArray/ConstantStruct objects in another context? ConstantArray::get(ArrayType *T, ArrayRef<Constant*>V) creates it, and puts it into LLVMContextImpl::ArrayConstants. ArrayConstants has the DenseMap ConstantUniqueMap::Map with ConstantArray* as a key, but how to read back ArrayRef<Constant*> ConstantArray was created with?

Yuri

1. How to find all constants in Module? Does this code find all of them, or
they are somewhere else too?
  for (GlobalVariable &GV : globals()) {
    if (auto C = static_cast<Constant*>(GV.Op<0>().get())) {
      ... C is Constant*
    }
  }

Constants are unfortunately part of the Context, not the module :frowning:

Cheers,
Rafael

But types are also part of the Context, so I need to re-create these objects in another context.

GV.Op<0> = C->changeContext(NewContext); is supposed to replace the old context with the new one inside of the GlobalValue in Module.
And same for types in many places.

The downside is that the older objects are left in the old context.

I actually went through the 80% of the changes, only such complex constants are left.

Yuri

I think the other option -- generating a clone of a Module in the
new context -- is probably easier, although I'm not sure. Either way,
you'll need something like this:

    class Remapper {
      DenseMap<Type *, Type *> Types;
      DenseMap<Constant *, Constant *> Constants;
      DenseMap<Metadata *, Metadata *> MDs;
      DenseMap<unsigned, unsigned> AttachmentIDs;
      /* ... other things stored in the context ... */

      LLVMContext &NewContext;

    public:
      Type &remapType(Type &OldType);
      Constant &remapConstant(Constant &OldConstant);
      Metadata &remapMetadata(Metadata &OldMD);
      unsigned remapAttachmentID(unsigned OldAttachmentID);

      /// Map old global value to the new one.
      void mapGlobal(GlobalValue &OldGV, GlobalValue &NewGV);

      /* ... more API and internals ... */
    };

Where I'm assuming `remapType()` checks for `OldType` in the map, and
if it's not there generates the type in `NewContext` (and caches it in
the map). Same deal with the other maps.

If you go with the "create a new module" approach, I think you
basically want to do the following:

1. Create a new module, give it a name, etc.
2. For each global value (for each global and function), create a
    dummy value, leaving the initializer unset. You'll have to
    generate types on-demand as you go. Register the new global
    values in the remapper.
3. Fill in the initializers and function bodies, remapping Values/
    Constants/Metadata/Types as you go.
4. Visit the named md nodes, remapping as you go.
5. ...?

Note that I don't think this is a small project. I suspect a fair
bit of code from the ValueMapper could/should be shared or refactored.

I think the "mutate the module" approach would have some interesting
corner cases that the "create a new module" approach doesn't, but it's
probably doable somehow, and I suspect needs roughly the same
algorithm.

I didn't take such approach, because ValueMapper will only provide the old->new mapping, while both Contexts already maintain the maps of the same objects. I chose the light approach of only changing the minimal set of references necessary to re-map the module to the new context. The original question of how to clone ConstantArray is also besides the ValueMapper approach. I also wouldn't choose to clone the module, as this would defeat the main point: to make it light on CPU. And cloning the large objects obviously isn't light.

Yuri