Replacing a function from one module into another one

Hello LLVM Developers,

I’m trying to replace a function in one module with a function defined in another module (the two modules come from different files). The first issue I ran into is that llvm::Function has no “moveBefore” or “moveAfter” method, as llvm::BasicBlock and llvm::Instruction do, so I figured I would move the BasicBlocks of the replacing function into the function being replaced and then eliminate the original BasicBlocks. So far I have had only one issue while eliminating the original BasicBlocks: I can only call removeFromParent, not eraseFromParent, but removeFromParent works fine anyway. For example, the original function is:

define i32 @foo2(i32 %a, i32 %b) #0 {
entry:
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
store i32 %b, i32* %b.addr, align 4

And the function that is going to replace that is:

define i32 @foo3(i32 %a, i32 %b) #0 {
entry:
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
store i32 %b, i32* %b.addr, align 4

So they are pretty much the same except for the name (the functions differ later in the code, but the error happens at the beginning). So what I do is move the entry block in foo3 before the entry block in foo2 and then “removeFromParent” the original entry block in the foo2. When I try to dump the Module, an error says:

Use still stuck around after Def is destroyed: store i32 %0, i32* %a.addr1, align 4

I understand this to mean that the “store” instruction inserted in the new block in foo2 still refers to foo3’s argument, which confused me at first, since I gave both arguments the same name to make the substitution easier. My idea for a solution was to replace the arguments of foo2 (the function being replaced) with those of foo3, to “move” the references. However, I don’t understand the difference between llvm::Function’s “addAttribute” and “addParamAttr”, nor why they require an attribute (or an Attribute::AttrKind) when the function already takes the argument position. When I did a similar process to change the calling instruction in the main function, I used “setArgOperand” and “getArgOperand”, which worked just fine, and also “replaceAllUsesWith”. I’m looking for something similar for the function’s arguments and their uses in the function’s body.
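
A note on the error above: in LLVM, an instruction's operands are pointers to llvm::Value objects rather than names, so two arguments that both print as %a are still distinct values. llvm::Argument derives from llvm::Value, so replaceAllUsesWith is available on arguments too. The sketch below is only a toy model in plain C++ of operands-by-pointer and of a replace-all-uses step. The Value and Inst types and the addOperand and replaceAllUses helpers are hypothetical stand-ins, not LLVM classes or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not LLVM classes.
struct Inst;

struct Value {
    std::string name;          // Printed name; plays no role in identity.
    std::vector<Inst*> users;  // One entry per use of this value.
};

struct Inst {
    std::vector<Value*> operands;  // Operands are pointers, not names.
};

// Append `v` as the next operand of `inst` and record the use on `v`.
inline void addOperand(Inst& inst, Value& v) {
    inst.operands.push_back(&v);
    v.users.push_back(&inst);
}

// Redirect every use of `from` to `to`, in the spirit of replaceAllUsesWith.
inline void replaceAllUses(Value& from, Value& to) {
    for (Inst* user : from.users) {
        for (Value*& op : user->operands) {
            if (op == &from) op = &to;
        }
        to.users.push_back(user);
    }
    from.users.clear();
}
```

In this model, a store built against foo3's %a keeps pointing at foo3's argument after its block is moved, no matter what foo2's argument is called, until something like replaceAllUses redirects it.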

So the question is: how can I properly replace a function (in my case only the function’s body, since I already require both functions to have the same header) in one module with a function defined in another module?

Regards,
Daniel Moya

Hi Daniel,

CloneFunctionInto helps you copy entire functions. It only takes care of the copy though, you’ll have to empty out the replaced function manually.

Cheers,
Philip

Cross-module operations are tricky in general; I'd suggest using the Linker::linkModules API if possible.

-Eli

Hello and thanks for the answer,

I’m still facing issues, so I’ll do my best to explain my situation. As I explained, I have two modules, each with its own main and functions. In oldModule, I would like to replace a call to foo2 (defined in oldModule) with a call to foo3, which is defined in refModule. In summary, I have:

  1. The original call instruction, in the main function of oldModule, which calls foo2. I’ll name it oInst (original Instruction).

  2. The “new” call instruction, in the main function of refModule, which calls foo3. I’ll name it nInst (new Instruction).

  3. The foo2 function definition, in oldModule. I’ll name it oFunc (original Function).

  4. The foo3 function definition, in refModule. I’ll name it nFunc (new Function).

  5. The parameters (or arguments?) of both functions, both in the call instructions and in the function definitions, which I’ll refer to as p(oInst), p(nInst), p(oFunc), p(nFunc) (the parameters of).

  6. For testing purposes, foo2 and foo3 are defined identically: same return type, same parameter types, and even the same variable names in the IR.
So after calling the llvm::Linker::linkModules function, I did:

First attempt:

  1. llvm::CallInst *callOInst = static_cast<llvm::CallInst*>(oInst); // I cast the oInst to a llvm::CallInst

  2. callOInst->setCalledFunction(nFunc); // now oInst should call nFunc

Error:
Call parameter type does not match function signature!

%0 = load i32, i32* %a, align 4
i32 %call1 = call i32 @foo3(i32 %0, i32 %1)

So even though the parameters are the same type, and defined identically in both modules, the p(oInst) apparently does not match the p(nFunc).

Second attempt:

  1. llvm::Instruction *nCloneInst = nInst->clone(); // Clone of the nInst, to avoid removing it from the refModule

  2. nCloneInst->insertAfter(oInst); // I’ll bring the nInst because I know p(nInst) and p(nFunc) match

  3. nCloneInst->mutateType(oInst->getType()); // Idk why I have to do this, but it is necessary for the next line

  4. oInst->replaceAllUsesWith(nCloneInst);

  5. oInst->dropAllReferences();

  6. oInst->eraseFromParent();

Error:

Instruction does not dominate all uses!

%0 = load i32, i32* %a, align 4
%2 = call i32 @foo3(i32 %0, i32 %1)

Great, now the p(nInst) are still referring to their definition in the refModule, so either I bring those instructions too (which sounds really messy) or somehow I change the p(nInst) to refer to the instructions in oldModule, which in my case are actually defined the same (but apparently the references don’t change based on the name being the same in both modules).

Third attempt:

  1. The same 1-4 steps as before, from cloning instruction to replaceAllUsesWith

  2. llvm::CallInst *callNInst = static_cast<llvm::CallInst*>(nCloneInst);

  3. llvm::CallInst *callOInst = static_cast<llvm::CallInst*>(oInst); // cast both oInst and nInst

  4. for (unsigned int i = 0; i < callOInst->getNumArgOperands(); i++) { callNInst->setArgOperand(i,callOInst->getArgOperand(i)); } //replace p(nInst) with p(oInst)

  5. The same 5-6 steps as before, drop and erase

Error:
Call parameter type does not match function signature!
%0 = load i32, i32* %a, align 4
i32 %2 = call i32 @foo3(i32 %0, i32 %1)

So back to the first problem, the p(nInst) (now converted to p(oInst)) apparently does not match the p(nFunc).

I also looked into the CloneFunctionInto function, but I didn’t understand the arguments of it, and there’s really no documentation or examples that I could find on the internet. Specifically, I have troubles with llvm::SmallVectorImpl< llvm::ReturnInst *> &Returns argument, I don’t know how to initialize it, it doesn’t have a 0 argument constructor and if I try:

llvm::SmallVectorImpl< llvm::ReturnInst *> ReturnsArg = llvm::SmallVectorImpl< llvm::ReturnInst *>(2); // Just as an example

It says that constructor is protected. I didn’t want to go further since I’m clueless on how to properly use this function, and I’m even not completely sure if it would fix all the troubles that I’ve been having with the other three attempts.

Btw, all these errors happen when I try to run the module (through JIT). A workaround that I know works for all my attempts is to dump the module to a file, then reload and execute it (it works because oldModule and refModule use the same IR variable names). But I would like to do this the right way, without inefficiently dumping a file just to reload it and get all the references right.

Thanks for the help in advance, I’ll be really grateful for any advice or light in my situation.

Regards,
Daniel Moya

Hi.
Besides the LLVM linker, you can also use this tool:
https://github.com/travitch/whole-program-llvm

It links all the modules and produces a single module containing every function.
Regards.

Hi Ahmad,

What does that tool do besides what the LLVM linker already does? I don’t think my problem is in linking both modules; I think the LLVM linker does that job for me. The issue is changing the called function to call another function (in the example previously provided, changing it from foo2 to foo3 and adjusting the references to the function’s parameters).

Regards,
Daniel Moya

Hi Daniel,
My answer was for your first message. The tool’s benefits are outlined in the repository, but I see it does not address your problem. I’m not sure, but this looks similar to a problem I had recently, and I think a bitcast will solve it. After linking, types may have different names but the same contents. The links to the answers are as follows:
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125413.html

https://stackoverflow.com/questions/51894129/convert-function-pointer-call-to-function-call-at-the-ir-level

Regards.

Thank you Ahmad,

I figured out that, although the type of both p(oInst) and p(nInst) were the same, I had to:

for (unsigned int i = 0; i < callOInst->getNumArgOperands(); i++) { callOInst->getArgOperand(i)->mutateType(callNInst->getArgOperand(i)->getType()); }

That solves the issue at the calling instruction in the main function, but now I see that linkModules does not work for me (it’s a mess to bring in all the other unnecessary functions from the reference module), so what I need is to move one function directly from one module to another. I’m still facing the issue with the llvm::SmallVectorImpl< llvm::ReturnInst *> &Returns argument of CloneFunctionInto; can anyone please give an example of how to use that function? I guess that’s the only way, because if I move the blocks from foo3 in the refModule to the oldModule, the moved blocks still refer to the arguments of foo3 defined in the refModule. I don’t know how to go instruction by instruction through all the moved blocks and correct the references to point to the arguments of the new function in the oldModule, and this also sounds very messy and complicated.

Regards,
Daniel Moya

Hi Daniel,

CloneFunctionInto wants to tell you about the new ReturnInstructions it produced, and you’re expected to pass a vector for this purpose. You’re free to ignore these values, but you still have to pass that vector: SmallVector<ReturnInst*, 8> Returns;
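
For background on why SmallVectorImpl cannot be constructed directly: it is the size-independent base class of SmallVector<T, N>, and functions take the base by reference so that callers can pick any inline capacity N. Here is a rough, self-contained sketch of that pattern in plain C++. VecImpl, InlineVec, and collectReturns are hypothetical stand-ins and do not reproduce LLVM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size-independent interface, analogous in spirit to SmallVectorImpl<T>.
template <typename T>
class VecImpl {
public:
    void push_back(const T& v) { data_.push_back(v); }
    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }

protected:
    // Protected: only derived classes may construct the base.
    explicit VecImpl(std::size_t reserve) { data_.reserve(reserve); }

private:
    std::vector<T> data_;  // Toy storage; the real class uses inline storage.
};

// Concrete type, analogous in spirit to SmallVector<T, N>.
template <typename T, std::size_t N>
class InlineVec : public VecImpl<T> {
public:
    InlineVec() : VecImpl<T>(N) {}
};

// An API that fills a caller-provided output vector through the base type.
inline void collectReturns(VecImpl<int>& out) {
    out.push_back(1);
    out.push_back(2);
}
```

A caller declares the concrete type (here InlineVec<int, 8> returns;) and passes it where the base reference is expected, which mirrors declaring a SmallVector and passing it as the Returns argument.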

For the argument remapping, you’re supposed to pre-populate the VMap you pass with the appropriate argument-to-argument mapping.

It’s perfectly fine to use CloneFunctionInto across modules, btw. Only operations across LLVMContexts are tricky.
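
One reason operations across contexts are tricky: LLVM uniques types per LLVMContext and compares them by pointer, so the i32 obtained from one context is a different object from the i32 obtained from another, even though both print identically. That would be consistent with a “Call parameter type does not match function signature!” report on IR that looks well typed, although nothing in this thread confirms it as the cause. The following toy model in plain C++ shows the idea. Context, Type, and sameType are hypothetical stand-ins, not LLVM classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-ins for illustration only; these are not LLVM classes.
struct Type {
    std::string name;
};

class Context {
public:
    // Return the unique Type object for `name` within this context.
    const Type* getType(const std::string& name) {
        std::unique_ptr<Type>& slot = types_[name];
        if (!slot) slot.reset(new Type{name});
        return slot.get();
    }

private:
    std::map<std::string, std::unique_ptr<Type>> types_;
};

// Type equality is pointer equality, as in a signature check.
inline bool sameType(const Type* a, const Type* b) { return a == b; }
```

In this model, two lookups of "i32" in the same Context compare equal, while lookups in two different Context objects do not, even though the names match. Since parseIRFile takes the LLVMContext as a parameter, loading both files into one shared context is one way to avoid the mismatch, assuming the modules do not need separate contexts.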

Cheers,
Philip

Hi Philip,

Thank you very much for your answer, the vector declaration example worked. I’m pretty sure the ValueToValueMapTy is the last thing I need because I even saw there is another function that could help me, llvm::RemapFunction; but my problem is that I don’t know how to populate the ValueToValueMapTy (VMap) and I couldn’t find a single complete example on the internet. To begin with, as the name implies (and also from some errors that I got) the ValueToValueMapTy takes a std::pair<llvm::Value, llvm::Value>, but what I need is to map the arguments (llvm::Attribute) of both functions (the first argument of foo2 is equivalent to the first argument of foo3, and so on), however, apparently there’s no transformation from a llvm::Attribute to a llvm::Value. I’m using the function getAttribute (unsigned i, StringRef Kind) const to get the attributes, but I don’t even understand why is it required to give the StringRef Kind if what I need is just the attribute of the function in the i position. I’m really thankful for the quick responses and for the attention received.

Regards,
Daniel Moya

Hi Daniel,

the implementation of CloneModule should be a good example. Generally all you need to do in your case is map old arguments to new arguments.
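
To illustrate mapping old arguments to new arguments before cloning, here is a toy model in plain C++. Value, Inst, ValueMap, and cloneBody are hypothetical stand-ins rather than LLVM APIs. In real code the map would be a ValueToValueMapTy keyed by the source function's llvm::Argument objects, which are llvm::Values and are unrelated to llvm::Attribute.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not LLVM classes.
struct Value {
    std::string name;
};

struct Inst {
    std::vector<const Value*> operands;
};

using ValueMap = std::map<const Value*, const Value*>;

// Copy `body`, rewriting each operand found in `vmap`. Unmapped operands
// are copied unchanged and therefore still point at the source's values.
inline std::vector<Inst> cloneBody(const std::vector<Inst>& body,
                                   const ValueMap& vmap) {
    std::vector<Inst> out;
    for (const Inst& inst : body) {
        Inst copy;
        for (const Value* op : inst.operands) {
            auto it = vmap.find(op);
            copy.operands.push_back(it != vmap.end() ? it->second : op);
        }
        out.push_back(copy);
    }
    return out;
}
```

With an entry per argument (source argument as key, destination argument as value), every cloned operand ends up pointing at the destination's arguments. With an empty map, the copies still reference the source's arguments, which is the kind of dangling reference that the “Use still stuck around” message complains about.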

Cheers,
Philip

Hi Philip,

Thanks for the reference, I was able to follow it and copy the code that I saw necessary, but I still have some issues (references are still not updated). I created the function:

void populateVMap(llvm::ValueToValueMapTy &VMap, llvm::Function *fOld, llvm::Function *fNew) {
  llvm::Function::arg_iterator DestI = fOld->arg_begin();
  for (llvm::Function::const_arg_iterator J = fNew->arg_begin(); J != fNew->arg_end(); ++J) {
    DestI->setName(J->getName());
    VMap[&*J] = &*DestI++;
  }
}

The same as in CloneModule, then I have this code:

llvm::Function *fNew = nMod->getFunction(nFName);
llvm::Function *fOld = oMod->getFunction(oFName);

fOld->dropAllReferences();
fOld->deleteBody();

llvm::ValueToValueMapTy VMap;
populateVMap(VMap, fOld, fNew);
bool ModuleArg = true;
llvm::SmallVector<llvm::ReturnInst*, 8> Returns;
llvm::CloneFunctionInto(fOld, fNew, VMap, ModuleArg, Returns);
if (fNew->hasPersonalityFn())
fOld->setPersonalityFn(llvm::MapValue(fNew->getPersonalityFn(), VMap));

But after running the code, I still get the error: Use still stuck around after Def is destroyed

Which means that the instructions moved from fNew still refer to the arguments of fNew instead of the “mapped” (supposedly) ones in fOld. I have some ideas of what could be going on:

  1. I’m not populating correctly the VMap argument, although I wouldn’t know how to do it differently since I literally copied it from the CloneModule.
  2. I have to still add more instruction either before or after CloneFunctionInto, for example in CloneModule they call the function copyComdat, but I don’t know if it’s necessary (I guess no because it’s only defined in CloneModule.cpp), they also iterate over global alias and functions, but since I’m only concerned in the arguments of one function, I guess I don’t have to worry about that.
  3. Perhaps what you mentioned before “Only operations across LLVMContexts are tricky” has something to do, because the operation is between modules of different files, I’m guessing they have different Contexts too, what could be done in this case?
    Finally, I also tried moving the BasicBlocks manually and got the same error “Use still stuck…”, and then tried calling llvm::RemapFunction afterwards with the populated VMap, but always got the error:

Assertion `(Flags & RF_IgnoreMissingLocals) && "Referenced value not in value map!"' failed

Which left me clueless since the mapping worked in the CloneFunctionInto (the program executed fine, the error “Use still stuck” happened at the program’s end) but not in this RemapFunction. Anyways, I believe CloneFunctionInto should do this remap work so I shouldn’t have to call RemapFunction. Any guideline would be greatly appreciated, it has been a long suffering.

Regards,
Daniel Moya

Hi Daniel,

when and where are you getting that error? Can you paste the full code?

Cheers,
Philip

Hi Philip,

The error happens when the program finishes and it automatically calls the destructors, so it is not an error specifically inside my program. Here’s the full code:

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/CodeGen.h"
#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/ValueMapper.h"
#include <iostream>
#include <string.h>

const std::string originalFileName = "tracer.ll";
const std::string referenceFileName = "tracer_ref.ll";

void populateVMap(llvm::ValueToValueMapTy &VMap, llvm::Function *fOld, llvm::Function *fNew) {
  llvm::Function::arg_iterator DestI = fOld->arg_begin();
  for (llvm::Function::const_arg_iterator J = fNew->arg_begin(); J != fNew->arg_end(); ++J) {
    DestI->setName(J->getName());
    VMap[&*J] = &*DestI++;
  }
}

void addFunction(llvm::Module *oMod, llvm::Module *nMod, std::string nFName, std::string oFName) {
  llvm::Function *fNew = nMod->getFunction(nFName);
  llvm::Function *fOld = oMod->getFunction(oFName);

  fOld->dropAllReferences();
  fOld->deleteBody();

  llvm::ValueToValueMapTy VMap;
  populateVMap(VMap, fOld, fNew);
  bool ModuleArg = true;
  llvm::SmallVector<llvm::ReturnInst*, 8> Returns;
  llvm::CloneFunctionInto(fOld, fNew, VMap, ModuleArg, Returns);
  if (fNew->hasPersonalityFn())
    fOld->setPersonalityFn(llvm::MapValue(fNew->getPersonalityFn(), VMap));
}

void replaceFunctions(std::string oldFuncName, std::string newFuncName) {
  llvm::SMDiagnostic origErr;
  llvm::LLVMContext origContext;
  std::unique_ptr<llvm::Module> origMod = llvm::parseIRFile(originalFileName, origErr, origContext);
  llvm::Module *origMod_copy = origMod.get();
  llvm::SMDiagnostic refErr;
  llvm::LLVMContext refContext;
  std::unique_ptr<llvm::Module> refMod(llvm::parseIRFile(referenceFileName, refErr, refContext));
  llvm::Module *refMod_copy = refMod.get();

  addFunction(origMod_copy, refMod_copy, newFuncName, oldFuncName);
  printModule(origMod_copy);
  std::cout << "Finish\n"; // Make sure the program finished
}

int main() {
  std::string oldFuncName = "foo2";
  std::string newFuncName = "foo3";
  replaceFunctions(oldFuncName, newFuncName);
  return 0;
}

The complete error after printing the module (I know the error occurs after the program’s execution because “Finish” is printed before it):

Finish
While deleting: i32 %
Use still stuck around after Def is destroyed: %c = alloca i32, align 4
Use still stuck around after Def is destroyed: %b.addr = alloca i32, align 4
Use still stuck around after Def is destroyed: %a.addr = alloca i32, align 4
step3F: /home/moydan00/Clang/llvm-clang-6.0.0/lib/IR/Value.cpp:88: llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.

The beginning of the foo2 function in the tracer.ll:

define i32 @foo2(i32 %a, i32 %b) #0 {
entry:
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
%c = alloca i32, align 4

The beginning of the foo3 function in the tracer_ref.ll:

define i32 @foo3(i32 %a, i32 %b) #0 {
entry:
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
%c = alloca i32, align 4

I know it’s a little bit confusing since foo2 and foo3 are almost identical, but from the error, I know that the instructions copied from foo3 to foo2 still refer to the arguments of foo3. If any other information is needed, please tell me.

Regards,
Daniel Moya

Is the source function referencing global variables? You’ll have to move those too.

Cheers,
Philip

No, the source function (foo3, defined in tracer_ref.ll in this case) only references, inside its function body, the arguments that it takes. This could be a minimal version of the tracer_ref.ll file:

; ModuleID = 'tracer_ref.c'
source_filename = "tracer_ref.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [11 x i8] c"C1 is: %d\0A\00", align 1
@.str.1 = private unnamed_addr constant [11 x i8] c"C2 is: %d\0A\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define i32 @main() #0 {
entry:
%a = alloca i32, align 4
%b = alloca i32, align 4

store i32 7, i32* %a, align 4
store i32 5, i32* %b, align 4

%0 = load i32, i32* %a, align 4
%1 = load i32, i32* %b, align 4
%call1 = call i32 @foo3(i32 %0, i32 %1)
store i32 %call1, i32* %c2, align 4
%3 = load i32, i32* %c2, align 4
%call3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str.1, i32 0, i32 0), i32 %3)
ret i32 0
}

; Function Attrs: noinline nounwind optnone uwtable
define i32 @foo3(i32 %a, i32 %b) #0 {
entry:
%a.addr = alloca i32, align 4
%b.addr = alloca i32, align 4
%c = alloca i32, align 4
store i32 %a, i32* %a.addr, align 4
store i32 %b, i32* %b.addr, align 4
%0 = load i32, i32* %a.addr, align 4
%1 = load i32, i32* %b.addr, align 4
%mul = mul nsw i32 %0, %1
store i32 %mul, i32* %c, align 4
%2 = load i32, i32* %c, align 4
%3 = load i32, i32* %b.addr, align 4
%mul1 = mul nsw i32 %2, %3
store i32 %mul1, i32* %c, align 4
%4 = load i32, i32* %c, align 4
%5 = load i32, i32* %a.addr, align 4
%mul2 = mul nsw i32 %4, %5
store i32 %mul2, i32* %c, align 4
%6 = load i32, i32* %c, align 4
ret i32 %6
}

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

The error happens with the first three instructions of foo3, and the first two are direct references to the arguments of foo3 (I don’t know why the error message includes the third instruction). There are no global variables in either module file. Thanks for the quick responses, it really helps.

Regards,
Daniel Moya

Hello LLVM Developers,

I hope this is not a bother. I would like to know if there’s any new information on the status of this query, whether it’s already being analyzed, or whether there’s an idea of what could be wrong with my program. I know I can count on the workaround that I mentioned, but I would like to know your thoughts on the problem so far, for documentation purposes. Thanks again for all the help that has been given.

Regards,
Daniel Moya