Is the flow "llvm-extract -> llvm-link -> clang++" supposed to be used this way, to extract and re-insert functions?

Hi all,
First post to the list, I hope you can help or guide me on this task.

I am involved in a project that requires re-linking extracted and edited IR code.

So I want to know whether these tools can be used in this way:

clang++-4.0 code03.cpp -emit-llvm -S -o code03.ll

llvm-extract-4.0 code03.ll -func main -S -o extracted_main.ll

llvm-link-4.0 code03.ll -only-needed -override extracted_main.ll -S -o linked_main.ll
clang++-4.0 linked_main.ll -o main.out

where code03.cpp is:

#include <iostream>
using namespace std;
int main()
{
cout << "First Message\n ";
cout << "Second Message\n ";
cout << "Third Message\n ";
return 0;
}

I have been trying to extract a function's LLVM IR, modify it (preserving its signature or not), and re-insert the function into the original IR file. However, I am getting an error during the compilation step (clang++-4.0 linked_main.ll -o main.out):

main.ll:(.text+0x14): undefined reference to `.str'
main.ll:(.text+0x34): undefined reference to `.str.1'
main.ll:(.text+0x51): undefined reference to `.str.2'

and the linked_main.ll file has this section:

@.str.4 = private unnamed_addr constant [16 x i8] c"First Message\0A \00", align 1
@.str.1.6 = private unnamed_addr constant [17 x i8] c"Second Message\0A \00", align 1
@.str.2.8 = private unnamed_addr constant [16 x i8] c"Third Message\0A \00", align 1
@.str = external hidden unnamed_addr constant [16 x i8], align 1
@.str.1 = external hidden unnamed_addr constant [17 x i8], align 1
@.str.2 = external hidden unnamed_addr constant [16 x i8], align 1

But the function does not use the correct versions of the strings: the linked "extracted_main" keeps referring to .str, .str.1, and .str.2. Am I not supposed to do it this way?
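
A quick way to see which globals a linked module still expects something else to define is to scan the textual IR for declarations. The helper below is only a sketch and is not part of LLVM: the function name list_external_globals is hypothetical, and it relies solely on the "@name = external ..." line shape visible in the listing above. It does not handle quoted names or extern_weak declarations.

```shell
# list_external_globals FILE
# Print the name of every global variable in a textual IR file that is only
# declared as "external" (no initializer), so it must be resolved later.
# Hypothetical helper for illustration; it pattern-matches text, not IR.
list_external_globals() {
  # Matches lines such as: @.str = external hidden unnamed_addr constant ...
  sed -n -E 's/^(@[^ ]+) = external .*/\1/p' "$1"
}
```

Running list_external_globals linked_main.ll before the final clang++ step would show whether .str, .str.1 and .str.2 are still unresolved declarations.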

Thank you in advance

  • nico

After trying different things, I realized that I should change the visibility of the conflicting
variables in the target linked.ll file to hidden before calling the linker.

This can easily be done by calling llvm-extract with the -delete option to prepare a file to
receive the linked function:

llvm-extract-4.0 code03.ll -func main -S -o extracted_main.ll
llvm-extract-4.0 code03.ll -func main -delete -S -o linked.ll
llvm-link-4.0 linked.ll -only-needed -override extracted_main.ll -S -o linked_main.ll

This works great for a single module compilation.
But what are the effects if I have several modules?

Thanks,

  • nico

llvm-extract changes the semantics, as it gives every GlobalValue
external linkage for simplicity.
Therefore, if you have GVs with internal linkage when you run
llvm-extract that information is lost.
At least, you may want to fix this; the relevant code is around here
(Transforms/IPO/ExtractGV.cpp):

      // For simplicity, just give all GlobalValues ExternalLinkage. A trickier
      // implementation could figure out which GlobalValues are actually
      // referenced by the Named set, and which GlobalValues in the rest of
      // the module are referenced by the NamedSet, and get away with leaving
      // more internal and private things internal and private. But for now,
      // be conservative and simple.

      // Visit the GlobalVariables.
      for (Module::global_iterator I = M.global_begin(), E = M.global_end();
           I != E; ++I) {
        bool Delete =
            deleteStuff == (bool)Named.count(&*I) && !I->isDeclaration();
        if (!Delete) {
          if (I->hasAvailableExternallyLinkage())
            continue;
          if (I->getName() == "llvm.global_ctors")
            continue;
        }
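
To see in advance which global variables are affected, one can list those that have local linkage (private or internal) in the original module, since for them being given external linkage is an actual change. This is a minimal sketch under stated assumptions: the function name list_local_globals is hypothetical, it only pattern-matches textual IR where the linkage keyword directly follows "=", and it covers global variables only, not functions.

```shell
# list_local_globals FILE
# Print "name linkage" for every global variable in a textual IR file that
# is defined with private or internal linkage.
# Hypothetical helper for illustration; it does not parse IR properly.
list_local_globals() {
  # Matches lines such as: @.str = private unnamed_addr constant ...
  sed -n -E 's/^(@[^ ]+) = (private|internal) .*/\1 \2/p' "$1"
}
```

For example, list_local_globals code03.ll would be expected to report the private string constants that main refers to.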

Thanks,

I’m a bit rusty so forgive me if I’m not making sense, but wouldn’t you want to extract the string literals along with the function in this case and re-link both later?

I forgot, but apparently I had a bug open about this a while ago
https://bugs.llvm.org/show_bug.cgi?id=31674

> wouldn’t you want to extract the string literals along with the function in this case and re-link both later?

Yes and no. If a function does not share these string literals with others, it would be fine to extract and
re-link both.

But in the application I am working on, these variables may be shared by several extracted functions,
so it is best to keep them in one place and have each function re-linked to the common variable.
This way, I don’t have to worry about which function has the correct version.

  • nico

Hi Davide, thank you for the answer.

> Therefore, if you have GVs with internal linkage when you run
> llvm-extract that information is lost.

Does that mean that if I try to link another module I may end up with overlapping
variables?
My apologies, but my knowledge of the linker is not as solid as I would like.

> At least, you may want to fix this, the relevant code is around here
> (Transforms/IPO/ExtractGV.cpp)

I will take a close look and do some experimentation.

> I forgot, but apparently I had a bug open about this a while ago
> https://bugs.llvm.org/show_bug.cgi?id=31674

I believe this falls under Gordon’s comment, right?
Do we really want to extract this variable with the function?

Quick question: what does internalization mean in this context?

  • nico