ORC v2 question

Hi,

I am trying out ORC v2 and facing some problems.

I am using LLVM 8.0.1.
I updated my ORC v1 implementation from 6.0 to 8.0 based on the
Kaleidoscope example (i.e. using the legacy classes), and that works fine.

Now I am trying out the ORC v2 APIs, based on the KaleidoscopeJIT.h
example in the llvm-mirror/llvm repository on GitHub.

I have got it to compile and build.
But it looks like my compiled code is not being optimized at all.

For optimization I am using the PassManagerBuilder class to set up the
passes, with OptLevel = 2. This works perfectly with the ORC v1 APIs, but
with ORC v2 I am not sure what is happening. I can see that
optimizeModule() is being called.

The second issue is that I need optimizeModule() to be an instance
method, but in the example it is now a static method. Is this a
restriction in the new API? I need to pass some parameters from the
JIT context, such as which optimization level to use.

Thanks and Regards
Dibyendu

Hi Dibyendu,
Could you please send me your unoptimized code and the optimized code you expect? The default implementation only contains some transformations. It would be helpful to know what you are actually trying to do.

optimizeModule is just a function object.

Hi Praveen,

You can view the code here:

https://github.com/dibyendumajumdar/ravi/blob/master/include/ravi_llvmcodegen.h
https://github.com/dibyendumajumdar/ravi/blob/master/src/ravi_llvmjit.cpp

Just look for USE_ORCv2_JIT

The code is messy, but that is because LLVM's API keeps changing from
version to version, causing huge issues for users.

Hi Dibyendu,
I understand that ORCv2 is changing fast, and I will try to answer quickly if you provide me the optimized IR module (with your expectations) and the unoptimized IR module that ORC gives you.
I can't build your project right now, though :)

Sorry Praveen, I only need someone to tell me what I am doing wrong in
ORC v2 vs ORC v1. The code I sent shows how I am using it.
There is not much point in analyzing the output at this stage, as the
same optimizations should be applied in both cases.

Thank you - I fixed that now.

Regards

Hi Dibyendu,

Sorry for the delayed reply. Looks like you have figured out how to solve your issue already. Out of interest, what did you need to do? Do you have anything that you would like to see added to http://llvm.org/docs/ORCv2.html?

Cheers,
Lang.

Hi Lang,

Sorry, my post was misleading. The point I figured out (quoted below) was only part of the problem.
The code is still not getting optimized at all; I don't really know
what is going on.

Yet the same configuration works fine with the legacy ORC v1 APIs.

Any help is appreciated.

Here are again the links to the relevant code:

https://github.com/dibyendumajumdar/ravi/blob/master/include/ravi_llvmcodegen.h
https://github.com/dibyendumajumdar/ravi/blob/master/src/ravi_llvmjit.cpp

Just look for the sections marked USE_ORCv2_JIT.

> optimizeModule is just a function object.

Thank you - I fixed that now.

Regards
Dibyendu

Hi Dibyendu,

A couple of notes to simplify your code. The following:

#if USE_ORCv2_JIT
    auto FPM = llvm::make_unique<FunctionPassManager>(TSM.getModule());
#else
    std::unique_ptr<FunctionPassManager> FPM(new FunctionPassManager(M.get()));
#endif

can be reduced to

auto FPM = llvm::make_unique<FunctionPassManager>(&*M);

since M must be non-null.

Likewise, this:

#if USE_ORCv2_JIT
    for (auto &F : *TSM.getModule())
      FPM->run(F);
#else
    for (auto &F : *M)
      FPM->run(F);
#endif

can be reduced to

for (auto &F : *M)
  FPM->run(F);

When you say your code is not getting optimized, do you mean that IR optimizations are not being applied, or that codegen optimizations are not being applied?

What do you see if you dump the modules before/after running the pass manager on them, like this:

dbgs() << "Before optimization:\n" << *M << "\n";

for (auto &F : *M)
  FPM->run(F);

dbgs() << "After optimization:\n" << *M << "\n";

I expect that output to be the same for both ORC and ORCv2. If not, something is going wrong with IR optimization.

CodeGen optimization seems a more likely culprit: JITTargetMachineBuilder and ExecutionEngineBuilder have different defaults for their CodeGen opt-level. JITTargetMachineBuilder defaults to CodeGenOpt::None, and ExecutionEngineBuilder defaults to CodeGenOpt::Default.

What happens if you make the following modification to your setup?

auto JTMB = llvm::orc::JITTargetMachineBuilder::detectHost();
JTMB->setCodeGenOptLevel(CodeGenOpt::Default); // <-- Explicitly set CodeGen opt level
auto dataLayout = JTMB->getDefaultDataLayoutForTarget();

Hopefully one of these approaches helps. If not, let me know and we can dig deeper -- I would like to help you get this working.

Cheers,
Lang.

Hi Lang,

Thank you - let me try these and get back.

Regards

Hi Lang,

> When you say your code is not getting optimized, do you mean that IR optimizations are not being applied, or that codegen optimizations are not being applied?
>
> What do you see if you dump the modules before/after running the pass manager on them, like this:
>
> dbgs() << "Before optimization:\n" << *M << "\n";
> for (auto &F : *M)
>   FPM->run(F);
> dbgs() << "After optimization:\n" << *M << "\n";
>
> I expect that output to be the same for both ORC and ORCv2. If not, something is going wrong with IR optimization.

Well, for ORCv2 there is no change before and after.
I also get this message:

JIT session error: Symbols not found: { raise_error }

Yes, raise_error and all the other extern functions are explicitly added
as global symbols.

> CodeGen optimization seems a more likely culprit: JITTargetMachineBuilder and ExecutionEngineBuilder have different defaults for their CodeGen opt-level. JITTargetMachineBuilder defaults to CodeGenOpt::None, and ExecutionEngineBuilder defaults to CodeGenOpt::Default.
>
> What happens if you make the following modification to your setup?
>
> auto JTMB = llvm::orc::JITTargetMachineBuilder::detectHost();
> JTMB->setCodeGenOptLevel(CodeGenOpt::Default); // <-- Explicitly set Codegen opt level
> auto dataLayout = JTMB->getDefaultDataLayoutForTarget();

No change.

Regards

> When you say your code is not getting optimized, do you mean that IR optimizations are not being applied, or that codegen optimizations are not being applied?
>
> What do you see if you dump the modules before/after running the pass manager on them, like this:
>
> dbgs() << "Before optimization:\n" << *M << "\n";
> for (auto &F : *M)
>   FPM->run(F);
> dbgs() << "After optimization:\n" << *M << "\n";
>
> I expect that output to be the same for both ORC and ORCv2. If not, something is going wrong with IR optimization.

Well, for ORCv2 there is no change before and after.

Okay, I had to put the "after" dump following MPM->run(*M).
So now I get optimized IR.

I also get this message:

JIT session error: Symbols not found: { raise_error }

So this must be the real issue: the IR is getting optimized, but then codegen is failing.

Hi Dibyendu,

> > What do you see if you dump the modules before/after running the pass manager on them, like this:
> >
> > dbgs() << "Before optimization:\n" << *M << "\n";
> > for (auto &F : *M)
> >   FPM->run(F);
> > dbgs() << "After optimization:\n" << *M << "\n";
> >
> > I expect that output to be the same for both ORC and ORCv2. If not, something is going wrong with IR optimization.
>
> Well, for ORCv2 there is no change before and after.

What about for ORCv1? There is nothing ORCv2-specific about this code snippet, so that seems to indicate a misconfigured function pass manager, but at first glance your pass manager config didn't look different between the two.

> I also get this message:
> JIT session error: Symbols not found: { raise_error }

Ahh -- I see the problem. The DynamicLibrarySearchGenerator is using the getAddressOfSymbol method, which (under the hood) is basically issuing an appropriate dlsym lookup, and that does not find explicitly added symbols. To find explicitly added symbols you need to call DynamicLibrary::SearchForAddressOfSymbol instead, but unfortunately that method's behavior is not a good fit for what DynamicLibrarySearchGenerator is trying to do.

There are two ways you could tackle this:
(1) Write your own generator that calls sys::DynamicLibrary::SearchForAddressOfSymbol, or
(2) Add the symbols up-front using the absoluteSymbols function.

I would be inclined to do the latter: it's more explicit, and easier to limit searches to exactly the symbols you want.

> CodeGen optimization seems a more likely culprit: JITTargetMachineBuilder and ExecutionEngineBuilder have different defaults for their CodeGen opt-level. JITTargetMachineBuilder defaults to CodeGenOpt::None, and ExecutionEngineBuilder defaults to CodeGenOpt::Default.
>
> What happens if you make the following modification to your setup?
>
> auto JTMB = llvm::orc::JITTargetMachineBuilder::detectHost();
> JTMB->setCodeGenOptLevel(CodeGenOpt::Default); // <-- Explicitly set Codegen opt level
> auto dataLayout = JTMB->getDefaultDataLayoutForTarget();

I am not sure what to make of that. What happens if you print TM->getOptLevel() right before running CodeGen? Once you have explicitly set it I would expect them to be the same for ORCv1 and ORCv2. If they're not then it's a plumbing issue.

-- Lang.

> > Well, for ORCv2 there is no change before and after.
>
> Okay, I had to put the "after" dump following MPM->run(*M).
> So now I get optimized IR.

Huh. I assumed FPM->run(F) actually ran the function passes. I may just be misunderstanding how the pass manager works.

-- Lang.

Hi Lang,

> > I also get this message:
> > JIT session error: Symbols not found: { raise_error }
>
> Ahh -- I see the problem. The DynamicLibrarySearchGenerator is using the getAddressOfSymbol method, which (under the hood) is basically issuing an appropriate dlsym lookup, and that does not find explicitly added symbols. To find explicitly added symbols you need to call DynamicLibrary::SearchForAddressOfSymbol instead, but unfortunately that method's behavior is not a good fit for what DynamicLibrarySearchGenerator is trying to do.
>
> There are two ways you could tackle this:
> (1) Write your own generator that calls sys::DynamicLibrary::SearchForAddressOfSymbol, or
> (2) Add the symbols up-front using the absoluteSymbols function.
>
> I would be inclined to do the latter: it's more explicit, and easier to limit searches to exactly the symbols you want.

Okay I will look into this. Thank you for all the help.

> CodeGen optimization seems a more likely culprit: JITTargetMachineBuilder and ExecutionEngineBuilder have different defaults for their CodeGen opt-level. JITTargetMachineBuilder defaults to CodeGenOpt::None, and ExecutionEngineBuilder default to CodeGenOpt::Default.
>
> What happens if you make the following modification to your setup?
>
> auto JTMB = llvm::orc::JITTargetMachineBuilder::detectHost();
> JTMB->setCodeGenOptLevel(CodeGenOpt::Default); // <-- Explicitly set Codegen opt level
> auto dataLayout = JTMB->getDefaultDataLayoutForTarget();

> I am not sure what to make of that. What happens if you print TM->getOptLevel() right before running CodeGen? Once you have explicitly set it I would expect them to be the same for ORCv1 and ORCv2. If they're not then it's a plumbing issue.

I explicitly set TM->Options anyway so maybe I don't need this?

Regards
Dibyendu

Looks like the documented approach in http://llvm.org/docs/ORCv2.html
doesn't work in LLVM 8.

Regards

Hi Lang,

Here is what I ended up doing:

  auto &JD = ES->getMainJITDylib();
  llvm::orc::MangleAndInterner mangle(*ES, *this->DL);
  llvm::orc::SymbolMap Symbols;
  for (int i = 0; global_syms[i].name != nullptr; i++) {
    Symbols.insert(
        {mangle(global_syms[i].name),
         llvm::JITEvaluatedSymbol(
             llvm::pointerToJITTargetAddress(global_syms[i].address),
             llvm::JITSymbolFlags(llvm::JITSymbolFlags::FlagNames::Exported))});
  }
  llvm::cantFail(JD.define(llvm::orc::absoluteSymbols(Symbols)),
                 "Failed to install extern symbols");

global_syms is an array with a name and function address for each extern symbol.
This works on Linux but not on Windows, so I must be missing something else?

Thanks and Regards
Dibyendu