Not much performance gain when reusing the JIT-cached compiled object in subsequent calls to the same function

Hi, All

I am using MCJIT to compile an expression function and save the JIT-compiled object in an ObjectCache backed by a std::map from function name (std::string) to llvm::MemoryBuffer, so that it can be reused in subsequent calls to that expression function. However, I don’t see much performance gain compared with recompiling the same function on every call without saving the compiled object in the ObjectCache. One interesting observation from perf top is that a lot of time is spent in functions under llvm::DenseMapBase. libptcompiler.so is the library that contains an LLVM helper class, which in turn holds the ObjectCache. Does this profile make sense to you, and do you have any suggestions? Thank you.

1.72% libptcompiler.so [.] bool llvm::DenseMapBase<llvm::DenseMap<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >, void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >::LookupBucketFor<void const
1.45% libptcompiler.so [.] llvm::sys::MemoryFence()
1.20% libptcompiler.so [.] llvm::DenseMapInfo<void const*>::isEqual(void const*, void const*)
1.02% libptcompiler.so [.] bool llvm::DenseMapBase<llvm::DenseMap<void const*, llvm::PassInfo const*, llvm::DenseMapInfo<void const*> >, void const*, llvm::PassInfo const*, llvm::DenseMapInfo<void const*> >::Lookup
0.95% libptcompiler.so [.] llvm::DenseMapInfo<unsigned int>::isEqual(unsigned int const&, unsigned int const&)
0.94% libptcompiler.so [.] llvm::DenseMap<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >::getBuckets() const
0.93% libptcompiler.so [.] llvm::PMTopLevelManager::findAnalysisPass(void const*)
0.81% libptcompiler.so [.] llvm::DenseMapIterator<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*>, true>::DenseMapIterator(llvm::DenseMapIterator<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*>,
0.76% libptcompiler.so [.] llvm::DenseMap<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >::getNumBuckets() const
0.75% libptcompiler.so [.] llvm::DenseMapBase<llvm::DenseMap<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >, void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >::getBucketsEnd()
0.73% libptcompiler.so [.] llvm::DenseMapIterator<llvm::Pass*, llvm::Pass*, llvm::DenseMapInfo<llvm::Pass*>, false>::AdvancePastEmptyBuckets()
0.72% libptcompiler.so [.] bool llvm::DenseMapBase<llvm::DenseMap<unsigned int, std::pair<unsigned int, unsigned int>, llvm::DenseMapInfo<unsigned int> >, unsigned int, std::pair<unsigned int, unsigned int>, llvm::
0.69% libpthread-2.12.so [.] pthread_rwlock_rdlock
0.65% libc-2.12.so [.] _int_malloc
0.61% libpthread-2.12.so [.] pthread_rwlock_unlock
0.61% perf [.] 0x000000000005f1dd
0.60% libptcompiler.so [.] llvm::DenseMapInfo<llvm::Pass*>::isEqual(llvm::Pass const*, llvm::Pass const*)
0.57% libptcompiler.so [.] llvm::DenseMapIterator<void const*, llvm::PassInfo const*, llvm::DenseMapInfo<void const*>, true>::DenseMapIterator(llvm::DenseMapIterator<void const*, llvm::PassInfo const*, llvm::DenseM
0.55% libptcompiler.so [.] llvm::DenseMapBase<llvm::DenseMap<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >, void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >::getBuckets()
0.50% libc-2.12.so [.] malloc
0.50% libptcompiler.so [.] llvm::DenseMapBase<llvm::DenseMap<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >, void const*, llvm::Pass*, llvm::DenseMapInfo<void const*> >::getNumBuckets() const
0.49% libptcompiler.so [.] llvm::DenseMapIterator<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*>, false>::DenseMapIterator(std::pair<void const*, llvm::Pass*>*, std::pair<void const*, llvm::Pass*>*, bool

The ObjectCache eliminates the need to recompile the function, but the resultant object file must still be processed by the RuntimeDyld (dynamic loader). This processing consists of applying relocations to the function, and that’s where all of the DenseMap references are coming from. It may be that there is an opportunity here to optimize the current code.
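The split between the two phases can be sketched with a small model that uses only the C++ standard library. It does not call into LLVM: NamedObjectCache, compileStub, loadStub, and getOrCompile are hypothetical names standing in for the ObjectCache, code generation, and the RuntimeDyld load step respectively. The only point it illustrates is that a cache hit skips the compile step, while the load step still runs on every call.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled object image (raw bytes).
using ObjectBytes = std::vector<std::uint8_t>;

// Counts how often each phase ran, so the effect of caching is visible.
struct PhaseCounters {
    int compiles = 0;
    int loads = 0;
};

// Hypothetical cache keyed by function name; not LLVM's ObjectCache class.
class NamedObjectCache {
public:
    void store(const std::string& name, ObjectBytes bytes) {
        objects_[name] = std::move(bytes);
    }

    // Returns a pointer to the cached bytes, or nullptr on a miss.
    const ObjectBytes* lookup(const std::string& name) const {
        auto it = objects_.find(name);
        return it == objects_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, ObjectBytes> objects_;
};

// Stand-in for code generation: produces some bytes and counts the call.
ObjectBytes compileStub(const std::string& name, PhaseCounters& counters) {
    ++counters.compiles;
    return ObjectBytes(name.begin(), name.end());
}

// Stand-in for the load step that runs on every call, cached or not.
std::size_t loadStub(const ObjectBytes& object, PhaseCounters& counters) {
    ++counters.loads;
    return object.size();
}

// Compiles only on a cache miss, but always loads.
std::size_t getOrCompile(const std::string& name, NamedObjectCache& cache,
                         PhaseCounters& counters) {
    const ObjectBytes* cached = cache.lookup(name);
    if (cached == nullptr) {
        cache.store(name, compileStub(name, counters));
        cached = cache.lookup(name);
    }
    return loadStub(*cached, counters);
}
```

Calling getOrCompile twice with the same name leaves the compile counter at one and the load counter at two, which is the pattern described above: the saving is limited to whatever the compile step costs.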

For a single function, I’m not surprised that using the ObjectCache doesn’t save a significant amount of time. The ObjectCache is generally more useful when you have a large block of library-type functions that you know ahead of time aren’t likely to change but that need to be compiled at runtime at least once.
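One way to check whether caching is worth it for a given workload is to time the compile and load phases separately over the whole batch of functions. The following is a minimal sketch using only std::chrono. The names elapsedMicros, PhaseTimes, and timePhases are hypothetical, and the two callables are placeholders for whatever code performs compilation and loading in the real application.

```cpp
#include <chrono>
#include <utility>

// Runs fn once and returns the elapsed wall-clock time in microseconds.
template <typename Fn>
long long elapsedMicros(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}

// Accumulated time per phase across all functions in the batch.
struct PhaseTimes {
    long long compileMicros = 0;
    long long loadMicros = 0;
};

// Calls compile and load once per function and sums the time of each phase.
template <typename CompileFn, typename LoadFn>
PhaseTimes timePhases(CompileFn&& compile, LoadFn&& load, int functionCount) {
    PhaseTimes times;
    for (int i = 0; i < functionCount; ++i) {
        times.compileMicros += elapsedMicros(compile);
        times.loadMicros += elapsedMicros(load);
    }
    return times;
}
```

If compileMicros is small relative to loadMicros, a cache that only removes the compile phase cannot change the total by much. With a large batch of functions the compile total grows, and so does the potential saving.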

-Andy