I'm working on some code to re-compile the output from ahead-of-time LLVM compilers at runtime, which allows inlining of function calls whose targets are only known at runtime. This works by decorating selected functions ahead of time, adding code for determining caller-callee relationships and invoking the JIT compiler at runtime. The decoration works on the IR from the front-end compiler (e.g. clang) before generating object code with llc.
If anyone is interested in knowing more about the runtime inlining project it's available on GitHub at https://github.com/drti/drti
Making this work got a bit tricky in places and I have some questions about improvements:
1. To figure out when one "decorated" function has called another I pass some information in the r14 register as well as in the instruction stream accessible via the return address. The code is only supposed to work on Linux x86_64 for now. What I wanted to do was extend the existing X86TargetMachine to add in these features but I couldn't find any way to do this cleanly - I couldn't see any target machine extension points like RegisterPass and RegisterStandardPasses for IR passes. What I did in the end was implement a new target type "x86_64_drti" which delegates as much as possible to the real X86 target obtained via TargetRegistry::lookupTarget. This is messy because many of the virtual functions from TargetPassConfig that I want to delegate to X86PassConfig are protected (e.g. addPreRegAlloc). So I'm wondering if I missed something and if not, whether there's a reason the existing target machines don't provide any extension points?
2. To make it more robust I'd like to convert CALL instructions into a PUSH and JMP, so I can fake the return address to point at a block containing raw data and a JMP back to the instruction after the original CALL. I think we could call this a "return thunk". So instead of CALL target [...] I would have something like the below:
    MOV my_thunk, R11
    PUSH R11
    JMP target
Where my_thunk would hold the raw data and a JMP back to the instruction after the original CALL.
I don't know if this is even feasible since it splits the basic block containing the CALL and quite likely breaks any pre-call or post-call handling. To be honest I'm also not sure how this relates to instruction "bundles" either and whether the CALL is already more complicated than a single instruction. Does anyone know what would be involved in this kind of transformation from CALL to PUSH and JMP?
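For anyone unfamiliar with the return-address lookup mentioned in question 1, here is a minimal sketch in plain C++ of how a callee can reach the instruction stream at its call site. It is not the code from the drti project: CallSiteInfo and inspect_call_site are hypothetical names made up for illustration, it relies on the GCC/Clang builtin __builtin_return_address, and it assumes Linux x86_64 where the text segment is readable.

```cpp
#include <cstdint>

// Hypothetical record of what a callee can learn about its call site.
struct CallSiteInfo {
    const void* return_address;  // address of the instruction after the CALL
    std::uint8_t first_byte;     // first byte of the instruction stream there
};

// noinline so that a real CALL is emitted and a return address exists.
__attribute__((noinline)) CallSiteInfo inspect_call_site() {
    // Empty asm statement: stops the optimizer from treating this function
    // as side-effect free and merging separate calls into one.
    asm volatile("" ::: "memory");

    // GCC/Clang builtin: the address this function will return to.
    const void* ra = __builtin_return_address(0);

    // Assumption: the code bytes after the CALL are readable like ordinary
    // data, which holds for the usual r-x text mapping on Linux x86_64.
    const auto* code = static_cast<const std::uint8_t*>(ra);
    return CallSiteInfo{ra, code[0]};
}
```

Two calls from different places in the same function should report different return addresses, which is what lets a callee tell its call sites apart. Data placed directly after the CALL would be read through the same pointer.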