(First of all, I'd like to point out that I'm a newbie to this topic
and this is more of a "would it work?" kind of question. I basically
just came up with a difficult problem and decided to research it.)
I recently tried to run The Elder Scrolls: Daggerfall on an ARM netbook
(a Toshiba AC100) and failed even after enabling the latest patches for
"dynamic recompilation". I took a look at the code, and here's what I found:
The relevant macros and functions are defined there:
So basically, it looks like there's code that translates instructions
from x86 to a few other platforms in chunks of 32 opcodes. Since this
code was too slow for me, I asked myself "how could it be sped up?"
and assumed that perhaps LLVM could optimize it on the fly. So,
here's the question: would it be feasible, given the assumptions above?
What I am thinking about is a system that would:
1. Generate LLVM IR instead of native calls and JIT it on the fly (see the sketch after this list),
2. Apply the optimizations that I know from Clang.
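To make point 1 concrete, here is a minimal sketch (mine, not from the pastebin example) of building one function as LLVM IR at runtime with IRBuilder and running it through the ORC JIT (LLJIT). The hard-coded "x + 1" body stands in for whatever a lifted block would actually contain, and point 2 would amount to running an optimization pass pipeline over the module before handing it to the JIT. Note that the ORC API names have shifted across LLVM releases (ExecutorAddr::toPtr is the newer spelling), so this may need adjusting for your version:

```cpp
// Minimal sketch: build one function as LLVM IR at runtime and JIT it with ORC.
// In a real translator, the body would be generated from a decoded chunk of
// x86 instructions instead of the hard-coded "x + 1" below.
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetSelect.h"

using namespace llvm;
using namespace llvm::orc;

int main() {
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();

  auto Ctx = std::make_unique<LLVMContext>();
  auto Mod = std::make_unique<Module>("translated_block", *Ctx);

  // Emit: i32 block_0(i32 %x) { ret i32 (%x + 1) }
  IRBuilder<> B(*Ctx);
  auto *FnTy = FunctionType::get(B.getInt32Ty(), {B.getInt32Ty()}, false);
  auto *Fn = Function::Create(FnTy, Function::ExternalLinkage, "block_0", Mod.get());
  B.SetInsertPoint(BasicBlock::Create(*Ctx, "entry", Fn));
  B.CreateRet(B.CreateAdd(Fn->getArg(0), B.getInt32(1)));

  // Hand the module to the JIT, then look the symbol back up as a host function.
  auto JIT = cantFail(LLJITBuilder().create());
  cantFail(JIT->addIRModule(ThreadSafeModule(std::move(Mod), std::move(Ctx))));
  auto Addr = cantFail(JIT->lookup("block_0"));
  auto *Block0 = Addr.toPtr<int (*)(int)>();  // ExecutorAddr::toPtr, LLVM >= 15

  return Block0(41) == 42 ? 0 : 1;
}
```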
I saw this example on pastebin [1], and generating functions on the fly
looks rather straightforward, but I'm not sure it would be as easy when
translating machine code from one platform to another. Does LLVM have,
or integrate with, any libraries that would make this practical? What
would be the main challenges? Keep in mind that I would welcome even a
partial answer.
PANDA (if that's what you're talking about) is really heavyweight, and to be honest it's really hard to write optimizations in a reverse-engineering context. Optimizations almost always come after the code is fully working, so while PANDA succeeded at what it set out to do in that respect, dynamic lifting is hard.
Also, if you're talking about dynamically lifting to IR and emulating over the instruction semantics just to provide alternative platform support, but you want speed, it might be best to consider using a cross-compiler for each target, since you have the DOSBox source. Emulation is always going to be an order of magnitude slower because you have to work at smaller-step semantics. Running DOSBox in QEMU sounds like an option if you want to get off the ground really fast but can't compile and run DOSBox on the alternative target.
Also, in case you didn't know, LLVM's JIT works at the function level, and the problem in your context is that it compiles lazily, which is going to impose latency (understandably) on the first hit. Worse, the JIT can't possibly determine the end of a guest function, because of the undecidability of disassembly: compilers often place function definitions contiguous to one another, and any one function can have multiple returns. The JIT would probably operate over a limited subset of the actual function definition and be forced to reinterpret the same code multiple times.
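For what it's worth, block-based translators (QEMU, and DOSBox's existing dynarec core) sidestep the function-boundary problem by translating one basic block at a time, stopping at the first control-flow instruction, and caching the result keyed by guest EIP. Here's a rough sketch of that loop; every name in it (CpuState, lift_and_jit_block) is a made-up placeholder, not a DOSBox or LLVM API, and the "lift" step is stubbed out so the structure compiles on its own:

```cpp
// Sketch of block-granularity translation with a cache keyed by guest EIP,
// instead of trying to recover whole guest functions. Everything here is a
// hypothetical stand-in: lift_and_jit_block would decode guest code up to the
// first branch (or a cap such as 32 opcodes), lift it to LLVM IR, optimize,
// and JIT it; here it just returns a dummy block so the loop is visible.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct CpuState { uint32_t halted = 0; };            // placeholder guest state
using TranslatedBlock = uint32_t (*)(CpuState &);    // returns the next guest EIP

// Placeholder for the actual lift-to-IR-and-JIT step.
static TranslatedBlock lift_and_jit_block(uint32_t eip) {
  std::printf("translating block at 0x%x\n", eip);
  return +[](CpuState &cpu) -> uint32_t { cpu.halted = 1; return 0; };
}

static void run(CpuState &cpu, uint32_t entry_eip) {
  std::unordered_map<uint32_t, TranslatedBlock> cache;
  uint32_t eip = entry_eip;
  while (!cpu.halted) {
    auto it = cache.find(eip);
    if (it == cache.end())                 // first hit: pay the JIT latency once
      it = cache.emplace(eip, lift_and_jit_block(eip)).first;
    eip = it->second(cpu);                 // run the block; it yields the next EIP
  }
}

int main() {
  CpuState cpu;
  run(cpu, 0x1000);
  return 0;
}
```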