Would DOSBox benefit from LLVM JIT?

Hello,

First of all, I'd like to point out that I am a newbie to this topic,
and this is more of a "would it work?" kind of question. I basically
just came up with a difficult problem and decided to research it.

I recently tried to run Elder Scrolls: Daggerfall on a Toshiba AC100
ARM netbook and failed, even with the latest "dynamic recompilation"
patches enabled. I took a look at the code, and here's what I found:

https://github.com/wjp/dosbox/blob/idados/src/cpu/core_dynrec/decoder.h#L34

The relevant macros and functions are defined here:

https://github.com/wjp/dosbox/blob/idados/src/cpu/core_dynrec/risc_armv4le-o3.h

So it looks like there's code that translates x86 instructions to a
few other platforms in chunks of 32 opcodes. Since this code was too
slow for me, I asked myself "how could it be sped up?" and assumed
that LLVM could perhaps optimize it on the fly. So here's the question:
would this be feasible, given the assumptions above? What I am thinking
about is a system that would:

1. Generate LLVM IR instead of native calls and JIT-compile it on the fly,
2. Apply the optimizations I know from Clang.
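Whatever does the code generation, such a system revolves around a translation cache: translate a block the first time the guest jumps to it, reuse the translation afterwards, and throw it away when the guest overwrites its code. The following is a minimal C++ sketch under stated assumptions: all names (`TranslationCache`, `HostBlock`, `on_guest_write`, `kPageSize`) are hypothetical, a translated block is modelled as a `std::function` instead of LLVM-generated machine code, and this is not how DOSBox's core_dynrec is actually organized.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Hypothetical names throughout. A translated block is modelled as a
// std::function that runs the block and returns the next guest PC; in a
// real JIT it would point at generated host machine code.
using GuestAddr = std::uint32_t;
using HostBlock = std::function<GuestAddr()>;
using Translator = std::function<HostBlock(GuestAddr)>;

constexpr GuestAddr kPageSize = 4096;  // assumed invalidation granularity

class TranslationCache {
public:
    // Returns the block starting at `pc`, translating it only on first use.
    // Returned by value so a block may safely trigger its own invalidation.
    HostBlock lookup(GuestAddr pc, const Translator& translate) {
        auto it = blocks_.find(pc);
        if (it == blocks_.end()) {
            ++translations_;
            it = blocks_.emplace(pc, translate(pc)).first;
            pages_[pc / kPageSize].insert(pc);
        }
        return it->second;
    }

    // Must be called on every guest memory write: drops all blocks that
    // start on the written page (self-modifying code).
    // Simplification: a block crossing a page boundary is tracked only by
    // the page it starts on.
    void on_guest_write(GuestAddr addr) {
        auto page = pages_.find(addr / kPageSize);
        if (page == pages_.end()) return;
        for (GuestAddr pc : page->second) blocks_.erase(pc);
        pages_.erase(page);
    }

    std::size_t translations() const { return translations_; }
    std::size_t size() const { return blocks_.size(); }

private:
    std::unordered_map<GuestAddr, HostBlock> blocks_;
    std::unordered_map<GuestAddr, std::unordered_set<GuestAddr>> pages_;  // page -> block starts
    std::size_t translations_ = 0;
};
```

A dispatcher would then repeat `pc = cache.lookup(pc, translate)();`. The translation cost (low for the existing dynrec backends, much higher for an LLVM pipeline) is paid once per block and again after every invalidation, so compile time matters most for code that runs only a few times or is rewritten often.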

I saw this example on Pastebin [1], and generating functions on the fly
looks rather straightforward, but I am not sure it would be as easy if
I wanted to translate machine code from one platform to another. Does
LLVM have, or integrate with, any libraries that would make this
practical? What would be the main challenges? I would welcome even a
partial answer.

Cheers,
d33tah

[1] http://pastebin.com/f2NSGZGR

A sequence of 32 instructions is not very likely to have many optimisation opportunities that LLVM can take advantage of. You may get a speedup from longer traces, though of course the LLVM JITing time is likely to be longer, so you’d want to make sure that it’s done in a separate thread. If you can get longer traces (and DOSBox already has the infrastructure for invalidating on self-modifying code), then you may be able to get some speedup.
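The point about block length can be made concrete with a toy model of one optimisation that compiling a whole block enables on x86 code: dropping flag computations that are overwritten before anything reads them. This is a hypothetical sketch, not DOSBox or LLVM code: `GuestInsn` and `flag_computations` are made-up names, and it assumes the translator must conservatively materialise the flags at every block exit because it cannot see which block runs next.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model of one guest instruction: does it read and/or write the flags?
struct GuestInsn {
    bool reads_flags;
    bool writes_flags;
};

// Counts the flag computations a translator must emit for `trace` when it
// is cut into blocks of at most `max_len` instructions (max_len >= 1).
// Within a block, a flag write is dead if a later instruction in the same
// block overwrites the flags before anything reads them. At a block exit
// the successor is unknown, so the flags are treated as live there.
std::size_t flag_computations(const std::vector<GuestInsn>& trace, std::size_t max_len) {
    std::size_t emitted = 0;
    for (std::size_t start = 0; start < trace.size(); start += max_len) {
        std::size_t end = std::min(start + max_len, trace.size());
        bool live = true;  // conservative: flags live at block exit
        // Backward liveness scan: live_in = (live_out minus def) plus use.
        for (std::size_t i = end; i-- > start;) {
            const GuestInsn& insn = trace[i];
            if (insn.writes_flags) {
                if (live) ++emitted;  // someone may read this write
                live = false;         // earlier writes are overwritten here
            }
            if (insn.reads_flags) live = true;
        }
    }
    return emitted;
}
```

For 64 consecutive flag-writing instructions with no flag reads, one-instruction blocks emit 64 flag computations, 32-instruction blocks emit 2, and a single 64-instruction block emits 1: the extra savings from longer blocks come only from removing block boundaries.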

There was a similar project to use LLVM in QEMU a few years ago that failed to provide a speedup.

David

On 22.07.2015 at 15:21, David Chisnall wrote:

> A sequence of 32 instructions is not very likely to have many
> optimisation opportunities that LLVM can take advantage of.

I don't know the codebase, but perhaps it's as easy as increasing the
number here and maybe adjusting some relevant buffers:

https://github.com/wjp/dosbox/blob/idados/src/cpu/core_dynrec.cpp#L212
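Whether raising that limit helps depends on how often the guest code branches: a translator that ends a block at each control-flow instruction cannot build longer blocks out of branchy code, however high the cap is set. The sketch below is a hypothetical model (`block_lengths` is a made-up name); it does not reproduce DOSBox's decoder, and the assumption that every branch ends a block is labelled as such.

```cpp
#include <cstddef>
#include <vector>

// Returns the lengths of the blocks a translator would form from a linear
// run of guest instructions. `is_branch[i]` marks control-flow instructions
// (jmp, jcc, call, ret, ...). Assumption: a block ends right after a branch
// or when it reaches `max_len` instructions (max_len >= 1), whichever
// comes first.
std::vector<std::size_t> block_lengths(const std::vector<bool>& is_branch, std::size_t max_len) {
    std::vector<std::size_t> lengths;
    std::size_t current = 0;
    for (bool branch : is_branch) {
        ++current;
        if (branch || current == max_len) {
            lengths.push_back(current);
            current = 0;
        }
    }
    if (current > 0) lengths.push_back(current);  // trailing partial block
    return lengths;
}
```

With a branch every 10 instructions, caps of 32 and 256 both yield blocks of 10; only straight-line code, such as 100 instructions with no branches, benefits (`{32, 32, 32, 4}` versus `{100}`). Getting substantially longer units out of typical code generally means following execution across branches, i.e. forming traces, rather than just raising the cap.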

> You may
> get a speedup from longer traces, though of course the LLVM JITing
> time is likely to be longer, so you’d want to make sure that it’s
> done in a separate thread. If you can get longer traces (and DOSBox
> has the infrastructure already for invalidating on self-modifying
> code) then you may be able to get some speedup.

Introducing another thread sounds like a difficult task, but it
definitely makes sense and could be worth it...
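The threading part may be smaller than it looks: it mostly needs a work queue and a safe way to swap in optimized code while the emulation thread keeps running the cheap translation. A minimal C++11 sketch, assuming hypothetical names (`BackgroundCompiler`, `Block`, `publish`) and modelling both translations as `std::function<int(int)>` rather than real machine code:

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

using Code = std::function<int(int)>;  // stand-in for generated host code

// A block always has a baseline translation; an optimized one may appear later.
class Block {
public:
    explicit Block(Code baseline) : baseline_(std::move(baseline)) {}
    // Emulation thread: never waits for the compiler.
    int run(int x) const {
        const Code* fast = fast_ptr_.load(std::memory_order_acquire);
        return fast ? (*fast)(x) : baseline_(x);
    }
    // Worker thread, called once: the release store publishes a fully built object.
    void publish(Code optimized) {
        optimized_.reset(new Code(std::move(optimized)));
        fast_ptr_.store(optimized_.get(), std::memory_order_release);
    }
    bool is_optimized() const { return fast_ptr_.load(std::memory_order_acquire) != nullptr; }
private:
    Code baseline_;
    std::unique_ptr<Code> optimized_;  // owned here, written only by the worker
    std::atomic<const Code*> fast_ptr_{nullptr};
};

// One worker thread draining a queue of compile jobs (e.g. LLVM compiles).
class BackgroundCompiler {
public:
    BackgroundCompiler() : worker_([this] { loop(); }) {}
    ~BackgroundCompiler() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // remaining jobs are finished before the thread exits
    }
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }
private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and queue drained
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // the slow compile runs without holding the lock
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread worker_;  // declared last: starts after the members above exist
};
```

The release store in `publish` pairs with the acquire load in `run`, so the emulation thread sees either no optimized code or a fully constructed one. A real JIT would also have to discard results for blocks invalidated by self-modifying code while their compile was still in flight, which this sketch ignores.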

> There was a similar project to use LLVM in QEMU a few years ago that
> failed to provide a speedup.
>
> David

Why did it fail?