Tablegen backend for emulator core?

Howdy llvm-dev,

Low priority question. This doesn’t really meet the requirements for an RFC, because it’s probably merely a half-baked idea at this point.

As I was working with some tablegen internals on an 8-bit processor backend, it struck me that I would need at some point to write an emulator for that processor.

And I realized that, although I could write an emulator in the traditional manner, tablegen already has most of the information it needs to automatically generate the guts of an emulator.

Tablegen’s already generating a disassembler (-gen-disassembler).

At the least, tablegen could be given an additional backend that says, “given this instruction, do this emulation step.” Such a step could be pure code copied from the .td files, or it could be constructed compositionally, based on the classes a record is composed from.

You’d have to write your own code to deal with emulated machine state, but hey, instruction parsing would be an item off the to-do list.

I would have a hard time believing that this concept is novel. Has anyone else taken a crack at this?

Sincerely,

jwb
This is a cool idea, and AFAIK there isn’t any project that has done a similar thing.

However, IMO there is a potential problem: in this story, LLVM only eases the effort of writing a disassembler from scratch (and parses object file formats for you). But we can already use the implementation of MCDisassembler provided by each target as a normal library to perform disassembly, and the same goes for parsing object file formats.
In other words, it’s already easy to use LLVM’s disassembler in an emulator project.

It is, however, an attractive option to directly generate part of the emulator code. We could even provide handler code for each instruction in the TG description, as you have suggested. It would be a cool out-of-tree project for sure, but I see little chance of it going upstream, because it’s pretty far away from LLVM’s original goal.

Best,
-Min

Hi John,

Simon Cook (CCed) previously used LLVM MC to help write a simulator
<https://llvm.org/devmtg/2016-01/slides/fosdem16-aapsim.pdf>, which
might be worth taking a look at. Though I understood from your email
that you're imagining relying more heavily on TableGen for generating
the execution loop.

Best,

Alex

> And I realized that, although I could write an emulator in the traditional manner, tablegen already has most of the information it needs to automatically generate the guts of an emulator.

> Simon Cook (CCed) previously used LLVM MC to help write a simulator <https://llvm.org/devmtg/2016-01/slides/fosdem16-aapsim.pdf>, which might be worth taking a look at. Though I understood from your email that you’re imagining relying more heavily on TableGen for generating the execution loop.

Thank you. I think Cook understood what I was hinting at, towards the end of his presentation. You could build such a simulator by creating a large switch statement based on MCInsts the way that Cook has done, or you could theoretically let tablegen create that switch for you… llvm-tblgen -gen-simulator was the way he put this idea. At the least, the concept maintains tablegen’s DRY approach to representing machine instructions.

jwb

Hi John,

This is an area I'm still greatly interested in since doing the work
up to that talk. I have since worked on a second simulator using
MCDisassembler as the decoder, but sadly haven't had the time to do
any exploring of using TableGen for the semantics.

It has, however, still been ticking over in my brain, so I have some
more thoughts on what would need to be considered. I think it would be
a good addition to LLVM and would be happy to see it added; in my mind
I compare it to what CGEN adds to the GNU toolchain.

The first thing is what kind of simulation we want LLVM to model. If
it's a simple instruction set simulator, then in some regards I don't
imagine this being too hard. But if we want to stretch as far as
modelling full pipelines (unlikely to be automatic), it would still be
good to generate the components, even if we have to build the pipeline
by hand (I think the scheduling info in TableGen is probably
insufficient for this).

The other large thing would be identifying the semantics we don't get
from TableGen patterns and working out a nice way to describe them in
Instruction definitions. For architectures that have status bits
modified by instructions and then used by later branch instructions,
these are typically modelled as registers that are implicitly written,
so definitions would need to be extended to describe that behaviour.
I'm thinking you might have a second field that describes extra
semantics that don't make sense for code generation, but I'm not 100%
sure on that one. If we can auto-generate that, it would IMO
dramatically reduce how much needs to be written manually.

One thing to keep in mind, though, is that at some scale a single
switch on Insn.getOpcode() might not be the best model for the more
complex and varied architectures. If you have multiple generations of
architectures, you might want to split the simulator up into different
loops; so if you have a couple of generations of cores in one backend,
maybe you want to generate different loops (perhaps reusing
ProcessorModels or something similar here?). That would also help
identify which instructions still need manual semantics written, in a
more mentally scalable way.

Practically speaking, if something like this were written, I think
there's a good model to follow in how GlobalISel's reuse of TableGen
patterns has gone: have a TableGen generator that generates what it
can for some instructions, ask someone to write the missing parts, and
over time move more things from hand-written to auto-generated.

As for driving the simulator, I've found the "forked objdump" approach
works well, but if this were in-tree I'd expect something more
specialised (and likely written from scratch) to fit in. It may also
be possible to repurpose parts of LLDB's lldb-server, pulling in some
"MCSimulator" library to get something that talks RSP (for people
using LLVM + GDB), but I'm not familiar enough with all its components
to know how feasible this would be.

Overall, I still think this is something that would be a great
addition to LLVM, and I think the raw pieces are there. As with most
things, these things live and die by having people who would use and
maintain them. I'm not sure how many others have interest or thoughts
in this area, but if they do, I would certainly welcome such an
addition.

Thanks,
Simon

Because processors can be arbitrarily simple or complex, and the needs for quality of emulation vary as well (do we need to simulate cache effects or not? do we want RSP protocol support or not?), I’d suggest creating a set of base classes that can be extended per processor, to whatever level of modelling is needed for that architecture. I think this is more in line with LLVM’s philosophy of providing a bunch of base classes that you can extend into particular tool implementations.

I can think of three basic strategies for emulating an existing architecture, within LLVM.

First, an instruction-level emulator. Basically, a large switch statement conditioned on the opcode. Each emulation step uses the existing MCDisassembler machinery to decode the operands out of each instruction, then emulates the instruction on those operands. Slow, but well understood and easy to implement.

Second, a cross compiler from the emulated machine’s binary code to C++. Instead of emitting a large switch statement, we emit one very large function with C-style labels on every emulated instruction; branch instructions are modelled as conditional gotos in the emitted code. We could even reconstruct BasicBlocks of code by using tablegen’s knowledge of whether a particular instruction is a terminator. Since we’re letting LLVM run all its optimizers on each emulated BasicBlock, performance should be much more reasonable. The main drawback is not being able to support self-modifying code; this shouldn’t be a big problem, because LLVM currently doesn’t emit self-modifying code, AFAIK.

Third, a qemu-style dynamic recompiler. Conveniently, we happen to have a JIT compiler already built into LLVM! In fact, we could modify the second approach so that when we trap an attempted write to the read-only code areas, we instead cross compile that basic block with the changes just made to that memory area. So this third style would be fast, and it could even handle self-modifying code.

The key insight for all three of these strategies is that the code that parses operands, and applies the opcode to those operands, can be the same for all three styles of emulation. And that’s where tablegen could save the day, by spitting out all of that code for each emulator.

Internally, tablegen is dumber than a lot of people give it credit for. It’s all just records, classes of records, and methods for determining whether one class inherits from another. The smarts are in each of tablegen’s backends, which determine how those records are interpolated into code. What tablegen IS good at is concatenating arbitrary strings and/or code snippets. So each instruction in tablegen could map a list of emulator IDs onto a list of strings (code) that implement that instruction for each emulator. This approach would allow any backend to experiment with multiple simultaneous emulation approaches, so you could bring a more sophisticated emulator online while a simpler one is already in production.

This approach also has the advantage that the code that runs the emulation can be compositional: tablegen can concatenate the code that parses an operand with code that checks for cache effects, code that implements the opcode, code that retires instructions from cache, and so on. Tablegen remains DRY, but by restricting tablegen to concatenating code snippets that emulate instructions or basic blocks, you can still make the emulation as simple or as complex as you want it to be.

I thought about pulling in the existing emulator from lldb, but it seemed to me that it would introduce a novel dependency, making llvm depend upon lldb. RSP server support is a good idea, however, and it would clearly be the same code regardless of which style of emulation you were doing. If you build your base classes right, you could even support fancy things like reverse execution out of the box.

I’m envisioning an llvm-emu command-line tool that lets you choose an -mcpu, a -triple, and an -arch to emulate. There’d also be flags for printing either the emulated instructions or the emulated program’s stdout, so that you can verify an emulated program run with lit.

Anyway, the whole purpose of such an emulator, and the immediate reason multiple projects need it, is that it lets you quickly test new codegen ideas as part of the normal check-llvm or check-clang build. Codegen testing inside llvm is currently restricted to “do you emit this exact sequence of instructions or not.” An on-board set of emulators would let you write tests of the form “does the generated code do this or not,” and get rapid feedback. Of course, a well-structured set of base classes would allow you to build a lot of other emulator-ish things as well.