Use of LLVM in a Machine Simulator.

Hi,

I'm slowly getting to grips with what makes up LLVM. I intend to use it
in a machine simulator, e.g. processor, clock, RAM, UART, and other
devices, where the processor will be one of several. It would take a
block of target instructions, e.g. ARM, and produce LLVM to simulate
those on the target machine state, and then JIT them to host
instructions and then execute.

The peripheral simulations would be in C and end up as LLVM too so
optimisations could occur across the ARM->LLVM/peripheral->LLVM
boundary.

Does this sound a good fit so far?

My main question relates to TableGen and decoding the target
instructions. I was initially going to use something specific to the
task of decoding, e.g. New Jersey Machine Code Toolkit, but wonder if I
could/should make use of the *.td for the various processors already
known to LLVM with a new TableGen back-end? (I know there isn't support
for ARM yet in LLVM.) And perhaps the DAG selector is of use in
matching patterns in ARM instructions to the desired LLVM rather than
just doing one ARM instruction at a time production? (For ARM,
substitute other ISAs, some of which aren't in LLVM.)

I'm looking for guidance so I avoid a dead-end.

Cheers,

Ralph.

I would look at the back end of the GCC ARM cross compiler, specifically the pass that generates .s files. It is driven by a table interface and by the AST of the intermediate language that GCC uses. It is not LLVM, but it is a mapping mechanism that maps to ARM on one side (half your problem); the other half is the map from LLVM bytecode to the AST, which you do not need.

Taking an ARM assembler map and hand-mapping single instructions for all the common cases would probably get you going; you could then implement something more optimal if you need to. In addition, you can optimize the LLVM bytecode and then generate assembly or object code, depending on your model (JIT, I think).

GCC's intermediate code is quite simple, and you should be able to remap it into LLVM using their ARM back end as a model. You could perhaps even feed the output of the test suite into a mapper program that takes their instruction cases, looks them up in a symbol table, and returns the equivalent LLVM mapping, so you could possibly automate the transcoding into ARM assembler (JIT).

Anyway, these are just some ideas to add to your investigation. Please let me know what you find. I am looking into getting LLVM running on MacPPC under OpenBSD because of the secure environment…

Regards, and good luck. Please let me know how it's going… Joseph Aleta(IVO)

I'm slowly getting to grips with what makes up LLVM. I intend to use it
in a machine simulator, e.g. processor, clock, RAM, UART, and other
devices, where the processor will be one of several. It would take a
block of target instructions, e.g. ARM, and produce LLVM to simulate
those on the target machine state, and then JIT them to host
instructions and then execute.

Ok.

The peripheral simulations would be in C and end up as LLVM too so
optimisations could occur across the ARM->LLVM/peripheral->LLVM
boundary.

Does this sound a good fit so far?

Sure, that makes sense.

My main question relates to TableGen and decoding the target
instructions. I was initially going to use something specific to the
task of decoding, e.g. New Jersey Machine Code Toolkit, but wonder if I
could/should make use of the *.td for the various processors already
known to LLVM with a new TableGen back-end? (I know there isn't support
for ARM yet in LLVM.) And perhaps the DAG selector is of use in
matching patterns in ARM instructions to the desired LLVM rather than
just doing one ARM instruction at a time production? (For ARM,
substitute other ISAs, some of which aren't in LLVM.)

This would be an interesting direction to take, but it may not be the easiest one. The easiest direction would be to write a hand coded machine instruction parser (or use something like the machine code toolkit) and then have a switch statement on the opcode to emit LLVM instructions.
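A minimal sketch of that hand-coded approach, in Python for brevity. It handles only unconditional ARM data-processing instructions with a plain register or immediate operand, and it emits textual LLVM IR rather than building it through the LLVM API. The names `translate` and `OPCODE_TO_LLVM`, and the convention that guest registers live behind pointers called `%r0_ptr` to `%r15_ptr`, are assumptions made for illustration, not anything from LLVM itself. Special handling of r15 (the PC) is ignored.

```python
# Hypothetical sketch: decode a few ARM data-processing instructions and
# emit textual LLVM IR that applies them to a guest register file.

# ARM data-processing opcode field (bits 24..21) -> LLVM binary operator.
OPCODE_TO_LLVM = {
    0b0000: "and",  # AND
    0b0001: "xor",  # EOR
    0b0010: "sub",  # SUB
    0b0100: "add",  # ADD
    0b1100: "or",   # ORR
}


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def translate(word, n=0):
    """Return LLVM IR lines simulating one ARM instruction word.

    Assumption: guest registers are in memory behind %r0_ptr .. %r15_ptr.
    `n` keeps temporary names unique across instructions in a block.
    """
    if (word >> 28) != 0xE:
        raise NotImplementedError("conditional execution")
    if (word >> 26) & 0b11 != 0b00:
        raise NotImplementedError("not a data-processing instruction")
    if (word >> 20) & 1:
        raise NotImplementedError("flag-setting (S bit) forms")
    opcode = (word >> 21) & 0xF
    if opcode not in OPCODE_TO_LLVM:
        raise NotImplementedError(f"opcode {opcode:#06b} not handled")
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    lines = [f"%a{n} = load i32, ptr %r{rn}_ptr"]
    if (word >> 25) & 1:
        # Immediate operand: 8-bit value rotated right by 2 * rotate field.
        rhs = str(ror32(word & 0xFF, 2 * ((word >> 8) & 0xF)))
    else:
        if (word >> 4) & 0xFF:
            raise NotImplementedError("shifted register operands")
        lines.append(f"%b{n} = load i32, ptr %r{word & 0xF}_ptr")
        rhs = f"%b{n}"
    lines.append(f"%d{n} = {OPCODE_TO_LLVM[opcode]} i32 %a{n}, {rhs}")
    lines.append(f"store i32 %d{n}, ptr %r{rd}_ptr")
    return lines
```

For example, `translate(0xE0810002)` (ADD r0, r1, r2) yields two loads, an `add i32`, and a store to `%r0_ptr`. Keeping registers in memory leaves it to LLVM's optimisers to promote them and remove redundant loads and stores within a block.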

This thesis may be of interest. It talks about converting Alpha code to LLVM (among other things): A Task Optimization Framework for MSSP

-Chris

Ralph Corderoy wrote:

I'm slowly getting to grips with what makes up LLVM. I intend to use it
in a machine simulator,

Interesting. We have a simulator that we plan to port to LLVM too ;)

e.g. processor, clock, RAM, UART, and other
devices, where the processor will be one of several. It would take a
block of target instructions, e.g. ARM, and produce LLVM to simulate
those on the target machine state, and then JIT them to host
instructions and then execute.

The peripheral simulations would be in C and end up as LLVM too so
optimisations could occur across the ARM->LLVM/peripheral->LLVM
boundary.

Does this sound a good fit so far?

I'm not sure. I would suspect that your peripherals have some internal logic
that's independent of the CPU. This means that in order to simulate the whole
system, you need to use some discrete-event simulation engine, where the
processor and each peripheral generate events that are then
executed in order. With such a scheme, I don't think you can do optimization
across the ARM->LLVM/peripheral->LLVM boundary. Well, unless your
discrete-event engine is also fully in LLVM, but even then, I'm not sure
there's much potential for optimization.
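To make the discrete-event idea concrete, here is a minimal sketch of such an engine in Python. The `EventQueue` class, the 40-tick CPU block, and the 25-tick UART period are all invented for illustration; a real simulator would derive the timings from the devices being modelled.

```python
import heapq


class EventQueue:
    """Minimal discrete-event scheduler: events run in timestamp order."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._seq = 0  # tie-breaker: equal-time events keep insertion order

    def schedule(self, delay, action):
        heapq.heappush(self._heap, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self, until):
        while self._heap and self._heap[0][0] <= until:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)


def simulate(until=100):
    """Run a hypothetical CPU and UART side by side; return the event log."""
    log = []
    events = EventQueue()

    def cpu_block(q):
        # Stand-in for executing one translated block costing 40 ticks.
        log.append((q.now, "cpu: block executed"))
        q.schedule(40, cpu_block)

    def uart_tick(q):
        # The UART sends a character every 25 ticks, independent of the CPU.
        log.append((q.now, "uart: char sent"))
        q.schedule(25, uart_tick)

    events.schedule(0, cpu_block)
    events.schedule(0, uart_tick)
    events.run(until)
    return log
```

Every hand-off between the CPU and a peripheral goes through `run`, which is the boundary that limits how much the optimiser can see across the two sides.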

Maybe I misunderstood what you're trying to do?

My main question relates to TableGen and decoding the target
instructions. I was initially going to use something specific to the
task of decoding, e.g. New Jersey Machine Code Toolkit, but wonder if I
could/should make use of the *.td for the various processors already
known to LLVM with a new TableGen back-end? (I know there isn't support
for ARM yet in LLVM.) And perhaps the DAG selector is of use in
matching patterns in ARM instructions to the desired LLVM rather than
just doing one ARM instruction at a time production? (For ARM,
substitute other ISAs, some of which aren't in LLVM.)

I'm looking for guidance so I avoid a dead-end.

As Chris already mentioned, the .td file format does not seem very
suitable for that. It might be better to use some other tool or to implement
bit pattern matching manually.
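Manual bit pattern matching can be as simple as a table of mask/value pairs checked in order. The sketch below classifies classic 32-bit ARM instruction words into a few broad groups; the table is deliberately incomplete (it ignores halfword transfers, block transfers, coprocessor instructions, and so on), and the names `PATTERNS` and `classify` are made up for this example.

```python
# (mask, value, name) entries, most specific pattern first. A word matches
# an entry when (word & mask) == value. Simplified subset of the classic
# ARM encoding, for illustration only.
PATTERNS = [
    (0x0FC000F0, 0x00000090, "multiply"),
    (0x0F000000, 0x0F000000, "software interrupt"),
    (0x0E000000, 0x0A000000, "branch"),
    (0x0C000000, 0x04000000, "single data transfer"),
    (0x0C000000, 0x00000000, "data processing"),
]


def classify(word):
    """Return the name of the first pattern that matches `word`."""
    for mask, value, name in PATTERNS:
        if (word & mask) == value:
            return name
    return "unknown"
```

Ordering matters: the multiply pattern has to come before the general data-processing one because both have bits 27..26 clear. Each class would then hand the word to its own field extractor.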

- Volodya

Hi Chris,

Of interest may be this thesis. It talks about converting alpha code
to LLVM (among other things):
http://llvm.org/pubs/2004-05-JoshiMSThesis.html

Thanks, it was of interest. I didn't spot its relevance from the title.

Cheers,

Ralph.

Note that the approach is still sound, but the paper is a bit dated (like anything talking about LLVM published more than 3 months ago :) ). In particular, the spirit of this:

"Certain Alpha instructions have no direct and easy mapping to the LLVM IR. One option is to represent such instructions by complex pieces of LLVM code. For example, a ctpop instructions counts the number of set bits in a registers. Although it can be converted into a loop in LLVM, it would be very difficult for the back end to recognize such loops and regenerate the ctpop instructions. Secondly, such detailed translation is not required for optimizations like dead code elimination. Hence, we chose to represent such instructions in a simpler manner using function calls. These function calls can be easily recognized by the back end and translated to a single instruction."

... is right, but the details are no longer true (LLVM does now have support for ctpop). Likewise, the spirit of this is correct:

"The LLVM compiler framework has the ability to define intrinsic functions: functions about which the compiler knows nothing and hence makes conservative transformations. One drawback of using such intrinsics is that they interfere with the dead code elimination process: because of conservative assumptions, LLVM cannot eliminate such function calls. Specifically, these functions can, in theory, write to memory and hence cannot be eliminated."

... but the details are no longer right. In particular, you can specify whether intrinsics have side effects, etc now, which allows them to be DCE'd, CSE'd, hoisted out of loops, etc.
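As a toy illustration of why the side-effect information matters (this is not LLVM's implementation; the `Instr` record and `eliminate_dead_code` function are invented for the example), a dead code eliminator can only drop an unused call when it knows the call cannot write memory or otherwise have an effect:

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str           # name of the value defined ("" if none)
    op: str             # operation or callee name
    args: tuple         # names of the values used
    side_effects: bool  # True if it may write memory, trap, etc.


def eliminate_dead_code(instrs, live_out):
    """Drop instructions whose results are unused and have no side effects."""
    live = set(live_out)
    kept = []
    # Walk backwards so uses are seen before their definitions.
    for ins in reversed(instrs):
        if ins.side_effects or ins.dest in live:
            kept.append(ins)
            live.update(ins.args)
    kept.reverse()
    return kept


def example(ctpop_has_side_effects):
    """Return the ops that survive when the ctpop result is never used."""
    block = [
        Instr("a", "load", ("r1",), False),
        Instr("c", "ctpop", ("a",), ctpop_has_side_effects),
        Instr("d", "add", ("a", "a"), False),
    ]
    return [ins.op for ins in eliminate_dead_code(block, live_out={"d"})]
```

With the conservative assumption, `example(True)` keeps the unused `ctpop`; once it is known to be free of side effects, `example(False)` removes it.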

Also, note that the Alpha backend described in the paper is quite different from the current Alpha backend.

If you have any specific questions, this list is the place to ask. :)

-Chris