LLVM to SUIF-MACH VM binary

Hi Chris,

Hi! I'm CC'ing the llvmdev list for the benefit of others.

Since I see you're very involved in LLVM, I need a little guidance on getting from C to MACH-SUIF.

I've been given the task of using LLVM to translate C code to another VM architecture known as MACH-SUIF. For this architecture, I don't think there's an assembler (but there is a disassembler), so I'm going to have to generate a binary file output.

Okay, that by itself shouldn't be a problem. We do not currently support emitting .o files directly, but it has been planned for quite some time.

Could I edit and add to the .../lib/Target directory and add in a new target that outputs the binary that this new VM supports? Do you think I would need to add to any other directories within the LLVM directories?

The bulk of your work will certainly be in your lib/Target directory. I would start simple, and use the debugging output (enabled with -print-machineinstrs) to see if you're generating correct code when you start. The code you have to add outside of lib/Target should be restricted to the work needed to get .o files writing.

For your project, the tricky part will be that you have to start emitting .o files before you can really test out the system. This increases the number of dependencies that need to be implemented before you can get "void main() {return;}" running, which is always the first milestone.

In order to get .o files writing, you'll want to implement the CodeEmitter interfaces, which are used to write binary machine code to memory (currently used by the JIT). After that, you'll also want to add your new support for writing .o files (which is basically just adding file headers and packaging up relocations). For this, you can choose to either add a new virtual method, "addPassesToEmitObjectFile", to the TargetMachine class, which will enable your functionality, or you can build it into the LLC tool directly. If you choose to build it into LLC, you can use the results of the MachineCodeEmitter and addPassesToEmitMachineCode. At this point, I'm not sure which way will work best for you.
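To make the "write bytes to memory, then package up relocations" idea concrete, here is a minimal sketch. It is not the real MachineCodeEmitter API: ByteEmitter, Reloc, emitSymbolRef and patchWord are hypothetical names, and the 32-bit little-endian word layout is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical relocation record: "the 4 bytes at Offset must be patched
// with the address of Symbol once that address is known".
struct Reloc {
  std::size_t Offset;
  std::string Symbol;
};

// Hypothetical in-memory emitter; NOT the real LLVM interface.
class ByteEmitter {
public:
  void emitByte(std::uint8_t B) { Bytes.push_back(B); }

  // Emit a 32-bit word, least significant byte first (an assumption).
  void emitWord(std::uint32_t W) {
    for (int Shift = 0; Shift < 32; Shift += 8)
      emitByte(static_cast<std::uint8_t>((W >> Shift) & 0xFF));
  }

  // Emit a 4-byte placeholder and record that it refers to Symbol.
  void emitSymbolRef(const std::string &Symbol) {
    Relocs.push_back({Bytes.size(), Symbol});
    emitWord(0);
  }

  // Overwrite a previously emitted word, e.g. when resolving a relocation.
  void patchWord(std::size_t Offset, std::uint32_t Value) {
    assert(Offset + 4 <= Bytes.size() && "patch outside emitted code");
    for (int I = 0; I < 4; ++I)
      Bytes[Offset + I] = static_cast<std::uint8_t>((Value >> (8 * I)) & 0xFF);
  }

  const std::vector<std::uint8_t> &bytes() const { return Bytes; }
  const std::vector<Reloc> &relocs() const { return Relocs; }

private:
  std::vector<std::uint8_t> Bytes;
  std::vector<Reloc> Relocs;
};
```

A JIT-style client would resolve each entry in relocs() itself and call patchWord; a .o writer would instead leave the placeholders alone and serialize the relocation list next to the code.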

If possible, it would be nice if you could make your .o file writing code as target-independent as possible (potentially defining a new TargetObjectFile interface for any hooks you need). In particular, if MACH-SUIF is an ELF-based system, it would be nice to be able to use the ELF-specific routines to write X86-ELF or PPC-ELF files as well (assuming the hooks were implemented). Clearly you can choose to worry about this or not at your discretion.
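As a rough illustration of the hook idea, the sketch below keeps the writer generic and pushes the per-target facts behind a small interface. Everything here is hypothetical: TargetObjectFileHooks, ToyX86Hooks, ToyPPCHooks and writeToyObject are made-up names, and the container written (a 2-byte machine id, a 4-byte code size, then the code) is a toy layout, not ELF. The machine ids 3 and 20 are borrowed from ELF's EM_386 and EM_PPC values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical hook interface: the only things the generic writer needs
// to ask a target about.
struct TargetObjectFileHooks {
  virtual ~TargetObjectFileHooks() = default;
  virtual std::uint16_t machineId() const = 0;
  virtual bool isLittleEndian() const = 0;
};

struct ToyX86Hooks : TargetObjectFileHooks {
  std::uint16_t machineId() const override { return 3; } // ELF EM_386
  bool isLittleEndian() const override { return true; }
};

struct ToyPPCHooks : TargetObjectFileHooks {
  std::uint16_t machineId() const override { return 20; } // ELF EM_PPC
  bool isLittleEndian() const override { return false; }
};

// Target-independent writer for a made-up container format:
// [2-byte machine id][4-byte code size][code bytes].
std::vector<std::uint8_t>
writeToyObject(const TargetObjectFileHooks &T,
               const std::vector<std::uint8_t> &Code) {
  assert(Code.size() <= 0xFFFFFFFFu && "code too large for a 4-byte size");
  std::vector<std::uint8_t> Out;
  // Append the low NumBytes bytes of V in the target's byte order.
  auto Put = [&](std::uint32_t V, int NumBytes) {
    for (int I = 0; I < NumBytes; ++I) {
      int Shift = T.isLittleEndian() ? 8 * I : 8 * (NumBytes - 1 - I);
      Out.push_back(static_cast<std::uint8_t>((V >> Shift) & 0xFF));
    }
  };
  Put(T.machineId(), 2);
  Put(static_cast<std::uint32_t>(Code.size()), 4);
  Out.insert(Out.end(), Code.begin(), Code.end());
  return Out;
}
```

The point of the design is that writeToyObject never mentions a specific target, so adding another ELF-like target would only mean adding another small hooks class.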

Can you say a little bit about MACH-SUIF? With a brief google search, I didn't turn up anything that described the architecture. Is it a RISC-like machine with 32-bit instruction words?

-Chris

A couple notes on this:

1. We also need to be able to *read* .o files for linking. Right now we
   just assume that any symbol not found in a bytecode file is
   implemented in some native library and will be resolved at runtime.
   This isn't the greatest assumption. To resolve native binary symbols
   we need to be able to read native .a, .so, and .o files to ensure
   the symbols are resolvable at runtime (or to actually resolve them if
   we're building a native executable).

2. GNU has a library, BFD, part of binutils suitable for writing binary
   object files in the native format of a given host machine. It
   provides a common interface and abstraction (format neutral) for
   reading and writing at least ELF and COFF files. The only thing it
   doesn't do well is translating between file types. There can be some
   loss of information between the various file formats.

3. The linking side of this effort should end up in lib/Linker

Reid.

Chris Lattner wrote:

Hi Chris,

Hi! I'm CC'ing the llvmdev list for the benefit of others.

Since I see you're very involved in LLVM, I need a little guidance on getting from C to MACH-SUIF.

I've been given the task of using LLVM to translate C code to another VM architecture known as MACH-SUIF. For this architecture, I don't think there's an assembler (but there is a disassembler), so I'm going to have to generate a binary file output.

Okay, that by itself shouldn't be a problem. We do not currently support emitting .o files directly, but it has been planned for quite some time.

...

If possible, it would be nice if you could make your .o file writing code as target-independent as possible (potentially defining a new TargetObjectFile interface for any hooks you need). In particular, if MACH-SUIF is an ELF-based system, it would be nice to be able to use the ELF-specific routines to write X86-ELF or PPC-ELF files as well (assuming the hooks were implemented). Clearly you can choose to worry about this or not at your discretion.

Thanks for all the information.

Can you say a little bit about MACH-SUIF? With a brief google search, I didn't turn up anything that described the architecture. Is it a RISC-like machine with 32-bit instruction words?

It's another VM representation. I haven't really gotten to know the nitty-gritty of the language, so I'm not too comfortable with it, but these two links should describe the project. It's based on the work from SUIF. More specifically, we are using MACHINE-SUIF as a backend to SUIF to generate code for an embedded processor. We want to move away from using the SUIF frontend, but the backend works fine. Essentially, for right now, I have to convert LLVM IR to SUIFvm IR.

http://www.eecs.harvard.edu/hube/software/nci/suifvm.html
http://www.eecs.harvard.edu/hube/software/nci/overview.html

-Chris

- John

Reid Spencer wrote:

A couple notes on this:

1. We also need to be able to *read* .o files for linking. Right now we
   just assume that any symbol not found in a bytecode file is
   implemented in some native library and will be resolved at runtime.
   This isn't the greatest assumption. To resolve native binary symbols
   we need to be able to read native .a, .so, and .o files to ensure
   the symbols are resolvable at runtime (or to actually resolve them if
   we're building a native executable).

2. GNU has a library, BFD, part of binutils suitable for writing binary
   object files in the native format of a given host machine. It
   provides a common interface and abstraction (format neutral) for
   reading and writing at least ELF and COFF files. The only thing it
   doesn't do well is translating between file types. There can be some
   loss of information between the various file formats.

3. The linking side of this effort should end up in lib/Linker

Reid.

Thanks for the heads up. Once I get further along on my project, I'll have to look into this. For right now we're going from one VM IR to another VM IR, so I guess it's generating bytecode instead of object code for now.

John

Okay, it's a RISCy architecture of sorts. I don't see any documentation on the binary format. Does it require register allocation? If not, it might be easier to write the target in the style of the C-backend (which doesn't use any of the code generator components). If it does, making use of the code generator infrastructure would make sense.

-Chris

A couple notes on this:
1. We also need to be able to *read* .o files for linking. Right now we
  just assume that any symbol not found in a bytecode file is
  implemented in some native library and will be resolved at runtime.
  This isn't the greatest assumption. To resolve native binary symbols
  we need to be able to read native .a, .so, and .o files to ensure
  the symbols are resolvable at runtime (or to actually resolve them if
  we're building a native executable).

I'm not sure how much of this we want to incorporate into our side of things. We definitely need the ability to inspect the symbol table of native .o files, but we don't need to understand enough to do a full link (at least until we decide to start subsuming native linkers, which won't happen for a long time).

2. GNU has a library, BFD, part of binutils suitable for writing binary
  object files in the native format of a given host machine. It
  provides a common interface and abstraction (format neutral) for
  reading and writing at least ELF and COFF files. The only thing it
  doesn't do well is translating between file types. There can be some
  loss of information between the various file formats.

BFD is definitely a useful library, but it has its own issues. The ELF file format is simple enough that it might be worth rolling our own interfaces or just using libelf. Also, license issues would have to be considered.
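To give a sense of how little is needed for the first step of a hand-rolled reader, here is a minimal sketch that only recognizes an ELF file and its word size from the identification bytes. ElfClass and classifyElf are hypothetical names; the magic bytes (0x7F 'E' 'L' 'F') and the class byte at offset 4 (1 for 32-bit, 2 for 64-bit) come from the ELF format itself. Walking the section headers to reach the symbol table is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of looking at the first few bytes of a file.
enum class ElfClass { NotElf, Elf32, Elf64, Unknown };

// Classify a buffer by its ELF identification bytes.
ElfClass classifyElf(const std::uint8_t *Data, std::size_t Size) {
  assert((Data != nullptr || Size == 0) && "null buffer with nonzero size");
  // The magic number occupies bytes 0..3; the class byte is byte 4.
  if (Size < 5 || Data[0] != 0x7F || Data[1] != 'E' || Data[2] != 'L' ||
      Data[3] != 'F')
    return ElfClass::NotElf;
  switch (Data[4]) {
  case 1:
    return ElfClass::Elf32;
  case 2:
    return ElfClass::Elf64;
  default:
    return ElfClass::Unknown;
  }
}
```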

-Chris

3. The linking side of this effort should end up in lib/Linker

Reid.


Chris Lattner wrote:

Can you say a little bit about MACH-SUIF? With a brief google search, I didn't turn up anything that described the architecture. Is it a RISC-like machine with 32-bit instruction words?

It's another VM representation. I haven't really gotten to know the nitty-gritty of the language, so I'm not too comfortable with it, but these two links should describe the project. It's based on the work from SUIF. More specifically, we are using MACHINE-SUIF as a backend to SUIF to generate code for an embedded processor. We want to move away from using the SUIF frontend, but the backend works fine. Essentially, for right now, I have to convert LLVM IR to SUIFvm IR.

http://www.eecs.harvard.edu/hube/software/nci/suifvm.html
http://www.eecs.harvard.edu/hube/software/nci/overview.html

Okay, it's a RISCy architecture of sorts. I don't see any documentation on the binary format. Does it require register allocation? If not, it might be easier to write the target in the style of the C-backend (which doesn't use any of the code generator components). If it does, making use of the code generator infrastructure would make sense.

-Chris

Sample from SUIF disassembler (done by someone else):
  lda $vr10.p32 <- main.A
  cvt $vr11.p32 <- $vr10.p32
  add $vr12.p32 <- $vr11.p32,$vr9.s32
  lod $vr13.s32 <- 0($vr12.p32)
  cvt $vr8.s32 <- $vr13.s32
  mul $vr6.s32 <- $vr7.s32,$vr8.s32
  ldc $vr15.s32 <- 5
  ldc $vr18.s32 <- 1
  add $vr17.s32 <- main.i,$vr18.s32

Is it "lots" or "infinite"? If it's "lots", you'll still have to do register allocation. If it's "infinite", using a special-purpose pass (like the C writer does) would make sense.
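One cheap way to gather evidence either way is to run the disassembler over a few large functions and see how high the $vrN numbers go. The sketch below is only a text-scanning helper for that experiment: virtualRegisters is a hypothetical name, and it assumes register operands always look like "$vr" followed by decimal digits, as in the sample above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Collect the distinct virtual register numbers ("$vr<digits>") that
// appear anywhere in a chunk of disassembly text.
std::set<unsigned> virtualRegisters(const std::string &Text) {
  std::set<unsigned> Regs;
  const std::string Prefix = "$vr";
  std::size_t Pos = 0;
  while ((Pos = Text.find(Prefix, Pos)) != std::string::npos) {
    Pos += Prefix.size();
    unsigned N = 0;
    bool SawDigit = false;
    // Accumulate the decimal number that follows the prefix.
    while (Pos < Text.size() &&
           std::isdigit(static_cast<unsigned char>(Text[Pos]))) {
      N = N * 10 + static_cast<unsigned>(Text[Pos] - '0');
      SawDigit = true;
      ++Pos;
    }
    if (SawDigit)
      Regs.insert(N);
  }
  return Regs;
}
```

If the largest register number keeps growing with function size and never appears to hit a ceiling, that would suggest (but not prove) an unbounded register file.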

-Chris