questions about LLVM

Hi Shuo,

I am CCing your questions to the LLVM developers list so others can reply or correct me.

I have a few questions about LLVM:
(1) The LLVM tutorial says LLVM can be used in architecture research. If I want to run my program on an instruction set that I have defined myself, is LLVM the right tool for that?

I don't think so.

In this respect, is LLVM similar to the SimpleScalar simulator?

I am not familiar with the SimpleScalar simulator; maybe others will reply.

(2) Can I compile large applications, such as the Apache server, into LLVM virtual instructions, then use the LLVM interpreter to execute them?

Sure. I believe the website http://safecode.cs.uiuc.edu runs on just-in-time (JIT) execution of the Apache web server compiled to LLVM virtual instructions.

(3) If my program calls libc functions, e.g., malloc(), are the instructions of malloc transformed to LLVM virtual instructions and interpreted during its execution?

In general, most library calls are translated to LLVM call instructions and are executed natively, using the native library, when you use the interpreter or the JIT. But if you compile the library into LLVM instructions and link it into your application, there is no reason why you shouldn't be able to interpret the instructions of that call.

Dinakar

Actually, I think that's an entirely reasonable thing to do. We are
currently planning to have a virtual instruction set that the
interpreter uses. See http://llvm.x10sys.com/rspencer/index.html#lli for
more on that. In any event, it would be possible (perhaps not easy) to
create a backend for your own instruction set. The target description
language is pretty easy to use. You might want to look here:
http://llvm.cs.uiuc.edu/docs/WritingAnLLVMBackend.html
for information on writing your own back end.

Reid.

Dinakar Dhurjati wrote:

Hi Shuo,

I am CCing your questions to the LLVM developers list so others can reply or correct me.

I have a few questions about LLVM:
(1) The LLVM tutorial says LLVM can be used in architecture research. If I want to run my program on an instruction set that I have defined myself, is LLVM the right tool for that?

I don't think so.

I think it depends on what you need and what tools you currently have.

If you have developed your own instruction set, then LLVM might be useful to you as an optimizing compiler for your instruction set.

The problem is that LLVM won't have a code generator for your instruction set. You'd have to do one of the following:

1. Write a code generator (the LLVM code base makes this relatively easy).

2. Use the C Backend to generate C code from LLVM code, and then use a C compiler that targets your instruction set to compile the resulting code.

The first option is good if there is no compiler for your instruction set and you need to write one quickly. The second option is good if you have a basic compiler for your instruction set but need LLVM to provide more aggressive optimization.

In this respect, is LLVM similar to the SimpleScalar simulator?

I am not familiar with the SimpleScalar simulator; maybe others will reply.

(2) Can I compile large applications, such as the Apache server, into LLVM virtual instructions, then use the LLVM interpreter to execute them?

Sure. I believe the website http://safecode.cs.uiuc.edu runs on just-in-time (JIT) execution of the Apache web server compiled to LLVM virtual instructions.

Dinakar is correct; the SAFECode website is a version of Apache compiled with LLVM and running on the LLVM x86 JIT.

Povray is another large C program that we've compiled with LLVM, and we've successfully compiled other small to midsize programs (see http://llvm.cs.uiuc.edu/testresults/X86/ for some examples). I don't know what other large applications we've compiled; anybody care to comment?

The known issues that are most likely to cause you grief are:

o C bitfields (don't always work properly)
o Inline assembly code (unsupported; probably the most common compilation problem overall)
o Non-standard GCC union initializers

(3) If my program calls libc functions, e.g., malloc(), are the instructions of malloc transformed to LLVM virtual instructions and interpreted during its execution?

In general, most library calls are translated to LLVM call instructions and are executed natively, using the native library, when you use the interpreter or the JIT. But if you compile the library into LLVM instructions and link it into your application, there is no reason why you shouldn't be able to interpret the instructions of that call.

Dinakar

-- John T.

Shuo,

I have a few questions about LLVM:
(1) The LLVM tutorial says LLVM can be used in architecture research. If I want to run my program on an instruction set that I have defined myself, is LLVM the right tool for that?

What kind of instruction set do you have in mind? The closer it is to one we already target, the easier this is, but it is quite possible to write a back-end for a relatively new one.

In this respect, is LLVM similar to the SimpleScalar simulator?

You can use the interpreter as a simulator for a very abstract machine (and extend it with any performance metrics you want). How do you want to use it?
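To make the idea of an interpreter extended with performance metrics concrete, here is a minimal sketch in Python. It is not lli and does not use any LLVM API: the three-opcode stack machine, the `run` function, and the per-opcode counter are all hypothetical, chosen only to show where a metric hook would sit in an interpreter loop.

```python
from collections import Counter


def run(program, metrics=None):
    """Interpret a list of (opcode, operand) tuples on a toy stack machine.

    The opcodes are hypothetical; they are not LLVM instructions.
    """
    stack = []
    metrics = Counter() if metrics is None else metrics
    for opcode, operand in program:
        # The "performance metric" here is a simple per-opcode execution count.
        metrics[opcode] += 1
        if opcode == "push":
            stack.append(operand)
        elif opcode == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop(), metrics


# Computes (2 + 3) * 4 and records how often each opcode ran.
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
result, metrics = run(program)
```

Any other metric (a cycle estimate per opcode, a memory-access count) could be accumulated at the same point in the loop.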

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/

Prof. Adve,

The idea is to develop a memory model where each byte is extended with
3 extra bits. Programs run on this memory model.
Load/store instructions, including those in libc functions,
need to deal with the extra bits in a certain manner. Basically, my
questions are:
(1) Is it feasible to implement a memory model where each byte is
extended with 3 extra bits?
(2) Is there currently an LLVM version of libc (in its VM code format)?
(3) Is LLVM able to compile HTTP servers, FTP servers and SSH servers to
VM code so that every single VM instruction (including libc code) is
executed by the VM interpreter?

thank you very much
-Shuo

Shuo Chen wrote:

Prof. Adve,

The idea is to develop a memory model where each byte is extended with 3 extra bits. Programs run on this memory model.
Load/store instructions, including those in libc functions, need to deal with the extra bits in a certain manner. Basically, my questions are:
(1) Is it feasible to implement a memory model where each byte is extended with 3 extra bits?

I imagine that you could do this by modifying the interpreter or the JIT/code generators. Can you give a more complete description of what the memory model does (or what you have to do with these three bits)? Depending on what you're doing, you might just be able to get away with writing an LLVM transformation pass.
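As a rough illustration of what such a memory model could look like inside an interpreter, here is a minimal Python sketch of a "shadow" tag array kept alongside ordinary byte memory. The thread does not say what the three bits mean, so the semantics here are an assumption: a store sets the tag explicitly and a copy carries the tag along with the data. The `TaggedMemory` class and its methods are hypothetical and are not part of LLVM.

```python
class TaggedMemory:
    """Byte-addressed memory in which every byte carries three extra tag bits."""

    TAG_MASK = 0b111  # only three tag bits exist per byte

    def __init__(self, size):
        self.data = bytearray(size)
        # One shadow byte per data byte; only its low three bits are used.
        self.tags = bytearray(size)

    def store(self, addr, value, tag=0):
        # Assumed semantics: a store overwrites both the byte and its tag.
        self.data[addr] = value & 0xFF
        self.tags[addr] = tag & self.TAG_MASK

    def load(self, addr):
        # A load returns the byte together with its tag bits.
        return self.data[addr], self.tags[addr]

    def copy(self, dst, src, n):
        # Assumed semantics: copying data (as memcpy would) propagates the tags.
        for i in range(n):
            value, tag = self.load(src + i)
            self.store(dst + i, value, tag)
```

Every load and store handler in the interpreter, or every load/store emitted by a code generator or transformation pass, would have to go through an interface of this kind for the extra bits to stay consistent.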

(2) Is there a LLVM version of LibC (in its VM code format) currently?

Currently, no. We do not have a complete libc implementation compiled into LLVM bytecode, although someone might be working on that.

In theory, the only difficult part about getting a C library to work is the interface to system calls. In a traditional libc, assembly code is used to provide that functionality and is part of the C library source code. Since LLVM cannot use assembly code, there are two options:

o Add an LLVM syscall intrinsic
o Write a small asm library that wraps the system calls

I believe the latter would be your best option. It would be fairly easy to write and could be loaded by the JIT or linked into native code generated from the LLVM bytecode. In essence, everything above the system calls (e.g. fread, printf) would be compiled into LLVM bytecode, but the system calls themselves (e.g. read, write, fork) would be native code that is linked in. For your project, that should not be a problem.
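The layering described here can be sketched in a few lines of Python. This is only an analogy, not libc and not LLVM code: `sys_write` stands in for the small native wrapper around the write system call, and the hypothetical `BufferedWriter` stands in for the portable library code above it (the part that would be compiled to bytecode). The names and the default buffer size are assumptions made for the example.

```python
import os


def sys_write(fd, data):
    """Stand-in for the thin native wrapper around the write system call."""
    return os.write(fd, data)


class BufferedWriter:
    """Stand-in for portable library code layered on top of the wrapper."""

    def __init__(self, fd, capacity=4096, write=sys_write):
        self.fd = fd
        self.capacity = capacity
        self.write = write  # the only point that touches the system call
        self.buffer = bytearray()

    def put(self, data):
        # Pure library logic: accumulate bytes and flush once the buffer fills.
        self.buffer += data
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # Loop because a write may accept fewer bytes than were offered.
        while self.buffer:
            written = self.write(self.fd, bytes(self.buffer))
            del self.buffer[:written]
```

Because the buffering logic reaches the operating system only through the `write` callable, that one function is the piece that would stay native while everything else could be interpreted or JIT-compiled.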

In practice, getting libc compiled with LLVM is painful because C libraries tend to have terrible build environments. For example, glibc (used on Linux) assumes that you're compiling for ELF or a.out format and uses every GCC feature that could possibly exist. Other C libraries have had strange configuration systems, or make assumptions about the operating system, etc.

You can get this to work; it may just be time consuming.

(3) Is LLVM able to compile HTTP servers, FTP servers and SSH servers to VM code so that every single VM instruction (including libc code) is executed by the VM interpreter?

Well, we've compiled httpd (Apache), ftpd, telnetd, fingerd, the GNU NIS daemon, and maybe more. We haven't compiled sshd yet, but it would probably work (assuming that you can easily disable the inline asm it will use for quick compression/encryption).

In theory, these should work in the interpreter, but the interpreter doesn't get as much attention as the JIT or native code generators.

-- John T.

In theory, the only difficult part about getting a C library to work is
the interface to system calls. In a traditional libc, assembly code is
used to provide that functionality and is part of the C library source
code. Since LLVM cannot use assembly code, there are two options:

o Add an LLVM syscall intrinsic
o Write a small asm library that wraps the system calls

Or both. Writing the syscall intrinsic is easy, especially if it simply
links into a native asm library to do the actual syscall. Not that the
codegen for the intrinsic is hard (although x86 Linux is tricky because
the calling convention changes for syscalls with six arguments). Somewhere
I have code that kind of works for x86.

In practice, getting libc compiled with LLVM is painful because C
libraries tend to have terrible build environments. For example, glibc
(used on Linux) assumes that you're compiling for ELF or a.out format
and uses every GCC feature that could possibly exist. Other C libraries
have had strange configuration systems, or make assumptions about the
operating system, etc.

Also, glibc at least makes use of statically initialized C99-style
variable-length arrays at the end of structs, a syntax that isn't
supported by the current cfrontend.

I've tried some other C libraries, and they were smaller than glibc, but
they relied much more on assembly tricks.

So the summary from the last time I tried this, before I gave up for a
while, is that glibc uses some C99 that isn't supported, is big, and
takes a while to convince that it is not being compiled with gcc and
asm, and that other C libraries are probably even harder.

Andrew Lenharth
http://www.lenharth.org/~andrewl/