Getting To Native Code

Suppose I wanted to, say, write glibc for LLVM (fat chance! :)). I would need at some point to write (hopefully a small amount of) native code to, say, access specific registers, handle interrupts, or generate operating system traps.

From my (somewhat cursory) review of AsmParser, it seems like this can’t be done with LLVM right now. There is nothing in AsmParser or the rest of LLVM that would allow native assembly instructions to be passed through to the back end.

Is my finding correct, or can this be done currently with LLVM in some way I haven’t found?

What is the alternative? Write a library function in C and have it called by LLVM?

The reason I’m asking is that I’d like as much code as possible to be open to optimization by LLVM. If the core of my runtime library can’t be expressed in LLVM then it can’t be optimized by it either. Ideally, everything except the glue for system traps, interrupts and the like should be expressible in LLVIS.

Reid Spencer wrote:

Suppose I wanted to, say, write glibc for LLVM (fat chance! :)). I would need at some point to write (hopefully a small amount of) native code to, say, access specific registers, handle interrupts, or generate operating system traps.

  Funny you should mention that; getting a C library compiled to LLVM code is one of the tasks on my plate. :)

From my (somewhat cursory) review of AsmParser, it seems like this can't be done with LLVM right now. There is nothing in AsmParser or the rest of LLVM that would allow native assembly instructions to be passed through to the back end.

Is my finding correct or can this be done currently with LLVM in some way I haven't found?

  You are correct that the LLVM assembly language cannot accept native assembly instructions or access machine specific features directly. However, there is hope:

  Long term, one of our projects is to port the Linux kernel to LLVM. This project has the same problem that you have encountered: how to access machine-specific services.

  Our current plan is to add a set of intrinsic functions that will give us access to the hardware facilities that we need. For example, we will probably have an intrinsic that registers a function as a system call handler, another intrinsic that performs I/O, another intrinsic that tells us what hardware is connected to the machine, etc, etc.

  In this way, the actual hardware can be abstracted away so that LLVM bytecode programs don't need to know about it. All they see is a set of intrinsic functions for the particular piece of hardware/OS upon which they run.

  Please note that the Linux kernel project is still in its infancy and won't be done for a while. :)

  In the meantime, there are still things you can do. Please see below...

What is the alternative? Write a library function in C and have it called by LLVM?

  First, I should probably mention that an LLVM program has access to just about all OS services offered to a regular native code program. This is because undefined symbols in an LLVM program are resolved when it is translated to native code.

  Specifically, if you JIT a program using lli, the JIT looks the symbol up in its own address space. That is how LLVM programs find the system calls and other library functions that are not compiled to LLVM bytecode.

  For native code, the symbols are resolved at native code link time (i.e. you run llc on the bytecode and then assemble it and link it).

  For writing code that does special stuff (accesses machine registers, etc), there are several alternatives:

  1) As you suggested, you could write a library in C and compile it to native code. You would then be able to link this library with LLVM code (either by loading it into the JIT with -load or by using llc to generate native code and linking to your library).

  2) You could add an intrinsic for it. I'd only recommend this if it's something that would be useful as a generic primitive; an intrinsic to read performance registers would be okay; an intrinsic for creating prime numbers would not.

The reason I'm asking is that I'd like as much code as possible to be open to optimization by LLVM. If the core of my runtime library can't be expressed in LLVM then it can't be optimized by it either. Ideally, everything except the glue for system traps, interrupts and the like should be expressible in LLVIS.

  That is where we are headed; we just have a way to go before we're there.

  If you have any further questions, please feel free to ask. I realize I'm being pretty verbose here.

Regards,

-- John T.

> From my (somewhat cursory) review of AsmParser, it seems like this can't
> be done with LLVM right now. There is nothing in AsmParser or the rest
> of LLVM that would allow native assembly instructions to be passed
> through to the back end.

Correct.

> Is my finding correct or can this be done currently with LLVM in some
> way I haven't found?

Nope, you're right. Adding to what John said, eventually we will support
GCC style 'asm' blocks, but these will only have extremely limited (if
any) support in the JIT. If you need low-level machine specific
functionality now, probably the best way to do it is to write a .s file or
a .c file with asm's in it, and compile that with a native compiler.
Since we are ABI compatible, you shouldn't have any problem linking it to
llvm compiled programs.

-Chris

Kewl Beans! You’re heading right where I need LLVM to go :)

Details …

<i>	Funny you should mention that; getting a C library compiled to LLVM 
code is one of the tasks on my plate.	:)</i>

Good. If I can help, please let me know.

<i>	You are correct that the LLVM assembly language cannot accept native 
assembly instructions or access machine specific features directly. 
However, there is hope:

	Long term, one of our projects is to port the Linux kernel to LLVM. 
This project has the same problem that you have encountered; how to 
access machine specific services.

	Our current plan is to add a set of intrinsic functions that will give 
us access to the hardware facilities that we need.  For example, we will 
probably have an intrinsic that registers a function as a system call 
handler, another intrinsic that performs I/O, another intrinsic that 
tells us what hardware is connected to the machine, etc, etc.</i>

I’d vote for just adding a new keyword, syscall, that works just like a declare. For example:

syscall int "__mmap" (void* start, long length, int prot, int flags, int fd, unsigned offset);

This would indicate that, unlike a “declare” of the same form, calls to __mmap would receive special
handling: arguments are placed in registers and a system trap is issued … details left to the back end.

<i>
	In this way, the actual hardware can be abstracted away so that LLVM 
bytecode programs don't need to know about it.  All they see is a set of 
intrinsic functions for the particular piece of hardware/OS upon which 
they run.</i>

A wonderful idea. Much of the code I’ve written is to abstract away the operating system interface so
that XPL can be ported to many platforms. You’re actually going one step further and abstracting the
hardware. This is totally cool!

<i>
	Please note that the Linux kernel project is still in its infancy and 
won't be done for awhile.	:)</i>

Any ballpark ideas on when an alpha version could be available? Are we talking months or years here?

<i>
	In the meantime, there are still things you can do.  Please see below...

> 
> What is the alternative? Write a library function in C and have it 
> called by LLVM?

	First, I should probably mention that an LLVM program has access to 
just about all OS services offered to a regular native code program. 
This is because undefined symbols in an LLVM program are resolved when 
it is generated to native code.</i>

Yes, I’d figured that out. Using just LLVIS you wouldn’t be able to get anything to run :)

<i>
	Specifically, if you JIT a program using lli, the JIT looks the symbol 
up in its own address space.  That is how LLVM programs find the system 
calls and other library functions that are not compiled to LLVM bytecode.

	For native code, the symbols are resolved at native code link time 
(i.e. you run llc on the bytecode and then assemble it and link it).

	For writing code that does special stuff (accesses machine registers, 
etc), there are several alternatives:

	1) As you suggested, you could write a library in C and compile it to 
native code.  You would then be able to link this library with LLVM code 
(either by loading it into the JIT with -load or by using llc to 
generate native code and linking to your library).

	2) You could add an intrinsic for it.  I'd only recommend this if it's 
something that would be useful as a generic primitive; an intrinsic to 
read performance registers would be okay; an intrinsic for creating 
prime numbers would not.</i>

Yes, keeping the set of intrinsics small and generally applicable to the abstraction will be important.
Unfortunately, however, there are several corner cases where the general abstraction won’t work. Machine
specific capabilities (array processors and hardware neural nets come to mind) would have to be accessed
directly. This is why LLVM needs an assembly pass-through. No matter how good the LLVM abstraction
layer is at blending out differences in machine specific operations, there will always be corner cases like
this that just don’t lend themselves to abstraction.

I’d vote for an “asm” instruction that just passes literal text through to the back end. Let the user ensure
it’s for the right hardware, etc. For example:

asm " movl 4(%esp), %eax
andl $0xffff, %eax
rorw $8, %ax
ret"

The danger here is that the feature gets overused and subverts optimization. Still, for the things that
need it, it would be the right thing.

<i>
> 
> The reason I'm asking is that I'd like as much code as possible to be 
> open to optimization by LLVM. If the core of my runtime library can't be 
> expressed in LLVM then it can't be optimized by it either.  Ideally, 
> everything except the glue for system traps, interrupts and the like 
> should be expressible in LLVIS.
> 

	That is where we are headed; we just have a way to go before we're there.

	If you have any further questions, please feel free to ask.  I realize 
I'm being pretty verbose here.</i>

Sounds great. Please keep me posted on progress and let me know if there are things I can do
to help. This effort is in my best interest too.

Verbosity is a good thing! Thanks for the explanation.

<i>
Regards,

-- John T.</i>

Reid.

> Funny you should mention that; getting a C library compiled to LLVM
> code is one of the tasks on my plate. :)

Good. If I can help, please let me know.

This is currently an open project; details here:

  http://llvm.cs.uiuc.edu/docs/OpenProjects.html#glibc

If you wish to jump into it, you would be more than welcome to. If you
do, please let the list know so there isn't a duplication of effort.

> Please note that the Linux kernel project is still in its infancy and
> won't be done for a while. :)

> Any ballpark ideas on when an alpha version could be available? Are we
> talking months or years here?

First of all, I would like to point out that the goal is *NOT* to compile Linux
to run natively on your favorite architecture; instead, we aim to compile Linux
to a bytecode file which can then be run in userspace via an execution
environment (i.e., code generator + implementation of OS services for a given
target). The goal here is to abstract away the hardware from the OS, using
these magical `intrinsics' that were mentioned. This means that Linux is ported
to run on a fundamentally new architecture, which we call LLVA. You can see
our MICRO paper on this topic:

  http://llvm.cs.uiuc.edu/pubs/2003-10-01-LLVA.html

This implies that it is not intended for end-user consumption but more
for a proof-of-concept and ongoing research potential.

That said, we don't really have an estimate for when this will be functional
and available. It's ongoing at this time, but as John said, in very early
stages.

Let me add to one point that Misha made (just to show I actually read these messages!)...

> > Any ballpark ideas on when an alpha version could be available? Are we
> > talking months or years here?
>
> First of all, I would like to point out that the goal is *NOT* to compile Linux
> to run natively on your favorite architecture; instead, we aim to compile Linux
> to a bytecode file which can then be run in userspace via an execution
> environment (i.e., code generator + implementation of OS services for a given
> target). The goal here is to abstract away the hardware from the OS, using
> these magical `intrinsics' that were mentioned. This means that Linux is ported
> to run on a fundamentally new architecture, which we call LLVA. You can see
> our MICRO paper on this topic:
>
>   http://llvm.cs.uiuc.edu/pubs/2003-10-01-LLVA.html

The above paper describes LLVA as a virtual instruction set for new processor designs. But in the context of Linux, LLVA can be a way to virtualize the processor even for ordinary processors. I thought I should clarify this because our MICRO paper does not try to make this point.

LLVA is essentially LLVM extended with enough low-level details to act as a hardware instruction set, as opposed to "just" a compiler and user-level code representation.

> This implies that it is not intended for end-user consumption but more
> for a proof-of-concept and ongoing research potential.

You're being too modest, Misha --- this is going to be the Linux of the future :)

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/

I'll add my pragmatist $0.02 worth here. Take it with a grain of salt.

If LLVA isn't "real world" or "intended for end-user", how valuable is 
it as a research tool? While I agree that it might be useful as a 
proof-of-concept, research needs to founded in reality (i.e. how real 
users would use it). Since you've decided to open this up as source 
code, don't think that you're alone in making it "real". I for one plan
to base a very real product on LLVM and possibly LLVA. Because I'm not
aiming for a research tool, you'll get the benefit of open source (i.e.
making it "real"). That should only make your research better. 

So, I tend to agree with Vikram .. shoot for the moon! Even if you only
get halfway there, it's better than only getting halfway down the street
around the corner. 

 

Perhaps I should say what I really meant to say instead of vaguely
generalizing. :)

The thing that will exist at some point (hopefully soon) will be
essentially an emulator for LLVA. *THAT* is not intended for an
end-user, an end-user here being someone who is not a
compiler/architecture researcher or developer. This is because unlike
something such as VMware, it does NOT have off-the-shelf software it can
run, unless we make an effort to port it, or make LLVM able to compile
it.

THAT is what I meant. LLVA is definitely for everyone, but as an
architecture implemented in hardware, not as an emulator.

I, for one, completely agree. My intention is for LLVM to be fully
robust and "commercial quality". This takes time, of course, but that's
the goal. LLVA will start out simple and not quite ready for prime time,
but will, over time, become something generally useful.

World domination can only be achieved one step at a time. :)

-Chris

Okay, that makes MUCH more sense … quite reasonable. Thanks for the clarification.

Reid.

Phew! Glad you said it. I can be very patient and quite helpful as long as the end goal is clear.
I’m glad to hear that the goal isn’t just a research playground but something “real”.

World domination indeed. Let’s make it happen … one step at a time! :)

Reid.