Multimedia IO instructions & partial backend implementations for simple CPUs

Hi, I'm getting started with LLVM, with the intention of writing a DSL
that compiles to LLVM IR, to target a new CPU instruction set. I have
two questions:

1. Multimedia IO in LLVM

In the paper "The LLVM Instruction Set and Compilation Strategy" [1],
there is talk about a possible multimedia instruction set in a future
LLVM release:

"Note that LLVM is a virtual instruction set: it does not define
runtime and operating system functions such as I/O, memory management
(in particular, garbage collection), signals, and many others. Some
issues that do belong in the instruction set and may be added in the
future include support for multimedia instructions (e.g., packed
operations),
predicated instructions, and explicit parallelism."

My question is a very basic one: how do I use LLVM IR code to do file
IO? As a simple example, take a file with 3 tab-separated integers.
How would I read this file into an LLVM vector <3 x i32>? Now take a
more complicated example: decoding a file into an LLVM vector
representation. To use the multimedia instruction set described above
(though I guess it is not implemented yet), how do I: A) read a
greyscale 3x3 pixel image file, and B) translate this to an LLVM
vector <9 x i8>?

2. Implementing an LLVM backend subset.

I have not yet delved deeply into the requirements of implementing
LLVM backends to target new hardware instruction sets. Given a
constrained hardware implementation, is it acceptable to implement
only a subset of the backend, and throw "Not Supported" errors
wherever appropriate? To give a concrete example, take a simple CPU
architecture that does not support floating point arithmetic, but does
support simple integer arithmetic. If my LLVM frontend produces only
i32 values, then `llc -march=my_new_arch` doesn't complain. If,
however, my frontend produces float values, then `llc
-march=my_new_arch` would complain with an error like "Type 'float'
not supported by architecture my_new_arch".
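
For illustration, a frontend emitting even a module as simple as this
(a minimal sketch) would exercise that unsupported path:

define float @scale(float %a, float %b) {
entry:
  %r = fmul float %a, %b   ; no hardware FP for the backend to lower this to
  ret float %r
}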

Is this the philosophy of LLVM backend implementation? Must the new
target architecture support all LLVM instructions, or am I able to
develop an LLVM backend for an architecture that lacks features like
floating point arithmetic? Any related resources would be appreciated.

[1] - http://goo.gl/HA5AXU

> Hi, I'm getting started with LLVM, with the intention of writing a DSL
> that compiles to LLVM IR, to target a new CPU instruction set. I have
> two questions:
>
> 1. Multimedia IO in LLVM
>
> In the paper "The LLVM Instruction Set and Compilation Strategy" [1],
> there is talk about a possible multimedia instruction set in a future
> LLVM release:
>
> "Note that LLVM is a virtual instruction set: it does not define
> runtime and operating system functions such as I/O, memory management
> (in particular, garbage collection), signals, and many others. Some
> issues that do belong in the instruction set and may be added in the
> future include support for multimedia instructions (e.g., packed
> operations), predicated instructions, and explicit parallelism."

We have vectors. No predicated instructions in the LLVM IR (only in the MI
layer). I'm not sure what you mean by "explicit parallelism" (do you mean
something like Cilk?) but I don't think LLVM supports that (those things
usually have a nontrivial runtime component).
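
For example, a packed add is just an ordinary add on a first-class
vector type (a minimal sketch):

define <4 x i32> @vadd(<4 x i32> %a, <4 x i32> %b) {
entry:
  %sum = add <4 x i32> %a, %b   ; element-wise packed addition
  ret <4 x i32> %sum
}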

> My question is a very basic one: how do I use LLVM IR code to do file
> IO? As a simple example, take a file with 3 tab-separated integers.
> How would I read this file into an LLVM vector <3 x i32>? Now take a
> more complicated example: decoding a file into an LLVM vector
> representation. To use the multimedia instruction set described above
> (though I guess it is not implemented yet), how do I: A) read a
> greyscale 3x3 pixel image file, and B) translate this to an LLVM
> vector <9 x i8>?

LLVM doesn't provide a runtime or "VM". You basically do these things the
same way that you do them in C. Yes, this unfortunately requires knowing
your target platform's system libraries and how to link to them and such;
LLVM doesn't paper over this.
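
For your three-integer example, that boils down to calling libc from
the IR. A minimal sketch (names like @read3 are made up, and the exact
IR syntax varies slightly between LLVM releases):

@.fmt = private constant [9 x i8] c"%d\09%d\09%d\00"

declare i32 @scanf(i8*, ...)

define <3 x i32> @read3() {
entry:
  %a = alloca i32
  %b = alloca i32
  %c = alloca i32
  %fmt = getelementptr [9 x i8], [9 x i8]* @.fmt, i32 0, i32 0
  ; a plain libc call; nothing LLVM-specific about the IO itself
  %n = call i32 (i8*, ...) @scanf(i8* %fmt, i32* %a, i32* %b, i32* %c)
  %x = load i32, i32* %a
  %y = load i32, i32* %b
  %z = load i32, i32* %c
  %v0 = insertelement <3 x i32> undef, i32 %x, i32 0
  %v1 = insertelement <3 x i32> %v0, i32 %y, i32 1
  %v2 = insertelement <3 x i32> %v1, i32 %z, i32 2
  ret <3 x i32> %v2
}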

> 2. Implementing an LLVM backend subset.
>
> I have not yet delved deeply into the requirements of implementing
> LLVM backends to target new hardware instruction sets. Given a
> constrained hardware implementation, is it acceptable to implement
> only a subset of the backend, and throw "Not Supported" errors
> wherever appropriate? To give a concrete example, take a simple CPU
> architecture that does not support floating point arithmetic, but does
> support simple integer arithmetic. If my LLVM frontend produces only
> i32 values, then `llc -march=my_new_arch` doesn't complain. If,
> however, my frontend produces float values, then `llc
> -march=my_new_arch` would complain with an error like "Type 'float'
> not supported by architecture my_new_arch".
>
> Is this the philosophy of LLVM backend implementation? Must the new
> target architecture support all LLVM instructions, or am I able to
> develop an LLVM backend for an architecture that lacks features like
> floating point arithmetic? Any related resources would be appreciated.

If you don't want to merge this with trunk, then you can do whatever you
want; typically when starting out you will only implement a subset of
instructions (you can't implement them all simultaneously, now can you?).
The state of backend documentation is pretty dire. I brain dumped basically
all the backend docs I could think of in <
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/65898>. That thread
also has some other good pointers for a person interested in writing a
backend.

Getting a backend into trunk generally speaking is a large long-term
commitment. I'm not sure what the exact requirements are for getting a
backend into trunk (I don't think anybody really does at a deep level,
other than "some critical subset of experienced LLVM developers think it
looks good enough to be committed"), but the biggest hurdle (generally)
with getting a backend into LLVM trunk is that it requires a demonstrated
ability to work with the community and a clear investment in maintaining
the backend from now into the foreseeable future (to put it bluntly, this
generally will mean at least 1 or 2 people with a paycheck tied to
maintaining the backend and contributing to the community, and these being
people that have a history of submitting good patches and that are familiar
with the community expectations). (The reason for these requirements is
purely experience with other backends, which have been a maintainability
nightmare without active developers responsible for maintaining them and
compensating for the increase in general codebase complexity (e.g. each new
backend makes refactoring the target-independent backend parts more
difficult)).

-- Sean Silva

There's some support for (usually explicit from the language at hand) data
parallel loops; the parallel loop metadata:

http://llvm.org/docs/LangRef.html#llvm-mem-parallel-loop-access-metadata

#pragma simd and even cilk_for could use that.
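
A minimal sketch of what that looks like in the IR (hand-written here,
so treat the details as illustrative): every memory access in the loop
body carries the access metadata, and the latch branch carries the
self-referential loop ID:

define void @inc(i32* %p, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %addr = getelementptr i32, i32* %p, i64 %i
  %v = load i32, i32* %addr, !llvm.mem.parallel_loop_access !0
  %v.inc = add i32 %v, 1
  store i32 %v.inc, i32* %addr, !llvm.mem.parallel_loop_access !0
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop, !llvm.loop !0
exit:
  ret void
}

!0 = !{!0}   ; self-referential loop ID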

> LLVM doesn't provide a runtime or "VM". You basically do these things the
> same way that you do them in C. Yes, this unfortunately requires knowing
> your target platform's system libraries and how to link to them and such;
> LLVM doesn't paper over this.

OK. So to be specific, I am using a Haskell language binding to LLVM,
not C. With my multimedia IO examples, am I correct in thinking I have
a few options:
1. Do IO in my host language, and parse bytestrings into LLVM data
structures, e.g. vectors. Then, pass these data structures to LLVM
generated code for JIT compilation.
2. Write IO functions in C, and compile with -emit-llvm. Then, in my
LLVM code generation, I read these external functions from a bitcode
file generated by clang. Here, there is no IO in my host language.
3. Call libc functions within LLVM to parse bytestrings directly into
structs or vectors. Is embedding libc in LLVM even possible?

Is there an LLVM cookbook for interaction with IO runtime systems?

> The state of backend documentation is pretty dire. I brain dumped basically
> all the backend docs I could think of in
> <http://thread.gmane.org/gmane.comp.compilers.llvm.devel/65898>. That thread
> also has some other good pointers for a person interested in writing a
> backend.

That's a great resource, thanks.

One thing I'd really appreciate is a cookbook on LLVM data structures.
I have read the language reference (http://llvm.org/docs/LangRef.html),
and understand the expressivity of aggregate types. What I do not yet
have a good feeling for is when to use them. To give a concrete
example, I'd like to parse a greyscale image into an LLVM data
structure. At each {x,y} point, there is an Int8 value between 0 and
255. Take a small 4x3 image. I could feed my pixels into a flat Int8
vector of length 12. I could also feed them into an array of length 4,
of Int8 arrays of length 3.

Now take 2 simple functions: one does greyscale brightening, the other
does a Sobel filter. The first needs only to know the value of 1 pixel
at a time, i.e. to increase its value. For this, the vector option
would be fine, and I assume (naively) that I'd enjoy SIMD performance
over this vector, executing an `add` on each element? However, the
Sobel filter needs information not only about the value of a pixel,
but also the values of its surrounding pixels. In this case, the 2D
array would be more suitable, as the shape of the image is known.
Would I then lose SIMD vectorisation? Or, as a third option, would I
use a struct with three elements: a vector, and two Int8 values
indicating the X and Y lengths of the image?

What I'm after is a cookbook for LLVM data structures, and how to
apply them. E.g., when to use structs, when to use aggregate types,
and how to hold on to SIMD vectorisation wherever possible.

> LLVM doesn't provide a runtime or "VM". You basically do these things the
> same way that you do them in C. Yes, this unfortunately requires knowing
> your target platform's system libraries and how to link to them and such;
> LLVM doesn't paper over this.

> OK. So to be specific, I am using a Haskell language binding to LLVM,
> not C. With my multimedia IO examples, am I correct in thinking I have
> a few options:
> 1. Do IO in my host language, and parse bytestrings into LLVM data
> structures, e.g. vectors. Then, pass these data structures to LLVM
> generated code for JIT compilation.
> 2. Write IO functions in C, and compile with -emit-llvm. Then, in my
> LLVM code generation, I read these external functions from a bitcode
> file generated by clang. Here, there is no IO in my host language.
> 3. Call libc functions within LLVM to parse bytestrings directly into
> structs or vectors. Is embedding libc in LLVM even possible?

LLVM's JIT can use a call instruction to call a function, and the
runtime will attempt to find a symbol for that function in the linked
libraries; see for example the bit on

http://llvm.org/docs/tutorial/LangImpl4.html

where it talks about "Whoa, how does the JIT know about sin and cos?".
There's nothing intrinsically different about IO functions from any
other functions here (since any "function" could have a static
variable or refer to a global variable, unlike in a functional setup
like Haskell where they're very different things), and indeed later on
that page there's an example of using an IO output function. This is
probably going to be the easiest way to do things, rather than taking
the output of -emit-llvm (although that shouldn't be that hard
either). Note that libc isn't really "embedded" in your LLVM code in
this case: you'll be using the same libc as is used by LLVM itself.
The only difference is in how the mapping from call names in LLVM IR
to actual callable entries is done. You don't even have to stick to
libc: you could write normal C code for IO, compile it into a shared
library, and then call those functions.
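
Concretely, all the IR needs is a declaration; a minimal sketch (the
function name @banner is made up):

declare i32 @putchar(i32)

define void @banner() {
entry:
  ; the JIT resolves @putchar against the process's libc at call time
  %r1 = call i32 @putchar(i32 72)    ; 'H'
  %r2 = call i32 @putchar(i32 105)   ; 'i'
  %r3 = call i32 @putchar(i32 10)    ; newline
  ret void
}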

> Is there an LLVM cookbook for interaction with IO runtime systems?

There's no "in principle" difference between IO and a general runtime at
this level:
both can have both accessible and hidden "state". I'm not aware of any
specific
recipes for this.

> The state of backend documentation is pretty dire. I brain dumped
> basically all the backend docs I could think of in
> <http://thread.gmane.org/gmane.comp.compilers.llvm.devel/65898>. That
> thread also has some other good pointers for a person interested in
> writing a backend.

> That's a great resource, thanks.

> One thing I'd really appreciate is a cookbook on LLVM data structures.
> I have read the language reference (http://llvm.org/docs/LangRef.html),
> and understand the expressivity of aggregate types. What I do not yet
> have a good feeling for is when to use them. To give a concrete
> example, I'd like to parse a greyscale image into an LLVM data
> structure. At each {x,y} point, there is an Int8 value between 0 and
> 255. Take a small 4x3 image. I could feed my pixels into a flat Int8
> vector of length 12. I could also feed them into an array of length 4,
> of Int8 arrays of length 3.
>
> Now take 2 simple functions: one does greyscale brightening, the other
> does a Sobel filter. The first needs only to know the value of 1 pixel
> at a time, i.e. to increase its value. For this, the vector option
> would be fine, and I assume (naively) that I'd enjoy SIMD performance
> over this vector, executing an `add` on each element? However, the
> Sobel filter needs information not only about the value of a pixel,
> but also the values of its surrounding pixels. In this case, the 2D
> array would be more suitable, as the shape of the image is known.
> Would I then lose SIMD vectorisation? Or, as a third option, would I
> use a struct with three elements: a vector, and two Int8 values
> indicating the X and Y lengths of the image?

The paragraph above suggests you're thinking of an _LLVM_ vector as a
generic construct, including for storage. Actually, LLVM's vector is
designed as a representation of a _vector register_, but one that uses
standard LLVM IR instructions rather than a CPU-specific instruction
set (simplifying the story dramatically). You probably want to store
images as a 2-D array of the basic element type. Auto-vectorisation
(nothing prevents you generating LLVM IR to process your data using
vectors) succeeds primarily based upon the ability of the
auto-vectoriser to see what the true data dependencies are in your
code. A 2-D array will use indices in accesses, which are most easily
analysed, so it should give you the best chance of auto-vectorisation.
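
For instance, your 4x3 image stored as a 2-D array, with a pixel load
whose indices the vectoriser can analyse (a minimal sketch; @img and
@pixel_at are made-up names):

@img = global [3 x [4 x i8]] zeroinitializer   ; 3 rows of 4 pixels

define i8 @pixel_at(i64 %x, i64 %y) {
entry:
  %p = getelementptr [3 x [4 x i8]], [3 x [4 x i8]]* @img, i64 0, i64 %y, i64 %x
  %v = load i8, i8* %p
  ret i8 %v
}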

> What I'm after is a cookbook for LLVM data structures, and how to
> apply them. E.g., when to use structs, when to use aggregate types,
> and how to hold on to SIMD vectorisation wherever possible.

Unfortunately I'm not aware of such a thing. But I'd say a basic rule
of thumb would be that if you would naturally use a given structure in
C, it's probably a reasonable strategy to use the LLVM analogue.
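
E.g., your third option above is the LLVM analogue of a C struct (a
sketch; the field order is one reasonable choice, not a convention):

; in C: struct image { uint8_t *data; uint8_t w; uint8_t h; };
%image = type { i8*, i8, i8 }

define i8 @width(%image* %img) {
entry:
  %wp = getelementptr %image, %image* %img, i32 0, i32 1
  %w = load i8, i8* %wp
  ret i8 %w
}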