> LLVM doesn't provide a runtime or "VM". You basically do these things the
> same way that you do them in C. Yes, this unfortunately requires knowing
> your target platform's system libraries and how to link to them and such;
> LLVM doesn't paper over this.
OK. So to be specific, I am using a Haskell language binding to LLVM,
not C. With my multimedia IO examples, am I correct in thinking I have
a few options:
1. Do IO in my host language, and parse bytestrings into LLVM data
structures, e.g. vectors. Then, pass these data structures to LLVM
generated code for JIT compilation.
2. Write IO functions in C, and compile with -emit-llvm . Then in my
LLVM code generation, I read this external function from a bitcode
file generated by clang. Here, there is no IO in my host language.
3. Call libc functions within LLVM to parse bytestrings directly into
structs or vectors. Is libc embedded in LLVM even possible?
LLVM's JIT can use a call instruction to call a function, and the runtime
will attempt to find a symbol for that function in the linked libraries,
see for example the bit on
http://llvm.org/docs/tutorial/LangImpl4.html
where it talks about "Whoa, how does the JIT know about sin and cos?".
There's nothing intrinsically different about IO functions from any other
functions here (since any "function" could have a static variable or refer
to a global variable, unlike in a functional setup like Haskell where
they're very different things) and indeed later on that page there's an
example of using an IO output function. This is probably going to be the
easiest way to do things, rather than taking the output of -emit-llvm
(although that shouldn't be that hard either). Note that libc isn't really
"embedded" in your LLVM code in this case: you'll be using the same libc
as is used by LLVM itself. The only difference is in how the mapping from
call names in LLVM IR to actual callable entries is done.
You don't even have to stick to libc: you could write normal C code for IO,
compile it into a shared library, and then call those functions.
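
As a rough sketch of that last suggestion, the C side might look something
like the code below. The function names read_gray_pixels and
write_gray_pixels, and the raw headerless file layout, are invented here
for illustration; they are not part of LLVM or libc. Built as a shared
library and loaded into the process, they should be resolvable by the JIT
by name in the same way as sin and cos in the tutorial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read up to `count` raw 8-bit greyscale pixels
   from `path` into `out`. Returns the number of pixels actually read,
   or 0 if the file could not be opened. */
size_t read_gray_pixels(const char *path, uint8_t *out, size_t count)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t got = fread(out, 1, count, f);
    fclose(f);
    return got;
}

/* Hypothetical helper: write `count` raw pixels from `in` to `path`.
   Returns the number of pixels written, or 0 on failure. */
size_t write_gray_pixels(const char *path, const uint8_t *in, size_t count)
{
    assert(path != NULL && in != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return 0;
    size_t put = fwrite(in, 1, count, f);
    if (fclose(f) != 0)
        return 0;
    return put;
}
```

On the IR side you would only declare these functions (no body) and emit
ordinary call instructions to them; the host language never touches the
file itself, it just hands over a pointer to a buffer.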
Is there an LLVM cookbook for interaction with IO runtime systems?
There's no "in principle" difference between IO and a general runtime at
this level: both can have both accessible and hidden "state". I'm not aware
of any specific recipes for this.
> The state of backend documentation is pretty dire. I brain dumped basically
> all the backend docs I could think of in
> <http://thread.gmane.org/gmane.comp.compilers.llvm.devel/65898>. That thread
> also has some other good pointers for a person interested in writing a
> backend.
That's a great resource, thanks.
One thing I'd really appreciate is a cookbook on LLVM data structures.
I have read the language reference http://llvm.org/docs/LangRef.html ,
and understand the expressivity of aggregate types. What I do not yet
have a good feeling for is when to use them. To give a concrete
example, I'd like to parse a greyscale image into an LLVM data
structure. At each {x,y} point, there is an Int8 value between 0 and
255. Take a small 4x3 image. I could feed my pixels into a flat Int8
vector of length 12. I could also feed it into an array of length 4,
of Int8 arrays of length 3.
Now take 2 simple functions: one does greyscale brightening, the other
does a sobel filter. The first needs only to know the value of 1 pixel
at a time, i.e. to increase its value. For this, the vector option
would be fine, and I assume (naively) that I'd enjoy SIMD performance
over this vector, executing `add x` to each element? However, the
Sobel filter needs information not only about the value of a pixel,
but also the values of its surrounding pixels. In this case, the 2D
array would be more suitable, as the shape of the image is known.
Would I lose SIMD vectorisation, probably? Or as a third option, would
I use a struct of three elements: a vector, and two Int8
values indicating the X and Y lengths of the image?
The paragraph above suggests you're thinking of an _LLVM_ vector as
a generic construct, including for storage. Actually LLVM's vector is
designed as a representation of a _vector register_, but one that can be
used with standard LLVM IR instructions rather than a CPU specific
instruction set (simplifying the story dramatically). You probably want to
store images as a 2-D array of the basic element type. The success of
auto-vectorisation depends primarily on the ability of the auto-vectoriser
to see what the true data dependencies are in your code (and nothing
prevents you generating LLVM IR to process your data using vectors
yourself). A 2-D array will use indices in accesses which are most easily
analysed, so should give you the best chance of auto-vectorisation.
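
To make that concrete, here is roughly what the two operations from the
question look like in C over a plain 2-D array. This is only a sketch: the
names brighten and sobel_x and the fixed 4x3 size are chosen for
illustration, and whether a given loop actually gets vectorised depends on
the target and the optimisation level.

```c
#include <assert.h>
#include <stdint.h>

enum { ROWS = 3, COLS = 4 };   /* the small 4x3 image from the question */

/* Brighten every pixel by `delta`, saturating at 255. Each pixel depends
   only on itself, which is the kind of loop an auto-vectoriser can
   analyse easily. */
void brighten(uint8_t img[ROWS][COLS], uint8_t delta)
{
    for (int y = 0; y < ROWS; y++) {
        for (int x = 0; x < COLS; x++) {
            unsigned v = (unsigned)img[y][x] + delta;
            img[y][x] = (uint8_t)(v > 255u ? 255u : v);
        }
    }
}

/* Horizontal Sobel response at an interior pixel (y, x). The neighbours
   are reached with simple index offsets, so the data dependencies stay
   visible to the compiler. */
int sobel_x(uint8_t img[ROWS][COLS], int y, int x)
{
    assert(y >= 1 && y < ROWS - 1 && x >= 1 && x < COLS - 1);
    return (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
         - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]);
}
```

Compiling something like this with clang -O2 -emit-llvm -S and reading the
resulting IR is one way to see where vector types appear, rather than
choosing them up front as the storage format.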
What I'm after is a cookbook for LLVM data structures, and how to
apply them. E.g., when to use structs, when to use aggregate types,
and how to hold on to SIMD vectorisation whenever possible.
Unfortunately I'm not aware of such a thing. But I'd say a basic rule of
thumb would be that if you would naturally use a given structure in C
it's probably a reasonable strategy to use the LLVM analogue.
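
For instance, for an image whose size is only known at run time, the
natural C structure is probably a small descriptor plus a pixel buffer. The
struct gray_image and pixel_at below are hypothetical names used purely to
illustrate the rule of thumb; the LLVM type in the comment is what clang
typically produces for such a struct on a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image descriptor. The LLVM analogue is a struct type along
   the lines of { i32, i32, ptr } ({ i32, i32, i8* } in older IR that
   still has typed pointers). */
struct gray_image {
    uint32_t width;
    uint32_t height;
    uint8_t *pixels;   /* row-major, width * height bytes */
};

/* Address of pixel (x, y). In IR this index arithmetic would become a
   getelementptr on the pixel buffer. */
uint8_t *pixel_at(const struct gray_image *img, uint32_t x, uint32_t y)
{
    assert(x < img->width && y < img->height);
    return &img->pixels[(size_t)y * img->width + x];
}
```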