Is using lots of in-register values in IR bad?

Hi,

I’m a newbie when it comes to compilers (and even close-to-machine coding), but recently started working on my own language and am using LLVM as the mid/backend. Currently I’m generating .ll files from a front-end written in Scala. The language is not really serious, just a way to learn more about compilers (and LLVM) and maybe serve as a base for further experiments. It’s somewhat based on Kaleidoscope, but with more Scala-like syntax.

I want to experiment with avoiding mutable state as far as I can. At the moment there are no mutable variables – only immutable value types (numerics, bool, vectors, tuples) and I’ve been doing everything in LLVM registers. The compiler doesn’t generate a single alloca, load or store at the moment.

I wonder if it was maybe a bad idea to do it this way? A lot of stuff in LLVM seems to be available only through pointers, e.g. extractvalue takes only constant indices, but GEP can take variables. Some things seem to be possible only by bitcasting pointers, e.g. splitting a Vector into equal-sized parts to partially compute the sum of its elements with SIMD instructions…

And there may of course be some penalty for passing large(-ish) structures by value. I haven’t investigated at what size that becomes worse than passing pointers.

Maybe a better alternative would be to allocate memory for every local value, and let the mem2reg pass optimize?

I hope these kinds of questions are appropriate for this list.

Regards,
Erkki Lindpere

Erkki Lindpere <villane@gmail.com> writes:

> I want to experiment with avoiding mutable state as far as I can. At
> the moment there are no mutable variables -- only immutable value
> types (numerics, bool, vectors, tuples) and I've been doing everything
> in LLVM registers. The compiler doesn't generate a single alloca, load
> or store at the moment.

Ok. Do you ever need to grab the address of something on the stack? If
so you're going to need an alloca. AFAIK, it's the only way to generate
an address for a local object. This is by design of the IR and it
greatly simplifies analysis.

How do you handle global data? That can only be accessed in LLVM IR via
load/store. A GlobalValue is an address by definition.
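The following is a minimal Scala sketch of this, written like a front-end that produces .ll text directly (the original poster's front-end works that way). The object and method names are hypothetical. The emitted IR uses the opaque-pointer syntax of LLVM 15 and later (ptr). Releases from 2011 used typed pointers such as i8*, so passing the string to puts required a getelementptr constant expression. In both cases the global is only ever used as an address: it is handed to puts as a pointer, or its contents are read with load.

```scala
// Hypothetical helpers that emit LLVM IR text (LLVM 15+ opaque-pointer syntax).
object GlobalEmitter {
  // Needed once per module before puts can be called.
  val putsDecl: String = "declare i32 @puts(ptr)"

  // Escape a string for an LLVM c"..." literal and append the NUL terminator.
  def escape(s: String): String =
    s.getBytes("UTF-8").map { b =>
      val c = b & 0xff
      if (c >= 0x20 && c < 0x7f && c != '"' && c != '\\') c.toChar.toString
      else "\\" + "%02X".format(c) // e.g. newline becomes \0A
    }.mkString + "\\00"

  // @name is a GlobalValue, i.e. an address of type ptr, not the string itself.
  def stringConstant(name: String, s: String): String = {
    val size = s.getBytes("UTF-8").length + 1 // +1 for the trailing NUL
    "@" + name + " = private unnamed_addr constant [" + size + " x i8] c\"" + escape(s) + "\""
  }

  // With opaque pointers the global can be passed to puts(ptr) directly.
  def callPuts(result: String, global: String): String =
    "%" + result + " = call i32 @puts(ptr @" + global + ")"

  // A global scalar must be read with load; it never lives in an SSA register.
  def loadGlobal(result: String, ty: String, global: String): String =
    "%" + result + " = load " + ty + ", ptr @" + global
}
```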

> I wonder if it was maybe a bad idea to do it this way? A lot of stuff
> in LLVM seems to be available only through pointers, e.g.
> extractvalue takes only constant indices, but GEP can take variables.

Yeah, this is quite a limitation of the current IR. It is lacking a few
fundamental operations that, for example, vector machines of the '60s
and '70s implemented directly. Extract/insert from/to a variable index
is one of them. Extractvalue is a little more complicated, of
course, but special cases of it are implemented on x86 (for example) and
other "modernish" targets.

For cases like these, it is best to create a target-specific intrinsic
and use that to represent the operation. For operations not implemented
directly by the target, an alloca+GEP may be necessary.
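The alloca+GEP fallback could look like this in a Scala front-end that emits IR text. The in-register array is stored to a stack slot. Then getelementptr, which accepts a non-constant index, computes the element's address, and a load reads the element back. The names are hypothetical, the index register is assumed to be an i64, and the syntax assumes LLVM 15+ opaque pointers.

```scala
// Hypothetical helper: read element %idx (an i64) of an in-register [len x elemTy] value.
// extractvalue only takes constant indices, so the value goes through a stack slot.
object DynamicIndex {
  def extractDynamic(arr: String, elemTy: String, len: Int, idx: String, out: String): List[String] = {
    val aggTy = s"[$len x $elemTy]"
    List(
      s"%$out.slot = alloca $aggTy",         // stack slot for the aggregate
      s"store $aggTy %$arr, ptr %$out.slot", // spill the register value to memory
      // GEP accepts a variable index; inbounds assumes 0 <= idx < len
      s"%$out.ptr = getelementptr inbounds $aggTy, ptr %$out.slot, i64 0, i64 %$idx",
      s"%$out = load $elemTy, ptr %$out.ptr" // read the selected element
    )
  }
}
```

A front-end would normally place the alloca in the function's entry block. That gives SROA a chance to remove the slot again if the index later folds to a constant.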

> Some things seem to be possible only by bitcasting pointers, e.g.
> splitting a Vector into equal-sized parts to partially compute the sum
> of its elements with SIMD instructions...

That doesn't seem like the Right Way to do it. As in the extractvalue
case, the IR has no direct support for vector reductions. If your
target has these kinds of operations, you should probably use an
intrinsic to implement them.

Think of target intrinsics as a way to extend the IR for special
operations. The analysis and transformation passes won't understand
them but typically in these cases you "know" the right sequence to
generate.

> And there may of course be some penalty for passing large(-ish)
> structures by value. I haven't investigated at what size that
> becomes worse than passing pointers.

It is highly target-dependent. But usually the target's ABI has already
made that decision for you. In the case of pass-by-address you will
need an alloca.
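Whether a given struct is passed in registers or by address is the target ABI's decision. Suppose a front-end chooses to pass a large tuple by address explicitly. The caller then needs a stack slot, roughly as in this Scala sketch (hypothetical names; LLVM 15+ opaque-pointer syntax). It does not model any particular ABI.

```scala
// Hypothetical helpers for passing an aggregate by address (LLVM 15+ syntax).
object ByAddress {
  // Caller: materialize the value in a stack slot and pass the slot's address.
  def callWithAddress(fn: String, structTy: String, value: String, slot: String): List[String] = List(
    s"%$slot = alloca $structTy",
    s"store $structTy %$value, ptr %$slot",
    s"call void @$fn(ptr %$slot)"
  )

  // Callee: compute the address of field `field` through the pointer and load it.
  // Struct indices in getelementptr must be i32 constants.
  def loadField(structTy: String, fieldTy: String, ptr: String, field: Int, out: String): List[String] = List(
    s"%$out.addr = getelementptr inbounds $structTy, ptr %$ptr, i32 0, i32 $field",
    s"%$out = load $fieldTy, ptr %$out.addr"
  )
}
```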

> Maybe a better alternative would be to allocate memory for every local
> value, and let the mem2reg pass optimize?

That is often simpler. Then the translation of every object from your
high-level language to LLVM IR looks the same. But it is not strictly
necessary.
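Here is a sketch of the alloca-per-local scheme for immutable val bindings, again as a Scala helper that emits IR text (hypothetical names; LLVM 15+ syntax). Each binding gets one stack slot allocated in the entry block, one store where it is bound, and one load per use.

```scala
// Hypothetical lowering for immutable `val` bindings: one stack slot per local,
// which mem2reg/SROA later promote back to SSA registers (LLVM 15+ syntax).
object SlotLowering {
  final case class Binding(name: String, ty: String, valueReg: String)

  // Allocas belong in the entry block; that is where mem2reg looks for them.
  def entryAllocas(bindings: List[Binding]): List[String] =
    bindings.map(b => s"%${b.name}.slot = alloca ${b.ty}")

  // At the binding site, store the already-computed value into the slot.
  def bind(b: Binding): String =
    s"store ${b.ty} %${b.valueReg}, ptr %${b.name}.slot"

  // Every use of the name becomes a load that yields a fresh SSA register.
  def use(b: Binding, out: String): String =
    s"%$out = load ${b.ty}, ptr %${b.name}.slot"
}
```

Running the output through a recent opt (e.g. opt -S -passes=mem2reg) should turn the slot/store/load triples back into plain SSA values. This works only as long as the slot's address is never passed anywhere else.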

> I hope these kinds of questions are appropriate for this list.

Absolutely. Welcome!

                            -Dave

Thanks for the reply

> Erkki Lindpere <villane@gmail.com> writes:
> Ok. Do you ever need to grab the address of something on the stack? If
> so you’re going to need an alloca. AFAIK, it’s the only way to generate
> an address for a local object. This is by design of the IR and it
> greatly simplifies analysis.
>
> How do you handle global data? That can only be accessed in LLVM IR via
> load/store. A GlobalValue is an address by definition.

At the moment I do have global string constants (for passing to libc functions such as puts), but I’m adding other types of global values soon so I guess I’ll need to start thinking about that.

>> I wonder if it was maybe a bad idea to do it this way? A lot of stuff
>> in LLVM seems to be available only through pointers, e.g.
>> extractvalue takes only constant indices, but GEP can take variables.

> For cases like these, it is best to create a target-specific intrinsic
> and use that to represent the operation. For operations not implemented
> directly by the target, an alloca+GEP may be necessary.

Actually, I want to stay target-neutral (while still generating well-performing code for x86 / x86-64).

>> Some things seem to be possible only by bitcasting pointers, e.g.
>> splitting a Vector into equal-sized parts to partially compute the sum
>> of its elements with SIMD instructions…

> That doesn’t seem like the Right Way to do it. As in the extractvalue
> case, the IR has no direct support for vector reductions. If your
> target has these kinds of operations, you should probably use an
> intrinsic to implement them.

Ah, ok. I didn’t notice before that x86 SSE actually has vector reductions, but I looked it up now and indeed it does.
I guess I could introduce some target-specific codegen, then. At the moment I’m only using x86 anyway, and in the future I may want to delay code generation until install time.

Thanks,

Erkki

Hi Erkki,

> I want to experiment with avoiding mutable state as far as I can. At the moment
> there are no mutable variables -- only immutable value types (numerics, bool,
> vectors, tuples) and I've been doing everything in LLVM registers. The compiler
> doesn't generate a single alloca, load or store at the moment.

> I wonder if it was maybe a bad idea to do it this way? A lot of stuff in
> LLVM seems to be available only through pointers, e.g. extractvalue takes only
> constant indices, but GEP can take variables. Some things seem to be possible
> only by bitcasting pointers, e.g. splitting a Vector into equal-sized parts to
> partially compute the sum of its elements with SIMD instructions...

Splitting a vector can (and should) be done using the shufflevector instruction.
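For example, splitting an <8 x float> into two <4 x float> halves needs no pointers or bitcasts, only two shufflevector instructions with constant masks. The Scala sketch below uses hypothetical names. It assumes a recent LLVM that accepts poison as the unused second shuffle operand (older versions use undef). It also builds a target-neutral horizontal sum for integer vectors by repeatedly adding the two halves. For floating-point vectors the same tree changes the order of the additions and can therefore change the rounded result compared with a sequential sum.

```scala
// Hypothetical emitter for splitting vectors with shufflevector and summing them.
object VectorSplit {
  // Constant mask selecting lanes from .. from+count-1, e.g. <2 x i32> <i32 2, i32 3>.
  private def mask(from: Int, count: Int): String =
    (from until from + count).map(i => s"i32 $i").mkString(s"<$count x i32> <", ", ", ">")

  // %lo = lanes 0 .. n/2-1 of %v, %hi = lanes n/2 .. n-1; the second operand is unused.
  def halves(v: String, n: Int, ty: String, lo: String, hi: String): List[String] = {
    require(n % 2 == 0, "vector length must be even")
    val vt = s"<$n x $ty>"
    List(
      s"%$lo = shufflevector $vt %$v, $vt poison, ${mask(0, n / 2)}",
      s"%$hi = shufflevector $vt %$v, $vt poison, ${mask(n / 2, n / 2)}"
    )
  }

  // Horizontal sum of an integer vector (length a power of two): add the two
  // halves, repeat until one lane is left, then extract lane 0.
  def sumTree(v: String, n: Int, ty: String, out: String): List[String] = {
    require(n > 0 && (n & (n - 1)) == 0, "vector length must be a power of two")
    def go(cur: String, width: Int, step: Int): List[String] =
      if (width == 1) List(s"%$out = extractelement <1 x $ty> %$cur, i32 0")
      else {
        val (lo, hi, sum) = (s"$out.lo$step", s"$out.hi$step", s"$out.s$step")
        halves(cur, width, ty, lo, hi) ++
          (s"%$sum = add <${width / 2} x $ty> %$lo, %$hi" :: go(sum, width / 2, step + 1))
      }
    go(v, n, 0)
  }
}
```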

> And there may of course be some penalty for passing large(-ish) structures
> by value.

In-register structs and arrays are not intended to be used for large structs and
arrays. They are intended to be used for small objects like complex numbers (two
elements), (pointer,size) pairs and so on. Use memory (pointers) for anything
larger.
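As an illustration of the intended use, a complex multiply on { double, double } values can stay entirely in SSA registers, using extractvalue and insertvalue with constant indices. The helper below is hypothetical Scala. Using poison as the starting aggregate assumes a recent LLVM (older releases used undef).

```scala
// Hypothetical emitter: (ar + ai*i) * (br + bi*i) on { double, double } registers.
object ComplexMul {
  private val Ty = "{ double, double }"

  def mul(a: String, b: String, out: String): List[String] = List(
    s"%$out.ar = extractvalue $Ty %$a, 0", // constant indices only
    s"%$out.ai = extractvalue $Ty %$a, 1",
    s"%$out.br = extractvalue $Ty %$b, 0",
    s"%$out.bi = extractvalue $Ty %$b, 1",
    // real part: ar*br - ai*bi
    s"%$out.t0 = fmul double %$out.ar, %$out.br",
    s"%$out.t1 = fmul double %$out.ai, %$out.bi",
    s"%$out.re = fsub double %$out.t0, %$out.t1",
    // imaginary part: ar*bi + ai*br
    s"%$out.t2 = fmul double %$out.ar, %$out.bi",
    s"%$out.t3 = fmul double %$out.ai, %$out.br",
    s"%$out.im = fadd double %$out.t2, %$out.t3",
    // rebuild the result aggregate, still without touching memory
    s"%$out.p = insertvalue $Ty poison, double %$out.re, 0",
    s"%$out = insertvalue $Ty %$out.p, double %$out.im, 1"
  )
}
```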

> Maybe a better alternative would be to allocate memory for every local value,
> and let the mem2reg pass optimize?

Most front-ends do that. The dragonegg front-end is a bit different: it uses
registers directly for scalars and complex numbers, and memory for everything
else.

Ciao, Duncan.

On Friday 29 July 2011 11:24:47, Duncan Sands wrote:

> In-register structs and arrays are not intended to be used for large structs and
> arrays.

Out of curiosity, why would that be the case?

Erkki Lindpere <villane@gmail.com> writes:

> Ah, ok. I didn't notice before that x86 SSE actually has vector
> reductions, but I looked it up now and indeed it does. I guess I
> could introduce some target-specific codegen, then. At the moment
> I'm only using x86 anyway, and in the future I may want to delay
> code generation until install time.

You could introduce a target-neutral reduction intrinsic and then
implement it for targets you care about. That might be a good midway
solution.
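Current LLVM releases provide this kind of intrinsic: the llvm.vector.reduce.* family, which was earlier named llvm.experimental.vector.reduce.*. Each backend chooses its own lowering for these intrinsics. Below is a minimal Scala sketch that emits the integer add-reduction. The helper names are hypothetical, while the intrinsic's name and signature are LLVM's.

```scala
// Hypothetical helpers emitting LLVM's generic integer add-reduction intrinsic.
object ReduceIntrinsic {
  // Overloaded intrinsic name mangled with the vector type, e.g. v4i32.
  def name(n: Int, elemBits: Int): String = s"llvm.vector.reduce.add.v${n}i$elemBits"

  def declare(n: Int, elemBits: Int): String =
    s"declare i$elemBits @${name(n, elemBits)}(<$n x i$elemBits>)"

  // The backend, not the front-end, decides which instructions implement this.
  def call(v: String, n: Int, elemBits: Int, out: String): String =
    s"%$out = call i$elemBits @${name(n, elemBits)}(<$n x i$elemBits> %$v)"
}
```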

                          -Dave

Hi Carlo,

> On Friday 29 July 2011 11:24:47, Duncan Sands wrote:

>> In-register structs and arrays are not intended to be used for large structs and
>> arrays.

> Out of curiosity, why would that be the case?

because it is not efficient, and there's no point in putting effort into making
it efficient since it would be way easier to just go through memory in the first
place.

If you use a struct virtual register with (recursively) N fields then that is
equivalent to using N scalar virtual registers. So, first off, if N is large
then you will use many, many registers, which will doubtless be spilled to the
stack, resulting in gazillions of stack loads and stores if you actually do
anything useful with the struct.
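Here is a rough model of that arithmetic: it counts one virtual register per scalar field, recursively. This is a simplification for illustration only. The Scala types are hypothetical, each vector is counted as a single value, and target legalization is ignored.

```scala
// Rough model: a first-class aggregate occupies one virtual register per
// (recursive) scalar field. Simplifying assumption: a vector counts as one value.
object RegisterCount {
  sealed trait Ty
  case object Scalar extends Ty                  // i32, double, ptr, ...
  final case class Vec(lanes: Int) extends Ty    // counted as a single value here
  final case class Struct(fields: List[Ty]) extends Ty
  final case class Arr(len: Int, elem: Ty) extends Ty

  def scalarRegs(t: Ty): Int = t match {
    case Scalar       => 1
    case Vec(_)       => 1
    case Struct(fs)   => fs.map(scalarRegs).sum
    case Arr(n, elem) => n * scalarRegs(elem)
  }
}
```

By this count, an array of 100 complex numbers ([100 x { double, double }]) is already 200 virtual registers. That is far more than x86-64 can keep live at once, so most of them would be spilled.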

Ciao, Duncan.