Generating instruction bundles ?

Hi,

I'm retargetting llvm for a new architecture, basiclaly a target that
has v4f32 registers and a few operations, so it's mostly common stuff.
However, it also has some kind of options on top of the operations,
kinda like VLIW. For example, there is an addition operation, but
there is also a source operand negation bit which you can set. So you
can implement SUB using the ADD opcode (and there is actually no SUB
opcode proper). Besides negating the source operands, there are also
bits to take the absolute value of the source operands as input, bits
to scale the result by power of 2 constants, and bits to do arbitrary
shuffling of the 4 components. Now obviously I'd like to take
advantage of that.

I have taken a look at the ia64 target which has similar requirements,
and seen that it gets around the problem by using an external
optimizing assembler that reorders and packs instructions into
bundles. However since my requirements are so simple I'm hoping I can
do everything inside llvm (and I'm only interested in JIT anyway).

So I guess the question became obvious by now: how can I do this in
llvm? I've been thinking about a couple of solutions:
- Use the tablegen template system to generate all the variants for
eacho operation (with/without absolute value, with/without negation,
with/without post-scaling, with/without all kinds of shuffling, and
all combinations of those). Is this possible ? We're talking about
approximately 70000 variants per instruction here, approximately 300
if we ignore the shuffling which has 256 combinations by itself.
- Use an additional pass that reorders and packs instructions together
(similary to what the ia64 optimizing assembler does), maybe relying
on the instruction scheduler to place the instructions for me.
- Rewrite some of tablegen to support that feature. I'm under the
impression it would be a lot of work for a very specific case, but I
could be wrong here.

Ideas ?

Thanks in advance,
Stephane

I'm targeting an machine where the operands have a number of operations that are orthogonal to the instruction, but have the good fortune of there being an aggressively optimizing post pass.

I've considered the same option - "folding" the operand options into the instructions, but as you point out, this causes an explosion in the number of instructions. Because of the post pass, I can afford to sequence operand options, and count on later optimization. However, it would be nice for me (necessary for you) to be more direct.

I think an extended set of control bits in operands would be the most natural approach (to those of us used to working with such machines, at least), but you'd like to be able to have transformations in tablegen that let you do things like

(fneg operand) -> operand with negate bit flipped

I'm not sure exactly how this could be expressed (I'm admittedly new to compilers in general, LLVM in particular, so my limitations generally get in the way before LLVM's - my knowledge is more in the target machine).

Dan

So I guess the question became obvious by now: how can I do this in
llvm? I've been thinking about a couple of solutions:
- Use the tablegen template system to generate all the variants for
eacho operation (with/without absolute value, with/without negation,
with/without post-scaling, with/without all kinds of shuffling, and
all combinations of those). Is this possible ? We're talking about
approximately 70000 variants per instruction here, approximately 300
if we ignore the shuffling which has 256 combinations by itself.
- Use an additional pass that reorders and packs instructions together
(similary to what the ia64 optimizing assembler does), maybe relying
on the instruction scheduler to place the instructions for me.
- Rewrite some of tablegen to support that feature. I'm under the
impression it would be a lot of work for a very specific case, but I
could be wrong here.

LLVM codegen is missing some pieces for wide machines. So you should expect to roll up your sleeves and write your own. :slight_smile:

I am not sure if I completely understand the description. Is an instruction bundle made up of fixed number operations? If so, you could model each operation as a separate machine instruction. You can add a "bundle formation" (using e.g. bottom up greedy or clustering technique) pass to reorder instructions to form the bundles.

Evan

LLVM codegen is missing some pieces for wide machines. So you should
expect to roll up your sleeves and write your own. :slight_smile:

Yeah, I'm not sure if that could be tackled by a newcomer though...

I am not sure if I completely understand the description. Is an
instruction bundle made up of fixed number operations?

Well, the bundle has one "major" operation (add, mul, mad...) and a
couple of "additional" operations that will modify the inputs/outputs
of the major operation. It's not exactly VLIW in that the additional
operations don't change the value of a register, but instead change
the inputs and outputs of the major operation. Or to put it
differently, you could see the additional operations as some kind of
data routing to and from the major operation.

And yes, there is always exactly one major operation, and there are
bits allocated for each of the additional operations for each
input/output. So you can do abs+neg+shuffle on all inputs and scaling
on the output, and the instruction is still 128 bit.

If so, you
could model each operation as a separate machine instruction. You can
add a "bundle formation" (using e.g. bottom up greedy or clustering
technique) pass to reorder instructions to form the bundles.

Yes, that's the second point I had in mind. That solution is becoming
increasingly interesting to me since I don't think I can tackle
modifying llvm overnight :slight_smile:

Stephane