VLIW Ports

Has anyone attempted the port of LLVM to a VLIW architecture? Is there any publication about it?

TIA

http://blog.llvm.org/2010/06/tce-project-co-design-of-application.html

http://lists.cs.uiuc.edu/pipermail/llvmdev/2005-September/004800.html

-Eli

Hi,

Has anyone attempted the port of LLVM to a VLIW architecture? Is there
any publication about it?

I have developed a class derived from MachineInstr, called
MachineInstrBundle, which is essentially a VLIW-style machine
instruction that can store any MI in each "slot". After the scheduling
phase has grouped MIs into bundles, it has to call the MIB->pack()
method, which takes the operands from the MIs in the "slots" and
transfers them to the superinstruction. From this point on the bundle
is a normal MachineInstr which can be processed by other LLVM passes
(such as register allocation).
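
As a rough illustration of the interface being described (the class, member and method names below are hypothetical, not the actual patch, and it assumes the patch makes a suitable MachineInstr constructor accessible to the subclass, which is part of the small core changes mentioned later in the thread):

```cpp
// Sketch only: a bundle is itself a MachineInstr, with the original MIs
// stored in its "slots".  MachineInstrBundle, Slots, addSubInstr(), pack()
// and unpack() are illustrative names.
#include "llvm/CodeGen/MachineInstr.h"
#include <vector>

class MachineInstrBundle : public llvm::MachineInstr {
  std::vector<llvm::MachineInstr *> Slots; // the bundled sub-instructions
public:
  // Store an MI in the next free slot; the MI must already have been
  // removed from its MachineBasicBlock.
  void addSubInstr(llvm::MachineInstr *MI) { Slots.push_back(MI); }

  // Transfer operands (and, eventually, memory references and flags) from
  // the sub-MIs to the bundle, so that later passes such as the register
  // allocator see one ordinary MachineInstr.
  void pack();

  // Give the operands back to the sub-MIs so they can be re-inserted into
  // the basic block, e.g. before code emission.
  void unpack();
};
```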

The idea was to make a framework on top of which VLIW/ILP scheduling
could be studied using LLVM. It is not completely finished, but it is
more or less usable and works with a trivial scheduler in a synthetic
MIPS-VLIW architecture. Code emission does not work (yet), so bundles
have to be unpacked prior to emission.

I was waiting to finish it before sending a patch to the list, but if
you are interested I can send you a patch of my current code against
svn.

BR

Carlos

Carlos,

Hi,

I have developed a class derived from MachineInstr, called
MachineInstrBundle, which is essentially a VLIW-style machine
instruction that can store any MI in each "slot". After the scheduling
phase has grouped MIs into bundles, it has to call the MIB->pack()
method, which takes the operands from the MIs in the "slots" and
transfers them to the superinstruction. From this point on the bundle
is a normal MachineInstr which can be processed by other LLVM passes
(such as register allocation).

The idea was to make a framework on top of which VLIW/ILP scheduling
could be studied using LLVM. It is not completely finished, but it is
more or less usable and works with a trivial scheduler in a synthetic
MIPS-VLIW architecture. Code emission does not work (yet), so bundles
have to be unpacked prior to emission.

I was waiting to finish it before sending a patch to the list, but if
you are interested I can send you a patch of my current code against
svn.

This is rather interesting.

Our VLIW architecture has variable-size packets, in the sense that the shortest packet is a single instruction. Because packets can be formed at a later stage by bundling together instructions without data dependencies, I was thinking of splitting each MachineBasicBlock right before emitting the instructions so that each block contains one packet. A subsequent pass would then order the instructions within each packet properly.

On the other hand, I'm not comfortable "overloading" existing structures to do things they weren't meant for. I think that your approach is better. Of course, I would prefer to do all this in the MC layer, as that is the way forward.

Thanks,

Hi all,

here is the current (unfinished) version of the VLIW support I
mentioned. It is a patch against svn rev 141176. It includes the
MachineInstrBundle class, plus the small changes required in a couple
of existing LLVM files.

It also includes a modification to the Mips target to simulate a
2-wide VLIW MIPS. The scheduler is really silly; I did not want to
implement a scheduler, just the bundle class, so the test scheduler is
only provided as an example.

The main thing still missing is finishing the "pack" and "unpack"
methods in the bundle class. Right now they manage operands, both
implicit and explicit, but they should also manage memory references
and update the MIB flags according to the sub-MI flags.

If you have any questions, I would be glad to help.

BR

Carlos

llvm_vliw.patch (17 KB)

Hi all,

I have worked for the last 2 years on an LLVM back-end that supports clustered and non-clustered VLIW architectures. I also wrote a paper about it that is currently in the review process and will hopefully be accepted. Here is a short summary of how I realized VLIW support in an LLVM back-end. I also used packing and unpacking of VLIW bundles. My implementation does not require any modification of the LLVM core.

To support VLIW I added two representations for VLIW instructions: a packed and an unpacked representation. In the unpacked representation, VLIW bundles are separated by a NEXT instruction, as was done in the IA-64 back-end. The packed representation packs all instructions of one bundle into a single PACK instruction; I used this representation especially for register allocation.

I used the following pass order for the clustered VLIW back-end (a rough sketch of how these passes might be hooked in follows the list):

DAG->DAG Pattern Instruction Selection
...
Clustering (Not required for unicluster VLIW architectures)
Scheduling
Packing
...
Register Allocation
...
Prolog/Epilog Insertion & Frame Finalization
Unpacking
Reclustering
...
Rescheduling (Splitting, Packing, Scheduling, Unpacking)
Assembly Printer
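
A rough sketch of where passes like these could be wired into an LLVMTargetMachine subclass of that era; MyVLIWTargetMachine and the createVLIW*Pass() factories are assumed names for illustration, not Timo's actual code or existing LLVM APIs:

```cpp
// Sketch only: hooking the VLIW passes into the pre-RA and pre-emit hooks
// available around LLVM 3.0.  All names below are hypothetical.
#include "llvm/PassManager.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;

bool MyVLIWTargetMachine::addPreRegAlloc(PassManagerBase &PM,
                                         CodeGenOpt::Level OptLevel) {
  PM.add(createVLIWClusteringPass());   // not required for unicluster targets
  PM.add(createVLIWSchedulingPass());   // parallelizing pre-RA scheduler
  PM.add(createVLIWPackingPass());      // bundles become single MIs for RA
  return true;
}

bool MyVLIWTargetMachine::addPreEmitPass(PassManagerBase &PM,
                                         CodeGenOpt::Level OptLevel) {
  PM.add(createVLIWUnpackingPass());    // back to the unpacked form after PEI
  PM.add(createVLIWReclusteringPass());
  PM.add(createVLIWReschedulingPass()); // splitting, packing, scheduling, unpacking
  return true;
}
```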

In principle, it is possible to use the LLVM scheduler to generate parallel code by providing a custom hazard recognizer that checks for true data dependencies within the current bundle. The scheduler also has the capability to emit NEXT operations, by signalling NoopHazard and outputting a NEXT instruction instead of a NOP. However, the scheduler used within "DAG->DAG Pattern Instruction Selection" uses the glue mechanism, which could be problematic since no NEXT instructions are issued between glued instructions.
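
To make that idea concrete, here is a minimal sketch of a bundle-aware hazard recognizer; it is an illustration of the technique, not Timo's code, and the class name, issue-width handling and the use of TargetInstrInfo::insertNoop (which a VLIW target could override to emit NEXT) are assumptions:

```cpp
// Sketch only: end the current bundle (NoopHazard) when the candidate
// instruction depends on a value defined inside it, or when the issue width
// is exhausted.  Register aliases and functional units are ignored for brevity.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/ScheduleDAG.h"
#include "llvm/CodeGen/ScheduleHazardRecognizer.h"
#include <set>

namespace {
class VLIWBundleHazardRecognizer : public llvm::ScheduleHazardRecognizer {
  std::set<unsigned> DefsInBundle; // registers defined in the current bundle
  unsigned SlotsUsed;
  const unsigned IssueWidth;
public:
  explicit VLIWBundleHazardRecognizer(unsigned Width)
      : SlotsUsed(0), IssueWidth(Width) {}

  virtual HazardType getHazardType(llvm::SUnit *SU, int /*Stalls*/) {
    if (SlotsUsed >= IssueWidth)
      return NoopHazard;                     // bundle is full: force a NEXT
    const llvm::MachineInstr *MI = SU->getInstr();
    for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
      const llvm::MachineOperand &MO = MI->getOperand(i);
      if (MO.isReg() && MO.isUse() && DefsInBundle.count(MO.getReg()))
        return NoopHazard;                   // true dependence on current bundle
    }
    return NoHazard;
  }

  virtual void EmitInstruction(llvm::SUnit *SU) {
    const llvm::MachineInstr *MI = SU->getInstr();
    for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
      const llvm::MachineOperand &MO = MI->getOperand(i);
      if (MO.isReg() && MO.isDef())
        DefsInBundle.insert(MO.getReg());
    }
    ++SlotsUsed;
  }

  // A new cycle starts a new bundle.
  virtual void AdvanceCycle() { DefsInBundle.clear(); SlotsUsed = 0; }
  virtual void Reset()        { DefsInBundle.clear(); SlotsUsed = 0; }
};
} // end anonymous namespace
```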

In my back-end I added a parallelizing scheduler after "DAG->DAG Pattern Instruction Selection" by reusing the LLVM post-RA scheduler together with a custom hazard recognizer, as explained above. With some small modifications (special PHI instruction handling and a fix for a small performance issue caused by the high virtual register numbers), the post-RA scheduler also works very well before register allocation.

Before register allocation, the Packing pass converts the unpacked representation produced by the scheduler into the packed representation, so register allocation sees each VLIW bundle as one instruction. After "Prolog/Epilog Insertion & Frame Finalization", the Unpacking pass converts the PACK instructions back into the unpacked representation. Instructions that were added during register allocation and prolog/epilog insertion are recognized and each end up in a bundle of their own, since they have not been parallelized.

At the end (just before assembly output) I added several passes to do a rescheduling. First, the Splitting pass tries to split each VLIW bundle into single instructions (where possible). The Packing pass then packs every bundle with more than one instruction into a single PACK instruction; the scheduler recognizes a PACK instruction as a single scheduling unit. Scheduling is nearly the same as before RA. Finally, Unpacking re-establishes the unpacked representation.

If anyone is interested in more information please send me an email. I'm also interested in increasing support for VLIW architectures within LLVM.

Kind regards,
Timo Stripf

Hi Timo,

your approach is quite similar to the one in the patch I sent a couple of weeks ago. I also have a bundle (derived from MachineInstr, so I call it "MachineInstrBundle") and pack/unpack so that RegAlloc works on the bundles… I really think this is the way to incorporate VLIW support into LLVM.

I guess a requirement for any of this to make it into LLVM trunk is to have a backend that uses it (can some dev confirm this?). My backend is still in the works; I only have a synthetic MIPS-VLIW backend working, and that is not a real target.

Is your code public, or plan to be?

BR

Carlos

Hi Carlos,

I am interested in your port for a MIPS-VLIW architecture. I plan to target a similar architecture for which there is no LLVM backend yet. Do you have some example of your code?

Best,

Julien.

Hi,
   I'm also interested in this approach.

Hi Carlos,

I agree with you that a backend for a real architecture would be beneficial for integrating VLIW support into LLVM trunk. At a minimum, an assembler, linker and simulator for the target architecture would have to be freely available.

My target architecture is a research clustered-VLIW processor with a flexible issue width and a flexible instruction set described in an ADL. I plan to make my backend publicly available after my papers are published, but I can share some relevant passes/classes for VLIW support right now.

Best Regards,
Timo Stripf

Hi Timo,

I think any implementation that makes a "bundle" a different entity from MachineInstr is going to be difficult to use. All of the current backend passes will have to be taught to know about bundles. Furthermore, if this is something that no current targets are using, then it cannot be adopted into LLVM mainline.

I think what we need is the concept of a fixed sequence of machine instructions: something that represents a number of MachineInstrs that are scheduled as a unit and is never broken up by MI passes such as branch folding. This is something that current targets can use today, for example to pre-schedule instructions. It can be useful for macro-fusion optimizations, and it can also be used for VLIW targets.

Evan

Hi Evan (and all),

I think any implementation that makes a "bundle" a different entity from MachineInstr is going to be difficult to use. All of the current backend passes will have to be taught to know about bundles.

The approach in the patch I sent (and I believe Timo's code works similarly, according to his explanations) is precisely to make "bundles" no different from MachineInstrs. They are MIs (instances of a class derived from it), so all other passes work transparently with them. For example, in my code the register allocator does not know it is allocating regs for a bundle; it just sees an MI using a lot of registers. Of course, normal (scalar) passes cannot "inspect" the inside of bundles, and won't be able, for example, to put spill code into bundles or anything like that.

But the good thing is that bundles (which are MIs) and regular MIs can coexist inside a MachineBasicBlock, and bundles can easily be "broken back" into regular MIs whenever some pass needs it.

I think what we need is the concept of a fixed sequence of machine instructions: something that represents a number of MachineInstrs that are scheduled as a unit and is never broken up by MI passes such as branch folding. This is something that current targets can use today, for example to pre-schedule instructions. It can be useful for macro-fusion optimizations, and it can also be used for VLIW targets.

There might be something I am missing, but I do not see the advantage here. What's more, if you use sequences you need to find a way to tell the passes how long a sequence is. On the other hand, if you use a class derived from MI, the passes already know (from their POV they are just dealing with MIs). You do, of course, have to be careful about how you build the bundles so that they have the right properties, matching those of the inner MIs, and that is where the pack/unpack methods come in.

BR

Carlos

Evan, Timo, Carlos (and everyone else),

  I have a somewhat similar interest.
  What would you say to some sort of a "global cycle" field/marker that
identifies all instructions scheduled at a certain "global" cycle? That way
the "bundle"/packet/multiop can be identified at any time via a common
"global cycle" value. I could see it being set first in the pre-RA scheduler
and refined later (just as Timo is describing). No changes would be required
to any existing passes, unless a pass "needs" to know about the
"bundle"/packet. New instructions that have been added but not yet scheduled
would carry some invalid cycle (-1, for example).
  I am not yet sure about the exact placement of this marker, but
MachineInstr surely needs one, and we should be able to set it as early as
the SUnits in DAG->DAG Pattern Instruction Selection.

  This is a trivial solution that seems to provide the required
functionality in the least intrusive way, and it would also be useful to
non-VLIW architectures for conveying pre-schedule information downstream.
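
A minimal sketch of how such a marker could be kept on the side, so that no core data structures have to change; the class and member names are assumptions, not an existing LLVM API, and a field directly in MachineInstr or SUnit, as proposed above, would work the same way:

```cpp
// Sketch only: a side table mapping each MachineInstr to its "global cycle".
// Instructions not yet scheduled carry the invalid cycle -1.
#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/MachineInstr.h"

class GlobalCycleInfo {
  llvm::DenseMap<const llvm::MachineInstr *, int> Cycle;
public:
  enum { InvalidCycle = -1 };

  int getCycle(const llvm::MachineInstr *MI) const {
    llvm::DenseMap<const llvm::MachineInstr *, int>::const_iterator I =
        Cycle.find(MI);
    return I == Cycle.end() ? (int)InvalidCycle : I->second;
  }

  void setCycle(const llvm::MachineInstr *MI, int C) { Cycle[MI] = C; }

  // Two instructions belong to the same bundle/packet iff they share a
  // valid global cycle.
  bool inSameBundle(const llvm::MachineInstr *A,
                    const llvm::MachineInstr *B) const {
    int CA = getCycle(A);
    return CA != (int)InvalidCycle && CA == getCycle(B);
  }
};
```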

Sergei Larin

Hi Evan (and all),

I think any implementation that makes a "bundle" a different entity from MachineInstr is going to be difficult to use. All of the current backend passes will have to be taught to know about bundles.

The approach in the patch I sent (and I believe Timo's code works similarly, according to his explanations) is precisely to make "bundles" no different from MachineInstrs. They are MIs (instances of a class derived from it), so all other passes work transparently with them. For example, in my code the register allocator does not know it is allocating regs for a bundle; it just sees an MI using a lot of registers. Of course, normal (scalar) passes cannot "inspect" the inside of bundles, and won't be able, for example, to put spill code into bundles or anything like that.

But the good thing is that bundles (which are MIs) and regular MIs can coexist inside a MachineBasicBlock, and bundles can easily be "broken back" into regular MIs whenever some pass needs it.

Ok, so in your proposal a bundle is just a special MachineInstr? That sounds good. How are the MachineInstrs embedded inside a bundle? How are the cumulative operands, implicit register defs and uses represented?

I think what we need is the concept of a fixed sequence of machine instructions: something that represents a number of MachineInstrs that are scheduled as a unit and is never broken up by MI passes such as branch folding. This is something that current targets can use today, for example to pre-schedule instructions. It can be useful for macro-fusion optimizations, and it can also be used for VLIW targets.

There might be something I am missing, but I do not see the advantage here. What's more, if you use sequences you need to find a way to tell the passes how long a sequence is. On the other hand, if you use a class derived from MI, the passes already know (from their POV they are just dealing with MIs). You do, of course, have to be careful about how you build the bundles so that they have the right properties, matching those of the inner MIs, and that is where the pack/unpack methods come in.

A "sequence" would not be actually a sequence of MachineInstr's. I'm merely proposing you using a generic concept that is not tied to VLIW. In the VLIW bundle, there are no inter-dependencies between the instructions. However, I'm looking for a more generic concept that may represent a sequence of instructions which may or may not have dependencies between them. The key is to introduce a concept that can be used by an existing target today.

Sounds like what you are proposing is not very far from what I've described. Do you have patches ready for review?

Evan

Hi all,

Ok, so in your proposal a bundle is just a special MachineInstr? That sounds good. How are the MachineInstrs embedded inside a bundle? How are the cumulative operands, implicit register defs and uses represented?

I attached the packing and unpacking passes I used in my backend. In my solution, multiple MachineInstrs are packed into one variadic "PACK" MachineInstr. The opcode and operands of each original instruction are encoded as operands of the PACK instruction: the opcode is added as an immediate, followed by the operands of the original instruction, and each instruction's operand list is terminated by an "EndOfOp" operand. The implicit defs/uses are also added to the PACK instruction but are not used for unpacking; unpacking reconstructs them from the TargetDescriptionInfo.
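
For illustration, a minimal sketch of building a PACK instruction with that encoding; PackOpcode and EndOfOpMarker are assumed target-specific values, and this helper is a sketch of the idea, not the attached code:

```cpp
// Sketch only: pack a bundle of MIs into one variadic PACK instruction
// (sub-opcode as an immediate, then the sub-MI's operands, then EndOfOp).
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Target/TargetInstrInfo.h"
#include <vector>

static llvm::MachineInstr *
packBundle(llvm::MachineBasicBlock &MBB, llvm::MachineBasicBlock::iterator IP,
           const std::vector<llvm::MachineInstr *> &Bundle,
           const llvm::TargetInstrInfo *TII, llvm::DebugLoc DL,
           unsigned PackOpcode, int64_t EndOfOpMarker) {
  llvm::MachineInstrBuilder MIB =
      llvm::BuildMI(MBB, IP, DL, TII->get(PackOpcode));
  for (unsigned i = 0, e = Bundle.size(); i != e; ++i) {
    llvm::MachineInstr *MI = Bundle[i];
    MIB.addImm(MI->getOpcode());              // encode the sub-opcode
    for (unsigned op = 0, oe = MI->getNumOperands(); op != oe; ++op)
      MIB.addOperand(MI->getOperand(op));     // copy explicit + implicit operands
    MIB.addImm(EndOfOpMarker);                // terminate this sub-instruction
    MI->eraseFromParent();                    // the original MI is now redundant
  }
  return MIB;
}
```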

I took a look at the packing/unpacking solution of Evan, and I think it is more elegant to use a class derived from MachineInstr for storing multiple instructions in one.

Best regards,
Timo Stripf

TS1VLIWPacking.cpp (7.51 KB)

TS1VLIWUnpacking.cpp (6.27 KB)

Hi Sergei,

  What would you say to some sort of a "global cycle" field/marker that
identifies all instructions scheduled at a certain "global" cycle? That way
the "bundle"/packet/multiop can be identified at any time via a common
"global cycle" value.

But RA would need to know about this global cycle field, right? Because
a register can be reused in the same "global cycle" in which it is
killed.

Carlos

>> I think any implementation that makes a "bundle" a different entity from MachineInstr is going to be difficult to use. All of the current backend passes will have to be taught to know about bundles.
>
> The approach in the patch I sent (and I believe Timo's code works similarly, according to his explanations) is precisely to make "bundles" no different from MachineInstrs. They are MIs (instances of a class derived from it), so all other passes work transparently with them. For example, in my code the register allocator does not know it is allocating regs for a bundle; it just sees an MI using a lot of registers. Of course, normal (scalar) passes cannot "inspect" the inside of bundles, and won't be able, for example, to put spill code into bundles or anything like that.
>
> But the good thing is that bundles (which are MIs) and regular MIs can coexist inside a MachineBasicBlock, and bundles can easily be "broken back" into regular MIs whenever some pass needs it.

Ok, so in your proposal a bundle is just a special MachineInstr? That sounds good. How are the MachineInstrs embedded inside a bundle? How are the cumulative operands, implicit register defs and uses represented?

Yes, it is a special MachineInstr (but again, not a MachineInstr
object directly: I have this MachineInstrBundle class which is derived
from MachineInstr, and as a derived class all existing passes can
handle it directly).

The MIB has a vector of MIs inside. Basically your scheduler (the
patch I provided has just a very trivial example scheduler) will take
some MIs, remove them from the MachineBasicBlock, put them into an
MIB, and add the MIB to the basic block. At some point (when VLIW
scheduling is finished, but it can also be earlier, as the target
requires) you call MIB->pack() for all your bundles, which has the
effect of "copying" the registers and other flags of the internal
instructions to the bundle. From this point on, all the passes will
see only the bundle. If needed, you can "unpack" at any time and
return the inner MIs to the basic block directly.
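
To make the flow concrete, a deliberately trivial sketch in the spirit of what is described above, bundling adjacent pairs of instructions; MachineInstrBundle, addSubInstr() and pack() are the hypothetical names from the interface sketched earlier in the thread, not the actual patch:

```cpp
// Sketch only: a toy "scheduler" that bundles adjacent instruction pairs.
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

void bundlePairs(llvm::MachineBasicBlock &MBB) {
  llvm::MachineBasicBlock::iterator I = MBB.begin();
  while (I != MBB.end() && !I->getDesc().isTerminator()) {
    llvm::MachineBasicBlock::iterator InsertPt = I;
    llvm::MachineInstr *First = &*I; ++I;
    if (I == MBB.end() || I->getDesc().isTerminator())
      break;                                  // no partner left to bundle with
    llvm::MachineInstr *Second = &*I; ++I;

    // Create a (hypothetical) bundle and put it where the pair used to be.
    MachineInstrBundle *MIB = new MachineInstrBundle();
    MBB.insert(InsertPt, MIB);

    // Move both instructions out of the block and into the bundle's slots.
    First->removeFromParent();
    Second->removeFromParent();
    MIB->addSubInstr(First);
    MIB->addSubInstr(Second);

    // Lift operands and flags onto the bundle so later passes see one MI.
    MIB->pack();
  }
}
```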

>> I think what we need is the concept of a fixed sequence of machine instructions: something that represents a number of MachineInstrs that are scheduled as a unit and is never broken up by MI passes such as branch folding. This is something that current targets can use today, for example to pre-schedule instructions. It can be useful for macro-fusion optimizations, and it can also be used for VLIW targets.
>
> There might be something I am missing, but I do not see the advantage here. What's more, if you use sequences you need to find a way to tell the passes how long a sequence is. On the other hand, if you use a class derived from MI, the passes already know (from their POV they are just dealing with MIs). You do, of course, have to be careful about how you build the bundles so that they have the right properties, matching those of the inner MIs, and that is where the pack/unpack methods come in.

A "sequence" would not actually be a sequence of MachineInstrs. I'm merely proposing a generic concept that is not tied to VLIW. In a VLIW bundle, there are no inter-dependencies between the instructions; however, I'm looking for a more generic concept that can represent a sequence of instructions which may or may not have dependencies between them. The key is to introduce a concept that can be used by an existing target today.

Sounds like what you are proposing is not very far from what I've described. Do you have patches ready for review?

Sure, I already sent it a couple of weeks ago (that is where this
thread started, actually). The original message is here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043798.html

BR,

Carlos

Carlos,

  Absolutely. In addition, live range detection needs to be made aware of the global cycle... and that needs to be done regardless of the representation methodology. The same goes for any pass that cares about packets. The important observation here, IMHO, is that "packetization" at an early stage (before RA) is tentative, and RA can change the landscape, which must then be more or less finalized in the post-RA scheduler. Nevertheless, I think one last pass, right before code emission, is still needed to "clean up" the final schedule.

  I do not have a patch handy (it would have made it easier to illustrate my proposal), but the fact of this discussion alone shows growing interest in the problem. I have a feeling that we might get a VLIW target/back end shortly, and then this would become a real (and burning) issue. This might be our chance to outperform GCC's RISC-centric philosophy in an elegant and powerful way.

  The first step to healing is to recognize that we have an issue ;)

Sergei

It seems to me that the concept of insn bundles or packets is needed with different characteristics depending on where it is used.

During early scheduling, when there may be no MachineInstr objects yet, the data structure or annotation that is needed may be quite different from what is needed at or near code generation and emission. I think that what Sergei is talking about fits the former case well, while Carlos' patch, which I have yet to examine, seems to fit the latter.

Since my focus is on the final phases of the compiler, Carlos' suggestion appealed more to me and seems to be a good starting point that I intend to investigate further.

One hesitation that I have about sub-classing though is that base-class objects may be destroyed and replaced by passes and thus the sub-class information may be lost. This is not much of a problem at the final stages, when there are fewer passes to worry about, but it may be something difficult to control at early stages.

Having said this, if the bundling of insns is indeed done differently at different stages, I would still be comfortable sub-classing MI; it could be OK for the scheduling stage too, provided that the life-time of the bundle used for scheduling is relatively short.

However, if different bundling representations are used, later scheduling would have to understand both representations.

Anyways, just thinking out loud...

Since my focus is on the final phases of the compiler, Carlos’
suggestion appealed more to me and seems to be a good starting point
that I intend to investigate further.

Ok, if you have any questions about the patch, do not hesitate to ask :)

One hesitation that I have about sub-classing though is that base-class
objects may be destroyed and replaced by passes and thus the sub-class
information may be lost. This is not much of a problem at the final
stages, when there are fewer passes to worry about, but it may be
something difficult to control at early stages.

They cannot really be destroyed, because they get “hidden”, in a way. The scheduler (quite simple in my code, but valid as an example) removes the base MIs from the machine basic block and puts them inside the bundle. Packing then takes the operands “off” the sub-MIs and assigns them to the bundle. So the MIs are no longer visible, no pass can touch or destroy them, and they belong to no BB (thus no function); they are just stored inside the bundle for two reasons (see the sketch after this list):

  1. the bundle just has operands and does not define an operation, so the sub-MIs need to be saved to record the opcodes.
  2. it has to be possible to “recreate” the scalar MIs from the bundle, if needed.
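
A minimal sketch of what pack() amounts to, consistent with the hypothetical interface sketched earlier in the thread (the Slots member and operand handling are assumptions; merging memory references and flags, noted as unfinished above, is omitted):

```cpp
// Sketch only: lift the operands of the hidden sub-MIs onto the bundle
// itself, so later passes see a single MachineInstr.
void MachineInstrBundle::pack() {
  for (unsigned i = 0, e = Slots.size(); i != e; ++i) {
    llvm::MachineInstr *MI = Slots[i];
    for (unsigned op = 0, oe = MI->getNumOperands(); op != oe; ++op)
      addOperand(MI->getOperand(op)); // the bundle now carries this operand
  }
  // Memory references and per-MI flags would also have to be merged here.
}
```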

BR,

Carlos