Loop Unroll

I would like to create a VHDL backend for LLVM, and I am currently testing the loop unrolling passes. I would like to unroll a loop into a parallel form (each basic block of the unrolled loop has the same parent node). At the moment I can only unroll a loop into a serial form (each basic block is the parent node of the next one).
Is it possible to unroll a loop into a parallel form (each basic block of the unrolled loop has the same parent node in the CFG)?

Radoslaw Cieszewski

Hello Radoslaw,

As far as I can make out, there is a mismatch between the VHDL-level picture that you have in mind and the way a traditional CPU compiler works. In LLVM, unrolling a loop simply places copies of the loop body one after another, so "unroll" means "serialize". A basic block with multiple successors in LLVM ends in a conditional branch that transfers control to exactly one of those successors. What you have in mind is a way to transfer control to all successors in parallel. This cannot be represented in LLVM IR.

The implicit assumption in your question is that there are no dependencies between iterations of the loop of interest, so all iterations can be executed in parallel. There are several ways to handle this:

1. Merge all the unrolled basic blocks into one block. Then maybe the
    instruction-level parallelism between them will automatically show
    up in your VHDL. This is the simplest way to do it.
2. Vectorize the loop body in LLVM, then generate VHDL entities that
    can handle vector inputs. This will be limited by the size of
    vectors that the LLVM vectorizer can generate. Also, your memory
    subsystem will need to handle vector loads and stores.
3. This last one is purely in the VHDL generator: Somehow mark loops
    that can be parallelized, and generate a custom VHDL entity that
    captures the loop body. Then instead of generating a loop control
    structure in your VHDL, generate a fork/join structure that
    transfers control to multiple instances of your entity, one for each
    iteration of the loop. This will be limited by the number of
    load/store requests that your memory subsystem can accept in parallel.
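
To make option 1 a little more concrete, here is a rough source-level sketch of what "merge all the unrolled basic blocks into one block" amounts to. The function names `add_rolled` and `add_unrolled` and the trip count `N` are made up for illustration, and the example assumes that the iterations are independent, which is stated here with `restrict`. It is not the output of any particular LLVM pass.

```c
#include <stddef.h>

#define N 4  /* hypothetical trip count, known at compile time */

/* Rolled form: a single body block executed N times under loop control. */
void add_rolled(const int *restrict a, const int *restrict b,
                int *restrict out)
{
    for (size_t i = 0; i < N; ++i)
        out[i] = a[i] + b[i];
}

/* Fully unrolled and merged: straight-line code with no branches, so it
 * fits in one basic block. `restrict` states the assumption that `out`
 * does not alias `a` or `b`, which is what makes the four statements
 * independent of each other. */
void add_unrolled(const int *restrict a, const int *restrict b,
                  int *restrict out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

None of the four statements in `add_unrolled` reads a value written by another, so a VHDL generator working on that single block could in principle instantiate four adders side by side. How many of them actually run at once would still depend on how many loads and stores the memory subsystem accepts in parallel.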