llvm-dev Digest, Vol 159, Issue 2

Hal, Tobias, et al. –

I am strongly in favor of seeing a broader range of loop transformations, supported by strong dependence analysis, added to LLVM, and the Polly infrastructure seems to be by far our best bet to make that happen. I have a couple of questions:

1) Integer constraint libraries like ISL (and Omega, which I used extensively in a previous project) are fundamentally solving exponential problems or worse, and can in rare cases cause compile times to explode. How does Polly deal with this issue?
2) Are the loop transformations composable through a reasonable API, i.e., can a tool create an arbitrary pipeline of the transformations? I know about the Polly scheduling work, which explores flexible combinations of these, but I don’t know whether there are interfaces other tools can use to create arbitrary, fixed pipelines of such transforms.

Thanks,

-- Vikram Adve

// Interim Head and Professor, Department of Computer Science
// University of Illinois at Urbana-Champaign
// Admin Assistant: Amanda Foley - ajfoley2@illinois.edu
// Google Hangouts: vikram.s.adve@gmail.com || Skype: vikramsadve
// Research page: http://vikram.cs.illinois.edu/

[tying to original thread]

> Hal, Tobias, et al. –
>
> I am strongly in favor of seeing a broader range of loop transformations, supported by strong dependence analysis, added to LLVM, and the Polly infrastructure seems to be by far our best bet to make that happen. I have a couple of questions:
>
> 1) Integer constraint libraries like ISL (and Omega, which I used extensively in a previous project) are fundamentally solving exponential problems or worse, and can in rare cases cause compile times to explode. How does Polly deal with this issue?

Our integer set library has a compute-out functionality. It counts
expensive operations, such as memory allocations, but also so-called
pivots, which occur during ILP solving. If a certain limit is reached,
the library skips all subsequent operations and returns a "null" result
to the user. We guard potentially expensive parts of Polly (scheduling,
dependence analysis) with these compute-outs and bail out gracefully.
The important point here is that these are compute-outs, not time-outs.
Hence, the resulting binaries will be identical across systems with
differing performance (and obviously also across runs).
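
To make this concrete, here is a minimal sketch of the guard pattern,
using the raw isl C API from C++. The operation limit and the
computeDeps() helper are illustrative, not Polly's actual code.

#include <isl/ctx.h>
#include <isl/options.h>
#include <isl/union_map.h>

// Hypothetical stand-in for an expensive isl-based analysis, e.g.,
// dependence computation.
isl_union_map *computeDeps(isl_union_map *Accesses);

isl_union_map *computeDepsGuarded(isl_ctx *Ctx, isl_union_map *Accesses) {
  // On error, keep going and hand back null results instead of aborting.
  isl_options_set_on_error(Ctx, ISL_ON_ERROR_CONTINUE);

  // Count operations (allocations, ILP pivots, ...) from zero and cap
  // them. The cap is on operations, not wall-clock time, so the outcome
  // is deterministic across machines and runs.
  isl_ctx_reset_operations(Ctx);
  isl_ctx_set_max_operations(Ctx, 500000); // illustrative limit

  isl_union_map *Deps = computeDeps(Accesses);

  isl_ctx_set_max_operations(Ctx, 0); // lift the cap again
  if (isl_ctx_last_error(Ctx) == isl_error_quota) {
    // Compute-out hit: discard the (null) result and bail out gracefully.
    isl_ctx_reset_error(Ctx);
    isl_union_map_free(Deps);
    return nullptr;
  }
  return Deps;
}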

> 2) Are the loop transformations composable through a reasonable API, i.e., can a tool create an arbitrary pipeline of the transformations? I know about the Polly scheduling work, which explores flexible combinations of these, but I don’t know whether there are interfaces other tools can use to create arbitrary, fixed pipelines of such transforms.

Loop transformations are transformations on the polyhedral schedule.
Tools can set an arbitrary multi-dimensional scheduling function, which
allows individual loop iterations to be remapped into a new loop
structure. (With your background, this might already be clear.) For
everybody who does not yet speak "polyhedral": we generally work on a
tree representation of the execution order which, even though not an
AST, can be modified in a similar way, such that loop transformations
can be applied on a per-node level. We use the latter to implement
certain specific transformations, e.g., the GOTO/BLIS matrix
multiplication transformation, a very specific sequence of
transformations that is known to yield "optimal" performance.
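
As a tiny illustration in raw isl (not a Polly-specific API; the
statement name S and the transformation are made up), a loop
interchange is nothing but a new scheduling function that maps each
iteration (i, j) to the time point (j, i):

#include <isl/ctx.h>
#include <isl/map.h>

// Schedule that maps iterations of a statement S[i, j] to execution
// times [j, i], i.e., an interchange of the two loops.
isl_map *interchangeSchedule(isl_ctx *Ctx) {
  return isl_map_read_from_str(Ctx, "{ S[i, j] -> [j, i] }");
}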

Best,
Tobias

Responses inline, marked with (***VSA***) because Outlook on a Mac sucks for inline replies!

    > [tying to original thread]
    >
    > > Hal, Tobias, et al. –
    > >
    > > I am strongly in favor of seeing a broader range of loop transformations, supported by strong dependence analysis, added to LLVM, and the Polly infrastructure seems to be by far our best bet to make that happen. I have a couple of questions:
    > >
    > > 1) Integer constraint libraries like ISL (and Omega, which I used extensively in a previous project) are fundamentally solving exponential problems or worse, and can in rare cases cause compile times to explode. How does Polly deal with this issue?
    
    Our integer set library has a compute-out functionality. It counts
    expensive operations, such as memory allocations, but also so-called
    pivots, which occur during ILP solving. If a certain limit is
    reached, the library skips all subsequent operations and returns a
    "null" result to the user. We guard potentially expensive parts of
    Polly (scheduling, dependence analysis) with these compute-outs and
    bail out gracefully. The important point here is that these are
    compute-outs, not time-outs. Hence, the resulting binaries will be
    identical across systems with differing performance (and obviously
    also across runs).

(***VSA***) Thanks, Tobias. This is exactly what I was looking for. Using compute-outs is definitely much better than timeouts (but they are probably more intrusive and cannot have been easy to implement, so kudos on that).

    > > 2) Are the loop transformations composable through a reasonable API, i.e., can a tool create an arbitrary pipeline of the transformations? I know about the Polly scheduling work, which explores flexible combinations of these, but I don’t know whether there are interfaces other tools can use to create arbitrary, fixed pipelines of such transforms.
    
    Loop transformations are transformations on the polyhedral schedule.
    Tools can set an arbitrary multi-dimensional scheduling function,
    which allows individual loop iterations to be remapped into a new
    loop structure. (With your background, this might already be clear.)

(***VSA***) Yes, that makes sense. When a client creates such a scheduling function, does Polly automatically enforce correctness internally, or is the tool responsible for checking that separately?

    For everybody who does not yet speak "polyhedral": we generally work
    on a tree representation of the execution order which, even though
    not an AST, can be modified in a similar way, such that loop
    transformations can be applied on a per-node level.

(***VSA***) At what granularity can clients apply transformations in this representation? For example, are schedules applicable to individual IR instructions, or only to entire basic blocks?

    We use the latter to implement certain specific transformations,
    e.g., the GOTO/BLIS matrix multiplication transformation, a very
    specific sequence of transformations that is known to yield
    "optimal" performance.
    
    Best,
    Tobias

-- Vikram Adve

// Interim Head and Professor, Department of Computer Science
// University of Illinois at Urbana-Champaign
// Admin Assistant: Amanda Foley - ajfoley2@illinois.edu
// Google Hangouts: vikram.s.adve@gmail.com || Skype: vikramsadve
// Research page: http://vikram.cs.illinois.edu/

    Responses inline, marked with (***VSA***) because Outlook on a Mac
    sucks for inline replies!

    > [tying to original thread]
    >
    > > Hal, Tobias, et al. –
    > >
    > > I am strongly in favor of seeing a broader range of loop transformations, supported by strong dependence analysis, added to LLVM, and the Polly infrastructure seems to be by far our best bet to make that happen. I have a couple of questions:
    > >
    > > 1) Integer constraint libraries like ISL (and Omega, which I used extensively in a previous project) are fundamentally solving exponential problems or worse, and can in rare cases cause compile times to explode. How does Polly deal with this issue?
    
    Our integer set library has a compute-out functionality. It counts
    expensive operations, such as memory allocations, but also so-called
    pivots, which occur during ILP solving. If a certain limit is
    reached, the library skips all subsequent operations and returns a
    "null" result to the user. We guard potentially expensive parts of
    Polly (scheduling, dependence analysis) with these compute-outs and
    bail out gracefully. The important point here is that these are
    compute-outs, not time-outs. Hence, the resulting binaries will be
    identical across systems with differing performance (and obviously
    also across runs).

    (***VSA***) Thanks, Tobias. This is exactly what I was looking for.
    Using compute-outs is definitely much better than timeouts (but they
    are probably more intrusive and cannot have been easy to implement,
    so kudos on that).

Thanks!

    > > 2) Are the loop transformations composable through a reasonable API, i.e., can a tool create an arbitrary pipeline of the transformations? I know about the Polly scheduling work, which explores flexible combinations of these, but I don’t know whether there are interfaces other tools can use to create arbitrary, fixed pipelines of such transforms.
    
    Loop transformations are transformations on the polyhedral schedule.
    Tools can set an arbitrary multi-dimensional scheduling function,
    which allows individual loop iterations to be remapped into a new
    loop structure. (With your background, this might already be clear.)

    (***VSA***) Yes, that makes sense. When a client creates such a
    scheduling function, does Polly automatically enforce correctness
    internally, or is the tool responsible for checking that separately?

When you import a schedule, we verify that the dependences are not
violated. However, until now this has mostly been used as a debugging
tool. More complex transformations (e.g., data-layout transformations)
are possibly not yet well covered.
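
The core of such a check can be sketched in a few lines of raw isl.
This is a simplification, not Polly's actual implementation, and it
assumes the schedule maps all statement instances into one common time
space:

#include <isl/ctx.h>
#include <isl/map.h>
#include <isl/space.h>
#include <isl/union_map.h>

// A schedule respects the dependences if every dependence, mapped into
// schedule (time) space, goes strictly forward in lexicographic order.
bool respectsDeps(isl_union_map *Deps, isl_union_map *Schedule,
                  isl_space *TimeSpace) {
  isl_union_map *InTime = isl_union_map_apply_domain(
      isl_union_map_copy(Deps), isl_union_map_copy(Schedule));
  InTime = isl_union_map_apply_range(InTime, isl_union_map_copy(Schedule));

  // "Source executes strictly before target" on the time space.
  isl_union_map *LexLt =
      isl_union_map_from_map(isl_map_lex_lt(isl_space_copy(TimeSpace)));

  isl_bool Ok = isl_union_map_is_subset(InTime, LexLt);
  isl_union_map_free(InTime);
  isl_union_map_free(LexLt);
  return Ok == isl_bool_true;
}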

    For everybody who does not yet speak "polyhedral": we generally work
    on a tree representation of the execution order which, even though
    not an AST, can be modified in a similar way, such that loop
    transformations can be applied on a per-node level.

    (***VSA***) At what granularity can clients apply transformations in
    this representation? For example, are schedules applicable to
    individual IR instructions, or only to entire basic blocks?

Until recently, basic blocks. We are transitioning towards
finer-grained statements. The main roadblock here is the cost of
modeling them. We enhanced our internal scheduler to do incremental
scheduling to address this issue. Traditionally, polyhedral schedulers
formed a single ILP for scheduling, whose dimensionality increased with
the number of statements scheduled. As most of our algorithms are
exponential in the number of dimensions, this limited scalability. The
new isl scheduler instead adds statements incrementally and can be
stopped (but is not yet) when sufficient statements have been fused.
With our new approach, the dimensionality of the ILP is bounded by the
number of statements to fuse.
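
For reference, this is roughly how a client hands a scheduling problem
to isl (a sketch; the validity constraints would come from dependence
analysis, and the incremental behavior is internal to isl):

#include <isl/schedule.h>
#include <isl/union_map.h>
#include <isl/union_set.h>

// Domain: the statement instances to schedule. Validity: dependences
// the new schedule must respect.
isl_schedule *computeSchedule(isl_union_set *Domain,
                              isl_union_map *Validity) {
  isl_schedule_constraints *SC = isl_schedule_constraints_on_domain(Domain);
  SC = isl_schedule_constraints_set_validity(SC, Validity);
  return isl_schedule_constraints_compute_schedule(SC);
}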

Tobias

Hi,

as I don't see people saying "don't do that",
I think it is a matter of having the LLVM community agree on "how we do that".

Let's send a follow-up email that explains step by step
how we intend to integrate Polly into LLVM,
and, once we agree on a plan, start to execute.

Thanks,
Sebastian

    Hi,

    as I don't see people saying "don't do that",
    I think it is a matter of having the LLVM community agree on "how we
    do that".

    Let's send a follow-up email that explains step by step
    how we intend to integrate Polly into LLVM,
    and, once we agree on a plan, start to execute.

I agree.

  -Hal

Sounds great. I'll prepare a proposal.

Best,
Tobias